Skip to content

Isolate test_files.py outputs per test to fix parallel-run failures#402

Merged
A-CGray merged 2 commits into
mainfrom
fix/parallel-test-files-isolation
May 20, 2026
Merged

Isolate test_files.py outputs per test to fix parallel-run failures#402
A-CGray merged 2 commits into
mainfrom
fix/parallel-test-files-isolation

Conversation

@A-CGray
Copy link
Copy Markdown
Member

@A-CGray A-CGray commented May 13, 2026

Summary

  • Root cause: TestSolutionFileNamesUnsteady used assertSetEqual against the shared tests/output_files/ directory. When testflo -n 2 runs tests concurrently, one test's cleanup (_cleanup_output_files) deleted files that another test had just written, and vice versa — causing spurious set-equality failures.
  • Why Azure didn't see it: .github/test_real.sh forces N_TEST=1 on Azure Pipelines, so tests always ran sequentially there.
  • Each test now writes into its own subdirectory of tests/output_files/ named after the full test ID (via unittest.TestCase.id()), eliminating all cross-test file pollution.
  • Introduced _OutputFilesTestBase shared by both test classes, with assertOutputFiles(expected) using set-equality throughout — replacing the ad-hoc per-file assertTrue(os.path.isfile(...)) checks in TestSolutionFileNames.
  • Added a KEEP_OUTPUT_FILES = False flag at the top of test_files.py; set it to True to keep output subdirectories on disk after a run for manual inspection.

Test plan

  • testflo -v unit_tests/test_files.py (sequential) — all 6 pass
  • testflo -v -n 2 unit_tests/test_files.py (parallel) — all 6 pass

🤖 Generated with Claude Code

A-CGray and others added 2 commits May 13, 2026 09:26
Each test now writes to its own subdirectory of tests/output_files/
(named after the full test ID), preventing parallel test runs from
polluting each other's file sets. Introduce _OutputFilesTestBase with
shared setUp/tearDown and assertOutputFiles(expected) so both test
classes use set-equality assertions rather than ad-hoc per-file checks.
This fixes test failures seen with testflo -n 2 that did not occur
on Azure CI (which forces N_TEST=1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Setting KEEP_OUTPUT_FILES = True at the top of test_files.py skips
tearDown cleanup so that output subdirectories and their .cgns/.plt
files remain on disk after the test run for manual inspection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@A-CGray A-CGray requested a review from a team as a code owner May 13, 2026 13:50
@A-CGray A-CGray requested review from anilyil and sanjan98 May 13, 2026 13:50
@A-CGray
Copy link
Copy Markdown
Member Author

A-CGray commented May 13, 2026

For future reference, here is the prompt I gave to claude to generate this fix and PR:

/plan Recently we have been seeing the following test failures:

(mpi 1) /home/***/repos/adflow/tests/unit_tests/test_files.py:TestSolutionFileNamesUnsteady_0_unsteady_three_digits.test_unsteady_file_names_large_timestep  ... FAIL (00:00:0.73, 224 MB)
Traceback (most recent call last):
  File "/home/***/repos/adflow/tests/unit_tests/test_files.py", line 177, in test_unsteady_file_names_large_timestep
    self.assertSetEqual(new_files, self._expected_large_timestep_output_files(ap.name))
AssertionError: Items in the first set but not the second:
'mdo_tutorial_000_surf_Timestep0002.cgns'
'mdo_tutorial_000_surf_Timestep0002.plt'
'mdo_tutorial_000_vol_Timestep0002.cgns'

(mpi 1) /home/***/repos/adflow/tests/unit_tests/test_files.py:TestSolutionFileNamesUnsteady_0_unsteady_three_digits.test_unsteady_file_names  ... FAIL (00:00:3.16, 317 MB)
Traceback (most recent call last):
  File "/home/***/repos/adflow/tests/unit_tests/test_files.py", line 159, in test_unsteady_file_names
    self.assertSetEqual(new_files, self._expected_output_files(ap.name))
AssertionError: Items in the second set but not the first:
'mdo_tutorial_000_vol_Timestep0002.cgns'
'mdo_tutorial_000_vol_Timestep0001.cgns'
'mdo_tutorial_000_surf_Timestep0001.plt'
'mdo_tutorial_000_surf_Timestep0001.cgns'
'mdo_tutorial_000_vol_Timestep0000.cgns'

These failures don't occur when we run tests on Azure, but do occur when we run tests on our build bots. If you look at the .github/test_real.sh script you'll see that we run tests sequentially on azure but in parallel elsewhere which I think is the cause of the issue. These two failing tests do very specific checks for the files they expect to have written that fail not only if any of the expected files are missing, but also if any other files are present. My theory therefore is that when the tests are run in parallel they both write/cleanup their output files in the same place which messes up the checks.

My proposed solution would be that, for all the tests related to writing output files, we create a subdirectory inside the output_files directory with the same name as the test, and write the files from that test there. Also, I want you to standardise the method for defining the expected test file names and cleaning them up across all the tests in test_files.py, right now it is handled differently in TestSolutionFileNames and TestSolutionFileNamesUnsteady, I'll let you decide which one best matches python best practices. Finally, I would like you to add a flag in that test file that enables and disables the deletion of the output files because often when debugging you want to be able to open those files after the tests have run.

Make all these changes in a branch created off of the mdolab main branch, make atomic commits as you make changes, then open a PR to the mdolab repo once you're done.

To test your changes work, make sure you run testflo -n 2 so that the tests are run in parallel.

@A-CGray A-CGray requested review from sabakhshi and removed request for anilyil May 13, 2026 13:53
@A-CGray A-CGray merged commit 8155e98 into main May 20, 2026
13 checks passed
@A-CGray A-CGray deleted the fix/parallel-test-files-isolation branch May 20, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants