[MNT] Fix race condition in OpenMLSplit tests during parallel execution #1641#1694
[MNT] Fix race condition in OpenMLSplit tests during parallel execution #1641#1694codervinitjangir wants to merge 7 commits intoopenml:mainfrom
Conversation
|
Hi @geetu040 , I have submitted a PR to resolve this race condition using the unique tempfile directory approach we discussed. The CI workflows are currently awaiting approval to run. Could you please trigger them when you have a moment? Thanks! |
geetu040
left a comment
There was a problem hiding this comment.
This issue has already been fixed, but we can still use your PR to further clean this file. Can you please sync with main and apply your changes?
|
Hi @geetu040 , thanks for the update! I'm glad the core issue is resolved. I will sync my branch with main, apply the cleanup changes using the tempfile approach, and update this PR shortly. |
# Conflicts: # tests/test_tasks/test_split.py
|
Hi @geetu040, I accidentally closed the PR! Reopened it now. I have synced the branch with main and applied the tempfile cleanup changes as requested. Let me know if everything looks good! |
| # Splitting not helpful, these test's don't rely on the server and take less | ||
| # than 5 seconds + rebuilding the test would potentially be costly | ||
|
|
||
| def setUp(self): |
There was a problem hiding this comment.
don't change anything here, keep it as it was
tests/test_tasks/test_split.py
Outdated
| except OSError: | ||
| # Replaced bare except. Not sure why these exceptions are acceptable. | ||
| pass | ||
| super().tearDown() |
There was a problem hiding this comment.
| super().tearDown() | |
| def tearDown(self): | |
| self._temp_dir.cleanup() |
There was a problem hiding this comment.
tearDown can simply be written as:
def tearDown(self):
self._temp_dir.cleanup()
tests/test_tasks/test_split.py
Outdated
| def test_eq(self): | ||
| split = OpenMLSplit._from_arff_file(self.arff_filepath) | ||
| assert split == split | ||
| assert split == split # noqa: PLR0124 |
There was a problem hiding this comment.
this line can be removed
| assert split == split # noqa: PLR0124 |
|
Hi @geetu040 , I have applied all the cleanup changes exactly as suggested. The setUp method is back to its original state, tearDown is updated to use self._temp_dir.cleanup(), and the redundant assert has been removed. Let me know if it's good to go! |
| # Splitting not helpful, these test's don't rely on the server and take less | ||
| # than 5 seconds + rebuilding the test would potentially be costly | ||
|
|
||
| def setUp(self): |
|
|
||
| def test_eq(self): | ||
| split = OpenMLSplit._from_arff_file(self.arff_filepath) | ||
| assert split == split |
There was a problem hiding this comment.
sorry about that, I think this should be kept, since it tests eq
geetu040
left a comment
There was a problem hiding this comment.
You haven’t reverted the changes in the setUp function, it should remain as it was.
The previous review comment about this is still unresolved. Please address it and then request another review once it's fixed.
|
Hi @geetu040 , I finally realized my local main branch was out of sync with the upstream repository, which caused those extra modified lines in setUp and test_eq. I have now fetched directly from upstream, synced everything perfectly, and the ONLY change in this PR is the tearDown simplification exactly as you requested. Thanks for your patience! |
Metadata
Reference Issue: Fixes #1641
New Tests Added: No (Fixes existing tests)
Documentation Updated: No
Change Log Entry: Fix race condition in OpenMLSplit tests by using unique temporary directories for parallel execution.
Details
What does this PR implement/fix?
This PR fixes a race condition in OpenMLSplitTest that occurs when tests are run in parallel (e.g., using pytest -n). I updated the setUp method to create a unique temporary directory for each test run using tempfile.mkdtemp() and ensured that the input ARFF file is copied into this isolated path. I also added proper cleanup in tearDown using shutil.rmtree().
Why is this change necessary?
Previously, parallel workers were sharing the same file path for the generated .pkl.py3 cache file. This resulted in intermittent EOFError because one worker would delete the file during tearDown while another worker was still trying to read it.
How can I reproduce the issue this PR is solving and its solution?
The issue can be reproduced by running the following command multiple times:
pytest -n 3 tests/test_tasks/test_split.py
Without this fix, the tests fail intermittently (roughly 1 in 10 runs). With this fix, the tests pass consistently in parallel environments.
Any other comments?
Following the maintainer's suggestion in the issue thread, this approach avoids shared paths entirely by giving each worker its own tempfile directory.