SlurmClusterExecutor - create a separate working directory for each job #698
Conversation
Create a separate working directory for each job even when the user does not define a specific working directory; otherwise the submission script and the error log get overwritten.
Walkthrough

The logic for setting the working directory when submitting a job has been updated. Instead of always using the cache directory, the code now derives a subdirectory name from the command's last argument, creates that subdirectory under the cache directory if it doesn't exist, and sets it as the working directory. Additionally, a test was added to verify executor behavior when no working directory is specified.
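The per-job directory derivation described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the function name is hypothetical, and `os.path.basename` is added here only to keep the sketch self-contained for task files given as paths.

```python
import os
import tempfile

def derive_job_working_directory(cache_directory, command):
    """Derive a per-job subdirectory from the last command argument.

    Mirrors the change described in the walkthrough: strip the "_i.h5"
    suffix from the task file name and use the rest as the folder name.
    """
    folder = os.path.basename(command[-1]).split("_i.h5")[0]
    cwd = os.path.join(cache_directory, folder)
    os.makedirs(cwd, exist_ok=True)  # create the subdirectory if missing
    return cwd

# Two jobs with different task files end up in separate directories,
# so their submission scripts and error logs no longer collide.
with tempfile.TemporaryDirectory() as cache:
    cwd_a = derive_job_working_directory(cache, ["python", "run.py", "job_a_i.h5"])
    cwd_b = derive_job_working_directory(cache, ["python", "run.py", "job_b_i.h5"])
    print(os.path.basename(cwd_a), os.path.basename(cwd_b))  # job_a job_b
```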
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff            @@
##            main     #698     +/-  ##
=========================================
+ Coverage   96.81%   96.89%   +0.08%
=========================================
  Files          29       29
  Lines        1317     1319       +2
=========================================
+ Hits         1275     1278       +3
+ Misses         42       41       -1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- executorlib/task_scheduler/file/queue_spawner.py (1 hunks)
```python
folder = command[-1].split("_i.h5")[0]
cwd = os.path.join(cache_directory, folder)
os.makedirs(cwd, exist_ok=True)
```
Add input validation and improve robustness of folder name extraction.

The current implementation has several potential issues:

- Missing input validation: there is no check that `command` is non-empty, so `command[-1]` would raise an `IndexError`.
- Brittle string parsing: the code assumes `command[-1]` contains `"_i.h5"`; if it doesn't, the entire string becomes the folder name.
- Path security: the folder name is derived from user input and could contain path traversal sequences like `"../"`.
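To illustrate the path-traversal concern, a last command argument containing `../` would produce a working directory outside the cache directory once the joined path is resolved. A minimal sketch with a hypothetical malicious file name:

```python
import os

cache_directory = "executorlib_cache"
# Hypothetical malicious last command argument
folder = "../outside_i.h5".split("_i.h5")[0]   # -> "../outside"
cwd = os.path.join(cache_directory, folder)
# The joined path resolves outside the cache directory
print(os.path.normpath(cwd))  # -> "outside"
```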
Consider this more robust implementation:
```diff
- folder = command[-1].split("_i.h5")[0]
- cwd = os.path.join(cache_directory, folder)
- os.makedirs(cwd, exist_ok=True)
+ if not command:
+     raise ValueError("Command list cannot be empty")
+
+ # Extract folder name safely
+ last_arg = command[-1]
+ if "_i.h5" in last_arg:
+     folder = last_arg.split("_i.h5")[0]
+ else:
+     # Fallback: use basename without extension
+     folder = os.path.splitext(os.path.basename(last_arg))[0]
+
+ # Sanitize folder name to prevent path traversal
+ folder = os.path.basename(folder)  # Remove any path components
+ if not folder or folder in (".", ".."):
+     folder = "default_job"
+
+ cwd = os.path.join(cache_directory, folder)
+ os.makedirs(cwd, exist_ok=True)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if not command:
    raise ValueError("Command list cannot be empty")

# Extract folder name safely
last_arg = command[-1]
if "_i.h5" in last_arg:
    folder = last_arg.split("_i.h5")[0]
else:
    # Fallback: use basename without extension
    folder = os.path.splitext(os.path.basename(last_arg))[0]

# Sanitize folder name to prevent path traversal
folder = os.path.basename(folder)  # Remove any path components
if not folder or folder in (".", ".."):
    folder = "default_job"

cwd = os.path.join(cache_directory, folder)
os.makedirs(cwd, exist_ok=True)
```
🤖 Prompt for AI Agents

In executorlib/task_scheduler/file/queue_spawner.py around lines 53 to 55, add validation to ensure the command list is not empty before accessing command[-1] to prevent IndexError. Check if the last element contains the expected "_i.h5" substring before splitting; if not, handle it gracefully or raise an error. Sanitize the extracted folder name to remove or neutralize any path traversal characters like "../" to avoid security risks. Use a safe method to construct the folder path and then create the directory with os.makedirs as before.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- executorlib/task_scheduler/file/queue_spawner.py (2 hunks)
- tests/test_fluxclusterexecutor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- executorlib/task_scheduler/file/queue_spawner.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_fluxclusterexecutor.py (3)
- executorlib/executor/flux.py (1): FluxClusterExecutor (209-387)
- executorlib/standalone/serialize.py (1): cloudpickle_register (9-28)
- tests/test_cache_fileexecutor_mpi.py (1): mpi_funct (20-25)
🔇 Additional comments (1)
tests/test_fluxclusterexecutor.py (1)
43-45: Changes to existing test look good. The modifications to the existing test method appear to be formatting or annotation changes and don't affect the test logic.
```python
def test_executor_no_cwd(self):
    with FluxClusterExecutor(
        resource_dict={"cores": 2},
        block_allocation=False,
        cache_directory="executorlib_cache",
    ) as exe:
        cloudpickle_register(ind=1)
        fs1 = exe.submit(mpi_funct, 1)
        self.assertFalse(fs1.done())
        self.assertEqual(len(os.listdir("executorlib_cache")), 2)
        self.assertEqual(fs1.result(), [(1, 2, 0), (1, 2, 1)])
        self.assertTrue(fs1.done())
```
💡 Verification agent
🧩 Analysis chain
Test may not correctly validate the separate working directory behavior.

The new test method is good but has a few concerns:

- Directory structure validation: both tests check `len(os.listdir("executorlib_cache")) == 2`, but if the new logic creates subdirectories when `cwd` is not specified, shouldn't the directory structure differ between the two tests?
- Single job limitation: the test only submits one job, but the PR objective mentions creating "separate working directories for each job". Consider adding a test that submits multiple jobs to verify they get separate directories.

Let me verify the expected behavior by checking the implementation:
Consider adding a test that submits multiple jobs to better validate the separate directory functionality:
```diff
+ def test_executor_multiple_jobs_no_cwd(self):
+     with FluxClusterExecutor(
+         resource_dict={"cores": 2},
+         block_allocation=False,
+         cache_directory="executorlib_cache",
+     ) as exe:
+         cloudpickle_register(ind=1)
+         fs1 = exe.submit(mpi_funct, 1)
+         fs2 = exe.submit(mpi_funct, 2)
+
+         # Verify both jobs complete successfully
+         self.assertEqual(fs1.result(), [(1, 2, 0), (1, 2, 1)])
+         self.assertEqual(fs2.result(), [(2, 2, 0), (2, 2, 1)])
+
+         # Verify separate directories were created
+         cache_contents = os.listdir("executorlib_cache")
+         self.assertGreater(len(cache_contents), 2, "Expected separate directories for each job")
```

🏁 Script executed:
```shell
#!/bin/bash
# Description: Check how the working directory logic works when cwd is not specified
# Expected: Find the logic that creates subdirectories based on command arguments
ast-grep --pattern 'def execute_with_pysqa($$$) {
  $$$
}'
# Also check for any directory creation logic
rg -A 10 -B 5 "cache_directory.*cwd|cwd.*cache_directory"
```

Length of output: 2974
Enhance directory-structure assertions and add multi-job coverage
The queue_spawner now creates a separate subdirectory under executorlib_cache for each task when no cwd is specified, whereas with an explicit cwd all output lands directly in that directory. The existing test_executor_no_cwd only checks that there are two entries, but doesn’t verify they are directories—and the “with cwd” test likewise only checks the count, not that they’re files. Also, we should submit more than one job to ensure each job still gets its own workspace.
Please update tests/test_fluxclusterexecutor.py around lines 47–58 as follows:
• In test_executor_no_cwd, after listing executorlib_cache, assert that each entry is a directory:

```python
entries = os.listdir("executorlib_cache")
self.assertEqual(len(entries), 2)
for e in entries:
    self.assertTrue(
        os.path.isdir(os.path.join("executorlib_cache", e)),
        f"Expected '{e}' to be a subdirectory for a task",
    )
```

• In the corresponding "with cwd" test, assert that entries are files, not directories:

```python
entries = os.listdir("executorlib_cache")
self.assertEqual(len(entries), 2)
for e in entries:
    self.assertTrue(
        os.path.isfile(os.path.join("executorlib_cache", e)),
        f"Expected '{e}' to be a file in the shared cwd",
    )
```

• Add a new test to submit two separate jobs and verify each gets its own subdirectory:
```diff
+ def test_executor_multiple_jobs_no_cwd(self):
+     with FluxClusterExecutor(
+         resource_dict={"cores": 2},
+         block_allocation=False,
+         cache_directory="executorlib_cache",
+     ) as exe:
+         cloudpickle_register(ind=1)
+         # Submit two distinct jobs
+         fs1 = exe.submit(mpi_funct, 1)
+         fs2 = exe.submit(mpi_funct, 2)
+
+         # Ensure both complete correctly
+         self.assertEqual(fs1.result(), [(1, 2, 0), (1, 2, 1)])
+         self.assertEqual(fs2.result(), [(2, 2, 0), (2, 2, 1)])
+
+         # Verify separate subdirectories for each task of each job
+         entries = os.listdir("executorlib_cache")
+         # Expect 4 subdirectories: 2 tasks × 2 jobs
+         self.assertEqual(len(entries), 4)
+         for e in entries:
+             path = os.path.join("executorlib_cache", e)
+             self.assertTrue(os.path.isdir(path), f"{path} should be a directory")
```

This will ensure both the per-task directory logic and multi-job isolation are properly validated.
🤖 Prompt for AI Agents

In tests/test_fluxclusterexecutor.py around lines 47 to 58, enhance test_executor_no_cwd by asserting that each entry in the "executorlib_cache" directory is a subdirectory, not just checking the count. Similarly, in the corresponding "with cwd" test, assert that entries are files rather than directories. Additionally, add a new test that submits two separate jobs and verifies each job gets its own subdirectory under "executorlib_cache" to ensure multi-job workspace isolation is properly tested.