DM-38184: Increase parsl wait time for Princeton site #14
Conversation
@@ -72,6 +72,7 @@ def get_executors(self) -> List[ParslExecutor]:
 parallelism=1.0,
 worker_init=export_environment(),
 launcher=SrunLauncher(overrides="-K0 -k --slurmd-debug=verbose"),
+cmd_timeout=300,
Let's make this configurable through the config file.
How about this? (force pushed)
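For readers following the thread, the request to make the timeout configurable could look roughly like the sketch below. It is not the code from the force push: `get_cmd_timeout` and the plain-dict `site_config` are hypothetical stand-ins for however the site configuration is actually read, and only the key name `cmd_timeout` and the 300-second default come from this PR.

```python
from typing import Any, Mapping

# Default taken from the value hard-coded in the first revision of this PR.
DEFAULT_CMD_TIMEOUT = 300  # seconds


def get_cmd_timeout(site_config: Mapping[str, Any], default: int = DEFAULT_CMD_TIMEOUT) -> int:
    """Return the Slurm command timeout from a site config mapping.

    Hypothetical helper: ``site_config`` stands in for the per-site section
    of the BPS config file; the real lookup mechanism may differ.
    """
    value = site_config.get("cmd_timeout", default)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("cmd_timeout must be an integer number of seconds")
    timeout = int(value)  # accepts ints and numeric strings such as "120"
    if timeout <= 0:
        raise ValueError("cmd_timeout must be positive")
    return timeout
```

With this shape, a site that omits `cmd_timeout` keeps the 300-second default, while a site with slow scheduler responses can raise it without a code change.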
Added user-facing documentation.
Codecov Report
Patch and project coverage have no change.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #14   +/-   ##
=======================================
  Coverage   70.37%   70.37%
=======================================
  Files           3        3
  Lines          27       27
  Branches        6        6
=======================================
  Hits           19       19
  Misses          6        6
  Partials        2        2
Force-pushed: d82a394 to 98ef119
Great! Be sure to add this to the docs.
Force-pushed: aed9afb to 2e6cffa
Class docstrings and a user-facing |
Please also add the cmd_timeout parameter in doc/lsst.ctrl.bps.parsl/use.rst under Tiger.
@@ -26,6 +26,7 @@ class Tiger(Slurm):
 - ``walltime`` (`str`): time limit for each Slurm job.
 - ``mem_per_node`` (`int`): memory per node (GB) for each Slurm job.
 - ``max_blocks`` (`int`): maximum number of blocks (Slurm jobs) to use.
+- ``cmd_timeout`` (`int`): timeout (seconds) to wait for a scheduler.
timeout (seconds) to wait for Slurm commands.
Why is it an int rather than float?
The cmd_timeout argument in the Slurm Execution Provider class within the Parsl module is currently configured to accept ints. I'm not sure we can get away with requesting a float time here in that case.
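If a fractional value ever did reach this setting, one option would be to round it up to whole seconds before handing it to the provider, so the wait is never shortened. The sketch below is purely illustrative and is not something this PR does; `to_whole_seconds` is a hypothetical name.

```python
import math


def to_whole_seconds(timeout: float) -> int:
    """Round a timeout up to a whole number of seconds.

    Hypothetical helper: useful only if a float must be passed to an
    argument that is declared as an int, as described in the comment above.
    """
    if not math.isfinite(timeout) or timeout <= 0:
        raise ValueError("timeout must be a positive, finite number of seconds")
    # math.ceil returns an int and never yields a shorter wait than requested.
    return math.ceil(timeout)
```

Keeping the documented type as `int` avoids this conversion entirely, which is the simpler choice for a value measured in hundreds of seconds.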
almost continually until the workflow is done.

We set the cmd_timeout value to 300 seconds to help avoid
TimeoutExpired errors when the schedulers are slow to reply (often due
when commands are slow to return
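As background on the error being discussed, the following sketch shows how a command that is slow to return triggers `subprocess.TimeoutExpired` in the Python standard library. It assumes the TimeoutExpired in the documentation text is this standard-library exception, and it does not reproduce Parsl's own command handling; `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float) -> bool:
    """Run ``cmd`` and report whether it returned within ``timeout`` seconds.

    Illustration only: a real caller would more likely let the exception
    propagate or retry, rather than reduce it to a boolean.
    """
    try:
        subprocess.run(cmd, timeout=timeout, capture_output=True, check=False)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception is raised.
        return False
    return True


# A stand-in for a slow scheduler command: a Python child that sleeps.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Calling `run_with_timeout(slow_command, 0.2)` returns `False`, while a generous timeout lets the same command finish; raising `cmd_timeout` trades a longer worst-case wait for fewer spurious failures in the same way.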
Force-pushed: 2e6cffa to f65c3da
Updated documentation in |
Checklist
doc/changes