Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38184: Increase parsl wait time for Princeton site #14

Merged
merged 1 commit into from
Mar 1, 2023

Conversation

leeskelvin
Copy link
Contributor

@leeskelvin leeskelvin commented Mar 1, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@@ -72,6 +72,7 @@ def get_executors(self) -> List[ParslExecutor]:
parallelism=1.0,
worker_init=export_environment(),
launcher=SrunLauncher(overrides="-K0 -k --slurmd-debug=verbose"),
cmd_timeout=300,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this configurable through the config file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? (force pushed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added user-facing documentation.

@codecov
Copy link

codecov bot commented Mar 1, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (6a9ae97) 70.37% compared to head (98ef119) 70.37%.

❗ Current head 98ef119 differs from pull request most recent head f65c3da. Consider uploading reports for the commit f65c3da to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #14   +/-   ##
=======================================
  Coverage   70.37%   70.37%           
=======================================
  Files           3        3           
  Lines          27       27           
  Branches        6        6           
=======================================
  Hits           19       19           
  Misses          6        6           
  Partials        2        2           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@PaulPrice PaulPrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Be sure to add this to the docs.

@leeskelvin leeskelvin force-pushed the tickets/DM-38184 branch 2 times, most recently from aed9afb to 2e6cffa Compare March 1, 2023 16:51
@leeskelvin
Copy link
Contributor Author

Class docstrings and a user-facing doc/ feature file have been added. Jenkins currently running.

Copy link
Contributor

@PaulPrice PaulPrice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add the cmd_timeout parameter in doc/lsst.ctrl.bps.parsl/use.rst under Tiger.

@@ -26,6 +26,7 @@ class Tiger(Slurm):
- ``walltime`` (`str`): time limit for each Slurm job.
- ``mem_per_node`` (`int`): memory per node (GB) for each Slurm job.
- ``max_blocks`` (`int`): maximum number of blocks (Slurm jobs) to use.
- ``cmd_timeout`` (`int`): timeout (seconds) to wait for a scheduler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout (seconds) to wait for Slurm commands.

Why is it an int rather than float?

Copy link
Contributor Author

@leeskelvin leeskelvin Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cmd_timeout argument in the Slurm Execution Provider class within the Parsl module is currently configured to accept ints. I'm not sure we can get away with requesting a float time here in that case.

almost continually until the workflow is done.

We set the cmd_timeout value to 300 seconds to help avoid
TimeoutExpired errors when the schedulers are slow to reply (often due
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when commands are slow to return

@leeskelvin
Copy link
Contributor Author

Updated documentation in doc/lsst.ctrl.bps.parsl/use.rst.

@leeskelvin leeskelvin merged commit 5198779 into main Mar 1, 2023
@leeskelvin leeskelvin deleted the tickets/DM-38184 branch March 1, 2023 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants