Implement HPC execution #192
and fix typo in percentage calculation
I am sending these comments for now and will do a deeper review of what we discussed on Slack later. 👍
Codecov Report
@@ Coverage Diff @@
## main #192 +/- ##
==========================================
- Coverage 41.30% 40.06% -1.24%
==========================================
Files 41 42 +1
Lines 2271 2401 +130
==========================================
+ Hits 938 962 +24
- Misses 1333 1439 +106
Also, modularized functions for better testing. TODO: test whether it reproduces what @rvhonorato implemented.
A small touch on top of @rvhonorato's implementation. Good work m8! Done:

* moves some variables into parameters to make things more configurable and testable
* leverages what was done in `benchmark` regarding job heading creation
* creates a small factory for the `Engine` in the modules' `_run`
* defines default variables at the module level so they are synchronized between `libworkflow` and `libhpc`
@rvhonorato take a look if you wish. Ready to merge from my side.
I implemented a new `libhpc` to handle the HPC executions (SLURM, but we can add TORQUE later). I tried my best to follow the same design as Brian's `libparallel`.

I also implemented `queue_limit`, which defines the size of the submission batches, and `concat`, similar to what we have in haddock2.4.

I also added a "terminate signal" that will remove the .jobs from the queue.
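To illustrate how `queue_limit` and `concat` could interact, here is a minimal sketch, not the code in this PR. It assumes that `concat` groups several tasks into a single job and that `queue_limit` caps how many jobs go into one submission batch; the helper names `chunk` and `plan_batches` are hypothetical.

```python
from itertools import islice


def chunk(items, size):
    """Yield successive lists holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(items)
    while True:
        group = list(islice(iterator, size))
        if not group:
            return
        yield group


def plan_batches(tasks, concat=1, queue_limit=100):
    """Group tasks into jobs, then jobs into submission batches.

    Assumption: `concat` tasks are packed into one job, and at most
    `queue_limit` jobs are submitted together in one batch.
    """
    jobs = list(chunk(tasks, concat))
    return list(chunk(jobs, queue_limit))


batches = plan_batches(range(10), concat=2, queue_limit=2)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

With ten tasks, `concat=2` and `queue_limit=2`, this gives five jobs spread over three batches, the last one holding a single job.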
As for the `wait` to check if the .jobs have finished, I added an "adaptive timer" to be both HPC friendly and efficient: in the first batch submission it uses pre-defined wait timers, `10s (<10 jobs), 30s (<50) and 60s (>50)`; after that it keeps track of how long each batch took to finish and then waits for the average time.

I have NOT added the re-submission logic, let's do it in another PR. This one is just for the implementation of `libhpc`.
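The adaptive timer described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual `libhpc` code: the names `initial_wait` and `AdaptiveTimer` are hypothetical, and a batch of exactly 50 jobs (which the description leaves unspecified) is treated here as the 60s case.

```python
def initial_wait(n_jobs):
    """Pre-defined wait time in seconds for the first batch."""
    if n_jobs < 10:
        return 10
    if n_jobs < 50:
        return 30
    # Assumption: exactly 50 jobs falls into the longest timer.
    return 60


class AdaptiveTimer:
    """Wait for the average duration of the batches seen so far."""

    def __init__(self):
        self.durations = []

    def record(self, seconds):
        """Store how long a finished batch took, in seconds."""
        self.durations.append(seconds)

    def wait_time(self, n_jobs):
        """Return how long to wait before polling the queue again."""
        if not self.durations:
            # No history yet: fall back to the pre-defined timers.
            return initial_wait(n_jobs)
        return sum(self.durations) / len(self.durations)


timer = AdaptiveTimer()
print(timer.wait_time(20))   # first batch, uses the pre-defined timer
timer.record(40)
timer.record(80)
print(timer.wait_time(20))   # later batches, uses the running average
```

Keeping the history in a plain list makes the average easy to inspect in tests; a running sum and count would work just as well if the list grew large.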