PBS scheduler backend #316
@rafa-esco Would you mind testing this version of the backend? Downloading the
```python
self.sourcesdir = os.path.join(self.current_system.resourcesdir,
                               'CUDA', 'essentials')
self.modules = ['cudatoolkit']
self.modules = ['craype-accel-nvidia60']
```
That's irrelevant here. I should remove it.
@vkarak No problemo, I just had to download a zip of the entire repository (because the `import reframe.core.config` was failing). It runs, but there is a problem when cancelling the jobs... you have to change: Apart from that, everything (even the async execution policy) seems to work nicely :).
@rafa-esco Thanks for testing! Good catch, that was a typo. I wanted
@vkarak I just noticed a small bug in `_emit_lselect_function`... If I do not define `_num_cpus_per_task`, I do not get the `ncpus=` value, so my job will only get 1 CPU per node. The default value for `_num_cpus_per_task` should be 1, so maybe adding `num_cpus_per_task = self._num_cpus_per_task or 1` and then using this variable downstream will always generate the correct `ncpus` option:
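The suggested fallback can be sketched roughly as follows. This is a minimal illustration with hypothetical function and parameter names, not ReFrame's actual backend code:

```python
def emit_select_option(num_nodes, num_tasks_per_node, num_cpus_per_task):
    """Build a PBS `-l select` directive (illustrative sketch).

    Falling back to 1 when `num_cpus_per_task` is undefined guarantees
    that an explicit `ncpus=` value is always emitted, instead of letting
    PBS silently default to one CPU per node.
    """
    num_cpus_per_task = num_cpus_per_task or 1
    # In a `select` chunk, `ncpus` is the CPU count per node:
    # tasks per node times CPUs per task.
    ncpus = num_tasks_per_node * num_cpus_per_task
    return '#PBS -l select=%d:mpiprocs=%d:ncpus=%d' % (
        num_nodes, num_tasks_per_node, ncpus)
```

With this default in place, a test that leaves `num_cpus_per_task` unset still produces an `ncpus=` equal to its `mpiprocs`, matching one CPU per task.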
@rafa-esco So, in your setup, the The general idea for the backends is to generate the minimum required fields based on the user's input in their regression test. So if the user does not set
@vkarak `ncpus` does not need to be present, but it will default to 1 core per node, so PBS will only allow me to use one CPU on each node (we have cgroups here to make things even harder ;) ), even if `mpiprocs` is larger.
@rafa-esco OK, now I get it. You are right. I'll fix that. |
@rafa-esco I've pushed the fixes you suggested. Could you test once more? |
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #316      +/-   ##
==========================================
- Coverage   91.45%   91.24%    -0.22%
==========================================
  Files          66       67        +1
  Lines        7807     7935      +128
==========================================
+ Hits         7140     7240      +100
- Misses        667      695       +28
```

Continue to review the full report at Codecov.
@vkarak Yep, all my tests passed :)
- This test checks if the `ncpus` resource option is emitted correctly when `num_cpus_per_task` is not defined.
- Also some cosmetic changes in `reframe/__init__.py`.

PBS fixes:
- Treat options and `-lselect` resources separately.
- Treat custom prefixes. Though I don't know of cases that need that, it adds more flexibility and is in accordance with the Slurm backend.
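The separation of plain options from `-lselect` resources, with a configurable directive prefix, can be sketched like this. Function and parameter names here are hypothetical, not the backend's actual code:

```python
def emit_preamble(select_resources, options, prefix='#PBS'):
    """Emit scheduler directive lines (illustrative sketch).

    `select_resources` are joined into a single `-l select=...` line,
    while every other option gets its own directive line; `prefix`
    allows a custom directive prefix, mirroring the Slurm backend.
    """
    lines = ['%s -l select=%s' % (prefix, ':'.join(select_resources))]
    lines += ['%s %s' % (prefix, opt) for opt in options]
    return lines
```

Keeping the `select` chunk separate means node-level resources (`mpiprocs`, `ncpus`, accelerators) are always colon-joined into one chunk spec, while job-wide options such as walltime stay on their own lines.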
teojgo left a comment
lgtm
@victorusu I will merge this one. Is this ok?
Fixes #46 and replaces #277.
To test on Dom, you should do the following:
This special configuration is only valid for Dom and uses `pbs+mpiexec`. Still todo: