[feat] Add SGE scheduler backend #1959
Hello @giordano, thank you for updating! Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide. Comment last updated at 2021-07-26 20:14:50 UTC |
|
Can I test this patch? |
ekouts
left a comment
In general it looks good to me, thank you @giordano for the contribution! I have some comments, but I think you should also wait for @vkarak's feedback before changing things.
First of all, I was wondering if you really need to rewrite everything. As far as I can see, most of the methods are very similar to those of the PBS scheduler, so it would make sense to me to inherit from PBS and change only the methods that differ, as we do for the Torque scheduler. You would have to change __init__, submit and poll. By the same logic, I don't think you need a special _SgeJob, SGE_CANCEL_DELAY and SGE_OUTPUT_WRITEBACK_WAIT; you can just reuse PBS's variables.
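To illustrate the inheritance idea, here is a minimal sketch. It does not use ReFrame's actual classes: `PbsLikeScheduler` and `SgeLikeScheduler` are hypothetical stand-ins, and the `submit_cmd`, `poll_cmd` and `cancel_cmd` helpers are invented for the example. It only shows the subclass overriding the three methods mentioned above and inheriting the rest.

```python
class PbsLikeScheduler:
    '''Stand-in for a PBS backend; NOT ReFrame's actual class.'''

    def __init__(self):
        # Prefix of the scheduler directives in the job script
        self.directive_prefix = '#PBS'

    def submit_cmd(self, script):
        return ['qsub', script]

    def poll_cmd(self, jobids):
        return ['qstat', '-f', *jobids]

    def cancel_cmd(self, jobid):
        return ['qdel', jobid]


class SgeLikeScheduler(PbsLikeScheduler):
    '''Hypothetical SGE backend: overrides only what differs.'''

    def __init__(self):
        super().__init__()
        # SGE directives start with '#$' instead of '#PBS'
        self.directive_prefix = '#$'

    def submit_cmd(self, script):
        # '-terse' makes qsub print only the job ID
        return ['qsub', '-terse', script]

    def poll_cmd(self, jobids):
        # Query all jobs as XML; filtering by job ID happens afterwards
        return ['qstat', '-xml']

    # cancel_cmd() is inherited unchanged from the PBS-like base class


sge = SgeLikeScheduler()
print(sge.cancel_cmd('42'))
```

Everything not overridden, here `cancel_cmd()`, comes from the base class, which is the point of the suggestion.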
|
Thanks for the suggestion about inheriting What are exactly the settings that need to be in the preamble? |
|
ok to test |
Codecov Report
@@ Coverage Diff @@
## master #1959 +/- ##
==========================================
- Coverage 86.74% 86.33% -0.42%
==========================================
Files 52 53 +1
Lines 9258 9337 +79
==========================================
+ Hits 8031 8061 +30
- Misses 1227 1276 +49
Continue to review full report at Codecov.
|
|
Thanks @giordano for the PR! I agree with @ekouts on her comments.
What about setting the I am now looking into your PR in more detail and I'll get back to you with some more comments. |
@giordano Thanks again for your PR! I think that the way you look into the XML output of the jobs can be simplified. Here is an outline of what I think would be more suitable:

    @register_scheduler('sge')
    class SgeJobScheduler(PbsJobScheduler):
        def poll(self, *jobs):
            ...
            # Index jobs by their job ID
            job_index = {job.jobid: job for job in jobs}
            for queue_info in root:
                ...
                for job_elem in queue_info:
                    jobid = job_elem.find("JB_job_number").text
                    if jobid not in job_index:
                        # Not a reframe job
                        continue

                    job = job_index[jobid]
                    job._state = _sge_state(job_elem.find("state").text)
                    if job.state in {'COMPLETED', 'SUSPENDED', 'ERROR'}:
                        job._completed = True

Where _sge_state() will actually do the mapping of the states, exactly as you do now. I don't think anything else is needed, from what I can tell by reading through the code.
Co-authored-by: Vasileios Karakasis <vkarak@gmail.com>
|
The problem of the proposed approach is that a completed job may never enter the |
|
@giordano You're right on your comment about the last loop. I've opened a PR to your branch with some small improvements. If it works, you can merge it and then we update the documentation for this one (plus the "decomposition" error message) and we're good to go, I think. |
|
And I will add a unit test for the parts that do not require the scheduler, e.g., the |
Some SGE polling improvements
|
Before the merge I was trying to run some benchmarks, also using Spack as the build system, to test my full pipeline (I had to fix some things), but it seems to be working as expected. Thanks a lot! |
This should be mostly working, but I need to fix some details, so I'm marking the pull request as draft.
I probably also need to update the documentation to mention SGE is a supported scheduler?
Fix #1937