-
Notifications
You must be signed in to change notification settings - Fork 117
[feat] Abbreviate node lists in FAILURE INFO reports
#1912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1912 +/- ##
==========================================
- Coverage 87.90% 87.89% -0.02%
==========================================
Files 49 49
Lines 8451 8459 +8
==========================================
+ Hits 7429 7435 +6
- Misses 1022 1024 +2
Continue to review full report at Codecov.
|
|
Hello @jgphpc, Thank you for updating! Cheers! There are no PEP8 issues in this Pull Request!Do see the ReFrame Coding Style Guide Comment last updated at 2021-04-16 15:10:34 UTC |
|
note: we could prefer hostlist format over nodelist and/or switch to hostlist if len(nodelist) > 1000 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution is entirely Slurm specific. I would prefer that we converted any host list to an abbreviated sequence ourselves. Algorithmically, it should not be difficult as soon as we sort the nodes. It's practically a run-length encoding of the node ids. As for the node name format, we can assume a generic pattern that ends in a sequence of consecutive numbers.
Also we don't need command line options etc. for this. We only need a configuration parameter and an associated environment variable RFM_ABBREV_NODELIST=<n>. If <n> is zero then we don't abbreviate, otherwise we abbreviate any node list with size >= n.
|
@jgphpc I took care of the algorithm and it works nicely now. Can you address the rest of the comments?
Also I don't think that this conversion should be done at the backends, so all the change in the scheduler backends should be reverted. This conversion is purely a presentation thing, so it has to go into the frontend, when we generate the final report. |
jjotero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
- Node lists are always abbreviated in the `FAILURE INFO` output but not in the JSON report.
FAILURE INFO reports
|
@vkarak, if I understand the last changes correctly, they imply that we will always use the abbreviated node list. Is it what we want? |
|
@victorusu Check the modified description of the PR. Yes, in |
victorusu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I just have one question related to whether the r['nodelist'] array is always populated, because it used to have a check if it was empty or not. There are some assertions requesting that it has at least one entry. So, it got me confused a bit.
Closes #1730
EDIT vkarak: This PR adds an algorithm for compressing node lists and returning a a condensed string representation for them. This abbreviated form of node lists is used in
FAILURE INFOreports, but not in the full JSON run report. The reason for this is that the JSON report is meant as raw report info that other tools can process, thus it makes more sense imho not to abbreviate the node lists there.