

@jgphpc
Contributor

@jgphpc jgphpc commented Apr 6, 2021

Closes #1730

EDIT vkarak: This PR adds an algorithm for compressing node lists and returning a condensed string representation for them. This abbreviated form of node lists is used in FAILURE INFO reports, but not in the full JSON run report. The reason is that the JSON report is meant as raw report info that other tools can process, so IMHO it makes more sense not to abbreviate the node lists there.

@jgphpc jgphpc requested a review from vkarak April 6, 2021 11:18
@jgphpc jgphpc self-assigned this Apr 6, 2021
@codecov-io

codecov-io commented Apr 6, 2021

Codecov Report

Merging #1912 (0bafe2f) into master (725c78f) will decrease coverage by 0.01%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##           master    #1912      +/-   ##
==========================================
- Coverage   87.90%   87.89%   -0.02%     
==========================================
  Files          49       49              
  Lines        8451     8459       +8     
==========================================
+ Hits         7429     7435       +6     
- Misses       1022     1024       +2     
Impacted Files                          Coverage Δ
reframe/core/schedulers/slurm.py        52.15% <0.00%> (-0.29%) ⬇️
reframe/core/schedulers/__init__.py     98.40% <100.00%> (+0.03%) ⬆️
reframe/frontend/cli.py                 76.03% <100.00%> (+0.04%) ⬆️
reframe/frontend/statistics.py          95.47% <100.00%> (+0.02%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 725c78f...0bafe2f.

@pep8speaks

pep8speaks commented Apr 7, 2021

Hello @jgphpc, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide

Comment last updated at 2021-04-16 15:10:34 UTC

@jgphpc jgphpc changed the title [wip][feat] Abbreviate nodelist using hostlist format [feat] Abbreviate nodelist using hostlist format Apr 8, 2021
@jgphpc
Contributor Author

jgphpc commented Apr 8, 2021

Note: we could prefer the hostlist format over the full node list, and/or switch to hostlist only if len(nodelist) > 1000?

Contributor

@vkarak vkarak left a comment


This solution is entirely Slurm-specific. I would prefer that we converted any host list to an abbreviated sequence ourselves. Algorithmically, it should not be difficult once we sort the nodes: it's practically a run-length encoding of the node IDs. As for the node name format, we can assume a generic pattern that ends in a sequence of consecutive numbers.
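A minimal sketch of the run-length encoding described above, assuming node names follow a <prefix><number> pattern. It groups names by prefix and digit width (so zero padding is preserved), sorts the numeric IDs, collapses consecutive runs into ranges, and prints a Slurm-style hostlist string. The names `abbreviate_nodelist` and `_pad` are hypothetical; this is an illustration, not the code that was merged.

```python
import re
from collections import defaultdict

# Split a node name into (prefix, trailing digits): 'nid00042' -> ('nid', '00042').
_NODE_RE = re.compile(r'^(.*?)(\d+)$')


def _pad(n, width):
    """Re-apply the original zero padding to a numeric node id."""
    return str(n).zfill(width)


def abbreviate_nodelist(nodes):
    """Compress node names into a hostlist-style string (sketch only).

    Assumption: names end in a number; names without a trailing number
    are emitted unchanged after the compressed groups.
    """
    groups = defaultdict(set)  # (prefix, digit width) -> set of node ids
    plain = set()
    for name in nodes:
        m = _NODE_RE.match(name)
        if m:
            prefix, digits = m.groups()
            groups[(prefix, len(digits))].add(int(digits))
        else:
            plain.add(name)

    parts = []
    for prefix, width in sorted(groups):
        ids = sorted(groups[(prefix, width)])

        # Run-length encode consecutive ids into (start, end) ranges.
        ranges = []
        start = prev = ids[0]
        for i in ids[1:]:
            if i != prev + 1:  # gap found: close the current run
                ranges.append((start, prev))
                start = i
            prev = i
        ranges.append((start, prev))

        if len(ranges) == 1 and ranges[0][0] == ranges[0][1]:
            # A single node needs no brackets.
            parts.append(prefix + _pad(ranges[0][0], width))
        else:
            spans = [_pad(a, width) if a == b
                     else f'{_pad(a, width)}-{_pad(b, width)}'
                     for a, b in ranges]
            parts.append(f"{prefix}[{','.join(spans)}]")

    return ','.join(parts + sorted(plain))
```

For example, `['nid003', 'nid001', 'nid002', 'nid005']` becomes `nid[001-003,005]`. Grouping by digit width is a deliberate simplification: unpadded names such as `nid9` and `nid10` end up in separate groups (`nid9,nid10`) rather than in one range.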

Also, we don't need command-line options etc. for this. We only need a configuration parameter and an associated environment variable RFM_ABBREV_NODELIST=<n>. If <n> is zero, we don't abbreviate; otherwise, we abbreviate any node list with size >= n.
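The threshold semantics proposed here can be sketched as follows. RFM_ABBREV_NODELIST is the variable suggested in this review comment; the helper names `abbrev_threshold` and `should_abbreviate` are hypothetical, and the change that was eventually merged abbreviates unconditionally in FAILURE INFO, so this only illustrates the proposal.

```python
import os


def abbrev_threshold(default=0):
    """Read the proposed RFM_ABBREV_NODELIST threshold (hypothetical).

    An unset or non-integer value falls back to `default`; 0 disables
    abbreviation, and negative values are treated as 0.
    """
    value = os.environ.get('RFM_ABBREV_NODELIST', '')
    try:
        n = int(value)
    except ValueError:
        return default
    return max(n, 0)


def should_abbreviate(nodelist, threshold):
    """Abbreviate only when enabled and the list has at least `threshold` nodes."""
    return threshold > 0 and len(nodelist) >= threshold
```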

@vkarak vkarak self-assigned this Apr 13, 2021
@vkarak
Contributor

vkarak commented Apr 15, 2021

@jgphpc I took care of the algorithm and it works nicely now. Can you address the rest of the comments?

Also we don't need command line options etc. for this. We only need a configuration parameter and an associated environment variable RFM_ABBREV_NODELIST=<n>. If <n> is zero then we don't abbreviate, otherwise we abbreviate any node list with size >= n.

Also, I don't think that this conversion should be done in the backends, so all the changes in the scheduler backends should be reverted. This conversion is purely a presentation concern, so it belongs in the frontend, where we generate the final report.
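A minimal sketch of keeping the conversion in the presentation layer: the run report keeps the raw node list (which is what a JSON dump serializes), and only the text formatter compresses it. The report layout (a list of dicts with 'name' and 'nodelist' keys), the function names and the output line format are all assumptions for illustration; the compression routine is passed in as a callable, so the sketch does not depend on any particular abbreviation algorithm.

```python
import json


def format_failure_info(run_report, abbreviate):
    """Render one text line per test; the report data itself is not modified.

    `abbreviate` is any callable that turns a list of node names into a
    condensed string.
    """
    lines = []
    for r in run_report:
        nodes = r.get('nodelist') or []  # tolerate a missing or empty list
        shown = abbreviate(nodes) if nodes else '<none>'
        lines.append(f"{r['name']}: {shown}")
    return '\n'.join(lines)


def json_report(run_report):
    """The JSON report keeps the full, raw node list for other tools."""
    return json.dumps(run_report, indent=2)
```

The empty-list guard is a defensive choice, so a test with no recorded nodes still produces a readable line instead of an error.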

@vkarak vkarak requested a review from victorusu April 15, 2021 11:37
@vkarak vkarak requested a review from jjotero April 15, 2021 15:47
Contributor

@jjotero jjotero left a comment


Looks good!

Vasileios Karakasis added 2 commits April 16, 2021 16:36
- Node lists are always abbreviated in the `FAILURE INFO` output but not in the
  JSON report.
@vkarak vkarak changed the title [feat] Abbreviate nodelist using hostlist format [feat] Abbreviate node lists in FAILURE INFO reports Apr 16, 2021
@victorusu
Contributor

@vkarak, if I understand the last changes correctly, they imply that we will always use the abbreviated node list. Is this what we want?

@vkarak
Contributor

vkarak commented Apr 16, 2021

@victorusu Check the modified description of the PR. Yes, in FAILURE INFO there is no reason to use the non-abbreviated form. Conversely, the JSON report contains the full node list.

Contributor

@victorusu victorusu left a comment


LGTM. I just have one question: is the r['nodelist'] array always populated? The code used to check whether it was empty, yet some assertions require it to have at least one entry, so I got a bit confused.

@vkarak vkarak merged commit 91f481f into reframe-hpc:master Apr 16, 2021
@jgphpc jgphpc deleted the nodelist branch April 18, 2021 06:40


Development

Successfully merging this pull request may close these issues.

Abbreviate host lists when reporting node lists

6 participants