Skip to content

Conversation

@rsarm
Copy link
Contributor

@rsarm rsarm commented Oct 18, 2022

Closes #2567

@codecov-commenter
Copy link

codecov-commenter commented Oct 18, 2022

Codecov Report

Base: 86.26% // Head: 86.34% // Increases project coverage by +0.08% 🎉

Coverage data is based on head (f64f88c) compared to base (6e68deb).
Patch coverage: 93.10% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2634      +/-   ##
==========================================
+ Coverage   86.26%   86.34%   +0.08%     
==========================================
  Files          60       60              
  Lines       10989    11062      +73     
==========================================
+ Hits         9480     9552      +72     
- Misses       1509     1510       +1     
Impacted Files Coverage Δ
reframe/core/schedulers/slurm.py 53.47% <75.00%> (+0.92%) ⬆️
reframe/utility/__init__.py 92.94% <100.00%> (+0.21%) ⬆️
reframe/core/variables.py 95.75% <0.00%> (-0.56%) ⬇️
reframe/core/meta.py 99.08% <0.00%> (ø)
reframe/core/pipeline.py 93.40% <0.00%> (ø)
reframe/frontend/cli.py 72.41% <0.00%> (+0.04%) ⬆️
reframe/utility/osext.py 84.67% <0.00%> (+0.12%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@vkarak vkarak requested review from ekouts and vkarak October 18, 2022 13:02
@rsarm rsarm marked this pull request as ready for review October 18, 2022 14:08
@rsarm
Copy link
Contributor Author

rsarm commented Oct 18, 2022

Here the idea is to use the command scontrol show hostname {nodespec}, which I don't think it looks at all to the Slurm database. I think it's just an utility to expand a node list string. For instance

> scontrol show hostname xxx0[00-05]
xxx000
xxx001
xxx002
xxx003
xxx004
xxx005

I was considering checking if len(job.nodelist) is equal to number of nodes and in that case, stop updating the node list. I'm a bit confused though with job.num_tasks_per_node in the case of flexible allocations. If job.num_tasks_per_node is None it means that there's only one node? Or that's not valid for flexible allocations?

Copy link
Contributor

@vkarak vkarak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsarm I think the best approach is to simply store the nodespec as we get it back from the sacct or squeue and then convert it to a nodelist using scontrol once at the end after the job has finished. This is the easiest implementation, imho, until we implement the inverse of nodelist_abbrev.

Copy link
Contributor

@vkarak vkarak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR according to my last suggestion. We save the nodespec with every sacct and we expand it into the nodelist the first time this is requested. I have also added a utility function to expand a nodespec into a nodelist.

@vkarak vkarak merged commit b1089e3 into reframe-hpc:master Nov 17, 2022
@rsarm rsarm deleted the nodelist branch February 3, 2023 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incomplete node list

3 participants