[ROCm] Add diskspace check for rocm CI nodes #93032
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93032
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 7288d74.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
/easycla |
Overall LGTM! But as this step is generic and can be used by others as well, it makes sense to move it to an action, like https://github.com/pytorch/pytorch/blob/master/.github/actions/get-workflow-job-id/action.yml, and then invoke the action wherever it is needed. The action could accept a diskspace cutoff param that defaults to 70%.
Also, there is a minor linter check.
Also, please sign the CLA first. |
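As a rough illustration of the suggestion above, a reusable check with a cutoff parameter that defaults to 70% could be sketched in bash as follows. The function names `disk_usage_percent` and `check_diskspace` are hypothetical and not taken from the PR, and the sketch assumes a `df` that supports the POSIX `-P` output format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the PR.

# Print the used-space percentage (digits only) of the filesystem holding $1.
# Defaults to the root filesystem when no path is given.
disk_usage_percent() {
  # `df -P` prints one line per filesystem; column 5 is the capacity, e.g. "42%".
  df -P "${1:-/}" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the cutoff, 1 otherwise.
# $1: usage in percent, $2: cutoff in percent (defaults to 70).
check_diskspace() {
  local usage="$1"
  local cutoff="${2:-70}"
  if [[ "$usage" -ge "$cutoff" ]]; then
    echo "Disk usage ${usage}% is at or above the ${cutoff}% cutoff" >&2
    return 1
  fi
  echo "Disk usage ${usage}% is below the ${cutoff}% cutoff"
}
```

A workflow step could then call `check_diskspace "$(disk_usage_percent /)" || exit 1`, passing a second argument only where a different cutoff is wanted.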
I'm signing the easycla but it's not updating on the ticket. I submitted a help request for the issue. |
Oh, once you have signed it, I just need to invoke the check again to make it pass. |
Sounds good! Please check if it works. |
/easycla |
Weird, invoking |
Looks good to me, but please make sure to sign the CLA (the PR author's email must be covered by the CLA).
Sounds good! |
Hmm, actually, any tips on how one can test it?
Because, afaik, $diskspace -ge diskspace_cutoff would always evaluate to true, wouldn't it?
$ diskspace=1 ;if [[ $diskspace -ge diskspace_cutoff ]]; then echo "Ha ha ha $diskspace"; fi
Ha ha ha 1
@ZainRizvi, hmm, don't we have lintrunner to detect such mistakes? |
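For readers who want to reproduce the comparison questioned above, here is a minimal bash sketch. Inside `[[ ... ]]`, both operands of `-ge` are evaluated as arithmetic expressions, so the bare word `diskspace_cutoff` is treated as a variable name: it evaluates to 0 when the variable is unset (the case shown in the one-liner above) and to the variable's value when it is set. The helper name `bareword_compare` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the comparison from the review comment.
# Returns 0 (true) when the bare-word comparison passes.
bareword_compare() {
  local diskspace="$1"
  [[ $diskspace -ge diskspace_cutoff ]]
}

# Case 1: diskspace_cutoff is unset, so the bare word evaluates to 0
# and any non-negative usage passes the test.
unset diskspace_cutoff
if bareword_compare 1; then echo "unset cutoff: 1 -ge 0 is true"; fi

# Case 2: diskspace_cutoff is set, so the bare word resolves to its value.
diskspace_cutoff=70
if ! bareword_compare 1; then echo "cutoff=70: 1 -ge 70 is false"; fi
if bareword_compare 85; then echo "cutoff=70: 85 -ge 70 is true"; fi
```

Writing the operand as `"$diskspace_cutoff"` makes the intent explicit and avoids relying on this arithmetic evaluation.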
@kit1980, do we have a shell lintrunner to catch issues like the one mentioned here? |
The support team stated the following: To resolve this [issue], the community usually makes sure the GitHub global configs are properly configured and runs a git rebase to update the committer/author information on that specific commit. |
@pytorchbot rebase |
@pytorchbot successfully started a rebase job. Check the current status here |
Successfully rebased |
Force-pushed from 6ae95c3 to c42e549
/easycla |
@pytorchbot rebase |
@pytorchbot successfully started a rebase job. Check the current status here |
Successfully rebased |
Force-pushed from 634a6a2 to 667adac
LGTM! Thank you for fixing this issue!
@amdfaa I added the ciflow/trunk label to trigger the ROCm test jobs in the trunk workflow. Please check them when they finish to see that the diskspace check is being executed correctly. |
I have locally tested this PR on jwr-20 with a lower cutoff, and the script executed successfully. |
@pytorchbot merge |
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Fixes #92822
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport