Hook race fix #21

Merged
merged 10 commits into from
Aug 18, 2023

wtripp180901
Contributor

The pre-upgrade hook will now drain all nodes before checking whether jobs are running on them, in order to prevent a race condition in which pending jobs are scheduled onto the nodes after the check is complete. It will RESUME the nodes on a successful upgrade.
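
The drain-then-check ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, the `Reason` text is made up, and it assumes `scontrol` and `squeue` are on the PATH. What should happen to the drained nodes when jobs are found is left out of the sketch.

```shell
#!/bin/sh
# Hypothetical sketch of a drain-then-check pre-upgrade step.
# Function names and the Reason string are illustrative assumptions.

# Drain every node so the scheduler stops placing new jobs on them.
drain_all_nodes() {
    scontrol update NodeName=all State=DRAIN Reason="pre-upgrade hook"
}

# Print one line per job currently in the RUNNING state.
list_running_jobs() {
    squeue --noheader --states=RUNNING
}

# Drain first, then check. Because the nodes are already drained, a pending
# job cannot be scheduled between the check and the start of the upgrade.
# Returns 0 if it is safe to upgrade, 1 if jobs are running, 2 on tool errors.
pre_upgrade_check() {
    drain_all_nodes || return 2
    hook_running_jobs=$(list_running_jobs) || return 2
    if [ -n "$hook_running_jobs" ]; then
        echo "Refusing to upgrade: jobs are still running" >&2
        return 1
    fi
    return 0
}
```

A hook script would call `pre_upgrade_check` and exit with its status, so that a non-zero result blocks the upgrade.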

@wtripp180901 wtripp180901 requested a review from sjpb August 8, 2023 11:54
docker-entrypoint.sh: 3 review threads (outdated, resolved)
@wtripp180901
Contributor Author

Made the scontrol changes. The only thing is that, by using NodeName=all, the other named nodes (i.e. slurmd-[2-9]) will get added to the list in a drain state, but I'm guessing this wouldn't really be an issue in production?

@wtripp180901 wtripp180901 requested a review from sjpb August 8, 2023 14:21
@wtripp180901
Contributor Author

Actually, one thing I've realised is that this fix introduces a NEW race condition: once the nodes are undrained, jobs could get scheduled onto them before the upgrade starts. I need to move the undraining to a post-upgrade hook.
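
Moving the undrain step into a post-upgrade hook might look something like the sketch below. The function name is hypothetical and this is not the merged code; it only assumes that `scontrol` is on the PATH and that the nodes were drained with `NodeName=all` beforehand.

```shell
#!/bin/sh
# Hypothetical sketch of a post-upgrade step that undrains the nodes.
# Running this only after the upgrade has finished means no job can be
# scheduled onto a node in the window between undraining and upgrading.
post_upgrade_resume() {
    if scontrol update NodeName=all State=RESUME; then
        echo "Nodes resumed after upgrade"
    else
        echo "Failed to resume nodes; they remain drained" >&2
        return 1
    fi
}
```

The point of the design is ordering: the nodes stay drained for the whole upgrade and are returned to service only as the last step.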

@wtripp180901 wtripp180901 merged commit a0a2323 into main Aug 18, 2023
1 check passed
@wtripp180901 wtripp180901 deleted the hook-race-fix branch August 18, 2023 14:07