Add force-reboot after force-timeout duration has been exceeded #341
Conversation
Force-pushed from 933fc91 to 864332f.
@dholbach I rebased and retargeted the PR. It's unclear to me why the GitHub Action is failing, and I didn't find any documentation on how to reproduce it locally, so :\
The command to run is
See also #344 for the fix to that test.
Thanks for rebasing and hanging in there - I hope we get things fixed up again quickly.
#344 is merged now - can you rebase again?
Force-pushed from 864332f to 6529298.
@dholbach rebased and all checks passed now.
I'm not necessarily the best person to review this, but I pinged folks on Slack. Thanks a lot for rebasing!
Force-pushed from b881e19 to 8db5650.
"force a reboot even if the drain is still running (default false)") | ||
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1, | ||
"grace period of time for pods to wait for the node drain in seconds (default -1)") | ||
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0, |
In theory I'm fine with a zero drain timeout, as that is operationally equivalent to the current implementation (no timeout specified).
I think in the future it might be sensible to express an opinion here, as a node "stuck" in a drain operation can slow down kured-induced node reboots across the cluster.
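A minimal sketch of that tradeoff, assuming a `drainTimeout` duration flag like the one registered above (the helper name and structure here are illustrative, not kured's actual code): a zero value can simply mean "no deadline", which matches today's behaviour when `--drain-timeout` is not specified, while any positive value bounds the drain via a context deadline.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// drainContext is an illustrative helper: a zero drainTimeout is treated as
// "no deadline", while a positive value puts a deadline on the drain.
func drainContext(parent context.Context, drainTimeout time.Duration) (context.Context, context.CancelFunc) {
	if drainTimeout == 0 {
		// No deadline: the drain may run indefinitely, as it does today.
		return context.WithCancel(parent)
	}
	// With a deadline: the drain is cancelled once drainTimeout elapses.
	return context.WithTimeout(parent, drainTimeout)
}

func main() {
	ctx, cancel := drainContext(context.Background(), 5*time.Minute)
	defer cancel()
	deadline, ok := ctx.Deadline()
	fmt.Println(deadline, ok) // a deadline is set only when a non-zero timeout is given
}
```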
I agree with @jackfrancis. I would prefer to set that in a different PR though, and in a different release (we have many things lined up for this one).
Nice that you mention "infinite time"; it's definitely good for our users!
I like where this is heading
"force a reboot even if the drain is still running (default false)") | ||
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1, | ||
"grace period of time for pods to wait for the node drain in seconds (default -1)") | ||
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jackfrancis . I would prefer set that in a different PR though, and in a different release (we have many things lined up for this one).
"force a reboot even if the drain is still running (default false)") | ||
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1, | ||
"grace period of time for pods to wait for the node drain in seconds (default -1)") | ||
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that you mention "infinite time", it's definitely good for our users!
Force-pushed from 0902744 to 25dcf3c.
I find that good enough. We can iterate later in follow-up PRs (exposing the feature in the Helm chart, refactorings).
I am leaving time for others to review; if nobody has reviewed in the next few days, I will merge this.
Nobody opposes; let's merge!
Resurrected copy of previous pull request #279.
Updated version of #109, which passes the force-timeout parameter into the kubectl drain helper configuration, using Go context cancellation instead.
I also updated the Helm chart to support passing the new parameters.
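A rough sketch of the approach described above - a timeout-bounded context around the drain, with a forced reboot once the timeout is exceeded. The helper names (`drainNode`, `rebootNode`), node name, and durations are assumptions for illustration, not the PR's actual code or flag wiring.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// drainNode stands in for the kubectl drain helper; in the real code the
// timeout would be threaded into the drain helper's configuration.
func drainNode(ctx context.Context, node string) error {
	select {
	case <-time.After(10 * time.Second): // pretend the drain takes this long
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// rebootNode stands in for the actual reboot command.
func rebootNode(node string) {
	log.Printf("rebooting %s", node)
}

func main() {
	forceReboot := true               // e.g. the value of a force-reboot flag
	drainTimeout := 5 * time.Second   // e.g. the value of a drain/force timeout flag

	ctx, cancel := context.WithTimeout(context.Background(), drainTimeout)
	defer cancel()

	if err := drainNode(ctx, "node-1"); err != nil {
		if errors.Is(err, context.DeadlineExceeded) && forceReboot {
			// Drain did not finish within the timeout; force the reboot anyway.
			rebootNode("node-1")
			return
		}
		log.Fatalf("drain failed: %v", err)
	}
	rebootNode("node-1")
}
```

The point of the context-based variant is that cancellation propagates through the drain call itself rather than being enforced by an external watchdog, which keeps the force-reboot decision in one place.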