Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add force-reboot after force-timeout duration has been exceeded #341

Merged
merged 5 commits into from
Apr 13, 2021

Conversation

cnmcavoy
Copy link
Contributor

@cnmcavoy cnmcavoy commented Apr 6, 2021

Resserrected copy of previous pull request #279


Updated version of #109 which passes in the force-timeout parameter into the kubectl drain helper configuration, utilizing Go context cancellation instead.

I also updated the helm chart to support passing the new parameters.

@cnmcavoy cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 933fc91 to 864332f Compare April 6, 2021 16:00
@cnmcavoy
Copy link
Contributor Author

cnmcavoy commented Apr 6, 2021

@dholbach I rebased and retarged the PR. It's unclear to me why the github action is failing, and I didn't find any documents on how to reproduce locally, so :\

@evrardjp
Copy link
Collaborator

evrardjp commented Apr 7, 2021

@dholbach I rebased and retarged the PR. It's unclear to me why the github action is failing, and I didn't find any documents on how to reproduce locally, so :\

The command to run is ct lint --config .github/ct.yaml . Weirdly, this PR is not the first one to fail since our branch change.

@evrardjp
Copy link
Collaborator

evrardjp commented Apr 7, 2021

@dholbach I rebased and retarged the PR. It's unclear to me why the github action is failing, and I didn't find any documents on how to reproduce locally, so :\

The command to run is ct lint --config .github/ct.yaml . Weirdly, this PR is not the first one to fail since our branch change.

See also #344 for the fix to that test.

@dholbach
Copy link
Member

dholbach commented Apr 7, 2021

Thanks for rebasing and hanging in there - I hope we have things fixed up quickly again.

@dholbach
Copy link
Member

dholbach commented Apr 7, 2021

#344 is merged now - can you rebase again?

@cnmcavoy cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 864332f to 6529298 Compare April 7, 2021 14:39
@cnmcavoy
Copy link
Contributor Author

cnmcavoy commented Apr 7, 2021

@dholbach rebased and all checks passed now.

@dholbach
Copy link
Member

dholbach commented Apr 7, 2021

I'm not necessarily the best person to review this, but I pinged folks on Slack. Thanks a lot for rebasing!

charts/kured/Chart.yaml Outdated Show resolved Hide resolved
cmd/kured/main.go Show resolved Hide resolved
cmd/kured/main.go Outdated Show resolved Hide resolved
cmd/kured/main.go Outdated Show resolved Hide resolved
@cnmcavoy cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from b881e19 to 8db5650 Compare April 7, 2021 17:57
cmd/kured/main.go Outdated Show resolved Hide resolved
"force a reboot even if the drain is still running (default false)")
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
"grace period of time for pods to wait for the node drain in seconds (default -1)")
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I'm fine with a zero drain timeout, as that is operationally equivalent to the current implementation (no timeout specified).

I think in the future it might be sensible to express an opinion here, as a node "stuck" in a drain operation can slow down kured-induced node reboots across the cluster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jackfrancis . I would prefer set that in a different PR though, and in a different release (we have many things lined up for this one).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you mention "infinite time", it's definitely good for our users!

Copy link
Collaborator

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is heading

cmd/kured/main.go Show resolved Hide resolved
cmd/kured/main.go Outdated Show resolved Hide resolved
"force a reboot even if the drain is still running (default false)")
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
"grace period of time for pods to wait for the node drain in seconds (default -1)")
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jackfrancis . I would prefer set that in a different PR though, and in a different release (we have many things lined up for this one).

"force a reboot even if the drain is still running (default false)")
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
"grace period of time for pods to wait for the node drain in seconds (default -1)")
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you mention "infinite time", it's definitely good for our users!

cmd/kured/main.go Show resolved Hide resolved
@cnmcavoy cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 0902744 to 25dcf3c Compare April 8, 2021 14:52
Copy link
Collaborator

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that good enough. We can iterate later on follow up PRs (expose the feature to helm chart, refactorings)

@evrardjp
Copy link
Collaborator

evrardjp commented Apr 9, 2021

I am leaving the time for others to review, if nobody has reviewed in the next days, I will merge this.

@evrardjp
Copy link
Collaborator

Nobody opposes, let's merge!

@evrardjp evrardjp merged commit 8046977 into kubereboot:main Apr 13, 2021
@evrardjp evrardjp mentioned this pull request Apr 14, 2021
@flbla flbla mentioned this pull request Apr 14, 2021
@ckotzbauer ckotzbauer mentioned this pull request May 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants