Add force-reboot after force-timeout duration has been exceeded #341

cnmcavoy · 2021-04-06T15:51:08Z

Resurrected copy of previous pull request #279

Updated version of #109 that passes the force-timeout parameter into the kubectl drain helper configuration, using Go context cancellation instead.

I also updated the helm chart to support passing the new parameters.

cnmcavoy · 2021-04-06T18:41:50Z

@dholbach I rebased and retargeted the PR. It's unclear to me why the GitHub action is failing, and I didn't find any documentation on how to reproduce it locally, so :\

evrardjp · 2021-04-07T07:09:10Z

> @dholbach I rebased and retargeted the PR. It's unclear to me why the GitHub action is failing, and I didn't find any documentation on how to reproduce it locally, so :\

The command to run is ct lint --config .github/ct.yaml . Weirdly, this PR is not the first one to fail since our branch change.

evrardjp · 2021-04-07T08:17:25Z

See also #344 for the fix to that test.

dholbach · 2021-04-07T08:20:44Z

Thanks for rebasing and hanging in there - I hope we have things fixed up quickly again.

dholbach · 2021-04-07T08:55:13Z

#344 is merged now - can you rebase again?

cnmcavoy · 2021-04-07T15:13:36Z

@dholbach rebased and all checks passed now.

dholbach · 2021-04-07T15:17:53Z

I'm not necessarily the best person to review this, but I pinged folks on Slack. Thanks a lot for rebasing!

jackfrancis · 2021-04-07T22:19:59Z

cmd/kured/main.go

+		"force a reboot even if the drain is still running (default false)")
+	rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
+		"grace period of time for pods to wait for the node drain in seconds (default -1)")
+	rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,


In theory I'm fine with a zero drain timeout, as that is operationally equivalent to the current implementation (no timeout specified).

I think in the future it might be sensible to express an opinion here, as a node "stuck" in a drain operation can slow down kured-induced node reboots across the cluster.

evrardjp

I like where this is heading

cmd/kured/main.go

evrardjp · 2021-04-08T07:36:46Z

I agree with @jackfrancis. I would prefer to set that in a different PR though, and in a different release (we have many things lined up for this one).

evrardjp · 2021-04-08T07:38:35Z

Nice that you mention "infinite time"; it's definitely good for our users!

evrardjp

I find that good enough. We can iterate later in follow-up PRs (expose the feature in the helm chart, refactorings).

evrardjp · 2021-04-09T06:22:40Z

I am leaving time for others to review; if nobody has reviewed in the next few days, I will merge this.

evrardjp · 2021-04-13T14:47:37Z

Nobody opposes, let's merge!

cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 933fc91 to 864332f on April 6, 2021 16:00

Add force-reboot after force-timeout duration has been exceeded

6529298

cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 864332f to 6529298 on April 7, 2021 14:39

jackfrancis reviewed Apr 7, 2021

charts/kured/Chart.yaml (outdated)

cmd/kured/main.go

cmd/kured/main.go (outdated)

cmd/kured/main.go (outdated)

Refactor force-drain to be a drain-timeout in general

8db5650

cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from b881e19 to 8db5650 on April 7, 2021 17:57

Don't panic if the cordon fails and force-reboot is true

2400f34

jackfrancis reviewed Apr 7, 2021

cmd/kured/main.go (outdated)

Update the default drain timeout to be infinite

5a86ef4

jackfrancis reviewed Apr 7, 2021

evrardjp reviewed Apr 8, 2021

Expose SkipWaitForDeleteTimeoutSeconds and explicitly return when cordonning fails

25dcf3c

cnmcavoy force-pushed the cnmcavoy/force-reboot-timeout branch from 0902744 to 25dcf3c on April 8, 2021 14:52

evrardjp approved these changes Apr 9, 2021

evrardjp merged commit 8046977 into kubereboot:main Apr 13, 2021

evrardjp mentioned this pull request Apr 14, 2021

Drain timeout #78

Closed

flbla mentioned this pull request Apr 14, 2021

add a drain timeout #351

Closed

cnmcavoy mentioned this pull request Apr 19, 2021

Add force-reboot and drain timeouts to chart config and ds #360

Merged

ckotzbauer mentioned this pull request May 19, 2021

1.7.0 release notes #295

Closed
