
clarify kubelet upgrade process #12326

Closed
1 task
Tracked by #12329
liggitt opened this issue Jan 22, 2019 · 24 comments · Fixed by #26098
Labels: kind/feature, language/en, priority/backlog, sig/node, sig/storage, triage/accepted, wg/lts

Comments

liggitt (Member) commented Jan 22, 2019

Follow up from #11060, tracked in #12329

Upgrade process for kubelet is not sufficiently clear in user-facing documentation:

  • add details to kubelet upgrade procedure (whether drain is required, whether skip-level kubelet upgrades are supported, etc) @kubernetes/sig-node-pr-reviews @kubernetes/sig-storage-pr-reviews

Page to Update:
https://kubernetes.io/docs/setup/version-skew-policy/

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Jan 22, 2019
roberthbailey (Contributor)

Is this about in-place updates of the kubelet? Do the kubeadm upgrade tests cover that scenario? The kube-up GCE upgrade tests just replace machines running the old kubelet with machines running a newer one, bypassing the in-place upgrade questions here.

Until we have testing for in-place upgrades, the conservative answer is that the upgrade process for a kubelet is to provision a new machine with the desired kubelet version.

liggitt (Member, Author) commented Feb 6, 2019

Is this about in-place updates of the kubelet?

Yes

Do the kubeadm upgrade tests cover that scenario?

I don't know. @kubernetes/sig-cluster-lifecycle, @kubernetes/sig-testing?

neolit123 (Member) commented Feb 6, 2019

Do the kubeadm upgrade tests cover that scenario?

Yes, but our tests are failing due to problems with the upgrade framework in k/k (e.g. ginkgo skipping), and possibly other reasons too. The current tests are largely unmaintained, and there are plans to replace them with something else next cycle, hopefully.

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-13/#drain-control-plane-and-worker-nodes

We do recommend draining in our 1.12 -> 1.13 upgrade process.
Skipping minor versions is documented as unsupported.

liggitt (Member, Author) commented Feb 6, 2019

cordon -> drain -> upgrade kubelet -> uncordon is the informal guidance I've seen until now. There may be other deployment-specific reasons to destroy and rebuild nodes (node-level improvements that only take effect for new nodes, etc), but from the kubelet's perspective, drain has been sufficient, as far as I know.
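The cordon -> drain -> upgrade kubelet -> uncordon guidance can be sketched as shell commands. This is a minimal sketch, assuming a systemd-managed kubelet; the node name `node-1` is a placeholder and the drain flags shown are common choices, not mandated by this thread. The commands are printed rather than executed, since they require a live cluster:

```shell
# Placeholder node name; substitute your own node.
NODE="node-1"

# Build the per-node upgrade plan as text (these commands need a live
# cluster and node access, so they are only printed here).
UPGRADE_PLAN=$(cat <<EOF
kubectl cordon ${NODE}
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data
# upgrade the kubelet package on ${NODE} (distro-specific), then:
sudo systemctl restart kubelet
kubectl uncordon ${NODE}
EOF
)

echo "$UPGRADE_PLAN"
```

Running the printed commands in order takes the node out of service before the kubelet binary changes, which is the conservative flow discussed here.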

imkin commented Mar 15, 2019

/wg lts

@k8s-ci-robot k8s-ci-robot added the wg/lts Categorizes an issue or PR as relevant to WG LTS. label Mar 15, 2019
sftim (Contributor) commented Jun 4, 2019

/language en

@k8s-ci-robot k8s-ci-robot added the language/en Issues or PRs related to English language label Jun 4, 2019
fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2019
sftim (Contributor) commented Sep 10, 2019

/kind feature
/priority backlog

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Sep 10, 2019
bowei (Member) commented Sep 10, 2019

cc: @freehan

sftim (Contributor) commented Sep 10, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2019
liggitt (Member, Author) commented Oct 28, 2019

There is evidence of users upgrading kubelets between minor versions without draining pods (kubernetes/kubernetes#84443). If draining is required, the kubelet upgrade docs need to make that explicit.

BenTheElder (Member)

Is it required? Do we have a formal decision on that?

dlipovetsky (Contributor)

Please let us also address whether drain is required when upgrading between patch versions, e.g. 1.17.0 to 1.17.1. These upgrades are arguably more frequent than upgrades between minor versions, so users have a greater incentive to skip the drain.

neolit123 (Member)

With respect to workload stability, the process is the same for patch releases, so a drain would be required there too. One potential difference from minor updates is that the kubelet's CPU checkpoint format is not supposed to change between patch releases.

fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 17, 2020
detiber (Member) commented Jun 17, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 17, 2020
liggitt (Member, Author) commented Sep 9, 2020

/assign @derekwaynecarr @dchen1107

Routing to SIG Node leads. If we require draining nodes before a minor version upgrade or reconfiguration (and as far as I can tell, we do), that needs to be made explicit.

sjenning (Contributor) commented Oct 6, 2020

In my opinion, a cordon -> drain -> upgrade -> uncordon path is the safest thing to document for all situations. We should be able to do patch-level upgrades (z-stream, the z in x.y.z) without draining, but, in my view, there is no point in complicating the guidance.

dchen1107 (Member)

cordon -> drain -> upgrade kubelet -> uncordon is the only path supported by SIG Node today. In the past there were efforts at in-place kubelet upgrades, including the containerized kubelet from the CoreOS team, to simplify the upgrade flow, but none of them is officially supported by the community.

Let's make the upgrade flow explicit in the doc for now, while remaining open to the enhancement.

dlipovetsky (Contributor)

Now that it's official, I'll work up a docs PR 🙂

sftim (Contributor) commented Oct 8, 2020

/triage accepted

fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2021
sftim (Contributor) commented Jan 6, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2021
liggitt (Member, Author) commented Jan 14, 2021

cordon -> drain -> upgrade kubelet -> uncordon is the only path supported by SIG Node today

Opened #26098 to update the doc.

The cluster-upgrade doc already includes this information:

For each node in your cluster, drain
that node and then either replace it with a new node that uses the {{< skew latestVersion >}}
kubelet, or upgrade the kubelet on that node and bring the node back into service.
