From 4a59ebbea1c3d50d36dce67598232cd470eeff99 Mon Sep 17 00:00:00 2001 From: Paco Xu Date: Mon, 29 May 2023 12:24:09 +0800 Subject: [PATCH] use kubeadm-config-instance.yaml to save node specific configurations --- .../3929-no-cri-socket-annotation/README.md | 161 ++++++++++-------- .../3929-no-cri-socket-annotation/kep.yaml | 3 +- 2 files changed, 91 insertions(+), 73 deletions(-) diff --git a/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/README.md b/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/README.md index b22cf60978d7..c9e48d4a01f3 100644 --- a/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/README.md +++ b/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/README.md @@ -77,47 +77,71 @@ tags, and then generate with `hack/update-toc.sh`. --> -- [Release Signoff Checklist](#release-signoff-checklist) -- [Summary](#summary) -- [Motivation](#motivation) - - [Goals](#goals) - - [Non-Goals](#non-goals) -- [Proposal](#proposal) - - [User Stories (Optional)](#user-stories-optional) - - [Story 1](#story-1) - - [Story 2](#story-2) - - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - - [Risks and Mitigations](#risks-and-mitigations) -- [Design Details](#design-details) - - [init: upload a global kubelet configuration with cri socket](#init-upload-a-global-kubelet-configuration-with-cri-socket) - - [join: can override it using --config](#join-can-override-it-using---config) - - [upgrade: re-download global one, but should use local kubelet configuration firstly](#upgrade-re-download-global-one-but-should-use-local-kubelet-configuration-firstly) - - [Proposal 1: respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one](#proposal-1-respect-a-list-of-configuration-in-local-kubelet-configuration-and-in-v127-cri-socket-is-the-only-one) - - [Proposal 2: introduce a /var/lib/kubelet/kubeadm-config.yaml to maintain node specific 
configuration](#proposal-2-introduce-a--to-maintain-node-specific-configuration) - - [old version handling](#old-version-handling) - - [Test Plan](#test-plan) - - [Prerequisite testing updates](#prerequisite-testing-updates) - - [Unit tests](#unit-tests) - - [Integration tests](#integration-tests) - - [e2e tests](#e2e-tests) - - [Graduation Criteria](#graduation-criteria) - - [Alpha](#alpha) - - [Beta](#beta) - - [GA](#ga) - - [Deprecation](#deprecation) - - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - - [Version Skew Strategy](#version-skew-strategy) -- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - - [Feature Enablement and Rollback](#feature-enablement-and-rollback) - - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) - - [Monitoring Requirements](#monitoring-requirements) - - [Dependencies](#dependencies) - - [Scalability](#scalability) - - [Troubleshooting](#troubleshooting) -- [Implementation History](#implementation-history) -- [Drawbacks](#drawbacks) -- [Alternatives](#alternatives) -- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) +- [3929: Remove CRI Socket Annotation from Node Object](#3929-remove-cri-socket-annotation-from-node-object) + - [Release Signoff Checklist](#release-signoff-checklist) + - [Summary](#summary) + - [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) + - [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) + - [Design Details](#design-details) + - [init: upload a global kubelet configuration with cri socket](#init-upload-a-global-kubelet-configuration-with-cri-socket) + - [join: can override it using --config](#join-can-override-it-using---config) + - [upgrade: re-download global one, but should use 
local kubelet configuration firstly](#upgrade-re-download-global-one-but-should-use-local-kubelet-configuration-firstly) + - [other proposal: respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one](#other-proposal-respect-a-list-of-configuration-in-local-kubelet-configuration-and-in-v127-cri-socket-is-the-only-one) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Deprecation](#deprecation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) + - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster) + - [Does enabling the feature change any default behavior?](#does-enabling-the-feature-change-any-default-behavior) + - [Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?](#can-the-feature-be-disabled-once-it-has-been-enabled-ie-can-we-roll-back-the-enablement) + - [What happens if we reenable the feature if it was previously rolled back?](#what-happens-if-we-reenable-the-feature-if-it-was-previously-rolled-back) + - [Are there any tests for feature enablement/disablement?](#are-there-any-tests-for-feature-enablementdisablement) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [How can a rollout or rollback fail? 
Can it impact already running workloads?](#how-can-a-rollout-or-rollback-fail-can-it-impact-already-running-workloads) + - [What specific metrics should inform a rollback?](#what-specific-metrics-should-inform-a-rollback) + - [Were upgrade and rollback tested? Was the upgrade-\>downgrade-\>upgrade path tested?](#were-upgrade-and-rollback-tested-was-the-upgrade-downgrade-upgrade-path-tested) + - [Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?](#is-the-rollout-accompanied-by-any-deprecations-andor-removals-of-features-apis-fields-of-api-types-flags-etc) + - [Monitoring Requirements](#monitoring-requirements) + - [How can an operator determine if the feature is in use by workloads?](#how-can-an-operator-determine-if-the-feature-is-in-use-by-workloads) + - [How can someone using this feature know that it is working for their instance?](#how-can-someone-using-this-feature-know-that-it-is-working-for-their-instance) + - [What are the reasonable SLOs (Service Level Objectives) for the enhancement?](#what-are-the-reasonable-slos-service-level-objectives-for-the-enhancement) + - [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) + - [Are there any missing metrics that would be useful to have to improve observability of this feature?](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) + - [Dependencies](#dependencies) + - [Does this feature depend on any specific services running in the cluster?](#does-this-feature-depend-on-any-specific-services-running-in-the-cluster) + - [Scalability](#scalability) + - [Will enabling / using this feature result in any new API calls?](#will-enabling--using-this-feature-result-in-any-new-api-calls) + - [Will enabling / using this feature result in introducing new 
API types?](#will-enabling--using-this-feature-result-in-introducing-new-api-types) + - [Will enabling / using this feature result in any new calls to the cloud provider?](#will-enabling--using-this-feature-result-in-any-new-calls-to-the-cloud-provider) + - [Will enabling / using this feature result in increasing size or count of the existing API objects?](#will-enabling--using-this-feature-result-in-increasing-size-or-count-of-the-existing-api-objects) + - [Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?](#will-enabling--using-this-feature-result-in-increasing-time-taken-by-any-operations-covered-by-existing-slisslos) + - [Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?](#will-enabling--using-this-feature-result-in-non-negligible-increase-of-resource-usage-cpu-ram-disk-io--in-any-components) + - [Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?](#can-enabling--using-this-feature-result-in-resource-exhaustion-of-some-node-resources-pids-sockets-inodes-etc) + - [Troubleshooting](#troubleshooting) + - [How does this feature react if the API server and/or etcd is unavailable?](#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable) + - [What are other known failure modes?](#what-are-other-known-failure-modes) + - [What steps should be taken if SLOs are not being met to determine the problem?](#what-steps-should-be-taken-if-slos-are-not-being-met-to-determine-the-problem) + - [Implementation History](#implementation-history) + - [Drawbacks](#drawbacks) + - [Alternatives](#alternatives) + - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) ## Release Signoff Checklist @@ -196,9 +220,15 @@ cri socket in kubelet configuration. ## Proposal -1. init: upload a global kubelet configuration with cri socket -2. 
join: can override it using --config -3. upgrade: re-download global one, but should use local kubelet configuration firstly +1. init: upload a global kubelet configuration with cri socket. + - the cri socket will take `--cri-socket` value and if the flag is empty, kubeadm will auto-detect it. + - After setting or auto-detecting, it will be set in the global kubelet configuration. +2. join: it will use the global configuration. + - if it is not set in the global configuration, it will use `--cri-socket` value + - if it is still empty, kubeadm will auto-detect it. + - join will not change the global configuration, and if it is different from the global, + kubeadm will save it in `/var/lib/kubelet/kubeadm-config-instance.yaml` +3. upgrade: re-download global one, but should use local kubelet configuration firstly in `kubeadm-config-instance.yaml` ### User Stories (Optional) @@ -214,9 +244,11 @@ cri socket in kubelet configuration. ### init: upload a global kubelet configuration with cri socket -- `kubeadm init` will not add the annotation to node. +- `kubeadm init` will not add the annotation to node any more. - `kubeadm init` will check the customized `--config` at first and if no cri socket is set, it will - auto-detect it and save it global configuration and local as well. + auto-detect it and save it to the global configuration. + if `--cri-socket` is specified, we will use it in the local kubelet configuration and `kubeadm-config-instance.yaml`, + but it will not be saved to the global configuration. ### join: can override it using --config @@ -224,12 +256,22 @@ cri socket in kubelet configuration. - `kubeadm join` will download the kubelet configuration from apiserver and the customized `--config` at first and auto-detect will work only if not set. Auto-detect may log a warning message if it may be misconfigured and log a general debug log if there is multi CRI-sockets. 
+ if `--cri-socket` is specified, we will use it in the local kubelet configuration and `kubeadm-config-instance.yaml`, + but it will not be saved to the global configuration. ### upgrade: re-download global one, but should use local kubelet configuration firstly - `kubeadm upgrade` will download the kubelet configuration from apiserver and respect local one. +- in v1.28-1.29, for backward compatibility, when running `kubeadm upgrade apply`, we will read the `cri` annotation (if no annotation, we autodetect it) + and then patch it to the global configuration. `kubeadm upgrade node` is similar, and it will never change global configuration. +- in v1.30+, `kubeadm upgrade apply` will not read the cri annotation any more. +- in v1.28, for other nodes, `kubeadm upgrade node` will check if the cri annotation is different from the global setting. + if `cri-socket` is different, we will use it in the local kubelet configuration and `kubeadm-config-instance.yaml`, + but it will not be saved to the global configuration. +- in v1.29, `kubeadm upgrade node` will check `kubeadm-config-instance.yaml` at first and then check annotation like v1.28. +- in v1.30+, `kubeadm upgrade node` will check `kubeadm-config-instance.yaml` and then global configuration only. -### Proposal 1: respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one +### other proposal: respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one During `kubeadm upgrade`, kubeadm will read the local kubelet configuration in `/var/lib/kubelet/config.yaml`. kubeadm also download the kubelet configuration from configmap and replace the `containerRuntimeEndpoint` and @@ -240,31 +282,6 @@ A node-specific kubelet configuration list should be maintained in kubeadm code. 
- containerRuntimeEndpoint - imageServiceEndpoint -### Proposal 2: introduce a `/var/lib/kubelet/kubeadm-config.yaml` to maintain node specific configuration - -We should introduce a `/var/lib/kubelet/kubeadm-config.yaml` to maintain node specific configuration. -It is similar to `/var/lib/kubelet/kubeadm-flags.env`. - -```text -KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=k8s.m.daocloud.io/pause:3.9" -``` - -We may introduce a feature gate "KubeadmNodeSpecificConfig" to enable the use the `/var/lib/kubelet/kubeadm-config.yaml` here. - -- If the feature gate is disabled, use the cri socket annotation directly. -- If the feature gate is enabled, `/var/lib/kubelet/kubeadm-config.yaml` will be created and the cri socket will be maintained in it. - -[To be discussed] Another proposal is using a strategy like `--patch`. A file like `/var/lib/kubelet/kubeadm-config.patch` -or a `kubelet.yaml`/`config.ayml` file under `/var/lib/kubelet/patch/`. (This should be removed if we make a decision). - -### old version handling - -For old version cluster upgradation with the annotation, we will not touch the annotation at first. - -1. in v1.28, `kubeadm upgrade` will respect the annotation and save it to local kubelet configuration or node - specific configuration `/var/lib/kubelet/kubeadm-config.yaml`. [TODO update according to the final decision] -2. in v1.29, `kubeadm upgrade` will ignore the annotation. 
- ### Test Plan [x] I/we understand the owners of the involved components may require updates to diff --git a/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/kep.yaml b/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/kep.yaml index d271c447c932..212e63a744ae 100644 --- a/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/kep.yaml +++ b/keps/sig-cluster-lifecycle/kubeadm/3929-no-cri-socket-annotation/kep.yaml @@ -7,9 +7,10 @@ participating-sigs: - sig-cluster-lifecycle status: provisional creation-date: 2023-03-30 -last-updated: 2022-04-03 +last-updated: 2023-05-29 reviewers: - "@neolit123" + - "@chendave" approvers: - "@neolit123" latest-milestone: "0.0"
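
Review note: the version-dependent lookup order that the upgrade hunk describes for `kubeadm upgrade node` can be sketched as below. This is only an illustration of the order as written in the patch, not kubeadm code. The function names, the parameter names, and the use of a bare minor version number are all hypothetical, and the sketch assumes that the node annotation takes precedence over the global value whenever the patch says the annotation is still read.

```python
from typing import Optional


def resolve_upgrade_node_cri_socket(
    minor: int,
    instance_file: Optional[str],
    node_annotation: Optional[str],
    global_config: Optional[str],
) -> Optional[str]:
    """Pick the CRI socket for `kubeadm upgrade node` (hypothetical helper).

    minor           -- kubeadm minor version, e.g. 28 for v1.28
    instance_file   -- value from kubeadm-config-instance.yaml, if any
    node_annotation -- value of the legacy cri annotation on the Node, if any
    global_config   -- value from the global kubelet configuration, if any
    """
    candidates = []
    if minor >= 29:
        # v1.29 and later: the node-specific instance file is checked first.
        candidates.append(instance_file)
    if minor < 30:
        # v1.28 and v1.29: the legacy annotation is still honoured.
        candidates.append(node_annotation)
    # The global kubelet configuration is the last source in every version.
    candidates.append(global_config)
    # Return the first non-empty source; None means nothing was configured,
    # and what happens then is outside the scope of this sketch.
    return next((c for c in candidates if c), None)


def needs_instance_file(resolved: Optional[str], global_config: Optional[str]) -> bool:
    """True when the node value differs from the global one and so has to be
    kept locally instead of being written back to the global configuration."""
    return resolved is not None and resolved != global_config


if __name__ == "__main__":
    sock = resolve_upgrade_node_cri_socket(
        29, None, "unix:///run/crio/crio.sock", "unix:///run/containerd/containerd.sock"
    )
    print(sock, needs_instance_file(sock, "unix:///run/containerd/containerd.sock"))
```

Expressing the order as a single candidate list keeps the three version windows (v1.28, v1.29, v1.30+) visible in one place, which may help when writing the unit tests listed in the Test Plan.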