kubeadm: deprecate the `ClusterStatus` dependency #87656

Merged

@ereslibre
Member

ereslibre commented Jan 29, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
While ClusterStatus will be maintained and uploaded, it won't be
used by the internal kubeadm logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (`kubeadm upgrade apply`, `kubeadm upgrade node`), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.

Which issue(s) this PR fixes:
Implements kubernetes/enhancements#1380

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: The ClusterStatus struct present in the kubeadm-config ConfigMap is deprecated and will be removed in a future version. kubeadm will keep maintaining it until it is removed. The same information can be found in the `etcd` and `kube-apiserver` pod annotations, `kubeadm.kubernetes.io/etcd.advertise-client-urls` and `kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint` respectively.
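For readers who want to see what reading those annotations could look like, here is a minimal sketch. The two annotation keys are taken from the release note above; everything else is an assumption made for illustration: the `pod` type is a hypothetical stand-in for a real pod object, `endpointFromAnnotation` is not a kubeadm function, and the addresses are made-up sample values.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation keys quoted from the release note.
const (
	etcdAdvertiseClientURLsAnnotation   = "kubeadm.kubernetes.io/etcd.advertise-client-urls"
	apiServerAdvertiseAddressAnnotation = "kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint"
)

// pod is a hypothetical, stripped-down stand-in for a static pod's metadata.
// Real code would work with pod objects fetched from the API server.
type pod struct {
	Name        string
	Annotations map[string]string
}

// errAnnotationMissing is a hypothetical sentinel error for this sketch.
var errAnnotationMissing = errors.New("annotation not found")

// endpointFromAnnotation returns the value stored under key on p, or an
// error wrapping errAnnotationMissing when the key is absent or empty.
func endpointFromAnnotation(p pod, key string) (string, error) {
	v, ok := p.Annotations[key]
	if !ok || v == "" {
		return "", fmt.Errorf("pod %q, annotation %q: %w", p.Name, key, errAnnotationMissing)
	}
	return v, nil
}

func main() {
	// Sample values only; they do not come from a real cluster.
	etcdPod := pod{
		Name:        "etcd-node-1",
		Annotations: map[string]string{etcdAdvertiseClientURLsAnnotation: "https://10.0.0.10:2379"},
	}
	apiServerPod := pod{
		Name:        "kube-apiserver-node-1",
		Annotations: map[string]string{apiServerAdvertiseAddressAnnotation: "10.0.0.10:6443"},
	}

	fmt.Println(endpointFromAnnotation(etcdPod, etcdAdvertiseClientURLsAnnotation))
	fmt.Println(endpointFromAnnotation(apiServerPod, apiServerAdvertiseAddressAnnotation))
}
```

Returning a wrapped sentinel error lets a caller tell "the pod exists but is not annotated yet" apart from other failures, which matters for the upgrade path described in this PR.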

@kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-lifecycle-pr-reviews

@ereslibre

Member Author

ereslibre commented Jan 29, 2020

@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch from 95f1451 to 4bc036e Jan 29, 2020
@neolit123

Member

neolit123 commented Jan 29, 2020

applying hold for review.
/hold

Member

rosti left a comment

Thanks @ereslibre !
Overall, I like how this is going.

cmd/kubeadm/app/apis/kubeadm/apiendpoint.go Outdated
cmd/kubeadm/app/util/config/cluster.go Outdated
cmd/kubeadm/app/util/config/cluster.go Outdated
cmd/kubeadm/app/util/etcd/etcd.go Outdated
cmd/kubeadm/app/util/etcd/etcd.go Outdated
@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch 2 times, most recently from 3d4f225 to fc098d9 Jan 31, 2020
@ereslibre ereslibre requested a review from rosti Jan 31, 2020
@ereslibre

Member Author

ereslibre commented Jan 31, 2020

Thanks @rosti for your review! This is ready for another pass.

@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch from fc098d9 to 7b17998 Jan 31, 2020
@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Jan 31, 2020
@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch 3 times, most recently from 0448978 to 764e993 Jan 31, 2020
Member

rosti left a comment

Thanks @ereslibre !
I feel that we need to focus our unit tests on the utility funcs. That way we can write more thorough, easier-to-follow specs, since some problems may be hidden by our broad spec tests.

cmd/kubeadm/app/util/config/cluster.go Outdated
cmd/kubeadm/app/util/config/cluster.go Outdated
@rosti
rosti approved these changes Feb 11, 2020
Member

rosti left a comment

Thanks @ereslibre !
Looks good to me! Only minor naming nits at this point.
I'll nevertheless hold for a review by @fabriziopandini as he is the KEP author and may spot some detail that I've missed.
/lgtm
/hold

cmd/kubeadm/app/util/config/cluster.go
e, ok := clusterStatus.APIEndpoints[nodeName]
if !ok {
return errors.New("failed to get APIEndpoint information for this node")
func getRawAPIEndpointFromPodAnnotationWithoutRetry(client clientset.Interface, nodeName string) (string, error) {

@rosti

rosti Feb 11, 2020

Member

I see where you are going, but can't we merge getAPIEndpointWithBackoff and getAPIEndpoint?

@@ -127,6 +122,95 @@ func NewFromCluster(client clientset.Interface, certificatesDir string) (*Client
	return etcdClient, nil
}

// getEtcdEndpoints returns the list of etcd endpoints.
func getEtcdEndpoints(client clientset.Interface) ([]string, error) {
	return getEtcdEndpointsWithBackoff(client, constants.StaticPodMirroringDefaultRetry)

@rosti

rosti Feb 11, 2020

Member

Again, can we merge getEtcdEndpoints and getEtcdEndpointsWithBackoff?

@ereslibre

ereslibre Feb 11, 2020

Author Member

The reason for these functions is that getEtcdEndpoints and getAPIEndpoint don't need testing (they have no logic). We test their WithBackoff counterparts, where the unit tests can control the backoff. That gives us faster unit test execution and lets each test case set the backoff it needs for whatever it is stubbing.
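A minimal sketch of the pattern described in this reply, under stated assumptions: the `backoff` struct, `defaultRetry`, `lister`, `errNotReady`, `getEndpoints` and `getEndpointsWithBackoff` are hypothetical names written for this illustration. They are not the functions from this PR, and the retry loop is a simplified stand-in for whatever backoff helper the real code uses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff is a hypothetical, simplified retry policy.
type backoff struct {
	Steps    int           // maximum number of attempts
	Duration time.Duration // delay before the second attempt
	Factor   float64       // multiplier applied to the delay after each attempt
}

// defaultRetry is a made-up production policy for this sketch.
var defaultRetry = backoff{Steps: 5, Duration: 500 * time.Millisecond, Factor: 2.0}

// errNotReady is a sample transient error.
var errNotReady = errors.New("static pod not mirrored yet")

// lister is a hypothetical function that fetches the endpoints once.
type lister func() ([]string, error)

// getEndpoints has no logic of its own: it only pins the default policy.
func getEndpoints(list lister) ([]string, error) {
	return getEndpointsWithBackoff(list, defaultRetry)
}

// getEndpointsWithBackoff holds the logic worth testing. Tests pass a
// zero-duration policy so they never sleep.
func getEndpointsWithBackoff(list lister, b backoff) ([]string, error) {
	if b.Steps < 1 {
		return nil, errors.New("backoff needs at least one step")
	}
	var lastErr error
	delay := b.Duration
	for i := 0; i < b.Steps; i++ {
		endpoints, err := list()
		if err == nil {
			return endpoints, nil
		}
		lastErr = err
		if i < b.Steps-1 {
			time.Sleep(delay)
			delay = time.Duration(float64(delay) * b.Factor)
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", b.Steps, lastErr)
}

func main() {
	calls := 0
	flaky := func() ([]string, error) {
		calls++
		if calls < 3 {
			return nil, errNotReady
		}
		return []string{"https://10.0.0.10:2379"}, nil
	}
	// A zero-duration policy keeps the run instantaneous, as a unit test would.
	endpoints, err := getEndpointsWithBackoff(flaky, backoff{Steps: 3})
	fmt.Println(endpoints, err, calls)
}
```

The wrapper stays trivial on purpose, so only the parameterized function needs test coverage.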

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 11, 2020
Member

fabriziopandini left a comment

@ereslibre this is turning out well.
The only nit from my side is about avoiding testing the ExponentialBackoff behavior.

cmd/kubeadm/app/util/config/cluster_test.go Outdated
cmd/kubeadm/app/util/etcd/etcd_test.go Outdated
@neolit123

Member

neolit123 commented Feb 13, 2020

/retitle kubeadm: deprecate the ClusterStatus dependency

@k8s-ci-robot k8s-ci-robot changed the title from "kubeadm: Remove `ClusterStatus` dependency" to "kubeadm: deprecate the `ClusterStatus` dependency" Feb 13, 2020
@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch from cbd2c85 to 9dacd21 Feb 19, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 19, 2020
@ereslibre

Member Author

ereslibre commented Feb 19, 2020

@rosti
rosti approved these changes Feb 19, 2020
Member

rosti left a comment

Thanks @ereslibre !
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 19, 2020
@ereslibre

Member Author

ereslibre commented Feb 19, 2020

/retest

@neolit123

Member

neolit123 commented Feb 20, 2020

/priority important-longterm

@neolit123

Member

neolit123 commented Feb 20, 2020

@ereslibre bazel needs an update:

pull-kubernetes-verify — Job failed.

ereslibre added 2 commits Jan 24, 2020
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fall back to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.

When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.

This changeset adds a special case to both etcd and API server endpoint
retrieval: they won't retry when upgrading from such a cluster. The
logic still retries on any unknown error, but will not retry in
the following cases:

- etcd annotations do not contain etcd endpoints, but the overall list
  of etcd pods is greater than 0. This means that we listed at least
  one etcd pod, but they are missing the annotation.

- API server annotation is not found on the API server pod for a given
  node name, but no errors aside from that one were found. This means
  that the API server pod is present, but is missing the annotation.

In both cases there is no point in retrying, so this speeds up the
upgrade path when coming from a previously existing cluster.
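The etcd case from the commit message above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `podLister`, `etcdEndpoints`, `errAnnotationMissing` and `errTransient` are hypothetical names, each pod is reduced to its annotation map, and treating an empty pod list as retryable is an assumption of this sketch. Only the annotation key comes from the PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key quoted from the PR description.
const etcdAnnotation = "kubeadm.kubernetes.io/etcd.advertise-client-urls"

// Hypothetical sentinel errors for this sketch.
var (
	errAnnotationMissing = errors.New("etcd pods found, but none carries the endpoint annotation")
	errTransient         = errors.New("listing pods failed")
)

// podLister is a hypothetical function returning the annotations of every
// etcd pod it could list.
type podLister func() ([]map[string]string, error)

// etcdEndpoints retries on unknown errors and on an empty pod list, but
// returns immediately when pods exist without the annotation.
func etcdEndpoints(list podLister, attempts int) ([]string, error) {
	if attempts < 1 {
		return nil, errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		pods, err := list()
		if err != nil {
			lastErr = err // unknown error: worth another attempt
			continue
		}
		var endpoints []string
		for _, annotations := range pods {
			if url, ok := annotations[etcdAnnotation]; ok {
				endpoints = append(endpoints, url)
			}
		}
		if len(endpoints) > 0 {
			return endpoints, nil
		}
		if len(pods) > 0 {
			// Pods are present but not annotated: retrying cannot help.
			return nil, errAnnotationMissing
		}
		lastErr = errors.New("no etcd pods listed yet")
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	// A cluster created before the annotations existed: one pod, no annotation.
	legacy := func() ([]map[string]string, error) {
		return []map[string]string{{"example.com/unrelated": "value"}}, nil
	}
	if _, err := etcdEndpoints(legacy, 5); errors.Is(err, errAnnotationMissing) {
		fmt.Println("annotation missing: a caller could fall back to ClusterStatus here")
	}

	// A transient failure followed by success is retried.
	calls := 0
	flaky := func() ([]map[string]string, error) {
		calls++
		if calls == 1 {
			return nil, errTransient
		}
		return []map[string]string{{etcdAnnotation: "https://10.0.0.10:2379"}}, nil
	}
	fmt.Println(etcdEndpoints(flaky, 3))
}
```

Returning a distinct error for the "present but not annotated" case is what lets the upgrade path skip the retry delay and move straight to the fallback.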
@ereslibre ereslibre force-pushed the ereslibre:do-not-depend-on-cluster-status branch from 9dacd21 to 3e59a06 Feb 20, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 20, 2020
@ereslibre

Member Author

ereslibre commented Feb 20, 2020

bazel needs an update:

Ouch, updated and took the opportunity to rebase on top of latest master.

Member

fabriziopandini left a comment

Thanks @ereslibre for addressing all the comments!
Let's keep an eye on the test grid now!
/hold cancel
/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm and removed do-not-merge/hold labels Feb 22, 2020
@k8s-ci-robot

Contributor

k8s-ci-robot commented Feb 22, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot


fejta-bot commented Feb 23, 2020

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 31b8c0d into kubernetes:master Feb 23, 2020
15 of 16 checks passed
tide: Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation: ereslibre authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-e2e-kind-ipv6: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 23, 2020
@ereslibre ereslibre deleted the ereslibre:do-not-depend-on-cluster-status branch Feb 25, 2020