exit nodeup gracefully if server already exists in k8s #15138

zetaab · 2023-02-12T14:15:43Z

The original change for this is #15057, but it was reverted due to other issues in #15129.

Some discussion is available in #15114 (review).

I have verified that this solution works in OpenStack. When an existing instance tries to join the cluster again (which happens when rebooting the server or restarting kops-configuration.service):

kops-controller log:

kops-controller-t4b8j kops-controller I0212 14:11:02.063861       1 server.go:148] 10.2.96.248:59480: instance already exists in kubernetes cluster

node log:

Feb 12 14:11:03 nodes-helpa-fctnib nodeup[669]: I0212 14:11:03.187949     669 client.go:156] bootstrap request responded with code 204
Feb 12 14:11:03 nodes-helpa-fctnib systemd[1]: kops-configuration.service: Succeeded.
Feb 12 14:11:03 nodes-helpa-fctnib systemd[1]: Finished Run kOps bootstrap (nodeup).

So it works as planned. The only possible problem with this solution is that if someone has a kubelet older than 455 days (https://github.com/kubernetes/kops/blob/master/cmd/kops-controller/pkg/server/server.go#L188), the cert will expire and the node cannot renew it, because the node will still be part of the Kubernetes cluster (perhaps in NotReady state). Currently the workaround is to delete the node manually with kubectl delete node... and restart it in the cloud provider (or restart kops-configuration.service on the node itself). After that the node can request a new certificate. I am just hoping that people update their clusters in time and do NOT have 455-day-old nodes in their cluster.
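To make the expiry caveat concrete, here is a minimal Go sketch of the condition described above. The 455-day constant is taken from the description; `needsManualRecovery` and its parameters are hypothetical names for illustration and do not exist in kops.

```go
package main

import (
	"fmt"
	"time"
)

// kubeletCertValidity mirrors the 455-day figure quoted in the PR description.
// It is an assumption of this sketch, not a value read from the kops source.
const kubeletCertValidity = 455 * 24 * time.Hour

// needsManualRecovery is a hypothetical helper. It reports whether a node is
// stuck: its kubelet cert has outlived the validity period, and it is still
// registered in the cluster, so a new bootstrap request would be refused.
func needsManualRecovery(certAge time.Duration, nodeRegistered bool) bool {
	return certAge > kubeletCertValidity && nodeRegistered
}

func main() {
	fmt.Println(needsManualRecovery(100*24*time.Hour, true))  // false: cert still valid
	fmt.Println(needsManualRecovery(500*24*time.Hour, true))  // true: expired and still registered
	fmt.Println(needsManualRecovery(500*24*time.Hour, false)) // false: node was deleted, it can bootstrap again
}
```

The third case corresponds to the workaround: once the Node object is deleted, the instance is no longer treated as existing and can request a new certificate.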

cc @justinsb @hakman

hakman · 2023-02-12T14:28:40Z

/hold for 1-2 days for more feedback

pkg/kopscontrollerclient/client.go

pkg/bootstrap/authenticate.go

justinsb · 2023-02-15T14:04:19Z

cmd/kops-controller/pkg/server/server.go

@@ -142,6 +142,12 @@ func (s *Server) bootstrap(w http.ResponseWriter, r *http.Request) {

 	id, err := s.verifier.VerifyToken(ctx, r, r.Header.Get("Authorization"), body, s.opt.Server.UseInstanceIDForNodeName)
 	if err != nil {
+		// means that we should exit nodeup gracefully
+		if err == bootstrap.ErrAlreadyExists {
+			w.WriteHeader(http.StatusNoContent)


Nit-picking, but we could also do http.StatusConflict - that's what I've seen from GCP APIs at least.
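For illustration, a minimal self-contained Go sketch of the suggested mapping. `errAlreadyExists` stands in for `bootstrap.ErrAlreadyExists` from the diff above; `statusForVerifyError`, `bootstrapHandler`, the `/bootstrap` path, and the 403 fallback are assumptions of this sketch, not the actual kops-controller code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errAlreadyExists stands in for bootstrap.ErrAlreadyExists (assumption).
var errAlreadyExists = errors.New("instance already exists in kubernetes cluster")

// statusForVerifyError picks the HTTP status for a token verification result.
// errors.Is also matches wrapped errors, unlike a plain == comparison.
// The 403 fallback is an assumption of this sketch.
func statusForVerifyError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errAlreadyExists):
		return http.StatusConflict
	default:
		return http.StatusForbidden
	}
}

// bootstrapHandler is a hypothetical handler; verify replaces the real verifier call.
func bootstrapHandler(verify func(*http.Request) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(statusForVerifyError(verify(r)))
	}
}

func main() {
	h := bootstrapHandler(func(*http.Request) error {
		// Wrap the sentinel to show that errors.Is still recognizes it.
		return fmt.Errorf("verify token: %w", errAlreadyExists)
	})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/bootstrap", nil))
	fmt.Println(rec.Code) // 409
}
```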

justinsb · 2023-02-15T14:05:32Z

One nit pick on the http status code, but I like this approach - a specific response, and exit with 0 👍
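A rough Go sketch of the client side of that idea: treat the specific response as "already registered" and finish without an error. `handleBootstrapStatus` and `alreadyRegistered` are hypothetical names, and accepting both 204 (seen in the node log) and 409 (the suggested code) is an assumption of this sketch rather than what nodeup does.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// alreadyRegistered reports whether a status code signals that the node is
// already part of the cluster. Accepting both codes is an assumption.
func alreadyRegistered(code int) bool {
	return code == http.StatusNoContent || code == http.StatusConflict
}

// handleBootstrapStatus is a hypothetical helper. done means there is nothing
// left to do and the caller should exit with status 0.
func handleBootstrapStatus(code int) (done bool, err error) {
	switch {
	case code == http.StatusOK:
		return false, nil
	case alreadyRegistered(code):
		return true, nil
	default:
		return false, fmt.Errorf("bootstrap request failed with code %d", code)
	}
}

func main() {
	done, err := handleBootstrapStatus(http.StatusNoContent)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if done {
		fmt.Println("node already registered, nothing to do")
		return // falls through to exit status 0
	}
	fmt.Println("continuing with bootstrap")
}
```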

justinsb · 2023-02-15T14:11:48Z

On the expiry side, I think we should figure out what behaviour we want here when the node already exists. IIUC this is mostly a security defense-in-depth, so as we beef up auth here (e.g. the node callback) then we might not need it at all. Whatever we decide, I'd like for us to try to converge the various clouds so they have similar behavior here.

Anyway, I don't think my nit on the status code is blocking ....

/lgtm

hakman · 2023-02-15T14:46:34Z

> On the expiry side, I think we should figure out what behaviour we want here when the node already exists. IIUC this is mostly a security defense-in-depth, so as we beef up auth here (e.g. the node callback) then we might not need it at all. Whatever we decide, I'd like for us to try to converge the various clouds so they have similar behavior here.

It is also nice to know that the node config is sent only once and we don't update it dynamically on reboot, if something changes in the IG config.

k8s-ci-robot · 2023-02-20T11:02:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hakman]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…15138-upstream-release-1.26 Automated cherry pick of #15069: openstack verifier: support IPv6 #15138: exit gracefully if server already exists in k8s

…-of-#15069-kubernetes#15138-upstream-release-1.26 Automated cherry pick of kubernetes#15069: openstack verifier: support IPv6 kubernetes#15138: exit gracefully if server already exists in k8s

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 12, 2023

zetaab changed the title ~~exit gracefully if server already exists in k8s~~ exit nodeup gracefully if server already exists in k8s Feb 12, 2023

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/kops-controller labels Feb 12, 2023

k8s-ci-robot requested review from johngmyers and olemarkus February 12, 2023 14:15

k8s-ci-robot added the area/provider/openstack Issues or PRs related to openstack provider label Feb 12, 2023

zetaab mentioned this pull request Feb 12, 2023

Disable kops-configuration after package updates #15114

Closed

hakman requested review from justinsb and hakman February 12, 2023 14:26

hakman added the blocks-next label Feb 12, 2023

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2023

hakman requested changes Feb 12, 2023

pkg/kopscontrollerclient/client.go (outdated, resolved)

pkg/bootstrap/authenticate.go (outdated, resolved)

k8s-ci-robot assigned hakman Feb 12, 2023

exit gracefully if server already exists in k8s

8e6199f

zetaab force-pushed the exitgracefully branch from 91ef784 to 8e6199f Compare February 12, 2023 14:52

justinsb reviewed Feb 15, 2023

k8s-ci-robot assigned justinsb Feb 15, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2023

use http.StatusConflict

a765191

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2023

hakman approved these changes Feb 20, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2023

hakman removed the blocks-next label Feb 20, 2023

hakman removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2023

k8s-ci-robot merged commit 511f32a into kubernetes:master Feb 20, 2023

zetaab deleted the exitgracefully branch February 20, 2023 15:21

hakman mentioned this pull request Feb 23, 2023

Automated cherry pick of #15069: openstack verifier: support IPv6 #15138: exit gracefully if server already exists in k8s #15178

Merged

hakman mentioned this pull request May 27, 2023

kops-controller: Return http.StatusConflict when node already exists #15453

Closed
