Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exit nodeup gracefully if server already exists in k8s #15138

Merged
merged 2 commits into from
Feb 20, 2023

Conversation

zetaab
Copy link
Member

@zetaab zetaab commented Feb 12, 2023

Original issue for this is #15057 but it was reverted due to other issues in #15129

some discussion available in #15114 (review)

I have verified that this solution works in OpenStack. When existing instance tries to join again to cluster (it does that when rebooting server OR restarting kops-configuration.service):

kops-controller log:

kops-controller-t4b8j kops-controller I0212 14:11:02.063861       1 server.go:148] 10.2.96.248:59480: instance already exists in kubernetes cluster

node log

Feb 12 14:11:03 nodes-helpa-fctnib nodeup[669]: I0212 14:11:03.187949     669 client.go:156] bootstrap request responded with code 204
Feb 12 14:11:03 nodes-helpa-fctnib systemd[1]: kops-configuration.service: Succeeded.
Feb 12 14:11:03 nodes-helpa-fctnib systemd[1]: Finished Run kOps bootstrap (nodeup).

So it works like planned. The only possible problem in this solution is that if someone do have older than 455 days https://github.com/kubernetes/kops/blob/master/cmd/kops-controller/pkg/server/server.go#L188 kubelet, the cert will expiry and it cannot renew the cert because it will be still part of the Kubernetes cluster (node is maybe in NotReady state). Currently the workaround for that is delete the node manually kubectl delete node... and restart it in cloudprovider (or kops-configuration.service in node itself). After that the node can request the new certificate. I am just hoping that people could update their clusters in time and they should NOT have 455 day old nodes in their cluster.

cc @justinsb @hakman

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 12, 2023
@zetaab zetaab changed the title exit gracefully if server already exists in k8s exit nodeup gracefully if server already exists in k8s Feb 12, 2023
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. area/kops-controller labels Feb 12, 2023
@k8s-ci-robot k8s-ci-robot added the area/provider/openstack Issues or PRs related to openstack provider label Feb 12, 2023
@hakman
Copy link
Member

hakman commented Feb 12, 2023

/hold for 1-2 days for more feedback

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2023
pkg/kopscontrollerclient/client.go Outdated Show resolved Hide resolved
pkg/bootstrap/authenticate.go Outdated Show resolved Hide resolved
@@ -142,6 +142,12 @@ func (s *Server) bootstrap(w http.ResponseWriter, r *http.Request) {

id, err := s.verifier.VerifyToken(ctx, r, r.Header.Get("Authorization"), body, s.opt.Server.UseInstanceIDForNodeName)
if err != nil {
// means that we should exit nodeup gracefully
if err == bootstrap.ErrAlreadyExists {
w.WriteHeader(http.StatusNoContent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picking, but we could also do http.StatusConflict - that's what I've seen from GCP APIs at least.

@justinsb
Copy link
Member

One nit pick on the http status code, but I like this approach - a specific response, and exit with 0 👍

@justinsb
Copy link
Member

On the expiry side, I think we should figure out what behaviour we want here when the node already exists. IIUC this is mostly a security defense-in-depth, so as we beef up auth here (e.g. the node callback ) then we might not need it at all. Whatever we decide, I'd like for us to try to converge the various clouds so they have similar behvaior here.

Anyway, I don't think my nit on the status code is blocking ....

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2023
@hakman
Copy link
Member

hakman commented Feb 15, 2023

On the expiry side, I think we should figure out what behaviour we want here when the node already exists. IIUC this is mostly a security defense-in-depth, so as we beef up auth here (e.g. the node callback ) then we might not need it at all. Whatever we decide, I'd like for us to try to converge the various clouds so they have similar behvaior here.

It is also nice to know that the node config is sent only once and we don't update it dynamically on reboot, if something changes in the IG config.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2023
@hakman hakman removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 20, 2023
@k8s-ci-robot k8s-ci-robot merged commit 511f32a into kubernetes:master Feb 20, 2023
@zetaab zetaab deleted the exitgracefully branch February 20, 2023 15:21
k8s-ci-robot added a commit that referenced this pull request Feb 23, 2023
…15138-upstream-release-1.26

Automated cherry pick of #15069: openstack verifier: support IPv6
#15138: exit gracefully if server already exists in k8s
Shimiazoulai pushed a commit to spotinst/kubernetes-kops that referenced this pull request Jul 13, 2023
…-of-#15069-kubernetes#15138-upstream-release-1.26

Automated cherry pick of kubernetes#15069: openstack verifier: support IPv6
kubernetes#15138: exit gracefully if server already exists in k8s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kops-controller area/provider/openstack Issues or PRs related to openstack provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants