exit nodeup gracefully if server already exists in k8s #15138
Conversation
/hold for 1-2 days for more feedback
Force-pushed from 91ef784 to 8e6199f
@@ -142,6 +142,12 @@ func (s *Server) bootstrap(w http.ResponseWriter, r *http.Request) {

	id, err := s.verifier.VerifyToken(ctx, r, r.Header.Get("Authorization"), body, s.opt.Server.UseInstanceIDForNodeName)
	if err != nil {
		// means that we should exit nodeup gracefully
		if err == bootstrap.ErrAlreadyExists {
			w.WriteHeader(http.StatusNoContent)
Nit-picking, but we could also do http.StatusConflict - that's what I've seen from GCP APIs at least.
One nit pick on the http status code, but I like this approach - a specific response, and exit with 0 👍
On the expiry side, I think we should figure out what behaviour we want here when the node already exists. IIUC this is mostly a security defense-in-depth measure, so as we beef up auth here (e.g. the node callback) we might not need it at all. Whatever we decide, I'd like for us to try to converge the various clouds so they have similar behavior here. Anyway, I don't think my nit on the status code is blocking... /lgtm
It is also nice to know that the node config is sent only once and is not updated dynamically on reboot if something changes in the IG config.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hakman
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
…-of-#15069-kubernetes#15138-upstream-release-1.26 Automated cherry pick of kubernetes#15069: openstack verifier: support IPv6 kubernetes#15138: exit gracefully if server already exists in k8s
The original issue for this is #15057, but it was reverted due to other issues in #15129.
Some discussion is available in #15114 (review).
I have verified that this solution works in OpenStack. When an existing instance tries to join the cluster again (it does that when the server reboots OR when kops-configuration.service is restarted):
kops-controller log:
node log:
So it works as planned. The only possible problem with this solution is that if someone has a kubelet older than 455 days (https://github.com/kubernetes/kops/blob/master/cmd/kops-controller/pkg/server/server.go#L188), the cert will expire and cannot be renewed, because the node will still be part of the Kubernetes cluster (the node is likely in
NotReady
state). Currently the workaround for that is to delete the node manually (kubectl delete node...)
and restart it in the cloud provider (or restart kops-configuration.service on the node itself). After that the node can request a new certificate. I am just hoping that people update their clusters in time so they do NOT have 455-day-old nodes in their cluster. cc @justinsb @hakman