Make apiserver available at any point in time during upgrade #3444

Closed
TeddyAndrieux opened this issue Jul 12, 2021 · 1 comment
Labels
  • complexity:medium (Something that requires one or few days to fix)
  • kind:bug (Something isn't working)
  • release:blocker (An issue that blocks a release until resolved)
  • topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s)

Comments

@TeddyAndrieux
Collaborator

TeddyAndrieux commented Jul 12, 2021

Component:

'salt', 'lifecycle', 'kubernetes'

What happened:

Due to a (possible) bug in kubelet, if the apiserver is not available, it may take a long time to reschedule static Pods, depending on what is deployed in the cluster.

See: kubernetes/kubernetes#103658

Because of this, a MetalK8s upgrade may fail.

Resolution proposal (optional):

To avoid this kind of issue, let's keep kubelet always configured with an available apiserver endpoint.

Steps to update the apiserver:

  • Deploy a temporary apiserver Pod listening on "127.0.0.1:9443" (make sure it's ready and listening)
  • Reconfigure kubelet to use this "127.0.0.1:9443" endpoint
  • Restart kubelet
  • Update the "real" apiserver manifest and, if needed, the apiserver proxy (make sure they are ready and listening)
  • Reconfigure kubelet to use the apiserver proxy
  • Restart kubelet
  • Delete the temporary apiserver
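
To make the ordering of the steps above concrete, here is a minimal in-memory sketch in Python. It is not the MetalK8s implementation: `FakeNode`, its methods, and the `PROXY_ENDPOINT` placeholder are all hypothetical names invented for illustration. It only models the invariant that kubelet should never point at an endpoint that is currently unavailable.

```python
from dataclasses import dataclass, field
from typing import List, Set

TEMP_ENDPOINT = "127.0.0.1:9443"    # temporary apiserver endpoint from the steps above
PROXY_ENDPOINT = "apiserver-proxy"  # placeholder for the apiserver proxy endpoint (assumption)


@dataclass
class FakeNode:
    """In-memory stand-in for a node; every method here is hypothetical."""
    ready: Set[str] = field(default_factory=lambda: {PROXY_ENDPOINT})
    kubelet_endpoint: str = PROXY_ENDPOINT
    log: List[str] = field(default_factory=list)

    def _check_kubelet(self) -> None:
        # The invariant of the proposal: kubelet always has a live endpoint.
        if self.kubelet_endpoint not in self.ready:
            raise RuntimeError("kubelet points at an unavailable apiserver")

    def start_temp_apiserver(self) -> None:
        self.ready.add(TEMP_ENDPOINT)
        self.log.append("deploy temporary apiserver")

    def point_kubelet_at(self, endpoint: str) -> None:
        # Reconfigure and restart kubelet, but only towards a ready endpoint.
        if endpoint not in self.ready:
            raise RuntimeError(f"{endpoint} is not ready")
        self.kubelet_endpoint = endpoint
        self.log.append(f"reconfigure and restart kubelet -> {endpoint}")

    def update_real_apiserver(self) -> None:
        # The real apiserver (behind the proxy) is down while it is updated.
        self.ready.discard(PROXY_ENDPOINT)
        self._check_kubelet()
        self.ready.add(PROXY_ENDPOINT)
        self.log.append("update real apiserver and proxy")

    def delete_temp_apiserver(self) -> None:
        self.ready.discard(TEMP_ENDPOINT)
        self._check_kubelet()
        self.log.append("delete temporary apiserver")


def upgrade_apiserver(node: FakeNode) -> List[str]:
    """Run the steps in the order listed in the proposal."""
    node.start_temp_apiserver()
    node.point_kubelet_at(TEMP_ENDPOINT)
    node.update_real_apiserver()
    node.point_kubelet_at(PROXY_ENDPOINT)
    node.delete_temp_apiserver()
    return node.log


if __name__ == "__main__":
    for step in upgrade_apiserver(FakeNode()):
        print(step)
```

In this model, calling `update_real_apiserver()` without first switching kubelet to the temporary endpoint raises an error, which mirrors the failure mode described in this issue.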

NOTE: We do not use port 6443 for the "temp" apiserver, as the "real" apiserver is bound to 0.0.0.0:6443 in MetalK8s < 2.10
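
The port conflict mentioned in the note can be reproduced with plain sockets. The sketch below is only an illustration, assuming Linux and default socket options (no `SO_REUSEADDR` or `SO_REUSEPORT`). It uses an ephemeral port instead of 6443 so it needs no privileges, and `can_bind` and `wildcard_blocks_loopback` are hypothetical helper names.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def wildcard_blocks_loopback() -> bool:
    """Listen on 0.0.0.0:<port>, then check that 127.0.0.1:<port> is refused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as wildcard:
        wildcard.bind(("0.0.0.0", 0))  # port 0: let the kernel pick a free port
        wildcard.listen()
        port = wildcard.getsockname()[1]
        # While the wildcard listener is alive, the loopback bind should fail.
        return not can_bind("127.0.0.1", port)


if __name__ == "__main__":
    print("loopback bind refused:", wildcard_blocks_loopback())
```

A listener on 0.0.0.0 already covers 127.0.0.1 for the same port, which is why the temporary apiserver needs a different port such as 9443.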

@TeddyAndrieux TeddyAndrieux added kind:bug Something isn't working topic:lifecycle Issues related to upgrade or downgrade of MetalK8s complexity:medium Something that requires one or few days to fix release:blocker An issue that blocks a release until resolved labels Jul 12, 2021
@TeddyAndrieux TeddyAndrieux added this to the MetalK8s 2.10.0 milestone Jul 12, 2021
@TeddyAndrieux TeddyAndrieux self-assigned this Jul 12, 2021
TeddyAndrieux added a commit that referenced this issue Jul 12, 2021
Kubelet treats all Pods the same way, including static Pods (possibly a
bug), and queries the apiserver to retrieve PVC information. If the
apiserver endpoint used by kubelet is unreachable, it may take a long
time to schedule a static Pod, as kubelet will time out for every PVC it
needs to query while the apiserver is unavailable (because some of the
static Pods are not yet scheduled, ...)

Fixes: #3444
@TeddyAndrieux
Collaborator Author

Implementation: 688b35d
It's not sufficient, as the same may happen when upgrading kubelet, since all Pods may restart at that time.

Closing this one; the bug should be worked around by ticket #3445.
