Upgrade: Best effort drain when upgrading single node cluster #3447

Merged
merged 6 commits into development/2.10 on Jul 15, 2021

Conversation

TeddyAndrieux
Collaborator

Component:

'lifecycle'

Context:

See: #3445

Summary:

When upgrading a cluster with a single control plane node, first do a "best-effort" drain: evict as many Pods as possible, but do not retry if some Pods cannot be evicted; instead, continue with the "classic" upgrade process.

NOTE: We do not explicitly uncordon the node at the end, as it's automatically handled by the "deploy_node" orchestrate


Fixes: #3445

Since we always run (even during upgrade) with Salt on Python 3 and a recent enough version of Jinja, we can use `loop.previtem` to get the previous item of the loop and use it as a requisite for the Salt state.
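
For illustration only, here is a minimal sketch of `loop.previtem` (available in Jinja >= 2.10) rendered with the `jinja2` Python library; the state IDs and SLS name below are hypothetical and do not reflect the actual MetalK8s states:

```python
# Sketch: chain generated Salt states by requiring the previous loop item.
# Requires the jinja2 package; the state body is made up for illustration.
import jinja2

TEMPLATE = """\
{%- for node in nodes %}
upgrade-{{ node }}:
  salt.state:
    - tgt: {{ node }}
    - sls: metalk8s.example-upgrade   # hypothetical SLS
{%- if not loop.first %}
    - require:
      - salt: upgrade-{{ loop.previtem }}
{%- endif %}
{%- endfor %}
"""

print(jinja2.Template(TEMPLATE).render(nodes=["node-1", "node-2", "node-3"]))
```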
Even if the kube-apiserver container is ready, the apiserver may not actually be available yet (e.g. etcd not fully ready). To prevent a failure in the next upgrade step, make sure the apiserver is healthy with a query against the apiserver itself.
Also wait for the etcd container before the apiserver one, as the apiserver cannot work without etcd.
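
As a rough illustration of that kind of health check (a sketch, not the actual Salt state used in this PR), one could poll the apiserver `/healthz` endpoint until it answers; the control plane IP below is a placeholder:

```python
# Sketch: wait until kube-apiserver actually answers /healthz, instead of
# trusting only container readiness. APISERVER is a placeholder address.
import ssl
import time
import urllib.request

APISERVER = "https://10.0.0.1:6443"  # placeholder control plane IP

def wait_for_apiserver(timeout=300, interval=5):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # sketch only; use the cluster CA in practice
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{APISERVER}/healthz", context=ctx) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # apiserver (or etcd behind it) not ready yet
        time.sleep(interval)
    raise TimeoutError("kube-apiserver did not become healthy in time")
```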
By default with the kubeadm configuration, kube-apiserver binds on all addresses (`0.0.0.0`) even though it advertises a single one. This commit binds the apiserver to the control plane IP only (which is the advertise IP), so the apiserver is reachable on the control plane IP and no longer on `127.0.0.1:6443`.
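
A quick sketch of how the effect described above could be checked (the IP is a placeholder and this probe is not part of the change itself):

```python
# Sketch: once kube-apiserver is bound to a single address, port 6443 should
# answer on the control plane IP but no longer on 127.0.0.1.
import socket

CONTROL_PLANE_IP = "10.0.0.1"  # placeholder advertise address

def can_connect(host, port=6443, timeout=2):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("control plane IP:", can_connect(CONTROL_PLANE_IP))  # expected: True
print("127.0.0.1:", can_connect("127.0.0.1"))              # expected: False
```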
Since the Salt drain module uses the MetalK8s kubernetes Salt module to retrieve objects, and tries to retrieve the controller of each Pod, it may happen that the controller object is not known by the kubernetes Salt module, so we are unable to get it. Instead of showing an ugly traceback in that case, just consider that we do not manage the controller: if `force=True`, the Pod is simply evicted like any other "classic" Pod.
In some cases we may want to do only a "best-effort" drain, so that we do not hang if a specific Pod cannot be evicted for whatever reason.
When running a single-node cluster, we want to drain the node so that as few Pods as possible are running on it during the upgrade.

Fixes: #3445
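
For illustration, a minimal sketch of the "best-effort" drain described above, built on the `kubernetes` Python client rather than the actual Salt drain module: evict each Pod on the node once, record failures instead of retrying, and keep going.

```python
# Sketch of a "best-effort" drain: cordon the node, try one eviction per pod,
# collect failures instead of retrying, and let the upgrade continue.
# Not the MetalK8s implementation; requires the `kubernetes` Python package.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

def best_effort_drain(node_name):
    config.load_kube_config()
    core = client.CoreV1Api()

    # Cordon the node so no new pod gets scheduled while we drain.
    core.patch_node(node_name, {"spec": {"unschedulable": True}})

    failed = []
    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    ).items
    for pod in pods:
        eviction = client.V1Eviction(
            metadata=client.V1ObjectMeta(
                name=pod.metadata.name, namespace=pod.metadata.namespace
            )
        )
        try:
            core.create_namespaced_pod_eviction(
                name=pod.metadata.name,
                namespace=pod.metadata.namespace,
                body=eviction,
            )
        except ApiException as exc:
            # Best effort: remember the failure and move on, do not retry.
            failed.append((pod.metadata.namespace, pod.metadata.name, exc.reason))
    return failed
```

In the actual change, the upgrade then proceeds even if some evictions failed, and the node is uncordoned afterwards by the "deploy_node" orchestrate, as noted in the summary.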
@TeddyAndrieux requested a review from a team July 13, 2021 16:53
@bert-e
Contributor

bert-e commented Jul 13, 2021

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Jul 13, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

Contributor

@sayf-eddine-scality left a comment


lgtm

@TeddyAndrieux
Collaborator Author

/approve

@bert-e
Contributor

bert-e commented Jul 15, 2021

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.10

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Jul 15, 2021

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/2.10

The following branches have NOT changed:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye teddyandrieux.

@bert-e merged commit 5db464c into development/2.10 Jul 15, 2021
@bert-e deleted the bugfix/workaround-after-apiserver-upgrade branch July 15, 2021 06:03