Skip to content

Commit

Permalink
Docs: Document handling eviction requests
Browse files Browse the repository at this point in the history
Document the way KubeVirt handles K8s' API-initiated eviction [1].

[1] https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/

Signed-off-by: Orel Misan <omisan@redhat.com>
  • Loading branch information
orelmisan committed May 2, 2024
1 parent 2817318 commit 9c705b2
Showing 1 changed file with 97 additions and 0 deletions.
97 changes: 97 additions & 0 deletions docs/handling-eviction-requests.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
Handling Eviction Requests
=

# Preface
The [Kubernetes](https://kubernetes.io/) API supports [API-initiated Eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/) which allows to programmatically evict pods.
The API is used for example by:
- [kubectl drain](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/)
- [Descheduler](https://github.com/kubernetes-sigs/descheduler)

Since KubeVirt virtual machines are running inside `virt-launcher` pods - they are affected by Kubernetes' eviction mechanism.
This requires special handling on KubeVirt's side, since virtual machines eviction is a bit more complex than the average pod.
`Evacuation` is the term used by KubeVirt to describe the migration of a VMI as the result of `virt-launcher` pod eviction.

This document will describe how KubeVirt currently handles eviction requests.

# Eviction Strategies
A VirtualMachineInstance can have one of four Eviction Strategies. The eviction strategy is defined in the VMI spec, with a fallback to a cluster-wide definition in the KubeVirt CustomResource.

The eviction strategy affects the way the VirtualMachineInstance will be evacuated:

| Eviction Strategy | Meaning |
|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| None | No action will be taken, according to the specified 'RunStrategy' the VirtualMachine will be restarted or shutdown |
| LiveMigrate | The VirtualMachine will be migrated instead of being shutdown |
| LiveMigrateIfPossible | Same as `LiveMigrate` but only if the VirtualMachine is Live-Migratable, otherwise it will behave as `None` |
| External | The VirtualMachine will be protected by a PDB and vmi.Status.EvacuationNodeName will be set on eviction. This is mainly useful for cluster-api-provider-kubevirt (capk) which needs a way for VMI’s to be blocked from eviction, yet signal capk that eviction has been called on the VMI so the capk controller can handle tearing the VMI down |

# Pod Eviction Webhook
`virt-api` serves a [validating webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) which intercepts **all** eviction requests in the cluster:
```shell
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io virt-api-validator -o yaml
```

The purpose of this webhook is to trigger VMI evacuation in cases where it is required.
The way the webhook triggers the evacuation is by setting the VMI's `Status.EvacuationNodeName` field to the node name it is currently running on, so the [evacuation controller](#evacuation-controller) will know it needs to migrate it to another node.

The webhook has the ability to:
1. Approve the request - so it could be further processed
2. Deny the request - the request will be declined without additional processing

The webhook admits eviction requests **before** `kube-api` checks them against [Pod Distribution Budget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) objects.

In case the pod is not a `virt-launcher` pod - the eviction request is approved.
Otherwise, depending on the VMI's eviction strategy and whether it is migrtable - the webhook will potentially mark the VMI for evacuation and approve or deny the eviction request:

| Eviction Strategy | Is VMI migratable | Is VMI marked for eviction | Does Webhook approve eviction | Webhook Response |
|-----------------------|-------------------|----------------------------|-------------------------------|------------------------|
| None | True/False | False | True | 200 - Eviction granted |
| LiveMigrate | True | True | True | 200 - Eviction granted |
| LiveMigrate | False | False | False | 429 - Eviction denied |
| LiveMigrateIfPossible | True | True | True | 200 - Eviction granted |
| LiveMigrateIfPossible | False | False | True | 200 - Eviction granted |
| External | True/False | True | True | 200 - Eviction granted |

> **Note**
> Since the webhook intercepts all eviction requests in the cluster, it is configured to be [ignored](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy) in case kube-api fails to get a response from it.
> Ignored in this context means that the eviction request is considered to be approved by the webhook, and will be further checked against the PodDistributionBudget.
> Some virt-launcher pods should be protected from eviction even if the webhook fails, this is the reason PodDistributionBudget objects are required (described in the next section).
> In case the webhook is down, the virt-launcher pod will be protected from eviction by the PDB (if required), but the evacuation will **not** be triggered.
# Pod Distribution Budget
In case the `Pod Eviction Webhook` approved the eviction, kube-api checks whether a PDB protects the `virt-launcher` pod.
If there is a PDB protecting the `virt-launcher` pod - the eviction request is denied, otherwise it is approved and the pod is evicted.

In order for the evacuation of VMIs to happen in a controlled manner, KubeVirt protects part of the `virt-launcher` pods with a PDB which blocks eviction requests.

`virt-controller` has a `Disruption Budget Controller` which decides whether a `virt-launcher` pod should be protected based on the eviction strategy of its controlling VMI:

| Eviction Strategy | Is a PDB required |
|-----------------------|-------------------------------|
| None | False |
| LiveMigrate | True |
| LiveMigrateIfPossible | Only if the VMI is migratable |
| External | True |

> **Note**
> During a migration, the PDB that protects the source virt-launcher pod is expended by the migration controller to also protect the target pod.
# Eviction Approval Summery

The eviction request's initiator will observe one of the following responses:

| Eviction Strategy | Is VMI migratable | Is VMI marked for eviction | Does Webhook approve eviction | Does PDB allow eviction | Final Response |
|-----------------------|-------------------|----------------------------|-------------------------------|-------------------------|----------------------------------|
| None | True/False | False | True | True | 200 - Eviction granted |
| LiveMigrate | True | True | True | False | 429 - Eviction blocked by PDB |
| LiveMigrate | False | False | False | False | 429 - Eviction denied by webhook |
| LiveMigrateIfPossible | True | True | True | False | 429 - Eviction blocked by PDB |
| LiveMigrateIfPossible | False | False | True | True | 200 - Eviction granted |
| External | True/False | True | True | False | 429 - Eviction blocked by PDB |

To summarize:
1. The eviction request is granted only if both the webhook and the PDB allow them.
2. The eviction request's initiator might get a 429 response, but the VMI will be migrated in the background.

# Evacuation Controller
`virt-controller` has an evacuation controller which looks for potential VMIs to evict and tries to migrate them to another node.

0 comments on commit 9c705b2

Please sign in to comment.