Mark webhook and controller as safe-to-evict
The safe-to-evict annotation tells the cluster autoscaler whether the
pod can be evicted to allow the node it's on to scale down.

This was set to false (by me!) 2 years ago in tektoncd@fc6ef39
to prevent service unreliability during scale-down events. If no
webhook replicas are available, users can't create/update/delete
Tekton objects; if no controller replicas are available, status updates
from Pod events, etc., won't be processed.

Unfortunately, blocking node eviction means the node that the pod(s) get
scheduled to can't be scaled down. Furthermore, the nodes can't be fully
drained when updating the cluster. This can leave a cluster in a
mid-upgrade state that can make issues difficult to diagnose and reason
about.

With this change, a cluster scale-down event might cause temporary
service unreliability with the default single-replica configuration. As
with tektoncd#3787, if a user/operator wants to prevent this, they should
configure more replicas for HA.

(cherry picked from commit 5350069)
Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
imjasonh authored and vdemeester committed Aug 17, 2021
1 parent 8d31e08 commit 1fd62b7
Showing 3 changed files with 4 additions and 4 deletions.
2 changes: 0 additions & 2 deletions config/controller.yaml
@@ -37,8 +37,6 @@ spec:
app.kubernetes.io/part-of: tekton-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
labels:
app.kubernetes.io/name: controller
app.kubernetes.io/component: controller
2 changes: 0 additions & 2 deletions config/webhook.yaml
@@ -40,8 +40,6 @@ spec:
app.kubernetes.io/part-of: tekton-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
labels:
app.kubernetes.io/name: webhook
app.kubernetes.io/component: webhook
4 changes: 4 additions & 0 deletions docs/enabling-ha.md
@@ -88,6 +88,10 @@ spec:
minReplicas: 1
```

By default, the Webhook deployment does _not_ set the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation to `false`, so a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) is not blocked from scaling down the node that's running the only replica of the deployment.
This means that during node drains, the Webhook might be unavailable temporarily, during which time Tekton resources can't be created, updated or deleted.
To avoid this, you can add the `safe-to-evict` annotation set to `false` to block node drains during autoscaling, or, better yet, configure multiple replicas of the Webhook deployment.
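
As a rough sketch, re-adding the annotation could look like the fragment below. It mirrors the two lines this commit removes from `config/webhook.yaml`. Only the relevant fields are shown, and the surrounding Deployment fields are assumed to stay unchanged.

```yaml
# Excerpt of the Webhook Deployment's pod template (illustrative, not a full manifest).
spec:
  template:
    metadata:
      annotations:
        # Tells the cluster autoscaler not to evict this pod, which also
        # prevents the node it runs on from being scaled down.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

This carries the trade-off described in the commit message: the node hosting the pod can no longer be scaled down or fully drained, which is why running multiple replicas is the preferred option.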

### Avoiding Disruptions

To avoid the Webhook Service becoming unavailable during node unavailability (e.g., during node upgrades), you can ensure that a minimum number of Webhook replicas are available at all times by defining a [`PodDisruptionBudget`](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) which sets a `minAvailable` greater than zero: