Change pod `NotReady`/`Unreachable` tolerations from `300s` to something much smaller, e.g. `60s` #7689

vlerenc · 2023-03-22T15:01:18Z

What would you like to be added:

The KAPI's `--default-not-ready-toleration-seconds` and `--default-unreachable-toleration-seconds` options define how fast pods are evicted from nodes whose `Ready` status condition is either `Unknown` (node status unknown, a.k.a. unreachable) or `False` (kubelet not ready). Both can also be overridden individually per pod.

We do not make use of them, so the default of `300s` applies, but we should probably set them to something closer to `0s`:

- Generally for seeds (KAPI setting or for all pods), and also
- For our shoot add-ons (for all our pods in kube-system)
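As a rough illustration of what the two KAPI options do, the defaulting can be sketched as below. This is a simplified, self-contained model and not the API server's actual code: the `Toleration` type and the `addDefaultTolerations` function are hypothetical stand-ins (for `corev1.Toleration` and the admission logic), and the sketch assumes a default is only added when the pod does not already tolerate the taint.

```go
package main

import "fmt"

// Toleration is a simplified, hypothetical stand-in for corev1.Toleration.
type Toleration struct {
	Key               string
	Operator          string
	Effect            string
	TolerationSeconds *int64
}

const (
	taintNotReady    = "node.kubernetes.io/not-ready"
	taintUnreachable = "node.kubernetes.io/unreachable"
)

// tolerates reports whether one of the tolerations already covers the
// NoExecute taint with the given key.
func tolerates(tolerations []Toleration, key string) bool {
	for _, t := range tolerations {
		keyMatches := t.Key == key || (t.Key == "" && t.Operator == "Exists")
		effectMatches := t.Effect == "NoExecute" || t.Effect == ""
		if keyMatches && effectMatches {
			return true
		}
	}
	return false
}

// addDefaultTolerations (hypothetical helper) appends the not-ready and
// unreachable tolerations with the given defaults, unless the pod already
// carries its own toleration for the respective taint.
func addDefaultTolerations(tolerations []Toleration, notReadySeconds, unreachableSeconds int64) []Toleration {
	if !tolerates(tolerations, taintNotReady) {
		tolerations = append(tolerations, Toleration{
			Key: taintNotReady, Operator: "Exists", Effect: "NoExecute",
			TolerationSeconds: &notReadySeconds,
		})
	}
	if !tolerates(tolerations, taintUnreachable) {
		tolerations = append(tolerations, Toleration{
			Key: taintUnreachable, Operator: "Exists", Effect: "NoExecute",
			TolerationSeconds: &unreachableSeconds,
		})
	}
	return tolerations
}

func main() {
	// A pod without any tolerations receives both defaults.
	for _, t := range addDefaultTolerations(nil, 60, 60) {
		fmt.Printf("%s: %ds\n", t.Key, *t.TolerationSeconds)
	}
}
```

A pod that already specifies one of the two tolerations keeps its own value, which is the per-pod override mentioned above.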

Why is this needed:

We saw during "zone outage simulations" that recovery happens very slowly. It takes 5m for KCM to evict pods that are sitting on dead nodes.


Sallyan · 2023-04-17T15:01:45Z

- For seeds (KAPI setting)
For example, create the seed cluster with the configuration below:

  kubernetes:
    kubeAPIServer:
      defaultNotReadyTolerationSeconds: 60
      defaultUnreachableTolerationSeconds: 60

Pods created afterwards will then automatically get the tolerations node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=60.
This takes effect for the extensions, the garden components (pods in the garden namespace), and the shoot namespaces.

- For seeds (selected components) and shoot add-ons (pods in kube-system)
We could probably create a mutating admission webhook that adds the tolerations to pods.

@vlerenc WDYT?

vlerenc · 2023-04-17T15:23:41Z

@Sallyan Yes, I was thinking about the web hook we already have for HA: https://github.com/gardener/gardener/blob/master/docs/development/high-availability.md#convenient-application-of-these-rules

So, maybe we want to make both toleration settings configurable in the ManagedSeed for seeds (or hardcode it to 60s at KAPI level), but we need a specific solution anyway for the shoot add-ons (as we cannot and should not force our end users to different KAPI toleration settings). If that's the case, maybe we want the web hook to do it in all cases? 🤷‍♂️

Sallyan · 2023-04-19T12:41:31Z

Agreed, we should keep customer autonomy in mind: customers should decide the pod toleration settings for their own workload.
It is also easy to configure the KAPI default toleration time in the shoot manifest, just by adding the lines below to the shoot YAML:

  kubernetes:
    kubeAPIServer:
      defaultNotReadyTolerationSeconds: 60
      defaultUnreachableTolerationSeconds: 60

Personally, I prefer to use one webhook on seeds to update the tolerations of all pods and another webhook on shoots to update only the tolerations of pods in the kube-system namespace.
We already have many nice webhooks in GRM (Gardener Resource Manager) [link]
Two interesting ones:

systemcomponentsconfig: sets spec.nodeSelector and spec.tolerations on system component pods.
It adds the following field:

"worker.gardener.cloud/system-components": "true"

highavailabilityconfig
It sets the fields .spec.replicas, .spec.template.spec.affinity, and .spec.template.spec.topologySpreadConstraints for HA, based on the failure tolerance type and the component type.

Maybe we can extend the systemcomponentsconfig webhook to update the tolerationSeconds of those entries in a pod's .spec.tolerations whose key is node.kubernetes.io/not-ready or node.kubernetes.io/unreachable:

[
  {
    "effect": "NoExecute",
    "key": "node.kubernetes.io/not-ready",
    "operator": "Exists",
    "tolerationSeconds": 300
  },
  {
    "effect": "NoExecute",
    "key": "node.kubernetes.io/unreachable",
    "operator": "Exists",
    "tolerationSeconds": 300
  }
]
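The update proposed above could look roughly like the following. It is only a sketch under assumptions: the `Toleration` type is a simplified stand-in for `corev1.Toleration`, and `setTolerationSeconds` is a hypothetical helper, not an existing function of the systemcomponentsconfig webhook.

```go
package main

import "fmt"

// Toleration is a simplified, hypothetical stand-in for corev1.Toleration.
type Toleration struct {
	Key               string
	Operator          string
	Effect            string
	TolerationSeconds *int64
}

const (
	taintNotReady    = "node.kubernetes.io/not-ready"
	taintUnreachable = "node.kubernetes.io/unreachable"
)

// setTolerationSeconds (hypothetical helper) overwrites TolerationSeconds on
// the NoExecute tolerations for the not-ready and unreachable taints and
// returns how many entries were changed. All other tolerations stay untouched.
func setTolerationSeconds(tolerations []Toleration, seconds int64) int {
	changed := 0
	for i := range tolerations {
		t := &tolerations[i]
		if t.Effect != "NoExecute" {
			continue
		}
		if t.Key != taintNotReady && t.Key != taintUnreachable {
			continue
		}
		s := seconds // fresh copy so that entries do not share one pointer
		t.TolerationSeconds = &s
		changed++
	}
	return changed
}

func main() {
	defaultSeconds := int64(300)
	tolerations := []Toleration{
		{Key: taintNotReady, Operator: "Exists", Effect: "NoExecute", TolerationSeconds: &defaultSeconds},
		{Key: taintUnreachable, Operator: "Exists", Effect: "NoExecute", TolerationSeconds: &defaultSeconds},
	}
	n := setTolerationSeconds(tolerations, 60)
	fmt.Printf("updated %d tolerations\n", n)
	for _, t := range tolerations {
		fmt.Printf("%s: %ds\n", t.Key, *t.TolerationSeconds)
	}
}
```

The sketch overwrites the value unconditionally. A webhook cannot tell a defaulted `300` from one set explicitly by the pod author, so a real implementation would have to decide how to treat pods that bring their own tolerations.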

For seeds, we could probably write a new webhook that updates the tolerations of all pods.

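import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// Default mutates incoming pods by appending a toleration for the not-ready taint.
func (h *Handler) Default(_ context.Context, obj runtime.Object) error {
	pod, ok := obj.(*corev1.Pod)
	if !ok {
		return fmt.Errorf("expected *corev1.Pod but got %T", obj)
	}

	// Append a NoExecute toleration for the not-ready taint.
	// Note: new(int64) points to 0, i.e. the pod is evicted as soon as the
	// taint is applied; point to the desired number of seconds instead.
	pod.Spec.Tolerations = append(pod.Spec.Tolerations, corev1.Toleration{
		Key:               "node.kubernetes.io/not-ready",
		Operator:          corev1.TolerationOpExists,
		Effect:            corev1.TaintEffectNoExecute,
		TolerationSeconds: new(int64),
	})

	return nil
}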
....

rfranzke · 2023-04-25T09:34:16Z

Why would we write a new webhook? @vlerenc already suggested to use the existing HA webhook to specify these settings, and I agree that this makes sense. It is active in both seeds and shoots, so it seems a good fit.

timuthy · 2023-04-27T05:49:15Z

/assign

vlerenc changed the title ~~Change pod NotReady/Unreachable tolerations from 300s to 0s~~ Change pod NotReady/Unreachable tolerations from 300s to something closer to 0s Mar 23, 2023

vlerenc mentioned this issue Mar 23, 2023

☂️ [GEP-20] Highly Available Seed and Shoot Clusters #6529

Closed

56 tasks

vlerenc changed the title ~~Change pod NotReady/Unreachable tolerations from 300s to something closer to 0s~~ Change pod NotReady/Unreachable tolerations from 300s to something much smaller, e.g. 60s Apr 3, 2023

gardener-prow bot assigned timuthy Apr 27, 2023

timuthy mentioned this issue May 2, 2023

Add options to configure {NotReady,Unreachable}TolerationSeconds #7861

Merged

gardener-prow bot closed this as completed in #7861 May 8, 2023
