Repeated Pod creation and eviction during DaemonSet rolling update (surge > 1) when Node is under pressure #124388

sujoshua · 2024-04-18T17:35:57Z

What happened?

When a DaemonSet's update strategy is RollingUpdate and maxSurge is greater than 1, and a Node is under pressure, Pods are repeatedly created on that Node and then evicted. This causes a lot of unnecessary churn and could strain system resources.

What did you expect to happen?

The same behavior as when maxSurge is 0.

How can we reproduce it (as minimally and precisely as possible)?

Put a Node under disk pressure, then deploy this DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-daemonset
spec:
  selector:
    matchLabels:
      name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
  updateStrategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Anything else we need to know?

Actually, I've found the problem.

When maxSurge == 1, this code works correctly, because we hit this branch and return directly.

But when maxSurge > 1, execution gets past that branch and reaches this one. Unfortunately, that causes the Node to be added to nodesNeedingDaemonPods.

Maybe if we are inBackoff, we should break directly?
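To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the DaemonSet controller's code: `nodeState`, `surgeCandidates`, and the `skipBackoff` flag are hypothetical names invented for illustration, and the loop is a heavily simplified model of picking Nodes for surge Pods. It treats "break directly" as "skip this Node for the current sync", which is only one possible reading.

```go
package main

import "fmt"

// nodeState is a hypothetical, simplified view of one Node during a rolling update.
type nodeState struct {
	name      string
	hasOldPod bool // an old-generation Pod is still running on the Node
	hasNewPod bool // an updated Pod already exists on the Node
	inBackoff bool // the previous updated Pod on this Node failed recently (e.g. it was evicted)
}

// surgeCandidates returns the Nodes that would get a new surge Pod in this pass,
// up to maxSurge. With skipBackoff set, Nodes in backoff are left out, which
// models the change proposed above.
func surgeCandidates(nodes []nodeState, maxSurge int, skipBackoff bool) []string {
	var out []string
	for _, n := range nodes {
		if len(out) >= maxSurge {
			break // surge budget used up
		}
		if !n.hasOldPod || n.hasNewPod {
			continue // nothing to surge on this Node
		}
		if skipBackoff && n.inBackoff {
			continue // assumed fix: do not recreate a Pod that just failed here
		}
		out = append(out, n.name)
	}
	return out
}

func main() {
	nodes := []nodeState{
		{name: "node-a", hasOldPod: true, inBackoff: true}, // Node under pressure
		{name: "node-b", hasOldPod: true},
		{name: "node-c", hasOldPod: true, hasNewPod: true},
	}
	fmt.Println(surgeCandidates(nodes, 2, false)) // [node-a node-b]
	fmt.Println(surgeCandidates(nodes, 2, true))  // [node-b]
}
```

Without the backoff check, the Node under pressure is picked again on every pass, which matches the create-and-evict loop described in this report; with it, that Node is skipped until the backoff expires.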

Kubernetes version

main

Cloud provider

no

OS version

No response

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)


k8s-ci-robot · 2024-04-18T17:36:07Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sujoshua · 2024-04-18T17:38:50Z

/sig scheduling

kannon92 · 2024-04-18T20:18:46Z

/remove-sig scheduling
/sig node

kannon92 · 2024-04-18T20:19:21Z

/remove-sig node
/sig apps

sujoshua added the kind/bug Categorizes issue or PR as related to a bug. label Apr 18, 2024

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 18, 2024

k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 18, 2024

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Apr 18, 2024

k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/node Categorizes an issue or PR as relevant to SIG Node. labels Apr 18, 2024
