Repeated Pod creation and eviction during DaemonSet rolling update (maxSurge > 1) when a Node is under pressure #124388

Open
sujoshua opened this issue Apr 18, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

sujoshua commented Apr 18, 2024

What happened?

When a DaemonSet uses the RollingUpdate strategy with maxSurge greater than 1, I've noticed that if a Node is under a pressure condition (such as DiskPressure), the system repeatedly creates Pods on it and then evicts them. This causes a lot of unnecessary churn and could waste system resources.

What did you expect to happen?

The same behavior as when maxSurge is set to 0.

How can we reproduce it (as minimally and precisely as possible)?

Induce DiskPressure on a node, then deploy the following DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-daemonset
spec:
  selector:
    matchLabels:
      name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
  updateStrategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

Anything else we need to know?

Actually, I've found the problem.

When maxSurge == 1, this code works correctly, because execution reaches this branch and returns directly.

But when maxSurge > 1, execution passes that check and reaches this branch instead. Unfortunately, this causes the node to be added to nodesNeedingDaemonPods.

Perhaps, if we are inBackoff, we should break out directly?
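The control flow described above can be modeled roughly as follows. This is a hypothetical, heavily simplified Go sketch, not the actual DaemonSet controller code: `nodeState`, `podPhase`, `needsNewPod`, and the `breakOnBackoff` switch are illustrative names only. It assumes, as described above for the surge path, that a failed pod whose node is in backoff is skipped rather than counted as running, and that the node is otherwise eligible to run the DaemonSet pod.

```go
package main

import "fmt"

// podPhase is a hypothetical, simplified stand-in for a daemon pod's state.
type podPhase int

const (
	phaseRunning podPhase = iota
	phaseFailed           // e.g. rejected or evicted by the kubelet under DiskPressure
)

// nodeState is a hypothetical summary of one node; NOT the controller's data model.
type nodeState struct {
	name      string
	pods      []podPhase
	inBackoff bool // whether handling failed pods on this node is rate-limited
}

// needsNewPod models the surge path described in the issue. Failed pods are
// never counted as running, so a node whose only pod has failed looks empty
// and is reported as needing a new pod, even while in backoff.
// With breakOnBackoff set, it applies the suggested change: stop as soon as a
// failed pod is found in backoff, so no new pod is requested for this node.
func needsNewPod(n nodeState, breakOnBackoff bool) bool {
	running := 0
	for _, p := range n.pods {
		if p == phaseFailed {
			if n.inBackoff && breakOnBackoff {
				return false // proposed: wait for the backoff to expire
			}
			continue // current: skip the failed pod and keep going
		}
		running++
	}
	// Assumption: the node should run the daemon pod, so an empty node needs one.
	return running == 0
}

func main() {
	// A node under pressure whose only daemon pod has failed, with backoff active.
	node := nodeState{name: "node-a", pods: []podPhase{phaseFailed}, inBackoff: true}
	for _, proposed := range []bool{false, true} {
		var nodesNeedingDaemonPods []string
		if needsNewPod(node, proposed) {
			nodesNeedingDaemonPods = append(nodesNeedingDaemonPods, node.name)
		}
		fmt.Printf("breakOnBackoff=%v nodesNeedingDaemonPods=%v\n", proposed, nodesNeedingDaemonPods)
	}
}
```

Running this prints `breakOnBackoff=false nodesNeedingDaemonPods=[node-a]` followed by `breakOnBackoff=true nodesNeedingDaemonPods=[]`. In the first case, the model keeps requesting a replacement pod that the pressured node will evict again. In the second case, the node is left alone until the backoff expires.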

Kubernetes version

main

Cloud provider

no

OS version

No response

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@sujoshua sujoshua added the kind/bug Categorizes issue or PR as related to a bug. label Apr 18, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 18, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sujoshua
Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 18, 2024
@kannon92
Contributor

/remove-sig scheduling
/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Apr 18, 2024
@kannon92
Contributor

/remove-sig node
/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/node Categorizes an issue or PR as relevant to SIG Node. labels Apr 18, 2024
Projects
Status: Needs Triage
Development

No branches or pull requests

3 participants