Scheduler pre-binding can cause race conditions with automated empty node removal #125491
Labels: kind/bug, needs-triage, sig/scheduling
What happened?
In a Google Kubernetes Engine (GKE) environment, a pod was requesting a large Persistent Volume Claim (PVC). After the appropriate node was identified for the pod, the pod became stuck in the prebinding stage for several minutes while the volume provisioning process completed. Since the node name was not assigned to the pod during this time, the Cluster Autoscaler perceived the node as unoccupied. Consequently, the Cluster Autoscaler initiated a scale-down of the node, unaware that the pending pod was scheduled to run there.
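A minimal sketch of the race described above, using simplified types (the `Pod` struct and `nodeAppearsEmpty` function are illustrative, not Cluster Autoscaler code): an emptiness check that only looks at pods whose node name is already set cannot see a pod that is stuck in prebinding.

```go
package main

import "fmt"

// Pod models only the field relevant here; in a real cluster it corresponds
// to pod.Spec.NodeName, which stays empty until the bind phase completes.
type Pod struct {
	Name     string
	NodeName string
}

// nodeAppearsEmpty sketches the emptiness check described above: it counts
// only pods already bound to the node, so a pod still in prebinding
// (NodeName == "") is invisible to it.
func nodeAppearsEmpty(node string, pods []Pod) bool {
	for _, p := range pods {
		if p.NodeName == node {
			return false
		}
	}
	return true
}

func main() {
	// The scheduler has picked node-1 for this pod, but volume provisioning
	// is still running, so NodeName has not been written yet.
	pods := []Pod{{Name: "pvc-pod", NodeName: ""}}
	fmt.Println(nodeAppearsEmpty("node-1", pods)) // prints true: the node looks like a scale-down candidate
}
```

With this view of the cluster, the node is a scale-down candidate for the entire duration of the prebinding stage.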
What did you expect to happen?
I would expect that the Scheduler would communicate the intended binding of the pod to the identified node. This would enable the Cluster Autoscaler to recognize that the node is not actually empty and prevent it from being scaled down prematurely.
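A sketch of what such a signal could look like, under the assumption that the scheduler records its intended target on the pod as soon as one is chosen (Kubernetes does have a `nominatedNodeName` field in pod status today, though it is populated for preemption rather than prebinding; the types and function names below are illustrative):

```go
package main

import "fmt"

// Pod models the two fields a smarter emptiness check could consult:
// NodeName is set only after bind completes, while NominatedNodeName
// sketches a hint the scheduler could write as soon as it picks a node.
type Pod struct {
	Name              string
	NodeName          string
	NominatedNodeName string
}

// nodeIsEmpty treats a node as occupied if any pod is either bound to it
// or has it recorded as the intended target, closing the window in which
// a prebinding pod is invisible to scale-down.
func nodeIsEmpty(node string, pods []Pod) bool {
	for _, p := range pods {
		if p.NodeName == node || p.NominatedNodeName == node {
			return false
		}
	}
	return true
}

func main() {
	// Prebinding is still in progress, but the intended node is recorded.
	pods := []Pod{{Name: "pvc-pod", NominatedNodeName: "node-1"}}
	fmt.Println(nodeIsEmpty("node-1", pods)) // prints false: the node is no longer a scale-down candidate
}
```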
How can we reproduce it (as minimally and precisely as possible)?
The issue arose in a large GKE cluster with pods requesting substantial PVCs, so reproducing it on demand may be difficult. However, the race window is clear from the description above: any pod whose prebinding (for example, slow volume provisioning) outlasts the autoscaler's empty-node scale-down delay can hit it.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response