
Unschedulable Pod might take a long time to get the condition set #109796

Closed
alculquicondor opened this issue May 4, 2022 · 5 comments · Fixed by #109832
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@alculquicondor
Member

alculquicondor commented May 4, 2022

What happened?

If there is a connection failure when updating the Pod status, we don't retry.
Furthermore, due to #108761, we won't have another chance until much later.

The problem is that if we don't mark the Pod as Unschedulable, other controllers (such as cluster-autoscaler) won't react to these pods.

What did you expect to happen?

We should have stronger guarantees to mark a Pod as Unschedulable.

How can we reproduce it (as minimally and precisely as possible)?

You need an unschedulable Pod (for example, a Pod with a non-matching node affinity) and a flaky connection to the apiserver.
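For illustration, here is a minimal manifest for such a Pod; the label key `example.com/nonexistent` is made up and should match no node in the cluster, so the Pod stays Pending:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unschedulable-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # No node carries this label, so no node can satisfy the affinity.
          - key: example.com/nonexistent
            operator: In
            values: ["true"]
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
```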

Anything else we need to know?

No response

Kubernetes version

master

This is particularly problematic in 1.24, due to #108761.

Cloud provider

Any

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alculquicondor alculquicondor added the kind/bug Categorizes issue or PR as related to a bug. label May 4, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 4, 2022
@k8s-ci-robot
Contributor

@alculquicondor: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 4, 2022
@ahg-g
Member

ahg-g commented May 4, 2022

That is bad; retrying in place on transient errors should probably be OK.

@sanposhiho
Member

I'll address it.
/assign

@alculquicondor
Member Author

Thanks!
