
Unschedulable Pod might take a long time to get the condition set #109796

Closed
alculquicondor opened this issue May 4, 2022 · 5 comments · Fixed by #109832
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@alculquicondor
Member

alculquicondor commented May 4, 2022

What happened?

If there is a connection failure when updating the Pod status, we don't retry.
Furthermore, due to #108761, we won't have another chance until much later.

The problem is that if we don't mark the Pod as Unschedulable, other controllers (such as cluster-autoscaler) won't react to these pods.

What did you expect to happen?

We should have stronger guarantees to mark a Pod as Unschedulable.

How can we reproduce it (as minimally and precisely as possible)?

You need an unschedulable Pod (for example, a Pod with a non-matching node affinity) and a flaky connection to the apiserver.
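For illustration, here is a minimal manifest for such a Pod; the label key `example.com/nonexistent` is made up and should match no node in the cluster, so the Pod stays Pending:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unschedulable-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # No node carries this label, so no node can satisfy the affinity.
          - key: example.com/nonexistent
            operator: In
            values: ["true"]
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
```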

Anything else we need to know?

No response

Kubernetes version

master

This is particularly problematic in 1.24, due to #108761.

Cloud provider

Any

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@alculquicondor alculquicondor added the kind/bug Categorizes issue or PR as related to a bug. label May 4, 2022
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 4, 2022
@k8s-ci-robot
Contributor

@alculquicondor: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 4, 2022
@ahg-g
Member

ahg-g commented May 4, 2022

That is bad; retrying in place on transient errors should probably be OK.

@sanposhiho
Member

I'll address it.
/assign

@alculquicondor
Member Author

Thanks!
