add(nodevolumelimits): return UnschedulableAndUnresolvable when PVC is not found #121952

Merged
merged 1 commit into from Feb 6, 2024

Conversation

sanposhiho
Member

@sanposhiho sanposhiho commented Nov 19, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

When a PVC is not found, the plugin currently returns Error.
When a plugin returns Error, the scheduler keeps retrying the Pod, because Pods that fail with an Error status are not put into unschedQ.

This PR makes nodevolumelimits return UnschedulableAndUnresolvable when a PVC is not found, and registers the PVC Add event in EventsToRegister so that the scheduling queue retries the Pod only when a PVC is created.
In the future, QHint (as part of #118893) will decide more precisely when to retry scheduling, but this PR doesn't do that for now.
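
For readers unfamiliar with the queueing behavior described above, here is a minimal, self-contained sketch of the idea. It is a toy model, not the scheduler's code: `Code`, `Event`, `Queue`, `Requeue`, and `OnEvent` are hypothetical names invented for illustration, and the real scheduling queue is considerably more involved.

```go
package main

import (
	"fmt"
	"sort"
)

// Code is a simplified, hypothetical stand-in for a plugin status code.
type Code int

const (
	Success Code = iota
	Error
	UnschedulableAndUnresolvable
)

// Event is a simplified, hypothetical stand-in for a cluster event.
type Event struct {
	Resource string
	Action   string
}

// Queue is a toy model of the scheduling queue (assumption: two pools only).
type Queue struct {
	backoff       []string           // Pods retried regardless of cluster events
	unschedulable map[string][]Event // Pod name -> events that may unblock it
}

func NewQueue() *Queue {
	return &Queue{unschedulable: map[string][]Event{}}
}

// Requeue places a Pod according to the status a plugin returned for it.
func (q *Queue) Requeue(pod string, code Code, registered []Event) {
	switch code {
	case Error:
		// Modeled after the behavior described above: Pods failed by Error
		// are not parked in the unschedulable pool, so they keep being retried.
		q.backoff = append(q.backoff, pod)
	case UnschedulableAndUnresolvable:
		// Parked until one of the plugin's registered events occurs.
		q.unschedulable[pod] = registered
	}
}

// OnEvent moves every parked Pod whose registered events include ev back to
// the retry pool and returns the moved Pod names in sorted order.
func (q *Queue) OnEvent(ev Event) []string {
	var moved []string
	for pod, events := range q.unschedulable {
		for _, e := range events {
			if e == ev {
				moved = append(moved, pod)
				delete(q.unschedulable, pod)
				q.backoff = append(q.backoff, pod)
				break
			}
		}
	}
	sort.Strings(moved)
	return moved
}

func main() {
	q := NewQueue()
	registered := []Event{
		{"CSINode", "Add"},
		{"Pod", "Delete"},
		{"PersistentVolumeClaim", "Add"},
	}
	q.Requeue("pod-a", UnschedulableAndUnresolvable, registered)

	fmt.Println(q.OnEvent(Event{"Node", "Add"}))                  // prints []
	fmt.Println(q.OnEvent(Event{"PersistentVolumeClaim", "Add"})) // prints [pod-a]
}
```

In this model, an unrelated event leaves the Pod parked, while a PVC Add event releases it for another scheduling attempt.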

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Pods rejected by nodevolumelimits because a PVC was not found are now retried by the scheduler only when a new PVC is added.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 19, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 19, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2023
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 19, 2023
@sanposhiho
Member Author

/hold

Holding until this has gone through two reviews.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2023
@@ -78,6 +79,7 @@ func (pl *CSILimits) EventsToRegister() []framework.ClusterEventWithHint {
 	return []framework.ClusterEventWithHint{
 		{Event: framework.ClusterEvent{Resource: framework.CSINode, ActionType: framework.Add}},
 		{Event: framework.ClusterEvent{Resource: framework.Pod, ActionType: framework.Delete}},
+		{Event: framework.ClusterEvent{Resource: framework.PersistentVolumeClaim, ActionType: framework.Add}},
Member

I'll take on implementing it as part of #118893.
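
As a rough illustration of what a more precise retry decision could look like, the sketch below requeues a Pod only when the added PVC is one the Pod actually references. This is purely an assumption about the general shape of such a check, not the implementation planned in #118893. `Pod`, `PVC`, `Hint`, and `hintForPVCAdded` are hypothetical, simplified types and names.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified Pod: its namespace and referenced claim names.
type Pod struct {
	Namespace string
	Claims    []string
}

// PVC is a hypothetical, simplified PersistentVolumeClaim identity.
type PVC struct {
	Namespace string
	Name      string
}

// Hint is a hypothetical result type: requeue the Pod or leave it parked.
type Hint int

const (
	HintSkip Hint = iota
	HintQueue
)

// hintForPVCAdded returns HintQueue only if the added PVC lives in the Pod's
// namespace and is referenced by the Pod; otherwise the event is irrelevant.
func hintForPVCAdded(pod Pod, added PVC) Hint {
	if pod.Namespace != added.Namespace {
		return HintSkip
	}
	for _, claim := range pod.Claims {
		if claim == added.Name {
			return HintQueue
		}
	}
	return HintSkip
}

func main() {
	pod := Pod{Namespace: "default", Claims: []string{"data"}}

	fmt.Println(hintForPVCAdded(pod, PVC{"default", "data"}) == HintQueue) // prints true
	fmt.Println(hintForPVCAdded(pod, PVC{"default", "logs"}) == HintQueue) // prints false
	fmt.Println(hintForPVCAdded(pod, PVC{"other", "data"}) == HintQueue)   // prints false
}
```

Compared with registering the PVC Add event alone, a check like this would avoid retrying Pods when an unrelated PVC is created.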

@sanposhiho
Member Author

/hold

Waiting for #121953.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2023
@sanposhiho
Member Author

/unhold

#121953 is merged.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2023
@sanposhiho
Member Author

/hold

Holding until this has gone through two reviews.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 19, 2023
@sanposhiho
Member Author

@kubernetes/sig-scheduling-approvers

Sorry for rushing, but could anyone review this? It'll block #121508.

@ahg-g
Member

ahg-g commented Jan 24, 2024

/assign @msau42

@ahg-g
Member

ahg-g commented Jan 24, 2024

/lgtm

I think this makes sense, but it would be good to have @msau42 double check.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 891a973a766cc81aeced0d606ef9f6e2ea3e37f2

Member

@kerthcet kerthcet left a comment

My first thought: could we return Pending when the PVC is not found? It's similar to DRA waiting for allocation.

@alculquicondor
Member

/approve
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 6, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit e566bd7 into kubernetes:master Feb 6, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 6, 2024
@sanposhiho
Member Author

@kerthcet You're correct. We can change all not-found errors of this kind to return Pending instead of UnschedulableAndUnresolvable. Let's track that at #123227.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
7 participants