fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap #119778

Merged

sanposhiho
Member

@sanposhiho sanposhiho commented Aug 6, 2023

What type of PR is this?

/kind bug
/triage accepted
/priority important-soon

What this PR does / why we need it:

When PreFilter returns UnschedulableAndUnresolvable, preemption is unnecessary, but the scheduler currently runs it anyway.
This unexpected preemption can fail with errors because it tries to read PreFilter data that does not exist.

As @Huang-Wei pointed out, preemption did not run in such cases before; #110894 introduced this bug.


#119777 adds an integration test for this scenario, and #119780 demonstrates that this patch fixes the bug.
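The idea behind the fix can be sketched with a simplified, self-contained model. The types below (`Code`, `Status`, `NodeToStatusMap`) are stand-ins written for illustration, not the scheduler framework's real types, and `nodesEligibleForPreemption` is a hypothetical helper that assumes nodes without a recorded status are treated as preemption candidates.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler framework's status codes.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// Status is a simplified stand-in for framework.Status.
type Status struct {
	Code    Code
	Message string
}

// NodeToStatusMap records, per node name, why the pod did not fit there.
type NodeToStatusMap map[string]*Status

// recordPreFilterRejection mirrors the idea of the fix: a PreFilter rejection
// applies to every node, so the same status is copied into each node's entry.
func recordPreFilterRejection(m NodeToStatusMap, allNodes []string, s *Status) {
	for _, name := range allNodes {
		m[name] = s
	}
}

// nodesEligibleForPreemption (hypothetical helper) returns the nodes where
// preemption might help: those not marked UnschedulableAndUnresolvable.
// Assumption: a node with no entry counts as eligible, which is how an empty
// map can lead to unexpected preemption.
func nodesEligibleForPreemption(m NodeToStatusMap, allNodes []string) []string {
	var eligible []string
	for _, name := range allNodes {
		if s, ok := m[name]; ok && s.Code == UnschedulableAndUnresolvable {
			continue
		}
		eligible = append(eligible, name)
	}
	return eligible
}

func main() {
	nodes := []string{"node-a", "node-b"}
	rejection := &Status{Code: UnschedulableAndUnresolvable, Message: "rejected by PreFilter"}

	// Without the copy, the map is empty and every node looks like a candidate.
	before := NodeToStatusMap{}
	fmt.Println("candidates without copy:", len(nodesEligibleForPreemption(before, nodes)))

	// With the copy, no node is a candidate, so preemption has nothing to try.
	after := NodeToStatusMap{}
	recordPreFilterRejection(after, nodes, rejection)
	fmt.Println("candidates with copy:", len(nodesEligibleForPreemption(after, nodes)))
}
```

In this model, copying the status is what lets the later preemption step see that no node is worth trying.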

Which issue(s) this PR fixes:

Fixes: #119782

Special notes for your reviewer:

We should cherry-pick this to v1.26 and v1.27.

Does this PR introduce a user-facing change?

Fixed a 1.26 regression scheduling bug by ensuring that preemption is skipped when a PreFilter plugin returns `UnschedulableAndUnresolvable`

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Aug 6, 2023
@k8s-ci-robot
Contributor

Please note that we're already in Test Freeze for the release-1.28 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.28.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Sat Aug 5 22:30:03 UTC 2023.

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 6, 2023
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 6, 2023
@sanposhiho
Member Author

I will add unit tests (UTs) soon.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 6, 2023
@sanposhiho sanposhiho force-pushed the bugfix-unschedulableandunresolvable branch from e00983f to 3320a0a Compare August 6, 2023 06:05
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 6, 2023
@sanposhiho
Member Author

/cc @Huang-Wei

@sanposhiho sanposhiho force-pushed the bugfix-unschedulableandunresolvable branch from 3320a0a to e08e6e9 Compare August 6, 2023 09:41
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 6, 2023
Member

@Huang-Wei Huang-Wei left a comment

LGTM. Could you incorporate the integration test from #119777 into this PR?

@sanposhiho
Member Author

Sure, added.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 7, 2023
Comment on lines 117 to 122
if _, err := state.Read(tokenFilterName); err != nil {
	// Should be bug.
	// We don't store state only when PreFilter returned Skip.
	// In other words, if reaches here, PreFilter returned Skip but somehow Filter is called.
	return framework.NewStatus(framework.Error, err.Error())
}
Member

Placing this in AddPod/RemovePod would be closer to real-world usage.

Unresolvable bool
Tokens int
PreFilterStatus *framework.Status
PreFilterResult *framework.PreFilterResult
Member

this parameter is not used in any test?

}

func (fp *tokenFilter) PreFilter(ctx context.Context, state *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) {
if pod.Name == fp.PreFilterTargetPodName && fp.PreFilterStatus.IsSkip() {
Member

I think we should only check the following here (I'm not a fan of checking the pod name if the logic below achieves the same thing):

if fp.PreFilterStatus.Code() == framework.UnschedulableAndUnresolvable || fp.PreFilterStatus.IsSkip() {
	return nil, fp.PreFilterStatus
}
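As a rough, self-contained sketch of that suggestion, the test plugin could return its configured status early and write cycle state only otherwise, with no pod-name check. The `Status`, `CycleState`, and `code` helper below are simplified stand-ins invented for illustration, not the framework's actual API.

```go
package main

import "fmt"

// Code is a simplified stand-in for the framework's status codes.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
	Skip
)

// Status is a simplified stand-in for framework.Status.
type Status struct{ Code Code }

// code treats a nil status as Success (an assumption of this sketch).
func (s *Status) code() Code {
	if s == nil {
		return Success
	}
	return s.Code
}

// CycleState is a simplified per-scheduling-cycle key/value store.
type CycleState map[string]struct{}

const tokenFilterName = "token-filter"

// tokenFilter is a test plugin whose PreFilter result is fully configured
// through PreFilterStatus, so no target pod name is needed.
type tokenFilter struct {
	PreFilterStatus *Status
}

// PreFilter returns the configured status early for Skip and
// UnschedulableAndUnresolvable, and writes state only in the other cases.
func (fp *tokenFilter) PreFilter(state CycleState) *Status {
	c := fp.PreFilterStatus.code()
	if c == UnschedulableAndUnresolvable || c == Skip {
		return fp.PreFilterStatus
	}
	state[tokenFilterName] = struct{}{}
	return nil
}

func main() {
	for _, st := range []*Status{nil, {Code: Skip}, {Code: UnschedulableAndUnresolvable}} {
		state := CycleState{}
		got := (&tokenFilter{PreFilterStatus: st}).PreFilter(state)
		_, wrote := state[tokenFilterName]
		fmt.Printf("returned=%v wroteState=%v\n", got.code(), wrote)
	}
}
```

Each test case then configures its own plugin instance, which keeps the plugin free of knowledge about specific pods.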


state.Write(tokenFilterName, &stateData{})

if pod.Name == fp.PreFilterTargetPodName {
Member

ditto, I think we're able to get rid of PreFilterTargetPodName

v1.ResourceMemory: *resource.NewQuantity(200, resource.DecimalSI)},
},
}),
preemptedPodIndexes: map[int]struct{}{},
Member

If this is nil, the existing test logic seems problematic: without the fix, the test would also pass.

The else branch should use "Eventually" semantics; otherwise it would always pass. In other words, a) the p in the else block should be obtained from the API server, and b) we should wait a bit to ensure the preemption doesn't happen. But this is an existing test issue.

// Wait for preemption of pods and make sure the other ones are not preempted.
for i, p := range pods {
	if _, found := test.preemptedPodIndexes[i]; found { ... } else {
		if p.DeletionTimestamp != nil {
			t.Errorf("Didn't expect pod %v to get preempted.", p.Name)
		}
	}
}

@Huang-Wei
Member

Given that the integration test has been wonky for a while and the UT should suffice to verify the fix, I'm inclined to polish the integration test in a separate PR, which would:

  1. correct its semantics first (mentioned in #119778 (comment))
  2. add a new test to cover a PreFilter plugin returning UnschedulableAndUnresolvable

// needed to avoid this copy.
for _, n := range allNodes {
	diagnosis.NodeToStatusMap[n.Node().Name] = s
}
// Record the messages from PreFilter in Diagnosis.PreFilterMsg.
msg := s.Message()
diagnosis.PreFilterMsg = msg
Member

Could this cause the message to be duplicated in the event/log?

Member

Good question. @sanposhiho could you double check the event/log is still rendered properly?

Member Author

Good catch. I changed type.go so that the message is not duplicated.
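To illustrate the duplication concern, here is a minimal, self-contained sketch of one way to avoid it: when a PreFilter message is present, report it once and skip the per-node aggregation. The actual change to type.go is not shown in this thread, so the `diagnosis` type, the `fitErrorMessage` function, and the message format are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// diagnosis is a simplified, hypothetical stand-in for the scheduler's
// diagnosis data: a per-node failure reason plus an optional PreFilter message.
type diagnosis struct {
	NodeToReason map[string]string
	PreFilterMsg string
}

// fitErrorMessage builds a human-readable scheduling failure message.
// If PreFilterMsg is set, the same reason was copied to every node, so it is
// reported once and the per-node reasons are not aggregated again.
func fitErrorMessage(numAllNodes int, d diagnosis) string {
	msg := fmt.Sprintf("0/%d nodes are available:", numAllNodes)
	if d.PreFilterMsg != "" {
		return msg + " " + d.PreFilterMsg + "."
	}
	// Count how many nodes failed for each distinct reason.
	counts := map[string]int{}
	for _, reason := range d.NodeToReason {
		counts[reason]++
	}
	parts := make([]string, 0, len(counts))
	for reason, n := range counts {
		parts = append(parts, fmt.Sprintf("%d %s", n, reason))
	}
	sort.Strings(parts) // deterministic output
	if len(parts) > 0 {
		msg += " " + strings.Join(parts, ", ") + "."
	}
	return msg
}

func main() {
	perNode := map[string]string{
		"node-a": "injected failure",
		"node-b": "injected failure",
	}
	fmt.Println(fitErrorMessage(2, diagnosis{NodeToReason: perNode, PreFilterMsg: "injected failure"}))
	fmt.Println(fitErrorMessage(2, diagnosis{NodeToReason: perNode}))
}
```

Without the early return, the PreFilter reason would appear both on its own and again in the per-node summary.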

@@ -100,8 +100,11 @@ func waitForNominatedNodeName(cs clientset.Interface, pod *v1.Pod) error {
const tokenFilterName = "token-filter"

type tokenFilter struct {
Member

Can you add a comment describing this plugin?

Member

Given that this PR will likely proceed with unit tests only, I will add the comment in #119769.

@alculquicondor
Member

+1 on a quick fix with UTs that can be merged ASAP.

@alculquicondor
Member

The effect here would be that the preemption fails with an error, causing the pod to be re-queued into backoff, instead of unschedulable. Correct?

@Huang-Wei
Member

> The effect here would be that the preemption fails with an error, causing the pod to be re-queued into backoff, instead of unschedulable. Correct?

In most cases, yes - just unnecessary cycles spent on preemption which should have been avoided.

But with a plugin returning UnschedulableAndUnresolvable in PreFilter, the regression means preemption may still find a node to host the pod by preempting other pods (because dry-run preemption runs Filter only). In this edge case, it would cause unexpected disruption to existing pods.

@alculquicondor
Member

Hopefully we can merge before the rc.1 tomorrow

@sanposhiho sanposhiho force-pushed the bugfix-unschedulableandunresolvable branch from 9730596 to b008223 Compare August 12, 2023 06:59
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sanposhiho
Member Author

sanposhiho commented Aug 12, 2023

@alculquicondor @Huang-Wei
Sorry, I haven't had much time for this over the past few days.

As suggested, I have:

  • changed type.go so that this PR does not duplicate the message.
      • added a test for this.
  • removed the integration test change from this PR for now.

PTAL again.

@sanposhiho
Member Author

/retest

@Huang-Wei
Member

/release-note-edit release-note Fix a regression to ensure preemption is skipped when a PreFilter plugin returns UnschedulableAndUnresolvable

@Huang-Wei
Member

@sanposhiho LGTM. Could you polish the release note a bit? like ^^

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 0b5316f8f51ce76a8524cab2db4a640bdba927ea

@sanposhiho
Member Author

@Huang-Wei done.

@alculquicondor
Member

Let's keep an eye out for the end of the code freeze.
I think we could prepare the cherry-picks already.

@sftim
Contributor

sftim commented Aug 14, 2023

Changelog suggestion

Fixed a scheduling bug by ensuring that preemption is skipped when a PreFilter plugin returns `UnschedulableAndUnresolvable`

@k8s-ci-robot k8s-ci-robot merged commit 719d1a8 into kubernetes:master Aug 16, 2023
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Aug 16, 2023
k8s-ci-robot added a commit that referenced this pull request Sep 6, 2023
…119778-upstream-release-1.26

Automated cherry pick of #119778: fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
k8s-ci-robot added a commit that referenced this pull request Sep 6, 2023
…119778-upstream-release-1.28

Automated cherry pick of #119778: fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
k8s-ci-robot added a commit that referenced this pull request Sep 6, 2023
…119778-upstream-release-1.27

Automated cherry pick of #119778: fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

the preemption happens even when PreFilter returns UnschedulableAndUnresolvable
5 participants