Fix a race in setting nominated node and the scheduling cycle after it. #72259

Merged
merged 3 commits into from
Dec 30, 2018

Conversation

bsalamat
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
It fixes the issue reported in #72124.

Which issue(s) this PR fixes:

Fixes #72124

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fix a race condition in the scheduler preemption logic that could cause nominatedNodeName of a pod not to be considered in one or more scheduling cycles.

/sig scheduling

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 21, 2018
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Dec 21, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 21, 2018
@@ -490,6 +491,119 @@ func TestPreemptionStarvation(t *testing.T) {
}
}

// TestPreemptionRaces tests that other scheduling events and operations do not
// race with the preemption process.
func TestPreemptionRaces(t *testing.T) {
Member

Did you manage to get this test to fail as expected, that is, without the other code fixes?

I ran it several times and every run passed without errors.

Member Author

@bsalamat bsalamat Dec 21, 2018

I have managed to get it to fail, but because the race condition is hard to reproduce, you need to set numRepetitions high (100), and even then you sometimes have to run the test several times.
I tried finding a way to simulate late arrival of events, but I didn't find a reasonable way to do so.

Member Author

I should add that my reason for adding it was to at least see a flaky test if a race condition exists.
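The idea of repeating a scenario many times to give an intermittent race a chance to show up can be sketched roughly as follows. This is only an illustration, not the test added in this PR: runRepeated and safeCounterScenario are hypothetical names, and the scenario here is correctly synchronized, so it is expected to pass on every repetition.

```go
package main

import (
	"fmt"
	"sync"
)

// runRepeated runs scenario numRepetitions times and returns how many runs
// reported an error. Hypothetical helper, not part of the scheduler tests.
func runRepeated(numRepetitions int, scenario func() error) int {
	failures := 0
	for i := 0; i < numRepetitions; i++ {
		if err := scenario(); err != nil {
			failures++
		}
	}
	return failures
}

// safeCounterScenario increments a mutex-protected counter from several
// goroutines and reports an error if any increment was lost. Dropping the
// mutex would turn this into the kind of race that only fails occasionally.
func safeCounterScenario() error {
	const workers = 8
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		count int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	if count != workers {
		return fmt.Errorf("lost update: got %d, want %d", count, workers)
	}
	return nil
}

func main() {
	fmt.Println("failing runs:", runRepeated(100, safeCounterScenario))
}
```

A test built this way can only raise the odds of hitting a race. Running it under Go's race detector (go test -race) is a complementary check.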

@Huang-Wei
Member

/assign

@bsalamat bsalamat added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Dec 21, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 21, 2018
		return
	}
	for i, np := range npm.nominatedPods[nnn] {
		if np.UID != p.UID {
Member

Just a nit: can we reverse the logic? Negation is usually harder to reason about.

Member Author

Done
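For readers following the nit above, the positive form of the comparison might look like the sketch below. The full diff is not shown in this conversation, so this uses simplified stand-in types: pod and removeByUID are hypothetical and are not the scheduler's code.

```go
package main

import "fmt"

// pod is a simplified stand-in for the scheduler's pod type.
type pod struct {
	UID  string
	Name string
}

// removeByUID returns pods without the first entry whose UID equals uid.
// The match is tested with == so the loop body reads as "found it, remove
// it", instead of skipping non-matches with != and continue.
// Note that append reuses the backing array of the input slice.
func removeByUID(pods []pod, uid string) []pod {
	for i, np := range pods {
		if np.UID == uid {
			return append(pods[:i], pods[i+1:]...)
		}
	}
	return pods
}

func main() {
	pods := []pod{{UID: "a", Name: "p1"}, {UID: "b", Name: "p2"}, {UID: "c", Name: "p3"}}
	fmt.Println(removeByUID(pods, "b"))
}
```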

@@ -326,14 +288,13 @@ func (p *PriorityQueue) Add(pod *v1.Pod) error {
	}
	if p.unschedulableQ.get(pod) != nil {
		klog.Errorf("Error: pod %v/%v is already in the unschedulable queue.", pod.Namespace, pod.Name)
		p.deleteNominatedPodIfExists(pod)
Member

Shouldn't we make sure that the pod in the unschedulable queue isn't nominated anymore by calling delete?

Member

Agreed. Is p.nominatedPods.delete(pod) missing here?

Member

@Huang-Wei Huang-Wei Dec 23, 2018

OK, I think it's because at line 297, p.nominatedPods.add(pod, "") first deletes the pod and then adds it to the cache map.

Member

@bsalamat, could we either split out the delete or rename add, to make the code more straightforward to read?

Member Author

As @Huang-Wei noted, this is no longer needed because the add function of the nominated node struct ensures that there won't be duplicate entries. I think add is still the best name for it. Any other suggestions?
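The delete-then-add behaviour discussed in this thread can be sketched as below. This is a simplified illustration under stated assumptions, not the merged code: the pod type is a stand-in for *v1.Pod, UIDs are plain strings rather than types.UID, and the fallback to the pod's own NominatedNodeName when nodeName is empty is an assumption about what add(pod, "") means. The field names nominatedPods and nominatedPodToNode follow the fragments quoted in this review.

```go
package main

import "fmt"

// pod is a stand-in for *v1.Pod, reduced to what this sketch needs.
type pod struct {
	UID               string
	Name              string
	NominatedNodeName string
}

type nominatedPodMap struct {
	nominatedPods      map[string][]*pod // node name -> pods nominated to it
	nominatedPodToNode map[string]string // pod UID -> node name
}

func newNominatedPodMap() *nominatedPodMap {
	return &nominatedPodMap{
		nominatedPods:      make(map[string][]*pod),
		nominatedPodToNode: make(map[string]string),
	}
}

// delete removes p from both maps; it is a no-op if p is not present.
func (npm *nominatedPodMap) delete(p *pod) {
	nnn, ok := npm.nominatedPodToNode[p.UID]
	if !ok {
		return
	}
	for i, np := range npm.nominatedPods[nnn] {
		if np.UID == p.UID {
			npm.nominatedPods[nnn] = append(npm.nominatedPods[nnn][:i], npm.nominatedPods[nnn][i+1:]...)
			if len(npm.nominatedPods[nnn]) == 0 {
				delete(npm.nominatedPods, nnn)
			}
			break
		}
	}
	delete(npm.nominatedPodToNode, p.UID)
}

// add always deletes p first, so the map holds at most one instance of the pod.
func (npm *nominatedPodMap) add(p *pod, nodeName string) {
	npm.delete(p)

	nnn := nodeName
	if nnn == "" {
		// Assumption: an empty nodeName falls back to the pod's own field.
		nnn = p.NominatedNodeName
		if nnn == "" {
			return
		}
	}
	npm.nominatedPodToNode[p.UID] = nnn
	npm.nominatedPods[nnn] = append(npm.nominatedPods[nnn], p)
}

func main() {
	npm := newNominatedPodMap()
	p := &pod{UID: "uid-1", Name: "p1"}
	npm.add(p, "node1")
	npm.add(p, "node1") // second add does not create a duplicate
	fmt.Println(len(npm.nominatedPods["node1"]), npm.nominatedPodToNode[p.UID])
}
```

Because add starts with delete, a caller such as the queue's Add path does not need a separate delete call before re-adding a pod, which is the point made in the reply above.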

	// one instance of the pod.
	npm.delete(p)

	nnn := nodeName
Member

Can we replace nnn with nominatedNodeName here? The intent would be much clearer.

Member Author

nnn is a common pattern in other functions and is more idiomatic in Go than long names.

@Huang-Wei
Member

/retest

"node1": {&highPriNominatedPod, &medPriorityPod, &unschedulablePod},
expectedNominatedPods := &nominatedPodMap{
nominatedPodToNode: map[types.UID]string{
medPriorityPod.UID: "node1",
Contributor

A Pod.UID of node1 doesn't look very suitable here. pod1 would be acceptable.

Member Author

node1 is not a Pod UID. This is a map from a pod UID to a node name. node1 is the node name.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 27, 2018
…ed node name of a pod and scheduling cycle of other pods
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 27, 2018
Member

@Huang-Wei Huang-Wei left a comment

@bsalamat thanks for the PR!

Please squash the last commit (3 commits in total is fine).

@bsalamat
Member Author

@Huang-Wei Done

@Huang-Wei
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 30, 2018
@k8s-ci-robot k8s-ci-robot merged commit 65f87b5 into kubernetes:master Dec 30, 2018
k8s-ci-robot added a commit that referenced this pull request Jan 6, 2019
…59-upstream-release-1.13

Automated cherry pick of #72259 upstream release 1.13
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Race condition in preemption logic of the scheduler when setting nominatedNodeName
5 participants