Fix a race in setting nominated node and the scheduling cycle after it. #72259
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bsalamat. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
```go
@@ -490,6 +491,119 @@ func TestPreemptionStarvation(t *testing.T) {
	}
}

// TestPreemptionRaces tests that other scheduling events and operations do not
// race with the preemption process.
func TestPreemptionRaces(t *testing.T) {
```
Did you manage to get this test to fail as expected, i.e., without the other code fixes? I tried several times and they all passed without errors.
I have managed to get it to fail, but given that the race condition is hard to reproduce, you need to set numRepitions high (100) and sometimes still run the test several times.
I tried to find a way to simulate late arrival of events, but I didn't find a reasonable way to do so.
I should add that my reason for adding it was to at least see a flaky test if a race condition exists.
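The pattern described above — repeating a racy scenario many times so that a rarely-hit interleaving eventually surfaces — can be sketched as follows. This is a simplified stand-in, not the actual scheduler test: `nominatedMap`, `runOnce`, and the pod/node names are hypothetical.

```go
package main

import "sync"

// nominatedMap is a hypothetical stand-in for the scheduler's
// nominated-pod bookkeeping, guarded by a mutex.
type nominatedMap struct {
	mu   sync.Mutex
	pods map[string]string // pod name -> nominated node name
}

func (m *nominatedMap) add(pod, node string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.pods[pod] = node
}

func (m *nominatedMap) delete(pod string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pods, pod)
}

// runOnce simulates one cycle where a nomination races with a deletion;
// either ordering is legal, so the final size is 0 or 1.
func runOnce() int {
	m := &nominatedMap{pods: map[string]string{}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); m.add("p", "node1") }()
	go func() { defer wg.Done(); m.delete("p") }()
	wg.Wait()
	return len(m.pods)
}

func main() {
	// A race that fires rarely only shows up across many repetitions,
	// which is why the test needs a high repetition count.
	const numRepetitions = 100
	for i := 0; i < numRepetitions; i++ {
		runOnce()
	}
}
```

Run with `go test -race` (or `go run -race`) so the race detector, rather than a lucky interleaving, reports unsynchronized access.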
/assign
```go
		return
	}
	for i, np := range npm.nominatedPods[nnn] {
		if np.UID != p.UID {
```
Just a nit: can we reverse the logic? Negation is usually harder to reason about.
Done
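The reversal the reviewer asks for — matching on equality instead of skipping on inequality — can be sketched like this. `removeByUID`, the `pod` type, and the sample UIDs are hypothetical stand-ins for the actual queue code:

```go
package main

import "fmt"

type pod struct{ UID string }

// removeByUID deletes the first pod whose UID matches target, using a
// positive comparison (np.UID == target) rather than continuing past a
// negated one, which is easier to read.
func removeByUID(pods []*pod, target string) []*pod {
	for i, np := range pods {
		if np.UID == target {
			return append(pods[:i], pods[i+1:]...)
		}
	}
	return pods
}

func main() {
	pods := []*pod{{UID: "a"}, {UID: "b"}, {UID: "c"}}
	pods = removeByUID(pods, "b")
	for _, p := range pods {
		fmt.Print(p.UID) // prints "ac"
	}
	fmt.Println()
}
```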
```go
@@ -326,14 +288,13 @@ func (p *PriorityQueue) Add(pod *v1.Pod) error {
	}
	if p.unschedulableQ.get(pod) != nil {
		klog.Errorf("Error: pod %v/%v is already in the unschedulable queue.", pod.Namespace, pod.Name)
		p.deleteNominatedPodIfExists(pod)
```
Shouldn’t we make sure that the pod in the unschedulable queue isn’t nominated anymore by calling a delete?
Agreed. Is p.nominatedPods.delete(pod) missing here?
OK, I think it's because at line 297, p.nominatedPods.add(pod, "") first deletes the pod and then adds it to the cache map.
We might either split out the delete or rename add to make the code more straightforward to read. @bsalamat?
As @Huang-Wei noted, this is no longer needed, as the add function of the nominated node struct ensures that there won't be duplicate entries. I think add is still the best name for it. Any other suggestions?
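The delete-before-add behavior being discussed can be sketched as follows. This is a simplified stand-in for the scheduler's `nominatedPodMap`, with string UIDs instead of `types.UID` and a minimal `pod` type; the real structure lives in the priority queue code.

```go
package main

import "fmt"

type pod struct {
	UID      string
	NodeName string // the pod's own nominated node, if any
}

// nominatedPodMap maps a node name to the pods nominated to run on it,
// plus a reverse index from pod UID to node name.
type nominatedPodMap struct {
	nominatedPods      map[string][]*pod
	nominatedPodToNode map[string]string
}

// delete removes any existing entry for p, so there is never more than
// one instance of the pod in the map.
func (npm *nominatedPodMap) delete(p *pod) {
	nnn, ok := npm.nominatedPodToNode[p.UID]
	if !ok {
		return
	}
	for i, np := range npm.nominatedPods[nnn] {
		if np.UID == p.UID {
			npm.nominatedPods[nnn] = append(npm.nominatedPods[nnn][:i], npm.nominatedPods[nnn][i+1:]...)
			break
		}
	}
	if len(npm.nominatedPods[nnn]) == 0 {
		delete(npm.nominatedPods, nnn)
	}
	delete(npm.nominatedPodToNode, p.UID)
}

// add deletes first, so callers can never create duplicate entries.
func (npm *nominatedPodMap) add(p *pod, nodeName string) {
	npm.delete(p)
	nnn := nodeName
	if nnn == "" {
		nnn = p.NodeName
	}
	if nnn == "" {
		return
	}
	npm.nominatedPodToNode[p.UID] = nnn
	npm.nominatedPods[nnn] = append(npm.nominatedPods[nnn], p)
}

func main() {
	npm := &nominatedPodMap{
		nominatedPods:      map[string][]*pod{},
		nominatedPodToNode: map[string]string{},
	}
	p := &pod{UID: "u1"}
	npm.add(p, "node1")
	npm.add(p, "node1") // second add replaces rather than duplicates
	fmt.Println(len(npm.nominatedPods["node1"])) // prints 1
}
```

Because add always deletes first, the explicit delete that was removed in the diff above is redundant.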
```go
	// one instance of the pod.
	npm.delete(p)

	nnn := nodeName
```
Can we replace nnn with nominatedNodeName here? The intent would be much clearer.
nnn is a common pattern in other functions and is more idiomatic in Go than long names.
/retest
```go
		"node1": {&highPriNominatedPod, &medPriorityPod, &unschedulablePod},
	expectedNominatedPods := &nominatedPodMap{
		nominatedPodToNode: map[types.UID]string{
			medPriorityPod.UID: "node1",
```
Pod.UID is node1? That doesn't look very suitable; pod1 would be acceptable.
node1 is not a pod UID. This is a map from a pod UID to a node name; node1 is the node name.
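The direction of that map can be illustrated with a simplified stand-in (string keys instead of the scheduler's `types.UID`; the UID string here is hypothetical):

```go
package main

import "fmt"

func main() {
	// Keys are pod UIDs, values are node names: "node1" is a value,
	// not a UID, which is the point of the reply above.
	nominatedPodToNode := map[string]string{
		"uid-of-medPriorityPod": "node1",
	}
	fmt.Println(nominatedPodToNode["uid-of-medPriorityPod"]) // prints "node1"
}
```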
Force-pushed the "…ed node name of a pod and scheduling cycle of other pods" branch from aa018bd to 01bf481.
@bsalamat thanks for the PR!
Please squash the last commit (3 commits in total is fine).
Force-pushed from 01bf481 to b75672c.
@Huang-Wei Done
/lgtm
…59-upstream-release-1.13 Automated cherry pick of #72259 upstream release 1.13
What type of PR is this?
/kind bug
What this PR does / why we need it:
It fixes the issue reported in #72124.
Which issue(s) this PR fixes:
Fixes #72124
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/sig scheduling