Fix two race issues in schedule_queue #81148
Conversation
/sig scheduling
The data race is indeed a problem that we need to solve. But can we have a benchmark number for this change, since it adds a mutex to an important data structure in the scheduler?
Yes, I agree with your comment. I will do it.
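For reference, here is a standalone sketch of the kind of micro-benchmark being asked for. It measures add/get throughput on a toy map-backed store guarded by a sync.RWMutex; it is not the scheduler's actual PriorityQueue, and all names in it are illustrative only.

package queuebench

import (
	"strconv"
	"sync"
	"testing"
)

// simpleQueue stands in for a lock-guarded scheduler queue; it only exists to
// show how the cost of the added locking could be quantified.
type simpleQueue struct {
	lock  sync.RWMutex
	items map[string]int
}

func (q *simpleQueue) add(key string, v int) {
	q.lock.Lock()
	defer q.lock.Unlock()
	q.items[key] = v
}

func (q *simpleQueue) get(key string) (int, bool) {
	q.lock.RLock()
	defer q.lock.RUnlock()
	v, ok := q.items[key]
	return v, ok
}

// Run with: go test -bench=. -benchmem
func BenchmarkQueueAddGet(b *testing.B) {
	q := &simpleQueue{items: make(map[string]int)}
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			key := strconv.Itoa(i % 1024)
			q.add(key, i)
			q.get(key)
			i++
		}
	})
}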
From the test results, the two issues are separate.
We do not need to add a lock for …
Force-pushed from 35dde4a to a51d574.
I have tested it and found it was caused by the test code. Thank you for your reminder.
@draveness I have modified the code, so it seems I don't need to continue testing the performance data. Thanks.
Verified locally, this indeed solves the data race problem in the schedule queue tests. But I'm quite interested in the cause of the data race: the two tests use two different priority queues, so they are not supposed to cause the problem. Could you explain it to me a little?
In fact, you can see the reason in the description of the problem (#81148 (comment)): the two tests that trigger the problem are not directly responsible for the race, but they interact with the logic in the main code.
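As a rough illustration of why a test can race with the queue even though each test builds its own queue: the queue starts background goroutines (such as the flush loop behind TestHighPriorityFlushUnschedulableQLeftover) that read the same maps the test mutates directly, so any unsynchronized access from the test can be reported by the race detector. A minimal, self-contained sketch, not the scheduler's actual code; all names here are made up:

package racedemo

import (
	"sync"
	"testing"
	"time"
)

// store mimics a lock-guarded pod map with a background goroutine that
// periodically scans it, similar in spirit to the queue's flush loops.
type store struct {
	lock sync.RWMutex
	m    map[string]time.Time
}

func (s *store) flushLoop(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		s.lock.RLock()
		for range s.m {
			// background reader
		}
		s.lock.RUnlock()
		time.Sleep(time.Millisecond)
	}
}

func TestRaceWithBackgroundFlush(t *testing.T) {
	s := &store{m: map[string]time.Time{"pod": time.Now()}}
	stop := make(chan struct{})
	defer close(stop)
	go s.flushLoop(stop)

	// Mutating the map without taking s.lock races with flushLoop above;
	// go test -race can flag it even though the test itself passes.
	s.m["pod"] = time.Now().Add(-time.Minute)
}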
Nit
@@ -351,13 +351,15 @@ func TestPriorityQueue_AddUnschedulableIfNotPresent_Backoff(t *testing.T) {
q.AddUnschedulableIfNotPresent(unschedulablePod, oldCycle)
}

q.lock.RLock()
It would be better to write a function (similar to L109) to wrap the lock and get the pod from podBackoffQ.
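A minimal sketch of what such a helper might look like, for illustration only; the podBackoffQ.Get call, the stored *framework.PodInfo item type, and the use of q.newPodInfo are assumptions about the queue internals rather than verified code:

// Hypothetical test helper wrapping the lock and the podBackoffQ lookup.
func getBackoffPod(q *PriorityQueue, pod *v1.Pod) (*framework.PodInfo, bool) {
	q.lock.RLock()
	defer q.lock.RUnlock()
	// Assumed: podBackoffQ stores *framework.PodInfo keyed by pod identity.
	obj, exists, err := q.podBackoffQ.Get(q.newPodInfo(pod))
	if err != nil || !exists {
		return nil, false
	}
	return obj.(*framework.PodInfo), true
}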
In fact, I prefer to keep the test simple and verify the data as in L358.
@@ -1055,8 +1057,11 @@ func TestHighPriorityFlushUnschedulableQLeftover(t *testing.T) {

addOrUpdateUnschedulablePod(q, &highPod)
addOrUpdateUnschedulablePod(q, &midPod)

q.lock.Lock()
q.unschedulableQ.podInfoMap[util.GetPodFullName(&highPod)].Timestamp = time.Now().Add(-1 * unschedulableQTimeInterval)
addOrUpdateUnschedulablePod already does locking. Instead of applying locking here, I suggest modifying addOrUpdateUnschedulablePod to accept a PodInfo input instead of a Pod, so you can first create the PodInfo and avoid updating the podInfoMap again (in L1062-1063).
E.g.,
change
func addOrUpdateUnschedulablePod(p *PriorityQueue, pod *v1.Pod) {
p.lock.Lock()
defer p.lock.Unlock()
p.unschedulableQ.addOrUpdate(p.newPodInfo(pod))
}
to
func addOrUpdateUnschedulablePod(p *PriorityQueue, podInfo *framework.PodInfo) {
p.lock.Lock()
defer p.lock.Unlock()
p.unschedulableQ.addOrUpdate(podInfo)
}
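With that signature, the call site in the test could build the PodInfo up front and set its timestamp before it is ever handed to the queue, so no later locked update of podInfoMap is needed. Sketch only; the Timestamp field comes from the test diff above, the rest follows the suggested signature:

highPodInfo := q.newPodInfo(&highPod)
highPodInfo.Timestamp = time.Now().Add(-1 * unschedulableQTimeInterval)
addOrUpdateUnschedulablePod(q, highPodInfo)
// ... and similarly for midPod.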
sgtm
Force-pushed from 79819ca to ca61b79.
/assign @Huang-Wei @bsalamat
/test pull-kubernetes-kubemark-e2e-gce-big
/lgtm
/approve
Thanks, @wgliang !
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Huang-Wei, wgliang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Run:
go test k8s.io/kubernetes/pkg/scheduler/internal/queue --race --count=50
You can get data race reports from the race detector.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: