avoid double RLock() in cpumanager #62464

Merged
k8s-merge-robot merged 1 commit into kubernetes:master from choury:fix-double-rlock-in-cpumanger on Apr 23, 2018

Contributor

choury commented Apr 12, 2018

What this PR does / why we need it:

We hit a deadlock when removing a pod.
The kubelet keeps logging:

Pod "xxxx" is terminated, but some containers are still running 

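After debugging, we found that one goroutine was stuck in SetDefaultCPUSet while another goroutine was calling GetCPUSetOrDefault.

According to golang/go#15418, it is not safe for one goroutine to call RLock() twice on the same RWMutex: if another goroutine calls Lock() between the two RLock() calls, the second RLock() waits behind the pending writer, and the writer waits for the first read lock to be released.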
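The hazard described above can be reproduced in isolation. The following is a minimal sketch, not code from the kubelet: the function name `nestedRLockWouldBlock` is hypothetical, and it relies on `sync.RWMutex.TryRLock`, which only exists in Go 1.18 and later (newer than the Go versions in use when this PR was written). It uses `TryRLock` instead of a real nested `RLock()` so that the program reports the problem rather than hanging.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nestedRLockWouldBlock holds a read lock, lets a writer queue up behind it,
// and reports whether a second (nested) read lock can no longer be acquired.
// Hypothetical helper for illustration only.
func nestedRLockWouldBlock() bool {
	var mu sync.RWMutex
	mu.RLock() // outer read lock, as held by the calling method

	writerDone := make(chan struct{})
	go func() {
		defer close(writerDone)
		mu.Lock() // waits until the outer read lock is released
		mu.Unlock()
	}()

	// Poll until the writer is pending. From that point on, new readers are
	// refused, so a plain nested RLock() here would block forever.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if !mu.TryRLock() {
			blocked = true
			break
		}
		mu.RUnlock() // writer not queued yet; undo and retry
		time.Sleep(time.Millisecond)
	}

	mu.RUnlock() // release the outer read lock so the writer can finish
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("nested RLock would block:", nestedRLockWouldBlock())
}
```

Once the writer goroutine has reached `Lock()`, the nested read attempt is refused. With a blocking `RLock()` in its place, the reader and the writer would wait on each other indefinitely, which matches the stuck SetDefaultCPUSet call reported above.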

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

none

Special notes for your reviewer:

Release note:

removed unsafe double RLock in cpumanager

Contributor

choury commented Apr 12, 2018

/ok-to-test

Contributor

k8s-ci-robot commented Apr 12, 2018

@choury: you can't request testing unless you are a kubernetes member.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

jennybuckley commented Apr 12, 2018

/ok-to-test

Contributor

choury commented Apr 13, 2018

/assign @vishh

Member

vishh commented Apr 17, 2018

/assign @ConnorDoyle

Contributor

choury commented Apr 20, 2018

@ConnorDoyle Would you like to review this? This is a bug fix, and should be cherry-picked to 1.8, 1.9 and 1.10
/kind bug

@@ -59,10 +59,10 @@ func (s *stateMemory) GetCPUSetOrDefault(containerID string) cpuset.CPUSet {
s.RLock()

vishh Apr 20, 2018

Member

Can you instead just not lock here?

choury Apr 23, 2018

Contributor

Yes, that seems like a better solution, since the operation does not need to be atomic here.
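The approach agreed on in this thread can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the merged diff: `cpuSet` is a hypothetical stand-in for `cpuset.CPUSet`, the `assignments` map and the `clone` helper are invented for the example, and only the method names come from the PR. The point is that `GetCPUSetOrDefault` takes no lock of its own and relies on the locking already done inside the two getters it calls.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuSet is a hypothetical stand-in for cpuset.CPUSet.
type cpuSet []int

// clone returns a copy so callers never share the stored slice.
func (c cpuSet) clone() cpuSet { return append(cpuSet(nil), c...) }

// stateMemory is a simplified version of the in-memory state (assumed layout).
type stateMemory struct {
	sync.RWMutex
	assignments   map[string]cpuSet
	defaultCPUSet cpuSet
}

func (s *stateMemory) GetCPUSet(containerID string) (cpuSet, bool) {
	s.RLock()
	defer s.RUnlock()
	res, ok := s.assignments[containerID]
	return res.clone(), ok
}

func (s *stateMemory) GetDefaultCPUSet() cpuSet {
	s.RLock()
	defer s.RUnlock()
	return s.defaultCPUSet.clone()
}

func (s *stateMemory) SetDefaultCPUSet(c cpuSet) {
	s.Lock()
	defer s.Unlock()
	s.defaultCPUSet = c.clone()
}

// GetCPUSetOrDefault does not lock. Each getter below acquires and releases
// its own read lock, so no read lock is ever held while another is requested.
func (s *stateMemory) GetCPUSetOrDefault(containerID string) cpuSet {
	if res, ok := s.GetCPUSet(containerID); ok {
		return res
	}
	return s.GetDefaultCPUSet()
}

func main() {
	s := &stateMemory{
		assignments:   map[string]cpuSet{"container-a": {0, 1}},
		defaultCPUSet: cpuSet{2, 3},
	}
	fmt.Println(s.GetCPUSetOrDefault("container-a"))
	fmt.Println(s.GetCPUSetOrDefault("unknown"))
}
```

The trade-off is that the lookup and the fallback are no longer one atomic read: a writer may change the default set between the two calls. As noted in the thread, that atomicity is not needed here.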

Member

ConnorDoyle commented Apr 23, 2018

Thanks very much for fixing this bug.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 23, 2018

Contributor

k8s-ci-robot commented Apr 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: choury, ConnorDoyle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

choury commented Apr 23, 2018

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-integration

fejta-bot commented Apr 23, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

Member

ConnorDoyle commented Apr 23, 2018

/test pull-kubernetes-e2e-kops-aws
(kops bringup failed, per the logs)

Member

ConnorDoyle commented Apr 23, 2018

kops tests are temporarily blocking the queue (see #63024)

fejta-bot commented Apr 23, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

Contributor

k8s-merge-robot commented Apr 23, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

Contributor

k8s-merge-robot commented Apr 23, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit fca65dc into kubernetes:master Apr 23, 2018

14 of 15 checks passed

Submit Queue: Required Github CI test is not green: pull-kubernetes-e2e-gce
cla/linuxfoundation: choury authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.

Member

ConnorDoyle commented Apr 24, 2018

@choury: I created the cherry-pick PRs to the three affected release branches (see links above).

@choury choury deleted the choury:fix-double-rlock-in-cpumanger branch Apr 24, 2018

k8s-merge-robot added a commit that referenced this pull request Apr 25, 2018

Merge pull request #63043 from ConnorDoyle/automated-cherry-pick-of-#62464-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.8.

#62464: avoid dobule RLock() in cpumanager

k8s-merge-robot added a commit that referenced this pull request Apr 25, 2018

Merge pull request #63041 from ConnorDoyle/automated-cherry-pick-of-#62464-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.10.

#62464: avoid dobule RLock() in cpumanager

k8s-merge-robot added a commit that referenced this pull request Jun 16, 2018

Merge pull request #63042 from ConnorDoyle/automated-cherry-pick-of-#62464-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.9.

#62464: avoid dobule RLock() in cpumanager