Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid dobule RLock() in cpumanager #62464

Merged

Conversation

choury
Copy link
Contributor

@choury choury commented Apr 12, 2018

What this PR does / why we need it:

We met a deadlock when removing pod.
kubelet keeps logging:

Pod "xxxx" is terminated, but some containers are still running 

After debug, we found it stuck in SetDefaultCPUSet here while another goroutine are calling GetCPUSetOrDefault here.

According golang/go#15418, It is not safe to double RLock a RWMutex.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

none

Special notes for your reviewer:

Release note:

removed unsafe double RLock in cpumanager

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 12, 2018
@k8s-ci-robot k8s-ci-robot requested review from ncdc and vishh April 12, 2018 12:16
@choury
Copy link
Contributor Author

choury commented Apr 12, 2018

/ok-to-test

@k8s-ci-robot
Copy link
Contributor

@choury: you can't request testing unless you are a kubernetes member.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@choury choury force-pushed the fix-double-rlock-in-cpumanger branch from 39b6623 to 49d5a93 Compare April 12, 2018 12:24
@jennybuckley
Copy link

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 12, 2018
@choury
Copy link
Contributor Author

choury commented Apr 13, 2018

/assign @vishh

@vishh
Copy link
Contributor

vishh commented Apr 17, 2018

/assign @ConnorDoyle

@choury
Copy link
Contributor Author

choury commented Apr 20, 2018

@ConnorDoyle Would you like to review this? This is a bug fix, and should be cherry-picked to 1.8, 1.9 and 1.10
/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 20, 2018
@@ -59,10 +59,10 @@ func (s *stateMemory) GetCPUSetOrDefault(containerID string) cpuset.CPUSet {
s.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you instead just not lock here?

Copy link
Contributor Author

@choury choury Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems a better solution since atomic operation is not needed here.

@choury choury force-pushed the fix-double-rlock-in-cpumanger branch from 49d5a93 to c1b19fc Compare April 23, 2018 02:34
@ConnorDoyle
Copy link
Contributor

Thanks very much for fixing this bug.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 23, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: choury, ConnorDoyle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2018
@choury
Copy link
Contributor Author

choury commented Apr 23, 2018

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-integration

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@ConnorDoyle
Copy link
Contributor

/test pull-kubernetes-e2e-kops-aws
(kops bringup failed, per logs

@ConnorDoyle
Copy link
Contributor

kops tests are temporarily blocking the queue (see #63024)

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@ConnorDoyle
Copy link
Contributor

ConnorDoyle commented Apr 24, 2018

@choury: I created the cherry-pick PRs to the three affected release branches (see links above).

@choury choury deleted the fix-double-rlock-in-cpumanger branch April 24, 2018 01:52
k8s-github-robot pushed a commit that referenced this pull request Apr 25, 2018
…62464-upstream-release-1.8

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.8.

#62464: avoid dobule RLock() in cpumanager
k8s-github-robot pushed a commit that referenced this pull request Apr 25, 2018
…62464-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.10.

#62464: avoid dobule RLock() in cpumanager
k8s-github-robot pushed a commit that referenced this pull request Jun 16, 2018
…62464-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #62464: avoid dobule RLock() in cpumanager

Cherry pick of #62464 on release-1.9.

#62464: avoid dobule RLock() in cpumanager
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants