New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sync.map to scale equiv class cache better #66862

Merged
merged 1 commit into from Aug 21, 2018

Conversation

@resouer
Member

resouer commented Aug 1, 2018

What this PR does / why we need it:

Change the current lock in first level ecache into sync.Map, which is known for scaling better than sync. Mutex on machines with >8 CPUs

ref: https://golang.org/pkg/sync/#Map

And the code is much cleaner in this way.

5k Nodes, 10k Pods benchmark with ecache enabled in 64 cores VM:

// before
BenchmarkScheduling/5000Nodes/0Pods-64             10000          17550089 ns/op

// after
BenchmarkScheduling/5000Nodes/0Pods-64             10000          16975098 ns/op

Comparing to current implementation, the improvement after this change is noticeable, and the test is stable in 8, 16, 64 cores VM.

Special notes for your reviewer:

Release note:

Use sync.map to scale ecache better
@resouer

This comment has been minimized.

Show comment
Hide comment
@resouer
Member

resouer commented Aug 1, 2018

/assign @bsalamat

xref: #65714 (comment)

@misterikkit

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 2, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label Aug 7, 2018

@misterikkit

This comment has been minimized.

Show comment
Hide comment
@misterikkit

misterikkit Aug 7, 2018

Contributor

/lgtm

Contributor

misterikkit commented Aug 7, 2018

/lgtm

@bsalamat

/lgtm

mu sync.RWMutex
nodeToCache nodeMap
// i.e. map[string]*NodeCache
sync.Map

This comment has been minimized.

@bsalamat

bsalamat Aug 21, 2018

Contributor

If you look at the sync.Map documentation, it reads:

"The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex."

eCache does not follow any of these patterns. The second pattern is closer to what eCache does, but even that is not completely applicable to eCache. We have informer event handlers which may write/delete eCache entries in parallel to other go-routines that run our predicate functions. So, the sets of entries written/deleted in various goroutines are not disjoint.
The documentation also states:
"The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content."

These are the reasons that I am a bit unsure about using sync.Map here. Also, the performance improvement does not seem to be large. Those said, I don't have any serious objection against it.

@bsalamat

bsalamat Aug 21, 2018

Contributor

If you look at the sync.Map documentation, it reads:

"The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex."

eCache does not follow any of these patterns. The second pattern is closer to what eCache does, but even that is not completely applicable to eCache. We have informer event handlers which may write/delete eCache entries in parallel to other go-routines that run our predicate functions. So, the sets of entries written/deleted in various goroutines are not disjoint.
The documentation also states:
"The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content."

These are the reasons that I am a bit unsure about using sync.Map here. Also, the performance improvement does not seem to be large. Those said, I don't have any serious objection against it.

This comment has been minimized.

@resouer

resouer Aug 21, 2018

Member

Thanks Bobby! Actually, that's also the reason I did not use sync.Map at the beginning :D

While since lock contention is visibly reduced during benchmark CPU profiling:

https://github.com/resouer/temp/blob/master/torch_lock_true.5000.svg
https://github.com/resouer/temp/blob/master/torch_lock_true.map.5000.svg

and as the Cache logic is very simple, I guess we can keep the change as is.

@resouer

resouer Aug 21, 2018

Member

Thanks Bobby! Actually, that's also the reason I did not use sync.Map at the beginning :D

While since lock contention is visibly reduced during benchmark CPU profiling:

https://github.com/resouer/temp/blob/master/torch_lock_true.5000.svg
https://github.com/resouer/temp/blob/master/torch_lock_true.map.5000.svg

and as the Cache logic is very simple, I guess we can keep the change as is.

@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Aug 21, 2018

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, misterikkit, resouer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

k8s-ci-robot commented Aug 21, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, misterikkit, resouer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Aug 21, 2018

Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

Contributor

k8s-merge-robot commented Aug 21, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-merge-robot

k8s-merge-robot Aug 21, 2018

Contributor

Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions here.

Contributor

k8s-merge-robot commented Aug 21, 2018

Automatic merge from submit-queue (batch tested with PRs 66862, 67618). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 4cca6a8 into kubernetes:master Aug 21, 2018

16 of 18 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
Details
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
cla/linuxfoundation resouer authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@k8s-ci-robot

This comment has been minimized.

Show comment
Hide comment
@k8s-ci-robot

k8s-ci-robot Aug 21, 2018

Contributor

@resouer: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-kubemark-e2e-gce-big 17d0190 link /test pull-kubernetes-kubemark-e2e-gce-big

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

k8s-ci-robot commented Aug 21, 2018

@resouer: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-kubemark-e2e-gce-big 17d0190 link /test pull-kubernetes-kubemark-e2e-gce-big

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@resouer resouer deleted the resouer:sync-map branch Aug 21, 2018

k8s-merge-robot added a commit that referenced this pull request Aug 22, 2018

Merge pull request #67681 from misterikkit/reviewer
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add misterikkit to sig-scheduling REVIEWERS.

I have met the following criteria.
- member for at least 3 months
- primary reviewer for at least 5 PRs
  - #63603
  - #63665 (and related PRs)
  - #63839
  - #65714
  - #66862
- reviewed or merged at least 20 PRs
  reviewed 13: https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+archived%3Afalse+is%3Amerged+repo%3Akubernetes%2Fkubernetes+commenter%3Amisterikkit+in%3Acomment+assignee%3Amisterikkit+
  merged 22: https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Amisterikkit+archived%3Afalse+is%3Amerged+repo%3Akubernetes%2Fkubernetes+


**Release note**:
```release-note
NONE
```
/cc @bsalamat
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment