Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use RWMutex in cacheBasedManager and add some unit tests #81408

Open
wants to merge 1 commit into
base: master
from

Conversation

@cwdsuzhou
Copy link
Member

commented Aug 14, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:

Use RWMutex in cacheBasedManager to replace Mutex to accelerate map operation.

  1. To use RWMutex, I have changed the old method to trigger Get().
  2. Add some tests to simulate cache expired due to ttl

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @mattjmcnaughton

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

@mattjmcnaughton
Copy link
Contributor

left a comment

Thanks for your pr!

As a starting point, can you add some additional explanation around how the change to use a RWMutex relates to your change around cache expiration and adding refVersion? I'm not sure I'm following how the cache behavior impacts whether we can use a RWMutex vs a plain Mutex.

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Aug 15, 2019

Thanks for your pr!

As a starting point, can you add some additional explanation around how the change to use a RWMutex relates to your change around cache expiration and adding refVersion? I'm not sure I'm following how the cache behavior impacts whether we can use a RWMutex vs a plain Mutex.

Thanks @mattjmcnaughton , from the codes, we know we use Mutex in the current version, which may influence the performance when we frequently call Get(). Current codes would reset item.data to nil when new pods add. Then initialize item.data in Get() if it is nil, this leads to RWMutex(RLock()) can not work because we also have write operation here(may cause concurrent write).

So, if we want to use RWMutex, we have to avoid write here. item.data == nil is used as a signal for getting data from api-server. Using refVersion to mark changes would have the same effects but would not own read lock.

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Aug 17, 2019

ping @tallclair for review

@tallclair
Copy link
Member

left a comment

I'm still a bit confused about the motivation for this change. Is there evidence that the cache lock is a bottleneck?

objectTTL := s.defaultTTL
if ttl, ok := s.getTTL(); ok {
objectTTL = ttl
}
return s.clock.Now().Before(data.lastUpdateTime.Add(objectTTL))
return !refUpdated && s.clock.Now().Before(data.lastUpdateTime.Add(objectTTL))

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

if refUpdated, you can just shortcircuit. That seems like a better check to inline where the function is called though.

This comment has been minimized.

Copy link
@tallclair

tallclair Sep 11, 2019

Member

I meant check refUpdated where you call isObjectFresh rather than passing it in.

// This will trigger fetch on the next Get() operation.
item.data = nil
item.refCount++
item.refVersion++

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

Why does adding a reference bump the refVersion?

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

Maybe I'm confused about what refVersion represents. I assumed this was a version of the cached data?

In either case, please add a comment to the field explaining what it is.

@@ -185,8 +185,8 @@ func (s *objectStore) Get(namespace, name string) (runtime.Object, error) {
}

object, err := s.getObject(namespace, name, opts)
if err != nil && !apierrors.IsNotFound(err) && data.object == nil && data.err == nil {
// Couldn't fetch the latest object, but there is no cached data to return.
if err != nil && !apierrors.IsNotFound(err) && refUpdated {

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

What if err != nil && !refUpdated?

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

Ah, I see it just returns the cached object in that case. I'd prefer to move that return closer to here.

@@ -175,7 +174,8 @@ func (s *objectStore) Get(namespace, name string) (runtime.Object, error) {
// needed and return data.
data.Lock()
defer data.Unlock()
if data.err != nil || !s.isObjectFresh(data) {
refUpdated := data.lastVersion < refVersion
if data.err != nil || !s.isObjectFresh(data, refUpdated) {

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

nit: Negate this conditional & shortcircuit, to make the control flow easier to follow.

@@ -24,7 +24,7 @@ import (
"testing"
"time"

v1 "k8s.io/api/core/v1"
"k8s.io/api/core/v1"

This comment has been minimized.

Copy link
@tallclair

tallclair Aug 20, 2019

Member

revert this. Aliasing v1 to itself is standard practice.

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Aug/rwmutext_cacheManager_refactor branch from 6463f30 to e1d0d09 Aug 23, 2019

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Aug 23, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cwdsuzhou
To complete the pull request process, please assign derekwaynecarr
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Aug/rwmutext_cacheManager_refactor branch from e1d0d09 to 974752e Aug 23, 2019

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Aug 23, 2019

@tallclair thanks. I have fixed the PR according to your comments.

PTAL

Use RWMutex in cacheBasedManager
add some unit tests

fix code according to comments

@cwdsuzhou cwdsuzhou force-pushed the cwdsuzhou:Aug/rwmutext_cacheManager_refactor branch from 974752e to c7a1362 Aug 23, 2019

@mattjmcnaughton
Copy link
Contributor

left a comment

@tallclair thanks. I have fixed the PR according to your comments.

PTAL

I'd also be interested to know the answer to @tallclair's question around whether there's evidence the locking is a bottleneck. The code to support the RWLock seems more complex, so I'm hesitant for us to add it unless we believe it will provide a tangible benefit.

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Sep 5, 2019

During I read the codes, I found we can accelerate map operation by using RWMutex.
This PR would mainly help accelerate read when we have a large number of pod using configmap or secret start at the some time.

I simulated it locally using script. and found this could accelerate it.
I simulated the scene like this

secret pod
10 secret 1000 pod
20 secret 1000 pod
100 secret 1000 pod

I found using RWMutex would actually improve the performance.

@tallclair @mattjmcnaughton

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Sep 7, 2019

/retest

@tallclair

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

A performance boost in a hot looping test script doesn't necessarily indicate a performance boost in the way it's used in Kubernetes.

BTW, I recommend using benchmark tests for scripts like that: https://golang.org/pkg/testing/#hdr-Benchmarks

@cwdsuzhou

This comment has been minimized.

Copy link
Member Author

commented Sep 12, 2019

A performance boost in a hot looping test script doesn't necessarily indicate a performance boost in the way it's used in Kubernetes.

BTW, I recommend using benchmark tests for scripts like that: https://golang.org/pkg/testing/#hdr-Benchmarks

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.