
Re-design equivalence class cache to two level cache #65714

Merged
merged 2 commits into kubernetes:master from resouer:fix-63784 on Jul 19, 2018

Conversation

@resouer
Member

resouer commented Jul 2, 2018

What this PR does / why we need it:

The current ecache introduces a global lock shared across all nodes; this patch assigns an ecache per node to eliminate that global lock. The improvements in scheduling performance and throughput are both significant.

CPU Profile Result

Machine: 32-core 60GB GCE VM

1k nodes 10k pods bench test (we've highlighted the critical function):

  1. Current default scheduler with ecache enabled:
     [flame graph: equivalence class cache bench test 001]
  2. Current default scheduler with ecache disabled:
     [flame graph: equivalence class cache bench test 002]
  3. Current default scheduler with this patch and ecache enabled:
     [flame graph: equivalence class cache bench test 003]

Throughput Test Result

1k nodes 3k pods scheduler_perf test:

Current default scheduler, ecache is disabled:

Minimal observed throughput for 3k pod test: 200
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf    30.091s

With this patch, ecache is enabled:

Minimal observed throughput for 3k pod test: 556
PASS
ok      k8s.io/kubernetes/test/integration/scheduler_perf    11.119s

Design and implementation:

The idea is that we re-designed the ecache into a "two-level cache".

The first-level cache holds the global lock across nodes; it needs to be synchronized only when a node is added or deleted, which happens at a much lower frequency.

The second-level cache is assigned per node and its lock is restricted to that node, so there is no need to touch the global lock during the whole predicate cycle. For more detail, please check the original discussion.
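
A minimal sketch of the two-level layout described above (the type names and per-node result map are simplified for illustration and are not the exact types introduced by this PR):

```go
package equivalence

import "sync"

// Cache is the first-level cache. Its lock is only taken for writing when a
// node is added or removed, so cross-node contention stays low.
type Cache struct {
	mu          sync.RWMutex
	nodeToCache map[string]*NodeCache
}

// NodeCache is the second-level cache. Its lock guards only the predicate
// results of a single node, so predicates for different nodes never contend.
type NodeCache struct {
	mu    sync.RWMutex
	cache map[string]map[uint64]bool // predicateKey -> equivalence hash -> fit
}

// NewCache returns an empty first-level cache.
func NewCache() *Cache {
	return &Cache{nodeToCache: make(map[string]*NodeCache)}
}
```

With this split, the generic scheduler looks up the NodeCache once per node under the first-level read lock, then runs all predicates for that node against the second-level cache only.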

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #63784

Special notes for your reviewer:

Tagged as WIP to make sure this does not break existing code and tests; we can start the review once CI is happy.

Release note:

Re-design equivalence class cache to two level cache
@@ -51,7 +51,7 @@ type schedulerCache struct {
period time.Duration

// This mutex guards all fields within this cache struct.
mu sync.Mutex
mu sync.RWMutex


@resouer

resouer Jul 3, 2018

Author Member

Comment for reviewer: this change is to improve cache.IsUpToDate


@misterikkit

misterikkit Jul 9, 2018

Contributor

Did we forget to approve/merge the PR where someone else made this change? /=


@resouer

resouer Jul 10, 2018

Author Member

Nope, that previous PR only improved ecache, not cache :D


@ravisantoshgudimetla

ravisantoshgudimetla Jul 11, 2018

Contributor

So, the way I see it, we are making 2 changes as part of this PR,

  • Changing scheduler cache's lock from Mutex to RWMutex
  • Changing ecache to be 2 level cache.

So the flame graphs attached could be a result of both changes? Am I missing something?


@resouer

resouer Jul 11, 2018

Author Member

The scheduler cache refactoring is just a bonus :D The performance improvement comes from the two-level cache, which is the core idea of this patch.

@resouer resouer force-pushed the resouer:fix-63784 branch from 014d756 to 9c69575 Jul 3, 2018

@resouer resouer changed the title [WIP] Re-design equivalence class cache to two level cache Re-design equivalence class cache to two level cache Jul 3, 2018

@resouer

Member Author

resouer commented Jul 3, 2018

cc @bsalamat @misterikkit Seems CI is happy now, please take a look :D

@misterikkit
Contributor

misterikkit left a comment

If I understand correctly, we want to avoid taking the top-level RLock() multiple times when running various predicates for one node. If that's the case, I think we should structure the equivalence pkg API around that (which is basically what you've done). I just think the names of things could better reflect how we expect users to use them. For example, how about this?

eCache := equivalence.NewCache()
class := equivalence.NewClass(pod)
nodeCache := eCache.ForNode(nodeName)
fit, reasons, err := nodeCache.RunPredicate(p, pod, class, etc...)

The type NodeCache would have only one exported function and no exported fields (maybe it's an interface).

AddNode is done lazily by ForNode.
RemoveNode could keep its old name of InvalidateAllPredicatesOnNode.

// TopLevelEquivCache is a thread safe nodeMap
// NOTE(harry): Theoretically sync.Map has better performance in machine with 8+ CPUs, while
// the reality is lock contention in first level cache is rare.
type TopLevelEquivCache struct {


@misterikkit

misterikkit Jul 3, 2018

Contributor

The exported symbols from this package should be "Cache" and "NewCache". The top level struct/constructor should take those names, while we rename the old ones.


@misterikkit

misterikkit Jul 3, 2018

Contributor

Side note: To avoid "stutter" we should not put "equiv" or "equivalence" in exported symbols of this package.


@resouer

resouer Jul 4, 2018

Author Member

And for NodeCache, we need to pass this type as a parameter in generic_scheduler.

@@ -33,6 +33,152 @@ import (
"github.com/golang/glog"
)

// nodeMap is type for a map of {nodeName: *Cache}


@misterikkit

misterikkit Jul 3, 2018

Contributor

This comment is mostly redundant. We know it's a type, a map, and that the value is *Cache. How about something like, "nodeMap stores a *Cache for each node."


@resouer

resouer Jul 5, 2018

Author Member

Fixed, thanks!

// GetOrCreateCache returns the second level Cache for given node name.
// If the cache does not exist, create it.
// This should be called in generic scheduler before predicate phase begins for given node.
func (n *TopLevelEquivCache) GetOrCreateCache(name string) *Cache {


@misterikkit

misterikkit Jul 3, 2018

Contributor

Does the API surface of this package need to change for this improvement? RunPredicate had the side effect of creating new cache items if needed.

I'm ok with improvements to the eCache API, but this seems like more management work will be put on generic scheduler.


@resouer

resouer Jul 5, 2018

Author Member

As RunPredicate is now handled only by NodeCache, the first-level cache has to be managed by the generic scheduler.

@@ -41,14 +187,14 @@ import (
// class". (Equivalence class is defined in the `Class` type.) Saved results
// will be reused until an appropriate invalidation function is called.
type Cache struct {
mu sync.RWMutex
cache nodeMap
sync.RWMutex


@misterikkit

misterikkit Jul 3, 2018

Contributor

Why remove the 'mu' here? By embedding the type, people outside this package can start calling (*Cache).Lock(), which I don't think is intended.
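
To illustrate the point about embedding (a standalone sketch, not code from this PR): an embedded mutex promotes Lock/Unlock into the type's exported method set, while an unexported named field keeps locking internal to the package.

```go
package example

import "sync"

// EmbeddedCache embeds the mutex, so code in other packages can call
// (*EmbeddedCache).Lock() and (*EmbeddedCache).Unlock() directly.
type EmbeddedCache struct {
	sync.RWMutex
	data map[string]bool
}

// PrivateCache keeps the mutex in an unexported field, so only this package
// can lock and unlock it.
type PrivateCache struct {
	mu   sync.RWMutex
	data map[string]bool
}

// Get reads a value under the read lock without exposing the lock itself.
func (c *PrivateCache) Get(key string) (value, ok bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	value, ok = c.data[key]
	return
}
```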

@@ -355,21 +355,27 @@ func (g *genericScheduler) findNodesThatFit(pod *v1.Pod, nodes []*v1.Node) ([]*v
// We can use the same metadata producer for all nodes.
meta := g.predicateMetaProducer(pod, g.cachedNodeInfoMap)

var equivClass *equivalence.Class
var (


@misterikkit

misterikkit Jul 3, 2018

Contributor

single var can be on single line.


@resouer

resouer Jul 6, 2018

Author Member

Done! Also, refactored functions to new names :-) Please review.

@resouer resouer force-pushed the resouer:fix-63784 branch 2 times, most recently from e36cdd4 to f505798 Jul 5, 2018

@misterikkit
Contributor

misterikkit left a comment

This looks great! I left mostly style/naming comments, and they don't all need to be addressed. There are a couple API/threading questions though.


// nodeMap stores a *Cache for each node.
type nodeMap map[string]*NodeCache

// Cache is a thread safe nodeMap


@misterikkit

misterikkit Jul 9, 2018

Contributor

Because the Cache type is the main entry point of this package, I think we should keep a lot of the previous type comment with it, i.e. the fact that it saves and reuses predicate results, and the explanation of the node/predicate/class hierarchy. (Those comments were moved to the NodeCache type, but I think it's okay if we repeat ourselves a bit.)

Also, this is the second significant API change I've seen that is driven by performance needs. I think we should add to the comment on this type that users are expected to precompute the equivalence.Class value and pre-lookup the NodeCache within one scheduling cycle.


@resouer

resouer Jul 10, 2018

Author Member

Good point, I will clarify these with more comments.

// ForNode returns the existing NodeCache for given node if present. Otherwise,
// it creates the NodeCache and returns it.
// The boolean flag is true if the value was loaded, false if created.
func (n *Cache) ForNode(name string) (nodeCache *NodeCache, exists bool) {


@misterikkit

misterikkit Jul 9, 2018

Contributor

It's weird for the receiver variable to be n. Could you change it to c?

}

// removeCachedPreds deletes cached predicates by given keys.
func (c *NodeCache) removeCachedPreds(predicateKeys sets.String) {


@misterikkit

misterikkit Jul 9, 2018

Contributor

Should this just be called invalidatePreds?

}

// removeCachedPreds deletes cached predicates by given keys.
func (c *NodeCache) removeCachedPreds(predicateKeys sets.String) {


@misterikkit

misterikkit Jul 9, 2018

Contributor

Probably we should move this function down to the other NodeCache receivers.

return &Cache{
cache: make(nodeMap),
// newCache returns an empty NodeCache.
func newCache() *NodeCache {


@misterikkit

misterikkit Jul 9, 2018

Contributor

Let's call this newNodeCache to avoid confusion with NewCache

@@ -97,7 +218,7 @@ type predicateResult struct {
// run and its results cached for the next call.
//
// NOTE: RunPredicate will not update the equivalence cache if the given NodeInfo is stale.
func (c *Cache) RunPredicate(
func (c *NodeCache) RunPredicate(


@misterikkit

misterikkit Jul 9, 2018

Contributor

Similar to above, receiver variable should probably change from c to n.


@resouer

resouer Jul 10, 2018

Author Member

Fixed, thanks for catching these nits.

@@ -126,16 +247,15 @@ func (c *Cache) RunPredicate(
}

// updateResult updates the cached result of a predicate.
func (c *Cache) updateResult(
// This function is thread safe for second level Cache, no need to sync with top level cache.


@misterikkit

misterikkit Jul 9, 2018

Contributor

All operations on NodeCache objects are threadsafe in the context of that NodeCache object, right? It might be worth mentioning that on the type's comment.


@resouer

resouer Jul 10, 2018

Author Member

Done!

// ForNode returns the existing NodeCache for given node if present. Otherwise,
// it creates the NodeCache and returns it.
// The boolean flag is true if the value was loaded, false if created.
func (n *Cache) ForNode(name string) (nodeCache *NodeCache, exists bool) {


@misterikkit

misterikkit Jul 9, 2018

Contributor

What should callers do differently depending on the boolean return?


@resouer

resouer Jul 10, 2018

Author Member

I would like to expose an indication of what the ForNode() call did, since it has mixed duties. This is only used by tests for now.

@resouer resouer force-pushed the resouer:fix-63784 branch from f505798 to c428bf6 Jul 10, 2018

@resouer

Member Author

resouer commented Jul 10, 2018

/test pull-kubernetes-e2e-kops-aws

@misterikkit
Contributor

misterikkit left a comment

/lgtm

(I still don't have approver powers, btw)

}

// NewCache returns an empty Cache.
// NewCache create am empty equiv class cache.


@misterikkit

misterikkit Jul 10, 2018

Contributor

s/create am/creates an/

c.nodeToCache[name] = newNodeCache()
nodeCache = c.nodeToCache[name]
}
return


@misterikkit

misterikkit Jul 10, 2018

Contributor

I'm still new to golang - what's the general style for naked returns? Should we be redundant and say return nodeCache, exists here?


@resouer

resouer Jul 11, 2018

Author Member

I believe this is one of the benefits of named return values, i.e. there is no need to change the return line in future refactors.
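
For readers new to the idiom, a small standalone example of named return values and a naked return (illustrative only, not code from this PR):

```go
package main

import "fmt"

// lookupOrCreate returns the value for key, creating a default entry if absent.
// The named results value and existed are assigned in the body, so the bare
// "return" at the end returns whatever they currently hold.
func lookupOrCreate(m map[string]int, key string) (value int, existed bool) {
	value, existed = m[key]
	if !existed {
		m[key] = 0 // lazily create a default entry
	}
	return
}

func main() {
	m := map[string]int{"a": 1}
	v, ok := lookupOrCreate(m, "b")
	fmt.Println(v, ok) // prints: 0 false
}
```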


// InvalidateCachedPredicateItemForPodAdd is a wrapper of
// InvalidateCachedPredicateItem for pod add case
// TODO: This does not belong with the equivalence cache implementatioc.


@misterikkit

misterikkit Jul 10, 2018

Contributor

s/implementatioc/implementation/

}
}
// predKeySet returns all cached predicate keys.
func (n *NodeCache) predKeySet() sets.String {


@misterikkit

misterikkit Jul 10, 2018

Contributor

Is this function used anywhere? Can we delete it?


@resouer

resouer Jul 11, 2018

Author Member

Oh, a legacy one.

mockCache := &syncingMockCache{
Cache: cache,
cycleStart: make(chan struct{}),
cacheInvalidated: make(chan struct{}),
}

eCache := equivalence.NewCache()
// ForNode will lazily create NodeCache for given test node.


@misterikkit

misterikkit Jul 10, 2018

Contributor

Is this needed as part of this test?


@resouer

resouer Jul 11, 2018

Author Member

Oh, no need.

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 10, 2018

@resouer resouer force-pushed the resouer:fix-63784 branch from c428bf6 to fe583e7 Jul 11, 2018

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 11, 2018

// it creates the NodeCache and returns it.
// The boolean flag is true if the value was loaded, false if created.
func (c *Cache) ForNode(name string) (nodeCache *NodeCache, exists bool) {
c.mu.RLock()


@ravisantoshgudimetla

ravisantoshgudimetla Jul 11, 2018

Contributor

Shouldn't we do c.mu.Lock() and c.mu.Unlock(), since we are adding a new entry in case we don't have one?
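
For context, a common way to handle this is double-checked locking: look up under the read lock, and only on a miss take the write lock and re-check before creating the entry. A sketch of that pattern, reusing the simplified Cache/NodeCache types from the description above (this shows the general idiom, not necessarily the exact code that was merged):

```go
// newNodeCache returns an empty second-level cache (constructor body
// simplified for this sketch).
func newNodeCache() *NodeCache {
	return &NodeCache{cache: make(map[string]map[uint64]bool)}
}

// ForNode returns the NodeCache for the given node, creating it lazily.
func (c *Cache) ForNode(name string) (nodeCache *NodeCache, exists bool) {
	// Fast path: most calls find an existing entry under the read lock.
	c.mu.RLock()
	nodeCache, exists = c.nodeToCache[name]
	c.mu.RUnlock()
	if exists {
		return
	}

	// Slow path: take the write lock and re-check, since another goroutine
	// may have created the entry between RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if nodeCache, exists = c.nodeToCache[name]; exists {
		return
	}
	nodeCache = newNodeCache()
	c.nodeToCache[name] = nodeCache
	return
}
```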

}

// InvalidatePredicatesOnNode clears cached results for the given predicates on one node.
func (c *Cache) InvalidatePredicatesOnNode(nodeName string, predicateKeys sets.String) {


@ravisantoshgudimetla

ravisantoshgudimetla Jul 11, 2018

Contributor

You can probably combine this function and above one.


@resouer

resouer Jul 11, 2018

Author Member

I will keep them as they are, since separate calls are needed.

@resouer resouer force-pushed the resouer:fix-63784 branch 2 times, most recently from 1cb63e9 to 153045e Jul 11, 2018

@ravisantoshgudimetla
Contributor

ravisantoshgudimetla left a comment

@resouer Thanks. The changes LGTM. I am just curious about the RWMutex scaling issue. Can you try a couple of things:

  • Schedule pods on a bigger node where GOMAXPROCS is higher (>8).
  • Schedule more pods on more nodes, e.g. 30k pods and 5k nodes.

I want to know if we are going to hit any of the scaling issues mentioned for RWMutex.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jul 19, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, misterikkit, ravisantoshgudimetla, resouer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jul 19, 2018

@resouer: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-device-plugin-gpu e5a7a4c link /test pull-kubernetes-e2e-gce-device-plugin-gpu

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

Contributor

k8s-github-robot commented Jul 19, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 795b7da into kubernetes:master Jul 19, 2018

16 of 18 checks passed

pull-kubernetes-e2e-gce-device-plugin-gpu: Job failed.
Submit Queue: Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation: resouer authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-e2e-kubeadm-gce: Skipped
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.

@resouer resouer deleted the resouer:fix-63784 branch Jul 20, 2018

@resouer

Member Author

resouer commented Jul 20, 2018

Thanks Bobby, I am keeping an eye on the test grid.

@resouer

Member Author

resouer commented Jul 25, 2018

cc @wojtek-t, you may be interested in this fix as well. The integration test is still happy as of now.

ref: #58222

@bsalamat

Contributor

bsalamat commented Jul 25, 2018

@resouer I checked out master today and made sure that your changes are in, but when I run scheduler benchmarks for 5000 nodes, scheduling 1000 pods, I still see a performance degradation when I enable the equivalence cache: 20ms/pod when ecache is enabled vs. 18ms/pod when ecache is disabled. I enable/disable eCache here. Am I missing anything in running the test?

eCache disabled:

pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	    1000	  18598875 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	41.281s
+++ [0725 11:41:51] Cleaning up etcd
+++ [0725 11:41:51] Integration test cleanup complete

eCache enabled:

pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/1000Pods-12         	    1000	  20127793 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	43.632s
@resouer

Member Author

resouer commented Jul 26, 2018

The test method is right; I just tested with HEAD@d2fc875489 and it seems to work well:

{nodes: 5000, existingPods: 0, minPods: 10000},

make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=.*BenchmarkScheduling.* -cpuprofile cpu-5000n-10000p.out"

ecache=false: 
BenchmarkScheduling/5000Nodes/0Pods-64             10000          50404244 ns/op
BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-64        	     250	  18695074 ns/op

ecache=true:
BenchmarkScheduling/5000Nodes/0Pods-64             10000          15933557 ns/op
BenchmarkSchedulingAntiAffinity/500Nodes/5000Pods-64                 250           7225648 ns/op

It's weird. I'm looking into it.

@bsalamat

Contributor

bsalamat commented Jul 26, 2018

Hmmm. I don't know what is going on. I checked out the head again and used your exact configuration: {nodes: 5000, existingPods: 0, minPods: 10000}. Still, the results are in favor of disabled ecache.

Just to make sure we are using similar config, I use the following command line to run the benchmarks:

make test-integration WHAT=./test/integration/scheduler_perf GOFLAGS="-v=1" KUBE_TEST_VMODULE="''" KUBE_TEST_ARGS="-run=xxx -bench BenchmarkScheduling/5000Nodes/0Pods"

Cache enabled:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/0Pods-12         	   10000	  10227194 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	116.835s

cache disabled:

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/0Pods-12         	   10000	   8873203 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	103.355s
@resouer

Member Author

resouer commented Jul 27, 2018

Emm, when I rebased to HEAD@e4465b6e2f (which should be the latest commit as of yesterday), I can see your problem happening:

ecache=false: 
BenchmarkScheduling/5000Nodes/0Pods-64             10000          16414234 ns/op

ecache=true:
BenchmarkScheduling/5000Nodes/0Pods-64             10000          15934557 ns/op

It seems some recent commits somehow "improved" the default scheduler (or at least the bench testing), which is interesting. I am taking a look at this.

[Update] After bisecting the commits, it seems related to 0bf7427 of #66061:

// before  0bf7427
ecache=false: 
BenchmarkScheduling/5000Nodes/0Pods-64             10000          57223416 ns/op

// after (with)  0bf7427
ecache=false: 
BenchmarkScheduling/5000Nodes/0Pods-64             10000          16414234 ns/op
@resouer

Member Author

resouer commented Jul 29, 2018

@bsalamat, a quick follow-up: with the test command below, everything works as expected in my env:

export KUBE_TEST_ARGS="-run=xxx -bench=BenchmarkScheduling -cpuprofile cpu-5000n-10000p.out"
make test-integration WHAT=./test/integration/scheduler_perf

But with a test command like this, the benchmark test finishes much earlier (which never happened before #66061):

make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_ARGS="-run=xxx -bench=BenchmarkScheduling -cpuprofile cpu-5000n-10000p.out"

The difference seems to be related to env var passing across multiple .sh files.

[Update]: I've reported the issue and this inconsistency will be solved by #66816


@resouer

Member Author

resouer commented Aug 1, 2018

@misterikkit @bsalamat Just filed #66862 and #66816 to partially fix this weird case.

The root cause I recently observed is that the test results differ hugely depending on whether KUBE_TEST_VMODULE="''" is set, i.e. with -vmodule='' the benchmark test finishes much earlier than expected.

@bsalamat

Contributor

bsalamat commented Aug 2, 2018

Thanks, @resouer! As you have noticed, vmodule skews the results significantly. It must be disabled for performance benchmarking, otherwise vmodule takes most of the time. "vmodule" is disabled in our production code, but it is enabled by default when running integration tests, including our benchmarks.

I used the following command to run the benchmarks while vmodule is disabled:

KUBE_TEST_ARGS="-run=xxx -bench=BenchmarkScheduling/5000Nodes/0Pods" GOFLAGS="-v=1" KUBE_TEST_VMODULE="''" make test-integration WHAT=./test/integration/scheduler_perf

I still see a slight performance decrease when ecache is enabled. We should think about what kinds of changes we should make to improve eCache's performance. CPU profiling may reveal areas of potential improvement.

Cache disabled

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/0Pods-12         	   10000	   9401233 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	108.352s

Cache enabled:

pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkScheduling/5000Nodes/0Pods-12         	   10000	  10969937 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	124.011s
@bsalamat

Contributor

bsalamat commented Aug 2, 2018

I have some good news. The above results are obtained with the regular Scheduler benchmark. That benchmark tries scheduling a number of simple pods; the only scheduling requirement of those pods is the amount of resources available on nodes. Checking the amount of resources is a simple mathematical operation, which is why we don't see a performance improvement when we enable eCache. I ran the AntiAffinity benchmarks with and without eCache. eCache improves performance by over 16x!

These results indicate that eCache is beneficial only for more sophisticated predicates which are computationally expensive. Examples are inter-pod affinity and anti-affinity predicates. What I think we should do is to find a list of predicates which are more computationally expensive and use eCache only for those, instead of using the eCache for all the predicates.

Cache disabled

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkSchedulingAntiAffinity/5000Nodes/0Pods-12         	    2000	 287683998 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	589.456s

Cache enabled

goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
BenchmarkSchedulingAntiAffinity/5000Nodes/0Pods-12         	    2000	  17425632 ns/op
PASS
ok  	k8s.io/kubernetes/test/integration/scheduler_perf	48.989s
@resouer

Member Author

resouer commented Aug 3, 2018

AntiAffinity benchmarks with and without eCache. eCache improves performance by over 16x!

@bsalamat, Yes, I am aware that the current benchmark test we are playing is too "simple" :D

Another thing I'd like to add: in the current logic, GenericPredicates are frequently invalidated as a whole by the new-Pod-added event. This should explain why the test results are close.

We should think about what kinds of changes we should make to improve eCache's performance

Actually, after #66862 is in, the only remaining overhead of the ecache is the map accesses in lookupResult(). Unfortunately, according to CPU profiling, map access in Go is far from free, and, just as you mentioned, it is slower than a simple mathematical operation.

What I think we should do is to find a list of predicates which are more computationally expensive and use eCache only for those, instead of using the eCache for all the predicates.

This is exactly what's on my mind. Thanks! I will continue in this direction, though instead of bypassing specific predicates, I would prefer that we measure them in the future.

I was also hoping to invalidate PodFitsResources separately, or skip it in the ecache entirely; that would be a noticeable improvement for the ecache hit rate.
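
To make the map-access-versus-arithmetic comparison concrete, here is a tiny hypothetical micro-benchmark (not part of this PR) contrasting a cached-result lookup with the kind of comparison a cheap predicate like PodFitsResources performs:

```go
// A hypothetical micro-benchmark; place it in any package as a _test.go file
// and run `go test -bench=.`.
package equivalence

import "testing"

var (
	cachedResults    = map[uint64]bool{42: true}
	reqCPU, allocCPU = int64(500), int64(1000)
	reqMem, allocMem = int64(256), int64(512)
	sink             bool
)

// BenchmarkMapLookup approximates a cached-result lookup keyed by equivalence hash.
func BenchmarkMapLookup(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = cachedResults[42]
	}
}

// BenchmarkArithmetic approximates the resource comparison a simple predicate does.
func BenchmarkArithmetic(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = reqCPU <= allocCPU && reqMem <= allocMem
	}
}
```

The point is that if the cached lookup is not clearly cheaper than recomputing the predicate, caching that predicate mostly adds overhead.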

k8s-github-robot pushed a commit that referenced this pull request Aug 22, 2018

Kubernetes Submit Queue
Merge pull request #67681 from misterikkit/reviewer
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add misterikkit to sig-scheduling REVIEWERS.

I have met the following criteria.
- member for at least 3 months
- primary reviewer for at least 5 PRs
  - #63603
  - #63665 (and related PRs)
  - #63839
  - #65714
  - #66862
- reviewed or merged at least 20 PRs
  reviewed 13: https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+archived%3Afalse+is%3Amerged+repo%3Akubernetes%2Fkubernetes+commenter%3Amisterikkit+in%3Acomment+assignee%3Amisterikkit+
  merged 22: https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Amisterikkit+archived%3Afalse+is%3Amerged+repo%3Akubernetes%2Fkubernetes+


**Release note**:
```release-note
NONE
```
/cc @bsalamat