Use runtime.NumCPU() instead of a fixed value for parallel scheduler threads #73934

Open · wants to merge 1 commit into base: master
Conversation

@bsalamat (Contributor) commented Feb 11, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
Uses the same number of parallel threads as the number of available cores. This should give us a slight performance benefit in larger clusters which use large machines with many cores for their master nodes.
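Roughly, the change replaces a fixed worker count with runtime.NumCPU() when fanning work out over the nodes. The snippet below is only an illustrative sketch of the idea, not the scheduler's real code: parallelize and evaluateNode are hypothetical names, and the fixed value of 16 is the current default mentioned later in this thread.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelize runs work(i) for every i in [0, pieces) using `workers` goroutines.
func parallelize(workers, pieces int, work func(i int)) {
	indices := make(chan int, pieces)
	for i := 0; i < pieces; i++ {
		indices <- i
	}
	close(indices)

	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for i := range indices {
				work(i)
			}
		}()
	}
	wg.Wait()
}

func main() {
	nodes := make([]string, 1000) // stand-in for the cluster's node list
	evaluateNode := func(i int) { _ = nodes[i] } // predicate checks would run here

	// Before: a hard-coded degree of parallelism (16).
	parallelize(16, len(nodes), evaluateNode)

	// After this PR: scale the worker count with the master's available cores.
	parallelize(runtime.NumCPU(), len(nodes), evaluateNode)

	fmt.Printf("evaluated %d nodes with up to %d workers\n", len(nodes), runtime.NumCPU())
}
```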

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

/sig scheduling
/priority important-longterm

Does this PR introduce a user-facing change?:

NONE
@k8s-ci-robot (Contributor) commented Feb 11, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented Feb 11, 2019

@bsalamat: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-bazel-test | d0ebeef | link | /test pull-kubernetes-bazel-test |
| pull-kubernetes-integration | d0ebeef | link | /test pull-kubernetes-integration |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@bsalamat (Contributor, Author) commented Feb 12, 2019

There are some data races which are not caused by this change. I am trying to address them in #73943.

@wgliang (Member) commented Feb 12, 2019

Emm... I don't really agree with this; I have previously discussed the related issue in #59460. We will see a performance loss on machines with fewer CPUs if the default uses runtime.NumCPU().

> This should give us a slight performance benefit in larger clusters which use large machines with many cores for their master nodes.

But it also brings uncertainty.

If we want to achieve this effect, I would prefer the scheduler to be able to adjust itself: for example, 16 by default, and if runtime.NumCPU() > 16, use runtime.NumCPU().
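A minimal sketch of that self-adjusting default, assuming 16 stays as the floor (the function name defaultParallelism is hypothetical, not from the PR):

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultParallelism keeps 16 as a floor so small masters keep today's
// behavior, and only scales up when more cores are available.
func defaultParallelism() int {
	const floor = 16
	if n := runtime.NumCPU(); n > floor {
		return n
	}
	return floor
}

func main() {
	fmt.Println("parallel scheduler threads:", defaultParallelism())
}
```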

@wgliang (Member) commented Feb 12, 2019

In addition, I once found that setting the value to runtime.NumCPU()*2+4 in a production environment gave better performance than runtime.NumCPU().

@bsalamat (Contributor, Author) commented Feb 12, 2019

@wgliang My own past experience, plus a good deal of other research, shows that it is hard to find a single formula for the number of parallel threads based on the number of available CPU cores. The best number of threads depends on the workload (I/O heavy, memory heavy, lock contention, etc.) and also on the number of cores, the amount of cache, cache pollution, and memory bandwidth, to name a few.
So, I don't think that NumCPU() * 2 + 4 is necessarily the best formula. I am pretty sure it won't work well for machines that have a large number of cores (say, a 96-core machine).
I think the best is to use the number of cores plus a small fixed number. WDYT?
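For comparison, here is a rough sketch of the two formulas under discussion; the headroom constant of 2 below is an arbitrary placeholder for the "small fixed number", not a value anyone proposed:

```go
package main

import (
	"fmt"
	"runtime"
)

// coresTimesTwoPlusFour is the formula wgliang found useful in one production
// scenario; it grows quickly on many-core machines.
func coresTimesTwoPlusFour() int { return runtime.NumCPU()*2 + 4 }

// coresPlusHeadroom is the alternative floated above: the core count plus a
// small fixed number (2 is a hypothetical placeholder).
func coresPlusHeadroom() int { return runtime.NumCPU() + 2 }

func main() {
	fmt.Println("NumCPU()*2+4:", coresTimesTwoPlusFour())
	fmt.Println("NumCPU()+2:  ", coresPlusHeadroom())
}
```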

@wgliang (Member) commented Feb 13, 2019

> So, I don't think that NumCPU() * 2 + 4 is necessarily the best formula.

Yes, NumCPU() * 2 + 4 is just the conclusion I reached in one particular scenario; it is not representative or universal. My point is that NumCPU() is not necessarily the best default value.

> I think the best is to use the number of cores plus a small fixed number.

Looks like a viable option. And will we consider exposing it to the user as an optional parameter?

@wgliang (Member) commented Feb 13, 2019

In addition, I think we'd better have the performance data comparison before and after the change as the basis for our changes. :)

@bsalamat (Contributor, Author) commented Feb 14, 2019

> And will we consider exposing it to the user as an optional parameter?

No, not really. If only 0.1% of the users understand what a config option does and even those 0.1% prefer not to change the option, then the config option shouldn't exist. That's my opinion, of course. I feel this would have been such a config option and as a result shouldn't exist. Instead, we should use a value that works best in almost all cases.

> In addition, I think we'd better have the performance data comparison before and after the change as the basis for our changes.

I totally agree. The only problem is that my own dev machine has 12 cores. The effect of this change should be much more visible on machines with more than 16 cores, ideally more than 24 cores, to make a bigger difference. If I find such a machine, I will try to run the benchmarks and send the results, but at this point I am not optimistic that I will find one soon.
