Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the default $BMO_CONCURRENCY for scale #906

Merged
merged 2 commits into from
Jun 8, 2021

Conversation

zaneb
Copy link
Member

@zaneb zaneb commented Jun 1, 2021

Instead of defaulting the $BMO_CONCURRENCY value (the maximum number of
Hosts to reconcile concurrently) to a hard-coded value of 3, set it
instead to the number of CPU threads available, but constrained to a
range between 2 and 8.

We never default to 1 so that we don't inadvertantly rely on
single-threadedness in tests. 8 seems to be a reasonable value for a
large scale deployment, while still being substantially below the
default $PROVISIONING_LIMIT of 20.

There remains no restriction on the value that can be passed in the
environment variable.

Fixes #905

@zaneb zaneb requested a review from andfasano June 1, 2021 16:48
@metal3-io-bot metal3-io-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 1, 2021
@zaneb
Copy link
Member Author

zaneb commented Jun 1, 2021

/test-integration

Copy link

@flaper87 flaper87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Thanks for the PR

Copy link

@flaper87 flaper87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@metal3-io-bot
Copy link
Contributor

@flaper87: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dtantsur
Copy link
Member

dtantsur commented Jun 2, 2021

/approve

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtantsur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2021
@@ -1312,7 +1313,13 @@ func (r *BareMetalHostReconciler) updateEventHandler(e event.UpdateEvent) bool {
// SetupWithManager reigsters the reconciler to be run by the manager
func (r *BareMetalHostReconciler) SetupWithManager(mgr ctrl.Manager) error {

maxConcurrentReconciles := 3
maxConcurrentReconciles := runtime.NumCPU()
if maxConcurrentReconciles > 8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using an hard-coded value, I was thinking that it could be useful to set as the upper limit something like min(PROVISIONIG_LIMIT/2.5, 1), just to enforce the ratio between the two values in case the user will set a particularly low value for PROVISIONING_LIMIT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to keep this configurable as an option (via env)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a plan to keep this configurable as an option (via env)?

It is still configurable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using an hard-coded value, I was thinking that it could be useful to set as the upper limit something like min(PROVISIONIG_LIMIT/2.5, 1), just to enforce the ratio between the two values in case the user will set a particularly low value for PROVISIONING_LIMIT.

I can't imagine anyone setting a smaller limit except on very small hardware, in which case the CPU count should scale the default appropriately for us. I can imagine people setting a bigger limit, but I don't think we want to increase the default concurrency in that case. I don't think we should tie these two settings together because ultimately they limit different things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about tying them together, it's just about the currently proposed value for the default upper limit of maxConcurrentReconciles - and I think there was an error in my previous comment, as I meant something like:

maxConcurrentReconciles = min(PROVISIONING_LIMIT/2.5, runtime.NumCPU())

which for a default PROVISIONING_LIMIT value of 20 will limit maxConcurrentReconciles to 8 on a system with more than 8 cores.

Given that the current proposal is to define the default value based on the number of available CPU, the basic idea is to try using all the available cpus as much possible - even though the benefits could be marginal (in general I'd expect that BMO will run within a container, which could have already resources and/or limits set), and at the same time trying to honor the fact that maxConcurrentReconciles << PROVISIONING_LIMIT

Copy link
Member

@furkatgofurov7 furkatgofurov7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, the only thing is we have to update the documentation whether we hard-code or set the upper limit to $BMO_CONCURRENCY accordingly.

zaneb added 2 commits June 7, 2021 09:40
Instead of defaulting the $BMO_CONCURRENCY value (the maximum number of
Hosts to reconcile concurrently) to a hard-coded value of 3, set it
instead to the number of CPU threads available, but constrained to a
range between 2 and 8.

We never default to 1 so that we don't inadvertantly rely on
single-threadedness in tests. 8 seems to be a reasonable value for a
large scale deployment, while still being substantially below the
default $PROVISIONING_LIMIT of 20.

There remains no restriction on the value that can be passed in the
environment variable.

Fixes metal3-io#905
We never call the provisioner in this state, so speed it up by never
checking the current registration status.
@zaneb
Copy link
Member Author

zaneb commented Jun 7, 2021

/test-integration

@andfasano
Copy link
Member

The current changes could be a good improvement in respect to the previous situation, and looks like there could be some room for improvements in the future

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2021
@metal3-io-bot metal3-io-bot merged commit f5228fa into metal3-io:master Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise concurrency for scale
7 participants