Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scale up/down delay #2598

Merged

Conversation

44past4
Copy link
Contributor

@44past4 44past4 commented Nov 28, 2019

Fix for #2597 for 1.8

@bskiba

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2019
@k8s-ci-robot
Copy link
Contributor

Welcome @44past4!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 28, 2019
@44past4
Copy link
Contributor Author

44past4 commented Nov 28, 2019

/assign @wojtek-t

@@ -98,6 +98,8 @@ func calculateResources(numNodes uint64, resources []Resource) *corev1.ResourceR
overhead := computeResourceOverheadValueString(numNodes, r)
newRes := r.Base
newRes.Add(resource.MustParse(overhead))
// Force initialization of newRes.s so that it could be logged
newRes.String()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seriously? You really need?
How this is logged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this call this value can be logged as:
{i:{value:0 scale:0} d:{Dec:0xc4202b5d70} s: Format:DecimalSI}
WIth this call this becomes:
{i:{value:0 scale:0} d:{Dec:0xc4202b5d70} s:455m Format:DecimalSI}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you print it? If you would log it with:
klog.("%s", <your var>)
if would work fine (as this is calling String() method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it seems like a better way to print it - things like d:{Dec:0xc4202b5d70} aren't particularly useful :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is not printed directly but instead it is nested in k8s.io/api/core/v1/ResourceRequirements which is printed with %+v.
I will change it in a way that I will serialize ResourceRequirements to json before displaying it.

log.V(3).Infof("Resources are not within the expected limits, Actual: %+v, accepted range: %+v. Skipping resource update because of scale up/down delay", *resources, *expResources)
continue
}

log.Infof("Resources are not within the expected limits, updating the deployment. Actual: %+v Expected: %+v", *resources, *expResources)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you make this log consistent with the one above? Also could you add a log verbosity level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log message has been updated. I have not added log verbosity because I believe that this message will be logged only when actual resize will be performed and therefore it is ok to log this message without a verbosity level being set.

scaleDown operation = iota
scaleUp operation = iota
)

// checkResource determines whether a specific resource needs to be over-written.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@mwielgus mwielgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as on #2599

Please provide more information on the E2E tests performed.

log.V(4).Infof("Resources are within the expected limits. Actual: %+v Expected: %+v", *resources, *expResources)
continue
}

now := time.Now()
if (op == scaleDown && now.Before(lastChange.Add(scaleDownDelay))) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to the specific line in test where this thing is properly unit-tested. I don't see a line where we check what happens if last change is slightly before or slightly after scale****Delay. But maybe I'm missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have extracted logic for checking resources for deployment and updating them to a separate method and added unit tests for this.
When it comes E2E tests I have build an image with addon-resizer, deployed it to a Kubernetes cluster first without any delay being set and then with scale up and scale down being set and I have scaled cluster from 3 to 10 nodes and then back to 3 nodes.

@44past4 44past4 force-pushed the cooldown-1.8 branch 3 times, most recently from 5077ba6 to 893093b Compare December 17, 2019 07:18
override, op = checkResource(threshold, reqs, expReqs, resourceType)
if override {
return true, op
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silently assumes that we don't go up in some resource and down on the other resource.

This is probably fine, but it requires an explicit comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment about the operation type to the function description.

if (op == scaleDown && now.Before(lastChange.Add(scaleDownDelay))) ||
(op == scaleUp && now.Before(lastChange.Add(scaleUpDelay))) {
if log.V(3) {
log.V(3).Infof("Resources are not within the expected limits, Actual: %+v, accepted range: %+v. Skipping resource update because of scale up/down delay", jsonOrValue(*resources), jsonOrValue(*expResources))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this should be logged unconditionally - this is important log and lack of it may introduce a lot of misunderstanding - please change to simply log.Infof

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging this unconditionally could spam logs with hundreds of the same messages were only the first one is really useful for the user. Therefore I have decided to change the implementation in a way that I will log this message unconditionally but only for the first time we when we are skipping the the override.

@44past4 44past4 force-pushed the cooldown-1.8 branch 2 times, most recently from 7701275 to a007438 Compare December 18, 2019 08:23
// updateResources counts the number of nodes, estimates the expected
// ResourceRequirements, compares them to the actual ResourceRequirements, and
// updates the deployment with the expected ResourceRequirements if necessary.
// It returns true if deployment has been updated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the information about the returned values.

log.V(4).Infof("Resources are within the expected limits. Actual: %+v Expected: %+v", *resources, *expResources)
continue
// If there's a difference, go ahead and set the new values.
overwrite, op := shouldOverwriteResources(int64(threshold), resources.Limits, resources.Requests, expResources.Limits, expResources.Requests)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistency - you're using overwrite in some places and override in other
Can you please unify it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed all occurrences of override to overwrite.

@44past4 44past4 force-pushed the cooldown-1.8 branch 2 times, most recently from 251e7ca to 5c35e71 Compare December 18, 2019 09:10
@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2019
@k8s-ci-robot k8s-ci-robot merged commit 0aa7321 into kubernetes:addon-resizer-release-1.8 Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants