Start autoscaler in panic mode. #4795

Merged
merged 6 commits into from Jul 19, 2019

Conversation

vagababov
Contributor

When the autoscaler restarts, it has no memory of the previous stats and immediately scales the deployment down. Under my load test conditions,
the autoscaler goes from 7 pods to 1, then to 2, then to 6, before finally reaching 7 after a few cycles.
This is obviously not desirable, so this change fixes that behavior by
starting the autoscaler in panic mode, with the panic pod count equal to the current pod count,
if there is currently more than 1 serving pod.

Obviously, if the deployment is scaled to 0, there's no reason to change the logic,
and if it is at 1, we won't scale below 1 anyway for the next stable window, so there's no need to panic either.
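
A minimal sketch of the startup rule described above, not the code from this PR. The type `scalerState`, its fields, and the functions `newScalerState` and `desiredPods` are hypothetical names chosen for illustration; the real autoscaler tracks far more state and also decides when to leave panic mode.

```go
package main

import (
	"fmt"
	"time"
)

// scalerState is a hypothetical, simplified stand-in for the autoscaler's
// in-memory state. It is an assumption for illustration only.
type scalerState struct {
	panicTime    *time.Time // nil means "not in panic mode"
	maxPanicPods int32      // floor for the desired pod count while panicking
}

// newScalerState seeds the state from the number of pods currently serving.
// With more than one pod it starts in panic mode pinned at that count, so a
// freshly restarted autoscaler with no stats cannot scale down right away.
// With 0 or 1 pods it leaves the state untouched.
func newScalerState(currentPods int32) *scalerState {
	s := &scalerState{}
	if currentPods > 1 {
		now := time.Now()
		s.panicTime = &now
		s.maxPanicPods = currentPods
	}
	return s
}

// desiredPods applies the panic-mode floor to a freshly computed target.
func (s *scalerState) desiredPods(computed int32) int32 {
	if s.panicTime != nil && computed < s.maxPanicPods {
		return s.maxPanicPods
	}
	return computed
}

func main() {
	// Restart with 7 serving pods and no stats: the computed target is 1,
	// but the panic floor keeps the deployment at 7.
	s := newScalerState(7)
	fmt.Println(s.desiredPods(1)) // prints 7
}
```

Scaling up is unaffected in this sketch: a computed target above the floor is returned as-is.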

/assign @mattmoor @markusthoemmes

This is the GA scope of #2930

/lint

This starts the autoscaler in panic mode and does not scale down the deployment after an autoscaler restart.

@knative-prow-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Jul 18, 2019
@googlebot added the cla: yes label (Indicates the PR's author has signed the CLA.) on Jul 18, 2019
@knative-prow-robot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and approved (Indicates a PR has been approved by an approver from all required OWNERS files.) labels on Jul 18, 2019
Contributor

@knative-prow-robot knative-prow-robot left a comment

@vagababov: 0 warnings.

Contributor

@markusthoemmes markusthoemmes left a comment

Simple but very effective. Love it ❤️

@@ -24,6 +24,7 @@ import (
"sync"
"time"

"github.com/wacul/ptr"
Member

🤦‍♂ Can we just add this to knative/pkg rather than juggling packages for this?

Contributor

I don't think we need it at all. A local variable will do as well.

Member

That would work too :)

Contributor Author

https://github.com/knative/pkg/pull/519/files
Well, as soon as matt does this!

Member

should be done
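
The two options weighed in this thread can be sketched as follows, assuming the import was only needed to obtain a pointer to a value. The helper `int32Ptr` is a hypothetical stand-in for what a pointer-utility package offers; it is not the API of `github.com/wacul/ptr` or `knative/pkg`.

```go
package main

import "fmt"

// int32Ptr is a hypothetical helper of the kind a pointer-utility package
// provides: it returns a pointer to a fresh copy of its argument.
func int32Ptr(v int32) *int32 {
	return &v
}

func main() {
	// Option 1: call a helper, which means importing (or vendoring) a package.
	withHelper := int32Ptr(7)

	// Option 2: take the address of a local variable; no extra import needed.
	pods := int32(7)
	withLocal := &pods

	fmt.Println(*withHelper == *withLocal) // prints true
}
```

The helper is mainly a convenience for literals and function results, which cannot be addressed directly in Go; when the value already lives in a variable, `&pods` is enough.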

@knative-prow-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files.) on Jul 18, 2019
@vagababov marked this pull request as ready for review on July 18, 2019 22:44
@knative-prow-robot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Jul 18, 2019
@knative-prow-robot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) and removed the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels on Jul 19, 2019
@knative-prow-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files.) on Jul 19, 2019
@vagababov
Contributor Author

This is ready

Member

@mattmoor mattmoor left a comment

/lgtm
/approve

@knative-prow-robot added the lgtm label (Indicates that a PR is ready to be merged.) on Jul 19, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jul 19, 2019
@vagababov
Contributor Author

/test pull-knative-serving-integration-tests

hmmm.. 🚀

@knative-prow-robot merged commit b52b020 into knative:master on Jul 19, 2019
@vagababov deleted the 2930-panic branch on September 20, 2019 18:22
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
area/autoscale
cla: yes (Indicates the PR's author has signed the CLA.)
lgtm (Indicates that a PR is ready to be merged.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

5 participants