Modified proposal for configurable HPA to use a single field to specify allowed changes #1234
Conversation
Welcome @arjunrn!

Hi @arjunrn. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. Once the patch is verified, the new status will be reflected accordingly. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @mwielgus |
/assign @josephburnett |
> previous recommendations to prevent flapping of the number of replicas.
> - `scaleUp` specifies the rules which are used to control scaling behavior while scaling up
>   - `periodSecond` the amount of time in seconds for which the rule should hold true.
>   - `maxAllowedChanged` the maximum allowed changed in replicas in the given period. Can be an absolute
I find it confusing to use a single field for both absolute and percentage values. How will you tell the difference? Is 0.5 a percentage or absolute value?
What about something more along the lines of @thockin's suggestion of having "Pods" and "Percent" policies?
```yaml
constraints:
  scaleUp: # allow the max of 3 pods or 100% to be added every 15 seconds
  - policy: Pods
    value: 3
    periodSeconds: 15
  - policy: Percent
    value: 100
    periodSeconds: 15
  scaleDown: # allow the max of 2 pods or 20% to be removed every 15 seconds
  - policy: Pods
    value: 2
    periodSeconds: 15
  - policy: Percent
    value: 20
    periodSeconds: 15
  stabilizationWindowSeconds: 120 # downscale stabilization window of 2 minutes
```

Note: I changed the scaleDown field to be a list of policies so you can specify both Pods and Percent. This would be useful for:
- getting the autoscaler off the ground (i.e. allowing increase of 1->4, but still limiting to 200%)
- allowing for future policy (I don't know what).
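To make the semantics of the proposed policy list concrete, here is a minimal sketch (not the actual HPA controller code; function and field names are illustrative) of evaluating Pods/Percent policies and taking the most permissive result:

```python
import math

def allowed_change(current_replicas, policies):
    """Evaluate a list of scaling policies and return the largest
    change in replicas that any single policy would permit."""
    best = 0
    for policy in policies:
        if policy["policy"] == "Pods":
            change = policy["value"]  # absolute number of pods
        elif policy["policy"] == "Percent":
            # percentage of the current replica count, rounded up
            change = math.ceil(current_replicas * policy["value"] / 100)
        else:
            raise ValueError(f"unknown policy {policy['policy']}")
        best = max(best, change)
    return best

scale_up = [
    {"policy": "Pods", "value": 3, "periodSeconds": 15},
    {"policy": "Percent", "value": 100, "periodSeconds": 15},
]
# With 1 replica the Pods policy wins (1 -> up to 4);
# with 10 replicas the Percent policy wins (10 -> up to 20).
```

This is exactly the "getting off the ground" property described above: the absolute Pods policy dominates at small replica counts, and the Percent policy takes over as the workload grows.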
I thought about this, and the reason I went with a single value is that there is precedent for this, as suggested by @liggitt here. The deployment controller also currently uses the same type of value to specify an increase in the number of pods. Additionally, there are already many hidden levers in the HPA controller which make it harder for end-users to predict or analyze HPA behavior. Having a single value reduces the thinking you need to do compared to having multiple policies.
Having multiple policies would make the HPA more flexible, but at the moment I don't see the real need for it other than the use case of going from 1 -> 4 pods. Nor do I know of any type of workload which would need behavior like this. Also, since we are going to split the code paths, I don't think there is a need to preserve the old behavior, especially since it's not documented. As for a value like "0.5", we could look into how it is treated in the deployment controller.
That said if the maintainers think that multiple policies is the way to go then I will modify the proposal.
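As a side note on the Deployment precedent mentioned above: fields like `maxSurge` accept either an absolute integer or a percent string. A minimal sketch (not Kubernetes code; names are illustrative) of how such a single field is typically interpreted:

```python
import math

def resolve_value(value, current_replicas):
    """Interpret a maxSurge-style value: either an absolute integer
    or a percentage string such as "50%", resolved against the
    current replica count and rounded up (as Deployments do for maxSurge)."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return math.ceil(current_replicas * percent / 100)
    return int(value)
```

Under this convention the percent/absolute ambiguity is resolved by type: `0.5` would simply be invalid, while `"50%"` and `3` are unambiguous.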
There are examples of policy lists in the autoscaling space, such as MetricSource.
Given that this will live in the same resource (HPA) as MetricTarget, I would prefer to follow that pattern over the Deployment pattern. I see your point about a single value being easier to understand, but I would prefer flexibility in this case.
> I don't see the real need for it
A hypothetical example: what if there were a new policy which limits the core-hours used per day? A sort of "cost" constraint.
```yaml
constraints:
  scaleUp:
  - policy: Cost
    cpuCoresHours: 240 # 10 cores per day on average
    periodSeconds: 86400 # budgeted over a 24 hour window
```

We don't have to design it now. But I would like to avoid an autoscaling v2beta3 by leaving space for it. 🤞
Force-pushed from aee4b6c to a5d553c
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
Force-pushed from a5d553c to b2a7ab6
As discussed with @josephburnett I have updated the KEP so that multiple policies can be specified for scaling behavior. By default the highest change is chosen, but this can be changed by specifying a value for
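For illustration only, the updated scheme might look roughly like the following YAML (the name of the field that selects between policies is truncated in the comment above, so the selector shown here is a placeholder):

```yaml
behavior:
  scaleUp:
    policies:
    - policy: Pods
      value: 4
      periodSeconds: 60
    - policy: Percent
      value: 100
      periodSeconds: 60
    # By default the policy permitting the highest change wins; an
    # additional selector field (placeholder name, not from the KEP text)
    # could switch the evaluation to the lowest change instead.
```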
josephburnett left a comment:

We should also add some graduation criteria.
keps/sig-autoscaling/20190307-configurable-scale-velocity-for-hpa.md
Additionally, fixed some typos and added one more use case for why the stabilizationWindowSeconds in the scaleUp section might be useful
Put stabilizationWindowSeconds in the scaleUp/scaleDown section
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 300
```
Well, originally, I named it "delay".
Though, it might be useful to show that we "stabilize recommendations". So it is not "just delay", it is "delay, and then react given previous recommendations".
So, "stabilization" does make sense.
I'm ok with any name, to be honest. :)
`delay` kind of implies it's a one-time thing. `stabilizationWindow` would be more appropriate considering it's a moving window in which previous recommendations are considered to stabilize the scaling. I will explain this when I update the documentation for this feature.
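A rough sketch of what "a moving window of previous recommendations" means for scale down (illustrative only, not the controller's exact implementation): keep recent recommendations and act on the highest one still inside the window, so replicas only drop once every recommendation in the window agrees they should.

```python
def stabilized_recommendation(history, now, window_seconds):
    """history: list of (timestamp, recommended_replicas) tuples.
    Return the maximum recommendation inside the moving window ending
    at `now`, which prevents flapping during scale down."""
    recent = [r for (ts, r) in history if now - ts <= window_seconds]
    return max(recent)

history = [(0, 10), (60, 4), (90, 6), (110, 3)]
# At t=120 with a 120s window every entry still counts, so we hold at 10.
# At t=200 the t=0 and t=60 entries have aged out and the answer drops to 6.
```

This is why "stabilization window" reads better than "delay": nothing is postponed once; the same filtering is applied continuously as the window slides forward.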
keps/sig-autoscaling/20190307-configurable-scale-velocity-for-hpa.md
> }
>
> type HPAScalingRules struct {
>     StabilizationWindowSeconds *int32
I think we were going to add Tolerance, even if it's just in the KEP and not implemented yet.
I mention tolerance as one of the non-goals.
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
/lgtm

/assign @mwielgus
> creation-date: 2019-03-07
> -last-updated: 2019-03-07
> +last-updated: 2019-09-16
> status: provisional
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
/lgtm

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, josephburnett, mwielgus

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
I have updated the proposal with changes to make the configurable scaling behavior more intuitive and in line with suggestions received when this proposal was initially made in #853.

The changes I have made are:

- The `delaySeconds` field has been renamed to `stabilizationWindowSeconds`. This option is meant to prevent flapping of the replicas, and as suggested in the original PR it is a stabilization feature, so the name now indicates that. I also believe there is no reason anyone would want to stabilize while scaling up, and the old name was non-intuitive because it looked similar to the `periodSeconds` used to specify the scaling rules. So now there is only one place where the stabilization window can be specified, and it is at the same level as the scale-up and scale-down behavior.
- Renamed the `constraints` field to `behavior`, because this configuration specifies how scaling behaves, and it could also be used in the future to specify other aspects of scaling, like the tolerances.
- The `pod` and `percent` changes can be specified as a list of policies, and the user can specify which policy will be selected based on evaluation at runtime.

Note: There is one downside to the new scheme. It cannot replicate the current scale-up behavior, because the current behavior adds 4 pods if there are fewer than 4 pods and 100% of the pods after that. This behavior is currently unspecified in the documentation and probably nobody relies on it. Also, this behavior will only change if the user explicitly uses the new `behavior` field.