
Modified proposal for configurable HPA to use a single field to specify allowed changes #1234

Merged
k8s-ci-robot merged 7 commits into kubernetes:master from arjunrn:update-configurable-hpa on Oct 14, 2019

Conversation

@arjunrn
Contributor

@arjunrn arjunrn commented Sep 8, 2019

I have updated the proposal to make the configurable scaling behavior more intuitive and in line with the suggestions received when this proposal was initially made in #853.
The changes I have made are:

  1. Removed the delaySeconds option for scale down and scale up. This option is meant to prevent flapping of the replicas, but as suggested in the original PR it is a stabilization feature, so I have changed the name to reflect this. I also believe there is no reason anyone would want to stabilize while scaling up. The old name was also non-intuitive because it looked similar to the periodSeconds field used to specify the scaling rules. Now there is only one place where the stabilization window can be specified, and it sits at the same level as the scale up and scale down behavior. The delaySeconds field has been renamed to stabilizationWindowSeconds.
  2. Renamed the constraints field to behavior, because this configuration specifies how scaling behaves. It could also be used in the future to specify other aspects of scaling, such as tolerances.
  3. Instead of specifying pod and percent fields for scaling, I have changed it to a single field called maxAllowedChange. This is similar to maxSurge and maxUnavailable in a Deployment, where the value is either a percentage or an absolute number of pods. I believe this is more intuitive than having two fields, with one of the values being chosen based on which one is higher at a certain point in time. The pod and percent changes can be specified as a list of policies, and the user can specify which policy will be selected based on evaluation at runtime.

Note: There is one downside to the new scheme. It cannot replicate the current scale-up behavior, which adds 4 pods when there are fewer than 4 pods and 100% of the pods after that. This behavior is currently unspecified in the documentation and probably nobody relies on it. It will also only change if the user explicitly uses the new behavior field.
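
For illustration, the scheme described in points 1-3 might look something like this (a hypothetical sketch; the field placement follows the proposal text above, and all values are made up):

```yaml
behavior:
  stabilizationWindowSeconds: 300  # the renamed delaySeconds field (point 1)
  scaleUp:
    maxAllowedChange: 100%         # a percentage of current replicas (point 3)
    periodSeconds: 15
  scaleDown:
    maxAllowedChange: 4            # or an absolute number of pods
    periodSeconds: 60
```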

@k8s-ci-robot
Contributor

Welcome @arjunrn!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 8, 2019
@k8s-ci-robot
Contributor

Hi @arjunrn. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 8, 2019
@arjunrn
Contributor Author

arjunrn commented Sep 8, 2019

/assign @mwielgus

@arjunrn arjunrn changed the title from "Modified proposal for configurable HPA to use a single field" to "Modified proposal for configurable HPA to use a single field to specify allowed changes" Sep 9, 2019
@josephburnett
Contributor

/assign @josephburnett

previous recommendations to prevent flapping of the number of replicas.
- `scaleUp` specifies the rules which are used to control scaling behavior while scaling up
- `periodSeconds` the amount of time in seconds for which the rule should hold true.
- `maxAllowedChange` the maximum allowed change in replicas in the given period. Can be an absolute
Contributor

I find it confusing to use a single field for both absolute and percentage values. How will you tell the difference? Is 0.5 a percentage or absolute value?

What about something more along the lines of @thockin's suggestion of having "Pods" and "Percent" policies?

```yaml
constraints:
  scaleUp: # allow the max of 3 pods or 100% to be added every 15 seconds
  - policy: Pods
    value: 3
    periodSeconds: 15
  - policy: Percent
    value: 100
    periodSeconds: 15
  scaleDown: # allow the max of 2 pods or 20% to be removed every 15 seconds
  - policy: Pods
    value: 2
    periodSeconds: 15
  - policy: Percent
    value: 20
    periodSeconds: 15
  stabilizationWindowSeconds: 120 # downscale stabilization window of 2 minutes
```

Note: I changed the scaleDown field to be a list of policies so you can specify both Pods and Percent. This would be useful for:

  1. getting the autoscaler off the ground (i.e. allowing increase of 1->4, but still limiting to 200%)
  2. allowing for future policy (I don't know what).

Contributor Author

I thought about this, and the reason I went with a single value is that there is precedent for it, as suggested by @liggitt here. The deployment controller also uses the same type of value to specify an increase in the number of pods. Additionally, there are already many hidden levers in the HPA controller, which makes it harder for end users to predict or analyze HPA behavior. Having a single value reduces the thinking you need to do compared to having multiple policies.

Having multiple policies would make the HPA more flexible, but at the moment I don't see a real need for it other than the use case of going from 1 -> 4 pods. Nor do I know of any type of workload which would need behavior like this. Also, since we are going to split the code paths, I don't think there is a need to preserve the old behavior, especially since it's not documented. As for a value like "0.5", we could look into how it is treated in the deployment controller.

That said if the maintainers think that multiple policies is the way to go then I will modify the proposal.
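
As a point of reference for the "0.5" question above: the Deployment fields this proposal borrows from (maxSurge/maxUnavailable) disambiguate by type rather than by magnitude. A plain integer is an absolute pod count, while a percentage is a string ending in "%", so a bare 0.5 would simply be rejected. A rough sketch of that resolution rule (illustrative only, not the actual apimachinery code; the function name is made up):

```python
import math

def resolve_scaling_limit(value, current_replicas: int) -> int:
    """Resolve an IntOrString-style limit roughly the way Deployment's
    maxSurge does: a plain int is an absolute pod count; a string like
    "50%" is a percentage of the current replica count, rounded up.
    Anything else (e.g. a bare 0.5) is rejected.
    """
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return math.ceil(current_replicas * percent / 100)
    raise ValueError(f"expected an int or a percent string, got {value!r}")
```

For example, `resolve_scaling_limit(3, 10)` returns 3 pods, while `resolve_scaling_limit("50%", 10)` returns 5.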

Contributor

There are examples of policy lists in the autoscaling space, such as MetricSource.

Given that this will live in the same resource (HPA) as MetricTarget, I would prefer to follow that pattern over the Deployment pattern. I see your point about a single value being easier to understand, but I would prefer flexibility in this case.

I don't see the real need for it

A hypothetical example: what if there were a new policy which limits the core-hours used per day? A sort of "cost" constraint.

```yaml
constraints:
  scaleUp:
  - policy: Cost
    cpuCoresHours: 240    # 10 cores per day on average
    periodSeconds: 86400  # budgeted over a 24 hour window
```

We don't have to design it now. But I would like to avoid an autoscaling v2beta3 by leaving space for it. 🤞

@arjunrn arjunrn force-pushed the update-configurable-hpa branch from aee4b6c to a5d553c Compare September 16, 2019 07:14
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
@arjunrn arjunrn force-pushed the update-configurable-hpa branch from a5d553c to b2a7ab6 Compare September 18, 2019 13:45
@arjunrn
Contributor Author

arjunrn commented Sep 18, 2019

As discussed with @josephburnett, I have updated the KEP so that multiple policies can be specified for scaling behavior. By default the policy allowing the highest change is chosen, but this can be changed by specifying a value for selectPolicy.
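
For illustration, the updated scheme might look like this (a sketch based on the discussion above, reusing the policy-list shape from the earlier example; all values are made up):

```yaml
behavior:
  scaleUp:
    selectPolicy: Max   # default: pick the policy allowing the largest change
    policies:
    - policy: Pods
      value: 4
      periodSeconds: 15
    - policy: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    selectPolicy: Min   # pick the most conservative policy instead
    policies:
    - policy: Percent
      value: 20
      periodSeconds: 60
```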

Contributor

@josephburnett josephburnett left a comment

We should also add some graduation criteria.

gliush and others added 3 commits October 8, 2019 00:57
Additionally, fixed some typos and added one more use case
why the stabilizationWindowSeconds in the scaleUp section might be
useful
Put stabilizationWindowSeconds in the scaleUp/scaleDown section
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 300
```
Contributor

Why not delay?

Contributor

Well, originally, I named it "delay".
Though, it might be useful to show that we "stabilize recommendations". So it is not "just delay", it is "delay, and then react given previous recommendations".
So, "stabilization" does make sense.
I'm ok with any name, to be honest. :)

Contributor Author

delay kind of implies it's a one-time thing. stabilizationWindow is more appropriate considering it's a moving window in which previous recommendations are considered to stabilize the scaling. I will explain this when I update the documentation for this feature.
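
The moving-window idea can be sketched as follows (an illustrative model of the concept, not the HPA controller's actual code; all names are made up): keep the recommendations from the last windowSeconds and, when scaling down, act on the highest of them.

```python
import collections

def stabilized_replicas(history: collections.deque, now: float,
                        window_seconds: float, recommendation: int) -> int:
    """Record a new replica recommendation and return the highest one
    still inside the stabilization window. Acting on the maximum means
    replicas only drop once every recommendation in the window agrees,
    which prevents scale-down flapping.
    """
    history.append((now, recommendation))
    # Evict recommendations that have fallen out of the moving window.
    while history and history[0][0] < now - window_seconds:
        history.popleft()
    return max(rec for _, rec in history)
```

With a 300-second window, a recommendation of 10 at t=0 keeps the target at 10 even if a recommendation of 5 arrives at t=100; only after the 10 expires from the window does the target drop to 5.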

```go
}

type HPAScalingRules struct {
	StabilizationWindowSeconds *int32
```
Contributor

I think we were going to add Tolerance, even if it's just in the KEP and not implemented yet.

Contributor Author

In the non-goals I mention tolerance as one of the non-goals.

Contributor

@josephburnett josephburnett left a comment

/lgtm
/approve

Thanks @gliush and @arjunrn for preparing this KEP! Looking forward to the implementation.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2019
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2019
Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
@josephburnett
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2019
@josephburnett
Contributor

/assign @mwielgus

```diff
 creation-date: 2019-03-07
-last-updated: 2019-03-07
+last-updated: 2019-09-16
 status: provisional
```
Contributor

@josephburnett josephburnett Oct 10, 2019

You should change this to implementable

@gliush @arjunrn

Signed-off-by: Arjun Naik <arjun.rn@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2019
@josephburnett
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2019
@mwielgus
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, josephburnett, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4493118 into kubernetes:master Oct 14, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 14, 2019
@arjunrn arjunrn deleted the update-configurable-hpa branch October 14, 2019 12:51

Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
kind/kep - Categorizes KEP tracking issues and PRs modifying the KEP directory.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
needs-ok-to-test - Indicates a PR that requires an org member to verify it is safe to test.
sig/autoscaling - Categorizes an issue or PR as relevant to SIG Autoscaling.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants