
Configurable scale velocity for HPA #883

Merged

Conversation

@gliush (Contributor) commented Mar 7, 2019

No description provided.

@k8s-ci-robot k8s-ci-robot requested review from jdumars and mwielgus Mar 7, 2019

@k8s-ci-robot (Contributor) commented Mar 7, 2019

Hi @gliush. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jdumars (Member) commented Mar 7, 2019

/ok-to-test

@mwielgus (Contributor) commented Mar 11, 2019

@thockin Could you review the api change here?

gliush added some commits Apr 1, 2019

@liggitt liggitt added this to In progress in API Reviews Apr 17, 2019

@liggitt liggitt moved this from In progress to Assigned in API Reviews Apr 17, 2019

Adds Hysteresis to the HPA constraints
* Adds "delaySeconds" field into the HPA constraints
* Makes sure that it could be configured by the "stabilization window" command line argument
* Adds user story
* Fixes typos

Fixes small typos

Fixes typos

Use another default values

Some additional explanations added

Combines two pseudocode parts
* Also adds new section for the delay option
@josephburnett commented Apr 29, 2019

/lgtm
@thockin @mwielgus I think this is good-to-go. Could you take a look?

@k8s-ci-robot (Contributor) commented Apr 29, 2019

@josephburnett: changing LGTM is restricted to assignees, and only kubernetes/enhancements repo collaborators may be assigned issues.

In response to this:

/lgtm
@thockin @mwielgus I think this is good-to-go. Could you take a look?


@gliush (Contributor, author) commented Apr 29, 2019

@josephburnett:
We have a KEP freeze tomorrow, Apr 30 for k8s-1.15 (#853 (comment))
Is it possible to have this KEP approved so that I can merge it and start working on the implementation to make it part of 1.15?

@josephburnett commented Apr 30, 2019

@gliush it's not up to me whether it's approved or not. I've already pinged @thockin and @mwielgus that it's ready for review. 🤞

@mwielgus (Contributor) commented May 1, 2019

/lgtm

@k8s-ci-robot (Contributor) commented May 1, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gliush, josephburnett, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit f74637d into kubernetes:master May 1, 2019

3 checks passed:
- cla/linuxfoundation: gliush authorized
- pull-enhancements-verify: Job succeeded.
- tide: In merge pool.
@liggitt (Member) commented May 1, 2019

a couple notes for clarity:

  • this merged while still in the provisional state, and the expectation is that it will move to implementable by feature freeze
  • I didn't see an ack from @thockin on the updates after his review (that doesn't necessarily have to happen by feature freeze, but an API review ack is still needed before an implementation merges)

@liggitt liggitt removed this from Assigned in API Reviews May 1, 2019

@thockin (Member) commented May 1, 2019

A KEP should not be a full API review, so it seems appropriate that this merge without that final approval. It would be bad if the ultimate API was radically different than what is in the KEP, but the reality is that APIs almost always evolve as they are implemented :)

Re-reading now.

@thockin (Member) left a comment

Mostly API comments to sort through as you finalize the UX. This is a complicated API, so I'll encourage you to seek out simplifications, even at the cost of some flexibility. We can always add stuff, but removing is very hard.


- `constraints`:
- `scaleUp`:
- `percent = 900` (i.e., to increase the number of pods 10 times per minute is ok).

@thockin (Member) commented May 1, 2019

For final API we may want to try a few variants to see what UX works best and is least confusing. E.g. if I want to allow 10x growth, does it make sense to say "grow by 900%" or "grow to 1000%" ?

For examples like "grow by 100%" it seems pretty obvious, just less so at larger numbers :)
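The two readings thockin contrasts can be made concrete with a small sketch. These helper names are purely illustrative and do not exist in the HPA API; the point is that for small factors the two phrasings coincide in intuition but diverge numerically at larger ones.

```go
package main

import "fmt"

// growBy interprets "grow by pct%": add pct% of the current count.
func growBy(cur, pct int32) int32 { return cur + cur*pct/100 }

// growTo interprets "grow to pct%": replace the count with pct% of it.
func growTo(cur, pct int32) int32 { return cur * pct / 100 }

func main() {
	// "grow by 900%" and "grow to 1000%" both mean 10x:
	fmt.Println(growBy(1, 900), growTo(1, 1000)) // 10 10
	// "grow by 100%" doubles, which reads naturally either way:
	fmt.Println(growBy(4, 100)) // 8
}
```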

```golang
type HPAScaleConstraintRateValue struct {
	Pods          *int32
	Percent       *int32
	PeriodSeconds *int32
}
```

@thockin (Member) commented May 1, 2019

Is this smoothed over the period or done at edges? E.g. 3 pods per minute could be 1 pod per 20 seconds or 3 pods all at once after 60 seconds. Just specify the behavior.
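The two behaviors in question can be sketched as follows. This is not the controller's actual code, and the function names are invented for illustration; it just shows how differently "3 pods per 60 seconds" plays out under each semantics.

```go
package main

import "fmt"

// smoothedAllowance accrues the pod budget linearly over the period:
// 20s into a 60s period with budget 3, only 1 pod is allowed.
func smoothedAllowance(elapsedSec, periodSec, budget int32) int32 {
	return budget * elapsedSec / periodSec
}

// edgeAllowance makes the whole budget available from the start of
// each period: all 3 pods may be added at once.
func edgeAllowance(elapsedSec, periodSec, budget int32) int32 {
	return budget
}

func main() {
	// 20 seconds into the minute, under each reading:
	fmt.Println(smoothedAllowance(20, 60, 3)) // 1
	fmt.Println(edgeAllowance(20, 60, 3))     // 3
}
```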

@thockin (Member) commented May 1, 2019

Is it ok for pods and percent to share a period? e.g. do I need to be able to specify "3 pods per minute or 100% per 20 seconds" ?

Create an HPA with the following constraints:

- `constraints`:
- `scaleDown`:

@thockin (Member) commented May 1, 2019

This is a little clunky -- if there are ever new modes here (I don't have even a hypothetical example, but pretend :), then any YAML that carries this payload will not have that new field.

There are other API conventions being strengthened around one-of sets, which want a discriminator field, so maybe this looks like:

```yaml
constraints:
  scaleDown:  # 3 pods per 10 minutes
    policy: Pods
    value: 3
    periodSeconds: 600
```

```yaml
constraints:
  scaleDown:  # no scale-down allowed
    policy: Disabled
```

Or if pods and percent are not mutually exclusive (?) then something like:

```yaml
constraints:
  scaleDown: {}  # no scale-down allowed
```

This changes the defaults from being on the leaf fields to being on the struct. If scaleDown is not specified, the default value is { percent: 100, pods: 4 }, but if it is specified, the fields inside default to 0.

Something like that.
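The discriminated one-of suggested above could be sketched in Go like this. All names here are hypothetical placeholders, not the final API; the shape is what matters: a string discriminator selecting which sibling fields are meaningful.

```go
package main

import "fmt"

// ScalingPolicyType is the discriminator for the one-of.
type ScalingPolicyType string

const (
	PodsPolicy     ScalingPolicyType = "Pods"
	PercentPolicy  ScalingPolicyType = "Percent"
	DisabledPolicy ScalingPolicyType = "Disabled"
)

// ScalingRule is a single scale-up or scale-down rule. Value and
// PeriodSeconds are interpreted according to Policy, and are unused
// when Policy is Disabled.
type ScalingRule struct {
	Policy        ScalingPolicyType
	Value         int32
	PeriodSeconds int32
}

func main() {
	threePodsPer10Min := ScalingRule{Policy: PodsPolicy, Value: 3, PeriodSeconds: 600}
	noScaleDown := ScalingRule{Policy: DisabledPolicy}
	fmt.Println(threePodsPer10Min.Policy, noScaleDown.Policy)
}
```

A new mode then only needs a new constant value for the discriminator, so existing YAML payloads remain representable.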

- `constraints`:
- `scaleDown`:
- `pods = 5`
- `delaySeconds = 600`

@thockin (Member) commented May 1, 2019

is delay specific to down-scaling or also for up? The YAML listed here is confusing


```golang
type HPAScaleConstraintValue struct {
	Rate *HPAScaleConstraintRateValue
}
```

@thockin (Member) commented May 1, 2019

what does it mean if this is not specified? Make sure to document and think through all optionality

```golang
	MinReplicas *int32
	MaxReplicas int32
	Metrics     []MetricSpec
	Constraints *HPAScaleConstraints
```

@thockin (Member) commented May 1, 2019

FWIW I still dislike "constraints" as a term here. "policy" or something makes more sense to me, but I probably won't fight very hard on this

```golang
type HPAScaleConstraintValue struct {
	Rate         *HPAScaleConstraintRateValue
	DelaySeconds *int32
}
```

@thockin (Member) commented May 1, 2019

if I understand, this is not a "delay" as much as a "window" ? Please think about the name that best captures the semantics.

Also document bounds. Can I set a delay of 14 days? How much buffer are we willing to allocate (and lose if the controller crashes) ?


Example for `CurReplicas = 10`, with the HPA controller running one cycle per minute:

- For the first 9 minutes the algorithm will do nothing except gather recommendations.

@thockin (Member) commented May 1, 2019

where are these stored? If the controller goes down, is it all lost? That should feedback into our API limits so we don't promise to store too much, and then cause a huge problem when the controller dies for whatever reason.
