Fix HPA feedback from writing status.replicas to spec.replicas. #79035

josephburnett · 2019-06-14T14:30:53Z

/kind bug

What this PR does / why we need it:

There are various reasons the HPA may decide not to change the current scale. Two important ones are when missing metrics might change the direction of scaling, and when the recommended scale is within tolerance of the current scale.
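A minimal Go sketch of the two "don't change scale" cases, assuming a usage ratio (current metric / target metric) and the HPA's default tolerance of 0.1 (set by `--horizontal-pod-autoscaler-tolerance` on kube-controller-manager). The helpers `withinTolerance`, `changesDirection`, and `shouldHoldScale` are hypothetical names for illustration, not the controller's actual functions.

```go
package main

import (
	"fmt"
	"math"
)

// defaultTolerance mirrors the HPA's default tolerance of 10%.
const defaultTolerance = 0.1

// withinTolerance reports whether the usage ratio is close enough to 1.0
// that the HPA should leave the scale alone.
func withinTolerance(usageRatio, tolerance float64) bool {
	return math.Abs(1.0-usageRatio) <= tolerance
}

// changesDirection reports whether re-estimating the ratio to account for
// pods with missing metrics flips the scaling direction (hypothetical helper).
func changesDirection(rawRatio, adjustedRatio float64) bool {
	return (rawRatio > 1.0 && adjustedRatio < 1.0) ||
		(rawRatio < 1.0 && adjustedRatio > 1.0)
}

// shouldHoldScale combines the two cases in which the current scale is kept.
func shouldHoldScale(rawRatio, adjustedRatio float64) bool {
	return withinTolerance(rawRatio, defaultTolerance) ||
		changesDirection(rawRatio, adjustedRatio)
}

func main() {
	fmt.Println(shouldHoldScale(1.05, 1.05)) // true: within tolerance
	fmt.Println(shouldHoldScale(1.5, 0.9))   // true: missing metrics flip direction
	fmt.Println(shouldHoldScale(1.5, 1.3))   // false: scale up
}
```

In both "hold" cases the calculator has to report some "current scale", which is where the choice between spec and status matters.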

ReplicaCalculator signals its decision not to change the current scale by returning the current scale. However, that value comes from scale.Status.Replicas, which can be larger than scale.Spec.Replicas (e.g. during a Deployment rollout with a configured surge). This causes a positive feedback loop: scale.Status.Replicas is written back into scale.Spec.Replicas, which further increases the current scale.
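A toy simulation of the feedback loop, under simplifying assumptions: a rollout is always in progress, status.replicas equals spec.replicas plus a 25% surge rounded up (25% is the Deployment default for `maxSurge`), and metrics stay within tolerance so the calculator always punts. `surge` and `simulate` are hypothetical helpers, not controller code.

```go
package main

import "fmt"

// surge returns the extra pods a rollout may create, assuming maxSurge is a
// percentage of spec.replicas rounded up.
func surge(spec, percent int32) int32 {
	return (spec*percent + 99) / 100
}

// simulate runs n HPA cycles in which the calculator always punts. If buggy,
// the punt returns status.replicas (old behavior); otherwise it returns
// spec.replicas (the fix). The result is written back into spec each cycle.
func simulate(spec int32, n int, buggy bool, maxReplicas int32) []int32 {
	history := []int32{spec}
	for i := 0; i < n; i++ {
		status := spec + surge(spec, 25) // observed pods, including surge
		desired := spec
		if buggy {
			desired = status
		}
		if desired > maxReplicas {
			desired = maxReplicas
		}
		spec = desired // the HPA writes its recommendation into spec.replicas
		history = append(history, spec)
	}
	return history
}

func main() {
	fmt.Println("buggy:", simulate(10, 4, true, 100))  // buggy: [10 13 17 22 28]
	fmt.Println("fixed:", simulate(10, 4, false, 100)) // fixed: [10 10 10 10 10]
}
```

With no change in load, the buggy variant ratchets spec upward every cycle until it hits maxReplicas.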

This PR fixes the feedback loop by plumbing the replica count from spec through horizontal.go and replica_calculator.go so the calculator can punt with the right value.
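A simplified sketch of what plumbing the spec value through buys: the calculator receives both counts, uses the observed count for scaling math, and returns the spec count when it punts. `calcReplicas` is a hypothetical stand-in for ReplicaCalculator, and scaling from `statusReplicas` is a simplification (the real controller works from per-pod metrics).

```go
package main

import (
	"fmt"
	"math"
)

const tolerance = 0.1

// calcReplicas takes specReplicas (from scale.Spec.Replicas) and
// statusReplicas (from scale.Status.Replicas). When it decides not to change
// scale, it returns the spec value rather than the observed pod count.
func calcReplicas(specReplicas, statusReplicas int32, usageRatio float64) int32 {
	if math.Abs(1.0-usageRatio) <= tolerance {
		return specReplicas // punt: keep the scale last written to spec
	}
	return int32(math.Ceil(usageRatio * float64(statusReplicas)))
}

func main() {
	// Mid-rollout: spec says 10, but 13 pods exist because of surge.
	fmt.Println(calcReplicas(10, 13, 1.02)) // 10, not 13
	fmt.Println(calcReplicas(10, 13, 2.0))  // 26
}
```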

It also introduces separate types for replica counts derived from scale.Spec and scale.Status (specReplicas and statusReplicas respectively) to guard against this kind of cross-talk. When returning a desired scale to be written into spec, the calculator must either return the current spec scale it was given or explicitly call newSpecReplicas.
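A minimal sketch of the type-separation idea using Go defined types. The names `specReplicas`, `statusReplicas`, and `newSpecReplicas` come from the description above; their exact definitions and the `holdOrScale` helper are assumptions for illustration.

```go
package main

import "fmt"

// Distinct defined types: values of one are not assignable to the other
// without an explicit conversion.
type specReplicas int32
type statusReplicas int32

// newSpecReplicas is the explicit way to produce a value destined for spec.
func newSpecReplicas(n int32) specReplicas {
	return specReplicas(n)
}

// holdOrScale returns the value to write into scale.Spec.Replicas.
func holdOrScale(current specReplicas, observed statusReplicas, hold bool, desired int32) specReplicas {
	if hold {
		// `return observed` would not compile here: statusReplicas is not
		// assignable to specReplicas. That is the cross-talk the types catch.
		return current
	}
	return newSpecReplicas(desired)
}

func main() {
	var current specReplicas = 10
	var observed statusReplicas = 13
	fmt.Println(holdOrScale(current, observed, true, 0))   // 10
	fmt.Println(holdOrScale(current, observed, false, 20)) // 20
}
```

The guard covers accidental mixing only: an explicit conversion such as `specReplicas(observed)` still compiles, so funnelling new values through `newSpecReplicas` keeps intentional conversions easy to audit.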

With separate types, other sources of cross-talk became compiler errors, e.g. recording Status.Replicas as an initial recommendation. That bug would manifest if a Deployment was in the middle of a rollout when the HPA controller restarted.

Which issue(s) this PR fixes:

Fixes #78712
Fixes #72775

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

k8s-ci-robot · 2019-06-14T14:30:59Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: login-issues@jira.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2019-06-14T14:31:01Z

Hi @josephburnett. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mwielgus · 2019-06-14T16:24:41Z

/ok-to-test

josephburnett · 2019-06-14T16:59:03Z

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

New comment.

josephburnett · 2019-06-14T17:21:05Z

If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.

Done.

josephburnett · 2019-06-14T18:49:28Z

/test pull-kubernetes-integration

josephburnett · 2019-06-14T18:50:15Z

/test pull-kubernetes-verify

krzysztof-jastrzebski · 2019-07-02T15:21:10Z

/lgtm


juanpmarin · 2019-07-16T13:21:30Z

Hi, is this currently released?

Handle replica counts derived from spec and status as separate types so we don't accidentally write observed replicas from status back into spec, causing a positive feedback loop (kubernetes#79035).

During a Deployment update there may be more Pods in the scale target ref status than in the spec. This test verifies that we do not scale to the status value. Instead we should stay at the spec value. Fails before kubernetes#79035 and passes after.

austinpray · 2019-08-28T23:13:22Z

Hi, is this currently released?

@juanpmarin it looks like this is gonna be in 1.16+

λ  kubernetes master ✓ git tag --contains 39c4875321991f305d51e30481a66701b6b76f5f
v1.16.0-alpha.1
v1.16.0-alpha.2
v1.16.0-alpha.3
v1.16.0-beta.0
v1.16.0-beta.1
v1.17.0-alpha.0

kimxogus · 2019-08-29T01:08:30Z

@juanpmarin This was also cherry-picked to 1.13 (#79707), 1.14 (#79708), and 1.15 (#79709, #79727), so the latest patch releases of those branches may include the fix too.

paalkr · 2019-09-29T07:17:58Z

I would just like to draw some attention to #72775. Many are still facing issues with rolling updates.

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 14, 2019

k8s-ci-robot requested review from fgrzadkowski and MaciekPytel June 14, 2019 14:31

k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 14, 2019

josephburnett mentioned this pull request Jun 14, 2019

HPA scales up with no reason / blank reason #78712

Closed

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 14, 2019

mwielgus requested review from mwielgus and removed request for fgrzadkowski June 14, 2019 16:25

josephburnett force-pushed the hparunaway branch from 2df8e2e to d63688d Compare June 14, 2019 16:57

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 14, 2019

mwielgus added this to the v1.15 milestone Jun 14, 2019

k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 14, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2019

k8s-ci-robot merged commit cf7662d into kubernetes:master Jul 2, 2019

k8s-ci-robot added a commit that referenced this pull request Jul 3, 2019

Merge pull request #79707 from josephburnett/release-1.13

04b7e66

Cherry pick #79035 to 1.13 (Fix HPA feedback from writing status.replicas to spec.replicas)

josephburnett mentioned this pull request Jul 3, 2019

Automated cherry pick of #79035: There are various reasons that the HPA will decide not the #79727

Merged

k8s-ci-robot added a commit that referenced this pull request Jul 3, 2019

Merge pull request #79708 from josephburnett/release-1.14

d94c7dd

Cherry pick #79035 to 1.14 (Fix HPA feedback from writing status.replicas to spec.replicas)

k8s-ci-robot added a commit that referenced this pull request Jul 4, 2019

Merge pull request #79727 from josephburnett/automated-cherry-pick-of…

2be156d

…-#79035-upstream-release-1.15 Automated cherry pick of #79035: There are various reasons that the HPA will decide not the

This was referenced Jul 12, 2019

Add josephburnett to podautoscaler OWNERS. #80077

Merged

Configurable HorizontalPodAutoscaler #74525

Merged

josephburnett mentioned this pull request Jul 12, 2019

Separate spec and status replicas by type. #80097

Closed

josephburnett pushed a commit to josephburnett/kubernetes that referenced this pull request Jul 24, 2019

DO NOT SUBMIT unfixing kubernetes#79035 for e2e testing.

366882c

josephburnett mentioned this pull request Aug 6, 2019

Test more replicas than spec. #81019

Merged

shukla2112 mentioned this pull request Sep 18, 2019

pod hpa would create extra pods during deployment rolling update when there is no load at all during the rolling upgrade #72775

Closed

paalkr mentioned this pull request Oct 21, 2019

Rolling update of deployments creates MAX number of pods X 2 according to HPA, regardless of current load #84142

Closed

josephburnett mentioned this pull request Jun 21, 2021

fix:hpaController set hpa DesiredReplicas=0 in corner case when can not compute or get metrics #102728

Closed

josephburnett mentioned this pull request Jun 2, 2022

fix: use scale.Status.Replicas to get the target's current scale #108300

Closed
