
Fix HPA feedback from writing status.replicas to spec.replicas. #79035

Merged
merged 1 commit into kubernetes:master from josephburnett:hparunaway on Jul 2, 2019

Conversation

@josephburnett
Contributor

josephburnett commented Jun 14, 2019

/kind bug

What this PR does / why we need it:

There are various reasons that the HPA will decide not to change the current scale. Two important ones are when missing metrics might change the direction of scaling, and when the recommended scale is within tolerance of the current scale.

The way that ReplicaCalculator signals its desire not to change the current scale is by returning the current scale. However, the current scale comes from scale.Status.Replicas and can be larger than scale.Spec.Replicas (e.g. during a Deployment rollout with a configured surge). This causes a positive feedback loop, because scale.Status.Replicas is written back into scale.Spec.Replicas, further increasing the current scale.

This PR fixes the feedback loop by plumbing the replica count from spec through horizontal.go and replica_calculator.go so the calculator can punt with the right value.
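
To make the loop concrete, here is a minimal Go sketch (purely illustrative; the `scale` struct and `buggyDesired` helper are hypothetical stand-ins, not the controller's actual code) of what happens when the calculator punts with the status-derived count while a rollout surge is in flight:

```go
package main

import "fmt"

// Hypothetical stand-in for the autoscaling/v1 Scale subresource.
type scale struct {
	specReplicas   int32 // what the HPA writes
	statusReplicas int32 // what the workload currently reports (may include surge pods)
}

// Buggy "no change" decision: it punts with the observed count instead of the configured one.
func buggyDesired(s scale) int32 { return s.statusReplicas }

func main() {
	s := scale{specReplicas: 3, statusReplicas: 4} // rollout surge: 3 configured, 4 running
	for i := 0; i < 3; i++ {
		s.specReplicas = buggyDesired(s)      // HPA writes status back into spec...
		s.statusReplicas = s.specReplicas + 1 // ...and the ongoing rollout surges above the new spec
	}
	fmt.Println(s.specReplicas) // 6: spec has ratcheted 3 -> 4 -> 5 -> 6 without any metric pressure
}
```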

It also introduces separate types for replica counts derived from scale.Spec and scale.Status (specReplicas and statusReplicas respectively) to guard against this kind of cross-talk. When returning a desired scale to be written into spec, the calculator must either return the given current scale or explicitly call newSpecReplicas.

With separate types, other sources of cross-talk became compiler errors, e.g. recording Status.Replicas as an initial recommendation. That one would manifest if a Deployment was in the process of rolling out when the HPA rebooted.
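
A rough sketch of the type-separation idea (the exact definitions and signatures in the PR may differ; `desiredReplicas` below is a made-up helper) showing how distinct defined types turn an accidental status-to-spec assignment into a compile error:

```go
package main

import "fmt"

// Replica counts read from scale.Spec and scale.Status get their own defined types.
type specReplicas int32
type statusReplicas int32

// The return type forces the calculator to either hand back the spec-derived
// current scale it was given, or construct a new spec value explicitly.
func desiredReplicas(current specReplicas, observed statusReplicas, withinTolerance bool) specReplicas {
	if withinTolerance {
		return current // punt with the value that came from spec, not from status
	}
	// return observed            // would not compile: statusReplicas is not specReplicas
	return specReplicas(observed) // cross-talk now requires an explicit, reviewable conversion
}

func main() {
	// During a Deployment rollout with surge, status (5) can exceed spec (3).
	fmt.Println(desiredReplicas(3, 5, true)) // prints 3: stays at the spec value
}
```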

Which issue(s) this PR fixes:

Fixes #78712
Fixes #72775

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE
@k8s-ci-robot

Contributor

k8s-ci-robot commented Jun 14, 2019

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jun 14, 2019

Hi @josephburnett. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mwielgus

Contributor

mwielgus commented Jun 14, 2019

/ok-to-test

@mwielgus mwielgus requested review from mwielgus and removed request for fgrzadkowski Jun 14, 2019
@josephburnett josephburnett force-pushed the josephburnett:hparunaway branch from 2df8e2e to d63688d Jun 14, 2019
@josephburnett

Contributor Author

josephburnett commented Jun 14, 2019

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

New comment.

@josephburnett

Contributor Author

josephburnett commented Jun 14, 2019

If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.

Done.

@josephburnett

Contributor Author

josephburnett commented Jun 14, 2019

/test pull-kubernetes-integration

@josephburnett

Contributor Author

josephburnett commented Jun 14, 2019

/test pull-kubernetes-verify

@krzysztof-jastrzebski

Contributor

krzysztof-jastrzebski commented Jul 2, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 2, 2019
@k8s-ci-robot k8s-ci-robot merged commit cf7662d into kubernetes:master Jul 2, 2019
23 checks passed
cla/linuxfoundation josephburnett authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.
k8s-ci-robot added a commit that referenced this pull request Jul 3, 2019
Cherry pick #79035 to 1.13 (Fix HPA feedback from writing status.replicas to spec.replicas)
k8s-ci-robot added a commit that referenced this pull request Jul 3, 2019
Cherry pick #79035 to 1.14 (Fix HPA feedback from writing status.replicas to spec.replicas)
k8s-ci-robot added a commit that referenced this pull request Jul 4, 2019
…-#79035-upstream-release-1.15

Automated cherry pick of #79035: There are various reasons that the HPA will decide not the
josephburnett added a commit to josephburnett/kubernetes that referenced this pull request Jul 12, 2019
Handle replica counts derived from spec and status as separate types
so we don't accidentally write observed replicas from status back into
spec, causing a positive feedback loop (kubernetes#79035).
@juanpmarin


juanpmarin commented Jul 16, 2019

Hi, is this currently released?

josephburnett added a commit to josephburnett/kubernetes that referenced this pull request Jul 18, 2019
Handle replica counts derived from spec and status as separate types
so we don't accidentally write observed replicas from status back into
spec, causing a positive feedback loop (kubernetes#79035).
josephburnett added a commit to josephburnett/kubernetes that referenced this pull request Jul 24, 2019
josephburnett added a commit to josephburnett/kubernetes that referenced this pull request Aug 6, 2019
During a Deployment update there may be more Pods in the scale target
ref status than in the spec. This test verifies that we do not scale
to the status value. Instead we should stay at the spec value.

Fails before kubernetes#79035 and passes after.
max88991 pushed a commit to max88991/kubernetes that referenced this pull request Aug 22, 2019
During a Deployment update there may be more Pods in the scale target
ref status than in the spec. This test verifies that we do not scale
to the status value. Instead we should stay at the spec value.

Fails before kubernetes#79035 and passes after.
gnufied added a commit to gnufied/kubernetes that referenced this pull request Aug 22, 2019
During a Deployment update there may be more Pods in the scale target
ref status than in the spec. This test verifies that we do not scale
to the status value. Instead we should stay at the spec value.

Fails before kubernetes#79035 and passes after.
@austinpray


austinpray commented Aug 28, 2019

Hi, is this currently released?

@juanpmarin it looks like this is gonna be in 1.16+

λ  kubernetes master ✓ git tag --contains 39c4875321991f305d51e30481a66701b6b76f5f
v1.16.0-alpha.1
v1.16.0-alpha.2
v1.16.0-alpha.3
v1.16.0-beta.0
v1.16.0-beta.1
v1.17.0-alpha.0
@kimxogus


kimxogus commented Aug 29, 2019

@juanpmarin This was also cherry-picked to 1.13 (#79707), 1.14 (#79708), and 1.15 (#79709, #79727). The latest patch releases of those versions may include this fix as well.

@paalkr


paalkr commented Sep 29, 2019

I would just like to draw some attention to #72775. Many people are still facing issues with rolling updates.
