Do not count failed pods as unready in HPA controller #60648

Merged
merged 1 commit into from Mar 2, 2018

Conversation

@bskiba
Member

bskiba commented Mar 1, 2018

What this PR does / why we need it:
Currently, when performing a scale-up, any failed pods (which can be present, for example, after evictions performed by the kubelet) are treated as unready. Unready pods are treated as if they had 0% utilization, which will slow down or even block scale-up.

After this change, failed pods are ignored in all calculations, so they influence neither scale-up nor scale-down replica calculations.
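The effect can be sketched as follows. This is a minimal, self-contained Go model of the behavior, not the actual `replica_calculator.go` code: the types and functions here (`Pod`, `groupPods`, `desiredReplicas`) are illustrative stand-ins, and the real logic lives in the HPA's replica calculator with the types from `k8s.io/api/core/v1`.

```go
package main

import (
	"fmt"
	"math"
)

// Illustrative stand-ins for the real types in k8s.io/api/core/v1.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

type Pod struct {
	Name        string
	Phase       PodPhase
	Ready       bool
	Utilization float64 // observed metric as a fraction, e.g. 1.0 = 100%
}

// groupPods models the behavior after this PR: Failed pods are dropped
// from all calculations, and only non-failed pods are split into ready
// and unready sets.
func groupPods(pods []Pod) (ready, unready []Pod) {
	for _, p := range pods {
		if p.Phase == PodFailed {
			continue // ignored entirely after this change
		}
		if p.Ready {
			ready = append(ready, p)
		} else {
			unready = append(unready, p)
		}
	}
	return ready, unready
}

// desiredReplicas models the HPA scale-up math: unready pods count as
// 0% utilization, so the average is taken over ready plus unready pods.
// If the resulting usage ratio does not exceed the target, scale-up is
// blocked and the current replica count is kept.
func desiredReplicas(ready, unready []Pod, current int, target float64) int {
	total := 0.0
	for _, p := range ready {
		total += p.Utilization
	}
	n := len(ready) + len(unready) // unready pods contribute 0
	if n == 0 {
		return current
	}
	ratio := (total / float64(n)) / target
	if ratio <= 1.0 {
		return current // utilization at or below target: no scale-up
	}
	return int(math.Ceil(ratio * float64(n)))
}

func main() {
	pods := []Pod{
		{"web-1", PodRunning, true, 1.0},
		{"web-2", PodRunning, true, 1.0},
		{"web-3", PodFailed, false, 0},
		{"web-4", PodFailed, false, 0},
		{"web-5", PodFailed, false, 0},
	}
	target := 0.5 // target 50% utilization

	// After this PR: failed pods are dropped, so scale-up proceeds.
	ready, unready := groupPods(pods)
	fmt.Println("after fix:", desiredReplicas(ready, unready, 2, target)) // 4

	// Before this PR: failed pods counted as unready (0% utilization),
	// dragging the average below target and blocking scale-up.
	var oldUnready []Pod
	for _, p := range pods {
		if p.Phase == PodFailed {
			oldUnready = append(oldUnready, p)
		}
	}
	fmt.Println("before fix:", desiredReplicas(ready, oldUnready, 2, target)) // 2
}
```

With two ready pods at 100% utilization against a 50% target, dropping the three failed pods yields a scale-up to 4 replicas; counting them as unready drags the average to 40%, below target, and scale-up is blocked.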

@MaciekPytel @DirectXMan12

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #55630

Special notes for your reviewer:

Release note:

Stop counting failed pods as unready in the HPA controller, to avoid failed pods incorrectly affecting the scale-up replica count calculation.
@DirectXMan12

Contributor

DirectXMan12 commented Mar 1, 2018

This should probably have a release note filled out, because it's a change in behavior.

@bskiba

Member

bskiba commented Mar 1, 2018

Fair point, added.

@bskiba

Member

bskiba commented Mar 1, 2018

@DirectXMan12 Since the bug seems to be quite an inconvenience (the only workaround I know of is to manually remove the evicted pods, and since at least 1.7.5 evicted pods seem to stay around for a fairly long time: #55051 (comment)), do you think this could go into 1.10?

@DirectXMan12

Contributor

DirectXMan12 commented Mar 2, 2018

yeah, I'll add it to the milestone. This seems like it could prevent the HPA from working at all, which makes it a decently bad bug.

@DirectXMan12 DirectXMan12 added this to the v1.10 milestone Mar 2, 2018

@DirectXMan12

Contributor

DirectXMan12 commented Mar 2, 2018

/kind bug
/approve
/lgtm

@k8s-ci-robot

Contributor

k8s-ci-robot commented Mar 2, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bskiba, DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@DirectXMan12

Contributor

DirectXMan12 commented Mar 2, 2018

/sig autoscaling
/priority critical-urgent

@DirectXMan12

Contributor

DirectXMan12 commented Mar 2, 2018

/status approved-for-milestone

@k8s-merge-robot

Contributor

k8s-merge-robot commented Mar 2, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@DirectXMan12 @bskiba

Pull Request Labels
  • sig/autoscaling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
@k8s-merge-robot

Contributor

k8s-merge-robot commented Mar 2, 2018

Automatic merge from submit-queue (batch tested with PRs 60732, 60689, 60648, 60704). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 30eb1aa into kubernetes:master Mar 2, 2018

13 checks passed
  • Submit Queue: Queued to run github e2e tests a second time.
  • cla/linuxfoundation: bskiba authorized
  • pull-kubernetes-bazel-build: Job succeeded.
  • pull-kubernetes-bazel-test: Job succeeded.
  • pull-kubernetes-cross: Skipped
  • pull-kubernetes-e2e-gce: Job succeeded.
  • pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
  • pull-kubernetes-e2e-gke: Skipped
  • pull-kubernetes-e2e-kops-aws: Job succeeded.
  • pull-kubernetes-integration: Job succeeded.
  • pull-kubernetes-kubemark-e2e-gce: Job succeeded.
  • pull-kubernetes-node-e2e: Job succeeded.
  • pull-kubernetes-verify: Job succeeded.