[release-4.6] Bug 1899406: HPA: Ignore deleted pods. #465
Conversation
When a pod is deleted, it is given a deletion timestamp. However, the pod might still run for some time during graceful shutdown. During this time it might still produce CPU utilization metrics and be in the Running phase. Currently, the HPA replica calculator attempts to ignore deleted pods by skipping over them, but because they are not added to the ignoredPods set, their metrics are not removed from the average utilization calculation. This allows pods in the process of shutting down to drag down the recommended number of replicas by producing near-0% utilization metrics.

In fact, the ignoredPods set is a misnomer: those pods are not fully ignored. When the replica calculator recommends scaling up, 0% utilization metrics are filled in for those pods to limit the scale-up. This prevents overscaling when pods take some time to start up. There should really be 4 sets considered (readyPods, unreadyPods, missingPods, ignoredPods), not just 3.

This change renames ignoredPods to unreadyPods and keeps the scale-up-limiting semantics. Another, actually-ignored ignoredPods set is added, and deleted pods are added to it instead of being skipped during grouping. Both ignoredPods and unreadyPods have their metrics removed from consideration, but only unreadyPods have 0% utilization metrics filled in upon scale-up.
@openshift-cherrypick-robot: Bugzilla bug 1899405 has been cloned as Bugzilla bug 1899406. Retitling PR to link against new bug. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@openshift-cherrypick-robot: This pull request references Bugzilla bug 1899406, which is invalid:
In response to this:
/cc @rphillips
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: openshift-cherrypick-robot, soltysh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/bugzilla refresh
@joelsmith: This pull request references Bugzilla bug 1899406, which is valid. The bug has been moved to the POST state, and updated to refer to this pull request using the external bug tracker. 6 validation(s) were run on this bug.
In response to this:
/retest
@openshift-cherrypick-robot: All pull requests linked via external trackers have merged: Bugzilla bug 1899406 has been moved to the MODIFIED state. In response to this:
This is an automated cherry-pick of #462
/assign joelsmith