[HPA e2e] Reduce possible number of scale steps to minimize stabilization test flakiness #116040
Hi @pbeschetnov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue is currently awaiting triage.
/lgtm
/ok-to-test
LGTM label has been added. Git tree hash: d19aaccd9abe3c760bc038e073a97cdc593d34c7
/retest flaky #116061
/retest
/assign @mwielgus
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mwielgus, pbeschetnov
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
This PR reduces the flakiness of the HPA stabilization test.
Which issue(s) this PR fixes:
In this test we scale up from 2 -> 4 and back down from 4 -> 2, with the scaleUp/scaleDown stabilization windows set to 3m.
Consider the scale-up: the resource consumer requests CPU usage worth 4 replicas, but the load is sometimes imprecise and amounts to only 3 replicas' worth of usage. So, after the stabilization window passes, we scale up from 2 to 3, and then we have to wait another 3m to scale up to 4 replicas. The test expects to scale up in one step, 2 -> 4, and to spend 3m on that. In reality it is 2 -> 3 -> 4 (6m), so the test times out.
I propose eliminating the possible intermediate steps by always scaling between 2 and 3 replicas. This doesn't sacrifice precision coverage, because precision is already verified in other HPA tests.
Does this PR introduce a user-facing change?