
test: e2e: HPA ContainerResource - Lower requests b/c multiple containers will leave pending pods on existing test infra #104441

Merged


jsturtevant
Contributor

jsturtevant commented Aug 18, 2021

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

It lowers the CPU requests used by the HPA test that exercises sidecar container resource consumption.

The existing test Kubernetes e2e suite.[sig-autoscaling] [Feature:HPA] Horizontal pod autoscaling (scale resource: CPU) [Serial] [Slow] Deployment Should scale from 1 pod to 3 pods and from 3 to 5 passes on the current infrastructure with the same CPU request values.

Because the new tests add a second container to each pod, they double the CPU each pod requests. The scaled-up pods cannot be scheduled, stay Pending, and the test times out.
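To make the capacity arithmetic concrete, here is a small Go sketch of how a pod's CPU request grows with its container count and how that limits the number of pods a node can hold. The node allocatable CPU (2000m) and per-container request (500m) are hypothetical numbers for illustration only, not the values used by the test or the CI VMs, and the sketch considers CPU requests alone.

```go
package main

import "fmt"

// podRequestMilli returns the total CPU a pod requests, in millicores, when
// each of its containers requests perContainerMilli.
func podRequestMilli(perContainerMilli, containers int64) int64 {
	return perContainerMilli * containers
}

// podsThatFit returns how many identical pods fit on a node with
// allocatableMilli of unreserved CPU, considering CPU requests only
// (no other workloads, memory, or system reservations).
func podsThatFit(allocatableMilli, perContainerMilli, containers int64) int64 {
	return allocatableMilli / podRequestMilli(perContainerMilli, containers)
}

func main() {
	// Hypothetical numbers for illustration; not the real test or VM values.
	const allocatableMilli = 2000
	const perContainerMilli = 500

	fmt.Println("1 container per pod:", podsThatFit(allocatableMilli, perContainerMilli, 1))
	fmt.Println("2 containers per pod:", podsThatFit(allocatableMilli, perContainerMilli, 2))
	fmt.Println("2 containers, halved request:", podsThatFit(allocatableMilli, perContainerMilli/2, 2))
}
```

With these placeholder numbers, a node that fits 4 single-container pods fits only 2 once a second container with the same request is added; halving the per-container request brings the per-pod total, and therefore the count, back to the single-container figure.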

Which issue(s) this PR fixes:

Fixes #104427

Special notes for your reviewer:

Another option considered was to increase the VM size of the test infrastructure. I assume others may also test on smaller VMs, so that change would break them too. This PR tests the same functionality without requiring an infrastructure change.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig testing
/sig autoscaling
/sig windows

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/windows Categorizes an issue or PR as relevant to SIG Windows. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 18, 2021
@k8s-ci-robot
Contributor

@jsturtevant: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 18, 2021
@jsturtevant
Contributor Author

/cc @viveksyngh

@k8s-ci-robot
Contributor

@jsturtevant: GitHub didn't allow me to request PR reviews from the following users: viveksyngh.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @viveksyngh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jsturtevant
Contributor Author

/assign @bskiba

@marosset
Contributor

I'm in favor of using the same values for Linux and Windows, but if anyone objects, we could keep the Linux values as the default and use these updated values only when --node-os-distro=windows is passed to e2e.test.
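A minimal sketch of the per-OS alternative described above: pick the lowered values only for Windows runs and keep the defaults otherwise. The `scaleCPUValues` type, the `valuesForNodeOSDistro` function, and the numbers are all hypothetical; a real change would read the distro from the e2e framework's parsed flags rather than take it as a plain string argument.

```go
package main

import "fmt"

// scaleCPUValues is a hypothetical grouping of the CPU knobs in the scale test.
type scaleCPUValues struct {
	perContainerRequestMilli int
	cpuBurstMilli            int
}

// valuesForNodeOSDistro returns the lowered values only when the run targets
// Windows nodes (the value given to --node-os-distro) and the default
// (Linux) values for any other distro.
func valuesForNodeOSDistro(nodeOSDistro string, defaults, windows scaleCPUValues) scaleCPUValues {
	if nodeOSDistro == "windows" {
		return windows
	}
	return defaults
}

func main() {
	// Placeholder numbers, not necessarily the values in the test.
	defaults := scaleCPUValues{perContainerRequestMilli: 500, cpuBurstMilli: 700}
	lowered := scaleCPUValues{perContainerRequestMilli: 250, cpuBurstMilli: 500}

	fmt.Println(valuesForNodeOSDistro("windows", defaults, lowered))
	fmt.Println(valuesForNodeOSDistro("ubuntu", defaults, lowered))
}
```

The trade-off is an extra code path that only Windows jobs exercise, which is why using one set of values for both operating systems is the simpler choice.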

@bskiba
Copy link
Member

bskiba commented Aug 19, 2021

@krzysied, @josephburnett would one of you be able to take a look?

@krzysied
Contributor

/assign

```diff
 targetCPUUtilizationPercent: 20,
 minPods: 1,
 maxPods: 5,
 firstScale: 3,
 firstScaleStasis: stasis,
-cpuBurst: 700,
+cpuBurst: 500,
```
Contributor


Why was every other value divided by 2 while this one was changed differently?
(I understand it works anyway because we set the max pods; I want to understand why "scaling" the test was not an option.)

Contributor Author


Maybe I misunderstood this value. I thought it was the amount of CPU put on the pods to force them to scale, and that as long as this value is above the CPU request it would cause scaling. Leaving it at 700 would work as well, but I thought I should dial it back to keep it in line with the other values.

> I want to understand why "scaling" the test was not an option

I don't understand what you mean by "scaling" the test.

Contributor Author


I confirmed this value is the amount of CPU to consume on the container:

```go
rc.ConsumeCPU(scaleTest.cpuBurst)
```

Contributor


I thought the burst number was calculated to trigger exactly 5 replicas (even if the max pods number were higher), but it seems it was previously tuned for 7 replicas, which is above the limit. My bad.

By "scaling" the test I meant dividing each CPU value by 2, so that the ratio is preserved.
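The replica arithmetic in this exchange can be checked with the HPA utilization formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), which for pods with identical requests reduces to total usage divided by (request * target%). This Go sketch assumes an original per-container request of 500m and a halved request of 250m; those two numbers are not shown in this thread. It also treats cpuBurst as the total CPU consumed across the containers the metric watches and ignores the HPA tolerance and stabilization windows.

```go
package main

import (
	"fmt"
	"math"
)

// rawReplicas applies the HPA utilization formula without min/max clamping:
// ceil(totalUsage / (request * targetPercent / 100)).
func rawReplicas(totalUsageMilli, requestMilli, targetPercent int) int {
	targetPerPodMilli := float64(requestMilli) * float64(targetPercent) / 100
	return int(math.Ceil(float64(totalUsageMilli) / targetPerPodMilli))
}

// desiredReplicas clamps rawReplicas to [minPods, maxPods], mirroring the
// minReplicas/maxReplicas bounds of an HPA spec.
func desiredReplicas(totalUsageMilli, requestMilli, targetPercent, minPods, maxPods int) int {
	n := rawReplicas(totalUsageMilli, requestMilli, targetPercent)
	if n < minPods {
		return minPods
	}
	if n > maxPods {
		return maxPods
	}
	return n
}

func main() {
	const target, minPods, maxPods = 20, 1, 5

	// Assumed original request (500m) with the original cpuBurst of 700m.
	fmt.Println("before:", rawReplicas(700, 500, target), "->", desiredReplicas(700, 500, target, minPods, maxPods))
	// Assumed halved request (250m) with the new cpuBurst of 500m.
	fmt.Println("after:", rawReplicas(500, 250, target), "->", desiredReplicas(500, 250, target, minPods, maxPods))
	// Halving cpuBurst as well (350m) would keep the original ratio.
	fmt.Println("scaled:", rawReplicas(350, 250, target), "->", desiredReplicas(350, 250, target, minPods, maxPods))
}
```

Under these assumptions the original values ask for 7 replicas, the merged values for 10, and a strict halving for 7; the maxPods cap of 5 makes all three settle at 5, which is why either choice passes.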

@krzysied
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2021
@jsturtevant
Contributor Author

/assign @bskiba
for approval

@bskiba
Member

bskiba commented Aug 20, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bskiba, jsturtevant, krzysied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2021
@k8s-ci-robot k8s-ci-robot merged commit 64e422d into kubernetes:master Aug 20, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 20, 2021
@marosset marosset added this to Done (v1.23) in SIG-Windows Oct 7, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
SIG-Windows
  
Done (v1.23)
Development

Successfully merging this pull request may close these issues.

New HPA ContainerResource e2e tests failing on Windows jobs
5 participants