[e2e] Increase the vCPU quota limit for EC2 instances #3002

Ankitasw · 2021-12-06T15:18:13Z

What type of PR is this?
/kind failing-test

What this PR does / why we need it:
This PR fixes the vCPU quota limit issues that occur when running the E2E tests upstream.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

squashed commits
includes documentation
adds unit tests
adds or updates e2e tests

Release note:

None

Ankitasw · 2021-12-06T15:45:02Z

/priority critical-urgent
/triage accepted

sedefsavas · 2021-12-06T16:18:36Z

test/e2e/shared/defaults.go

+		ServiceCode:         "ec2",
+		QuotaName:           "Running On-Demand G and VT instances",
+		QuotaCode:           "L-DB2E81BA",
+		DesiredMinimumValue: 32,


Do we need this many vCPUs? For one GPU node, 4 vCPUs are enough.

Your current vCPU limit is Running On-Demand All G instances = 8 vCPU. A g4dn.xlarge instance has a footprint of 4 vCPU. Your account can already run two g4dn.xlarge instances, based on your G instances limit of 8.

I have reduced it to 8 for now
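The arithmetic behind the quoted AWS message can be sketched as follows. This is a minimal illustration only: instancesWithinQuota is a hypothetical helper, not something in this PR, and the only inputs taken from the thread are the 4 vCPU footprint of g4dn.xlarge and the quota values 8 and 32.

```go
package main

import "fmt"

// instancesWithinQuota returns how many instances with the given vCPU
// footprint fit under a vCPU-based quota.
// Hypothetical helper for illustration; it is not part of the PR.
func instancesWithinQuota(quotaVCPU, vcpuPerInstance int) int {
	if vcpuPerInstance <= 0 {
		return 0 // guard against division by zero
	}
	return quotaVCPU / vcpuPerInstance
}

func main() {
	const g4dnXlargeVCPU = 4 // footprint quoted in the AWS message above

	// Quota of 8 vCPUs (the reduced value): prints 2
	fmt.Println(instancesWithinQuota(8, g4dnXlargeVCPU))
	// Quota of 32 vCPUs (the value first proposed in the diff): prints 8
	fmt.Println(instancesWithinQuota(32, g4dnXlargeVCPU))
}
```

With a single GPU node in the test, a quota of 8 already leaves room for a second g4dn.xlarge instance.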

Ankitasw · 2021-12-06T17:01:38Z

/test pull-cluster-api-provider-aws-e2e

sedefsavas · 2021-12-06T23:13:25Z

Test took 5 hours and timed out.

Ankitasw · 2021-12-07T08:55:35Z

Test took 5 hours and timed out.

Yeah, that is weird. Is it because we added a wrong value? I couldn't reproduce the problem in a local run, as it works fine with my AWS account.

Ankitasw · 2021-12-07T09:01:59Z

/retest

Ankitasw · 2021-12-07T09:17:33Z

@sedefsavas I see the log below:

Requesting service quota increase for ec2/Running On-Demand G and VT instances to 8

But again it's getting stuck at the same place while acquiring resources. I'm not sure how we should proceed.

sedefsavas · 2021-12-07T13:17:11Z

This is the problem:

classiclb: 20
ec2: 0
eip: 100
igw: 20
ngw: 20
vpc: 20

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/3002/pull-cluster-api-provider-aws-e2e/1467902093854511104/artifacts/initial-resource-quotas.yaml

Until we figure out what the problem is, let's not run the e2e test here. It takes 5 hours and blocks other PRs' e2e tests.
I'd suggest checking whether the change works locally by looking at your own AWS account to see if you can increase the EC2 instances quota.

sedefsavas · 2021-12-07T13:24:33Z

test/e2e/shared/defaults.go

@@ -154,6 +154,13 @@ func getLimitedResources() map[string]*ServiceQuota {
 		QuotaCode:           "L-E9E9831D",
 		DesiredMinimumValue: 20,
 	}
+
+	serviceQuotas["ec2"] = &ServiceQuota{


crit. serviceQuotas is a map, so the "ec2" entry is being overwritten here, as we already set it for regular instances above. We need different key names for normal instances and GPU ones.

sedefsavas · 2021-12-07T13:24:53Z

Also, we should update the initial resources file to use ec2-normal and ec2-GPU, since their quotas are different.
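The overwrite described in the review comment above can be reproduced with a small sketch. Assumptions are labeled: the ServiceQuota struct below mirrors only the four fields visible in the diff, the values for the regular-instance entry are placeholders (that entry is not shown in this thread), and the keys ec2-normal and ec2-GPU are taken from the suggestion above rather than from the merged code.

```go
package main

import "fmt"

// ServiceQuota mirrors only the fields visible in the diff above; the real
// type in test/e2e/shared may differ.
type ServiceQuota struct {
	ServiceCode         string
	QuotaName           string
	QuotaCode           string
	DesiredMinimumValue int
}

// normalQuota uses placeholder values: the regular-instance entry is not
// shown in this thread.
func normalQuota() *ServiceQuota {
	return &ServiceQuota{ServiceCode: "ec2", QuotaName: "regular instances (placeholder)", DesiredMinimumValue: 1}
}

// gpuQuota uses the values from the diff, with the minimum reduced to 8.
func gpuQuota() *ServiceQuota {
	return &ServiceQuota{
		ServiceCode:         "ec2",
		QuotaName:           "Running On-Demand G and VT instances",
		QuotaCode:           "L-DB2E81BA",
		DesiredMinimumValue: 8,
	}
}

// sharedKey assigns both quotas to the same key; the second assignment
// silently replaces the first.
func sharedKey() map[string]*ServiceQuota {
	quotas := make(map[string]*ServiceQuota)
	quotas["ec2"] = normalQuota()
	quotas["ec2"] = gpuQuota()
	return quotas
}

// distinctKeys keeps both quotas by giving each its own key.
func distinctKeys() map[string]*ServiceQuota {
	quotas := make(map[string]*ServiceQuota)
	quotas["ec2-normal"] = normalQuota()
	quotas["ec2-GPU"] = gpuQuota()
	return quotas
}

func main() {
	shared := sharedKey()
	fmt.Println(len(shared), shared["ec2"].QuotaName) // one entry: the GPU quota
	fmt.Println(len(distinctKeys()))                  // two entries
}
```

Go maps hold one value per key, so the compiler does not flag the second assignment; only the choice of distinct keys preserves both quotas.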

sedefsavas · 2021-12-08T08:55:13Z

Disabled for now: #3007

Ankitasw · 2021-12-08T17:56:51Z

I'd suggest to check if the change works locally by looking at your own AWS account to see if you can increase ec2 instances quota.

The current change works fine locally because the resource quota limit is sufficient in my AWS account; that's why I'm trying to trigger the test and check it in the PR. It looks like it should go through now.

Ankitasw · 2021-12-09T08:09:06Z

/test pull-cluster-api-provider-aws-e2e

Ankitasw · 2021-12-10T14:10:41Z

/retest

Ankitasw · 2021-12-13T09:50:11Z

/test pull-cluster-api-provider-aws-e2e

Ankitasw · 2021-12-13T14:30:03Z

/assign @sedefsavas for approval

k8s-ci-robot · 2021-12-13T14:30:05Z

@Ankitasw: GitHub didn't allow me to assign the following users: for, approval.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @sedefsavas for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sedefsavas · 2021-12-13T20:28:25Z

/lgtm
/approve

k8s-ci-robot · 2021-12-13T20:28:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sedefsavas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sedefsavas]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

[e2e] Increase the vCPU quota limit for EC2 instances

k8s-ci-robot requested review from dlipovetsky and richardcase December 6, 2021 15:18

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 6, 2021

Ankitasw force-pushed the e2e-gpu-test branch from 493016b to 479c309 Compare December 6, 2021 15:19

sedefsavas reviewed Dec 6, 2021

Ankitasw force-pushed the e2e-gpu-test branch from 479c309 to 4177f00 Compare December 6, 2021 16:45

Ankitasw requested a review from sedefsavas December 6, 2021 16:46

sedefsavas reviewed Dec 7, 2021

Ankitasw force-pushed the e2e-gpu-test branch from 4177f00 to b8c4f9e Compare December 8, 2021 17:50

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 8, 2021

Ankitasw requested a review from sedefsavas December 8, 2021 17:50

pydctw mentioned this pull request Dec 8, 2021

Fix AWSMachine controller trying to update rootVolume device name #3011

Merged

4 tasks

Ankitasw force-pushed the e2e-gpu-test branch from b8c4f9e to 49ec752 Compare December 9, 2021 07:53

Ankitasw changed the title ~~[e2e] Increase the vCPU quota limit for G instances~~ [e2e] Increase the vCPU quota limit for EC2 instances Dec 9, 2021

Ankitasw force-pushed the e2e-gpu-test branch from 49ec752 to e3025a1 Compare December 9, 2021 08:04

Ankitasw force-pushed the e2e-gpu-test branch 2 times, most recently from de3e511 to d1c333b Compare December 10, 2021 14:03

Ankitasw force-pushed the e2e-gpu-test branch 2 times, most recently from fb61838 to 50425d1 Compare December 13, 2021 09:48

[e2e] Increase the vCPU quota limit for EC2 instances

e2aa26f

Ankitasw force-pushed the e2e-gpu-test branch from 50425d1 to e2aa26f Compare December 13, 2021 14:07

k8s-ci-robot assigned sedefsavas Dec 13, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 13, 2021

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2021

k8s-ci-robot merged commit 0672892 into kubernetes-sigs:main Dec 13, 2021

k8s-ci-robot added this to the v1.x milestone Dec 13, 2021

Ankitasw deleted the e2e-gpu-test branch December 14, 2021 04:44

richardchen-db pushed a commit to databricks/cluster-api-provider-aws-1 that referenced this pull request Jan 14, 2023

Merge pull request kubernetes-sigs#3002 from Ankitasw/e2e-gpu-test

dc4f58f

[e2e] Increase the vCPU quota limit for EC2 instances
