Test Flake gke teardown failed; insufficient quota #1166

Closed
jlewi opened this issue Jul 10, 2018 · 2 comments

Comments

@jlewi
Contributor

jlewi commented Jul 10, 2018

Why are we getting a quota error during teardown?

+ [[ -n /secret/gcp-credentials/key.json ]]
+ gcloud auth activate-service-account --key-file=/secret/gcp-credentials/key.json
Activated service account credentials for: [kubeflow-testing@kubeflow-ci.iam.gserviceaccount.com]
+ sleep 6
+ gcloud deployment-manager deployments update e2e-5d68 --config=/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/kubeflow/docs/gke/configs-e2e-5d68/cluster-kubeflow.yaml --project=kubeflow-ci
The fingerprint of the deployment is ArMJmlF6CvTg0PGxQThXZQ==
Waiting for update [operation-1531259135671-570ac0cd73ad8-e53c9132-f92d2eed]...
...................failed.
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1531259135671-570ac0cd73ad8-e53c9132-f92d2eed]: errors:
- code: RESOURCE_ERROR
  location: /deployments/e2e-5d68/resources/e2e-5d68-gpu-pool-v1
  message: "{\"ResourceType\":\"container.v1.nodePool\",\"ResourceErrorCode\":\"403\",\"ResourceErrorMessage\":{\"code\":403,\"message\":\"Insufficient regional quota to satisfy request for resource: \\\"INSTANCE_GROUPS\\\". The request requires '1.0' and is short '1.0'. The regional quota is '100.0' with '0.0' available.\",\"status\":\"PERMISSION_DENIED\",\"statusMessage\":\"Forbidden\",\"requestPath\":\"https://container.googleapis.com/v1/projects/kubeflow-ci/zones/us-east1-d/clusters/e2e-5d68/nodePools\",\"httpMethod\":\"POST\"}}"
Traceback (most recent call last):
  File "/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/testing/py/kubeflow/testing/test_helper.py", line 96, in wrap_test
    test_case.test_func(test_case)
  File "/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/kubeflow/testing/run_with_retry.py", line 40, in run_with_retry
    run()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
@ankushagarwal
Contributor

Looks like deleting the k8s cluster does not delete the GCP resources that the ingress controller creates. We might have to delete the ingress object before deleting the cluster if we want to clean up gracefully.
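A minimal sketch of that ordering, assuming the Kubeflow ingress is named envoy-ingress in the kubeflow namespace (both names are assumptions, not taken from this issue) and that teardown ultimately goes through Deployment Manager as in the log above:

# Delete the ingress first so the GCE ingress controller can release the load-balancer
# resources (forwarding rules, backend services, instance groups) it created.
kubectl delete ingress envoy-ingress -n kubeflow --ignore-not-found

# Give the controller time to release the GCP resources before the cluster goes away.
sleep 120

# Then tear down the deployment as before.
gcloud deployment-manager deployments delete e2e-5d68 --project=kubeflow-ci --quiet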


@ankushagarwal
Contributor

After recent improvements this should no longer happen. Let's reopen it if we see this again.

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
* Refactor suggestion services folder structure

* Move base health service to internal

* Rename service and base service files