Test Flake gke teardown failed; insufficient quota #1166

Closed
jlewi opened this issue Jul 10, 2018 · 2 comments

Comments

@jlewi
Contributor

jlewi commented Jul 10, 2018

Why are we getting a quota error during teardown?

+ [[ -n /secret/gcp-credentials/key.json ]]
+ gcloud auth activate-service-account --key-file=/secret/gcp-credentials/key.json
Activated service account credentials for: [kubeflow-testing@kubeflow-ci.iam.gserviceaccount.com]
+ sleep 6
+ gcloud deployment-manager deployments update e2e-5d68 --config=/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/kubeflow/docs/gke/configs-e2e-5d68/cluster-kubeflow.yaml --project=kubeflow-ci
The fingerprint of the deployment is ArMJmlF6CvTg0PGxQThXZQ==
Waiting for update [operation-1531259135671-570ac0cd73ad8-e53c9132-f92d2eed]...
...................failed.
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1531259135671-570ac0cd73ad8-e53c9132-f92d2eed]: errors:
- code: RESOURCE_ERROR
  location: /deployments/e2e-5d68/resources/e2e-5d68-gpu-pool-v1
  message: "{\"ResourceType\":\"container.v1.nodePool\",\"ResourceErrorCode\":\"403\",\"ResourceErrorMessage\":{\"code\":403,\"message\":\"Insufficient regional quota to satisfy request for resource: \\\"INSTANCE_GROUPS\\\". The request requires '1.0' and is short '1.0'. The regional quota is '100.0' with '0.0' available.\",\"status\":\"PERMISSION_DENIED\",\"statusMessage\":\"Forbidden\",\"requestPath\":\"https://container.googleapis.com/v1/projects/kubeflow-ci/zones/us-east1-d/clusters/e2e-5d68/nodePools\",\"httpMethod\":\"POST\"}}"
Traceback (most recent call last):
  File "/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/testing/py/kubeflow/testing/test_helper.py", line 96, in wrap_test
    test_case.test_func(test_case)
  File "/mnt/test-data-volume/kubeflow-presubmit-kubeflow-e2e-gke-1162-aa46776-2441-5d68/src/kubeflow/kubeflow/testing/run_with_retry.py", line 40, in run_with_retry
    run()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
@ankushagarwal
Contributor

Looks like deleting the k8s cluster does not delete the GCP resources that the ingress controller creates. We might have to delete the ingress object before deleting the cluster if we want to clean up gracefully.
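A minimal sketch of that ordering, assuming the Kubeflow ingress is named envoy-ingress in the kubeflow namespace (both names are assumptions, not taken from this issue) and that teardown ultimately goes through Deployment Manager as in the log above:

# Delete the ingress first so the GCE ingress controller can release the load-balancer
# resources (forwarding rules, backend services, instance groups) it created.
kubectl delete ingress envoy-ingress -n kubeflow --ignore-not-found

# Give the controller time to release the GCP resources before the cluster goes away.
sleep 120

# Then tear down the deployment as before.
gcloud deployment-manager deployments delete e2e-5d68 --project=kubeflow-ci --quiet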


@ankushagarwal
Contributor

After recent improvements this should no longer happen. Let's reopen it if we see this again.

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
* Refactor suggestion services folder structure

* Move base health service to internal

* Rename service and base service files