E2E tests fail because of too many requests trying to get workflow status #324

Closed
jlewi opened this issue Mar 7, 2019 · 7 comments

Comments

@jlewi
Contributor

jlewi commented Mar 7, 2019

kubeflow/kubeflow#2557
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_kubeflow/2557/kubeflow-presubmit/6084/

The test is reported as a success, but it is still a failure.

The pod probably exited with a non-zero exit code.

Here are the pod logs:

INFO|2019-03-06T17:34:17|/src/kubeflow/testing/py/kubeflow/testing/argo_client.py|24| Workflow kubeflow-presubmit-unittests-2557-72ba595-6084-026a in namespace kubeflow-test-infra; phase=Succeeded
ERROR|2019-03-06T17:39:57|/src/kubeflow/testing/py/kubeflow/testing/run_e2e_workflow.py|285| Exception occurred: (429)
Reason: Too Many Requests
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'eaeba681-7910-4115-8020-141b359192ce', 'Content-Length': '43', 'X-Content-Type-Options': 'nosniff', 'Retry-After': '1', 'Date': 'Wed, 06 Mar 2019 17:39:57 GMT', 'Content-Type': 'text/plain; charset=utf-8'})
HTTP response body: Too many requests, please try again later.

Traceback (most recent call last):
  File "/src/kubeflow/testing/py/kubeflow/testing/run_e2e_workflow.py", line 271, in run
    status_callback=argo_client.log_status)
  File "/src/kubeflow/testing/py/kubeflow/testing/argo_client.py", line 98, in wait_for_workflows
    results = get_namespaced_custom_object_with_retries(namespace, n)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/src/kubeflow/testing/py/kubeflow/testing/argo_client.py", line 70, in get_namespaced_custom_object_with_retries
    GROUP, VERSION, namespace, PLURAL, name)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 697, in get_namespaced_custom_object
    (data) = self.get_namespaced_custom_object_with_http_info(group, version, namespace, plural, name, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/apis/custom_objects_api.py", line 797, in get_namespaced_custom_object_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
ApiException: (429)
Reason: Too Many Requests
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'eaeba681-7910-4115-8020-141b359192ce', 'Content-Length': '43', 'X-Content-Type-Options': 'nosniff', 'Retry-After': '1', 'Date': 'Wed, 06 Mar 2019 17:39:57 GMT', 'Content-Type': 'text/plain; charset=utf-8'})
HTTP response body: Too many requests, please try again later.

Could this be because we have too many workflows in the API server?
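
The traceback above shows the 429 response carrying a `Retry-After: 1` header, and the retry wrapper eventually re-raising the error. As a minimal sketch of one way to tolerate this, the polling call could back off using that header. The helper name `call_with_429_backoff` and its parameters are hypothetical and not part of kubeflow/testing; the sketch only assumes the raised exception exposes `status` and `headers` attributes, as `ApiException` in the Kubernetes Python client does.

```python
import time


def call_with_429_backoff(fn, max_attempts=5, default_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying when it raises an error whose .status is 429.

    Hypothetical helper for illustration; not part of kubeflow/testing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not a 429, or the final failed attempt.
            if getattr(exc, "status", None) != 429 or attempt == max_attempts:
                raise
            headers = getattr(exc, "headers", None) or {}
            try:
                # Retry-After is given in seconds in the log above.
                delay = float(headers.get("Retry-After", default_delay))
            except (TypeError, ValueError):
                # Fall back if the header is missing or not a plain number.
                delay = default_delay
            # Double the server-suggested delay on each successive attempt.
            sleep(delay * 2 ** (attempt - 1))
```

It could wrap the status lookup, for example `call_with_429_backoff(lambda: get_namespaced_custom_object_with_retries(namespace, n))`. This only reduces pressure from the client side; it does not help if the API server is overloaded for other reasons.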

@jlewi
Contributor Author

jlewi commented Mar 7, 2019

 kubectl get wf -l workflow_template=kfctl_test
error: the server doesn't have a resource type "wf"

@jlewi
Contributor Author

jlewi commented Mar 7, 2019

I think we're going to have to delete the namespace. It seems there are too many workflows, and the API server is unresponsive even when deleting workflows individually or by label.
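
To keep workflows from piling up again, old ones could be garbage-collected periodically. The following is only a sketch of the selection step, under the assumption that workflow objects are available as dictionaries with `metadata.name` and `metadata.creationTimestamp` (the shape of the `items` list returned when listing custom objects). The function name `select_stale_workflows` is hypothetical, and the actual deletion call is left out.

```python
from datetime import datetime, timedelta


def select_stale_workflows(items, max_age, now):
    """Return the names of workflows created more than max_age before now.

    Hypothetical helper for illustration; timestamps are assumed to be in
    the UTC form Kubernetes uses, e.g. 2019-03-06T17:34:17Z.
    """
    stale = []
    for item in items:
        meta = item.get("metadata", {})
        created = datetime.strptime(meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ")
        if now - created > max_age:
            stale.append(meta["name"])
    return stale
```

A caller would pass something like `max_age=timedelta(days=7)` and `now=datetime.utcnow()`, then delete the returned names in small batches so the cleanup does not itself trigger rate limiting.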

@jlewi
Contributor Author

jlewi commented Mar 7, 2019

See: #53 (comment)

@jlewi
Contributor Author

jlewi commented Mar 7, 2019

Delete the namespace:

kubectl delete namespace kubeflow-test-infra

@jlewi
Contributor Author

jlewi commented Mar 7, 2019

kubectl create namespace kubeflow-test-infra
ks apply kubeflow-ci -c argo
ks apply kubeflow-ci -c debug-worker
ks apply kubeflow-ci -c nfs-external

Follow the instructions to create the secret.

  • Also create a secret named kubeflow-testing-credentials
  • Follow the instructions to create a secret containing a GitHub token.

The namespace is back up.

@stale

stale bot commented Jul 25, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Aug 1, 2019

This issue has been closed due to inactivity.

@stale stale bot closed this as completed Aug 1, 2019