e2e-pilot-noauth-v1alpha3-v2 cleanup failures #13249

Closed
howardjohn opened this issue Apr 11, 2019 · 4 comments

@howardjohn
Member

Example failure: https://circleci.com/gh/istio/istio/383750?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

2019-04-11T17:14:10.583226Z	info	Safe to ignore resource not found errors in kubectl delete -f /tmp/istio.e2e.981470280/yaml/istio.yaml
2019-04-11T17:14:10.583240Z	info	Deleting namespace istio-system
2019-04-11T17:18:02.020641Z	error	Failed to delete namespace istio-system after 180 seconds
2019-04-11T17:18:02.020748Z	info	Cleanup complete

Basically, we delete the resources in the istio.yaml file, which include the istio-system namespace. We then try to delete the istio-system namespace again, which fails. We report this error as "failed to delete after 180 seconds", as if it were timing out, but it actually fails because the namespace is already deleted.

@howardjohn
Member Author

Actually, I don't think this is the issue, but it is still failing. It is unclear why istio-system is not getting deleted within 180s.

@ozevren ozevren added this to the 1.2 milestone Apr 18, 2019
@geeknoid geeknoid modified the milestones: 1.3, 1.2 May 12, 2019
@pitlv2109
Member

So we're not deleting istio-system (and other resources) twice? The log seems to suggest it...

@philrud
Member

philrud commented Jun 12, 2019

Failure to delete the namespace does not result in a test failure and is not the cause of this test failure. According to the code, the error is reported, but the cleanup phase still succeeds and the test finishes successfully. I tested that behavior by forcing a tight timeout in PR #14766; the following CircleCI job succeeds even after failing to delete a namespace: https://circleci.com/gh/istio/istio/441730
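For readers who want to picture that behavior, here is a minimal Go sketch of a cleanup step that reports a namespace deletion timeout without failing. It is not the actual e2e framework code: `waitForNamespaceGone`, `cleanup`, `errTimeout`, and the `exists` and `logf` callbacks are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"time"
)

// errTimeout is a hypothetical sentinel error used only in this sketch.
var errTimeout = errors.New("timed out waiting for namespace deletion")

// waitForNamespaceGone polls exists() until it reports false or the
// timeout elapses. exists is a stand-in for whatever check the real
// framework performs (assumption: it returns true while the namespace
// is still present).
func waitForNamespaceGone(exists func() bool, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if !exists() {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// cleanup logs a deletion failure but does not propagate it, so the
// caller still sees a successful cleanup phase.
func cleanup(namespace string, exists func() bool, timeout, interval time.Duration,
	logf func(format string, args ...interface{})) error {
	if err := waitForNamespaceGone(exists, timeout, interval); err != nil {
		logf("Failed to delete namespace %s after %v seconds", namespace, timeout.Seconds())
	}
	logf("Cleanup complete")
	return nil
}
```

With this shape, a namespace that never goes away produces the same two log lines seen in the report (an error followed by "Cleanup complete"), while the returned error stays nil.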

The root cause of the test failure is the following:

2019-04-11T17:03:52.685392Z	info	Running command kubectl -n istio-system rollout status deployments/istio-galley --kubeconfig=
2019-04-11T17:13:52.041125Z	info	Command error: exit status 1
2019-04-11T17:13:52.041262Z	info	Deployment rollout ends after [9m59.373021199s] with err [deployments/istio-pilot in namespace istio-system failed]
2019-04-11T17:13:52.041279Z	error	Failed to deploy Istio.
2019-04-11T17:13:52.041290Z	error	Failed to complete Init. Error deployments/istio-pilot in namespace istio-system failed

Further logs suggest that some pods are stuck in "Init" and that Galley never started responding to health checks:

2019-04-11T17:13:52.674662Z	info	Command output: 
NAME                                                              READY     STATUS     RESTARTS   AGE
istio-citadel-78f7c84f79-tm669                                    1/1       Running    0          10m
istio-cleanup-secrets-b2229dd7e2a615e1b356fbf7ad79d3b9-8tv6c      1/1       Running    0          10m
istio-egressgateway-97c964788-5n55t                               0/1       Init:0/1   0          10m
istio-galley-5c549ffc5-9s8wj                                      0/1       Running    0          10m
istio-ingressgateway-598cfb7c77-5zs9k                             0/1       Init:0/1   0          10m
istio-pilot-6f8f56ccf4-vcxbr                                      1/2       Running    0          10m
istio-policy-6dc5dcddf6-cq6rd                                     2/2       Running    0          10m
istio-policy-6dc5dcddf6-fps6n                                     2/2       Running    0          10m
istio-security-post-install-b2229dd7e2a615e1b356fbf7ad79d3ln24s   1/1       Running    0          10m
istio-sidecar-injector-57d4b66555-xrkdg                           0/1       Running    0          10m
istio-telemetry-6998578f7c-79jvw                                  2/2       Running    0          10m
prometheus-758fdbcdb7-n8bfz                                       0/1       Init:0/1   0          10m
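To pick out the unhealthy pods from a listing like the one above, a small helper can compare the two numbers in the READY column. This is only a sketch: `notReadyPods` is a hypothetical name, and it assumes the whitespace-separated layout shown above (NAME first, READY second, in `ready/total` form).

```go
package main

import (
	"strconv"
	"strings"
)

// notReadyPods returns the names of pods whose READY column shows fewer
// ready containers than total containers. It assumes the column layout
// of the listing above: NAME, READY ("ready/total"), STATUS, ...
func notReadyPods(output string) []string {
	var names []string
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Skip blank lines and the header row.
		if len(fields) < 2 || fields[0] == "NAME" {
			continue
		}
		parts := strings.SplitN(fields[1], "/", 2)
		if len(parts) != 2 {
			continue
		}
		ready, errReady := strconv.Atoi(parts[0])
		total, errTotal := strconv.Atoi(parts[1])
		if errReady != nil || errTotal != nil {
			continue
		}
		if ready < total {
			names = append(names, fields[0])
		}
	}
	return names
}
```

Applied to the listing above, this would flag the egressgateway, galley, ingressgateway, pilot, sidecar-injector, and prometheus pods. Note that it looks only at the READY column, so a finished job pod reported as 0/1 would also be flagged; filtering on STATUS would be needed to exclude those.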

This looks very much like #12431.
Closing this as a duplicate. This failure is more than 2 months old; CircleCI resources have apparently been adjusted since then, and we are still verifying whether the infrastructure issues have been resolved.

@philrud philrud closed this as completed Jun 12, 2019
@philrud
Member

philrud commented Jun 12, 2019

duplicate of #12431
