Flaky Test: CircleCI e2e-pilot-auth-v1alpha3-v2 #11929

Closed
esnible opened this issue Feb 21, 2019 · 12 comments

Comments

@esnible
Contributor

esnible commented Feb 21, 2019

Different symptom than #7863

https://circleci.com/gh/istio/istio/332939?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link failed with:

2019-02-21T03:46:01.566970Z	info	Running command kubectl delete -n istio-system -f /tmp/istio.e2e.160502611/rule-503test-destinationrule-c.yaml931198571yaml --kubeconfig=
2019-02-21T03:46:02.092243Z	info	Command output: 
destinationrule "destination-rule-c" deleted
2019-02-21T03:46:07.092469Z	info	Running command kubectl delete -n istio-system -f testdata/networking/v1alpha3/ingressgateway.yaml --kubeconfig=
2019-02-21T03:46:07.254367Z	info	Command output: 
gateway "istio-ingressgateway" deleted
--- FAIL: TestIngressGateway503DuringRuleChange (63.37s)
	ingressgateway_test.go:258: Got non 200 status code while changing rules: map[200:499 503:1]
=== RUN   TestVirtualServiceMergingAtGateway
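
The failure message in the log reports a tally of HTTP status codes: 499 responses with 200 and one with 503. A minimal sketch of that kind of check follows. The helper names `tallyStatusCodes` and `checkAllOK` are hypothetical and are not taken from ingressgateway_test.go; the real test may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// tallyStatusCodes counts how many times each HTTP status code was observed.
// Hypothetical helper for illustration; not the actual Istio test code.
func tallyStatusCodes(codes []int) map[int]int {
	counts := make(map[int]int)
	for _, c := range codes {
		counts[c]++
	}
	return counts
}

// checkAllOK returns an error if any observed status code is not 200.
func checkAllOK(counts map[int]int) error {
	for code := range counts {
		if code != http.StatusOK {
			return fmt.Errorf("got non-200 status code while changing rules: %v", counts)
		}
	}
	return nil
}

func main() {
	// Simulate 499 successful responses and a single 503, as in the log above.
	codes := make([]int, 0, 500)
	for i := 0; i < 499; i++ {
		codes = append(codes, http.StatusOK)
	}
	codes = append(codes, http.StatusServiceUnavailable)

	if err := checkAllOK(tallyStatusCodes(codes)); err != nil {
		fmt.Println("FAIL:", err)
	}
}
```

With a strict check like this, a single transient 503 out of 500 requests is enough to fail the test, which would be consistent with the failure appearing only occasionally.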
@esnible
Contributor Author

esnible commented Feb 21, 2019

@clyang82
Member

https://circleci.com/gh/istio/istio/337405?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link failed as well. I think it has the same cause: the status-code map was map[200:499 503:1].

@clyang82
Member

After restarting the CircleCI job, the test passes now. Maybe it was fixed by someone else, or the failure only occurs occasionally.

@ozevren
Contributor

ozevren commented Apr 3, 2019

https://prow.k8s.io/view/gcs/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-k8s-latest/1204

Networking folks, can you please reassign appropriately to get it fixed?

@geeknoid geeknoid added this to the 1.2 milestone May 12, 2019
@costinm costinm removed their assignment May 17, 2019
@duderino
Contributor

I unassigned @rshriram and @andraxylia because we have started a large effort to deflake our tests, and I want someone to see this as available and pick it up.

@yangminzhu yangminzhu self-assigned this Jun 12, 2019
@yangminzhu
Contributor

The test failed 3 times (https://k8s-testgrid.appspot.com/istio-presubmits-master#circleci-e2e-pilot-auth-v1alpha3-v2&width=5) in the past 2 weeks, and all of the failures were caused by the same error (https://prow.k8s.io/view/gcs/istio-circleci/presubmit/master/e2e-pilot-auth-v1alpha3-v2/439664):

  Warning  Unhealthy              15s               kubelet, default-d4a1c8e2-2676-47b9-a877-c54822075b36  Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  Unhealthy              14s               kubelet, default-d4a1c8e2-2676-47b9-a877-c54822075b36  Readiness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

This seems very similar to #12431

@yangminzhu
Contributor

e2e-pilot-noauth-v1alpha3-v2 also seems to have failed once, about a week ago, for the same reason (readiness/liveness probes failed for galley and sidecar-injector): https://k8s-testgrid.appspot.com/istio-postsubmits-master#circleci-e2e-pilot-noauth-v1alpha3-v2&width=20

@ozevren
Contributor

ozevren commented Jun 14, 2019

@yangminzhu
I opened #14847 to track that issue. Apart from that, are you seeing any other test failures?

@yangminzhu
Contributor

yangminzhu commented Jul 10, 2019

The test has now been moved from CircleCI to Prow: https://k8s-testgrid.appspot.com/istio-postsubmits-master#pilot-e2e-envoyv2-v1alpha3&width=20

It failed several times in postsubmits, but each failure was due to a specific flaky test, for example: https://prow.k8s.io/view/gcs/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/3292, https://prow.k8s.io/view/gcs/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/3057, https://prow.k8s.io/view/gcs/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/2972

Sometimes it failed in an early build step, which also seems unrelated to this issue: https://prow.k8s.io/view/gcs/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/3160

@ozevren let me know if you find anything in Galley.

I will monitor the test for a while longer to see whether the same failure happens on Prow. If it does not, we can close this issue, which would suggest the failure was related to CircleCI networking.

@sushicw
Contributor

sushicw commented Jul 22, 2019

@yangminzhu Any update on this?

@yangminzhu
Contributor

The test failed once in the past week:

=== RUN   TestGateway_HTTPIngress/HTTPIngressGateway
2019-07-19T00:03:13.026404Z	error	client request error command failed: "2019-07-19T00:03:12.994627Z\tfatal\tError Get http://istio-ingressgateway.istio-system/c: dial tcp 10.23.248.239:80: connect: connection refused\n\ncommand terminated with exit code 1\n" exit status 1 for http://istio-ingressgateway.istio-system/c in t from primary cluster
2019-07-19T00:03:13.026488Z	info	request counts map[]
2019-07-19T00:03:14.209973Z	error	client request error command failed: "2019-07-19T00:03:14.202873Z\tfatal\tError Get http://istio-ingressgateway.istio-system/c: dial tcp 10.23.248.239:80: connect: connection refused\n\ncommand terminated with exit code 1\n" exit status 1 for http://istio-ingressgateway.istio-system/c in t from primary cluster

https://storage.googleapis.com/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/3967/build-log.txt

It seems to be caused by the readiness probe failing for the ingress gateway:

  Normal   Pulled     12m                 kubelet, gke-gke-071819-9hf4abyc1-default-pool-5e3992b1-6kc1  Successfully pulled image "gcr.io/istio-testing/proxyv2:77368ec56a2480e27898ec7048bb36f4bbfc5997-e2e_pilotv2_v1alpha3"
  Normal   Created    12m                 kubelet, gke-gke-071819-9hf4abyc1-default-pool-5e3992b1-6kc1  Created container
  Normal   Started    12m                 kubelet, gke-gke-071819-9hf4abyc1-default-pool-5e3992b1-6kc1  Started container
  Warning  Unhealthy  12m (x17 over 12m)  kubelet, gke-gke-071819-9hf4abyc1-default-pool-5e3992b1-6kc1  Readiness probe failed: HTTP probe failed with statuscode: 503

https://storage.googleapis.com/istio-prow/logs/istio-pilot-e2e-envoyv2-v1alpha3-master/3967/artifacts/pilot-test-ca508d25d3d343318facd53809/istio-ingressgateway-7d85847846-jrf9r_describe.log

I'm not sure whether this is related to this issue, to Galley, or is just a transient network error.

@howardjohn howardjohn modified the milestones: 1.2, Nebulous Future Sep 13, 2019
@yangminzhu
Contributor

yangminzhu commented Sep 17, 2019

The HTTPIngressGateway test is no longer flaky for now, but there seems to be an error deploying the egressgateway:
https://prow.k8s.io/view/gcs/istio-prow/pr-logs/directory/pilot-e2e-envoyv2-v1alpha3_istio/447

2019-09-17T18:34:24.239536Z	info	Command error: exit status 1
2019-09-17T18:34:24.239709Z	info	Deployment rollout ends after [9m58.969082191s] with err [deployment.extensions/istio-egressgateway in namespace istio-system failed]
2019-09-17T18:34:24.239725Z	error	Failed to deploy Istio.
2019-09-17T18:34:24.239741Z	error	Failed to complete Init. Error deployment.extensions/istio-egressgateway in namespace istio-system failed

and the egressgateway is failing with:

2019-09-17T02:06:56.160426Z	info	Envoy proxy is NOT ready: failed to get server info: unknown field "hot_restart_version" in envoy_admin_v2alpha.CommandLineOptions
2019-09-17T02:06:58.160504Z	info	Envoy proxy is NOT ready: failed to get server info: unknown field "hot_restart_version" in envoy_admin_v2alpha.CommandLineOptions

Filed #17151 to fix the broken tests.
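
The `unknown field "hot_restart_version"` message above looks like a strict-decoding failure: the readiness check appears to parse Envoy's server info against a schema that does not include that field. The sketch below reproduces the same class of error with Go's standard `encoding/json` package. It is only an analogy under stated assumptions: the `commandLineOptions` struct, the payload, and both helper functions are hypothetical, and the real agent may use a different (protobuf-based) decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// commandLineOptions is a hypothetical, trimmed-down stand-in for an older
// schema that does not declare the "hot_restart_version" field.
type commandLineOptions struct {
	Concurrency int `json:"concurrency"`
}

// decodeStrict rejects any JSON field that the target struct does not declare.
func decodeStrict(payload string) (commandLineOptions, error) {
	var opts commandLineOptions
	dec := json.NewDecoder(strings.NewReader(payload))
	dec.DisallowUnknownFields()
	err := dec.Decode(&opts)
	return opts, err
}

// decodeLenient silently ignores fields that the struct does not declare.
func decodeLenient(payload string) (commandLineOptions, error) {
	var opts commandLineOptions
	err := json.Unmarshal([]byte(payload), &opts)
	return opts, err
}

func main() {
	// Illustrative payload only; not a real Envoy admin response.
	payload := `{"concurrency": 2, "hot_restart_version": true}`

	if _, err := decodeStrict(payload); err != nil {
		fmt.Println("strict decode failed:", err)
	}

	opts, err := decodeLenient(payload)
	fmt.Println("lenient decode:", opts.Concurrency, err)
}
```

If the cause is indeed a schema mismatch, a strict decoder fails as soon as the newer binary emits a field the older schema lacks, while a lenient decoder tolerates it at the cost of silently dropping the field.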
