istio-integ-k8s-tests timeout on Prow #13280
Comments
The mTLS health check test itself has been passing stably for a while, judging from testgrid: https://testgrid.k8s.io/istio-postsubmits#integration-k8s-tests&show-stale-tests= As a solution for the mTLS health check test: someone from the community recently helped add support for per-annotation app prober rewrite, so we can get rid of the special Istio installation. I can send a PR to update that. Question: why can't I see it running on testgrid now? I don't see it being skipped in the code. Any ideas? Just curious. |
We might not be able to do the same thing for the SDS-related tests, @lei-tang, unless those workflows support per-deployment test cases. |
I assume they don't show up in testgrid because the tests get killed instead of failing? That should be fixed, though. |
I ran the failed tests on a GKE cluster a few times and they all passed. My thoughts:
|
You can look at the build log to see the arguments used:
Looks like -p 1 is passed |
@incfly When running the failed tests on a GKE cluster, the Istio deployment only takes around 2 minutes, far less than 30 minutes. I am not sure about the Istio deployment time on Prow. For the integration tests that require a helm template to configure the Istio deployment, I think a new Istio deployment is needed because Istio components (e.g. node agent, Citadel) need to be restarted with the new configuration, and testing in a new Istio deployment also ensures that a previous test does not contaminate the current test. |
@howardjohn Thanks. "-p 1" is used, so in theory these tests will run one by one without interfering with each other. However, I have a few more speculations about the 30-minute timeout failures in the Prow test environment.
|
Looking at logs, Istio is successfully deployed. I am pretty sure it gets stuck on this: istio/tests/integration/security/healthcheck/mtls_healthcheck_test.go Lines 60 to 73 in 52a304e
I think we can change these to use Galley, maybe? Not sure if that will fix the problem, though. These tests have been failing consistently for days; it seems we just never hit the timeout window until recently. I think we should disable these tests until they are fixed. |
PR to disable: #13305 |
Now with #13305 I see the other 2 security tests also timing out: https://k8s-gubernator.appspot.com/build/istio-prow/pr-logs/pull/istio_istio/13305/istio-integ-k8s-tests/6735 Not sure what it is about the security tests that makes them different. |
I am trying to debug why these tests fail in the Prow environment while succeeding when run with the command "go test integration-test-directory-name" on a GKE cluster. But the Prow test scripts (e.g., prow/istio-integ-k8s-tests.sh) fail to run from a desktop terminal (they fail at getting a Boskos resource). |
Pretty sure the security tests are not the issue; rather, they happen to be the ones that fail because they are run last. See https://k8s-gubernator.appspot.com/build/istio-prow/pr-logs/pull/istio_istio/13348/istio-integ-k8s-tests/6845?log#log I ran only the security tests, with some more debug statements added (which shouldn't change anything), and the tests pass. |
I am almost certain the issue is with the new locality LB tests recently added. Supporting evidence:
For now let's disable the locality test. It is broken anyway. |
Tests are failing after 2 hours. Example: https://k8s-gubernator.appspot.com/build/istio-prow/pr-logs/pull/istio_istio/13182/istio-integ-k8s-tests/6701
istio.io/istio/tests/integration/security/sds_citadel_flow and istio.io/istio/tests/integration/security/healthcheck are each killed after running for 30 minutes.