e2e should retry if service is not available #118256
The e2e framework uses active loops to wait for certain async operations. These loops need to retry on some errors and fail on others. For functions that depend on some operation having happened, the apiserver may return 503 errors until that specific service is available, so we should retry on those too.
Change-Id: Ib3d194184f6385b9d3d151c7055f27c97c21c3ff
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the ...
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @pohly
@aojea: GitHub didn't allow me to request PR reviews from the following users: joestringer. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
This seems safe. The worst that can happen is that a test which encounters such an error when it is permanent keeps retrying until it times out and then fails with a report which mentions the error.
/lgtm
/approve
LGTM label has been added. Git tree hash: 621493692b8aac7d84133fe9dea40b3ecd198693
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aojea, pohly
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
…6-upstream-release-1.27 Automated cherry pick of #118256: e2e framework retry on Service unavailable errors
/kind bug
/kind flake
Found in this job https://github.com/cilium/cilium/actions/runs/5070707417?pr=25653
The test uses VerifyPods to check the pods' state
kubernetes/test/e2e/network/service.go
Line 1816 in 6911d3b
which uses
kubernetes/test/e2e/framework/pod/resource.go
Lines 98 to 100 in 6911d3b
which calls podRunningMaybeResponding with checkResponding set to true
kubernetes/test/e2e/framework/pod/resource.go
Lines 107 to 120 in 6911d3b
And here there is something I don't really understand: WaitForPodsResponding uses the apiserver proxy to connect to the pod :/. It does not seem the best way to do it, but it seems to have been there for a long time.
kubernetes/test/e2e/framework/pod/wait.go
Lines 545 to 582 in 6911d3b
But anyway, that is a separate discussion. The problem is that the Eventually loop does not retry on this specific error:
kubernetes/test/e2e/framework/pod/wait.go
Lines 621 to 625 in 6911d3b
It seems fair game to retry on IsServiceUnavailable, since we are just waiting for the proxy-to-pod service to become available.