Crash Report: Error while performing Kubernetes API operation Pod exec: WebsocketError #5729
Comments
Thanks for providing the crash report! This is related to the availability of the Kubernetes server and intermittent networking issues, but we should retry those kinds of requests. We'll look into making it a bit more reliable.
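Retrying "those kinds of requests" usually means wrapping the API call in a loop that retries only transient errors and backs off between attempts. The sketch below shows that pattern in isolation. It is not Garden's implementation: `withRetry`, its `isRetriable` predicate, the attempt limit and the delay values are all hypothetical choices made for illustration.

```typescript
// Hypothetical helper, not Garden's API: retry an async operation while
// `isRetriable` classifies the error as transient, with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  isRetriable: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op()
    } catch (err) {
      // Give up on permanent errors, or once the attempt budget is spent.
      if (attempt >= maxAttempts || !isRetriable(err)) {
        throw err
      }
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delay = baseDelayMs * 2 ** (attempt - 1)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}

// Usage: an operation that fails twice with a transient error, then succeeds.
let calls = 0
const flaky = async (): Promise<string> => {
  calls++
  if (calls < 3) {
    throw new Error("WebsocketError\nUnexpected server response: 500")
  }
  return "ok"
}

withRetry(flaky, (e) => e instanceof Error && e.message.includes("WebsocketError"), 5, 10)
  .then((result) => console.log(result, calls)) // ok 3
```

Keeping the retry decision in a separate predicate means a change to how errors are formatted only affects one small, testable function.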
Fixes #5729. We failed to retry `WebsocketError` during Pod exec operations because the regex "WebsocketError: Unexpected server response" no longer matched after we refactored `toKubernetesError`, which changed the formatting. Let's retry whenever the name `WebsocketError` appears in the message.
@salotz thank you for reporting this! We found that this was a regression: we stopped retrying websocket errors after we reformatted an error message. PR #5755 adds a fix that should make Garden more reliable for you by retrying intermittent errors like this one. We also added a regression test case.
**What this PR does / why we need it**: We failed to retry `WebsocketError` during Pod exec operations because the regex "WebsocketError: Unexpected server response" no longer matched after we refactored `toKubernetesError`, which changed the formatting. Let's retry whenever the name `WebsocketError` appears in the message. **Which issue(s) this PR fixes**: Fixes #5729 **Special notes for your reviewer**:
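The regression is easy to reproduce in isolation: a pattern that hard-codes `"WebsocketError: "` stops matching as soon as the separator after the error name changes, while a check for the bare name keeps working. This is a minimal sketch, not the actual Garden code; `shouldRetryOld` and `shouldRetryNew` are hypothetical names, and the reformatted message is assumed to put a line break after `WebsocketError`, as in the error output quoted in the follow-up PR.

```typescript
// Hypothetical predicates illustrating the fix; not Garden's real functions.

// Old approach: match one specific formatting of the message. This silently
// stops matching if the text after the error name is formatted differently.
const oldRetryPattern = /WebsocketError: Unexpected server response/

function shouldRetryOld(err: Error): boolean {
  return oldRetryPattern.test(err.message)
}

// New approach: retry whenever the error name appears anywhere in the message.
function shouldRetryNew(err: Error): boolean {
  return err.message.includes("WebsocketError")
}

// Assumed shape of the message after the error formatting was refactored.
const reformatted = new Error(
  "Error while performing Kubernetes API operation Pod exec: WebsocketError\nUnexpected server response: 500",
)

console.log(shouldRetryOld(reformatted)) // false
console.log(shouldRetryNew(reformatted)) // true
```

The looser check trades precision for robustness: any message mentioning `WebsocketError` is retried, which is acceptable when retries are bounded.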
**What this PR does / why we need it**: Try to choose a running Pod for execing into. Without this, users might get the following error persistently if the first pod happened to be crashlooping:

```
run.test✖ Failed processing Run type=kubernetes-exec name=test (took 12.35 sec). This is what happened:
────────────────────────────────────────────────────────────────────────────────────────────────────
Error while performing Kubernetes API operation Pod exec: WebsocketError
Unexpected server response: 500
────────────────────────────────────────────────────────────────────────────────────────────────────
```

**Which issue(s) this PR fixes**: Fixes #5729 **Special notes for your reviewer**:
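One way to "choose a running Pod" is to filter the candidate pods before exec. The sketch below assumes a trimmed-down Pod shape whose field names follow the Kubernetes Pod API (`status.phase`, `status.containerStatuses[].state`); the helper names `isExecReady` and `pickPodForExec` are hypothetical and do not claim to match the PR's implementation. Note that a crashlooping pod can still report phase `Running` while its container sits in a `waiting` state with reason `CrashLoopBackOff`, so the sketch also checks container states.

```typescript
// Minimal subset of the Kubernetes Pod shape; everything else is omitted.
interface ContainerStatus {
  name: string
  state?: { running?: object; waiting?: { reason?: string }; terminated?: object }
}

interface Pod {
  metadata: { name: string }
  status?: { phase?: string; containerStatuses?: ContainerStatus[] }
}

// Usable for exec: phase is Running AND every container is currently running.
function isExecReady(pod: Pod): boolean {
  const statuses = pod.status?.containerStatuses ?? []
  return (
    pod.status?.phase === "Running" &&
    statuses.length > 0 &&
    statuses.every((s) => s.state?.running !== undefined)
  )
}

// Prefer an exec-ready pod; otherwise fall back to the first pod, which is
// no worse than the previous behaviour of always picking the first one.
function pickPodForExec(pods: Pod[]): Pod | undefined {
  return pods.find(isExecReady) ?? pods[0]
}

const pods: Pod[] = [
  {
    metadata: { name: "api-0" },
    status: {
      phase: "Running",
      containerStatuses: [{ name: "main", state: { waiting: { reason: "CrashLoopBackOff" } } }],
    },
  },
  {
    metadata: { name: "api-1" },
    status: { phase: "Running", containerStatuses: [{ name: "main", state: { running: {} } }] },
  },
]

console.log(pickPodForExec(pods)?.metadata.name) // api-1
```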
Crash report
Error message
What did you do?
When running in CI on GCP GKE, I occasionally get this error on different kinds of operations.
I'm guessing this has something to do with the Kubernetes server being unavailable?
Your environment
gardendev/garden-gcloud:0.13.23-alpine
Frequency
Sporadic
Workaround
Just restarting/retrying.
I will try updating.
Additional context
If nothing else, an explanation of what causes this error would help me, as I do think it's the Kubernetes server; I'm just not sure how to fix that. Since I'm using GKE, I find it strange that the API server would go down for more than 10 minutes.