Crash Report: Error while performing Kubernetes API operation Pod exec: WebsocketError #5729

Closed
salotz opened this issue Feb 13, 2024 · 2 comments · Fixed by #5755 or #5782

salotz commented Feb 13, 2024

Crash report

Error message

Error while performing Kubernetes API operation Pod exec: WebsocketError
Unexpected server response: 500
────────────────────────────────────────────────────────────────────────────────────────────────────
✔ test.test-ui-container    → Done (took 0.3 sec)
ℹ test.test-ui-container    → [verbose] Test type=kubernetes-exec name=test-ui-container is ready.
An error occurred while running step 4/5 — Run tests on the containers.
[debug] 
[debug] Error: 1 requested test action(s) failed!
    at handleProcessResults (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:360870:23)
    at TestCommand.action (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:830768:16)
    at async runStepCommand (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:831096:12)
    at async WorkflowCommand.action (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:830947:34)
    at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:360356:30
    at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:360144:20
    at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:848242:26
    at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:848417:33
    at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:360144:20
    at async GardenCli.run (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:848415:32)
Error type: runtime
Wrapped errors:
⮑  Error: Test type=kubernetes-exec name=test-bovid-container failed: Error: Error while performing Kubernetes API operation Pod exec: WebsocketError
    Unexpected server response: 500
        at ProcessTaskNode.complete (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:793684:32)
        at GraphSolver.completeTask (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:794213:29)
        at GraphSolver.processNode (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:794142:18)
    Error type: graph
    Wrapped errors:
    ⮑  Error: Error while performing Kubernetes API operation Pod exec: WebsocketError
        Unexpected server response: 500
            at toKubernetesError (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:702486:12)
            at execWithRetry (file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:750228:27)
            at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
            at async file:///root/.local/share/garden/1703250148-E0Ztcl6.r/rollup/garden.mjs:750239:53
        Error type: kubernetes

What did you do?

Running in CI on GCP GKE, occasionally I get this error on different kinds of operations.

I'm guessing this is something to do with the Kubernetes server being unavailable?

Your environment

gardendev/garden-gcloud:0.13.23-alpine

Frequency

Sporadic

Workaround

Just restarting and retrying.

I will try updating.

Additional context

If anything, an explanation of what causes this error would help me, as I do think it's the Kubernetes server. I'm just not sure how to fix that. Since I'm using GKE, I find it strange that the API server would go down for >10 minutes.

@twelvemo
Collaborator

Thanks for providing the crash report! This is related to the availability of the Kubernetes server and intermittent networking issues, but we should retry those kinds of requests. We'll look into making it a bit more reliable.

stefreak added a commit that referenced this issue Feb 20, 2024
Fixes #5729

We failed retrying WebsocketError during Pod exec operations, because the regex "WebsocketError: Unexpected server response" did not match
anymore after we refactored toKubernetesError, because the formatting
changed.

Let's retry if just the name WebsocketError appears in the message.
@stefreak
Member

@salotz thank you for reporting this! We found that this was a regression where we stopped retrying websocket errors after we reformatted an error message.

The PR #5755 adds a fix that should now make Garden more reliable for you by retrying intermittent errors like this one. We also added a regression test case.
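
For readers who want to see why the old pattern stopped matching, here is a minimal sketch. It is not Garden's actual code: `oldPattern`, `relaxedPattern`, and `shouldRetry` are hypothetical names, and the message string is simply copied from the crash report in this issue, where the error name and the "Unexpected server response" detail end up on separate lines.

```typescript
// Hypothetical sketch; these names are illustrative, not Garden's implementation.

// Pattern quoted in the commit message: expects the error name and the
// detail text on the same line, separated by ": ".
const oldPattern = /WebsocketError: Unexpected server response/

// Relaxed pattern: match whenever the error name appears anywhere in the message.
const relaxedPattern = /WebsocketError/

// Message as shown in the crash report: name and detail are on separate lines.
const reportedMessage =
  "Error while performing Kubernetes API operation Pod exec: WebsocketError\n" +
  "Unexpected server response: 500"

// Returns true if any of the given patterns matches the error message.
function shouldRetry(message: string, patterns: RegExp[]): boolean {
  return patterns.some((pattern) => pattern.test(message))
}

const matchedByOld = shouldRetry(reportedMessage, [oldPattern])
const matchedByRelaxed = shouldRetry(reportedMessage, [relaxedPattern])
console.log({ matchedByOld, matchedByRelaxed })
```

Matching on the name alone is deliberately loose: it keeps working if the message layout changes again, at the cost of also retrying any other error whose text happens to mention `WebsocketError`.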

github-merge-queue bot pushed a commit that referenced this issue Feb 20, 2024
**What this PR does / why we need it**:

We failed retrying `WebsocketError` during Pod exec operations, because
the regex "WebsocketError: Unexpected server response" did not match
anymore after we refactored `toKubernetesError`, because the formatting
changed.

Let's retry if just the name `WebsocketError` appears in the message.

**Which issue(s) this PR fixes**:

Fixes #5729

**Special notes for your reviewer**:
github-merge-queue bot pushed a commit that referenced this issue Feb 26, 2024
**What this PR does / why we need it**:
Try to choose a running Pod for execing into. Without this, users might
get the following error persistently if the first pod happened to be
crashlooping:

```
run.test✖ Failed processing Run type=kubernetes-exec name=test (took 12.35 sec). This is what happened:

────────────────────────────────────────────────────────────────────────────────────────────────────
Error while performing Kubernetes API operation Pod exec: WebsocketError

Unexpected server response: 500
────────────────────────────────────────────────────────────────────────────────────────────────────

```
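
The idea of preferring a running Pod can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `PodSummary` and `choosePodForExec` are hypothetical, and the `ready` flag is assumed to be derived from container readiness, since a crashlooping Pod can still report the `Running` phase.

```typescript
// Hypothetical sketch; the type and function names are not taken from Garden.
interface PodSummary {
  name: string
  phase: string // Pod phase, e.g. "Running", "Pending", "Failed"
  ready: boolean // assumption: true only if all containers report ready
}

// Prefer a Pod that is running and ready, then any running Pod,
// and only fall back to the first Pod in the list if nothing better exists.
function choosePodForExec(pods: PodSummary[]): PodSummary | undefined {
  const runningAndReady = pods.filter((p) => p.phase === "Running" && p.ready)
  if (runningAndReady.length > 0) {
    return runningAndReady[0]
  }
  const running = pods.filter((p) => p.phase === "Running")
  if (running.length > 0) {
    return running[0]
  }
  return pods[0]
}

const examplePods: PodSummary[] = [
  { name: "app-0", phase: "Running", ready: false }, // e.g. crashlooping
  { name: "app-1", phase: "Running", ready: true },
]
const chosen = choosePodForExec(examplePods)
console.log(chosen ? chosen.name : "no pods")
```

With this ordering, an exec against `examplePods` goes to `app-1` even though the unhealthy `app-0` comes first in the list.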

**Which issue(s) this PR fixes**:

Fixes #5729

**Special notes for your reviewer**: