
client: improve error messaging on crash #44739

Merged: 1 commit into moby:master on Jan 26, 2023

Conversation

@nicks (Contributor) commented Jan 3, 2023

- What I did
improve error messaging on crash

- How I did it
Report the original HTTP body in the error message, rather than the JSON decoder error parsing the HTTP body

- How to verify it

Repro steps:

  • Run Docker Desktop
  • Run `docker run busybox tail -f /dev/null`
  • Run `pkill "Docker Desktop"`

Expected:
An error message that indicates that Docker Desktop is shutting down.

Actual:
An error message that looks like this:

```
error waiting for container: invalid character 's' looking for beginning of value
```

Here's an example:

docker/for-mac#6575 (comment)

After this change, you get an error message like:

```
error waiting for container: copying response body from Docker: unexpected EOF
```

which is a bit more explicit.

- Description for the changelog
improve client error messaging on engine crash

- A picture of a cute animal (not mandatory but encouraged)
(image attached: PXL_20210126_144548001)

```go
//
// If there's a JSON parsing error, read the real error message
// off the body and send it to the client.
_, _ = io.ReadAll(stream)
```
A reviewer (Member) commented:

Maybe wrap stream with io.LimitReader (N=a few MiB)

@nicks (Contributor, Author) replied Jan 3, 2023:

Ya, that's a good callout. I don't think we even need multiple MB; multiple KB will do. Unless there's a reason to preserve error messages a few MB long? Added!

@thaJeztah (Member) commented:

This is a bit similar to containerd/containerd#4672, and a similar discussion about "what to return" on the PR to address that; containerd/containerd#7564

Some questions / quick thoughts I have;

  • Is this only for this specific endpoint? Or would other endpoints have the same issue?
  • For the Docker Desktop proxy: effectively it's proxying the API, but for (some?) endpoints not returning errors in the expected format. I had a very cursory look at the code; wondering if that proxy should be able to handle these errors better;
    • if the proxy itself is aware of "shutting down" or "starting", perhaps it would be good to return a specific status code (but a well-formed JSON error, if the response is expected to be JSON)
  • I share concerns with Akihiro as to reading the whole body (although possibly not problematic for the /wait endpoint); mostly if other endpoints would be involved, we wouldn't know what to expect in the response; reading all into memory may not be the best of things.

@thaJeztah thaJeztah added area/api area/cli status/2-code-review kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. labels Jan 3, 2023
@nicks (Contributor, Author) commented Jan 3, 2023

@thaJeztah

Re: is this only for this specific endpoint?

Maybe? The /wait endpoint might be the only one that does streaming in this way?

re: perhaps it would be good to have a specific status code:

The current endpoint assumes that the HTTP status code is sent before the crash. I like this post from the GRPC people on this general problem: https://carlmastrangelo.com/blog/why-does-grpc-insist-on-trailers, it's more of a protocol design challenge on how you model streaming RPCs on top of HTTP.

re: a well-formed JSON error

Yeah, I think it would be reasonable to change the proxy to return a JSON-encoded error. But AFAICT, we would have to change the response type. The current response type doesn't have a good way to distinguish between "Errors from the underlying container" and "errors talking to the docker engine"
https://github.com/nicks/moby/blob/e727cd52a42d9eb64e6bb906ca03cec0ad7be083/api/server/router/container/container_routes.go#L362
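For illustration only, a well-formed JSON error on this endpoint might look like the sketch below. The `StatusCode`/`Error` field names follow the wait response's general shape, but whether the proxy could emit this without an API change is exactly the open question here:

```json
{
  "StatusCode": -1,
  "Error": {
    "Message": "Docker Desktop is shutting down"
  }
}
```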

what do you think?

re: reading the whole body

Ya, that's a good callout. I don't think we even need multiple MB; multiple KB will do. Changed!

@neersighted (Member) left a comment:


Tiny nit, and the commits should be squashed down as one just amends the other. Otherwise, LGTM, thank you!

client/container_wait.go (review thread, outdated; resolved)
Repro steps:
- Run Docker Desktop
- Run `docker run busybox tail -f /dev/null`
- Run `pkill "Docker Desktop"`

Expected:
An error message that indicates that Docker Desktop is shutting down.

Actual:
An error message that looks like this:

```
error waiting for container: invalid character 's' looking for beginning of value
```

here's an example:

docker/for-mac#6575 (comment)

After this change, you get an error message like:

```
error waiting for container: copying response body from Docker: unexpected EOF
```

which is a bit more explicit.

Signed-off-by: Nick Santos <nick.santos@docker.com>
@nicks (Contributor, Author) commented Jan 26, 2023

@neersighted thanks! squashed them!

@nicks nicks requested review from neersighted and removed request for AkihiroSuda January 26, 2023 20:52
@tianon tianon merged commit 13f4262 into moby:master Jan 26, 2023
@nicks nicks deleted the nicks/invalid-character branch January 27, 2023 01:21
@thaJeztah (Member) commented:
Oh! For some reason this one slipped off my list.

> Maybe? The /wait endpoint might be the only one that does streaming in this way?

I'll create a tracking issue to look into that. Possibly we need a utility for handling some of this (so that it can be handled consistently).

> re: perhaps it would be good to have a specific status code:
>
> The current endpoint assumes that the HTTP status code is sent before the crash.

🤦 Yeah, I didn't think that suggestion through. I lost sight of the fact that we're dealing with a streaming endpoint, so a 200 header would already have been sent (which has also been a source of much confusion). Definitely another area where we need better utilities / an SDK for handling these (it's too easy to use the client "the wrong way").

> I like this post from the GRPC people on this general problem: https://carlmastrangelo.com/blog/why-does-grpc-insist-on-trailers, it's more of a protocol design challenge on how you model streaming RPCs on top of HTTP.

Heh this made me chuckle:

"While the point of this post is not to point fingers" (points finger)

Definitely interesting post yes. TIL about Trailers (or perhaps I knew about them but it got lost in memory).

> The current response type doesn't have a good way to distinguish between "Errors from the underlying container" and "errors talking to the docker engine"

I think it should be possible to disambiguate. While the docker engine API itself may not have that ability (there is no real "upstream", well, to the extent that the "backend" in the daemon is the upstream), in the (Docker Desktop) "proxy" situation, it's the proxy that makes the connection with the actual API; if that fails, it could send the right header (and error format) instead of sending an HTTP 200 "ok".

But again here, we must improve (make it easier to) handle the JSON-stream responses, as the "actual" error will be in that response; handling that is currently rather quirky (to say the least), and there may be a lot of duplication around handling those.

@nicks (Contributor, Author) commented Jan 31, 2023

Ya, I'd definitely support some sort of machinery to handle this parsing consistently.

Labels
area/api area/cli kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. process/cherry-picked status/2-code-review
Projects: none yet

Development: no linked issues

5 participants