ContainerWait on remove: don't stuck on rm fail #34999
Currently, if a container removal has failed for some reason, any client waiting for removal (e.g.
This commit addresses that by allowing
Note that this feature is only available for API version >= 1.34. The old behavior (i.e. do not return from wait on removal error) is being kept for current and older clients, except for the added empty
docker-cli will need a separate commit to bump the API to 1.34 and to show the returned error, if any. The current version of docker-cli will work as is.
I don’t know, I’m not sure this is right. It’s a breaking change for sure. The existing behavior is correct in that the container is not removed.
The specific client hang issue could be fixed by checking the status after some interval (i.e., is it in a “Dead” state).
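For illustration, the polling approach suggested here could look roughly like the sketch below. `get_status` is a hypothetical callable (not part of any Docker client library) that returns the container state string, or `None` once the container no longer exists; the interval and timeout values are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_removed_or_dead(
    get_status: Callable[[], Optional[str]],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the container is gone, is "Dead", or the timeout expires.

    get_status is a hypothetical helper: it returns the container state
    string, or None when the container no longer exists.
    Returns one of "removed", "dead", "timeout".
    """
    waited = 0.0
    while True:
        status = get_status()
        if status is None:
            return "removed"  # container is gone: removal succeeded
        if status.lower() == "dead":
            return "dead"  # removal failed and left the container behind
        if waited >= timeout:
            return "timeout"
        sleep(interval)
        waited += interval
```

The client has to pick an interval and a timeout, which is the main cost of this approach compared with having the wait call itself report the failure.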
I thought about it, and tried all combinations of old/new client/daemon, and it works either the same or better. Can you think of any other client that can break so I can check it? Do you mean that with this change, a client can no longer assume the container is removed? This can be fixed in the client (check the container status after getting the response, or just check for
To me, the current API of not reporting a removal error is sort of useless/broken. If we are interested in "container was removed" event, we are most probably also interested in the "container removal failure", too.
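As a rough sketch of what client-side handling might look like: the snippet below assumes the wait response body is a JSON object with a `StatusCode` field and an optional `Error` object carrying a `Message`. Those field names are an assumption made for this example and are not quoted from the thread.

```python
import json
from typing import Optional, Tuple


def parse_wait_response(body: str) -> Tuple[int, Optional[str]]:
    """Return (status_code, error_message) from a wait response body.

    Assumed shape (an illustration, not taken from this thread):
      {"StatusCode": 0, "Error": {"Message": "..."}}
    error_message is None when no error was reported.
    """
    data = json.loads(body)
    status = int(data.get("StatusCode", 0))
    error = data.get("Error") or {}  # tolerate a missing or null Error
    message = error.get("Message") or None
    return status, message
```

A client that gets a non-`None` message back knows the removal failed and can print the error instead of waiting forever.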
In case we want to be extra careful about API, we can deprecate this one and add another one, which works like the one in this PR.
Surely, but this would look like a kludge, a workaround for a broken API. The current way is simple and elegant. Polling for status is not.
I have updated the patch to enable old-style behavior for API version < 1.34. Tested with unmodified docker-cli (which hangs on removal error as it was before), as well as patched client which works as expected (prints an error).
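A minimal sketch of the version gate described here, written in Python for brevity rather than as the daemon's actual code. The function names are hypothetical. The one point worth showing is that API versions need to be compared numerically, component by component, since a plain string comparison would rank "1.4" above "1.34".

```python
def parse_api_version(version: str) -> tuple:
    """Turn "1.34" into (1, 34) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def reports_removal_error(client_version: str, minimum: str = "1.34") -> bool:
    """True if a client at client_version should get the new behavior.

    Hypothetical helper: clients below the minimum keep the old-style
    behavior, clients at or above it receive the removal error.
    """
    return parse_api_version(client_version) >= parse_api_version(minimum)
```

With this gate, a 1.29 or 1.30 client keeps the old-style behavior and a 1.34 client gets the error.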
OK, I now have three working versions of this patch, all implementing slightly different behavior for older clients in case of removal failure. They are:
Once again, the above are just variations of how to handle old clients when removal has failed. New clients will receive an error as described in the commit message, and can handle it.
We need to choose one. Number 1 is surely an incompatibility in the API. Number 2 keeps the old (bad) API as is; the problem is that clients are stuck waiting for something that might never happen. Number 3 is a somewhat ugly way to report an error while keeping the API compatible.
Please share your opinion. Since it fixes the client hang, I'd love this to move forward.
I tested all three variations:
1. docker-cli from 17.05-ce (client API 1.29 == using legacy wait)
It does not exit with a non-zero status if the engine closes the connection. I tested it just by stopping the daemon while having a few
So, we have a bug in the legacy wait code, but it's not the subject of this PR (although it made me think about looking into the engine's legacy wait implementation, as it suffers from the same issue and I didn't realize it).
2. docker-cli from 17.06-ce (client API 1.30)
IMHO it's pretty good (and way better than just being stuck).
3. docker-cli with my patch (API 1.34)
Docker versions 1.13 through 17.05 don’t use the container wait API for `docker run ...` anyway. They use the Events API, as indicated by the error messages.…
On Oct 17, 2017, at 21:18, Kirill Kolyshkin wrote: TL;DR: option number 3 works as intended; need to see the legacyWait, too (but it's a separate issue)