container: add a timeout for deletion #47713
base: master
Conversation
Hmm, I'm not sure I understand the failure here. It looks like it tries to submit and kill an invalid exec request; I don't think this code should be invoked on that path?
Commenting from my phone, but 👋 👋👋 good seeing you! Hope you're doing well
Thanks, I am! This PR is not urgent, but it would be nice to understand the failures here.
Force-pushed from 8fa8abc to 988e948
On a heavily loaded host, we were experiencing long container(d) deletion times:

    containerd[8907]: time="2024-03-25T13:47:45.938479195Z" level=debug msg="event forwarded" ns=moby topic=/tasks/exit type=containerd.events.TaskExit
    # our control plane logic deletes the successfully exited container via
    # the docker API, and...
    containerd[8907]: time="2024-03-25T13:47:47.202055216Z" level=debug msg="failed to delete task" error="context deadline exceeded" id=a618057629b35e3bfea82d5ce4cbb057ba979498496428dfe6935a1322b94add

Before 4bafaa0 ("Refactor libcontainerd to minimize c8d RPCs"), when this happens the docker API reports a 255 exit code and no error:

    0a7ddd027c0497d5a titus-executor-[900884]: Processing msg from a docker: main container exited with code 255

which is especially confusing. After 4bafaa0, the behavior has changed to report the container's real exit code, although there is still a hard-coded timeout after which containerd will (try to) stop cleaning up. We would like to wait for this cleanup, so let's add a user-configurable DeleteTimeout here.

Reported-by: Hechao Li <hli@netflix.com>
Signed-off-by: Tycho Andersen <tandersen@netflix.com>
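A rough sketch of the idea in the commit message: bound the wait for containerd's task delete by a per-container setting instead of a hard-coded deadline. The names below (`deleteTask`, `defaultDeleteTimeout`, the `del` callback) are illustrative stand-ins, not moby's actual libcontainerd API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// defaultDeleteTimeout stands in for the hard-coded deadline the commit
// message complains about; the real value lives in libcontainerd.
const defaultDeleteTimeout = 30 * time.Second

// deleteTask bounds the containerd task delete with the container's
// configured timeout (in seconds) when one is set, falling back to the
// previous hard-coded default otherwise.
func deleteTask(ctx context.Context, deleteTimeout *int, del func(context.Context) error) error {
	timeout := defaultDeleteTimeout
	if deleteTimeout != nil {
		timeout = time.Duration(*deleteTimeout) * time.Second
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	if err := del(ctx); err != nil {
		return fmt.Errorf("failed to delete task: %w", err)
	}
	return nil
}

func main() {
	timeoutSecs := 120 // e.g. a generous value for a heavily loaded host
	err := deleteTask(context.Background(), &timeoutSecs, func(ctx context.Context) error {
		return nil // stand-in for the real containerd task Delete RPC
	})
	fmt.Println(err)
}
```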
Force-pushed from 988e948 to 9463a5e
LGTM; would also appreciate input from other maintainers, though.
This doesn't seem like the right place to me. You mention you want to wait for delete; what exactly do you mean by that, and how are you handling it today?
Labels        map[string]string // List of labels set to this container
StopSignal    string            `json:",omitempty"` // Signal to stop a container
StopTimeout   *int              `json:",omitempty"` // Timeout (in seconds) to stop a container
DeleteTimeout *int              `json:",omitempty"` // Timeout (in seconds) to wait for a container to be deleted
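If DeleteTimeout is persisted with the container config, its resolution could mirror how StopTimeout works today: the per-container value wins when set, otherwise a daemon-wide default applies. A minimal sketch under that assumption (`GetDeleteTimeout` and `daemonDefault` are hypothetical names, not part of this diff):

```go
package container

// Config is trimmed to the fields relevant to this diff.
type Config struct {
	StopTimeout   *int `json:",omitempty"`
	DeleteTimeout *int `json:",omitempty"`
}

// GetDeleteTimeout resolves the effective delete timeout in seconds:
// the per-container value wins, otherwise the daemon-wide default.
func (c *Config) GetDeleteTimeout(daemonDefault int) int {
	if c.DeleteTimeout != nil {
		return *c.DeleteTimeout
	}
	return daemonDefault
}
```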
I need to have a closer look at the full proposal (sorry; really behind on my backlog); I was mostly curious whether this is persisted to disk, and whether it needs to be, or whether it only needs to be passed when performing a delete request.
For StopTimeout and StopSignal, the timeout relates to what's running in the container (e.g. a database that you want to give time to shut down gracefully).
For DeleteTimeout, I wonder whether it's really a property of the container.
Discussed this with other maintainers yesterday, looking at where the existing behavior can be problematic: right now we have a timeout on containerd task deletion, which a loaded node could hit. The real problem is that if the timeout is hit and you then remove the (dockerd) container, the task is leaked and you effectively end up with a zombie task, shim process, and containerd container. Some observations in the existing code:
Do you want the daemon to wait indefinitely until the task has been deleted so as not to leak c8d tasks, or do you want the delete API call to block until the task has been deleted so that your orchestrator knows when cleanup is complete? If it's the latter, this PR is insufficient, as there are timeouts on the waits for the container to leave the Running state in the API handler code paths. I'm going to assume you want the former. (Note that the daemon only transitions a container from the Running state to Stopped after the task has been deleted, or after the delete times out.)

Leaking c8d tasks on heavily loaded systems (or with runtimes that are slow to delete tasks) is bad for everyone. I would ideally want to find a solution which does not require any configuration, by actually waiting indefinitely. The daemon has a separate event queue for each (c8d) container, so blocking indefinitely while handling one event will not block events for other containers from being handled. The trouble is, the Docker container's state is only transitioned from Running to Stopped after the delete operation returns, and until then the container continues to be reported as Running in the Engine API. And a container can't be restarted until after the old c8d task has been deleted, so we can't just transition the container to Stopped immediately upon receiving the c8d TaskExit event.
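A sketch of the "wait indefinitely" alternative floated above: retry the task delete with capped backoff until it succeeds or the daemon shuts down, and only then transition the container to Stopped. Names are hypothetical; this is not moby's actual control flow.

```go
package sketch

import (
	"context"
	"time"
)

// deleteTaskForever retries the containerd task delete until it succeeds
// or the surrounding context is cancelled (e.g. daemon shutdown), rather
// than giving up after a fixed deadline and leaking the task.
func deleteTaskForever(ctx context.Context, del func(context.Context) error) error {
	backoff := time.Second
	for {
		if err := del(ctx); err == nil {
			return nil // safe to transition the container to Stopped now
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // daemon is shutting down; stop retrying
		case <-time.After(backoff):
		}
		if backoff < 30*time.Second {
			backoff *= 2 // capped exponential backoff between attempts
		}
	}
}
```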
@cpuguy83 I checked into that: only other events for the same Container ID are blocked. Each Container ID gets its own separate event queue inside libcontainerd, and each event callback is invoked from a new goroutine. The existing implementation, where the event handler callback blocks until …
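A simplified sketch of the dispatch pattern described in that comment: one serialized queue per container ID, so a handler that blocks only delays later events for the same container. Types and names are illustrative, not the actual libcontainerd code.

```go
package sketch

import "sync"

// dispatcher serializes events per container ID: events for the same ID
// run in order on a dedicated worker goroutine, while events for
// different IDs never wait on each other.
type dispatcher struct {
	mu     sync.Mutex
	queues map[string]chan func()
}

func newDispatcher() *dispatcher {
	return &dispatcher{queues: make(map[string]chan func())}
}

func (d *dispatcher) dispatch(containerID string, handler func()) {
	d.mu.Lock()
	q, ok := d.queues[containerID]
	if !ok {
		q = make(chan func(), 16)
		d.queues[containerID] = q
		go func() { // one worker per container ID
			for h := range q {
				h() // a handler that blocks here delays only this container's events
			}
		}()
	}
	d.mu.Unlock()
	q <- handler // may block if this container's backlog is full
}
```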