runtime-rs: improving io performance using dragonball's vsock fd passthrough #7483
Can one of the admins verify this patch? |
@frezcirno I assumed that there should be some modification to the dragonball vmm, but it turned out to be not the case? |
Hi, the main functionality of passfd was already merged into dragonball in openanolis/dragonball-sandbox PR #278, before the original dragonball repo was archived and moved into kata. Moreover, another precondition of this PR, which fixes a passfd type issue, is on the way, so this PR should be merged after that.
@frezcirno another precondition of this PR has already been merged in dragonball-sandbox. Do you mean that it needs to be ported to kata-containers (now that dragonball is part of kata)? |
Mostly looks good. Nice PR, thanks! A few comments are inline.
src/runtime-rs/crates/runtimes/virt_container/src/container_manager/container.rs
src/runtime-rs/crates/runtimes/virt_container/src/container_manager/process.rs
Sorry, but I don't find this fix in openanolis/dragonball-sandbox or in kata-containers. Is it merged? I searched for the fix change and didn't find a similar commit or PR?
Yes, this PR won't work normally without that fix to dragonball. |
@frezcirno oops, I was looking at a different PR. Then could you help port another precondition of this PR to kata-containers ? |
6f3132e
to
e823eea
Compare
src/runtime-rs/crates/runtimes/virt_container/src/container_manager/container.rs
src/runtime-rs/crates/runtimes/virt_container/src/container_manager/process.rs
b0a082c
to
33716fa
Compare
71b3b6b
to
813583b
Compare
813583b
to
308cc5c
Compare
308cc5c
to
0471f83
Compare
Hi @frezcirno I took a look and found that there are no other problems. There are two questions you need to figure out:
|
Hi @justxuewei. The problem is that the The In the nginx image, For the CI, maybe we could use another nginx image without using the procfs? AFAIK, the nginx now supports specifying Thanks, |
0ee571e
to
0ff8dd3
Compare
Hi @frezcirno, thanks for your efforts to investigate this issue. It's ok to me overall, but there is one question remaining: Is this feature enabled by default? If yes, it means some images, like aforementioned |
src/runtime-rs/crates/runtimes/virt_container/src/container_manager/io/passfd_io.rs
/test |
93e19fd
to
93e6007
Compare
/test |
/test-arm |
Hi @frezcirno. Your pull request possibly breaks CI, since I raised another PR without changes here. I observed that it gets stuck at one of You could deploy a K8s cluster locally to do the test. I have a document about how to create a test cluster. You can ping me ( |
@frezcirno Please rebase atop the latest main, which contains a CI fix. |
Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Two toml options, `use_passfd_io` and `passfd_listener_port` are introduced to enable and configure dragonball's vsock fd passthrough io feature. This commit is a preparation for vsock fd passthrough io feature. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Currently in the kata container, every io read/write operation requires an RPC request from the runtime to the agent. This involves copying data into and out of RPC requests and responses, which adds high overhead. To solve this issue, this commit utilizes vsock fd passthrough, a newly introduced feature in the Dragonball hypervisor. This feature allows other host programs to pass a file descriptor to the Dragonball process, to be used directly as the backend of an ordinary hybrid vsock connection. The runtime-rs now utilizes this feature for container process io. It opens the stdin/stdout/stderr FIFOs from containerd and passes them to Dragonball, after which it no longer handles process io, eliminating the need for an RPC for each io read/write operation. In passfd io mode, the agent uses the vsock connections as the child process's stdin/stdout/stderr, eliminating the need for a pipe to pump data (in non-tty mode). Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
When one end of the connection closes, the epoll event will be triggered forever. We should close the connection and kill the connection. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
When container exits, the agent should clean up the term master fd, otherwise the fd will be leaked. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
On Linux, when a FIFO is opened for reading and there are no writers, the reader continuously receives the HUP event. This is a problem when creating containers in detached mode, because the stdin FIFO writer is closed after the container is created. In passfd io mode, open the stdin FIFO with O_RDWR|O_NONBLOCK to avoid the HUP event. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
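The open mode described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the code from the PR: the function name `open_stdin_fifo` is hypothetical, and the `O_NONBLOCK` constant is hard-coded to its Linux x86_64/aarch64 value as an assumption (real code would normally take it from the `libc` crate).

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Assumption: value of O_NONBLOCK on Linux x86_64 and aarch64.
/// Other architectures may use a different value.
const O_NONBLOCK: i32 = 0o4000;

/// Hypothetical helper: open a stdin FIFO with O_RDWR | O_NONBLOCK.
/// The descriptor itself counts as a writer, so the FIFO never drops to
/// zero writers and the reader does not see a permanent HUP.
pub fn open_stdin_fifo(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .write(true) // read + write together yields O_RDWR
        .custom_flags(O_NONBLOCK)
        .open(path)
}
```

With this mode, a read on an empty FIFO fails with `WouldBlock` instead of reporting end-of-file, so an event loop only wakes up when data actually arrives.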
Partially fix some issues related to container io detach and attach. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Support the hybrid fd passthrough mode when passing a pipe fd. The caller can specify that the connection is kept even when the pipe peer closes, and the connection can be re-obtained by re-opening the pipe. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We want the io connection to stay connected when containerd closes the io pipe, so that the io stream can be attached to again later. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In passfd io mode, when not using a terminal, the stdout/stderr vsock streams are directly used as the stdout/stderr of the child process. These streams are non-blocking by default. The stdout/stderr of the process should be blocking, otherwise the process may encounter EAGAIN error when writing to stdout/stderr. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
This patch uses a biased select to avoid stdin data loss in case of CloseStdinRequest. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
This patch adds the O_NONBLOCK flag when opening the stdout and stderr FIFOs to avoid blocking. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Fix rustfmt and clippy warnings detected by CI. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Linux forbids opening an existing socket through /proc/<pid>/fd/<fd>, so images that rely on the special files /dev/stdout(stderr) or /proc/self/fd/1(2) fail to boot in passfd io mode, where the stdout/stderr of a container process is a vsock socket. For backward compatibility, a pipe is introduced between the process and the socket, and its write end is set as the stdout/stderr of the container process instead of the socket. The agent forwards data from the pipe to the socket. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
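The forwarding step can be illustrated as a plain copy loop from the pipe to the socket. This is a simplified, synchronous sketch and not the agent's actual code (the PR description indicates the agent uses tokio tasks); the name `forward_output` is hypothetical.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copy everything the container process writes into
/// the pipe over to the vsock-backed stream. Returns once the pipe reports
/// EOF, which happens when every write end has been closed.
pub fn forward_output<R: Read, W: Write>(
    mut pipe_read_end: R,
    mut socket: W,
) -> io::Result<u64> {
    let copied = io::copy(&mut pipe_read_end, &mut socket)?;
    socket.flush()?;
    Ok(copied)
}
```

Because the process only ever sees a pipe, opening /dev/stdout or /proc/self/fd/1 inside the container works as it does with ordinary pipe-based io.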
Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
There is a race condition in the agent's HVSOCK_STREAMS hashmap, where a consumer may try to take a stream before it has been inserted into the hashmap. This patch adds simple retry logic to the stream consumer to alleviate this issue. Fixes: kata-containers#6714 Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
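A retry of this kind might look like the following sketch. It uses a blocking `Mutex<HashMap<..>>` and `thread::sleep` for brevity, whereas the agent is asynchronous. The function name, the attempt count and the delay are all assumptions made for illustration, not values taken from the patch.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: take the stream registered for `hostport`,
/// retrying a few times in case the accept loop has not inserted it yet.
pub fn take_stream_with_retry<S>(
    streams: &Mutex<HashMap<u32, S>>,
    hostport: u32,
    max_attempts: u32,
    delay: Duration,
) -> Option<S> {
    for attempt in 0..max_attempts {
        // The lock is held only for the lookup, never across the sleep.
        if let Some(stream) = streams.lock().unwrap().remove(&hostport) {
            return Some(stream);
        }
        // Do not sleep after the final attempt.
        if attempt + 1 < max_attempts {
            thread::sleep(delay);
        }
    }
    None
}
```

As the commit message says, this only alleviates the race: a bounded retry can still time out if the stream arrives later than the retry window.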
568017a
to
222de4f
Compare
/test |
Currently, in the kata container, every io read/write operation requires an RPC request from the runtime to the agent.
This process involves unnecessary data copying into/from an RPC request/response, which introduces high overhead. In scenarios where containers have multiple process streams, this results in poor performance and additional CPU consumption.
To solve this issue, this PR proposes utilizing the vsock fd passthrough #7585, a newly introduced feature in the Dragonball hypervisor. This feature allows other host programs to pass a file descriptor to the Dragonball process directly as the backend of an ordinary hybrid vsock connection. The detail is depicted in the following diagram:
Changes Made in this PR:
- Two toml options, `use_passfd_io` and `passfd_listener_port`, are introduced to enable and configure the feature.
Implementation Details:
Agent side:
- `accept()` and saves the `(hostport, stream)` pairs for later use.
- On `CreateContainer` or `ExecProcess` requests, the agent gets the port info, finds the corresponding streams, and uses them as the child process's stdin/stdout/stderr. Note that if a terminal is required, the agent spawns two tokio tasks to copy from the stdin stream to the term_master, and from the term_master to the stdout stream.
Runtime-rs side:
- `OK <hostport>` results.
- `CreateContainerRequest` and `ExecProcessRequest` carry the stream's hostport information.
Dragonball vmm:
New Protobuf Fields:
Three extra u32 fields (stdin, stdout, and stderr stream ports) are added to the `CreateContainerRequest` and `ExecProcessRequest` structs.
This PR is part of the student open-source practice program hosted in GitLink Code Camp, similar to the GSoC.
cc mentor @lifupan
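To make the runtime-rs side of the description more concrete, here is a rough sketch of turning `OK <hostport>` results into the three port values that the requests carry. It assumes each result arrives as one text line. The names `StreamPorts`, `stdin_port`, `stdout_port`, `stderr_port`, `parse_ok_hostport` and `collect_ports` are hypothetical and are not claimed to match the actual protobuf definitions or runtime code.

```rust
/// Hypothetical container for the three stream ports described above.
#[derive(Debug, Default, PartialEq)]
pub struct StreamPorts {
    pub stdin_port: u32,
    pub stdout_port: u32,
    pub stderr_port: u32,
}

/// Parse one acknowledgement of the assumed form `OK <hostport>`.
/// Returns None for anything else, including a non-numeric port.
pub fn parse_ok_hostport(line: &str) -> Option<u32> {
    line.trim().strip_prefix("OK ")?.trim().parse::<u32>().ok()
}

/// Build the port triple from the acknowledgements received for the
/// stdin, stdout and stderr connections, in that order.
pub fn collect_ports(acks: [&str; 3]) -> Option<StreamPorts> {
    Some(StreamPorts {
        stdin_port: parse_ok_hostport(acks[0])?,
        stdout_port: parse_ok_hostport(acks[1])?,
        stderr_port: parse_ok_hostport(acks[2])?,
    })
}
```

On the agent side, each of these ports would then be the key used to look up the matching saved `(hostport, stream)` pair.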