Terminals and Standard IO
Note that the default configuration of runc (foreground, new terminal) is
generally the best option for most users. This document exists to explain the
purpose of the different modes, and to steer users away from common mistakes
and misunderstandings.
In general, most processes on Unix (and Unix-like) operating systems have 3
standard file descriptors provided at the start, collectively referred to as
"standard IO" (stdio):
- 0: standard-in (stdin), the input stream into the process
- 1: standard-out (stdout), the output stream from the process
- 2: standard-error (stderr), the error stream from the process
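As a minimal sketch of the three descriptors listed above, the Python snippet below starts a child process with pipes attached to descriptors 0, 1, and 2. The child script and the helper name run_child are illustrative assumptions and have nothing to do with runc itself.

```python
import subprocess
import sys

# Child program: it uses the raw descriptor numbers instead of sys.stdin etc.
CHILD = r"""
import os
data = os.read(0, 1024)          # 0: stdin
os.write(1, b"out:" + data)      # 1: stdout
os.write(2, b"err:" + data)      # 2: stderr
"""


def run_child(payload):
    """Feed payload to the child's stdin and return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,
        capture_output=True,
        check=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    print(run_child(b"hello"))
```

The child only ever sees numbered descriptors; what they are connected to is decided entirely by whoever starts it.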
When creating and running a container via runc, it is important to take care
to structure the stdio the new container's process receives. In some ways
containers are just regular processes, while in other ways they're an isolated
sub-partition of your machine (in a similar sense to a VM). This means that the
structure of IO is not as simple as with ordinary programs (which generally
just use the file descriptors you give them).
Other File Descriptors
Before we continue, it is important to note that processes can have more file
descriptors than just stdio. By default in runc no other file descriptors
will be passed to the spawned container process. If you wish to explicitly pass
file descriptors to the container you have to use the --preserve-fds option.
These ancillary file descriptors don't have any of the strange semantics
discussed further in this document (those only apply to stdio) -- they are
passed untouched by runc.
Note that --preserve-fds does not take individual file descriptors to
preserve. Instead, it takes how many file descriptors (not including stdio or
LISTEN_FDS) should be passed to the container. In the following example:
% runc run --preserve-fds 5 <container>
runc will pass the first 5 file descriptors (3, 4, 5, 6, and 7 -- assuming
that LISTEN_FDS has not been configured) to the container.
In addition to --preserve-fds, LISTEN_FDS file descriptors are passed
automatically to allow for systemd-style socket activation. To extend the
above example:
% LISTEN_PID=$pid_of_runc LISTEN_FDS=3 runc run --preserve-fds 5 <container>
runc will now pass the first 8 file descriptors (and it will also pass
LISTEN_FDS=3 and LISTEN_PID=1 to the container). The first 3 (3, 4, and 5)
were passed due to LISTEN_FDS and the other 5 (6, 7, 8, 9, and 10) were passed
due to --preserve-fds. You should keep this in mind if you use runc directly in
something like a systemd unit file. To disable this LISTEN_FDS-style passing
just unset LISTEN_FDS.
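The counting rule described above can be sketched as a small calculation. This is only a model of the behaviour as this document describes it, not code taken from runc; the function names are hypothetical.

```python
FIRST_NON_STDIO_FD = 3  # descriptors 0, 1 and 2 are stdio


def passed_fds(preserve_fds, listen_fds=0):
    """Return the descriptor numbers handed to the container.

    LISTEN_FDS descriptors come first, followed by the --preserve-fds ones.
    """
    total = listen_fds + preserve_fds
    return list(range(FIRST_NON_STDIO_FD, FIRST_NON_STDIO_FD + total))


def split_passed_fds(preserve_fds, listen_fds=0):
    """Return (fds passed due to LISTEN_FDS, fds passed due to --preserve-fds)."""
    fds = passed_fds(preserve_fds, listen_fds)
    return fds[:listen_fds], fds[listen_fds:]


if __name__ == "__main__":
    print(passed_fds(5))           # --preserve-fds 5, no LISTEN_FDS
    print(split_passed_fds(5, 3))  # --preserve-fds 5 with LISTEN_FDS=3
```

The point of the model is that --preserve-fds is a count, so setting LISTEN_FDS shifts which descriptor numbers that count covers.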
Be very careful when passing file descriptors to a container process. Due
to some Linux kernel (mis)features, a container with access to certain types of
file descriptors (such as O_PATH descriptors) outside of the container's root
file system can use these to break out of the container's pivoted mount
namespace. This has resulted in CVEs in the past.
runc supports two distinct methods for passing stdio to the container's
primary process:
- new terminal (terminal: true)
- pass-through (terminal: false)
When first using runc these two modes will look incredibly similar, but this
can be quite deceptive as these different modes have quite different
characteristics.
By default, runc spec will create a configuration that will create a new
terminal (terminal: true). However, if the terminal: ... line is not
present in config.json then pass-through is the default.
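To make the default rule above concrete, here is a hedged Python sketch that reports which terminal mode a given config.json would select. It assumes the setting lives at process.terminal, as in the OCI runtime specification; the function name terminal_mode is made up for illustration.

```python
import json


def terminal_mode(config_text):
    """Return the terminal mode implied by the text of a config.json."""
    config = json.loads(config_text)
    # Assumption: the flag is process.terminal; a missing key means false.
    terminal = config.get("process", {}).get("terminal", False)
    return "new terminal" if terminal else "pass-through"


if __name__ == "__main__":
    print(terminal_mode('{"process": {"terminal": true}}'))
    print(terminal_mode('{"process": {"args": ["sh"]}}'))
```

A configuration that omits the key falls into the same branch as one that sets it to false.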
In general we recommend using new terminal, because it means that tools like
sudo will work inside your container. But pass-through can be useful if you
know what you're doing, or if you're using runc as part of a non-interactive
pipeline.
In new terminal mode, runc will create a brand-new "console" (or more
precisely, a new pseudo-terminal using the container's namespaced
/dev/pts/ptmx) for your contained process to use as its stdio.
When you start a process in new terminal mode, runc will do the following:
- Create a new pseudo-terminal.
- Pass the slave end to the container's primary process as its stdio.
- Send the master end to a process to interact with the stdio for the
  container's primary process (details below).
It should be noted that since a new pseudo-terminal is being used for
communication with the container, some strange properties of pseudo-terminals
might surprise you. For instance, by default, all new pseudo-terminals
translate the byte '\n' to the sequence '\r\n' on both stdout and stderr. In
addition there is a whole range of ioctl(2) requests that can only interact
with pseudo-terminal stdio.
NOTE: In new terminal mode, all three stdio file descriptors are the same
underlying file. The reason for this is to match how a shell's stdio looks to
a process (as well as remove race condition issues with having to deal with
multiple master pseudo-terminal file descriptors). However this means that it
is not really possible to uniquely distinguish between stdout and stderr from
the caller's perspective.
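Both effects described above (the newline translation and the merging of stdout and stderr) can be observed with an ordinary pseudo-terminal. The Python sketch below is not runc's implementation; it opens a pty pair with os.openpty, gives the slave end to a child as all three stdio descriptors, and reads the master end. The name run_on_new_pty is hypothetical, and the result assumes the platform's default terminal settings.

```python
import os
import subprocess
import sys

# Child program: one line on stdout, then one line on stderr.
CHILD = r"""
import os
os.write(1, b"out\n")
os.write(2, b"err\n")
"""


def run_on_new_pty():
    """Run the child with a fresh pty as its stdio; return what the master sees."""
    master, slave = os.openpty()
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD],
            stdin=slave, stdout=slave, stderr=slave,  # same underlying file
            check=True,
        )
        expected_len = len(b"out\r\nerr\r\n")
        data = b""
        while len(data) < expected_len:
            data += os.read(master, 1024)
        return data
    finally:
        os.close(slave)
        os.close(master)


if __name__ == "__main__":
    print(run_on_new_pty())
```

The master end receives a single byte stream, so nothing in the returned data marks which line came from stdout and which from stderr.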
If you have already set up some file handles that you wish your contained
process to use as its stdio, then you can ask runc to pass them through to
the contained process (this is not necessarily the same as --preserve-fds's
passing of file descriptors -- details below). As an example (assuming that
terminal: false is set in config.json):
% echo input | runc run some_container > /tmp/log.out 2> /tmp/log.err
Here the container's various stdio file descriptors will be substituted with
the following:
- stdin will be sourced from the echo input pipeline.
- stdout will be output into /tmp/log.out on the host.
- stderr will be output into /tmp/log.err on the host.
Note that the actual file handles seen inside the container may differ
depending on the mode runc is being used in (for instance, the file referenced
by 1 could be /tmp/log.out directly, or a pipe which runc is using to buffer
output). However the net result will be the same in either case. In principle
you could use the new terminal mode in a pipeline, but the difference will
become clearer when you are introduced to runc's detached mode.
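The distinction between "the file directly" and "a pipe" is something a process can check for itself with fstat(2). As an illustrative sketch (a plain child process, not a container; the helper name kind_of_child_stdout is hypothetical), the Python code below starts a child that reports what kind of file its descriptor 1 refers to.

```python
import subprocess
import sys
import tempfile

# Child program: classify descriptor 1 and report the answer on stderr.
CHILD = r"""
import os, stat
mode = os.fstat(1).st_mode
if stat.S_ISREG(mode):
    kind = "regular file"
elif stat.S_ISFIFO(mode):
    kind = "pipe"
else:
    kind = "other"
os.write(2, kind.encode())
"""


def kind_of_child_stdout(stdout_target):
    """Start the child with stdout_target as its stdout; return its report."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=stdout_target,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stderr.decode()


if __name__ == "__main__":
    with tempfile.TemporaryFile() as log:
        print(kind_of_child_stdout(log))          # handed the file itself
    print(kind_of_child_stdout(subprocess.PIPE))  # handed a pipe instead
```

From the caller's side the output ends up in the same place either way; only the process on the inside can tell the difference.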
runc itself runs in two modes:
- foreground
- detached
You can use either terminal mode with either runc mode.
However, there are considerations that may indicate preference for one mode
over another. Note that while the two types of modes (terminal and runc) are
conceptually independent from each other, you should be aware of the
intricacies of the combination you are using.
In general we recommend using foreground because it's the most
straight-forward to use, with the only downside being that you will have a
long-running runc process. Detached mode is difficult to get right and
generally requires having your own stdio management.
Foreground is the default (and most straight-forward) mode of runc. In this
mode, your runc command remains in the foreground with the container process
as a child. All stdio is buffered through the foreground runc process
(irrespective of which terminal mode you are using). This is conceptually quite
similar to running a normal process interactively in a shell (and if you are
using runc in a shell interactively, this is what you should use).
Because the stdio will be buffered in this mode, some very important
peculiarities of this mode should be kept in mind:
- With new terminal mode, the container will see a pseudo-terminal as its
  stdio (as you might expect). However, the stdio of the foreground runc
  process will remain the stdio that the process was started with -- and runc
  will copy all stdio between its stdio and the container's stdio. This means
  that while a new pseudo-terminal has been created, the foreground runc
  process manages it over the lifetime of the container.
- With pass-through mode, the foreground runc's stdio is not passed to the
  container. Instead, the container's stdio is a set of pipes which are used
  to copy data between runc's stdio and the container's stdio. This means that
  the container never has direct access to host file descriptors (aside from
  the pipes created by the container runtime, but that shouldn't be an issue).
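The copying arrangement described in the list above can be sketched with an ordinary child process. This is a simplified illustration under stated assumptions, not runc's code: the hypothetical run_in_foreground gives the child a pipe as its stdout and relays everything it reads into a sink chosen by the caller, so the child never holds the sink itself.

```python
import io
import os
import subprocess
import sys


def run_in_foreground(argv, sink):
    """Run argv with a pipe as its stdout and copy all output into sink."""
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    fd = child.stdout.fileno()
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # EOF: every writer has closed its end of the pipe
            break
        sink.write(chunk)
    child.stdout.close()
    return child.wait()


if __name__ == "__main__":
    captured = io.BytesIO()
    code = run_in_foreground(
        [sys.executable, "-c", "import os; os.write(1, b'hello')"], captured
    )
    print(code, captured.getvalue())
```

The relay loop only works while the parent stays alive, which is the trade-off this mode makes.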
The main drawback of the foreground mode of operation is that it requires a
long-running foreground runc process. If you kill the foreground runc
process then you will no longer have access to the stdio of the container
(and in most cases this will result in the container dying abnormally due to
SIGPIPE or some other error). By extension this means that any bug in the
long-running runc process (such as a memory leak) or a stray OOM-kill sweep
could result in your container being killed through no fault of the user. In
addition, there is no way in foreground mode of passing a file descriptor
directly to the container process as its stdio (like --preserve-fds does).
These shortcomings are obviously sub-optimal and are the reason that runc has
an additional mode called "detached mode".
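The SIGPIPE failure mentioned above is easy to reproduce with plain pipes. In this hedged Python sketch (not involving runc; the helper name is hypothetical), a child keeps writing to a pipe and the reading side goes away, standing in for a foreground process that has been killed. The child resets SIGPIPE to its default action because the Python interpreter otherwise ignores that signal.

```python
import signal
import subprocess
import sys

# Child program: write forever, with SIGPIPE left at its default action.
CHILD = r"""
import os, signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    os.write(1, b"x" * 4096)
"""


def drop_reader_and_wait():
    """Read a little from the child, close the pipe, return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    child.stdout.read(4096)   # the reader is alive for a moment...
    child.stdout.close()      # ...and then disappears
    return child.wait(timeout=30)


if __name__ == "__main__":
    status = drop_reader_and_wait()
    print(status == -signal.SIGPIPE)
```

A negative return code from subprocess means the child was terminated by that signal number.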
In contrast to foreground mode, in detached mode there is no long-running
foreground runc process once the container has started. In fact, there is no
long-running runc process at all. However, this means that it is up to the
caller to handle the stdio after runc has set it up for you. In a shell
this means that the runc command will exit and control will return to the
shell, after the container has been set up.
You can run runc in detached mode in one of the following ways:
- runc run -d ... which operates similar to runc run but is detached.
- runc create followed by runc start which is the standard container
  lifecycle defined by the OCI runtime specification (runc create sets up the
  container completely, waiting for runc start to begin execution of user
  code).
The main use-case of detached mode is for higher-level tools that want to be
wrappers around runc. By running runc in detached mode, those tools have
far more control over the container's stdio without runc getting in the
way (most wrappers around runc, such as containerd, use detached mode
for this reason).
Unfortunately using detached mode is a bit more complicated and requires more
care than the foreground mode -- mainly because it is now up to the caller to
handle the stdio of the container.
In detached mode, pass-through actually does what it says on the tin -- the
stdio file descriptors of the runc process are passed through (untouched)
to the container's stdio. The purpose of this option is to allow a user to
set up stdio for a container themselves and then force runc to just use
their pre-prepared stdio (without any pseudo-terminal funny business). If
you don't see why this would be useful, don't use this option.
You must be incredibly careful when using detached pass-through (especially
in a shell). The reason for this is that by using detached pass-through you
are passing host file descriptors to the container. In the case of a shell,
your stdio is usually going to be a pseudo-terminal (on your host). A
malicious container could take advantage of TTY-specific ioctls like
TIOCSTI to fake input into the host shell (remember that in detached
mode, control is returned to your shell and so the terminal you've given the
container is being read by a shell prompt).
There are also several other issues with running non-malicious containers in a
shell with detached pass-through (where you pass your shell's stdio to the
container):
- Output from the container will be interleaved with output from your shell
  (in a non-deterministic way), without any real way of distinguishing where a
  particular piece of output came from.
- Any input to stdin will be non-deterministically split and given to either
  the container or the shell (because both are blocked on a read(2) of the
  same FIFO-style file descriptor).
They are all related to the fact that there is going to be a race when either
your host or the container tries to read from (or write to) stdio. This
problem is especially obvious when in a shell, where usually the terminal has
been put into raw mode (where each individual key-press should cause read(2)
to return).
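The input race described above can be imitated with two readers on one pipe. The following Python sketch is only an analogy (threads named "shell" and "container" sharing a pipe, rather than real processes sharing a terminal), and all names in it are hypothetical. Every record is delivered exactly once, but which reader receives it is not predictable.

```python
import os
import threading

RECORD_SIZE = 8  # every record is exactly this many bytes


def split_between_readers(records):
    """Write records to a pipe read by two competing readers."""
    read_fd, write_fd = os.pipe()
    received = {"shell": [], "container": []}

    def reader(name):
        while True:
            chunk = os.read(read_fd, RECORD_SIZE)
            if not chunk:  # EOF once the write end is closed and drained
                return
            received[name].append(chunk)

    threads = [threading.Thread(target=reader, args=(name,)) for name in received]
    for thread in threads:
        thread.start()
    for record in records:
        os.write(write_fd, record)
    os.close(write_fd)
    for thread in threads:
        thread.join()
    os.close(read_fd)
    return received


if __name__ == "__main__":
    keys = [f"key{i:04d}\n".encode() for i in range(100)]
    result = split_between_readers(keys)
    print({name: len(chunks) for name, chunks in result.items()})
```

Running it several times usually gives a different split each time, which is the problem a shell and a container would face when sharing one stdin.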
NOTE: There is also currently a known problem where using detached
pass-through will result in the container hanging if the stdout or stderr is
a pipe (though this should be a temporary issue).
Detached New Terminal
When creating a new pseudo-terminal in detached mode, a fairly obvious
problem appears -- how do we use the new terminal that runc created? Unlike
in pass-through, runc has created a new set of file descriptors that need to
be used by something in order for container communication to work.
The way this problem is resolved is through the use of Unix domain sockets.
There is a feature of Unix sockets called SCM_RIGHTS which allows a file
descriptor to be sent through a Unix socket to a completely separate process
(which can then use that file descriptor as though it had opened it). When
using runc in detached new terminal mode, this is how a user gets access to the
pseudo-terminal's master file descriptor.
To this end, there is a new option (which is required if you want to use runc
in detached new terminal mode): --console-socket. This option takes the path
to a Unix domain socket which runc will connect to and send the
pseudo-terminal master file descriptor down. The general process for getting
the pseudo-terminal master is as follows:
- Create a Unix domain socket at some path.
- Run runc run -d or runc create with the argument --console-socket followed
  by that path.
- Using recvmsg(2) retrieve the file descriptor sent using SCM_RIGHTS by runc.
- Now the manager can interact with the stdio of the container, using the
  retrieved pseudo-terminal master.
After runc exits, the only process with a copy of the pseudo-terminal master
file descriptor is whoever read the file descriptor from the socket.
NOTE: runc doesn't support abstract socket addresses (due to it not being
possible to pass an argv with a null-byte as the first character). In the
future this may change, but currently you must use a valid path name.
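The receiving side of the steps above can be sketched in Python (3.9 or later), whose socket.recv_fds wraps recvmsg(2) and SCM_RIGHTS. Since the example has to run without runc, a thread called fake_runtime stands in for it and sends a pty master the same general way; that stand-in, the payload it sends, and all function names are assumptions made for illustration.

```python
import os
import socket
import tempfile
import threading


def receive_console(listener):
    """Accept one connection and return the fd delivered via SCM_RIGHTS."""
    conn, _ = listener.accept()
    with conn:
        _data, fds, _flags, _addr = socket.recv_fds(conn, 1024, 1)
    if not fds:
        raise RuntimeError("peer did not send a file descriptor")
    return fds[0]


def fake_runtime(socket_path, master_fd):
    """Hypothetical stand-in for the runtime: connect and send the master."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        socket.send_fds(sock, [b"console"], [master_fd])


def demo():
    """Pass a pty master over a socket, then use the received copy."""
    master, slave = os.openpty()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "console.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            sender = threading.Thread(target=fake_runtime, args=(path, master))
            sender.start()
            received = receive_console(listener)
            sender.join()
    os.close(master)  # the sender's copy is gone; only the received one remains
    try:
        os.write(slave, b"ping\n")  # something "inside" writes to its stdio
        data = b""
        while not data.endswith(b"\n"):
            data += os.read(received, 1024)
        return data
    finally:
        os.close(slave)
        os.close(received)


if __name__ == "__main__":
    print(demo())
```

A real manager would keep the received descriptor open for the lifetime of the container and relay data through it, instead of doing a single read.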