How to access container stdin/stdout/stderr directly? #3639

Open
tesujimath opened this Issue Apr 10, 2017 · 29 comments

Contributor

tesujimath commented Apr 10, 2017

I want to capture the raw stdout from my container, which currently appears with timestamp and container prefixes. I presume this is because stdout is in log mode, and I need to switch it to stream mode using the per-application options.

However, I can't work out where to pass these on the rkt command line. Here is my attempt (scroll all the way to the right past some uninteresting volume mounts, etc):

inscrutable$ sudo /usr/bin/rkt --insecure-options=image run --set-env=HOME=/home/guestsi --volume home,kind=host,source=/home/guestsi --volume dataset,kind=host,source=/dataset --volume bifo,kind=host,source=/bifo --volume volume-config,kind=empty,uid=511,gid=511 --volume volume-data,kind=empty,uid=511,gid=511 docker://biocontainers/samtools --mount volume=home,target=/home/guestsi --mount volume=dataset,target=/dataset --mount volume=bifo,target=/bifo --user=511 --group=511 --stdout=stream -- flagstat /home/guestsi/playpen/sam/small.bam
[sudo] password for guestsi: 
run: error parsing app image arguments: unknown flag: --stdout

I tried with --stdout in various other places, and got other errors.

What am I misunderstanding here?

Member

euank commented Apr 11, 2017

The documentation is remiss for not mentioning that those are experimental features gated behind the RKT_EXPERIMENT_ATTACH=true environment variable being set.

Assuming setting that environment variable is what's missing, this can be a documentation issue.

Contributor Author

tesujimath commented Apr 11, 2017

OK, thanks, that helped a bit. But now I find that --stdout=stream doesn't actually produce any output at all.
Here it is with log mode (sorry for all the command line flags, which are necessary to get my container to run without warnings):

inscrutable# RKT_EXPERIMENT_ATTACH=true /usr/bin/rkt --insecure-options=image run --volume volume-config,kind=empty,uid=511,gid=511 --volume volume-data,kind=empty,uid=511,gid=511 --volume home,kind=host,source=/home/guestsi docker://biocontainers/samtools --mount volume=home,target=/home/guestsi --user=511 --group=511 --stdout=log -- flagstat /home/guestsi/playpen/sam/small.bam
[ 3384.583849] samtools[5]: 200 + 0 in total (QC-passed reads + QC-failed reads)
[ 3384.584845] samtools[5]: 0 + 0 secondary
[ 3384.585727] samtools[5]: 0 + 0 supplementary
[ 3384.586236] samtools[5]: 0 + 0 duplicates
[ 3384.586697] samtools[5]: 96 + 0 mapped (48.00% : N/A)
[ 3384.587133] samtools[5]: 200 + 0 paired in sequencing
[ 3384.587584] samtools[5]: 100 + 0 read1
[ 3384.588018] samtools[5]: 100 + 0 read2
[ 3384.588989] samtools[5]: 26 + 0 properly paired (13.00% : N/A)
[ 3384.589594] samtools[5]: 52 + 0 with itself and mate mapped
[ 3384.589998] samtools[5]: 44 + 0 singletons (22.00% : N/A)
[ 3384.590194] samtools[5]: 24 + 0 with mate mapped to a different chr
[ 3384.590354] samtools[5]: 10 + 0 with mate mapped to a different chr (mapQ>=5)
inscrutable# 

I'm wanting to capture raw stdout on the host, but with --stdout=stream, I get nothing at all:

inscrutable# RKT_EXPERIMENT_ATTACH=true /usr/bin/rkt --insecure-options=image run --volume volume-config,kind=empty,uid=511,gid=511 --volume volume-data,kind=empty,uid=511,gid=511 --volume home,kind=host,source=/home/guestsi docker://biocontainers/samtools --mount volume=home,target=/home/guestsi --user=511 --group=511 --stdout=stream -- flagstat /home/guestsi/playpen/sam/small.bam
inscrutable# 

Similarly, --stdout=tty produces no output either.

Any ideas?

Contributor Author

tesujimath commented Apr 11, 2017

PS. using official rkt 1.25.0 RPM

Member

lucab commented Apr 19, 2017

If you want to use the experimental attach modes (stream/tty), you also need to attach via rkt attach or some other custom I/O handler. See https://coreos.com/rkt/docs/latest/devel/log-attach-design.html and https://coreos.com/rkt/docs/latest/subcommands/attach.html.

Contributor Author

tesujimath commented Apr 20, 2017

Ah, sorry, hadn't noticed rkt attach. I'll read up on that and try again.

Thanks for your patience.

Member

lucab commented Apr 20, 2017

No problem. Also, have a look at the endpoints listed by rkt attach --mode=list; you can interface your custom logic directly with those. Please note that the whole iottymux subsystem is highly experimental, and there are some known open bugs.

Contributor Author

tesujimath commented Apr 23, 2017

So, I read up on rkt attach, and tried using it. As follows:

# RKT_EXPERIMENT_ATTACH=true /usr/bin/rkt --interactive --insecure-options=image run --net=host docker://biocontainers/bwa:latest --user=511 --group=511 --stdout=stream --exec bash

In another terminal

# rkt list --full | grep running
a8a47476-92a9-46eb-9443-6b5fd75c6fcf	bwa				registry-1.docker.io/biocontainers/bwa:latest			sha512-cbb4f9182164	running		2017-04-24 10:02:28.633 +1200 NZST	2017-04-24 10:02:28.916 +1200 NZST	

# RKT_EXPERIMENT_ATTACH=true rkt attach --mode list a8a47476-92a9-46eb-9443-6b5fd75c6fcf
iottymux: runtime failure: open /rkt/iottymux/bwa/endpoints: no such file or directory
stage1-attach: error executing "iottymux": exit status 254
attach: attach failed: error executing stage1 entrypoint: exit status 254

I think this is because the systemd on CentOS 7 is too old. It is v219, but the design doc says this feature works with v232 and later.

Is this a showstopper with regard to getting hold of stdout directly from a container on CentOS 7?

(We are having severe performance problems with journald, because of the huge amount of stdout. With bioinformatics applications, this can be hundreds of GB.)

My full environment:

rkt Version: 1.25.0
appc Version: 0.8.10
Go Version: go1.7.4
Go OS/Arch: linux/amd64
Features: -TPM +SDJOURNAL
--
Linux 3.10.0-514.10.2.el7.x86_64 x86_64
--
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
--
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Contributor Author

tesujimath commented Apr 24, 2017

So I had another idea. Recall that my goal is to pipe huge quantities (hundreds of GB) of data between containers' stdin/stdout without going through journald, that I am on CentOS 7 with systemd v219, and that upgrading systemd to v232 seems infeasible (to me, since Red Hat has heavily patched upstream in their systemd RPM).

But what I could do is mount /proc/$$/fd from the host into the container, and arrange for the container to redirect its stdin/stdout/stderr through these host file descriptors, before running the actual target program. A brief test doing this for stdout showed it doing the right thing.

Does this seem like a reasonable idea to avoid the journald overhead?
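For illustration, here is a minimal sketch of that redirection, assuming a Linux /proc filesystem. The function name run_via_host_fd is hypothetical, and a plain subshell stands in for the container; with rkt, the host's /proc/$$/fd directory would instead be mounted into the container with --volume and --mount, and the path passed in would point at the mounted copy.

```shell
#!/bin/sh
# Hypothetical helper (not part of rkt): run a command with its stdout
# redirected through a file descriptor path such as /proc/<pid>/fd/1.
run_via_host_fd() {
    fd_path=$1
    shift
    # Append mode (>>) avoids truncating the target when the descriptor
    # refers to a regular file; for pipes and ttys it makes no difference.
    # The subshell keeps exec from replacing the calling shell.
    ( exec "$@" >>"$fd_path" )
}
```

With fd 3 of the calling shell opened on a file, running ( run_via_host_fd "/proc/$$/fd/3" echo hello ) >/dev/null writes hello to that file even though the subshell's own stdout is discarded, which is the effect wanted for bypassing the container's logged stdout.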

tesujimath changed the title from "How to pass per-application options?" to "How to access container stdin/stdout/stderr directly?" on Apr 24, 2017

Member

lucab commented Apr 24, 2017

Which stage1 are you using? The >=232 dependency is on stage1 systemd, if you are using stage1-coreos you should be already fine.

Also, looking at the specifics (kernel, systemd) of this system, it seems pretty ancient and I'd likely expect misbehaviors on the front of cgroups/namespaces/overlayfs. Isn't there some newer release of this distro? /cc @brianredbeard

> But what I could do is mount /proc/$$/fd from the host into the container, and arrange for the container to redirect its stdin/stdout/stderr through these host file descriptors, before running the actual target program.

This is pretty much what --stdout=stream does, except that those fds are socket-units in stage1 context and bridging to host happens via rkt-attach.

Contributor Author

tesujimath commented Apr 24, 2017

I'm using the default stage1, which would be the stage1-coreos I believe. Glad to hear that the systemd dependency is only on stage1, not on the host. (RHEL 7 is not so much ancient, as, well, stable.)

Would there be another reason why I get that stage1 error?

iottymux: runtime failure: open /rkt/iottymux/bwa/endpoints: no such file or directory

Are there more useful diagnostics I can provide? For example, interrupting stage1 and having a look what's gone wrong inside?

Member

lucab commented Apr 24, 2017

I've just realized you are running with --interactive, which bypasses all the single-stream options (this should probably be reported as a separate bug, as the two options are mutually exclusive).

I tried replacing that with --stdout=stream --stdin=stream and your container seems to work fine. Please note however that you won't get a TTY in this case, see #3652.

Contributor Author

tesujimath commented Apr 24, 2017

Ah, yes, that was because I was careful to follow the instructions on the man page:

In order for an application to be attachable:

  • it must be started in interactive mode ...

If I just run it without that, there would seem to be a race for rkt attach to actually attach before any output is lost? Or does the pipe simply fill and block in that case?

Member

lucab commented Apr 24, 2017

> Ah, yes, that was because I was careful to follow the instructions on the man page

Bad wording on my side, that should read "attachable mode".

> If I just run it without that, there would seem to be a race for rkt attach to actually attach before any output is lost? Or does the pipe simply fill and block in that case?

Correct, there is no buffering involved at the moment, see #3587.

Contributor Author

tesujimath commented Apr 24, 2017

OK, thanks for the clarification. I'll try again, with a work-around for the lack of buffering.

Contributor Author

tesujimath commented Apr 26, 2017

I've basically got this working, but there are a few frustrations.

How best to obtain the UUID of the container so I can run rkt attach is not clear. I wanted to use rkt prepare, which tells me that, but this experimental functionality doesn't seem to be supported there. (I presume you plan to add it.) Instead, I have to wait around for a while, then look at the output of rkt list and try to find my container. And only then can I call rkt attach.

Lack of buffering meant I had to work quite hard on the synchronisation. Sleep is fragile on a busy system, as you're never quite sure how long to wait. Instead, I mounted another directory from the host into the container, and wrote a wrapper rkt-run-slave that runs in the container, using inotify, so that the external event (rkt attach has run) may be notified by the host into the container by creating a file. rkt-run-slave waits until this has happened before running the target program in the container. This works nicely, but seems like a work-around for a problem which shouldn't exist.

It would be nice if the stream attachment could be done inside rkt run with all the synchronisation handled automatically, so the container program doesn't get started until the streams are all attached.
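The synchronisation idea above can be sketched roughly as follows. This is not the actual rkt-run-slave: the function name wait_then_run and the marker file are hypothetical, and polling with sleep stands in for inotify to keep the sketch free of extra tools.

```shell
#!/bin/sh
# Hypothetical stand-in for the in-container wrapper: block until a marker
# file appears in a directory shared with the host, then run the target.
wait_then_run() {
    marker=$1
    shift
    # The host creates the marker once rkt attach is connected, so the
    # target's output is not produced before anything is listening.
    while [ ! -e "$marker" ]; do
        sleep 1
    done
    "$@"
}
```

Inside the container this would be invoked as something like wait_then_run /sync/attached samtools flagstat ..., where /sync is an assumed mount point for the shared host directory.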

I've added this functionality to rktrunner. Not quite finished, it's a work in progress at this time.

Member

lucab commented Apr 26, 2017

Regarding "obtaining UUID", you are probably looking for --uuid-file-save (which should be expanded to more subcommands, see #3005).

Regarding "waiting for a while", I think rkt status --wait-ready will help you determine when the pod is ready. I think a similar option for app status may also make sense, as I'm not sure the one above solves all your problems.

Regarding inotify, just beware that it may misbehave on overlayfs on non-recent kernels.

Contributor Author

tesujimath commented Apr 30, 2017

OK, thanks, I'm now using these (rkt run --uuid-file-save and rkt status --wait-ready) and it's working nicely.
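For anyone following along, the sequence looks roughly like the sketch below. It is an outline under assumptions rather than my exact script: the function name run_and_attach, the RKT variable and the UUID file path are placeholders, any --volume, --mount or --insecure-options flags your image needs are omitted, and rkt generally has to run as root.

```shell
#!/bin/sh
# Sketch of the uuid-save + wait-ready + attach sequence.
# RKT can be overridden to point at a different rkt binary (assumption).
RKT=${RKT:-rkt}
# The attach modes are experimental and gated behind this variable.
export RKT_EXPERIMENT_ATTACH=true

run_and_attach() {
    image=$1
    uuid_file=$2
    shift 2
    # Start the pod in the background with stdout in stream mode,
    # asking rkt to record the pod UUID in a file.
    "$RKT" run --uuid-file-save="$uuid_file" "$image" --stdout=stream -- "$@" &
    # Wait until rkt has written the UUID file.
    while [ ! -s "$uuid_file" ]; do
        sleep 1
    done
    uuid=$(cat "$uuid_file")
    # Block until the pod reports ready, then bridge its streams here.
    "$RKT" status --wait-ready "$uuid" >/dev/null
    "$RKT" attach "$uuid"
    # Reap the background rkt run process.
    wait
}
```

Because there is no buffering yet, anything the app writes before rkt attach connects can still be lost, so the app itself should hold off until the attach has happened, as with the wrapper I described earlier.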

I haven't hit any issues with inotify misbehaviour so far.

tesujimath closed this on Apr 30, 2017

Member

lucab commented May 2, 2017

@tesujimath nice to hear! (aside from all the stream-attach related bugs, which are unfortunately still there in this experimental phase)

Perhaps that uuid-save+wait pattern is not documented well enough. We'll gladly take any PR for the user-docs or external blog-posts showing an actual workflow, like in your usecase.

Contributor Author

tesujimath commented May 3, 2017

@lucab thanks for your help on this. I am indeed hitting some problems, with what I presume are stream-attach related bugs. Getting errors from the Go runtime like this:

runtime: failed to create new OS thread (have 5 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)

I've also seen errors like this:

stage1-attach: error executing "iottymux": signal: broken pipe
attach: attach failed: error executing stage1 entrypoint: exit status 254

And occasionally like this:

Unable to open "/proc/15083/root": No such file or directory
stage1-attach: error executing "iottymux": exit status 2
attach: attach failed: error executing stage1 entrypoint: exit status 254

I've not yet had much success trying to isolate these problems. I've started looking into using the fly stage1 as a way of avoiding some of these nasties. That may be sufficient for our use cases, although I see there are some issues with fly (I hit #3662 almost straight away).

Given these problems, I thought I'd better reopen this issue.

tesujimath reopened this on May 3, 2017

Contributor Author

tesujimath commented May 3, 2017

@lucab BTW I am not ignoring your suggestion about describing our use of rkt in a series of blog posts. I'll probably get round to that at some stage, and will ping you back then, in case you want to link to them.

oberstet commented Jun 4, 2017

Why is this rkt attach stuff needed for stdin/stdout communication with a container, at least with a single app in a pod?

IMO this doesn't seem to fit the rkt philosophy: "it is like running an executable .. no daemons, no fuss ..".

Is it possible to bind additional FDs (besides 0, 1 and 2) to communicate between the parent process that started rkt and the container started by rkt?

Member

lucab commented Jun 7, 2017

Because a pod is not a single app, and the supervision/muxing have to be handled somewhere inside it. Also, this doesn't clash with the general design, as everything is still rooted under a separate process tree.

> Is it possible to bind additional FDs (besides 0, 1 and 2) to communicate between the parent process that started rkt and the container started by rkt?

A specific subcase of this is possible with socket-activation. However, I think this doesn't work in general, as there are several points in the exec chain where we ensure fds are O_CLOEXEC to avoid escapes.

oberstet commented Jun 7, 2017

@lucab Do you know a container system for Linux that allows real composability like Unix executables directly hooked up via pipes, pushing potentially mass data over?

I understand, neither Docker nor rkt was designed for that ..

Contributor

brianredbeard commented Jun 7, 2017

@oberstet While not a container system per se, potentially something like the vector packet processing and CICN/libcinet components of fd.io to optimize the transport would be what you're looking for. I know that folks at the various companies sponsoring the project have prioritized delivery of these pieces of software for containerized workloads.

I didn't really intend to sidetrack this, only to point out that as folks have higher throughput requirements there are pieces in flight (in the true old school Bell Labs "UNIX" philosophy that UNIX is a control plane platform for manipulating an external data plane like 5ESS).

Member

euank commented Jun 7, 2017

> Do you know a container system for Linux that allows real composability like Unix executables directly hooked up via pipes, pushing potentially mass data over?

systemd-nspawn by itself, rather than via rkt's use of it, is able to handle pipes pretty well. All you have to do is not use the '--boot' flag.

An example, modified from the nspawn docs:

# machinectl pull-raw --verify=no https://dl.fedoraproject.org/pub/fedora/linux/releases/25/CloudImages/x86_64/images/Fedora-Cloud-Base-25-1.3.x86_64.qcow2
# echo -e "hello\nworld" | systemd-nspawn -M 'Fedora-Cloud-Base-25-1.3.x86_64' --network-veth -q -- tac
world
hello

'runc' could also do something similar, though it has some additional complexity to support containerd/docker and such, and doesn't have any image management.

rkt conceivably could have a stage1 or a flag that, like fly, doesn't support multiple apps and can thus exec directly through.
It already has '--interactive' which is sort-of a flag for this kind of thing, but doesn't work properly with pipes.
The overall idea though is a bit at odds with pods.

oberstet commented Jun 7, 2017

@euank thanks a lot for these insights and hints! very interesting.

having a simple flag to rkt (single app + interactive) would be I think exactly what I am after: the convenience/high-level abstraction that rkt provides me (compared to raw runc or systemd-nspawn) and the "full" composability.

"composability" is a stated goal of rkt, and from my impression, above would fit into that quite well - but I am in no position to judge of course - just my limited perspective.

> The overall idea though is a bit at odds with pods.

Right.

I've read that rkt wants to be "best buddy" to k8s, but there might be a stretch/conflict in goals with "composability".

But then, the above flag could be used outside a k8s context, for those of us who do something else. Being able to create a processing graph out of piped containers comes to mind. My use case is a container manager (non k8s/mesos/..) that a) allows tapping into the live container log stream remotely or b) can talk a real protocol over pipes with the stuff inside the container.

@trusch

This comment has been minimized.

Copy link
Contributor

trusch commented Jun 9, 2017

oberstet commented Jun 9, 2017

@trusch using TCP complicates matters (listening ports), and will indeed have less throughput / more overhead compared to (buffered) pipes (at least 2 copies vs 1).

@euank "Containers are started as a child process of runC .." - mmh. Is there something that doesn't fork, but replaces the executed image (https://linux.die.net/man/3/execlp)?

Contributor Author

tesujimath commented Jun 26, 2017

@lucab I got round to writing up some blog posts on our use cases for rkt with BioContainers as per your suggestion. You will see that we solved the stdio piping problem by switching to fly-stage1. Please feel free to link to the blog if you think it would be of interest to other users of rkt.
