[WIP] stage1/logging: introduce a new journal logging mode #3727

fabiokung · 2017-06-22T21:41:11Z

It is similar to the existing log mode that forwards all logs to systemd, with the difference that output will not show up on the console as well (StanddardOutput|Error=journal instead of journal+console).

In addition to that, a read-only view of the pod journal will be bind mounted inside each app, so one more apps can decide to forward output from the pod reading that journal.

Signed-off-by: Fabio Kung fabio.kung@gmail.com

ghost · 2017-06-22T21:41:13Z

Can one of the admins verify this patch?

fabiokung · 2017-06-22T21:43:11Z

Thoughts on this? I'm trying to find a good way to allow app sidecars (containerized inside the pod) to act as log forwarders for the pod.

There's the existing iomux and output streams + socket activation stuff on systemd, but it seems abandoned and it is unclear it was the intended use case. For now, it seemed simpler to make the pod journal available inside apps, so they can read from it and forward logs.

fabiokung · 2017-06-22T23:13:48Z

WIP, BindReadOnlyPaths is only supported on systemd >= 233.

However, I'm still interested on design feedback and/or alternatives.

It is similar to the existing `log` mode that forwards all logs to systemd, with the difference that output will not show up on the console as well (StanddardOutput|Error=journal instead of journal+console). In addition to that, a read-only view of the pod journal will be bind mounted inside each app, so one more apps can decide to forward output from the pod reading that journal. Signed-off-by: Fabio Kung <fabio.kung@gmail.com>

... in their respective stage2/rootfs/etc/machine-id paths. This allows apps to read the Pod journal when it is available. systemd has a similar mechanism with systemd-firstboot.service and systemd-machine-id-setup(1). The stage1 image could use that service, since nspawn is already passing the container_uuid[1] ENV var, but writing the machine-id files directly was simpler than bringing in a new systemd service/dependency. [1]: https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/ Signed-off-by: Fabio Kung <fabio.kung@gmail.com>

lucab · 2017-07-10T09:39:04Z

There's the existing iomux and output streams + socket activation stuff on systemd, but it seems abandoned and it is unclear it was the intended use case.

It is unfinished, yes. The usecase is to have a custom stdin/stdout/stderr handler for scenario where we don't want to go through journald (e.g. custom log forwarders) or we need to attach/detach to apps (e.g. detachable interactive sessions). Those forwarder/attachers are in the host environment though, not in the pod itself.

Thoughts on this? I'm trying to find a good way to allow app sidecars (containerized inside the pod) to act as log forwarders for the pod.

Except for the systemd version incompatibility, I'm not much of a fan of the bindmount as it generically tries to sidestep and bend app's FS separation for a very specific purpose. Also, you'll soon realize you can't distinguish stdout vs stderr lines.

Some other options worth considering:

teaching the app not to print on stdout/stderr, but use proper logging to sidecar instead. This would also avoid the journald/io overhead, support structured logging, and looks cleaner to me. But it doesn't offer a generic solution baked in the container runtime, of course.
having a custom stage1 image, with a dependency on stage1-coreos, which just overwrites the iottymux binary or install a drop-in for each app with the bindmount. This would be a bit more generic than the above, but still no in rkt stage1-coreos itself.
teaching stream mode that those entries are handled by a sidecar app. This seems clean but I fear it introduces an implicit ordering on application start.

/cc @alban as they do similar stage1-dependency tricks and may also need some iottymux progress to get rid of journal2cri.

lucab · 2017-07-10T09:44:15Z

stage1/init/init.go

 	machineIDBytes := append([]byte(machineID), '\n')
 	if err := ioutil.WriteFile(mPath, machineIDBytes, 0644); err != nil {
 		return nil, nil, errwrap.Wrap(errors.New("error writing /etc/machine-id"), err)
 	}
 	if err := user.ShiftFiles([]string{mPath}, &p.UidRange); err != nil {
 		return nil, nil, errwrap.Wrap(errors.New("error shifting /etc/machine-id"), err)
 	}
+	// also create /etc/machine-id inside each app, so they can read the pod journal when available


This seems useful standalone too, I would suggest you to split it out on its own PR. Better to do a stat() before and skip it if present though, to avoid messing with app rootfs.

Sure, I can split that into a different PR.

Doing the stat before would break some very common cases though. Unfortunately, many common base images have a /etc/machine-id baked in as artifacts of how they were built. E.g.:

rkt --insecure-options=image run docker://debian:jessie -- -c "cat /etc/machine-id" [106023.438284] debian[5]: 5d12f049cbc39eb705e94d4b4fe4e580

fabiokung · 2017-07-10T20:09:07Z

Thanks for the feedback! I'm aware of the stdout/err separation issue with systemd's journal mode. My plan was to send a PR to systemd. That does not seem too hard to fix (famous last words).

I'm also not as worried with the bind mounts because we do similar things for pod volumes shared across all apps. This would be a special (implicit?) case of that.

As for the options you suggest, I considered some of them. Some more thoughts inline:

teaching the app not to print on stdout/stderr, but use proper logging to sidecar instead. This would also avoid the journald/io overhead, support structured logging, and looks cleaner to me. But it doesn't offer a generic solution baked in the container runtime, of course.

We run arbitrary code, and it would be pretty hard to enforce and/or prescribe that for all apps. It's a different avenue that we may explore at some point, but for now it is not very realistic.

having a custom stage1 image, with a dependency on stage1-coreos, which just overwrites the iottymux binary or install a drop-in for each app with the bindmount. This would be a bit more generic than the above, but still no in rkt stage1-coreos itself.

I started this that way, but soon realized that customizing units generated by stage1-coreos is not trivial. It is currently done by the <stage1>/init entrypoint, which immediately execs into systemd. Customizing units means patching the source code for /init and maintaining a fork of it. There are no good hooks or good ways to wrap it and install unit drop-ins.

This may be a good area to focus on a different PR: extensibility of systemd units generated by stage1-coreos. In which case I'd be happy to maintain this as an alternative stage1 image, instead of baking it onto vanilla rkt if you are not interested in having apps reading from the pod journal.

teaching stream mode that those entries are handled by a sidecar app. This seems clean but I fear it introduces an implicit ordering on application start.

Ordering on app start is something I will need to work on anyway, because even with journald, graceful shutdown of a Pod means we need to stop the log uploader sidecar last, and guarantee that all shutdown logs get shipped as well.

My reservation about using the stream mode is having to re-implement a bunch of stuff we get from journald "for free": structured logging, buffering and rate limiting, inspection from the host, configuration knobs, and (most importantly) avoid blocking apps writing to stdout/err.

alban · 2017-07-10T20:48:16Z

@fabiokung the following code adds some mounts via systemd units for each app without patching /init, using a systemd templated unit:
prepare-app@.service.d/10-bind-mount-kernel-header.conf
https://github.com/kinvolk/stage1-builder/blob/master/builder#L133

fabiokung · 2017-07-10T20:49:37Z

@alban ah, drop-ins on prepare-app@.service is a neat trick I didn't know! Thanks, that works.

fabiokung · 2017-07-22T02:42:11Z

Closing this, I'll keep it as a separate stage1 image, and break up the /etc/machine-id changes in stage2 into another PR.

fabiokung changed the title ~~stage1/units: introduce a new journal logging mode~~ stage1/logging: introduce a new journal logging mode Jun 22, 2017

fabiokung changed the title ~~stage1/logging: introduce a new journal logging mode~~ [WIP] stage1/logging: introduce a new journal logging mode Jun 22, 2017

fabiokung force-pushed the journal-apps branch from 7cf26f1 to 3815da9 Compare June 26, 2017 18:27

fabiokung force-pushed the journal-apps branch 2 times, most recently from a2016bf to 92504f1 Compare June 26, 2017 19:11

fabiokung force-pushed the journal-apps branch from 92504f1 to fb50d97 Compare June 26, 2017 23:17

lucab added component/stage1 needs/second-opinion technology/systemd kind/design labels Jul 10, 2017

lucab reviewed Jul 10, 2017

View reviewed changes

fabiokung closed this Jul 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] stage1/logging: introduce a new journal logging mode #3727

[WIP] stage1/logging: introduce a new journal logging mode #3727

fabiokung commented Jun 22, 2017

ghost commented Jun 22, 2017

fabiokung commented Jun 22, 2017

fabiokung commented Jun 22, 2017

lucab commented Jul 10, 2017

lucab Jul 10, 2017 •

edited

fabiokung Jul 10, 2017

fabiokung commented Jul 10, 2017 •

edited

alban commented Jul 10, 2017

fabiokung commented Jul 10, 2017

fabiokung commented Jul 22, 2017

[WIP] stage1/logging: introduce a new journal logging mode #3727

[WIP] stage1/logging: introduce a new journal logging mode #3727

Conversation

fabiokung commented Jun 22, 2017

ghost commented Jun 22, 2017

fabiokung commented Jun 22, 2017

fabiokung commented Jun 22, 2017

lucab commented Jul 10, 2017

lucab Jul 10, 2017 • edited

Choose a reason for hiding this comment

fabiokung Jul 10, 2017

Choose a reason for hiding this comment

fabiokung commented Jul 10, 2017 • edited

alban commented Jul 10, 2017

fabiokung commented Jul 10, 2017

fabiokung commented Jul 22, 2017

lucab Jul 10, 2017 •

edited

fabiokung commented Jul 10, 2017 •

edited