Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

[WIP] stage1/logging: introduce a new journal logging mode #3727

Closed
wants to merge 2 commits into from

Conversation

fabiokung
Copy link
Contributor

It is similar to the existing log mode that forwards all logs to systemd, with the difference that output will not show up on the console as well (StanddardOutput|Error=journal instead of journal+console).

In addition to that, a read-only view of the pod journal will be bind mounted inside each app, so one more apps can decide to forward output from the pod reading that journal.

Signed-off-by: Fabio Kung fabio.kung@gmail.com

@ghost
Copy link

ghost commented Jun 22, 2017

Can one of the admins verify this patch?

@fabiokung fabiokung changed the title stage1/units: introduce a new journal logging mode stage1/logging: introduce a new journal logging mode Jun 22, 2017
@fabiokung
Copy link
Contributor Author

Thoughts on this? I'm trying to find a good way to allow app sidecars (containerized inside the pod) to act as log forwarders for the pod.

There's the existing iomux and output streams + socket activation stuff on systemd, but it seems abandoned and it is unclear it was the intended use case. For now, it seemed simpler to make the pod journal available inside apps, so they can read from it and forward logs.

@fabiokung fabiokung changed the title stage1/logging: introduce a new journal logging mode [WIP] stage1/logging: introduce a new journal logging mode Jun 22, 2017
@fabiokung
Copy link
Contributor Author

WIP, BindReadOnlyPaths is only supported on systemd >= 233.

However, I'm still interested on design feedback and/or alternatives.

It is similar to the existing `log` mode that forwards all logs to
systemd, with the difference that output will not show up on the console
as well (StanddardOutput|Error=journal instead of journal+console).

In addition to that, a read-only view of the pod journal will be bind
mounted inside each app, so one more apps can decide to forward output
from the pod reading that journal.

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
@fabiokung fabiokung force-pushed the journal-apps branch 2 times, most recently from a2016bf to 92504f1 Compare June 26, 2017 19:11
... in their respective stage2/rootfs/etc/machine-id paths.

This allows apps to read the Pod journal when it is available.
systemd has a similar mechanism with systemd-firstboot.service and
systemd-machine-id-setup(1). The stage1 image could use that service,
since nspawn is already passing the container_uuid[1] ENV var, but
writing the machine-id files directly was simpler than bringing in a new
systemd service/dependency.

[1]: https://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/

Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
@lucab
Copy link
Member

lucab commented Jul 10, 2017

There's the existing iomux and output streams + socket activation stuff on systemd, but it seems abandoned and it is unclear it was the intended use case.

It is unfinished, yes. The usecase is to have a custom stdin/stdout/stderr handler for scenario where we don't want to go through journald (e.g. custom log forwarders) or we need to attach/detach to apps (e.g. detachable interactive sessions). Those forwarder/attachers are in the host environment though, not in the pod itself.

Thoughts on this? I'm trying to find a good way to allow app sidecars (containerized inside the pod) to act as log forwarders for the pod.

Except for the systemd version incompatibility, I'm not much of a fan of the bindmount as it generically tries to sidestep and bend app's FS separation for a very specific purpose. Also, you'll soon realize you can't distinguish stdout vs stderr lines.

Some other options worth considering:

  • teaching the app not to print on stdout/stderr, but use proper logging to sidecar instead. This would also avoid the journald/io overhead, support structured logging, and looks cleaner to me. But it doesn't offer a generic solution baked in the container runtime, of course.
  • having a custom stage1 image, with a dependency on stage1-coreos, which just overwrites the iottymux binary or install a drop-in for each app with the bindmount. This would be a bit more generic than the above, but still no in rkt stage1-coreos itself.
  • teaching stream mode that those entries are handled by a sidecar app. This seems clean but I fear it introduces an implicit ordering on application start.

/cc @alban as they do similar stage1-dependency tricks and may also need some iottymux progress to get rid of journal2cri.

machineIDBytes := append([]byte(machineID), '\n')
if err := ioutil.WriteFile(mPath, machineIDBytes, 0644); err != nil {
return nil, nil, errwrap.Wrap(errors.New("error writing /etc/machine-id"), err)
}
if err := user.ShiftFiles([]string{mPath}, &p.UidRange); err != nil {
return nil, nil, errwrap.Wrap(errors.New("error shifting /etc/machine-id"), err)
}
// also create /etc/machine-id inside each app, so they can read the pod journal when available
Copy link
Member

@lucab lucab Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems useful standalone too, I would suggest you to split it out on its own PR. Better to do a stat() before and skip it if present though, to avoid messing with app rootfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can split that into a different PR.

Doing the stat before would break some very common cases though. Unfortunately, many common base images have a /etc/machine-id baked in as artifacts of how they were built. E.g.:

rkt --insecure-options=image run docker://debian:jessie -- -c "cat /etc/machine-id"
[106023.438284] debian[5]: 5d12f049cbc39eb705e94d4b4fe4e580

@fabiokung
Copy link
Contributor Author

fabiokung commented Jul 10, 2017

Thanks for the feedback! I'm aware of the stdout/err separation issue with systemd's journal mode. My plan was to send a PR to systemd. That does not seem too hard to fix (famous last words).

I'm also not as worried with the bind mounts because we do similar things for pod volumes shared across all apps. This would be a special (implicit?) case of that.

As for the options you suggest, I considered some of them. Some more thoughts inline:

  • teaching the app not to print on stdout/stderr, but use proper logging to sidecar instead. This would also avoid the journald/io overhead, support structured logging, and looks cleaner to me. But it doesn't offer a generic solution baked in the container runtime, of course.

We run arbitrary code, and it would be pretty hard to enforce and/or prescribe that for all apps. It's a different avenue that we may explore at some point, but for now it is not very realistic.

  • having a custom stage1 image, with a dependency on stage1-coreos, which just overwrites the iottymux binary or install a drop-in for each app with the bindmount. This would be a bit more generic than the above, but still no in rkt stage1-coreos itself.

I started this that way, but soon realized that customizing units generated by stage1-coreos is not trivial. It is currently done by the <stage1>/init entrypoint, which immediately execs into systemd. Customizing units means patching the source code for /init and maintaining a fork of it. There are no good hooks or good ways to wrap it and install unit drop-ins.

This may be a good area to focus on a different PR: extensibility of systemd units generated by stage1-coreos. In which case I'd be happy to maintain this as an alternative stage1 image, instead of baking it onto vanilla rkt if you are not interested in having apps reading from the pod journal.

  • teaching stream mode that those entries are handled by a sidecar app. This seems clean but I fear it introduces an implicit ordering on application start.

Ordering on app start is something I will need to work on anyway, because even with journald, graceful shutdown of a Pod means we need to stop the log uploader sidecar last, and guarantee that all shutdown logs get shipped as well.

My reservation about using the stream mode is having to re-implement a bunch of stuff we get from journald "for free": structured logging, buffering and rate limiting, inspection from the host, configuration knobs, and (most importantly) avoid blocking apps writing to stdout/err.

@alban
Copy link
Member

alban commented Jul 10, 2017

@fabiokung the following code adds some mounts via systemd units for each app without patching /init, using a systemd templated unit:
prepare-app@.service.d/10-bind-mount-kernel-header.conf
https://github.com/kinvolk/stage1-builder/blob/master/builder#L133

@fabiokung
Copy link
Contributor Author

@alban ah, drop-ins on prepare-app@.service is a neat trick I didn't know! Thanks, that works.

@fabiokung
Copy link
Contributor Author

Closing this, I'll keep it as a separate stage1 image, and break up the /etc/machine-id changes in stage2 into another PR.

@fabiokung fabiokung closed this Jul 22, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants