journald: Proposal to process logs externally before storing them #7557

aledomu · 2017-12-06T17:03:08Z

Submission type

Request for enhancement (RFE)

systemd version the issue has been seen with

235

Used distribution

irrelevant

This is related to:
#2447
#6432
#6361 (I'm not sure what this is about)

Ok, this is something that I have been thinking about for a few days, and here are my conclusions and ideas, inspired by #2447 (comment).

The problem that @poettering is trying to point out is that implementing filtering logic in journald can become a source of bugs and works against the performance and simplicity they aim for (bear in mind that journald can't be disabled at compile time, even for the most minimal build). I partly agree with him, but I also recognize that sometimes you need this workaround. So I strongly defend the need to filter messages before storing them, but I also think that it could be done outside of journald, or that journald could offer simpler ways of doing it.

It's not that I don't want to store any information (as seems to be understood in #2447 (comment)). In fact I do, and if possible I want to use journald to store and read the logs, because I like how well integrated it is with the system and I like the structured logs feature. I also see that I'm not the only one asking for this. To summarize my (probably our) request: I want to collect and store the logs of services with journald, but selectively and in a fine-grained manner, not in the coarse-grained way of "just turn the logs off if you don't want to store anything".

Why would someone store only some of the information? Because a sysadmin needs logs to diagnose what happened when there is an issue, but sometimes there is data you don't want to keep, so that nobody can force you to hand it over (you simply can't, because you don't have it). If you store that information, you can't refuse to give it to the authorities, for example. Also, "fix the program" may be the right answer when a program just outputs too many irrelevant messages. But when you need to strip out sensitive information before storing anything in order to achieve some privacy goal (for example), and the program's design is not the problem from a "technical" point of view, this non-solution has the same problem that systemd has tried to address in many places: code and functionality duplicated across the system (in this case, the control over what information is stored in logs). Moreover, it is very unrealistic to expect every developer out there to implement some specific kind of filtering in their programs, especially with structured logs, where filtering just becomes "enable/disable this field" and can be done in a very simple way.

So I have two proposals:

1. Enable opt-in, per-service logic in journald that pipes logs out to an external program and back in, both as a text stream and as D-Bus objects (or BUS1 in the future). This way admins can filter messages in whatever way suits them best, and they aren't even forced to use regexps if they don't want or need to. A nice side effect is that, when taking the information over D-Bus, it could become possible to structure unstructured logs before storing them, enabling full use of journald's improvements.
2. Do the filtering in journald, but instead of using regexps, just discard fields from the service's structured logs. It would also be possible to skip storing an entry based on a condition on any field (like a database query), although my main concern is completely discarding a field. All of this would be declared in .service files like this, with both opt-in and opt-out options (it doesn't have to use these names or this syntax, I'm clueless about it):

[Log]
(Not)StoreField=Field1, Field2, ..., FieldN     /* This can't refer to fields that journald itself writes */
(Not)StoreEntryIf=Field3 = "text", Field4 >= 100, ...     /* Can read journald's own fields. Needs logical operators and brackets to determine their scope. */
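
To make the intended semantics of (Not)StoreField and (Not)StoreEntryIf a bit more concrete, here is a minimal Python sketch. It is only an illustration of the idea, not anything journald implements: the function name `filter_entry`, its parameters, and the REMOTE_ADDR field are all hypothetical. The one real convention it relies on is that fields journald writes itself start with an underscore, so those are never dropped.

```python
from typing import Callable, Dict, Iterable, Optional

Entry = Dict[str, str]


def filter_entry(
    entry: Entry,
    drop_fields: Iterable[str] = (),
    drop_entry_if: Optional[Callable[[Entry], bool]] = None,
) -> Optional[Entry]:
    """Return the entry that would be stored, or None to discard it entirely.

    Hypothetical model of NotStoreField= / NotStoreEntryIf= from the proposal.
    """
    # The entry-level condition may read any field, including journald's own.
    if drop_entry_if is not None and drop_entry_if(entry):
        return None
    # Field-level discard must not touch fields journald writes itself
    # (trusted fields, whose names start with an underscore).
    blocked = {name for name in drop_fields if not name.startswith("_")}
    return {k: v for k, v in entry.items() if k not in blocked}


# Example: drop a sensitive field, and skip debug-level entries altogether.
entry = {
    "MESSAGE": "login ok",
    "REMOTE_ADDR": "203.0.113.7",  # hypothetical application field
    "PRIORITY": "6",
    "_PID": "412",
}
stored = filter_entry(
    entry,
    drop_fields=["REMOTE_ADDR", "_PID"],
    drop_entry_if=lambda e: int(e.get("PRIORITY", "6")) >= 7,
)
```

Here `stored` keeps MESSAGE, PRIORITY and _PID: REMOTE_ADDR is discarded, while the request to drop _PID is ignored because it is one of journald's own fields.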

The downside is that the logs have to be structured in the journald format, so to use this in a realistically practical and effective way there would still need to be some kind of "filter" that structures the logs when they aren't structured (most programs still output only to stdout and stderr, and a shift towards native journald output looks a long way off, if it ever happens). I don't know whether a program can reliably capture all stdout and stderr output from its child process and that child's subprocesses (blocking that output and giving journald a structured log instead). If it can, the structuring phase can be solved outside systemd's development, without waiting for services to support journald output, although it has the disadvantage of needing one extra process per service instead of centralized management. If the text output can't be fully captured externally this way, something like my first proposal is needed, but optionally exposing a D-Bus/BUS1 socket, in addition to the store/discard feature, should be enough (a text stream isn't needed, since the program that structures the logs has to support D-Bus/BUS1 to output the result anyway). The upside of this approach is that, if the service already outputs logs in the journald format, there's no need for any IPC mechanism and performance should be better (I'm not sure about this, since I'm a newbie in all the IPC stuff). Of course, implementing any parsing logic in journald seems suboptimal even in the best case when the task can be externalized. Although it would be inconvenient in the short to medium term at least, I'm fine if only the store/discard feature is considered, without any way to structure the logs in real time before storing them.
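
As a rough illustration of the per-service wrapper idea, the Python sketch below starts a child process, reads its combined stdout and stderr through a pipe, turns each line into structured fields, and encodes them in journald's native datagram format. Several things here are assumptions made for the example: the "LEVEL: text" line convention, the SERVICE_UNIT field name, and all function names are hypothetical, and the socket path is the usual /run/systemd/journal/socket. It also doesn't settle the "with guarantees" question: subprocesses inherit the pipe by default, but anything that reopens its own output descriptors would escape the wrapper.

```python
import socket
import struct
import subprocess
import sys
from typing import Dict, Iterator, List

# syslog priority numbers for a hypothetical "LEVEL: text" line convention
LEVELS = {"ERROR": "3", "WARNING": "4", "INFO": "6", "DEBUG": "7"}


def structure_line(line: str, unit: str) -> Dict[str, str]:
    """Turn one unstructured output line into journal-style fields."""
    head, sep, rest = line.partition(": ")
    if sep and head in LEVELS:
        return {"MESSAGE": rest, "PRIORITY": LEVELS[head], "SERVICE_UNIT": unit}
    # Unknown format: keep the whole line and default to "info".
    return {"MESSAGE": line, "PRIORITY": "6", "SERVICE_UNIT": unit}


def serialize(fields: Dict[str, str]) -> bytes:
    """Encode fields as one datagram in the journal native protocol."""
    out = bytearray()
    for name, value in fields.items():
        data = value.encode("utf-8")
        if b"\n" in data:
            # Values containing newlines use: NAME, newline,
            # 64-bit little-endian length, raw data, newline.
            out += name.encode("ascii") + b"\n"
            out += struct.pack("<Q", len(data)) + data + b"\n"
        else:
            out += name.encode("ascii") + b"=" + data + b"\n"
    return bytes(out)


def capture(argv: List[str], unit: str) -> Iterator[Dict[str, str]]:
    """Run argv and yield one structured entry per line of its output."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        yield structure_line(line.rstrip("\n"), unit)
    proc.wait()


def send_to_journald(
    fields: Dict[str, str], path: str = "/run/systemd/journal/socket"
) -> None:
    """Send one entry to journald (only works on a system running journald)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(serialize(fields), path)
```

A wrapper built this way would call `send_to_journald(entry)` for each entry yielded by `capture(...)`, optionally dropping fields or entries first, which is where the one-process-per-service cost mentioned above comes from.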

I think that these solutions, or something along these lines, make it possible to filter messages without the risk of scope creep or too much complexity. Whatever downsides filtering messages through these mechanisms has, I don't see a big problem unless they are too drastic. There may be some trade-offs when using this feature (such as performance, though BUS1 might greatly mitigate that), but as I said, this is optional and has to be enabled explicitly. For the first proposal at least, a warning about the drawbacks and something like "you're on your own, we have little to do here" in the docs could be enough.

If you disagree or see gaps, please give a detailed rationale, but keep these things in mind:

We need/want to do this, so don't tell us that our "wants/needs" are wrong.
We don't want to use journald as a mere collector. It is really nice for storing and especially for retrieving logs, and we want to keep just the information we want/need in one central place (although personally I need to understand the corruption issues a bit better, so far I haven't had any problem with that).
If you think that our practices as sysadmins for achieving these goals are wrong, tell us an alternative way of doing it or of thinking about it that doesn't require disabling journald storage, but don't just say "won't implement anything, use something else or fix the program", regardless of how detailed and "technically correct" your rationale is (I get that if the issue is too many messages or poorly designed output, then it's the program's fault). If possible, tell us something you haven't said yet.

poettering added the journal and RFE 🎁 (Request for Enhancement, i.e. a feature request) labels on Dec 6, 2017
