Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-journald drops all bytes after '\0' #4863

Open
evverx opened this Issue Dec 10, 2016 · 7 comments

Comments

3 participants
@evverx
Copy link
Member

evverx commented Dec 10, 2016

I'm sorry for not being specific. I will have to update the description so that it won't cause confusion like #6838 (comment).

$ printf '<13>Sep 15 15:07:58 HOST: null\0here' | nc -w1 -u -U /run/systemd/journal/dev-log

$ journalctl -t HOST -o verbose
...
    MESSAGE=null

From #1460

Of course, this will make the patch more complicated, as this means we have to patch much further down to ensure we always keep the ptr+length bit intact.

poettering added a commit to poettering/systemd that referenced this issue Sep 15, 2017

journald: make maximum size of stream log lines configurable and bump…
… it to 48K

This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see systemd#4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "max-reached" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
Fixes: systemd#4863
Replaces: systemd#4875
@poettering

This comment has been minimized.

Copy link
Member

poettering commented Sep 15, 2017

Fix waiting in #6838.

poettering added a commit to poettering/systemd that referenced this issue Sep 15, 2017

journald: make maximum size of stream log lines configurable and bump…
… it to 48K

This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see systemd#4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "max-reached" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: systemd#4863
Replaces: systemd#4875

@evverx evverx removed the has-pr label Sep 15, 2017

poettering added a commit to poettering/systemd that referenced this issue Sep 21, 2017

journald: make maximum size of stream log lines configurable and bump…
… it to 48K

This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see systemd#4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "line-max" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: systemd#4863
Replaces: systemd#4875

keszybz added a commit that referenced this issue Sep 22, 2017

journald: make maximum size of stream log lines configurable and bump…
… it to 48K (#6838)

This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see #4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "line-max" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: #4863
Replaces: #4875
@poettering

This comment has been minimized.

Copy link
Member

poettering commented May 24, 2018

Since ec20fe5 we will no consider '\n', '\0' as well as line size limits and EOF all as fully equivalent EOL markers. This means the command proposed will now cleanly result in multiple log lines generated. Here's an example:

$ printf 'ab\0cd\nef' | systemd-cat --identifier drop-after-zero
$ journalctl -n 3
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Thu 2018-05-24 13:21:25 CEST. --
Mai 24 13:21:25 sigma drop-after-zero[51513]: ab
Mai 24 13:21:25 sigma drop-after-zero[51513]: cd
Mai 24 13:21:25 sigma drop-after-zero[51513]: ef

I am pretty sure this new behaviour makes a ton of sense, and no data is lost. In fact the structure log message contains enough information to even reconstruct whether the line was broken due to \n, \0, line limit or EOF through the _LINE_BREAK= property:

-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Thu 2018-05-24 13:22:41 CEST. --
Thu 2018-05-24 13:22:41.713886 CEST [s=65e4e3725e08482e859073adf7e3ab40;i=559c3;b=47906844deef4cbe971b732ad7db2129;m=364e98d427;t=56cf1dea346de;x=179b7582b4966cdb]
    _UID=1000
    _GID=1000
    _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    _MACHINE_ID=c455a9847b7b47cb8a6d206e1a8d04c0
    _HOSTNAME=sigma
    _CAP_EFFECTIVE=0
    _TRANSPORT=stdout
    PRIORITY=6
    _BOOT_ID=47906844deef4cbe971b732ad7db2129
    SYSLOG_IDENTIFIER=drop-after-zero
    _LINE_BREAK=nul
    MESSAGE=ab
    _COMM=cat
    _STREAM_ID=91bb4a4b07c248cd98d8ef74f29f6219
    _PID=51541
Thu 2018-05-24 13:22:41.713886 CEST [s=65e4e3725e08482e859073adf7e3ab40;i=559c4;b=47906844deef4cbe971b732ad7db2129;m=364e98d427;t=56cf1dea346de;x=76f0cb041ca50029]
    _UID=1000
    _GID=1000
    _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    _MACHINE_ID=c455a9847b7b47cb8a6d206e1a8d04c0
    _HOSTNAME=sigma
    _CAP_EFFECTIVE=0
    _TRANSPORT=stdout
    PRIORITY=6
    _BOOT_ID=47906844deef4cbe971b732ad7db2129
    SYSLOG_IDENTIFIER=drop-after-zero
    _COMM=cat
    MESSAGE=cd
    _STREAM_ID=91bb4a4b07c248cd98d8ef74f29f6219
    _PID=51541
Thu 2018-05-24 13:22:41.727466 CEST [s=65e4e3725e08482e859073adf7e3ab40;i=559c5;b=47906844deef4cbe971b732ad7db2129;m=364e990933;t=56cf1dea37bea;x=baf41e133020694f]
    _UID=1000
    _GID=1000
    _SELINUX_CONTEXT=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
    _MACHINE_ID=c455a9847b7b47cb8a6d206e1a8d04c0
    _HOSTNAME=sigma
    _CAP_EFFECTIVE=0
    _TRANSPORT=stdout
    PRIORITY=6
    _BOOT_ID=47906844deef4cbe971b732ad7db2129
    _LINE_BREAK=eof
    SYSLOG_IDENTIFIER=drop-after-zero
    _COMM=cat
    MESSAGE=ef
    _STREAM_ID=91bb4a4b07c248cd98d8ef74f29f6219
    _PID=51541

@poettering poettering closed this May 24, 2018

@evverx

This comment has been minimized.

Copy link
Member Author

evverx commented May 24, 2018

It's not fixed. That's why I wrote in the description that I should update it and we discussed it in #6838 (comment).

@evverx evverx reopened this May 24, 2018

@poettering

This comment has been minimized.

Copy link
Member

poettering commented May 24, 2018

ah, indeed. ok, let's clarify on this issue hence, that the offending command is this:

printf '<13>Sep 15 15:07:58 HOST: null\0here' \| nc -w1 -u -U /run/systemd/journal/dev-log

And that indeed remains unfixed.

Sorry for the confusion

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Jun 6, 2018

Hmm, so while I agree that simply ignoring everything coming after a NUL byte in a BSD syslog message is definitely the wrong approach, I am not sure what the right approach though is. If we get an UDP or AF_UNIX datagram with an embedded NUL byte, does this means the datagram contains two log records and the NUL is the separator? or does it mean that NUL shall be considered part of the log message?

@evverx

This comment has been minimized.

Copy link
Member Author

evverx commented Jun 7, 2018

I think the latter. According to https://tools.ietf.org/html/rfc5424

This document does not impose any mandatory restrictions on the MSG
or PARAM-VALUE content. As such, they MAY contain control
characters, including the NUL character.

In some programming languages (most notably C and C++), the NUL
character (ABNF %d00) traditionally has a special significance as
string terminator. Most implementations of these languages assume
that a string will not extend beyond the first NUL character. This
is primarily a restriction of the supporting run-time libraries.
This restriction is often carried over to programs and script
languages written in those languages. As such, NUL characters must
be considered with great care and be properly handled. An attacker
may deliberately include NUL characters to hide information after
them. Incorrect handling of the NUL character may also invalidate
cryptographic checksums that are transmitted inside the message.

rsyslog seems to replace control characters while receiving messages by default, but it's configurable via other knobs like escape-cc, drop-cc and escape-cc.

keszybz added a commit to keszybz/systemd that referenced this issue Jun 8, 2018

journal: escape non-utf8 messages
$ printf '<13>Sep 15 15:07:58 HOST: null\0here' | nc -w1 -u -U /run/systemd/journal/dev-log
$ journalctl -n1 -o short-unix --no-hostname
1528452686.141537 HOST[18159]: null\000here

This mostly fixes systemd#4863.

The standard allows NUL-bytes and arbitrary crap to be embeded in messages. We forward
all the crap to syslog, but strip whitespace and escape invalid characters to generate
a printable message to store as the MESSAGE= field.

The common case of messages that don't need stripping or escaping is optimized.

@keszybz keszybz added the has-pr label Jun 8, 2018

keszybz added a commit to keszybz/systemd that referenced this issue Jun 20, 2018

journal: store the original input as _ORIGINAL_MESSAGE=
This allows the original stream to be recreated and/or verified.
The new field is written if any stripping or escaping was done or if
the input message contained embeded NULs.

$ printf '<13>Sep 15 15:07:58 HOST: x\0y' | nc -w1 -u -U /run/systemd/journal/dev-log

$ journalctl -o json-pretty ...
{
  ...
  "MESSAGE" : "x",
  "_ORIGINAL_MESSAGE" : [ 60, 49, 51, 62, 83, 101, 112, 32, 49, 53, 32, 49, 53, 58, 48, 55, 58, 53, 56, 32, 72, 79, 83, 84, 58, 32, 120, 0, 121 ]
}

$ journalctl -o export ... | cat -v
...
MESSAGE=x
_ORIGINAL_MESSAGE
^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y

This mostly fixes systemd#4863.

kyle-walker pushed a commit to kyle-walker/systemd that referenced this issue Nov 15, 2018

journald: make maximum size of stream log lines configurable and bump…
… it to 48K (systemd#6838)

This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see systemd#4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "line-max" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: systemd#4863
Replaces: systemd#4875

(cherry picked from commit ec20fe5)

Resolves: #1442262

[msekleta: I had to manually rewrite upstream commit, because git
did very poor job merging old and new code and identified a lot of merge
conflicts in a code that was totally unrelated.]
@evverx

This comment has been minimized.

Copy link
Member Author

evverx commented Nov 25, 2018

As shown in #10918, journald still uses string functions to handle binary messages and therefore loses everything after the first null byte. I guess the issue should be reopened.

@evverx evverx reopened this Nov 25, 2018

@evverx evverx removed the has-pr label Nov 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.