Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some logging issues #8258

Merged
merged 10 commits into from
Feb 23, 2018
Merged

Fix some logging issues #8258

merged 10 commits into from
Feb 23, 2018

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented Feb 22, 2018

No description provided.

This reverts commit a7419db.

_All_ changes in that commit were wrong.

Fixes systemd#8211.
The buffers are fixed size, so the message may not fit, but we don't
particularly care.
It makes the code easier to read, because it's obvious that the function
cannot be called from elsewhere.
@keszybz
Copy link
Member Author

keszybz commented Feb 23, 2018

[   31.063002] general protection fault: 0000 [#1] SMP 
[   31.064297] Modules linked in: btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse pata_acpi floppy
[   31.064297] CPU: 0 PID: 419 Comm: dbus-daemon Not tainted 4.4.0-116-generic #140-Ubuntu
[   31.064297] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
../src/udev/udev-builtin-net_id.c: In function 'dev_pci_slot':
../src/udev/udev-builtin-net_id.c:297:47: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
                 snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
                                               ^~

So, gcc-8 now refuses to compile our code. OK, this needs fixing.

@keszybz keszybz added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Feb 23, 2018
log_dispatch_internal has only one caller where the extra_field/extra
params are not null: log_unit_full. When log_unit_full() was called,
when we got to log_dispatch_internal, our header would look like this:
PRIORITY=7
SYSLOG_FACILITY=3
CODE_FILE=../src/core/manager.c
CODE_LINE=2145
CODE_FUNC=manager_invoke_sigchld_event
USER_UNIT=gnome-terminal-server.service
65dffa7a3b984a6d9a46f0b8fb57710bUSER_INVOCATION_ID=
SYSLOG_IDENTIFIER=systemd

It took me a while to understand why I'm not seeing mangled messages in the
journal (after all, "" is a valid rvalue for log messages). The answer is that
journald rejects any field name which starts with a digit, and the MESSAGE_ID
that was used here starts with a digit. Hence, those lines would be silently
filtered out.
If log_do_header() was called with overly long parameters, it'd generate
improper output. Essentially, it'd be truncated at random point, in particular
missing a newline at the end, so it'd run with the next field, usually MESSAGE=.

log_do_header is called with parameters from compiled code (file name, lien
nubmer, etc), so in practice this was unlikely to ever be a problem, but it is
possible. In particular, if systemd was compiled from sources in some deeply
nested directory (which happens for example in mock and other build roots), the
filename could be very long.

As a safety measure, let's truncate all parameters to 256 bytes. So we have
5 fields which are 256 bytes (plus the field name prefix), and a few other
fields with fixed width. This must always fit in the 2048 byte buffer.
I don't think there's much gain in calculating the required length precisely,
since it's a lot of fields and a few bytes allocated on the stack don't matter.
The logs don't contain any additional information.
This fixes a minor memleak of 'address' if the file could not be read.
gcc-8 throws an error if it knows snprintf might truncate output and the
return value is ignored:
../src/udev/udev-builtin-net_id.c: In function 'dev_pci_slot':
../src/udev/udev-builtin-net_id.c:297:47: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
                 snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
                                               ^~
../src/udev/udev-builtin-net_id.c:297:17: note: 'snprintf' output between 10 and 4360 bytes into a destination of size 4096
                 snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

Let's check all return values. This actually makes the code better, because there's
no point in trying to open a file when the name has been truncated, etc.
@keszybz
Copy link
Member Author

keszybz commented Feb 23, 2018

Updated.

@keszybz keszybz removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Feb 23, 2018
Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, but the assert() thing should be fixed I figure. I mean it's not a new bug as we already use asserts from within logging like this, but I figure we should not add one more case of this

src/basic/log.c Outdated
isempty(extra) ? "" : extra,
isempty(extra) ? "" : "\n",
program_invocation_short_name);
assert((size_t) r < size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I am realizing now that doing asserts in our logging code like this is a pretty bad idea: this means we'll create an endless recursion here if actually any of the asserts fail. That's because we redefine assert() to map to our own logging functions, and that means if the assert fails we'll call ourselves, which when the first iteration failed is likely to fail too...

I think we need to avoid any kind of assert() in any code that is invoked from log_dispatch() and friends... Yuck.

log_open();
#define X10(x) x x x x x x x x x x
#define X100(x) X10(X10(x))
#define X1000(x) X100(X10(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wowza!

Then it can be used in the asserts in logging functions without causing
infinite recursion. The error is just printed to stderr, it should be
good enough for the common case.
@keszybz
Copy link
Member Author

keszybz commented Feb 23, 2018

Updated with a commit to just do fputs(stderr) + abort() instead of calling the normal assert function.

@poettering poettering merged commit 15eac52 into systemd:master Feb 23, 2018
@keszybz keszybz deleted the log-issues branch February 24, 2018 08:11
@@ -1699,7 +1699,7 @@ static void unit_status_log_starting_stopping_reloading(Unit *u, JobType t) {
format = unit_get_status_message_format(u, t);

DISABLE_WARNING_FORMAT_NONLITERAL;
xsprintf(buf, format, unit_description(u));
snprintf(buf, sizeof buf, format, unit_description(u));
Copy link
Member

@evverx evverx Feb 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keszybz, thank you. I think it would make sense to put (void) in front of sprintf to prevent xsprintf.cocci from patching everything up again. It would also be great not to rely entirely on the thing that is easy to forget about, but @poettering doesn't seem to like #8210 very much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be good. I changed most of the other places to snprintf_ok, but this one needs a (void).

/* match slot address with device by stripping the function */
if (strneq(address, udev_device_get_sysname(names->pcidev), strlen(address)))
if (streq(address, udev_device_get_sysname(names->pcidev)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes network devices naming in Qemu from ens3 to enp0s3 (going from systemd 237 to 238). It seems the new behaviour is correct but was it still intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was reverted in 8eebb6a. That commit is in v238-stable btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants