Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A fuzzer for journal-remote #9014

Merged
merged 31 commits into from May 31, 2018
Merged

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented May 17, 2018

No description provided.

evverx
evverx previously requested changes May 20, 2018
Copy link
Member

@evverx evverx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzz-journal-remote seems to be failing under msan as soon as it starts:

$ sudo infra/helper.py run_fuzzer systemd fuzz-journal-remote
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner run_fuzzer fuzz-journal-remote
Using seed corpus: fuzz-journal-remote_seed_corpus.zip
/out/fuzz-journal-remote -rss_limit_mb=2048 -timeout=25 /tmp/fuzz-journal-remote_corpus -max_len=65536 < /dev/null
INFO: Seed: 3380449479
INFO: Loaded 2 modules   (36336 inline 8-bit counters): 36139 [0x7ff36ea31d39, 0x7ff36ea3aa64), 197 [0x9998c8, 0x99998d),
INFO: Loaded 2 PC tables (36336 PCs): 36139 [0x7ff36ea3aa68,0x7ff36eac7d18), 197 [0x999990,0x99a5e0),
INFO:        2 files found in /tmp/fuzz-journal-remote_corpus
INFO: seed corpus: files: 2 min: 4657b max: 7790b total: 12447b rss: 97Mb
Uninitialized bytes in __interceptor_pwrite64 at offset 24 inside [0x7fffdd4d7230, 240)
==15==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7ff36e685e8a in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:436:13
    #1 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
    #2 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
    #3 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
    #4 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
    #5 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
    #6 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
    #7 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
    #8 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
    #9 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
    #10 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
    #11 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
    #12 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
    #13 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #14 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x41f9d8 in _start (/out/fuzz-journal-remote+0x41f9d8)

  Uninitialized value was stored to memory at
    #0 0x7ff36e61cd41 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:288:16
    #1 0x7ff36e685cec in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:426:13
    #2 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
    #3 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
    #4 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
    #5 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
    #6 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
    #7 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
    #8 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
    #9 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
    #10 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
    #11 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
    #12 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
    #13 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
    #14 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #15 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

  Uninitialized value was created by an allocation of 't' in the stack frame of function 'sd_id128_randomize'
    #0 0x7ff36e61cb00 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:274

SUMMARY: MemorySanitizer: use-of-uninitialized-value /work/build/../../src/systemd/src/journal/journal-file.c:436:13 in journal_file_init_header
Exiting
MS: 0 ; base unit: 0000000000000000000000000000000000000000
artifact_prefix='./'; Test unit written to ./crash-847911777b3096783f4ee70a69ab6d28380c810b
[vagrant@localhost oss-fuzz]$ sudo infra/helper.py check_build --sanitizer=memory systemd
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -e SANITIZER=memory -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner test_all
INFO: performing bad build checks for /out/fuzz-dhcp-server.
INFO: performing bad build checks for /out/fuzz-journal-remote.
INFO: performing bad build checks for /out/fuzz-unit-file.
INFO: performing bad build checks for /out/fuzz-dns-packet.
4 fuzzers total, 0 seem to be broken (0%).
Check build passed.

It's a false positive which is most likely caused by google/sanitizers#852. I think it could be got around by avoiding getrandom when the code is compiled with msan:

diff --git a/src/basic/random-util.c b/src/basic/random-util.c
index 0750083b8..03117c225 100644
--- a/src/basic/random-util.c
+++ b/src/basic/random-util.c
@@ -46,7 +46,7 @@ int acquire_random_bytes(void *p, size_t n, bool high_quality_required) {
          * for us. */

         /* Use the getrandom() syscall unless we know we don't have it. */
-        if (have_syscall != 0) {
+        if (have_syscall != 0 && !HAS_FEATURE_MEMORY_SANITIZER) {
                 r = getrandom(p, n, GRND_NONBLOCK);
                 if (r > 0) {
                         have_syscall = true;

@keszybz
Copy link
Member Author

keszybz commented May 20, 2018

@evverx I pushed your patch and some fixes for an issue in the json output code found by the fuzzer.

@evverx evverx dismissed their stale review May 20, 2018 22:23

The issue I pointed out has been fixed.

assert_se(r >= 0);

r = sd_journal_seek_head(j);
assert_se(r >= 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In http://llvm.org/docs/LibFuzzer.html#fuzz-target It is recommended that logging should be avoided. I'm wondering if it's possible to make show_journal skip calling fprintf if it's run in the "fuzzer" mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Half of this fuzzer is the journal output code, and writing the stuff out is an integral part of it. Disabling output would be possible, but it would reduce the coverage a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the coverage too, but currently can neither confirm nor deny anything because of google/oss-fuzz#1426. Should something go wrong, it probably won't be hard to fix, so leaving everything as is makes sense to me. Thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, I reconsidered, and pushed another patch on top to write the logs to /dev/null. I think all our output code ignored write failures, so this should have the same effect and writing to stdout. I tested that the one failure that was found is still reproducible with /dev/null.

Unfortunately there seems to be some leak of fds. I need to work on this a bit more.

@keszybz keszybz added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label May 21, 2018
@@ -451,7 +451,10 @@ static int journal_file_refresh_header(JournalFile *f) {
assert(f->header);

r = sd_id128_get_machine(&f->header->machine_id);
if (r < 0)
if (IN_SET(r, -ENOENT, -ENOMEDIUM))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker image that oss-fuzz uses has an empty /etc/machine-id.

I'm not sure it's a docker issue. An empty machine-id seems to come from the base ubuntu image and it's totally fine according to https://www.freedesktop.org/software/systemd/man/machine-id.html#Initialization:

For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be an empty file in the generic file system image.

Given that all oss-fuzz images are based on Ubuntu, where systemd is used by default, I would expect machine-id to be initialized. systemd itself is never run there and therefore it has no chance to do what it's supposed to do, so I think the right place would be https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-image/Dockerfile, though, I may be wrong.

@inferno-chromium, @Dor1s sorry to bother you, but would it be possible to run systemd-machine-id-setup or something like echo 760cd9963f3a45d8858f9daae65426cf >/etc/machine-id (to make the environment reproducible) while building base-image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it not the responsibility of docker itself to initialize the id.

But no matter who should initialize it, I still think that not requiring it to be present in journal code is the right thing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evverx, that sounds fine to me. CC @oliverchang should we add it to all base images?

should it be a random value every time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dor1s although using the same id would make the environment more predictable, it seems to be artificial. I think generating a new id with systemd-machine-id-setup would reflect reality better.

But no matter who should initialize it, I still think that not requiring it to be present in journal code is the right thing.

I vaguely remember there were discussions about journald and machine-id where @poettering said that it's expected that journald fails if something is wrong with machine-id, but I haven't been able to find them yet.

I think I'm more concerned about the commit where the return value of sd_id128_get_machine has been changed. It's a public interface and there might be software out there depending on it. Although EINVAL doesn't seem to be the best choice, it's the only way to distinguish an empty machine-id, which is normal, from an id consisting of only zeros, which should never happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it comes down to what the program trying to use /etc/machine-id is supposed to do. It doesn't really matter if the file is missing, empty, or all zeros, the handling should be the same. -EINVAL means a programming error, so we should provide codes for those other conditions that can be distinguished from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, i figure we should sooner or later support a machine-id-less mode altogether btw, in the interest of "fully anonymous" systems... ChromeOS has a concept like that, and I figure it makes some sense for ultra-privacy-focussed systems.

(I used to be a staunch supporter of requiring machine ID strictly, but I changed my mind on this).

Hence yes, I think @keszybz makes a ton of sense

@keszybz keszybz removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label May 21, 2018
assert_se(r >= 0);

sd_journal_close(j);
unlink(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use that new cleanup function you added for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I added it when working on this commit. But then this commit evolved to use mkostemps (because the journal code requires the .journal suffix), which is not supported by my cleanup function.

@@ -451,7 +451,10 @@ static int journal_file_refresh_header(JournalFile *f) {
assert(f->header);

r = sd_id128_get_machine(&f->header->machine_id);
if (r < 0)
if (IN_SET(r, -ENOENT, -ENOMEDIUM))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, i figure we should sooner or later support a machine-id-less mode altogether btw, in the interest of "fully anonymous" systems... ChromeOS has a concept like that, and I figure it makes some sense for ultra-privacy-focussed systems.

(I used to be a staunch supporter of requiring machine ID strictly, but I changed my mind on this).

Hence yes, I think @keszybz makes a ton of sense

size_t i;
const char *t = s;

assert(len > 4 + 4 + 1); /* two chars and the terminator */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>= is good enough, no?

@@ -347,6 +348,16 @@ int journal_importer_process_data(JournalImporter *imp) {
/* chomp newline */
n--;

if (!journal_field_valid(line, sep - line, true)) {
char buf[64], *t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far we never did such arbitrarily sized arrays, but instead used some macro or so. Can we do that here too? i.e. add FIELD_ELLIPSIZE_LIMIT or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is completely arbitrary unfortunately. But yeah, I should add some define for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that this matters too much, but this line still has no comment btw... The one at the top of process_special_field() does however... Given that this already appears twice here I wonder if adding a (local) macro for this might not be worth it after all...

@@ -47,8 +48,10 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
r = sd_journal_open_files(&j, (const char**) STRV_MAKE(name), 0);
assert_se(r >= 0);

assert_se(dev_null = fopen("/dev/null", "w"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we"? (out of principle, not for any specific reason beyond that)

if (t > USEC_TIMESTAMP_FORMATTABLE_MAX)
return NULL;
return "--- ✗✗✗✗-✗✗-✗✗ ✗✗:✗✗:✗✗";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so unconditional unicode (i.e. not bound to is_unicode_locale()) is probably something we should avoid...

Maybe just use regular ASCII "9"s here after all, the unicode stuff doesn't get us much I figure... i.e. --- XXXX-XX-XX XX:XX:XX

I am not too fond of returning the static string here directly, some code might expect the buffer passed in to be filled in, and quite frankly for a good reason. I am pretty sure we should fill in the buffer unconditionally, and return that and leave the return value as non-const ptr...

Also, maybe it would be better to make this bvehaviour opt-out, i.e. let's replace the two existing bool params by a new flags param, and then add a new flag: "FORMAT_TIMESTAMP_FAIL_NON_DISPLAYABLE" or so, which would get the old behaviour back of returning NULL on failure. (And yeah, format_timestamp_internal() should probably be made public then under a new name, maybe format_timestamp_full() or so. And maybe we can then drop some of the less used flavours of the wrappers and just make the callers use format_timestamp_full() directly

@keszybz keszybz added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels May 24, 2018
@keszybz
Copy link
Member Author

keszybz commented May 24, 2018

Updated. All issues were addressed, modulo the comments below.

so far we never did such arbitrarily sized arrays, but instead used some macro or so. Can we do that here too? i.e. add FIELD_ELLIPSIZE_LIMIT or so?

In the end I just added a comment. This is only used in one place, so a define would move the definition away from use without additional benefit. If this shows up in other places, we can add the define then.

I fixed a memleak (precisely speaking, the leak of an mmap of the memfd), and squashed two commits that were fixes to previous commits in the series with those commits, and fixed some more issues in the display code which were found in the meantime (the last two commits are new).

Also, maybe it would be better to make this bvehaviour opt-out, i.e. let's replace the two existing bool params by a new flags param, and then add a new flag: "FORMAT_TIMESTAMP_FAIL_NON_DISPLAYABLE"

... Pfff, maybe, dunno. I would rather not do this in this PR.

if (t > USEC_TIMESTAMP_FORMATTABLE_MAX)
return NULL;
if (t > USEC_TIMESTAMP_FORMATTABLE_MAX) {
assert(l >= strlen("--- XXXX-XX-XX XX:XX:XX"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STRLEN() rather than strlen() please.

also this is off by one, no space for the trailing NUL is included

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-by-one fixed.

STRLEN() rather than strlen() please

The compilers (gcc and clang both) are generally able to optimize strlen away to a constant, even at -O0. The reason we have STRLEN is that they still created a VLA despite optimizing the size to a constant. So we should use STRLEN in variable declarations, but otherwise plain strlen is completely fine.

@keszybz keszybz force-pushed the fuzz-journal-remote branch 2 times, most recently from 31ee78f to f1d36d2 Compare May 27, 2018 17:15
@keszybz
Copy link
Member Author

keszybz commented May 27, 2018

Pushed again, with a rebase (needed because of with-unit that was merged recently), and with 3 commits for #9090.

@yuwata
Copy link
Member

yuwata commented May 29, 2018

Ugh, conflicts again. Please rebase this.

@keszybz
Copy link
Member Author

keszybz commented May 29, 2018

Rebased.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, did you see my >= comment earlier?

@@ -157,6 +157,8 @@ bool string_has_cc(const char *p, const char *ok) _pure_;

char *ellipsize_mem(const char *s, size_t old_length_bytes, size_t new_length_columns, unsigned percent);
char *ellipsize(const char *s, size_t length, unsigned percent);
char *cellescape_impl(char *buf, size_t len, const char *s);
#define cellescape(buf, s) cellescape_impl(buf, ELEMENTSOF(buf), s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh, sorry for not noticing this eralier, but this one is too magic I think. I mean, it might be completely OK to allocate some buffer with new() or so and then pass it to cellescape, and this would fail horribly in that case... I'd avoid such magic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still really don't like this bit tbh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be completely OK to allocate some buffer with new() or so and then pass it to cellescape

This will fail at compilation time, because ELEMENTSOF is smart enough to catch this. But OK, I'll rework this to use a define for the length and remove the macro.

* the header, hence let's suppress it here */
if (length >= 9 &&
memcmp(data, "_BOOT_ID=", 9) == 0)
if (length >= 9 && memcmp(data, "_BOOT_ID=", 9) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be time to add some macro for this actually, "memory_startswith()" or so, or maybe "startswith_n()"?

static inline void *memory_startswith(const void *p, size_t sz, const char *token) {
        size_t n;
        n = strlen(token);
        if (sz < n)
                return NULL;
        if (memcmp(p, token, n) == 0)
                return (uint8_t*) p + n;
        return NULL;
}

or so? (but it's definitely material for a later PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am working on adding a helper for this now, btw, for an independent PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i posted that pr as #9131 btw

This is in preparation to reusing the RemoteServer in other concepts.
I tried to keep changes to minimum:
- arg_* global variables are now passed as state in RemoteServer
- exported functions get the "journal_remote_" prefix
- some variables are renamed

In particular, there is an ugly global RemoveServer* variable. It was originally
added because µhttpd did not allow state to be passed to the callbacks. I'm not
sure if this has been remediated in µhttpd, but either way, this is not changed
here, the global variable is only renamed for clarity.
Also remove "b''" from the generated MESSAGE= field.
@keszybz
Copy link
Member Author

keszybz commented May 31, 2018

Rebased on top of #9131. I squashed the last two commits into one.

@@ -347,6 +348,16 @@ int journal_importer_process_data(JournalImporter *imp) {
/* chomp newline */
n--;

if (!journal_field_valid(line, sep - line, true)) {
char buf[64], *t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that this matters too much, but this line still has no comment btw... The one at the top of process_special_field() does however... Given that this already appears twice here I wonder if adding a (local) macro for this might not be worth it after all...

@@ -157,6 +157,8 @@ bool string_has_cc(const char *p, const char *ok) _pure_;

char *ellipsize_mem(const char *s, size_t old_length_bytes, size_t new_length_columns, unsigned percent);
char *ellipsize(const char *s, size_t length, unsigned percent);
char *cellescape_impl(char *buf, size_t len, const char *s);
#define cellescape(buf, s) cellescape_impl(buf, ELEMENTSOF(buf), s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still really don't like this bit tbh

@poettering
Copy link
Member

PR looks fine, but the ELEMENTSOF in the macro definition of cellescape() is something i really don't like, it welcomes bugs...

@poettering
Copy link
Member

(btw, did you see my last review?)

keszybz and others added 17 commits May 31, 2018 14:27
… fields

It's not supposed to be the most efficient, but instead fast and simple to use.

I kept the logic in ellipsize_mem() to use unicode ellipsis even in non-unicode
locales. I'm not quite convinced things should be this way, especially that with
this patch it'd actually be simpler to always use "…" in unicode locale and "..."
otherwise, but Lennart wanted it this way for some reason.
We shouldn't just log arbitrary stuff, in particular newlines and control chars
Now:
Unknown dunder line __CURSORFACILITY=6\nSYSLOG_IDENTIFIER=/USR/SBIN/CRON\nMES…, ignoring.
Unknown dunder line __REALTIME_TIME[TAMP=1404101101501874\n__MONOTONIC_TIMEST…, ignoring.
`fuzz-journal-remote` seems to be failing under `msan` as soon as it starts:

$ sudo infra/helper.py run_fuzzer systemd fuzz-journal-remote
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner run_fuzzer fuzz-journal-remote
Using seed corpus: fuzz-journal-remote_seed_corpus.zip
/out/fuzz-journal-remote -rss_limit_mb=2048 -timeout=25 /tmp/fuzz-journal-remote_corpus -max_len=65536 < /dev/null
INFO: Seed: 3380449479
INFO: Loaded 2 modules   (36336 inline 8-bit counters): 36139 [0x7ff36ea31d39, 0x7ff36ea3aa64), 197 [0x9998c8, 0x99998d),
INFO: Loaded 2 PC tables (36336 PCs): 36139 [0x7ff36ea3aa68,0x7ff36eac7d18), 197 [0x999990,0x99a5e0),
INFO:        2 files found in /tmp/fuzz-journal-remote_corpus
INFO: seed corpus: files: 2 min: 4657b max: 7790b total: 12447b rss: 97Mb
Uninitialized bytes in __interceptor_pwrite64 at offset 24 inside [0x7fffdd4d7230, 240)
==15==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7ff36e685e8a in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:436:13
    #1 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
    #2 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
    #3 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
    #4 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
    #5 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
    #6 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
    #7 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
    #8 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
    #9 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
    #10 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
    #11 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
    #12 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
    #13 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #14 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #15 0x41f9d8 in _start (/out/fuzz-journal-remote+0x41f9d8)

  Uninitialized value was stored to memory at
    #0 0x7ff36e61cd41 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:288:16
    #1 0x7ff36e685cec in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:426:13
    #2 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
    #3 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
    #4 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
    #5 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
    #6 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
    #7 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
    #8 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
    #9 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
    #10 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
    #11 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
    #12 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
    #13 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
    #14 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #15 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

  Uninitialized value was created by an allocation of 't' in the stack frame of function 'sd_id128_randomize'
    #0 0x7ff36e61cb00 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:274

SUMMARY: MemorySanitizer: use-of-uninitialized-value /work/build/../../src/systemd/src/journal/journal-file.c:436:13 in journal_file_init_header
Exiting
MS: 0 ; base unit: 0000000000000000000000000000000000000000
artifact_prefix='./'; Test unit written to ./crash-847911777b3096783f4ee70a69ab6d28380c810b
[vagrant@localhost oss-fuzz]$ sudo infra/helper.py check_build --sanitizer=memory systemd
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -e SANITIZER=memory -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner test_all
INFO: performing bad build checks for /out/fuzz-dhcp-server.
INFO: performing bad build checks for /out/fuzz-journal-remote.
INFO: performing bad build checks for /out/fuzz-unit-file.
INFO: performing bad build checks for /out/fuzz-dns-packet.
4 fuzzers total, 0 seem to be broken (0%).
Check build passed.

It's a false positive which is most likely caused by
google/sanitizers#852. I think it could be got around
by avoiding `getrandom` when the code is compiled with `msan`
…nd string operations

We'd look for a '=' separator using memchr, i.e. ignoring any nul bytes in the
string, but then do a strndup, which would terminate on any nul byte, and then
again do a memcmp, which would access memory past the chunk allocated by strndup.

Of course, we probably shouldn't allow keys with nul bytes in them. But we
currently do, so there might be journal files like that out there. So let's fix
the journal-reading code first.
…ject

$ build-asan/fuzz-journal-remote test/fuzz-regressions/fuzz-journal-remote/crash-96dee870ea66d03e89ac321eee28ea63a9b9aa45
...
Ignoring invalid field: "S\020"
Ignoring invalid field: "S\020"
...

If the field name includes nul bytes, we won't print all of the name.
But that seems enough of a corner case to ignore.
…ported

The parser never accepted "__"-prefixed fields in binary format, but there was
a comment questioning this decision. Let's make it official, and remove the
comment.

Also, for clarity, let's move the dunder field parsing after the field
verification check. This doesn't change much, because invalid fields cannot be
known special fields, but is seems cleaner to first verify the validity of the
name, and then check if it is one of the known ones.
This makes the fuzzing much more efficient. Optionally provide output is
$SYSTEMD_FUZZ_OUTPUT is set, which makes debugging of any failures much easier.

The case from 056129d is still detected properly.
If the timestamp is above 9999-12-30, (or 2038-something-something on 32 bit),
use XXXX-XX-XX XX:XX:XX as the replacement.

The problem with refusing to print timestamps is that our code accepts such
timestamps, so we can't really just refuse to process them afterwards. Also, it
makes journal files non-portable, because suddently we might completely refuse
to print entries which are totally OK on a different machine.
The journal verification functions would reject such an entry. It would probably
still display fine (because we prefer _SOURCE_REALTIME_TIMESTAMP= if present), but
it seems wrong to create an entry that would not pass verification.
…ESTAMP entry

journalctl -o short would display those entries, but journalctl -o short-full
would refuse. If the entry is bad, just fall back to the receive-side realtime
timestamp like we would if it was completely missing.
In this commit, this is done only in testing code, i.e. there is
no functional change apart from tests.
…ng entries

The boot id is stored twice, and different code paths use either one or the
other. So we need to store it both in the header and as a field for full
compatibility.
Also remove the comma from the comment everywhere, I think the comma
unnecessarilly put emphasis on the clause after the comma.

Fixes systemd#9090.

Reproducer:
systemd-journal-remote --split-mode=none -o /tmp/msg6.journal --trust=all --listen-http=8080
systemd-journal-upload -u http://localhost:8080
journalctl --file /tmp/msg6.journal -o verbose -n1
@keszybz
Copy link
Member Author

keszybz commented May 31, 2018

Updated with the requested changes.

@poettering poettering merged commit 89544ae into systemd:master May 31, 2018
@keszybz keszybz deleted the fuzz-journal-remote branch June 2, 2018 09:30
evverx added a commit to evverx/systemd that referenced this pull request Sep 29, 2018
…Fuzz project

The containers come with an empty machine-id, which causes the fuzzer
to fail as soon as it starts.

See systemd#9014 (comment)
@evverx evverx added the fuzzing Implementation of fuzzers and fixes for stuff found through fuzzing label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuzzing Implementation of fuzzers and fixes for stuff found through fuzzing journal journal-remote tests
Development

Successfully merging this pull request may close these issues.

None yet

5 participants