New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

journald compression bug fix #1662

Closed
wants to merge 2 commits into
base: master
from

Conversation

2 participants
@vcaputo
Member

vcaputo commented Oct 24, 2015

No description provided.

@vcaputo

This comment has been minimized.

Show comment
Hide comment
@vcaputo

vcaputo Oct 24, 2015

Member

additional context: coreos/bugs#322

Member

vcaputo commented Oct 24, 2015

additional context: coreos/bugs#322

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

Hmm, so, our general rule is to always return kernel-style negative errno errors from functions, and normal exits are indicated by a zero or positive return value. Hence, compress_blob() should really work the same here, hence it's OK the way the function is defined, but the user of it in journal_file_append_data() is borked, it should check for >= 0 really...

Now, looking at the code, it appears there are actually more bugs with the lz4 handling here:

  • There's no le64toh used when ORing in the compression mask into the object flags field
  • The whole if check checks for f->compress_xz, but should actually alternatively check for f->compress_lz4 too, here.

I am a bit puzzled this was not noticed earlier, I figure the only reason is that lz4 support was so far never on in any of the big distros.

@keszybz any idea on this?

Member

poettering commented Oct 24, 2015

Hmm, so, our general rule is to always return kernel-style negative errno errors from functions, and normal exits are indicated by a zero or positive return value. Hence, compress_blob() should really work the same here, hence it's OK the way the function is defined, but the user of it in journal_file_append_data() is borked, it should check for >= 0 really...

Now, looking at the code, it appears there are actually more bugs with the lz4 handling here:

  • There's no le64toh used when ORing in the compression mask into the object flags field
  • The whole if check checks for f->compress_xz, but should actually alternatively check for f->compress_lz4 too, here.

I am a bit puzzled this was not noticed earlier, I figure the only reason is that lz4 support was so far never on in any of the big distros.

@keszybz any idea on this?

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

Ah, the flags issue is not an issue, the flags field of objects is actually just one byte, ignore that...

Member

poettering commented Oct 24, 2015

Ah, the flags issue is not an issue, the flags field of objects is actually just one byte, ignore that...

@vcaputo

This comment has been minimized.

Show comment
Hide comment
@vcaputo

vcaputo Oct 24, 2015

Member

updated to reflect upstream preferences

Member

vcaputo commented Oct 24, 2015

updated to reflect upstream preferences

poettering added a commit to poettering/systemd that referenced this pull request Oct 24, 2015

journal: fix error handling when compressing journal objects
Let's make sure we handle compression errors properly, and don't
misunderstand an error for success.

Also, let's actually compress things if lz4 is enabled.

Fixes systemd#1662.
@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

A bummer, we worked on the same issue at the same time... See #1663...

Member

poettering commented Oct 24, 2015

A bummer, we worked on the same issue at the same time... See #1663...

@poettering poettering added the journal label Oct 24, 2015

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

I guess the patches in #1662 and #1663 are mostly the same now, except that #1663 also fixes the compress bool check...

Anyway, @keszybz would prefer a comment from you on the bool thing...

Member

poettering commented Oct 24, 2015

I guess the patches in #1662 and #1663 are mostly the same now, except that #1663 also fixes the compress bool check...

Anyway, @keszybz would prefer a comment from you on the bool thing...

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

Still puzzled why the xz can't compress 512 byte blocks though. @vcaputo any idea why that fails? Are you saying it fails for all 512 byte blocks? Or just for specific ones?

Or is this simply triggered by an uncompressable block, i.e. where the compressed result is larger than the input?

Member

poettering commented Oct 24, 2015

Still puzzled why the xz can't compress 512 byte blocks though. @vcaputo any idea why that fails? Are you saying it fails for all 512 byte blocks? Or just for specific ones?

Or is this simply triggered by an uncompressable block, i.e. where the compressed result is larger than the input?

Vito Caputo
journal: properly detect compress_blob() errors
The one and only caller in journal-file.c treats non-zero as success:

compression = compress_blob(data, size, o->data.payload, &rsize);

if (compression) {
        assert(rsize != 0);

        o->object.size = htole64(offsetof(Object, data.payload) + rsize);
        o->object.flags |= compression;

        log_debug("Compressed data object %"PRIu64" -> %zu using %s",
                  size, rsize, object_compressed_to_string(compression));
}

This fixes a bug observed when the src was 512 bytes in length.  The
resulting rsize would be 0 because XZ compression failed, but the code
continued as if successful because the non-zero XZ error return value
was percolating out of compress_blob().
@vcaputo

This comment has been minimized.

Show comment
Hide comment
@vcaputo

vcaputo Oct 24, 2015

Member

To address your earlier statements re: lz4, we're using xz in coreos, not lz4.

After poking at liblzma's lzma_stream_buffer_bound(), it appears to return 656 on an input size of 512, and COMPRESSION_SIZE_THRESHOLD is defined as 512.

So apparently we're observing this bug triggering @ 512 because of the value defined as COMPRESSION_SIZE_THRESHOLD, and perhaps that define should be changed to 656 (or something else) on HAVE_XZ.

Though with the bug fixed I guess it doesn't make much difference.

Member

vcaputo commented Oct 24, 2015

To address your earlier statements re: lz4, we're using xz in coreos, not lz4.

After poking at liblzma's lzma_stream_buffer_bound(), it appears to return 656 on an input size of 512, and COMPRESSION_SIZE_THRESHOLD is defined as 512.

So apparently we're observing this bug triggering @ 512 because of the value defined as COMPRESSION_SIZE_THRESHOLD, and perhaps that define should be changed to 656 (or something else) on HAVE_XZ.

Though with the bug fixed I guess it doesn't make much difference.

@vcaputo

This comment has been minimized.

Show comment
Hide comment
@vcaputo

vcaputo Oct 24, 2015

Member

@poettering out of curiosity, why the src_size - 1 for dest_len? @ https://github.com/systemd/systemd/blob/master/src/journal/compress.c#L80 isn't the payload allocated to accommodate src_size?

Member

vcaputo commented Oct 24, 2015

@poettering out of curiosity, why the src_size - 1 for dest_len? @ https://github.com/systemd/systemd/blob/master/src/journal/compress.c#L80 isn't the payload allocated to accommodate src_size?

@poettering

This comment has been minimized.

Show comment
Hide comment
@poettering

poettering Oct 24, 2015

Member

@vcaputo ah, if that's a block that got increased 512 →656 through compression, then yeah, it's one of those uncompressable ones, and compress_blob_xz() already returns ENOBUFS in such a case. So I think the logic in that place is working correctly.

The src_size - 1 as dest_len is because we want the call to generate an error if the result has the same size as the input or is larger. Hence we provide only enough space as the input size, minus 1 byte. That way we know it will generate an error when the compression doesn't have any beneficial effect, so that save the data uncompressed in that case.

Member

poettering commented Oct 24, 2015

@vcaputo ah, if that's a block that got increased 512 →656 through compression, then yeah, it's one of those uncompressable ones, and compress_blob_xz() already returns ENOBUFS in such a case. So I think the logic in that place is working correctly.

The src_size - 1 as dest_len is because we want the call to generate an error if the result has the same size as the input or is larger. Hence we provide only enough space as the input size, minus 1 byte. That way we know it will generate an error when the compression doesn't have any beneficial effect, so that save the data uncompressed in that case.

@vcaputo

This comment has been minimized.

Show comment
Hide comment
@vcaputo

vcaputo Oct 24, 2015

Member

Makes sense.

Member

vcaputo commented Oct 24, 2015

Makes sense.

lnykryn pushed a commit to lnykryn/systemd that referenced this pull request Mar 23, 2016

journal: fix error handling when compressing journal objects
Let's make sure we handle compression errors properly, and don't
misunderstand an error for success.

Also, let's actually compress things if lz4 is enabled.

Fixes systemd#1662.

Cherry-picked from: d1afbcd
Resolves: #1292447
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment