New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib: Fix gzip member header/input buffer boundary issue #5883

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@addaleax
Member

addaleax commented Mar 24, 2016

Pull Request check-list

  • Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
    this change (including linting)?
  • Is the commit message formatted according to CONTRIBUTING.md?
  • If this change fixes a bug (or a performance problem), is a regression
    test (or a benchmark) included?

Affected core subsystem(s)

zlib

Description of change

Make sure that, even if an inflate() call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output.

This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by gzip(1), too.)

The problem resolved by this is mentioned here, but definitely not as likely to occur in the wild, since it does not affect the decompression of gzip files containing only a single member.

/cc @kthelgason

@addaleax addaleax changed the title from Fix gzip member header/input buffer boundary issue to zlib: Fix gzip member header/input buffer boundary issue Mar 24, 2016

@jasnell jasnell added the zlib label Mar 24, 2016

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 24, 2016

Mostly LGTM but I think this is a semver-major change? Can you remove the (now unused) GZIP_ defines from src/node_zlib.cc?

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 24, 2016

@bnoordhuis GZIP_MIN_HEADER_SIZE can still be removed, though, you’re right.

And regarding semver: Not sure – If so, then #5120 was semver-major in the first place. The current (both master and 5.9.x) behaviour is to throw an error upon encountering trailing garbage only if that starts with 1f 8b, which can be pretty unreliable I imagine (especially if those garbage bytes are actually random).

So, yes, possibly reverting back to the pre-5.9 behaviour of silently discarding trailing garbage in all cases is a good idea for 5.x, and implementing that should certainly be feasible. But I completely agree with your comment here, throwing an error should be the right behaviour.

@addaleax addaleax force-pushed the addaleax:fix-gzip-member-buffer-boundary branch to c8755e7 Mar 24, 2016

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 24, 2016

Gray area, I suppose. Maybe one or two people from @nodejs/ctc can weigh in?

Change itself LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/2052/

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 24, 2016

For future reference and to the best of my knowledge:

.gz file content v5.8 v5.9 / master after this gunzip(1)
2 members first member both members both members both members
1 member + garbage member member throws member³
1 member + 00 + garbage¹ member member member member³
1 member + 1f 8b + garbage² member throws throws member³

¹ never actually inspected by any Node.js version, assumed to be null byte padding
² at least 10 bytes
³ plus a warning on stderr and non-zero exit code

And for the record, I don’t think trailing garbage is much of a real-world problem, but I’m not sure how relevant that fact is.

@addaleax addaleax referenced this pull request Mar 24, 2016

Closed

zlib: Allow zlib.gunzip features with zlib.unzip #5884

3 of 3 tasks complete
@@ -43,7 +43,6 @@ enum node_zlib_mode {
#define GZIP_HEADER_ID1 0x1f
#define GZIP_HEADER_ID2 0x8b

This comment has been minimized.

@kthelgason

kthelgason Mar 24, 2016

Contributor

These defines are unused now right?

This comment has been minimized.

@addaleax

addaleax Mar 24, 2016

Member

Yep, but #5884 would start using them again, so I’d leave them in unless it gets decided that that should not get landed.

@kthelgason

This comment has been minimized.

Contributor

kthelgason commented Mar 24, 2016

It's hard to debate the relevance of trailing garbage without any idea of prevalence in the wild. I'm on the fence regarding the behaviour of the 1 member + garbage case, I can see the argument for both sides.

Regarding semver-major, #5120 pretty much only changed in a breaking way the behaviour of the last case in your table, and that was unintended, so it looked like a minor version bump to me at least. This PR is now being explicit about throwing an error where previously there was none, so I think a better case can be made for this landing in a major version.

EDIT: I was thinking of the case here #5863. @addaleax you're absolutely right regarding semver-major on this change.

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 24, 2016

Yep, I can definitely see why one would count this as semver-major, and personally I don’t think I have a strong opionion on that. But keep in mind that this PR still mostly contains a bugfix: For the case that a new member’s header wraps around the input buffer boundary. So even if this does not land in some 5.x version, I’d still argue that a modified patch (which keeps the current trailing-garbage behaviour or restores the old one) should.

@kthelgason

This comment has been minimized.

Contributor

kthelgason commented Mar 24, 2016

Yes, I agree 100%.

There is still the issue of the boundary between members falling exactly on the input buffer boundary, so we haven't really gotten rid of the problem, only reduced the likelihood of it's occurrence (significantly).

EDIT: turns out this is taken care of.

@@ -38,3 +41,26 @@ fs.createReadStream(pmmFileGz)
assert.deepStrictEqual(Buffer.concat(pmmResultBuffers), pmmExpected,
'result should match original random garbage');
}));
// test that the next gzip member can wrap around the input buffer boundary
[0, 1, 2, 3, 4, defEncoded.length].forEach((offset) => {

This comment has been minimized.

@addaleax

addaleax Mar 24, 2016

Member

@kthelgason I think testing for 0 should take care of the case where data members fall exactly on input boundaries:

The first unzip.write() call writes abcEncoded + 0 bytes from defEncoded, and the second unzip.end() call writes all of defEncoded. Or did you have something else in mind?

This comment has been minimized.

@kthelgason

kthelgason Mar 24, 2016

Contributor

of course, you're absolutely right.

@jasnell

This comment has been minimized.

Member

jasnell commented Mar 24, 2016

I agree that it's technically a semver-major but this definitely looks like a grey area and I agree with the comments that an error is the right thing to do. I think I'd be ok with this landing as a semver-minor.

@jasnell

This comment has been minimized.

Member

jasnell commented Mar 24, 2016

The change LGTM

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 25, 2016

Btw, this is what re-allowing trailing garbage would look like – wouldn’t be a big deal.

zlib: Fix gzip member head/buffer boundary issue
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

This change also modifies behaviour for trailing garbage:
If there is trailing garbage which happens to start with the
gzip magic bytes, it is no longer discarded but rather throws
an error, since we cannot reliably tell random garbage from
a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed
out at various occasions that such padding is normal and
discarded by `gzip(1)`, too.)

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

@addaleax addaleax force-pushed the addaleax:fix-gzip-member-buffer-boundary branch from c8755e7 to e6af8b9 Mar 29, 2016

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 29, 2016

Rebased & squashed this into one commit.

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 31, 2016

Thanks Anna, landed in 54a5287.

@bnoordhuis bnoordhuis closed this Mar 31, 2016

bnoordhuis added a commit that referenced this pull request Mar 31, 2016

zlib: fix gzip member head/buffer boundary issue
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

This change also modifies behaviour for trailing garbage:
If there is trailing garbage which happens to start with the
gzip magic bytes, it is no longer discarded but rather throws
an error, since we cannot reliably tell random garbage from
a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed
out at various occasions that such padding is normal and
discarded by `gzip(1)`, too.)

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

PR-URL: #5883
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
@addaleax

This comment has been minimized.

Member

addaleax commented Mar 31, 2016

@bnoordhuis Thanks! I would actually be more comfortable with having this marked as semver-major and backporting this bugfix in a way that keeps the current (v5.x) behaviour for trailing garbage. Would that be okay? (And if so, would creating a PR against the v5.x branch be the correct way to do that?)

@addaleax addaleax deleted the addaleax:fix-gzip-member-buffer-boundary branch Mar 31, 2016

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 31, 2016

Okay, I updated the labels. PRing against v5.x sounds good; v5.x-staging is the branch you want.

@addaleax

This comment has been minimized.

Member

addaleax commented Mar 31, 2016

Sorry to ask, but there is no v5.x-staging branch here?

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 31, 2016

Oh, that's right; I remember being baffled by that a while ago. It should be fine to target v5.x in that case.

@jasnell @thealphanerd What's the reason there isn't a v5.x-staging branch like there is for v0.10/v0.12 and v4?

@MylesBorins

This comment has been minimized.

Member

MylesBorins commented Mar 31, 2016

@bnoordhuis afaik the staging convention has been primarily used for LTS... but tbqh it would be useful for v5 as well and likely allow for quite a bit less pain when cutting releases.

Not 100% where an issue should be made for this though, ctc?

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 31, 2016

I don't think an issue is necessary. If other @nodejs/release people feel it's a good idea, just go ahead and do it. Maybe file an informational issue afterwards as a memo for contributors.

addaleax added a commit to addaleax/node that referenced this pull request Apr 2, 2016

zlib: fix gzip member head/buffer boundary issue
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

This is a backport of nodejs#5883 and contains additional changes to
make sure that the behaviour on encountering trailing garbage
remains the same (namely to silently discard it if one full member
has already been decompressed).

 #PR-URL: nodejs#5883
 #Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
 #Reviewed-By: James M Snell <jasnell@gmail.com>

jasnell added a commit that referenced this pull request Apr 15, 2016

zlib: fix gzip member head/buffer boundary issue
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

This is a backport of #5883 and contains additional changes to
make sure that the behaviour on encountering trailing garbage
remains the same (namely to silently discard it if one full member
has already been decompressed).

PR-URL: #5973
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment