
added gzip/deflate support #20

Closed

wants to merge 13 commits into from

Conversation

@ghost commented May 24, 2014

Allow body-parser to accept gzip/deflate-compressed bodies (e.g. in POST requests).

Worth discussing whether my handling of length and encoding is correct.

Finally -- no test, as the current supertest does not support posting arbitrary buffers and/or streams. We can either create our own request for this specific test, or modify supertest, or skip the test, or use another request library (requests?).
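
For illustration, a minimal sketch of the kind of request this enables; the port and payload here are invented, not taken from the PR:

var http = require('http')
var zlib = require('zlib')

// gzip a JSON body, then POST it with Content-Encoding: gzip
zlib.gzip(new Buffer(JSON.stringify({ name: 'tobi' })), function (err, compressed) {
  if (err) throw err
  var req = http.request({
    method: 'POST',
    port: 3000, // assumes an express app using body-parser is listening here
    path: '/',
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
      'Content-Length': compressed.length
    }
  })
  req.end(compressed)
})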

@Fishrock123
Contributor

Needs a test. Using a different request lib is fine -- I'm not familiar enough with supertest.

Future note to self: shush

@dougwilson
Contributor

Wow, thanks! This is definitely something useful.

Worth discussing whether my handling of length and encoding is correct.

I will check it out as soon as I get to a computer :)

no test, as the current supertest does not support posting arbitrary buffers and/or streams

It does indeed. You use the .write() or .pipe() methods; I have done it plenty of times :) If you can't figure it out, that's fine; I can add tests when I merge your PR.

@dougwilson
Contributor

Using a different request lib is fine.

P.S. Please don't use a different lib :) supertest works just fine for sending binary data and streams.

@dougwilson self-assigned this May 24, 2014
delete options.length
delete options.encoding
break
case 'identity':
Contributor

remove trailing space on this line, please

removed trailing spaces;
@ghost
Author

ghost commented May 24, 2014

@dougwilson : made requested changes.

I'd love to see how to work with supertest(), because I've tried .pipe() and I've tried .send() (not .write()), both to the derived class and to the result of .request().

In the meantime I've written a test using another request lib, but have not yet committed.

Let me know how you want to proceed.

@dougwilson
Contributor

Thanks, @nitzan-shaked-clr. Yes, .send does not work. You probably want to use .write, because that is a proxy to the raw node.js write, so you can do .write(buf).

In the meantime I've written a test using another request lib, but have not yet committed.

If you want, you can commit this as a separate commit and I can convert it to supertest when I merge it.

@dougwilson
Contributor

Also, make sure you have a test where invalid gzip is sent to test error handling.
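
Something along these lines, as a sketch using supertest's .write() (it assumes request = require('supertest') and an app wired up with the json parser; the junk payload and exact status handling are illustrative):

it('should respond with 400 to invalid gzip', function (done) {
  var test = request(app).post('/')
  test.set('Content-Encoding', 'gzip')
  test.set('Content-Type', 'application/json')
  test.write(new Buffer('000102030405', 'hex')) // deliberately not a gzip stream
  test.expect(400, done)
})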

@ghost
Author

ghost commented May 24, 2014

Added tests using mikeal/request: a gzip test, a deflate test, and a "bad gzip" test, as requested.

I have to admit I tried using supertest, including .pipe, and failed. If I understand the code there correctly, using .pipe will not trigger the eventual Test class's asserts(). I am sure I got this wrong, since you say you have done this before. If you want, point me to an example and I'll convert the tests.

@ghost
Author

ghost commented May 24, 2014

Hm. The build is failing for node 0.8 b/c of streams. Should I allow stream failures, rewrite the test to not use streams (using zlib's convenience methods for compression), or use the 'readable-stream' module, which tracks node-core's stream implementation without relying on a specific node version?

@dougwilson
Contributor

If you want, point me to an example and I'll convert the tests.

I'm not at a computer. Have you tried .write()? Remember you can't chain with .write(), so you need to use variables.

The build is failing for node 0.8 b/c of streams

The failure is because you are using 0.10-style streams in your tests. If you just use supertest, then it'll work. I'll switch it to supertest anyway if you can't figure it out, so there is no need to start doing a bunch of convoluted stuff that I'll just remove.

rewrite the test to not use streams (use zlib's convenience methods for compression)

yes, no need for streams in the tests. Even putting literal Buffers in there is fine with me.
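
For example, a sketch of a stream-free test (route and payload invented, and it assumes the test app echoes the parsed body back as JSON), building the compressed body up front with zlib's callback API, which works on node 0.8:

var zlib = require('zlib')

it('should accept a deflated body', function (done) {
  zlib.deflate(new Buffer('{"user":"tobi"}'), function (err, buf) {
    if (err) return done(err)
    var test = request(app).post('/')
    test.set('Content-Encoding', 'deflate')
    test.set('Content-Type', 'application/json')
    test.write(buf) // hand the whole compressed body over as one Buffer
    test.expect(200, '{"user":"tobi"}', done)
  })
})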

@dougwilson
Contributor

Worth discussing whether my handling of length

I forgot to mention that if there is a content-length, we want to check that we got the exact amount, just like raw-body is doing; right now that check is not done for the gzip stuff.
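
Roughly this kind of check, sketched here with illustrative names and error text (not the PR's final code):

var length = parseInt(req.headers['content-length'], 10)
var received = 0

req.on('data', function (chunk) { received += chunk.length })
req.on('end', function () {
  // the compressed byte count must match Content-Length exactly,
  // mirroring what raw-body does for uncompressed bodies
  if (!isNaN(length) && received !== length) {
    var err = new Error('request size did not match content length')
    err.status = 400
    return next(err)
  }
})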

@dougwilson
Contributor

@nitzan-shaked-clr if you wanted an example of using supertest to write your gzip:

var test = request(app).post('/')
test.set('Content-Encoding', 'gzip')
test.set('Content-Type', 'application/json')
test.write('1f8b080000000000000bab56ca4bcc4d55b2527ab16e97522d00515be1cc0e000000', 'hex')
test.expect(200, '{"name":"论"}', done)

@ghost
Author

ghost commented May 25, 2014

@dougwilson : great. Modified tests accordingly. Next on my todo list is checking an exact content-length, which I'll do later today. Nothing else for now, correct?

@ghost
Author

ghost commented May 25, 2014

@dougwilson : it seems like we need to better define the logic in case of a "bad" content-length. http.server will truncate incoming data that's longer than content-length, for example.

@dougwilson
Contributor

http.server will truncate incoming data that's longer than content-length

I am well aware of this. The length check is to catch developer errors, typically on node 0.8, where some data events may fire before this module is invoked, so data gets lost. We want to give a better error message than a JSON syntax error (or in this case, corrupted gzip).

raw-body also does another check we lose with the new content-encoding handling: that req.setEncoding('utf8') has not been called. This one is also very important. If a developer calls req.setEncoding('utf8') by accident before this module, corrupted data will get sent to the gzip stream. We want the error to say what the cause is, not just error out from invalid gzip.
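
A sketch of the kind of guard meant here, with an illustrative message and status, so the failure names the real cause instead of surfacing as corrupted gzip:

// bail out early if something upstream already switched the request to string mode
var state = req._readableState
if (req._decoder || (state && (state.encoding || state.decoder))) {
  var err = new Error('stream encoding should not be set')
  err.status = 500
  return next(err)
}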

Lastly, can you rebase your changes onto the current master? There are conflicting changes there now, because I had to fix a bug.

@dougwilson
Contributor

Nothing else for now, correct?

P.S. I want to let you know that everything else you do is simply a bonus for me, since I won't have to do it :) If you were to, say, abandon this PR right now, I would still pull in this valuable functionality, so I don't want you to feel that it will not get included :)

I have been on vacation this weekend, so I just happen to be leaning more on the PR submitters than usual.

@ghost
Author

ghost commented May 25, 2014

Cool :)

@dougwilson
Contributor

Sorry, didn't have much time today, haha. I have it on my list for tomorrow :)

}
function setupCompressedStream() {
// assert the stream encoding is buffer.
if (req._decoder || (req._readableState && req._readableState.encoding !== null)) {
Contributor

I just found out that, for node.js > 0.8 but < 0.10.6, we need this conditional to be:

var state = req._readableState;
if (req._decoder || (state && (state.encoding || state.decoder))) {

otherwise people using older versions of 0.10 will always trigger this error (since req._readableState.encoding will be undefined).

Author

You'll fix this, or shall I go ahead and commit to my fork?

Contributor

It's a "I'll fix this if you don't do it first" kinda thing ;) You don't have to do anything if you want to think about this as a note to myself for when I merge this :D

Author

I might do it later today -- I don't want to "just do it" without testing first, and I'm a bit busy today.


Author

OK, modified and tested under:
v0.8.26
v0.10.5
v0.10.28
v0.11.13


@jonathanong
Member

I was thinking about adding this. The whole decompression logic could be put in a separate lib. I might've already created a lib for that.

Only problem is that I don't think anyone would actually use this. Lol

@dougwilson
Contributor

Only problem is that I don't think anyone would actually use this

Right, at least browsers do not support this, so it would only be APIs using it.

@dougwilson
Contributor

I might've already created a lib for that.

https://github.com/stream-utils/inflation :D

@jonathanong
Member

An edge case I was thinking about is that when you inflate, you'd want to check the content length as well. So you'll need a raw-body-like util for decompressing.

@dougwilson
Contributor

An edge case I was thinking about is that when you inflate, you'd want to check the content length as well

Yea. I believe this PR does implement this, but I don't remember. I was thinking of just adding that verification to the inflation module. Another edge case is that content-encoding can actually be more than a single value ;)

@jonathanong
Member

Oh my god why

@jonathanong
Member

Maybe just add it to raw body

@dougwilson
Contributor

Maybe just add it to raw body

Add what to raw-body? The use of inflation? Length check (which is already there)? Sorry, not sure what you are referring to :P

@jonathanong
Member

Inflation + length check. You want to check the headers on the original stream and check the length of the final body to avoid zip bombs.
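
A sketch of that combination using the inflation and raw-body modules (the limit is illustrative): decompress according to the request's Content-Encoding, then cap the size of the inflated output, since Content-Length only bounds the compressed bytes:

var inflate = require('inflation')
var getRawBody = require('raw-body')

// the limit applies to the decompressed stream, which is what protects
// against zip bombs; Content-Length alone cannot
getRawBody(inflate(req), { limit: '100kb', encoding: 'utf-8' }, function (err, body) {
  if (err) return next(err)
  // body is the decompressed text, guaranteed to be within the limit
})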

@dougwilson
Contributor

original stream and check the length of the final body to avoid zip bombs

Yep, I am aware of that :)

@ghost
Author

ghost commented Jun 1, 2014

@dougwilson : any news?

@dougwilson
Contributor

Yes. I may get to it today. It was brought to my attention above that the inflating is already implemented by a library called inflation, so we need to use that here instead, to reduce the amount of unnecessary code in here.

@panahi

panahi commented Jun 12, 2014

+1 vote for this issue!

@dougwilson
Contributor

Thanks. It has not been forgotten about, btw.

@dougwilson
Contributor

FYI update on this: it should be landing tonight so I can pack it into a new express 3 release :)

@dougwilson
Contributor

it should be landing tonight

Just want to keep people updated here: I'm bumping this to tomorrow since I'm not feeling well.

@panahi

panahi commented Jun 19, 2014

Thanks for the work, dougwilson and nitzan-shaked-clr -- I've been using nitzan-shaked-clr's fork for a while and it's working great; looking forward to using the released lib.

@nitzan-shaked

@dougwilson great. Is this in npm?

@panahi that's nice to hear :)

@Fishrock123
Contributor

@nitzan-shaked not quite yet.

@dougwilson
Contributor

is this in npm?

no, because I am still working out some kinks :) It'll be there today sometime, though.

@dougwilson
Contributor

FYI this module should be ready to go, @nitzan-shaked-clr @panahi, if you wanted to verify it still works for your use-cases. Install with npm install expressjs/body-parser.
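
For anyone verifying, a minimal setup sketch (route and port invented); with this change the json parser inflates gzip/deflate request bodies transparently:

var express = require('express')
var bodyParser = require('body-parser') // installed from expressjs/body-parser

var app = express()
app.use(bodyParser.json())

app.post('/', function (req, res) {
  res.json(req.body) // works whether or not the client compressed the body
})

app.listen(3000)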

@nitzan-shaked

@dougwilson : works for me. Thanks for the merge.

One more thing: I noticed you hadn't converted all my tests (or am I mistaken?) -- plus, could it be that the length/encoding handling is different? I only glanced quickly, so I can't be sure.

@dougwilson
Contributor

Correct, some did not apply to the current code, mainly because the current code is a middle ground before it is moved into its own module.

@expressjs locked and limited conversation to collaborators Jul 20, 2014