Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for gzip content decoding #951

Merged
merged 1 commit into from
Jul 2, 2014

Conversation

kevinoid
Copy link
Contributor

@kevinoid kevinoid commented Jul 1, 2014

My apologies in advance for adding yet another gzip implementation for you to consider, but it appears that work on the alternative solutions (#303, #458, #464, #660) has stalled while interest in a solution is still high (#539, #725, #747) and requires non-trivial effort to implement outside of the library (although it's certainly not horrible).

Unlike the previous solutions, this pull request supports stream-based usage and does not introduce any backwards-incompatible behavior. It is intended to serve as a bridge for the 2.x series until 3.0 is ready for release, at which points its existing gzip implementation can supersede this one.

This commit introduces the decodeContent boolean option to allow users to explicitly request decoding of supported response content and inclusion of the appropriate content negotiation header (Accept-Encoding), if unspecified.

Although deflate support is also common on servers, this commit does not add deflate support due to compatibility issues with non-conformant "raw" deflate encodings and the lack of a compelling reason to support it given gzip's wide availability.

Thanks for considering,
Kevin

@mikeal
Copy link
Member

mikeal commented Jul 1, 2014

should handle gzip and deflate, just steal this code:

https://github.com/mikeal/http-duplex-gzip-client/blob/master/index.js#L28

also, I can't add gzip accept by default before the 3.0 release. it's too big of a change to do in a point release. please add a {gzip:true} option to request that in this release will default to false and in 3.0 we can switch to true.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 1, 2014

Does that code really support "raw" deflate streams, or only standards-conforming zlib deflate streams? Because I had thought that would fail for servers which send raw streams (IIS).

Sure, I'll rename decodeContent to gzip and resubmit. Note that it already defaults to false. Totally agree about switching to true in 3.0.

@mikeal
Copy link
Member

mikeal commented Jul 1, 2014

@kevinoid i haven't tested it myself but per the node core documentation it should.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 1, 2014

@mikeal See what you think about this version. I've renamed the option to gzip and added support for deflate.

Unfortunately, on my machine (with Node.js v0.10.29) raw deflate encodings result in Error: incorrect header check. To get around it, I think we'd have to test the if the zlib stream has a valid header and fall back to the raw decoder (or try decoding both in parallel and use whichever doesn't error, or something else daft).

There's also an argument to be made for not supporting servers that send non-conformant data (since the spec requires the zlib-wrapped encoding), but they are somewhat prevalent and users may not appreciate appeals to the spec. What do you think?

Support decoding of the gzip Content-Encoding in responses using the
zlib module to decode the response before piping it through the request
object (and therefore before user-connected pipes or body parsing).

Add the boolean option `gzip` to allow users to explicitly
request decoding of supported response content and inclusion of
appropriate content negotiation headers, if unspecified.

This commit favors backwards-compatibility over the increased
performance that transparent compression could provide, although it is
hoped that a future backwards-incompatible version can make transparent
compression the default.

Some major tradeoffs of transparent compression are:

- It may trigger changes in server behavior and performance (for better
  or worse) that are unexpected, either due to buggy servers or
  intermediate network hardware.
- The compression is not fully transparent as users who attach to the
  `data` event of the response (rather than the `data` event of
  `request`) will get gzipped data rather than uncompressed data.
+ It is likely a big win for most users (both current and future) who
  would otherwise be unaware or unable to spend the time to implement
  content negotiation and decoding.  Especially given the prevalence of
  highly-compressible text content (e.g. JSON and XML) and widespread
  server support for gzip.

Changes since v1:
- Rename `decodeContent` option to `gzip` to match option name in 3.0

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 1, 2014

Whoops, overlooked the README. Now updated there as well.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 2, 2014

Hey @mikeal, don't want to pester you, but do you have any thoughts for what to do about deflate support? Should I work on detecting zlib wrapping or not worry about it or drop support for deflate from this PR and save it for later? Can do any of those (or something else), just let me know.

@mikeal
Copy link
Member

mikeal commented Jul 2, 2014

just drop deflate support for now.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 2, 2014

Sounds good. Dropped the work-in-progress deflate commit from the PR. Let me know if there's anything else that needs attention in the remaining commit.

mikeal added a commit that referenced this pull request Jul 2, 2014
Add support for gzip content decoding
@mikeal mikeal merged commit b66e4cd into request:master Jul 2, 2014
@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 2, 2014

Thanks, much appreciated!

@mikeal
Copy link
Member

mikeal commented Jul 2, 2014

looks like we broke 0.8 :(

https://travis-ci.org/mikeal/request/jobs/29015113

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 3, 2014

Whoops. My bad. I'll work on it.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 3, 2014

Looks like setEncoding was added to the zlib streams in nodejs/node-v0.x-archive@9b5abe5b (v0.9.4 and later) so we'll need to wrap the zlib stream to support the gzip and encoding options simultaneously on these versions. I'll poke around to see if there is a way to do this without adding a lot of code to implement a string-decoding stream without stream.Transform.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 3, 2014

I don't see a really simple way to get setEncoding support on zlib streams pre-v0.9.4. How would you feel about adding a dependency on stringstream to be used when setEncoding is not present on the zlib stream? It is not heavily used and hasn't been updated in a long time, but the code looks decent and it does just what we need.

I've updated the branch with a commit that uses stringstream to support pre-v0.9.4. Let me know what you think.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 3, 2014

Looks like GitHub doesn't check for branch changes after the pull request has been merged (which makes sense). The commit in question is kevinoid/request@a2da682 . I can submit a separate pull request if it looks acceptable.

@LoicMahieu
Copy link
Member

The PR is already merged so you need to make a new PR with only the fix. Thanks.

@kevinoid
Copy link
Contributor Author

kevinoid commented Jul 8, 2014

Sure thing. Submitted as #960.

@OliverJAsh
Copy link

It took me awhile to realise that the uncompressed data is available on the request stream, not on the response stream (received in the response event on the request stream). Can anyone help me understand why this is the case, and why it isn't uncompressed on the response stream?

@kevinoid
Copy link
Contributor Author

Sure. The reason is that the response object emitted on the response event is the "raw" response object passed back from the node internals. Although it might be possible to construct some sort of response proxy object which would appear to be an uncompressed response, it would start to get ugly very quickly (e.g. are all methods proxied? Do we fix headers such as Content-Length that would differ in an uncompressed response? Do we remove the Content-Encoding header, if so how do users know if it was compressed? Those sorts of things.)

However, the documentation does not currently mention this issue, and I'm sure you aren't the only person to get tripped up by it. I'll work on improving the README to help clarify what is going on.

@OliverJAsh
Copy link

Ahh, I see! What kind of stream does request give you then – if it writes and reads?

On 21 Jul 2014, at 03:24, Kevin Locke notifications@github.com wrote:

Sure. The reason is that the response object emitted on the response event is the "raw" response object passed back from the node internals. Although it might be possible to construct some sort of response proxy object which would appear to be an uncompressed response, it would start to get ugly very quickly (e.g. are all methods proxied? Do we fix headers such as Content-Length that would differ in an uncompressed response? Do we remove the Content-Encoding header, if so how do users know if it was compressed? Those sorts of things.)

However, the documentation does not currently mention this issue, and I'm sure you aren't the only person to get tripped up by it. I'll work on improving the README to help clarify what is going on.


Reply to this email directly or view it on GitHub.

@kevinoid
Copy link
Contributor Author

If I understand your question, conceptually you could think of request as a stream.Transform (although it does not inherit from this class) where the data is being transformed by an HTTP server. The (possibly empty) request data is written then transformed by the server then returned as response data. Does that help clarify?

@OliverJAsh
Copy link

Ahh, okay. I was trying to do stream.setEncoding, but this doesn’t work as it’s not a “proper” readable stream(?).
On 21 Jul 2014, at 13:01, Kevin Locke notifications@github.com wrote:

If I understand your question, conceptually you could think of request as a stream.Transform (although it does not inherit from this class) where the data is being transformed by an HTTP server. The (possibly empty) request data is written then transformed by the server then returned as response data. Does that help clarify?


Reply to this email directly or view it on GitHub.

@kevinoid
Copy link
Contributor Author

Right. It doesn't currently inherit from stream.Readable and doesn't provide its own setEncoding method (although that is probably a good idea).

However, you can set the encoding with the encoding option to the request constructor, which will call setEncoding on the decompressed stream. I'll look into adding a setEncoding method so that the encoding can be set after the response is received (and to better support the Readable interface).

@OliverJAsh
Copy link

Thanks! Want me to open an issue for this feature request, so you can track it separately from this?

On 21 Jul 2014, at 13:17, Kevin Locke notifications@github.com wrote:

Right. It doesn't currently inherit from stream.Readable and doesn't provide its own setEncoding method (although that is probably a good idea).

However, you can set the encoding with the encoding option to the request constructor, which will call setEncoding on the decompressed stream. I'll look into adding a setEncoding method so that the encoding can be set after the response is received (and to better support the Readable interface).


Reply to this email directly or view it on GitHub.

@kevinoid
Copy link
Contributor Author

Sure, that would be great, thanks!

@OliverJAsh
Copy link

@kevinoid See #974

@kevinoid
Copy link
Contributor Author

Thanks!

@mikeal
Copy link
Member

mikeal commented Jul 22, 2014

More accurately, the Request stream is a Duplex stream, because what you pipe to it does not come our the other end, and what comes out the readable side really doesn't have all that much to do with the input (which is why it isn't just a transform stream).

Of course, request predates all the stream base objects in core so it doesn't inherit from either Transform or Duplex :)

@OliverJAsh
Copy link

Now I finally know the difference between duplex and transform streams!

On 22 Jul 2014, at 15:12, Mikeal Rogers notifications@github.com wrote:

More accurately, the Request stream is a Duplex stream, because what you pipe to it does not come our the other end, and what comes out the readable side really doesn't have all that much to do with the input (which is why it isn't just a transform stream).

Of course, request predates all the stream base objects in core so it doesn't inherit from either Transform or Duplex :)


Reply to this email directly or view it on GitHub.

nylen pushed a commit to nylen/request that referenced this pull request Oct 17, 2014
Add support for gzip content decoding
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants