Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Decompression.md - drop deflate, mention zstd #405

Merged
merged 1 commit into from
Aug 16, 2023

Conversation

wojtekmach
Copy link
Contributor

@wojtekmach wojtekmach commented Aug 11, 2023

First of all, we can't use :zlib.unzip/1 for deflate. If anything, it's :zlib.uncompress/1:

iex> Req.get!("http://httpbin.org/deflate", raw: true).body |> :zlib.unzip()
** (ErlangError) Erlang error: :data_error
    :zlib.inflate_nif(#Reference<0.3067689971.3529113605.174556>, 8192, 16384, 0)
    :zlib.dequeue_all_chunks_1/3
    :zlib.inflate/3
    :zlib.unzip/1
    iex:3: (file)
iex> Req.get!("http://httpbin.org/deflate", raw: true).body |> :zlib.uncompress()
"{\n  \"deflated\": true, \n  \"headers\": {\n    \"Accept-Encoding\": \"zstd, br, gzip\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\": \"req/0.3.11\", \n    \"X-Amzn-Trace-Id\": \"Root=1-64d6b50d-4c634ab23ae6d5fe3dab6694\"\n  }, \n  \"method\": \"GET\", \n  \"origin\": \"89.73.45.237\"\n}\n"

But I don't think it's quite that simple.

Bandit has support for deflate: https://github.com/mtrudel/bandit/blob/1.0.0-pre.11/lib/bandit/compression.ex#L15 but using :zlib.uncompress/1 there crashes:

iex> Bandit.Compression.compress("hello", "deflate", []) |> :zlib.uncompress()
** (ErlangError) Erlang error: :data_error
    (erts 14.0.2) :zlib.inflateEnd_nif(#Reference<0.2420980051.2993029125.66777>)
    (erts 14.0.2) :zlib.uncompress/1
    iex:2: (file)

I trust @mtrudel to have followed the RFCs and implemented this correctly. :) So if you want to mention deflate, we should follow his implementation, do the opposite.

But, and maybe I'm just burned out with this a bit, I don't think it's worth it. None of the big sites seem to support it. And then there's https://zlib.net/zlib_faq.html#faq39

What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?

"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specification in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.

Bottom line: use the gzip format for HTTP 1.1 encoding.

And then there's this excerpt from zlib:deflateInit/6

WindowBits - The base two logarithm of the window size (the size of the history buffer). It is to be in the range 8 through 15. Larger values result in better compression at the expense of memory usage. Defaults to 15 if deflateInit/2 is used. A negative WindowBits value suppresses the zlib header (and checksum) from the stream. Notice that the zlib source mentions this only as a undocumented feature.

Again I think the implementation is correct because I saw setting -MAX_BITS in Python or Ruby too.

I had (wrong) support for it in Req and decided to just drop it (wojtekmach/req@e19d89f, wojtekmach/req#215) fwiw.

@mtrudel
Copy link
Contributor

mtrudel commented Aug 12, 2023

Thoughts in no particular order.

  • IIRC deflate (zlib) exhibited generally better compression than gzip
  • I tested deflate support against Safari, Chrome and Firefox and it worked as expected (I didn't break out Wireshark or anything, but I trust the browsers' developer tools to not be lying to me)
  • The negative buffer size thing was super confusing when I was writing this up; undocumented and very surprising. 0/10 (-0/10?) would not recommend.
  • There's actually a bunch of subtlety with content-encoding viz a viz caching concerns. See @tangulip's excellent work on Improve RFC compliance regarding cacheability on deflated responses mtrudel/bandit#207 for details
  • I wish brotli was easier to support. I have a TODO to add it but I'm trying VERY hard to avoid NIFs to avoid installation hurdles; it'll likely wait until I flew out the bandit_native add-on story post 1.0
  • All of these concerns generalize between client and server. To the extent we're solving them, we should endeavour to do so with shared libraries. http_content_encoding, anyone?

@ericmj ericmj merged commit 27eed68 into elixir-mint:main Aug 16, 2023
4 checks passed
@wojtekmach wojtekmach deleted the wojtekmach-patch-1 branch August 17, 2023 07:14
@wojtekmach
Copy link
Contributor Author

Thank you for the cache/deflate link, very interesting read.

Regarding brotli, I think the brotli package looks pretty good. I wouldn't include it by default but I'd consider optionally supporting it.

I think something like http_content_encoding sounds appealing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants