GZIP disables Content-Length header, and thus breaks caching. #240

Closed
kentfredric opened this Issue Nov 10, 2012 · 3 comments

  1. Caching can only be performed if the response is guaranteed to be complete.
  2. The only way we can check for this is to compare the length of the received content with the advertised Content-Length (a rough sketch of this check follows this list).
  3. Therefore, responses without a Content-Length header cannot be cached.
  4. Sending Accept-Encoding: gzip causes the response to be gzip-encoded, and also removes the Content-Length header.
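
For illustration, here is a rough Perl sketch of the completeness check described in point 2 (this uses plain LWP, not the actual WWW::Mechanize::Cached internals, and the URL is just an example):

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('http://api.metacpan.org/v0/release/Moose');   # example request

my $advertised = $res->header('Content-Length');
my $received   = length( $res->content );   # body bytes as actually received

if ( defined $advertised && $advertised == $received ) {
    # The body is provably complete, so it is safe to cache.
}
else {
    # No Content-Length (e.g. Transfer-Encoding: chunked), or a mismatch:
    # the response cannot be proven complete, so it is not cached.
}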

Request with GZIP Encoding

curl -H 'Accept-Encoding: gzip' \
        -vd '{"fields":["author","name","date","distribution","version"],"query":{"constant_score":{"filter":{"and":[{"range":{"date":{"from":"2011-10-01T00:00:00.000Z"}}},{"terms":{"distribution":["Moose"]}}]}}},"size":5000,"sort":[{"date":"desc"}]}' \
        http://api.metacpan.org/v0/release/_search 2>&1
* About to connect() to api.metacpan.org port 80 (#0)
*   Trying 109.104.113.94...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* connected
* Connected to api.metacpan.org (109.104.113.94) port 80 (#0)
> POST /v0/release/_search HTTP/1.1
> User-Agent: curl/7.26.0
> Host: api.metacpan.org
> Accept: */*
> Accept-Encoding: gzip
> Content-Length: 237
> Content-Type: application/x-www-form-urlencoded
> 
} [data not shown]
* upload completely sent off: 237 out of 237 bytes
< HTTP/1.1 200 OK
< Server: nginx/1.0.0
< Date: Sat, 10 Nov 2012 16:15:12 GMT
< Content-Type: application/json; charset=utf-8
< Vary: Accept-Encoding
< Via: 1.1 BC4-ACLD
< Transfer-Encoding: chunked
< Connection: Keep-Alive
< Content-Encoding: gzip
< 

Request without GZIP Encoding

curl -H 'Accept-Encoding: none' \
        -vd '{"fields":["author","name","date","distribution","version"],"query":{"constant_score":{"filter":{"and":[{"range":{"date":{"from":"2011-10-01T00:00:00.000Z"}}},{"terms":{"distribution":["Moose"]}}]}}},"size":5000,"sort":[{"date":"desc"}]}' \
        http://api.metacpan.org/v0/release/_search 2>&1
* About to connect() to api.metacpan.org port 80 (#0)
*   Trying 109.104.113.94...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* connected
* Connected to api.metacpan.org (109.104.113.94) port 80 (#0)
> POST /v0/release/_search HTTP/1.1
> User-Agent: curl/7.26.0
> Host: api.metacpan.org
> Accept: */*
> Accept-Encoding: none
> Content-Length: 237
> Content-Type: application/x-www-form-urlencoded
> 
} [data not shown]
* upload completely sent off: 237 out of 237 bytes
< HTTP/1.1 200 OK
< Server: nginx/1.0.0
< Date: Sat, 10 Nov 2012 16:18:34 GMT
< Content-Type: application/json; charset=utf-8
< Vary: Accept-Encoding
< Via: 1.1 BC4-ACLD
< Content-Length: 6939
< Connection: Keep-Alive
< 
Owner

rwstauner commented Nov 10, 2012

Searching DDG, I found people noticing similar issues...
in particular, enabling compression seems to remove the Content-Length header and add Transfer-Encoding: chunked.

Right, further delving into this issue shows it's an infrastructure concern on the part of nginx.

nginx functions as a streaming web server, and avoids buffering data wherever it can.

If the message body doesn't need modifying, nginx can know the Content-Length in advance, report it reliably, and simply stream the message body as-is from whatever is producing it.

However, if the message body does need modifying (i.e. gzip), nginx does so in a streaming fashion: instead of gzipping a large file, determining its length, and then starting the transfer, it adds a gzip filter to the output stream and uses chunked transfer encoding instead.

And it has to work this way: you can either know the Content-Length of the gzipped data up front, or you can gzip it on the fly as it transfers; you can't have both.
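
To illustrate the streaming half of that trade-off, a rough Perl sketch (next_chunk() is a made-up stand-in for whatever produces the body upstream):

use IO::Compress::Gzip qw($GzipError);

my $compressed = '';
my $z = IO::Compress::Gzip->new( \$compressed )
    or die "cannot create gzip stream: $GzipError";

while ( defined( my $chunk = next_chunk() ) ) {   # hypothetical chunk source
    $z->print($chunk);     # each chunk can be sent downstream as it is produced...
}
$z->close;                 # ...but length($compressed) is only known once the stream ends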

So there aren't many choices. At present, this means I'll have to look into WWW::Mechanize::Cached and work out how to handle chunked transfer encoding in a cache-reliable way.
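
For reference, basic WWW::Mechanize::Cached usage looks roughly like this (the CHI backend, cache directory, and URL are illustrative):

use WWW::Mechanize::Cached;
use CHI;

my $cache = CHI->new(
    driver   => 'File',
    root_dir => '/tmp/metacpan-cache',   # illustrative cache location
);
my $mech = WWW::Mechanize::Cached->new( cache => $cache );

# A gzip-encoded, chunked response like the one above arrives without a
# Content-Length header, so it will not end up in the cache.
my $res = $mech->get('http://api.metacpan.org/v0/release/Moose');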

Alternatively, you can handle the gzip layer in middleware before the response reaches nginx, manually gzipping all the content in the middleware layer before handing it over to nginx (which should work, because you'll know the Content-Length before you have to hand the data to nginx).

But doing this will have a measurable performance cost for large transfers, because the entire body has to be compressed before the first byte is sent.
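
A minimal sketch of that pre-compression approach (gzip_with_length() is a made-up helper; a real deployment would do this in the middleware layer):

use IO::Compress::Gzip qw(gzip $GzipError);

# Compress the complete body up front so the gzipped Content-Length is known
# before anything is handed to nginx.
sub gzip_with_length {
    my ($body) = @_;
    gzip \$body => \my $gzipped
        or die "gzip failed: $GzipError";
    return (
        [
            'Content-Encoding' => 'gzip',
            'Content-Length'   => length($gzipped),
        ],
        $gzipped,
    );
}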

@kentfredric kentfredric referenced this issue in libwww-perl/www-mechanize-cached Feb 14, 2013

Closed

Caching Chunked transfer is presently broken #3

Owner

ranguard commented Nov 13, 2014

Closing as part of Nov 2014 cleanup - should get sorted with new API version and CDN

@ranguard ranguard closed this Nov 13, 2014

@lyoshenka lyoshenka added a commit to lbryio/lbry.io that referenced this issue Sep 2, 2016

@lyoshenka lyoshenka add gzip compression in a cache-compatible way 6b3ad6c