Configuration options for Rack::Deflate #457

Merged
merged 2 commits into from Aug 3, 2014

9 participants

@jakubpawlowicz

Rack::Deflater currently does not support any configuration options which makes it cumbersome to control.

This pull request adds support for the following options:

  • include - an array of content types enabled for compression
  • if - a lambda for choosing whether to deflate based on current execution scope (request, status, body, or headers)

Examples:
use Rack::Deflater, include: %w(text/json application/json)
use Rack::Deflater, if: lambda { |env, status, headers, body| body.length > 512 }

EDIT: description was update based on a PR discussion

@maletor

Was this created with the intention of skipping image files with Deflator? Because that would be an excellent use case according to https://developers.google.com/speed/docs/best-practices/payload#GzipCompression.

@jakubpawlowicz

Yes, that's one use case I can think of. In general it's a bad idea to compress requests smaller than TCP packet size as it only adds an overhead on both ends.

This threshold feature can always be configured on the web server level (nginx, Apache) but in case of services which do not proxy requests through a web server (e.g. Heroku Cedar) having such option in Rack::Deflater is the only option available.

@maletor

Definitely important. Would love to see this get merged!

@maletor

TCP packet size is different for everybody? But in general, seems to be about 64K?

@jakubpawlowicz

64kB is the theoretical upper limit for TCP/IP. In reality it depends on the connection type and does not exceed ~1 kB - still there's no need to compress such small amounts of data.

@bpinto

Is this ready to be merged?

@jakubpawlowicz

Not yet as some new specs are failing under 1.8.7 / jruby / ree. It will be ready to merge within the next 24 hours.

@jakubpawlowicz

@bpinto it's ready for merge. I've refactored the tests a bit more, turned off deflating to see if all relevant tests fail (they do!) and reverted to a working state. Should be great now!

@raggi raggi commented on an outdated diff Dec 30, 2012
lib/rack/deflater.rb
@app = app
+
+ @min_content_length = options[:min_content_length] || options['min_content_length']
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012
  • Lets just support symbol option keys, not strings, consistent with other middleware options.
  • Maybe min_size or min_length? Just a little shorter?
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

Lets default min-content-length to 1kb. There can be some advantages on mobile networks for smaller entities, but that domain has additional solutions and complexities - shared dictionaries are better, lets hope for them in http2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@raggi raggi commented on an outdated diff Dec 30, 2012
lib/rack/deflater.rb
@@ -5,19 +5,20 @@
module Rack
class Deflater
- def initialize(app)
+ def initialize(app, options = {})
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

Lets document the options, using RDoc style. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@raggi raggi commented on an outdated diff Dec 30, 2012
lib/rack/deflater.rb
@@ -100,5 +101,45 @@ def each
deflater.close
end
end
+
+ private
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if response body is too short
+ if @min_content_length &&
+ @min_content_length > headers['Content-Length'].to_i
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

If we default this to some value, we can skip the nil check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@raggi raggi commented on an outdated diff Dec 30, 2012
lib/rack/deflater.rb
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if response body is too short
+ if @min_content_length &&
+ @min_content_length > headers['Content-Length'].to_i
+ return false
+ end
+
+ # Skip if :include is provided and evaluates to false
+ if @include &&
+ @include.kind_of?(Regexp) &&
+ !@include.match(env['PATH_INFO'])
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

If we use env['PATH_INFO'] =~ @include then it's fine as long as @include is nil, or a regex, or a string. If necessary that could be sanitized in the constructor. This will crunch down all the boolean logic here.

@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

In fact, if you use === instead of =~ (change operation order) then you naturally get lambda support in 1.9+ as well as string and regex matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@raggi raggi commented on an outdated diff Dec 30, 2012
lib/rack/deflater.rb
+ if @min_content_length &&
+ @min_content_length > headers['Content-Length'].to_i
+ return false
+ end
+
+ # Skip if :include is provided and evaluates to false
+ if @include &&
+ @include.kind_of?(Regexp) &&
+ !@include.match(env['PATH_INFO'])
+ return false
+ end
+
+ # Skip if :exclude is provided and evaluates to true
+ if @exclude &&
+ @exclude.kind_of?(Regexp) &&
+ @exclude.match(env['PATH_INFO'])
@raggi
Official Rack repositories member
raggi added a note Dec 30, 2012

As for @include

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@raggi
Official Rack repositories member

In principle this looks great. I left you some line comments, thanks for your work so far!

@jakubpawlowicz

Thanks @raggi for all suggestions! I've just applied them so it should be more ready to merge.

@raggi
Official Rack repositories member

This needs rebasing on top of rack master. I'll be happy to merge this in the next release, but I am out of time for 1.5.0.

@jakubpawlowicz

That's a pity. I'll merge it as soon as possible to make it into 1.6.

@jakubpawlowicz

@raggi That should be it!

@nathanaeljones

To me, the safest default scenario is opt-in based on mime-type.

HTML5 Bootstrap includes server-side configuration to compress the following mime-types

text/html    
text/css
text/plain
text/x-component
application/javascript
application/json
application/xml
application/xhtml+xml
application/x-font-ttf
application/x-font-opentype
application/vnd.ms-fontobject
image/svg+xml
image/x-icon;

I would consider the following list instead:

# All html, text, css, and csv content should be compressed
text/plain
text/html
text/csv
text/css

# Only vector graphics and uncompressed bitmaps can benefit from compression.
#GIF, JPG, and PNG already use a lz* algorithm, and certain browsers can get confused.
image/x-icon
image/svg+xml
application/x-font-ttf
application/x-font-opentype
application/vnd.ms-fontobject


# All javascript should be compressed
text/javascript
application/ecmascript
application/json
application/javascript

# All xml should be compressed
text/xml
application/xml
application/xml-dtd
application/soap+xml
application/xhtml+xml
application/rdf+xml
application/rss+xml
application/atom+xml

If it's not too late, I would suggest mime-type evaluation instead of URL regexes. Providing a sane default set would be great; the current usage pattern is causing lots of issues with images and PDFs, since old browsers lie about Accept-Encoding.

Rack::Deflate has become very important, as Heroku's new Cedar stack removed automatic gzip support, and requires that step be moved to the application itself.

@maletor
@jakubpawlowicz

@raggi - any ideas about mime-type based evaluation?

@nathanaeljones

Should I submit a pull request?

@jakubpawlowicz

Any ideas how to push it forward?

@masterkain

Looks awesome to have; HAProxy 1.5dev19 dropped (temporarly) support for transparent gzip compression of chunked responses so we have to implement it at app levels, the more options, the better.

@masterkain masterkain commented on an outdated diff Aug 2, 2013
lib/rack/deflater.rb
@@ -100,5 +110,42 @@ def each
deflater.close
end
end
+
+ private
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if response body is too short
+ if @min_length > headers['Content-Length'].to_i

I'm testing this but using Rainbows!, Rails 3.2.14, http 1.1 and I don't see any Content-Length header in this point, this thus fails and compression is skipped.

Rainbows! was started with the -N option to not insert any default middleware, so my config.ru is:

use Rack::ContentLength
use Rack::Chunked
run Myapp::Application
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jjb

@jakubpawlowicz -- this is an awesome patch. It has a lot of features all together, which is probably part of why it's taking so long to merge. Here are some thoughts on how it can be simplified and improved.

include/exclude

It seems like folks (including me) think that

  • whitelist is superior
  • mime-type/Content-Type is superior

I recommend:

  • remove the exclude option entirely (I can't think of a use case, tell me if I'm missing something)
  • reimplement include to operate on the Content-Type header instead of the URL.
  • thought: would be nice to allow the Content-Type list to accept regexes, to allow for text/* for example. but this could perhaps come in a subsequent pull request, to keep this one more simple.

skip_if

Maybe a better name for this to make it consistent with other DSLs would be unless. Also, IMO it's more simple to make it if. "skip/unless" make it feel like it's supposed to be a filter. "if" is more generic, allowing the user to simply put in a conditional as they see fit. (and indeed, even in the example you gave you have a !=, making your overall example a double negative).

min_content_length

tl;dr: this option should be removed entirely, because it can be trivially implemented with skip_if/unless/if

There are several problems with making a default value.

  • it's a behavior change, so it makes it more controversial to merge
  • you didn't explain how you chose 1024

There are 3 independent reasons I can think of to have a minimum value:

  1. the point at which compressing does not reduce the size, and in fact might increase it. this is an objective value that is relevant to all users. Some quick searching seems to indicate that the number is around 150 bytes.
  2. the point at which the size is smaller than a single packet anyway. to my understanding this varies between environments. Maybe others could speak to this more.
  3. the point at which the time-complexity of compression is not worth the reduction in data-transmission time. This is highly dependent on hardware and applications. Also, it can be balanced with the gzip compression level. setting the compression level to 5 results in a much different time-complexity/compression ratio than 9. so, a good solution to accommodate this tradeoff would also include adjusting the compression level based on content-type. I just checked -- rack::deflate uses zlib's default, which is 6.

To increase the chances of your patch being merged, I think the default should be removed. Instead, there could be a recommended minimum in documentation.

Furthermore, regarding point 3, this makes me think it would be nice to be able to adjust compression level as well. So after your patch is merged, I'd like to experiment with augmenting it so your compression_level option could be passed either an integer or a lambda, which would dictate if something is compressed at all, and with what compression level.

->(size){
case size
when 0..256
  nil
when 256..1024
  6
when 1025..Float::INFINITY
  9
end
}

And as I typed that out I realized: this could be achieved with skip_if -- so, you could rewrite skip_if to expect either a boolean or integer to be returned. When it's an integer, it indicates compression level.

@jjb

as soon as i posted, that, i realized that include can also be implemented with if. So now I'm a bit conflicted

  • on the one hand, only implementing if and nothing else will be much more simple overall code and allow for greater flexibility for the user
  • on the other hand, the point of offering the library is to make it easy to achieve domain-specific things, so options like include and min_content_length make a lot of sense.

I've seen other libraries offer specific options and then an all-powerful if on the side, so maybe that's fine for here. too. My gut says that going with only include and if is a good approach to start with.

@jakubpawlowicz

@jjb appreciate your awesome feedback!

  • to be honest I don't remember where 1024 came from (it's been a year since I wrote it) but I like your definition via lambda - let's have it that way.
  • whitelisting via include should work well (so exclude will be gone)
  • min_content_length was a convenience method but your idea is better

If you have a spare moment can you let me know what do you think about @nathanaeljones comment: #457 (comment) ? We could have it as an option for include, e.g.

use Rack::Deflater, { include: Rack::Deflater::DEFAULTS }

Thoughts?

@jjb

@jakubpawlowicz looks good. ideally the list would be sourced from combining arrays from another project, like...

Rack::Mime::MIME_TYPES.select{|k,v| v.start_with? "text" }

MIME::Types[/^text/] # note that this blows up with mime-types 1.25, works with 2.0, haven't tried 1.25.1

source: http://stackoverflow.com/questions/16003824/get-all-video-mime-types-in-ruby

@jakubpawlowicz

@jjb that's another good idea! Will wrap up all the changes in the coming days.

@jakubpawlowicz

@jjb - that should be it!

I decided to use Rack::Mime::MIME_TYPES instead of MIME::Types to save on adding a dependency. So a regular expression can be passed and it is first matched against Content-Type header and then against list of known mime types. We should probably have some sanity checks first against missing Content-Type header too.

Once we are all set I will squeeze all the commits into two - one for refactored spec_deflater.rb and the other one for Deflater options.

Please let me know how it looks to you.

@jakubpawlowicz

And regarding failing specs I noticed master fails on them too. It shouldn't be an excuse though...

@jjb
jjb commented Dec 9, 2013

@jakubpawlowicz awesome. the discussion about mime types was regarding where to get a list of defaults. it looks like you are using it to make sure that the provided types are valid. my feeling is that this isn't very useful and you can exclude this entirely.

@jjb jjb and 1 other commented on an outdated diff Dec 9, 2013
lib/rack/deflater.rb
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if :include is given and does not match request's content type
+ if @include &&
+ !(@include.match(headers['Content-Type']) && Rack::Mime::MIME_TYPES.values.include?(headers['Content-Type']))
+ return false
+ end
+
+ # Skip if :if lambda is given and evaluates to false
+ if @if &&
+ !@if.call(env, status, headers, body)
+ return false
@jjb
jjb added a note Dec 9, 2013
return false if @if && !@if.call(env, status, headers, body)
@jjb
jjb added a note Dec 9, 2013

(just my style opinion...)

Sure, it reads better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jakubpawlowicz

@jjb sorry my mistake! So let's make sure we are on the same page regarding :include:

  • it should be an array (assembled anyhow user wants)
  • we should default to all content types matching /^text/ - what about application/json?
  • if above then there's no need to make it default explicitly via use Rack::Deflater, { include: Rack::Deflater::DEFAULTS } as plain use Rack::Deflater will be enough
@jjb
jjb commented Dec 9, 2013

that is one way. but, the current deflater approach is to deflate everything. so, you could keep this default behavior, and only change the behavior if include is specified. I recommend this as it's more likely to be accepted by the maintainers and is just generally a more conservative approach.

So your code already does this :-D. so if you agree, you can just remove the check that it's in Rack::Mime::MIME_TYPES.values and IMO the code will be more simple.

(i think i overexplained this but just trying to be extra clear :-D)

@jjb jjb and 1 other commented on an outdated diff Dec 9, 2013
lib/rack/deflater.rb
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if :include is given and does not match request's content type
+ if @include &&
+ !(@include.match(headers['Content-Type']) && Rack::Mime::MIME_TYPES.values.include?(headers['Content-Type']))
+ return false
+ end
+
@jjb
jjb added a note Dec 9, 2013
return false if @include && !@include.match(headers['Content-Type'])

Since @include should be an array it should rather read:

return false if @include && !@include.include?(headers['Content-Type'])

or to make it more readable

return false if @include && @include.index(headers['Content-Type']).nil?
@jjb
jjb added a note Dec 9, 2013

sounds good. i find the first more readable, i'm not familiar with index and the required .nil? at the end seems complicated

as long as it works as expected I'm fine with the first too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jakubpawlowicz

Cool, so we are on the same page. Will update the code shortly.

@jjb jjb and 1 other commented on an outdated diff Dec 9, 2013
lib/rack/deflater.rb
@app = app
+
+ @if = options[:if]
+ @include = options[:include]
@jjb
jjb added a note Dec 9, 2013

the symbols are for the DSL but you could make more descriptive variables within the class

@condition = options[:if]
@compressible_types = options[:include]

👍 nice one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jjb
jjb commented Dec 9, 2013

I made this SO question to try to find a nice list of types for us to recommended in the documentation http://stackoverflow.com/questions/20477558/where-can-i-find-a-list-of-textual-mime-types

@jakubpawlowicz

@jjb here you go! Let's see what your SO question brings.

@jakubpawlowicz

@jjb So it's quashed into two commits, with better docs, and updated PR description

@raggi @nathanaeljones You may like it much more right now!

@jayschab jayschab and 1 other commented on an outdated diff Jan 7, 2014
lib/rack/deflater.rb
@@ -126,5 +133,25 @@ def close
@body.close if @body.respond_to?(:close)
end
end
+
+ private
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if @compressible_types are given and does not include request's content type
+ return false if @compressible_types && !@compressible_types.include?(headers['Content-Type'])
@jayschab
jayschab added a note Jan 7, 2014

First thing, thanks for this code it is a big help. One issue I ran into was this line. When the Content-type has the optional parameter value (in my case the character encoding) the content-type check fails. See RFC-1341. I am using .split(";")[0] and it is working for my use cases.

Thanks @jayschab for a valuable input. I'll add a test case and your solution and we should be all good. 👍

Found a bug in my suggested code in my testing. Content-Type isn't always set.

I ended up changing my version to this:

return false if @compressible_types && !(headers.has_key('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jakubpawlowicz jakubpawlowicz and 1 other commented on an outdated diff Jan 15, 2014
lib/rack/deflater.rb
@@ -126,5 +133,26 @@ def close
@body.close if @body.respond_to?(:close)
end
end
+
+ private
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if @compressible_types are given and does not include request's content type
+ content_type = headers['Content-Type'].split(';')[0]
+ return false if @compressible_types && !@compressible_types.include?(content_type)

@jayschab I can't see your comment anymore. Does it mean this way is fine?

Sorry I didn't forgot I was commenting on an outdated diff (it is still there just not obvious in the GitHub UX #457 (comment)). Copied below:

Found a bug in my suggested code in my testing. Content-Type isn't always set.

I ended up changing my version to this:

return false if @compressible_types && !(headers.has_key('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
jakubpawlowicz added some commits Dec 9, 2013
@jakubpawlowicz jakubpawlowicz Adds improved Rack::Deflater tests. 8e52002
@jakubpawlowicz jakubpawlowicz Adds deflater options to control compression on per-request level.
* Adds :if option which should be given a lambda accepting env, status, headers, and body options.
* When :if evaluates to false a response body won't be compressed.
* Adds :include option which should be given an array of compressible content types.
* When :include don't include request's content type then response body won't be compressed.
3ae2f30
@jakubpawlowicz jakubpawlowicz commented on the diff Jan 15, 2014
lib/rack/deflater.rb
@@ -126,5 +133,25 @@ def close
@body.close if @body.respond_to?(:close)
end
end
+
+ private
+
+ def should_deflate?(env, status, headers, body)
+ # Skip compressing empty entity body responses and responses with
+ # no-transform set.
+ if Utils::STATUS_WITH_NO_ENTITY_BODY.include?(status) ||
+ headers['Cache-Control'].to_s =~ /\bno-transform\b/ ||
+ (headers['Content-Encoding'] && headers['Content-Encoding'] !~ /\bidentity\b/)
+ return false
+ end
+
+ # Skip if @compressible_types are given and does not include request's content type
+ return false if @compressible_types && !(headers.has_key?('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))

@jayschab Updated it per your suggestion.

Found it problematic to test scenario without the 'Content-Type' header as it gets set every time in specs.
Any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jakubpawlowicz

is there anything we can do to push it forward after 1.5 years in PR?

@felixbuenemann

This would be nice to have. I'm currently shuffling around middleware to keep dragonfly jobs from being compressed.

@raggi raggi merged commit 62dcc83 into rack:master Aug 3, 2014

0 of 2 checks passed

Details default The Travis CI build failed
Details continuous-integration/travis-ci The Travis CI build is in progress
@jjb
jjb commented Aug 3, 2014

😹

@jakubpawlowicz

Wicked! Thanks @raggi!

@jjb

in case someone finds this thread, here's what I'm using for mime-types:

include: Rack::Mime::MIME_TYPES.select{|k,v| v =~ /text|json|javascript/ }.values.uniq
@campbecf

I had to do use Rack::Deflater, if: lambda { |env, status, headers, body| body.body.length > 512 } to get the example to work.

@jjb

@campbecf What is "the example"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment