Enable gzip compression by default #16466

Merged
merged 1 commit into from Aug 21, 2014

Conversation

Projects
None yet
9 participants
@schneems
Member

schneems commented Aug 11, 2014

If someone is using ActionDispatch::Static to serve assets and makes it past the match? then the file exists on disk and it will be served. This PR adds in logic that checks to see if the file being served is already compressed (via gzip) and on disk, it will be served as long as the client can handle gzip encoding. If not, then a non-gzip file will be served.

This additional logic slows down an individual asset request but should speed up the consumer experience as compressed files are served and production applications should be delivered with a CDN. This PR allows a CDN to cache a gzip file by setting the Vary header appropriately. In net this should speed up a production application that are using Rails as an origin for a CDN. Non-asset request speed is not affected in this PR.

@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd Aug 11, 2014

Member

Shouldn't we only set Vary if a .gz file exists?

Member

matthewd commented Aug 11, 2014

Shouldn't we only set Vary if a .gz file exists?

@@ -6,7 +6,9 @@ class FileHandler
def initialize(root, cache_control)
@root = root.chomp('/')
@compiled_root = /^#{Regexp.escape(root)}/
- headers = cache_control && { 'Cache-Control' => cache_control }
+ headers = {}
+ headers.merge!('Cache-Control' => cache_control) if cache_control

This comment has been minimized.

@egilburg

egilburg Aug 11, 2014

Contributor

Do headers['Cache-Control'] = ... to avoid extra hash allocations.

@egilburg

egilburg Aug 11, 2014

Contributor

Do headers['Cache-Control'] = ... to avoid extra hash allocations.

@schneems

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 11, 2014

Member

@matthewd i'm not sure. Setting Vary to Accept-Encoding is saying to our CDN that when we get Accept-Encoding=gzip or Accept-Encoding=foo treat them as different requests that can be cached differently. From the spec http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44

An HTTP/1.1 server SHOULD include a Vary header field with any cacheable response that is subject to server-driven negotiation. Doing so allows a cache to properly interpret future requests on that resource and informs the user agent about the presence of negotiation

So if we don't set it on the non-gzip response if a client makes a request with Accept-Encoding=foo that request will be cached and returned even when someone makes a request with Accept-Encoding=gzip and the result should return a gzip body.

Here's a blog post that, to me, makes Vary seem like it should be set on every asset request simply because the server treats a request differently if Accept-Encoding is set: http://blog.maxcdn.com/accept-encoding-its-vary-important/

It's possible that i'm not reading or interpreting that correctly. Any sane/easy ways you can think to test this out?

Member

schneems commented Aug 11, 2014

@matthewd i'm not sure. Setting Vary to Accept-Encoding is saying to our CDN that when we get Accept-Encoding=gzip or Accept-Encoding=foo treat them as different requests that can be cached differently. From the spec http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.44

An HTTP/1.1 server SHOULD include a Vary header field with any cacheable response that is subject to server-driven negotiation. Doing so allows a cache to properly interpret future requests on that resource and informs the user agent about the presence of negotiation

So if we don't set it on the non-gzip response if a client makes a request with Accept-Encoding=foo that request will be cached and returned even when someone makes a request with Accept-Encoding=gzip and the result should return a gzip body.

Here's a blog post that, to me, makes Vary seem like it should be set on every asset request simply because the server treats a request differently if Accept-Encoding is set: http://blog.maxcdn.com/accept-encoding-its-vary-important/

It's possible that i'm not reading or interpreting that correctly. Any sane/easy ways you can think to test this out?

+
+ def can_gzip?(env)
+ return false if env['HTTP_ACCEPT_ENCODING'] !~ /\bgzip\b/
+ return true if File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))

This comment has been minimized.

@egilburg

egilburg Aug 11, 2014

Contributor

Can just do File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz")) with no return false on next line either.

Also nice to extract gz_path to own method:

def gz_path
  File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz")
end
@egilburg

egilburg Aug 11, 2014

Contributor

Can just do File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz")) with no return false on next line either.

Also nice to extract gz_path to own method:

def gz_path
  File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz")
end
+
+ def can_gzip?(env)
+ return false if env['HTTP_ACCEPT_ENCODING'] !~ /\bgzip\b/
+ File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))

This comment has been minimized.

@sgrif

sgrif Aug 11, 2014

Member

Thoughts on turning this into a more expression focused method?

def can_gzip?(env)
  gzip_encoding_accepted?(env) && gzipped_file_exists?(env)
end

def gzip_encoding_accepted?(env)
  env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/
end

def gzipped_file_exists?(env)
  File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))
end
@sgrif

sgrif Aug 11, 2014

Member

Thoughts on turning this into a more expression focused method?

def can_gzip?(env)
  gzip_encoding_accepted?(env) && gzipped_file_exists?(env)
end

def gzip_encoding_accepted?(env)
  env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/
end

def gzipped_file_exists?(env)
  File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))
end

This comment has been minimized.

@schneems

schneems Aug 11, 2014

Member

I don't like having so many methods that take env as a parameter. If this was an object that already had those values, I would be for it. I did an object allocation to make sure this doesn't leak anything

require 'rack'

TEST_CNT = 10_000_000

def can_gzip?(env)
  gzip_encoding_accepted?(env) && gzipped_file_exists?(env)
end

def gzip_encoding_accepted?(env)
  env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/
end

def gzipped_file_exists?(env)
  File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))
end

MY_HASH = {big: "hash", random: "values", maybe: "gets_coppied", i: "dontknow"}

MYPROC = Proc.new do
  can_gzip?(MY_HASH )
end

GC.start
GC.disable
start = ObjectSpace.count_objects
TEST_CNT.times { MYPROC.call }
finish = ObjectSpace.count_objects
GC.enable
finish.each do |k,v|
  puts k => (v - start[k]) / TEST_CNT.to_f
end

It doesn't. It is still 2 more method dispatches that have to be performed, but if that turned into a performance problem Ruby is in a lot worse shape than this PR.

The net of this is a +7 line change for what equates to only 2 lines of logic. I'm leaning towards not doing the method extraction.

@schneems

schneems Aug 11, 2014

Member

I don't like having so many methods that take env as a parameter. If this was an object that already had those values, I would be for it. I did an object allocation to make sure this doesn't leak anything

require 'rack'

TEST_CNT = 10_000_000

def can_gzip?(env)
  gzip_encoding_accepted?(env) && gzipped_file_exists?(env)
end

def gzip_encoding_accepted?(env)
  env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/
end

def gzipped_file_exists?(env)
  File.exist?(File.join(@root, "#{::Rack::Utils.unescape(env["PATH_INFO"])}.gz"))
end

MY_HASH = {big: "hash", random: "values", maybe: "gets_coppied", i: "dontknow"}

MYPROC = Proc.new do
  can_gzip?(MY_HASH )
end

GC.start
GC.disable
start = ObjectSpace.count_objects
TEST_CNT.times { MYPROC.call }
finish = ObjectSpace.count_objects
GC.enable
finish.each do |k,v|
  puts k => (v - start[k]) / TEST_CNT.to_f
end

It doesn't. It is still 2 more method dispatches that have to be performed, but if that turned into a performance problem Ruby is in a lot worse shape than this PR.

The net of this is a +7 line change for what equates to only 2 lines of logic. I'm leaning towards not doing the method extraction.

This comment has been minimized.

@sgrif

sgrif Aug 11, 2014

Member

Not sure lines of code is a good metric to be basing decisions on. Agreed on the data clump with env, and had the same conclusion about avoiding the object allocation. However, I think the additional expression of intent is a huge win. Dropping the return false if improves the style drastically in my opinion.

@sgrif

sgrif Aug 11, 2014

Member

Not sure lines of code is a good metric to be basing decisions on. Agreed on the data clump with env, and had the same conclusion about avoiding the object allocation. However, I think the additional expression of intent is a huge win. Dropping the return false if improves the style drastically in my opinion.

@matthewd

This comment has been minimized.

Show comment
Hide comment
@matthewd

matthewd Aug 11, 2014

Member

Note I said "if a .gz file exists", not "if we're returning a .gz file". A given response is only subject to negotiation if we had a compressed alternative; for an image request, no matter what Accept-Encoding is offered, the CDN needn't ask us again.

Member

matthewd commented Aug 11, 2014

Note I said "if a .gz file exists", not "if we're returning a .gz file". A given response is only subject to negotiation if we had a compressed alternative; for an image request, no matter what Accept-Encoding is offered, the CDN needn't ask us again.

@schneems

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 11, 2014

Member

@matthewd ahh, thanks I didn't read close enough to your comment. I understand better now, thanks. You're correct. I'm modifying to always check presence of a gzip file and only set the header if it's available. I'm not sure if the disk hit on non-gzipped files is a net win for doing one less cache miss. It is the technically correct thing to do here, i'm pushed the change in behavior and broke out some logic into methods per @sgrif's comment

Member

schneems commented Aug 11, 2014

@matthewd ahh, thanks I didn't read close enough to your comment. I understand better now, thanks. You're correct. I'm modifying to always check presence of a gzip file and only set the header if it's available. I'm not sure if the disk hit on non-gzipped files is a net win for doing one less cache miss. It is the technically correct thing to do here, i'm pushed the change in behavior and broke out some logic into methods per @sgrif's comment

+ @root = root.chomp('/')
+ @compiled_root = /^#{Regexp.escape(root)}/
+ headers = {}
+ headers['Cache-Control'] = cache_control if cache_control

This comment has been minimized.

@sgrif

sgrif Aug 11, 2014

Member

Yo dawg, I herd you like controlling caches.

@sgrif

sgrif Aug 11, 2014

Member

Yo dawg, I herd you like controlling caches.

+ status, headers, body = @file_server.call(env)
+ end
+
+ headers['Vary'] = 'Accept-Encoding' if gzip_file_exists

This comment has been minimized.

@sgrif

sgrif Aug 11, 2014

Member

It looks like we're missing a test for this piece?

@sgrif

sgrif Aug 11, 2014

Member

It looks like we're missing a test for this piece?

This comment has been minimized.

@schneems

schneems Aug 12, 2014

Member

I added a refutation for the header in assert_html helper.

@schneems

schneems Aug 12, 2014

Member

I added a refutation for the header in assert_html helper.

actionpack/CHANGELOG.md
@@ -1,3 +1,9 @@
+* Requests that hit `ActionDispatch::Static` can now take advantage
+ of gizpped assets on disk. By default a gzip asset will be served if

This comment has been minimized.

@carlosantoniodasilva

carlosantoniodasilva Aug 12, 2014

Member

typo: gzipped

actionpack/CHANGELOG.md
@@ -1,3 +1,9 @@
+* Requests that hit `ActionDispatch::Static` can now take advantage
+ of gizpped assets on disk. By default a gzip asset will be served if
+ it the client supports gzip and a compressed file is on disk.

This comment has been minimized.

@carlosantoniodasilva

carlosantoniodasilva Aug 12, 2014

Member

if it the client?

@carlosantoniodasilva

carlosantoniodasilva Aug 12, 2014

Member

if it the client?

- @compiled_root = /^#{Regexp.escape(root)}/
- headers = cache_control && { 'Cache-Control' => cache_control }
+ @root = root.chomp('/')
+ @compiled_root = /^#{Regexp.escape(root)}/

This comment has been minimized.

@carlosantoniodasilva

carlosantoniodasilva Aug 12, 2014

Member

It seems there's no need to reindent those?

@carlosantoniodasilva

carlosantoniodasilva Aug 12, 2014

Member

It seems there's no need to reindent those?

Enable gzip compression by default
If someone is using ActionDispatch::Static to serve assets and makes it past the `match?` then the file exists on disk and it will be served. This PR adds in logic that checks to see if the file being served is already compressed (via gzip) and on disk, if it is it will be served as long as the client can handle gzip encoding. If not, then a non gzip file will be served.

This additional logic slows down an individual asset request but should speed up the consumer experience as compressed files are served and production applications should be delivered with a CDN. This PR allows a CDN to cache a gzip file by setting the `Vary` header appropriately. In net this should speed up a production application that are using Rails as an origin for a CDN. Non-asset request speed is not affected in this PR.

guilleiguaran added a commit that referenced this pull request Aug 21, 2014

@guilleiguaran guilleiguaran merged commit 90006ac into rails:master Aug 21, 2014

1 check failed

continuous-integration/travis-ci The Travis CI build failed
Details
@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Checking gzip_encoding_accepted? first will save a file stat when it isn't accepted.

Checking gzip_encoding_accepted? first will save a file stat when it isn't accepted.

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

We need this later to set the Vary header regardless. If we're serving via a CDN, we need to let the CDN know that if another client makes the same request with a different Accept-Encoding header they will get a different response. We set the header if the gzip file exists regardless of whether we're serving it or not.

Member

schneems replied Aug 21, 2014

We need this later to set the Vary header regardless. If we're serving via a CDN, we need to let the CDN know that if another client makes the same request with a different Accept-Encoding header they will get a different response. We set the header if the gzip file exists regardless of whether we're serving it or not.

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Seems gzip_file_exists?(path) could be gzipped_file_path(path) and return the .gz path. Then we aren't assuming this is the path it looked for.

Seems gzip_file_exists?(path) could be gzipped_file_path(path) and return the .gz path. Then we aren't assuming this is the path it looked for.

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

seems good

Member

schneems replied Aug 21, 2014

seems good

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

This is leaky. It sets the header even if there's no mime type for the path ext, and it presumes the default mime type is text/plain. These implementation and assumptions are based on the current version of Rack::File behavior.

If Rack::File got smarter and sniffed the file type, or changed the default content type to application/octet-stream (assuming text = ??), then this mirror implementation of it will cause regressions.

This is leaky. It sets the header even if there's no mime type for the path ext, and it presumes the default mime type is text/plain. These implementation and assumptions are based on the current version of Rack::File behavior.

If Rack::File got smarter and sniffed the file type, or changed the default content type to application/octet-stream (assuming text = ??), then this mirror implementation of it will cause regressions.

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

Agree that this isn't the best. Rack::File doesn't expose their "default content type" unfortunately. I agree that adding a test is good practice, i'm verifying content type with an unknown file extension generates the same content type when serving gzip file and when not. We could also make a PR to Rack to expose that value in a constant.

Member

schneems replied Aug 21, 2014

Agree that this isn't the best. Rack::File doesn't expose their "default content type" unfortunately. I agree that adding a test is good practice, i'm verifying content type with an unknown file extension generates the same content type when serving gzip file and when not. We could also make a PR to Rack to expose that value in a constant.

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Note that PATH_INFO is mutated in env earlier, so middleware that wrap this will see the internal file path instead of the logical URI path. Should probably set it back before returning.

Note that PATH_INFO is mutated in env earlier, so middleware that wrap this will see the internal file path instead of the logical URI path. Should probably set it back before returning.

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

going with a set and then reset: https://gist.github.com/schneems/96c16647f43575604627 it's the fastest option

Member

schneems replied Aug 21, 2014

going with a set and then reset: https://gist.github.com/schneems/96c16647f43575604627 it's the fastest option

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Introduces an unneeded hash allocation. Rack::File copes with nil headers.

Introduces an unneeded hash allocation. Rack::File copes with nil headers.

This comment has been minimized.

Show comment
Hide comment
@zzak

zzak Aug 21, 2014

Member

You mean ::Rack::File.new(@root, 'Cache-Control' => cache_control)?

Member

zzak replied Aug 21, 2014

You mean ::Rack::File.new(@root, 'Cache-Control' => cache_control)?

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

👍 I need to introduce the ability to set more headers than just cache_control and this would have made the next PR merge cleaner (i think) but no point in getting ahead of myself, will revert this to prior behavior.

Member

schneems replied Aug 21, 2014

👍 I need to introduce the ability to set more headers than just cache_control and this would have made the next PR merge cleaner (i think) but no point in getting ahead of myself, will revert this to prior behavior.

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Content coding tokens are case-insensitive.

Content coding tokens are case-insensitive.

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

👍 fixed and tested

Member

schneems replied Aug 21, 2014

👍 fixed and tested

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

s/refute/assert_not/g ❤️

s/refute/assert_not/g ❤️

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

can't say i'm in love with this, but happy to follow style guides for the project. Didn't realize https://github.com/rails/rails/blob/master/activesupport/lib/active_support/test_case.rb#L32-L45 was a thing. Updating my tests

Member

schneems replied Aug 21, 2014

can't say i'm in love with this, but happy to follow style guides for the project. Didn't realize https://github.com/rails/rails/blob/master/activesupport/lib/active_support/test_case.rb#L32-L45 was a thing. Updating my tests

@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Need to test that

  • env['PATH_INFO'] isn't permanently mutated
  • fallback mime type matches Rack::File's fallback
  • Accept-Encoding tokens are case insensitive

Need to test that

  • env['PATH_INFO'] isn't permanently mutated
  • fallback mime type matches Rack::File's fallback
  • Accept-Encoding tokens are case insensitive
@jeremy

This comment has been minimized.

Show comment
Hide comment
@jeremy

jeremy Aug 21, 2014

Member

Assertion style

Assertion style

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

👍

Member

schneems replied Aug 21, 2014

👍

@schneems

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

Thanks for the review ❤️ here's your changes in a new PR #16616

Member

schneems commented on cfaaacd Aug 21, 2014

Thanks for the review ❤️ here's your changes in a new PR #16616

This comment has been minimized.

Show comment
Hide comment
@guyisra

guyisra Mar 31, 2015

wouldn't headers['Vary'] = 'Accept' be better for all responses for bugs like rails/jquery-ujs#318 ?

wouldn't headers['Vary'] = 'Accept' be better for all responses for bugs like rails/jquery-ujs#318 ?

@schneems

This comment has been minimized.

Show comment
Hide comment
@schneems

schneems Aug 21, 2014

Member

Thanks for the review ❤️ here's your changes in a new PR #16616

Member

schneems commented on cfaaacd Aug 21, 2014

Thanks for the review ❤️ here's your changes in a new PR #16616

This comment has been minimized.

Show comment
Hide comment
@guyisra

guyisra Mar 31, 2015

wouldn't headers['Vary'] = 'Accept' be better for all responses for bugs like rails/jquery-ujs#318 ?

wouldn't headers['Vary'] = 'Accept' be better for all responses for bugs like rails/jquery-ujs#318 ?

nateberkopec added a commit to nateberkopec/todomvc-turbolinks that referenced this pull request May 22, 2015

Revert "Rack-zippy"
Turns out this gem does nothing post Rails 4.2

rails/rails#16466

This reverts commit 31f3f85.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment