Skip to content
Permalink
Browse files

Address comments on Gzip implementation

- don't mutate PATH_INFO in env, test
- test fallback content type matches Rack::File
- change assertion style
- make HTTP_ACCEPT_ENCODING comparison case insensitive
- return gzip path from method instead of true/false so we don't have to assume later
- don't allocate un-needed hash.

Original comments:

https://github.com/rails/rails/commit/
cfaaacd#commitcomment-7468728

cc @jeremy
  • Loading branch information...
schneems committed Aug 21, 2014
1 parent 33c0536 commit 8e31fa3b72892f7421b85b5a79d71a2d726ccddd
@@ -16,8 +16,7 @@ class FileHandler
def initialize(root, cache_control)
@root = root.chomp('/')
@compiled_root = /^#{Regexp.escape(root)}/
headers = {}
headers['Cache-Control'] = cache_control if cache_control
headers = cache_control && { 'Cache-Control' => cache_control }
@file_server = ::Rack::File.new(@root, headers)
end

@@ -37,19 +36,23 @@ def match?(path)
end

def call(env)
path = env['PATH_INFO']
gzip_file_exists = gzip_file_exists?(path)
if gzip_file_exists && gzip_encoding_accepted?(env)
env['PATH_INFO'] = "#{path}.gz"
path = env['PATH_INFO']
gzip_path = gzip_file_path(path)

if gzip_path && gzip_encoding_accepted?(env)
env['PATH_INFO'] = gzip_path
status, headers, body = @file_server.call(env)
headers['Content-Encoding'] = 'gzip'
headers['Content-Type'] = content_type(path)
else
status, headers, body = @file_server.call(env)
end

headers['Vary'] = 'Accept-Encoding' if gzip_file_exists
headers['Vary'] = 'Accept-Encoding' if gzip_path

return [status, headers, body]
ensure
env['PATH_INFO'] = path
end

private
@@ -73,11 +76,17 @@ def content_type(path)
end

def gzip_encoding_accepted?(env)
env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/
env['HTTP_ACCEPT_ENCODING'] =~ /\bgzip\b/i
end

def gzip_file_exists?(path)
File.exist?(File.join(@root, "#{::Rack::Utils.unescape(path)}.gz"))
def gzip_file_path(path)
can_gzip_mime = content_type(path) =~ /\A(?:text\/|application\/javascript)/

This comment has been minimized.

Copy link
@swrobel

swrobel Aug 28, 2017

Contributor

@schneems I'm interested in submitting a PR because this prevents a number of filetypes that are gzipped by sprockets from being served in their gzipped form by Rails, notably svg, json and (possibly?) fonts. I don't see why we can't just assume if a gzipped version of the file exists in the same path, that we should serve it ... why the need for the regex?

This comment has been minimized.

Copy link
@matthewd

matthewd Aug 29, 2017

Member

IIRC it's about avoiding an extra file stat for every image

This comment has been minimized.

Copy link
@swrobel

swrobel Aug 29, 2017

Contributor

@matthewd then maybe a blacklist would be acceptable instead of a whitelist?

This comment has been minimized.

Copy link
@schneems

schneems Aug 29, 2017

Author Member

This logic was put in place to limit the file stats on disk.

There are a seemingly infinite number of content types, the majority of them cannot be gzipped. Switching over to a blacklist would be bad.

I'm open to including really common extensions here, SVG seems like an obvious one. Are we also not matching for CSS here? We should add that as well.

An alternative could be to check for a gz file on first request to ALL content types, then cache which content types have gzipped files and which don't.

Pros:

  • ALL content types with GZ files now get served.

Cons:

  • First request on known GZ data types is slower
  • Slight memory growth over time as a hash of content types is stored in memory. This would be bounded to the number of content types on disk.
  • Would have to store data in a thread safe data structure

WDYT @matthewd ?

This comment has been minimized.

Copy link
@matthewd

matthewd Aug 30, 2017

Member

Are we also not matching for CSS here

CSS, like most text-based (and thus notably compressible) formats, lives under text/. Correspondingly, it seems reasonable to me for us to assume that a given unknown image/ or video/ file format is not textual, but rather some more dense binary format (which, if it contained regular/compressible data, would handle that compression internally).

Sprockets must have a similar conditional to decide whether it's going to compress files.. synchronizing the two does seem like a sensible and uncontroversial first step.


@schneems my concern with that idea would be non-asset files (or more generally, non-Sprockets-managed files) -- we can't necessarily trust that all files of a given type will either have, or not have, a corresponding gzipped variant. That could lead to false negatives, where the first file we try doesn't have one, and now this server process never looks for .js.gz again.

I do wonder if we should revisit the idea of going all the way, and caching the full list of files inside @root at boot -- then we'd save a file stat on every request.

This comment has been minimized.

Copy link
@swrobel

swrobel Sep 1, 2017

Contributor

The logic in Sprockets is a bit different ... it'll generate a gzipped version for any file with a valid charset (should cover text/ as well as javascript, json, etc), and then manually specifies mime types for 'binary' formats to gzip.

This comment has been minimized.

Copy link
@matthewd

matthewd Sep 1, 2017

Member

Ah okay, so we can't really match that directly. We could extend our whitelist using the same source document though.

This comment has been minimized.

Copy link
@swrobel

swrobel Sep 1, 2017

Contributor

Ok, so my proposal is to change this from a regex to all text/ plus an array of mime types that can be overridden via a config setting like config.assets.gzip_mime_types. Agreeable?

This comment has been minimized.

Copy link
@schneems

schneems Sep 7, 2017

Author Member

If we're really picky, I would prefer a hash of mime types so that lookup time is constant.

This comment has been minimized.

Copy link
@matthewd

matthewd Sep 7, 2017

Member

The config should be an array, but yeah, we can transform that into a hash (or just a set) in the initializer.

Alternatively, we keep it as a regex (again, internally -- still configured as an array): .match?(/\Atext\/|\A#{Regexp.union configured_array}\z/) (but with the regex actually built in initialize and stored in an ivar)

I suspect a hash lookup + starts_with?("text/") is going to be fastest, though. But if we're rearranging, we might as well benchmark the options and make an informed decision.

gzip_path = "#{path}.gz"
if can_gzip_mime && File.exist?(File.join(@root, ::Rack::Utils.unescape(gzip_path)))
gzip_path
else
false
end
end
end

@@ -115,8 +115,30 @@ def test_serves_gzip_files_when_header_set
assert_equal 'Accept-Encoding', response.headers["Vary"]
assert_equal 'gzip', response.headers["Content-Encoding"]

response = get(file_name, 'HTTP_ACCEPT_ENCODING' => 'Gzip')
assert_gzip file_name, response

response = get(file_name, 'HTTP_ACCEPT_ENCODING' => 'GZIP')
assert_gzip file_name, response

response = get(file_name, 'HTTP_ACCEPT_ENCODING' => '')
refute_equal 'gzip', response.headers["Content-Encoding"]
assert_not_equal 'gzip', response.headers["Content-Encoding"]
end

def test_does_not_modify_path_info
file_name = "/gzip/application-a71b3024f80aea3181c09774ca17e712.js"
env = {'PATH_INFO' => file_name, 'HTTP_ACCEPT_ENCODING' => 'gzip'}
@app.call(env)
assert_equal file_name, env['PATH_INFO']
end

def test_serves_gzip_with_propper_content_type_fallback
file_name = "/gzip/foo.zoo"
response = get(file_name, 'HTTP_ACCEPT_ENCODING' => 'gzip')
assert_gzip file_name, response

default_response = get(file_name) # no gzip
assert_equal default_response.headers['Content-Type'], response.headers['Content-Type']
end

# Windows doesn't allow \ / : * ? " < > | in filenames
@@ -147,7 +169,7 @@ def assert_gzip(file_name, response)
def assert_html(body, response)
assert_equal body, response.body
assert_equal "text/html", response.headers["Content-Type"]
refute response.headers.key?("Vary")
assert_nil response.headers["Vary"]
end

def get(path, headers = {})

Large diffs are not rendered by default.

Oops, something went wrong.
Binary file not shown.

Large diffs are not rendered by default.

Oops, something went wrong.
Binary file not shown.
@@ -268,6 +268,13 @@ Please refer to the [Changelog][action-pack] for detailed changes.
* Added an option to disable logging of CSRF failures.
([Pull Request](https://github.com/rails/rails/pull/14280))

* When the Rails server is set to serve static assets, gzip assets will now be
served if the client supports it and a pre-generated gzip file (.gz) is on disk.
By default the asset pipeline generates `.gz` files for all compressible assets.
Serving gzip files minimizes data transfer and speeds up asset requests. Always
[use a CDN](http://guides.rubyonrails.org/asset_pipeline.html#cdns) if you are
serving assets from your Rails server in production.
([Pull Request](https://github.com/rails/rails/pull/16466))

Action View
-------------

0 comments on commit 8e31fa3

Please sign in to comment.
You can’t perform that action at this time.