New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address comments on Gzip implementation #16616

Merged
merged 1 commit into from Aug 24, 2014

Conversation

Projects
None yet
7 participants
@schneems
Copy link
Member

schneems commented Aug 21, 2014

  • don't mutate PATH_INFO in env, test
  • test fallback content type matches Rack::File
  • change assertion style
  • make HTTP_ACCEPT_ENCODING comparison case insensitive
  • return gzip path from method instead of true/false so we don't have to assume later
  • don't allocate un-needed hash.

Original comments:

cfaaacd

cc @jeremy

schneems referenced this pull request Aug 21, 2014

Enable gzip compression by default
If someone is using ActionDispatch::Static to serve assets and makes it past the `match?` then the file exists on disk and it will be served. This PR adds in logic that checks to see if the file being served is already compressed (via gzip) and on disk, if it is it will be served as long as the client can handle gzip encoding. If not, then a non gzip file will be served.

This additional logic slows down an individual asset request but should speed up the consumer experience as compressed files are served and production applications should be delivered with a CDN. This PR allows a CDN to cache a gzip file by setting the `Vary` header appropriately. In net this should speed up a production application that are using Rails as an origin for a CDN. Non-asset request speed is not affected in this PR.
@matthewd

View changes

actionpack/lib/action_dispatch/middleware/static.rb Outdated
status, headers, body = @file_server.call(env)
env['PATH_INFO'] = path

This comment has been minimized.

@matthewd

matthewd Aug 21, 2014

Member

ensure?

This comment has been minimized.

@schneems

schneems Aug 21, 2014

Member

(en)sure, why not. done.

@matthewd

View changes

actionpack/lib/action_dispatch/middleware/static.rb Outdated
if File.exist?(File.join(@root, ::Rack::Utils.unescape(gzip_path)))
gzip_path
else
false

This comment has been minimized.

@matthewd

matthewd Aug 21, 2014

Member

I'd personally favour value-or-nil over value-or-false

This comment has been minimized.

@schneems

schneems Aug 21, 2014

Member

I have an aversion to returning nil where I can avoid it. I generally take nil to mean IDK where as here i'm aware that the value returned should be falsey and false is more explicit than nil. If we start returning nil then we get into a habit of being able to call predicates like nil? which assume we know the output class. Here we only want to emit a falsey value.

@schneems schneems force-pushed the schneems:schneems/jeremy-comments branch Aug 21, 2014

@@ -48,8 +48,11 @@ def call(env)
status, headers, body = @file_server.call(env)
end

headers['Vary'] = 'Accept-Encoding' if gzip_file_exists
headers['Vary'] = 'Accept-Encoding' if gzip_path

This comment has been minimized.

@jeremy

jeremy Aug 22, 2014

Member

Makes sense that we need the file stat so we can tell an upstream cache that we vary on content encoding.. but.. the extra file stat per request is pretty rough for folks who don't have .gz assets available.

Couple possibilities

  • allow turning off .gz-triggered content encoding awareness
  • deduce the content type from path extname first, then limit .gz checks to text/*

(For most apps, most static file hits are going to be images that are already compressed)

This comment has been minimized.

@schneems

schneems Aug 22, 2014

Member

I ran some benchmarks. For an asset that does not have gzip if we never stat it the second time it looks something like this:

Calculating -------------------------------------
           Gzip; Now       490 i/100ms
  Gzip; No gzip Stat       515 i/100ms
-------------------------------------------------
           Gzip; Now     5076.2 (±6.1%) i/s -      25480 in   5.040338s
  Gzip; No gzip Stat     5352.6 (±6.7%) i/s -      26780 in   5.027866s

Which is 100 * (5352.6 - 5076.2) / 5076.2 #=> 5.445017926795646 or roughly 5% faster. by not doing the second stat. This "tax" will only happen on requests to a valid asset (verified earlier in match?) It will also only occur one time for each asset if you're properly using a CDN, or never if you're using NGINX. <opinion>If you don't care enough to put a CDN in front of your assets, you probably don't care about 5% on a 200ms asset request </opinion>.

That being said, we could cache this stat in memory. There are threading issues, and if you're doing something silly like dynamically storing images and serving them from your disk then we've created the ability to create a very large object. I'm not a big fan of this, now we're kindof acting like a poor-person's CDN.

Turning on/off the ability, we already have the ability to turn off static asset serving all together which would turn this feature off 😁 Personally i'm already suffering from config.whatever overload.

Deducing content type is a really clever idea, I hadn't thought of that before. There may be some edge cases, perhaps someone is using a svg.gz instead of a svgz, but the wins likely outway the costs of any files that fall through the cracks. If you're not comfortable with the 5% tax on non-gzip assets, I think this is the way to go.

This comment has been minimized.

@jeremy

jeremy Aug 22, 2014

Member

Consider cost of the stat on NFS vs local disk ...

On Friday, August 22, 2014, Richard Schneeman notifications@github.com
wrote:

In actionpack/lib/action_dispatch/middleware/static.rb:

@@ -48,8 +48,11 @@ def call(env)
status, headers, body = @file_server.call(env)
end

  •  headers['Vary'] = 'Accept-Encoding' if gzip_file_exists
    
  •  headers['Vary'] = 'Accept-Encoding' if gzip_path
    

I ran some benchmarks. For an asset that does not have gzip if we never
stat it the second time it looks something like this:

Calculating -------------------------------------
Gzip; Now 490 i/100ms

Gzip; No gzip Stat 515 i/100ms

       Gzip; Now     5076.2 (±6.1%) i/s -      25480 in   5.040338s

Gzip; No gzip Stat 5352.6 (±6.7%) i/s - 26780 in 5.027866s

Which is 100 * (5352.6 - 5076.2) / 5076.2 #=> 5.445017926795646 or
roughly 5% faster. by not doing the second stat. This "tax" will only
happen on requests to a valid asset (verified earlier in match?) It will
also only occur one time for each asset if you're properly using a CDN, or
never if you're using NGINX. If you don't care enough to put a
CDN in front of your assets, you probably don't care about 5% on a 200ms
asset request .

That being said, we could cache this stat in memory. There are threading
issues, and if you're doing something silly like dynamically storing images
and serving them from your disk then we've created the ability to create a
very large object. I'm not a big fan of this, now we're kindof acting like
a poor-person's CDN.

Turning on/off the ability, we already have the ability to turn off static
asset serving all together which would turn this feature off [image:
😁] Personally i'm already suffering from config.whatever overload.

Deducing content type is a really clever idea, I hadn't thought of that
before. There may be some edge cases, perhaps someone is using a svg.gz
instead of a svgz, but the wins likely outway the costs of any files that
fall through the cracks. If you're not comfortable with the 5% tax on
non-gzip assets, I think this is the way to go.


Reply to this email directly or view it on GitHub
https://github.com/rails/rails/pull/16616/files#r16609723.

This comment has been minimized.

@schneems

schneems Aug 22, 2014

Member

I re-ran on heroku (no SSD)

irb(main):328:0> 100* (7993.8- 7050.2)/7050.2
=> 13.384017474681574

Double digits is a bit more than a "tax". I'll re-work to check file type and ping again

This comment has been minimized.

@schneems

schneems Aug 22, 2014

Member

I added this logic to gzip_file_path so it doesn't stat when mime is wrong. Javascript doesn't return a text/* mime type, so had to special case it.

This comment has been minimized.

@jeremy

jeremy Aug 23, 2014

Member

Nice ( ͡° ͜ʖ ͡°)

@jeremy

This comment has been minimized.

Copy link
Member

jeremy commented Aug 22, 2014

Much appreciated @schneems !

@schneems schneems force-pushed the schneems:schneems/jeremy-comments branch Aug 22, 2014

@jeremy

View changes

actionpack/lib/action_dispatch/middleware/static.rb Outdated
def gzip_file_exists?(path)
File.exist?(File.join(@root, "#{::Rack::Utils.unescape(path)}.gz"))
def gzip_file_path(path)
can_gzip_mime = content_type(path) =~ /(^text\/|^application\/javascript)/

This comment has been minimized.

@jeremy

jeremy Aug 23, 2014

Member

Tiny thing, can move the anchor out and avoid the unused capture: /\A(?:text\/|application\/javascript)/

This comment has been minimized.

@schneems

schneems Aug 23, 2014

Member

Updated.

@chancancode

This comment has been minimized.

Copy link
Member

chancancode commented Aug 23, 2014

@schneems can you add this to the release notes while you're at it? 😄 (if you think users need to take any actions about this feel free to add to upgrade guide also)

@schneems schneems force-pushed the schneems:schneems/jeremy-comments branch 2 times, most recently Aug 23, 2014

@schneems

This comment has been minimized.

Copy link
Member

schneems commented Aug 23, 2014

@chancancode added a release notes entry for the serving gzip. In general people don't need to "do" anything. The only thing I can think of is to remove any custom middleware that they're using that might do the same thing as it's no longer needed, think that's a valid thing to put in the upgrade guide or too much static?

@@ -268,6 +268,11 @@ Please refer to the [Changelog][action-pack] for detailed changes.
* Added an option to disable logging of CSRF failures.
([Pull Request](https://github.com/rails/rails/pull/14280))

* When the Rails server is set to serve static assets, gzip assets will now be
served if the client supports it and a pre-generated gzip file (.gz) is on disk.

This comment has been minimized.

@jeremy

jeremy Aug 23, 2014

Member

Leads to the question "how do I get pregenerated .gz files on disk?" => Sprockets generates them by default.

This comment has been minimized.

@schneems

schneems Aug 24, 2014

Member

I added this information

This comment has been minimized.

@chrisnicola

chrisnicola Jul 31, 2015

Contributor

Is Sprockets generating them by default again? I don't see that in any of the commits https://github.com/rails/sprockets. Also the discussion left with the suggestion that compression would be opt-in not the default. rails/sprockets#26 (comment)

@jeremy

View changes

guides/source/4_2_release_notes.md Outdated
* When the Rails server is set to serve static assets, gzip assets will now be
served if the client supports it and a pre-generated gzip file (.gz) is on disk.
Serving gzip files minimizes data transfer and speeds up asset requests. Always
use a CDN if you are serving assets from your Rails server in production.

This comment has been minimized.

@jeremy

jeremy Aug 23, 2014

Member

"What's a CDN? How do I use one?"

This comment has been minimized.

@schneems

schneems Aug 23, 2014

Member

We should have a guide on this. At bare minimum a section in the assets guide that mentions how to configure a CDN and what they do.

This comment has been minimized.

@chancancode

chancancode Aug 24, 2014

Member

+1 to moving to the assets guide.

This seems too detailed for the a release note entry. As noted in #16576, the release note entries should be short, to the point and scannable. We should aim to make it easy for people to answer the question "does this change affect me" and point them to elsewhere for more information if it does. Usually this is all contained in the pull request/commit (hence the link), but in this case we could add a link to the new section in the assets guide, something like this

@robin850 robin850 added this to the 4.2.0 milestone Aug 24, 2014

@robin850 robin850 added the actionpack label Aug 24, 2014

Address comments on Gzip implementation
- don't mutate PATH_INFO in env, test
- test fallback content type matches Rack::File
- change assertion style
- make HTTP_ACCEPT_ENCODING comparison case insensitive
- return gzip path from method instead of true/false so we don't have to assume later
- don't allocate un-needed hash.

Original comments:

https://github.com/rails/rails/commit/
cfaaacd#commitcomment-7468728

cc @jeremy

@schneems schneems force-pushed the schneems:schneems/jeremy-comments branch to 8e31fa3 Aug 24, 2014

@schneems

This comment has been minimized.

Copy link
Member

schneems commented Aug 24, 2014

Added a link to the already existing CDN docs for the asset pipeline in the release notes. I also just spammed docrails with a bunch more information on configuring a CDN with Rails, expected behavior, and some common gotchas: https://github.com/rails/docrails/commits/master?author=schneems

jeremy added a commit that referenced this pull request Aug 24, 2014

Merge pull request #16616 from schneems/schneems/jeremy-comments
Address comments on Gzip implementation

@jeremy jeremy merged commit ae66fda into rails:master Aug 24, 2014

1 check was pending

continuous-integration/travis-ci The Travis CI build is in progress
Details
@chrisnicola

This comment has been minimized.

Copy link
Contributor

chrisnicola commented Jul 31, 2015

@schneems will this eliminate the need for middlewares like:

https://github.com/mattolson/heroku_rails_deflate
or
https://github.com/romanbsd/heroku-deflater

Both of those also enable Rack::Deflater to compress application requests as well as assets, assuming they are not already compressed.

I'm guessing with these changes, and assuming pre-compiled compressed static assets are available, one would put Rack::Deflater after ActionDispatch::Static rather than before it and there should be no need for either of the Heroku specific middlewares. Am I correct?

@chancancode

This comment has been minimized.

Copy link
Member

chancancode commented Jul 31, 2015

@schneems sorry I missed your response, a brief note in the upgrade guide seems good!

@schneems

This comment has been minimized.

Copy link
Member

schneems commented Jul 31, 2015

@schneems will this eliminate the need for middlewares like

It's better to serve an already compressed image/asset rather than having to read the whole thing in from disk and then compress on the fly. I would honestly benchmark the performance impact, my gut instinct would be to say to put them after Static, but I don't know if it would save a meaningful amount of time per request without trying it out. Maybe those gems know if a request is already compressed?

a brief note in the upgrade guide seems good!

If generating gzipped files is removed from sprockets (which it currently is), then this change doesn't actually do anything other than add an extra file system call to every asset request 😦 If we don't get that behavior turned back on in sprockets we should consider removing this behavior. Alternatively if sprockets enables it based on a flag we could use the same flag in AD::Static to enable/disable the check.

@chrisnicola

This comment has been minimized.

Copy link
Contributor

chrisnicola commented Jul 31, 2015

@schneems yes, I believe that's exactly what those two middlewares do. They compress on the fly using Rack::Deflater only if they don't find a static asset first. If they do they just serve the compressed asset.

I'd suggest that they are not really necessary assuming Sprockets can compress assets and ActionStatic can serve them up by default. So I think it would make sense now to just recommend just putting Rack::Deflater after ActionDispatch::Static generally, unless you really need to compress assets on the fly.

@swrobel

This comment has been minimized.

Copy link
Contributor

swrobel commented on actionpack/lib/action_dispatch/middleware/static.rb in 8e31fa3 Aug 28, 2017

@schneems I'm interested in submitting a PR because this prevents a number of filetypes that are gzipped by sprockets from being served in their gzipped form by Rails, notably svg, json and (possibly?) fonts. I don't see why we can't just assume if a gzipped version of the file exists in the same path, that we should serve it ... why the need for the regex?

This comment has been minimized.

Copy link
Member

matthewd replied Aug 29, 2017

IIRC it's about avoiding an extra file stat for every image

This comment has been minimized.

Copy link
Contributor

swrobel replied Aug 29, 2017

@matthewd then maybe a blacklist would be acceptable instead of a whitelist?

This comment has been minimized.

Copy link
Member

schneems replied Aug 29, 2017

This logic was put in place to limit the file stats on disk.

There are a seemingly infinite number of content types, the majority of them cannot be gzipped. Switching over to a blacklist would be bad.

I'm open to including really common extensions here, SVG seems like an obvious one. Are we also not matching for CSS here? We should add that as well.

An alternative could be to check for a gz file on first request to ALL content types, then cache which content types have gzipped files and which don't.

Pros:

  • ALL content types with GZ files now get served.

Cons:

  • First request on known GZ data types is slower
  • Slight memory growth over time as a hash of content types is stored in memory. This would be bounded to the number of content types on disk.
  • Would have to store data in a thread safe data structure

WDYT @matthewd ?

This comment has been minimized.

Copy link
Member

matthewd replied Aug 30, 2017

Are we also not matching for CSS here

CSS, like most text-based (and thus notably compressible) formats, lives under text/. Correspondingly, it seems reasonable to me for us to assume that a given unknown image/ or video/ file format is not textual, but rather some more dense binary format (which, if it contained regular/compressible data, would handle that compression internally).

Sprockets must have a similar conditional to decide whether it's going to compress files.. synchronizing the two does seem like a sensible and uncontroversial first step.


@schneems my concern with that idea would be non-asset files (or more generally, non-Sprockets-managed files) -- we can't necessarily trust that all files of a given type will either have, or not have, a corresponding gzipped variant. That could lead to false negatives, where the first file we try doesn't have one, and now this server process never looks for .js.gz again.

I do wonder if we should revisit the idea of going all the way, and caching the full list of files inside @root at boot -- then we'd save a file stat on every request.

This comment has been minimized.

Copy link
Contributor

swrobel replied Sep 1, 2017

The logic in Sprockets is a bit different ... it'll generate a gzipped version for any file with a valid charset (should cover text/ as well as javascript, json, etc), and then manually specifies mime types for 'binary' formats to gzip.

This comment has been minimized.

Copy link
Member

matthewd replied Sep 1, 2017

Ah okay, so we can't really match that directly. We could extend our whitelist using the same source document though.

This comment has been minimized.

Copy link
Contributor

swrobel replied Sep 1, 2017

Ok, so my proposal is to change this from a regex to all text/ plus an array of mime types that can be overridden via a config setting like config.assets.gzip_mime_types. Agreeable?

This comment has been minimized.

Copy link
Member

schneems replied Sep 7, 2017

If we're really picky, I would prefer a hash of mime types so that lookup time is constant.

This comment has been minimized.

Copy link
Member

matthewd replied Sep 7, 2017

The config should be an array, but yeah, we can transform that into a hash (or just a set) in the initializer.

Alternatively, we keep it as a regex (again, internally -- still configured as an array): .match?(/\Atext\/|\A#{Regexp.union configured_array}\z/) (but with the regex actually built in initialize and stored in an ivar)

I suspect a hash lookup + starts_with?("text/") is going to be fastest, though. But if we're rearranging, we might as well benchmark the options and make an informed decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment