
schneems
Member

Currently the ActionDispatch::Static middleware hits disk on every GET and HEAD request to check if the given request maps to a file on disk. This means that all requests that do not involve serving an asset are slowed down. If you are using this middleware and serving assets via Nginx, no asset requests will ever reach the ActionDispatch::Static middleware, so it makes sense as a performance optimization to turn off this middleware. This PR increases the speed of the middleware to the point where the difference on non-asset requests is trivial.

The optimization works by reading the contents of /public once (using one call to Dir.glob) and storing the result in a hash. On each request we then check whether the first segment of the request path is in that hash. For instance, a request to /users/new would not be in the hash because there is no root-level users/ directory, whereas a request to assets/application.css would trigger the middleware because assets/ is in the public/ directory. This prevents us from checking the disk except on requests that are likely to be for an asset.
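As a rough illustration of the lookup described above, here is a minimal Ruby sketch. The helper names (`top_level_entries`, `may_be_static?`) are hypothetical, the sketch uses `Dir.entries` rather than `Dir.glob`, and registering each name without its extension (so that `/500` can still reach `public/500.html`) is an assumption about how the extension fallback would interact with the cache.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: read the top level of the public directory once and
# remember every entry name, plus the name without its extension, so that a
# request for "/500" can still reach "500.html".
def top_level_entries(root)
  entries = {}
  Dir.entries(root).each do |name|
    next if name == "." || name == ".." # Dir.entries includes these
    entries[name] = true
    entries[File.basename(name, ".*")] = true
  end
  entries.freeze
end

# Only the first path segment is compared, so no disk access happens here.
# "/" counts as a possible hit when an index file exists.
def may_be_static?(entries, path_info)
  first_segment = path_info.split("/").reject(&:empty?).first
  return entries.key?("index") if first_segment.nil?
  entries.key?(first_segment)
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "assets"))
  File.write(File.join(root, "500.html"), "error")
  entries = top_level_entries(root)
  p may_be_static?(entries, "/assets/application.css") # true
  p may_be_static?(entries, "/users/new")              # false
  p may_be_static?(entries, "/500")                    # true
end
```

A `true` here only means the request is worth a disk check; the real file lookup still has to happen afterward.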

When benchmarked against the current ActionDispatch::Static we see a roughly 68% improvement for non-asset requests:

While 68% is an impressive speed increase, this is only testing the middleware in isolation. When benchmarked with the entire Rails stack we still see an improvement, but it is only about 2.6%:

Note: The scale on this one does not start at 0.

There is still a speed cost to using this improved middleware versus no middleware at all. In my benchmarks it resulted in a ~0.7% slowdown, though with a very high standard deviation. You can see in the graph that the improved middleware falls between the app with the original middleware and the app without any middleware. This means the slowdown is fairly trivial in the scope of your entire application stack. You can still disable this middleware as a performance optimization if you are deploying behind NGINX, but there is no reason why using a CDN with your Rails server as the origin server shouldn't work out of the box.

For more information about the purpose of ActionDispatch::Static and how it is currently used (or not used) to serve assets please see this reference document: https://gist.github.com/schneems/5dc42963fb3221a7a089.

@dhh
Member

dhh commented Aug 12, 2014

This seems well worth it. Rails absolutely should not require the use of nginx in production. Plenty of sites are small enough that they don't need anything like that. 👍 on the concept.

@wsouto

wsouto commented Aug 13, 2014

Nice! 👏

def base_hash
  @base_hash ||= begin
    base_hash = {}
    Dir.glob("#{@root}/*").each do |file|
Member

Can we change this to Dir.entries(@root)?

Member Author

Updated to use Dir.entries 👍
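For anyone following along, `Dir.entries` and `Dir.glob("#{@root}/*")` do not return the same list. This small Ruby sketch (the `listings` helper is hypothetical) shows that `Dir.entries` includes `.`, `..` and dotfiles, which the `*` glob pattern skips, so the switch needs an explicit filter. A side benefit is that `Dir.entries` takes `@root` literally instead of treating characters like `[` as glob syntax.

```ruby
require "tmpdir"
require "fileutils"

# Returns [names from Dir.glob, names from Dir.entries], both sorted.
def listings(root)
  glob_names = Dir.glob("#{root}/*").map { |path| File.basename(path) }.sort
  entry_names = Dir.entries(root).sort
  [glob_names, entry_names]
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "assets"))
  File.write(File.join(root, ".htaccess"), "")
  glob_names, entry_names = listings(root)
  p glob_names  # ["assets"]
  p entry_names # [".", "..", ".htaccess", "assets"]
  # Dropping the special entries leaves only the dotfile difference.
  p entry_names - [".", ".."] # [".htaccess", "assets"]
end
```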

@tenderlove
Member

This cache could be inconsistent with the filesystem (e.g. someone adds a file or removes a file after the cache is populated). Do we care?

@jeremy
Member

jeremy commented Aug 14, 2014

Going default broadens our security exposure. Seeing Dir.globs makes me shiver.

@schneems
Member Author

@tenderlove this cache only cares about the first level, so if you are adding files to disk at runtime, as long as the top-level folder exists at boot, it will be in the cache. For example, if you're letting users upload photos to public/users/photos, as long as public/users exists we will still serve the files. Even if that weren't the case, I'm fine stating that this is counter to best practice: it doesn't scale across multiple servers and shouldn't be used (use S3, etc. instead). I mentioned this in my linked gist.

It is possible for it to serve user uploaded assets (user uploads smile.png and it goes into /public and is served by Static), but this is not best practice in production, and the only time you would enable this is in production. Doing this makes it impossible to scale up to more than one server, therefore this is not a supported use of this middleware.

The ability to serve runtime-generated files still exists, so even if I'm being heavy-handed with my assumptions and stated "best practices", I think for the most part we're still okay.
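To make the first-level argument concrete, here is a minimal Ruby sketch (the `snapshot_demo` helper and the directory names are illustrative only). The cache is a snapshot of top-level names taken at boot: a file uploaded later under an existing top-level directory still passes the cache check, while a brand-new top-level directory does not until the cache is rebuilt.

```ruby
require "set"
require "tmpdir"
require "fileutils"

# Returns [passes_for_existing_dir, passes_for_new_dir].
def snapshot_demo
  Dir.mktmpdir do |public_dir|
    FileUtils.mkdir_p(File.join(public_dir, "users"))

    # Boot time: snapshot the top-level names once.
    cache = Set.new(Dir.children(public_dir))

    # Runtime: an upload lands below an existing top-level directory...
    FileUtils.mkdir_p(File.join(public_dir, "users", "photos"))
    File.write(File.join(public_dir, "users", "photos", "smile.png"), "")
    # ...and a new top-level directory appears after the snapshot.
    FileUtils.mkdir_p(File.join(public_dir, "uploads"))

    [cache.include?("users"), cache.include?("uploads")]
  end
end

p snapshot_demo # [true, false]
```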

@jeremy the main purpose of the glob is to handle default file extensions, e.g. mapping /500 => /500.html, and the "index" special case. All rendering is delegated to Rack::File, which has explicit security measures to only serve files below the root directory it is given: https://github.com/rack/rack/blob/master/lib/rack/file.rb. I believe the glob in ActionDispatch::Static was likely used for convenience or speed (maybe?), and we can write the same functionality very easily without it:

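# "/" or "/docs/" should fall back to an index file
if env['PATH_INFO'].end_with?('/')
  env['PATH_INFO'] += "index"
end

# "/500" should fall back to "/500.html"
# (+= builds a new string instead of mutating PATH_INFO in place)
if File.extname(env['PATH_INFO']).empty?
  env['PATH_INFO'] += ::ActionController::Base.default_static_extension
end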

I could remove the glob and wrap that functionality up in another PR if that would help.
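The two fallbacks in that snippet can be exercised without Rails. In this sketch the hypothetical `normalize_static_path` takes the default extension as an argument (assumed to be `".html"`) instead of reading `::ActionController::Base.default_static_extension`, and it returns a new string rather than touching `PATH_INFO`.

```ruby
# Hypothetical stand-alone version of the fallback rules above.
def normalize_static_path(path_info, default_ext = ".html")
  path = path_info
  path += "index" if path.end_with?("/")           # "/docs/" -> "/docs/index"
  path += default_ext if File.extname(path).empty? # "/500"   -> "/500.html"
  path
end

p normalize_static_path("/")        # "/index.html"
p normalize_static_path("/docs/")   # "/docs/index.html"
p normalize_static_path("/500")     # "/500.html"
p normalize_static_path("/app.css") # "/app.css"
```

Unlike the glob, this picks a single candidate, so a real handler would still need to confirm the file exists and fall back to the untouched path when it does not.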

@@ -43,6 +52,20 @@ def unescape_path(path)
def escape_glob_chars(path)
  path.gsub(/[*?{}\[\]]/, "\\\\\\&")
end

def parse_root
  base_hash = {}
Contributor

@bf4 why? @base_hash never changes after boot time...

Contributor

Oh, I suppose you're right. My bad for not looking carefully at its usage.

Member Author

It's currently lazily initialized on the first request; I did this to make some previously written tests that were doing some weird stuff work. I could refactor those tests, refactor this code, or make it thread-safe. Thanks for the review, good catch.

Member Author

I fixed the test and updated so we're now populating this hash at boot.
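The race in question is the lazy `@base_hash ||= begin ... end`: if two threads serve the first request at once, both can see `nil` and build the hash. Populating it at boot, as done here, avoids that entirely. For comparison, here is a minimal Ruby sketch of the other option mentioned above, keeping lazy initialization but guarding it with a `Mutex`; the `LazyTopLevelIndex` class is hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical lazily initialized index of top-level names. The Mutex makes
# sure the table is built only once even if several threads hit the first
# request at the same time.
class LazyTopLevelIndex
  def initialize(root)
    @root = root
    @names = nil
    @lock = Mutex.new
  end

  def include?(name)
    names.key?(name)
  end

  private

  def names
    return @names if @names # fast path once populated
    @lock.synchronize do
      # Re-check inside the lock: another thread may have built it already.
      @names ||= Dir.children(@root).to_h { |child| [child, true] }.freeze
    end
  end
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "assets"))
  index = LazyTopLevelIndex.new(root)
  results = 8.times.map { Thread.new { index.include?("assets") } }.map(&:value)
  p results.uniq # [true]
end
```

Building the hash in the constructor is simpler still and keeps the lock off the request path, which is presumably why boot-time population was the preferred fix.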

@bf4
Contributor

bf4 commented Aug 18, 2014

Looks good to me. The failing test is unrelated:

MultibyteCharsUTF8BehaviourTest#test_upcase_should_upcase_ascii_characters [test/multibyte_chars_test.rb:434]:
Expected: "ABC"
  Actual: A

@claudiug

👍 👏

@schneems
Member Author

@jeremy #16544 should help with the security concerns: we can replace the Dir.glob with a Dir.exist? check to achieve the same logic. Now we don't have to worry about any glob-escaping logic. We're also replacing one file system call with another, so speed should be comparable.
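To illustrate the escaping problem this removes, here is a small Ruby sketch (the `glob_vs_exist` helper and the `v[1]` directory name are made up for the example). Characters like `[` and `]` are glob syntax, so an unescaped pattern looks for a different name; the `gsub` below is the same one used by `escape_glob_chars`, while `Dir.exist?` simply takes the path literally.

```ruby
require "tmpdir"
require "fileutils"

# Returns [unescaped glob found it?, escaped glob found it?, Dir.exist?].
def glob_vs_exist
  Dir.mktmpdir do |root|
    name = "v[1]"
    FileUtils.mkdir_p(File.join(root, name))

    # Unescaped, "[1]" is a character class, so this pattern only matches "v1".
    unescaped = Dir.glob(name, base: root).any?
    # Same escaping as escape_glob_chars: prefix each metacharacter with "\".
    escaped_name = name.gsub(/[*?{}\[\]]/, "\\\\\\&")
    escaped = Dir.glob(escaped_name, base: root).any?
    # Dir.exist? does no pattern matching at all.
    literal = Dir.exist?(File.join(root, name))

    [unescaped, escaped, literal]
  end
end

p glob_vs_exist # [false, true, true]
```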

@schneems schneems force-pushed the schneems/improved-dispatch-static branch from 1f9b3c3 to 1770f0c Compare August 20, 2014 15:14
@schneems schneems force-pushed the schneems/improved-dispatch-static branch from 1770f0c to c5f8f3f Compare August 20, 2014 15:20
@schneems schneems force-pushed the schneems/improved-dispatch-static branch from c5f8f3f to 0375e7b Compare August 27, 2014 17:14
schneems added a commit to schneems/rails that referenced this pull request Aug 27, 2014
Dir.glob can be a security concern. The original use was to provide logic for fallback files. For example, a request to `/` should render the file from `/public/index.html`. We can replace the Dir.glob with the specific logic it represents. The glob {,index,index.html} will look for the current path, then for an index file in the directory of the path, and then for index.html in that directory. This PR replaces the glob logic by manually checking each potential match. In the best case this results in one less file API request; in the worst case, one more.

Related to rails#16464

Update: added a test for when a file of a given name (`public/bar.html`) and a directory (`public/bar`) both exist in the same root directory. Changed logic to accommodate this scenario.
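A minimal Ruby sketch of the replacement described in this commit: try each candidate the glob used to cover, in order, and accept only regular files. The `find_static_file` name and the `".html"` default are assumptions for illustration, not necessarily the merged code. Because `File.file?` is false for directories, `/bar` resolves to `public/bar.html` even when `public/bar/` also exists.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical lookup: the path itself, then path + ext, then path/index + ext.
# Returns the first candidate that is a regular file, or nil.
def find_static_file(root, path, ext = ".html")
  candidates = [path, "#{path}#{ext}", "#{path}/index#{ext}"]
  candidates.find { |candidate| File.file?(File.join(root, candidate)) }
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "bar"))      # public/bar/
  File.write(File.join(root, "bar.html"), "bar") # public/bar.html
  FileUtils.mkdir_p(File.join(root, "docs"))
  File.write(File.join(root, "docs", "index.html"), "docs")

  p find_static_file(root, "/bar")     # "/bar.html"
  p find_static_file(root, "/docs")    # "/docs/index.html"
  p find_static_file(root, "/missing") # nil
end
```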
@schneems
Member Author

All green, as is #16544. I also added a release notes section.

trungpham pushed a commit to trungpham/rails that referenced this pull request Sep 18, 2014
@schneems schneems force-pushed the schneems/improved-dispatch-static branch 3 times, most recently from 653855e to fa72aad Compare September 28, 2014 19:35
@schneems
Member Author

Any other blockers? My birthday is coming up on Tuesday...

I just pushed a rebase.

@eval
Contributor

eval commented Sep 30, 2014

Bumping this one, cause it's @schneems 🎂 ! 🎈

@rails rails locked and limited conversation to collaborators Sep 30, 2014
@schneems schneems force-pushed the schneems/improved-dispatch-static branch 2 times, most recently from c6fcefa to 4aa1f9d Compare October 31, 2014 22:02
@schneems
Member Author

I rewrote this so that config.cache_classes enables or disables this behavior. To do this, I'm deprecating passing a cache control string and accepting an options hash instead.

@schneems schneems force-pushed the schneems/improved-dispatch-static branch from 4aa1f9d to 2540b1c Compare November 3, 2014 18:25
@schneems
Member Author

schneems commented Nov 3, 2014

Build is green

@schneems schneems force-pushed the schneems/improved-dispatch-static branch from 2540b1c to 91b8c85 Compare November 4, 2014 17:30
@@ -13,16 +13,35 @@ module ActionDispatch
   # located at `public/assets/application.js` if the file exists. If the file
   # does not exist a 404 "File not Found" response will be returned.
   class FileHandler
-    def initialize(root, cache_control)
+    def initialize(root, options = {})
Member

  • options is always passed (never nil) so we don't need the default value
  • consider a more verbose argument name that reflects the deprecation runaround that follows this, e.g. options_or_deprecated_cache_control
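A minimal Ruby sketch of the deprecation runaround being discussed, using the suggested argument name. `FileHandlerSketch` is hypothetical, `Kernel#warn` stands in for Rails' deprecation machinery, and the `headers:` option shape is an assumption for illustration.

```ruby
# Hypothetical stand-in for FileHandler's constructor.
class FileHandlerSketch
  attr_reader :root, :options

  def initialize(root, options_or_deprecated_cache_control)
    options = options_or_deprecated_cache_control
    if options.is_a?(String)
      # Old call style: a bare Cache-Control string. Warn, then convert it.
      warn "Passing a cache control string is deprecated; pass an options hash instead."
      options = { headers: { "Cache-Control" => options } }
    end
    @root = root
    @options = options
  end
end

old_style = FileHandlerSketch.new("public", "public, max-age=3600") # prints a warning
new_style = FileHandlerSketch.new("public", { headers: { "Cache-Control" => "public, max-age=3600" } })
p old_style.options == new_style.options # true
```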

@rails rails unlocked this conversation Nov 5, 2014
schneems added a commit to schneems/rails that referenced this pull request Nov 10, 2014
Based on the conversation in rails#16464, we can't enable this by default for ALL applications. It was agreed that it's reasonable if the host container can set this value (e.g. Docker/Heroku).

It's important that we can explicitly enable as well as disable this value, as some services allow setting environment variables to different values (like `"true"` or `"false"`) but not unsetting them (`nil`).
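
The point about explicit values can be sketched in Ruby as a small parser that distinguishes "on", "off" and "not set". The `env_flag` helper, the `SERVE_STATIC_ASSETS` variable name, and the `false` fallback default are all hypothetical, used only for this example.

```ruby
# Hypothetical parser: "true" and "false" are both explicit choices, so a host
# that can set but not unset a variable can still turn the feature off.
# Anything else, including an unset variable, returns nil so a default applies.
def env_flag(value)
  case value.to_s.strip.downcase
  when "true" then true
  when "false" then false
  end
end

serve_static = env_flag(ENV["SERVE_STATIC_ASSETS"])
serve_static = false if serve_static.nil? # assumed default when unset
p serve_static
```
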
Rails serves assets in development by default; in production this is turned off. The default Rails guides require you to run behind nginx. In the last year, 800,000+ developers have made it clear they don't want to run behind nginx: https://rubygems.org/gems/rails_serve_static_assets. This PR enables serving assets in production by default and includes an optimization that makes this option dramatically faster for non-asset requests.

Currently the `ActionDispatch::Static` middleware hits disk on every `GET` and `HEAD` request to check if the given request maps to a file on disk. This means that all requests that do not involve serving an asset are slowed down. If you are using this middleware and serving assets via Nginx, no asset requests will ever reach the `ActionDispatch::Static` middleware so it makes sense as a performance optimization to turn off this middleware. This PR increases the speed of the middleware to the point where the difference on non-asset requests is trivial, so we can enable the middleware by default for all users with minimal (if any) impact.

The optimization works by reading the contents of `/public` once (using one call to `Dir.glob`) and storing the result in a hash. On each request we then check whether the first segment of the request path is in that hash. For instance, a request to `/users/new` would not be in the hash because there is no root-level `users/` directory, whereas a request to `assets/application.css` would trigger the middleware because `assets/` is in the `public/` directory. This prevents us from checking the disk except on requests that are likely to be for an asset.

When benchmarked against the current `ActionDispatch::Static` we see a roughly 68% improvement for non-asset requests:

![](https://www.dropbox.com/s/dcsrhrfh7gb44dc/Screenshot%202014-08-08%2014.05.21.png?dl=1)

While 68% is an impressive speed increase, this is only testing the middleware in isolation. When benchmarked with the entire Rails stack we still see an improvement, but it is only about 2.6%:

![](https://www.dropbox.com/s/bhb0o3tbupw0hus/Screenshot%202014-08-08%2015.06.11.png?dl=1)

Note: The scale on this one does not start at 0.

There is still a speed cost to using this improved middleware versus no middleware at all. In my benchmarks it resulted in a ~0.7% slowdown, though with a very high standard deviation. You can see in the graph that the improved middleware falls between the app with the original middleware and the app without any middleware. This means the slowdown is fairly trivial in the scope of your entire application stack. You can still disable this middleware as a performance optimization if you are deploying behind NGINX, but there is no reason why using a CDN with your Rails server as the origin server shouldn't work out of the box.

For more information about the purpose of ActionDispatch::Static and how it is currently used (or not used) to serve assets please see this reference document: https://gist.github.com/schneems/5dc42963fb3221a7a089.
@schneems schneems force-pushed the schneems/improved-dispatch-static branch from 91b8c85 to d6d768f Compare November 13, 2014 04:25
@schneems
Member Author

Added tests and comments. Simplified the deprecation and improved naming slightly. The lambda is faster, but they're both pretty fast:

Calculating -------------------------------------
              lambda    126633 i/100ms
         conditional     91833 i/100ms
-------------------------------------------------
              lambda  5860485.7 (±16.5%) i/s -   28365792 in   4.999036s
         conditional  4522421.6 (±11.8%) i/s -   22223586 in   5.002480s

Moved the caching on/off switch to a method instead of a lambda as it is more readable and fast enough.
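One way to picture the on/off switch, as a minimal Ruby sketch: `StaticMatcher` and its `cache_enabled:` flag are hypothetical (in the PR the switch follows `config.cache_classes`). With caching on, the boot-time snapshot of top-level names rules out requests like `/users/new` without touching the disk; with caching off, every request goes to the disk check, so files added in development are picked up immediately.

```ruby
require "set"
require "tmpdir"
require "fileutils"

class StaticMatcher
  def initialize(root, cache_enabled:)
    @root = root
    @cache_enabled = cache_enabled
    @top_level = cache_enabled ? Set.new(Dir.children(root)) : Set.new
  end

  def match?(path)
    segment = path.split("/").reject(&:empty?).first
    return false if caching? && segment && !@top_level.include?(segment)
    File.file?(File.join(@root, path))
  end

  private

  # The switch lives in a plain method rather than a lambda chosen at boot.
  def caching?
    @cache_enabled
  end
end

Dir.mktmpdir do |root|
  cached = StaticMatcher.new(root, cache_enabled: true)
  uncached = StaticMatcher.new(root, cache_enabled: false)
  FileUtils.mkdir_p(File.join(root, "late"))
  File.write(File.join(root, "late", "app.css"), "")

  p cached.match?("/late/app.css")   # false: "late" was not in the boot snapshot
  p uncached.match?("/late/app.css") # true: checked on disk every time
end
```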

Ready for another round of reviews/discussion.

@jeremy
Member

jeremy commented Nov 24, 2014

I hear the pressures behind this, but the tradeoffs still aren't balancing well.

It is illustrating that our current setup (with static file service as a config option) is too unclear and black-boxy. With an nginx/apache/httpd on top, it's super-clear how we respond to a request, e.g. with the nginx try_files directive or another httpd's 404 handler.

In a Rails app, I'd expect to open config/routes.rb and see static file service mounted at specific locations, and the public web root mounted at / with 404 cascade to other app routes.
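A rough, dependency-free Ruby sketch of the "public root at / with 404 cascade" idea. It mimics the Rack `call(env)` to `[status, headers, body]` convention but does not use Rack itself; `PublicRoot`, `Cascade` and the lambda app are hypothetical. A real setup would more likely use `Rack::Cascade` and `Rack::Files`, or a `mount` in `config/routes.rb` as described above.

```ruby
require "tmpdir"

# Serves files from a root directory, answering 404 when nothing matches.
class PublicRoot
  def initialize(root)
    @root = File.expand_path(root)
  end

  def call(env)
    path = File.expand_path(File.join(@root, env["PATH_INFO"]))
    # Refuse anything that escapes the root (e.g. "/../etc/passwd").
    inside = path.start_with?(@root + File::SEPARATOR)
    if inside && File.file?(path)
      [200, { "content-type" => "text/plain" }, [File.read(path)]]
    else
      [404, {}, []]
    end
  end
end

# Tries each app in order and returns the first non-404 response.
class Cascade
  def initialize(apps)
    @apps = apps
  end

  def call(env)
    response = [404, {}, []]
    @apps.each do |app|
      response = app.call(env)
      break unless response[0] == 404
    end
    response
  end
end

Dir.mktmpdir do |root|
  File.write(File.join(root, "robots.txt"), "User-agent: *")
  rails_app = ->(env) { [200, {}, ["routed #{env["PATH_INFO"]}"]] }
  app = Cascade.new([PublicRoot.new(root), rails_app])
  p app.call({ "PATH_INFO" => "/robots.txt" })[2] # ["User-agent: *"]
  p app.call({ "PATH_INFO" => "/users/new" })[2]  # ["routed /users/new"]
end
```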

@jeremy jeremy removed this from the 4.2.0 milestone Nov 24, 2014
@schneems
Member Author

schneems commented Jun 8, 2017

I'm closing this PR. What I want to do eventually is add an option, config.assets.pipeline_only or something similar, that when enabled would only serve an asset if it is in the asset pipeline rather than an arbitrary file from disk. Then you could do things like build a hash from the manifest files.

The problem is that even if actually serving an asset happens infrequently, the code path that checks whether a request matches a file on disk runs on EVERY request. By turning on Rails asset serving, ANY request, not just those to assets/*whatever-digest*, can result in an asset hit; e.g. localhost:3000/ can trigger public/index.html, and that check is slow.

If we could skip hitting the disk for those checks, we could get a much faster middleware. As it stands, my patch is too complex and doesn't actually save real time because it has a fallback. What we need is for the user to be more explicit about how they want the app to behave.

For example, config.assets.pipeline_only for a fast middleware that is not forgiving, or config.assets.i_suck_and_i_have_non_digest_assets for a slower but easier-to-use middleware. There's also webpack to consider. I don't know how it works or whether there's a canonical asset manifest we can use. Maybe we could have multiple middlewares that each perform a different check, as long as each check is extremely fast.
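A minimal Ruby sketch of the "build a hash from manifest files" idea: read a precompiled asset manifest once at boot, then answer "is this an asset?" with an in-memory set lookup. The manifest layout (a `"files"` object keyed by digested filename, modeled on Sprockets' JSON manifest), the `/assets/` prefix and the `3f2a9c` digest are assumptions; the helper names are hypothetical.

```ruby
require "json"
require "set"

# Hypothetical: build the set of servable URL paths from a manifest's "files" keys.
def manifest_paths(manifest_json, prefix = "/assets/")
  files = JSON.parse(manifest_json).fetch("files", {})
  Set.new(files.keys.map { |digested| prefix + digested }).freeze
end

# Request-time check: a set lookup, no disk access.
def pipeline_asset?(paths, path_info)
  paths.include?(path_info)
end

manifest = <<~JSON
  {
    "files": { "application-3f2a9c.css": { "logical_path": "application.css" } },
    "assets": { "application.css": "application-3f2a9c.css" }
  }
JSON

paths = manifest_paths(manifest)
p pipeline_asset?(paths, "/assets/application-3f2a9c.css") # true
p pipeline_asset?(paths, "/assets/application.css")        # false: undigested name
p pipeline_asset?(paths, "/")                              # false
```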

As with all things, there are more problems with error pages (404.html, 500.html, etc.).

Currently we can't put these into the pipeline because then they end up with digests, e.g. 404-c91mv8sdfnsdf0a.html. And when your app is failing you want to do the simplest thing possible; you don't want to look up hashes etc.

So that's not an option. Serving without a digest opens up caching problems, so I'm stuck. That's the problem we need to solve: how to serve error pages when "asset pipeline only" is enabled.

@schneems schneems closed this Jun 8, 2017
@matthewd
Member

matthewd commented Jun 8, 2017

The "every request" part is about page caching, I think.

If you only care to serve assets, it should be fairly straightforward to only mount AD::Static on /assets, IIRC.
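A minimal, dependency-free Ruby sketch of that suggestion: only requests under `/assets/` ever reach the file check, so every other request skips the disk entirely. It follows the Rack `call(env)` convention without Rack itself, and `AssetsOnly` plus the lambdas are hypothetical stand-ins for `ActionDispatch::Static` and the app.

```ruby
# Hypothetical middleware: consult the static handler only for one prefix.
class AssetsOnly
  def initialize(app, static, prefix: "/assets/")
    @app = app
    @static = static
    @prefix = prefix
  end

  def call(env)
    if env["PATH_INFO"].start_with?(@prefix)
      status, headers, body = @static.call(env)
      return [status, headers, body] unless status == 404
    end
    @app.call(env)
  end
end

disk_checks = []
static = lambda do |env|
  disk_checks << env["PATH_INFO"] # record every path that would hit the disk
  [404, {}, []]
end
app = ->(env) { [200, {}, ["app: #{env["PATH_INFO"]}"]] }

middleware = AssetsOnly.new(app, static)
middleware.call({ "PATH_INFO" => "/users/new" })
middleware.call({ "PATH_INFO" => "/assets/application-3f2a9c.css" })
p disk_checks # ["/assets/application-3f2a9c.css"]
```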
