-
Notifications
You must be signed in to change notification settings - Fork 21.9k
Speed up static file service by caching existence checks #16464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This seems well worth it. Rails absolutely should not require the use of nginx in production. Plenty of sites are small enough that they don't need anything like that. 👍 on the concept. |
Nice! 👏 |
def base_hash | ||
@base_hash ||= begin | ||
base_hash = {} | ||
Dir.glob("#{@root}/*").each do |file| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to Dir.entries(@root)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use Dir.entries
👍
This cache could be inconsistent with the filesystem (e.g. someone adds a file or removes a file after the cache is populated). Do we care? |
Going default broadens our security exposure. Seeing Dir.globs makes me shiver. |
@tenderlove this cache only cares about the first level, so if you are adding files to disk at runtime, as long as the root folder is there at boot up, it will be in the cache. For example if you're letting users upload photos to
The ability to serve runtime generated files still exists, so even if i'm being heavy handed with my assumptions and stated "best practices" I think for the most part we're still okay. @jeremy the main functionality of the glob is to deal with default file extensions i.e. mapping
I could remove the glob and wrap that functionality up in another PR if that would help. |
@@ -43,6 +52,20 @@ def unescape_path(path) | |||
def escape_glob_chars(path) | |||
path.gsub(/[*?{}\[\]]/, "\\\\\\&") | |||
end | |||
|
|||
def parse_root | |||
base_hash = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use ThreadSafe::Cache? https://github.com/ruby-concurrency/thread_safe#usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bf4 why? @base_hash
never changes after boot time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I suppose you're right. My bad for not looking carefully at its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently lazilly initialized on the first request, I did this to make some previously written tests that were doing some weird stuff work. Could either refactor those tests, or this code, or make it thread safe. Thanks for the review, good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the test and updated so we're now populating this hash at boot.
Looks good the me. The failing test is unrelated
|
👍 👏 |
1f9b3c3
to
1770f0c
Compare
1770f0c
to
c5f8f3f
Compare
c5f8f3f
to
0375e7b
Compare
Dir.glob can be a security concern. The original use was to provide logic of fallback files. Example a request to `/` should render the file from `/public/index.html`. We can replace the dir glob with the specific logic it represents. The glob {,index,index.html} will look for the current path, then in the directory of the path with index file and then in the directory of the path with index.html. This PR replaces the glob logic by manually checking each potential match. Best case scenario this results in one less file API request, worst case, this has one more file API request. Related to rails#16464 Update: added a test for when a file of a given name (`public/bar.html` and a directory `public/bar` both exist in the same root directory. Changed logic to accommodate this scenario.
All green as well as #16544 also added a release notes section |
Dir.glob can be a security concern. The original use was to provide logic of fallback files. Example a request to `/` should render the file from `/public/index.html`. We can replace the dir glob with the specific logic it represents. The glob {,index,index.html} will look for the current path, then in the directory of the path with index file and then in the directory of the path with index.html. This PR replaces the glob logic by manually checking each potential match. Best case scenario this results in one less file API request, worst case, this has one more file API request. Related to rails#16464 Update: added a test for when a file of a given name (`public/bar.html` and a directory `public/bar` both exist in the same root directory. Changed logic to accommodate this scenario.
653855e
to
fa72aad
Compare
Bumping this one, cause it's @schneems 🎂 ! 🎈 |
c6fcefa
to
4aa1f9d
Compare
I re-wrote so that the |
4aa1f9d
to
2540b1c
Compare
Build is green |
2540b1c
to
91b8c85
Compare
@@ -13,16 +13,35 @@ module ActionDispatch | |||
# located at `public/assets/application.js` if the file exists. If the file | |||
# does not exist a 404 "File not Found" response will be returned. | |||
class FileHandler | |||
def initialize(root, cache_control) | |||
def initialize(root, options = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
options
is always passed (nevernil
) so we don't need the default value- consider a more verbose argument name that reflects the deprecation runaround that follows this, e.g.
options_or_deprecated_cache_control
Based on conversation in rails#16464 we can't enable this by default for ALL applications. It was agreed that it's reasonable if the host container can set this value (i.e. docker/heroku). It's important that we can explicitly enable as well as disable this value as services may enable setting environment variables to different values (like `"true"` or `"false"`) but not allow unsetting the values (`nil`)
Rails serves assets in development by default, in production this is turned off. The default rails guides require you be running behind nginx. In the last year 800,000+ developers don't want to run behind nginx: https://rubygems.org/gems/rails_serve_static_assets. This PR enables serving assets in Production by default and includes an optimization that makes this option dramatically faster for non-asset requests. Currently the `ActionDispatch::Static` middleware hits disk on every `GET` and `HEAD` request to check if the given request maps to a file on disk. This means that all requests that do not involve serving an asset are slowed down. If you are using this middleware and serving assets via Nginx, no asset requests will ever reach the `ActionDispatch::Static` middleware so it makes sense as a performance optimization to turn off this middleware. This PR increases the speed of the middleware to the point where the difference on non-asset requests is trivial, so we can enable the middleware by default for all users with minimal (if any) impact. The optimization works by getting the contents of `/public` (using one call to `Dir.glob`). We then store this information in a hash. Later on each request we check to see if the path of the request is in our hash for instance a request to `/users/new` would not be in the hash because there is no root level `users/` directory, however a request to `assets/application.css` would trigger the middleware because `assets/` is in the `public/` directory. This prevents us form checking the disk except on a request which is more than likely going to contain an asset. When benchmarked against the current `ActionDispatch::Static` we see roughly a ~68% improvement for non-asset requests:  While 68% is an impressive speed increase, this is only testing the middleware in isolation. When benchmarked with the entire Rails stack we still see an increase but it is roughly ~2.6% speed increase:  Note: The scale on this one does not start at 0. There is still a speed cost associated with using this improved middleware versus no middleware. In my benchmarks it resulted in a ~0.7% speed decrease though with a very high standard deviation. You can see in the graph that the speed of the improved middleware falls in-between the speed of the app without middleware. This means that the decrease is fairly trivial in the scope of your entire application stack. You can still disable this middleware for a performance optimization if you are deploying behind NGINX, but there is no reason why using a CDN with your Rails server as the origin server shouldn't work out of the box. For more information about the purpose of ActionDispatch::Static and how it is currently used (or not used) to serve assets please see this reference document: https://gist.github.com/schneems/5dc42963fb3221a7a089.
91b8c85
to
d6d768f
Compare
Added tests and comments. Simplified the deprecation, slightly better naming. The lambda is faster, but they're both pretty fast
Moved the caching on/off switch to a method instead of a lambda as it is more readable and fast enough. Ready for another round of reviews/discussion. |
I hear the pressures behind this, but the tradeoffs still aren't balancing well. It is illustrating that our current setup (with static file service as a config option) is too unclear and black-boxy. With an nginx/apache/httpd on top, it's super-clear how we respond to a request, e.g. with the nginx In a Rails app, I'd expect to open |
I'm closing this PR. What I want to do eventually, is add an option The problem is that even if actually serving an asset happens infrequently, the codepath for checking if a request matches a thing on disk happens on EVERY request i.e. by turning on rails asset service ANY request, not just those to If we could skip hitting the disk for those checks, we could get a much faster middleware as it stands my patch is too complex and doesn't actually save real time because it has a fallback. what we need is for the user to be more explicit with how they want the app to behave. i.e. As with all things, there's more problems with error pages 404.html and 500.html etc. Currently we can't put these into the pipeline because then they end up with digests So that's not an option. Serving without a digest opens up caching problems so, i'm stuck. That's the problem we need to solve, the issue of how to serve error pages when "asset pipeline only" is enabled. |
The "every request" part is about page caching, I think. If you only care to serve assets, it should be fairly straightforward to only mount AD::Static on |
Currently the
ActionDispatch::Static
middleware hits disk on everyGET
andHEAD
request to check if the given request maps to a file on disk. This means that all requests that do not involve serving an asset are slowed down. If you are using this middleware and serving assets via Nginx, no asset requests will ever reach theActionDispatch::Static
middleware so it makes sense as a performance optimization to turn off this middleware. This PR increases the speed of the middleware to the point where the difference on non-asset requests is trivial,.The optimization works by getting the contents of
/public
(using one call toDir.glob
). We then store this information in a hash. Later on each request we check to see if the path of the request is in our hash for instance a request to/users/new
would not be in the hash because there is no root levelusers/
directory, however a request toassets/application.css
would trigger the middleware becauseassets/
is in thepublic/
directory. This prevents us form checking the disk except on a request which is more than likely going to contain an asset.When benchmarked against the current
ActionDispatch::Static
we see roughly a ~68% improvement for non-asset requests:While 68% is an impressive speed increase, this is only testing the middleware in isolation. When benchmarked with the entire Rails stack we still see an increase but it is roughly ~2.6% speed increase:
Note: The scale on this one does not start at 0.
There is still a speed cost associated with using this improved middleware versus no middleware. In my benchmarks it resulted in a ~0.7% speed decrease though with a very high standard deviation. You can see in the graph that the speed of the improved middleware falls in-between the speed of the app without middleware. This means that the decrease is fairly trivial in the scope of your entire application stack. You can still disable this middleware for a performance optimization if you are deploying behind NGINX, but there is no reason why using a CDN with your Rails server as the origin server shouldn't work out of the box.
For more information about the purpose of ActionDispatch::Static and how it is currently used (or not used) to serve assets please see this reference document: https://gist.github.com/schneems/5dc42963fb3221a7a089.