Active storage add proxying #34477
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @georgeclaghorn (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review. Please see the contribution instructions for more information.
@zinosama No worries about commenting on a closed PR; it's a great place to document issues and solutions related to it. To prevent session cookies from being set on asset responses, add the following to `config/environment.rb`:

    Rails.application.initialize!

    # The code should be added after your application is initialized
    ActiveStorage::BaseController.class_eval do
      def disable_session
        # Skip the session for this request so no session cookie is written
        request.session_options[:skip] = true
      end
    end

    ActiveStorage::Blobs::ProxyController.class_eval do
      before_action :disable_session
    end

    ActiveStorage::Representations::ProxyController.class_eval do
      before_action :disable_session
    end
Thank you @fleck for the quick response. I can see that working. I'm also wondering, though, whether that should be the default behavior. The purpose of these proxy controllers is to make image caching easy, yet (please correct me if I'm wrong) the cookies header makes the asset uncache-able, which defeats this purpose.
@zinosama Making that the default may make sense; I can't think of a good reason to include the session for assets. But it's too late for that change to make it into Rails 6.1. I missed this during development because the application I was testing on doesn't use the session except for a couple of routes in the admin portion of the site. As for "the cookies header makes the asset uncache-able": that's Cloudflare-specific behavior, and many CDNs will cache assets regardless of cookies. Cloudflare can also be configured to cache with a cookie present by using a page rule with the "Edge Cache TTL" setting.
Thank you @fleck for this feature. This is my `application.rb` config: `config.active_storage.delivery_method = :proxy` and `config.active_storage.proxy_urls_host = "cdn.mydomain.com"`. I've also tried `asset_name.deliver(:proxy)` in views; that just removes the host but still doesn't call the CDN. Otherwise, the current host is being called. I'd appreciate any insight.
@mcanto Unfortunately, the backport is in a half-finished state. At some point during this PR I upgraded my project to the latest version of Rails and stopped updating the backport. Even if you can get the backport to work, its API is fairly different from what was merged in this PR. If possible, I'd recommend upgrading to the latest Rails (it's pretty stable) or trying to re-create the backport based on this API.
Thank you for your answer, @fleck. I'll look into upgrading to the latest version. It's definitely a pretty nice feature; good job!
I was kind of expecting to see this in 6.1.0.RC1, but instead I only find things like …
@phoet The API has changed a decent amount from the initial proposal. Here's the up-to-date API: https://github.com/rails/rails/blob/master/activestorage/README.md#proxying
For what it's worth, that documentation is pretty sparse. It seems to indicate that there's some way to get the URL for the proxied version of the asset instead of the direct Active Storage version, but it doesn't go into detail about how to tell Rails to map the raw asset path to the proxied one. Presumably, if S3 is the primary Active Storage backend, we need some way to tell Rails whether the proxy is CloudFront, Cloudflare, Cloudinary, etc.
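For readers with the same question: one way to point proxy URLs at a CDN is a custom `direct` route (the approach mentioned further down in this thread) that builds the proxy URL with the CDN's host. This is a sketch, not code from this PR. It assumes the named routes `rails_service_blob_proxy` and `rails_blob_representation_proxy` from Active Storage's own routes file, a hypothetical helper name `cdn_image`, and a hypothetical `CDN_HOST` environment variable, so check the route names against your Rails version.

```ruby
# config/routes.rb (inside Rails.application.routes.draw)
# Hypothetical helper: generates cdn_image_url / cdn_image_path.
direct :cdn_image do |model, options|
  # Assumption: the CDN hostname is provided via ENV["CDN_HOST"].
  cdn_options = options.merge(host: ENV.fetch("CDN_HOST"))

  if model.respond_to?(:signed_id)
    # A blob (or an attachment delegating to its blob): use the blob proxy route.
    route_for(:rails_service_blob_proxy, model.signed_id, model.filename, cdn_options)
  else
    # A variant or preview: use the representation proxy route.
    route_for(
      :rails_blob_representation_proxy,
      model.blob.signed_id,
      model.variation.key,
      model.blob.filename,
      cdn_options
    )
  end
end
```

In a view this would be called as `cdn_image_url(user.avatar.variant(resize_to_limit: [100, 100]))`. Rails only generates the CDN URL; the CDN itself still has to be configured with the app as its origin.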
Heads up for other people on Cloudflare who cannot get this to work. Apparently you can only have it ignore cookies on the $200/mo Business plan or higher. The mentioned workaround that disables the session for the relevant controllers might be a better fix if you're on one of the cheaper plans. |
```erb
<%= image_tag user.avatar.variant(resize_to_limit: [100, 100]) %>
```
> ## File serving strategies
I think it would be good if https://edgeguides.rubyonrails.org/active_storage_overview.html#linking-to-files mentioned this too.
> * `config.active_storage.draw_routes` can be used to toggle Active Storage route generation. The default is `true`.
> * `config.active_storage.resolve_model_to_route` can be used to globally change how Active Storage files are delivered.
@fleck FYI this is rendering a bit weird:
https://guides.rubyonrails.org/configuring.html#configuring-active-storage
@fleck Hi, quick question. I have been using the `direct` block in my routes to set the CDN hostname for the proxy routes, and that works well. I would like to restrict these routes so that they only work if the incoming request actually uses the CDN hostname, returning a 404 or no content if the request uses the app's main domain. Basically, I want all asset requests to go through the CDN. The reason: someone has been flooding my app with requests for the assets using the app's main domain directly, bypassing the CDN cache. I would like to prevent this and make the relevant routes available only when the CDN hostname is used, with some kind of constraint. Any suggestions? Thanks in advance.
@vitobotta Seems like a job for something like rack-attack. The redirect and proxy routes just point to a controller inside Active Storage (somewhere in here: https://github.com/rails/rails/tree/main/activestorage). Configure rack-attack to check additional headers (like those set by your CDN) and, if they don't exist, shut down the request.
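If you'd rather not pull in rack-attack, the same idea can be sketched as a small hand-rolled Rack middleware. This is a minimal, hypothetical sketch, not part of Active Storage: the class name `CdnOnlyProxyRoutes` is made up, it assumes the default `/rails/active_storage` route prefix, and it assumes the CDN forwards requests with the CDN hostname in the `Host` header. If your CDN rewrites `Host` to the origin, check a CDN-specific header instead, as suggested above.

```ruby
# Hypothetical middleware (not part of Rails): return 404 for Active Storage
# proxy routes unless the request arrived with the CDN hostname.
class CdnOnlyProxyRoutes
  # Assumes the default Active Storage routes prefix "/rails/active_storage".
  PROXY_PREFIXES = %w[
    /rails/active_storage/blobs/proxy
    /rails/active_storage/representations/proxy
  ].freeze

  def initialize(app, cdn_host)
    @app = app
    @cdn_host = cdn_host
  end

  def call(env)
    if proxy_path?(env["PATH_INFO"].to_s) && request_host(env) != @cdn_host
      # Direct hit on the app's own domain: refuse before any controller runs.
      return [404, { "content-type" => "text/plain" }, ["Not Found"]]
    end

    @app.call(env)
  end

  private

  # Match the prefix itself or anything below it, not lookalikes like ".../proxyx".
  def proxy_path?(path)
    PROXY_PREFIXES.any? { |prefix| path == prefix || path.start_with?("#{prefix}/") }
  end

  # HTTP_HOST may carry a port ("example.com:3000"); compare the hostname only.
  def request_host(env)
    env["HTTP_HOST"].to_s.split(":").first
  end
end
```

It could be registered with something like `config.middleware.insert_before 0, CdnOnlyProxyRoutes, "cdn.mydomain.com"`. The redirect routes are left alone here because they only issue a redirect rather than streaming the file through the app.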
Has anyone tried the proxy method with pages that have hundreds of images? Currently we use Heroku + Cloudflare for caching... I wonder how many Active Storage proxy requests per second we can handle with each "2x dyno".
@collimarco I have. Since each image generates an HTTP request, the number of requests a single 2x dyno can handle depends on the number of Puma workers, the size of the images, and the latency/speed of your storage backend. If you are using the recommended defaults for 2x dynos, your Puma concurrency is set to …

The first two times the page with hundreds of images is opened, all of those requests will hit your servers. If you don't have enough dynos, all your other requests (including user navigation) will have to wait until every image has been proxied from S3 to Cloudflare through your dynos. So say you have 200 images on that page and 2 dynos (12-16 images per second): other requests would wait roughly 12-16 seconds before getting a response.

The third time, the images will be cached and, theoretically, your dynos will not have to handle them anymore. Theoretically, because Cloudflare has so many PoPs that if the page with all the images is not popular enough, your cache hit rate will be low and your dynos will keep being forced to stream those images. I recommend at least using lazy loading on that page, so that if the user does not scroll, you don't waste capacity streaming those images.
@brenogazzola Thanks for the reply! That was my fear... 6-8 req/sec/dyno is not enough for pages that may have 300+ images. It would be too expensive (300 / 8 ≈ 37 dynos just for images!). Maybe it's somewhat better, however, because you forgot the threads. I think the concurrency is 2 workers * 5 threads (most of the time with S3 is spent waiting). It's strange, because a normal browser on a normal PC can download 300+ images in a few seconds, and from my research it seems browsers only use 6 concurrent connections per domain. It is strange that a server cannot perform similarly 🤔
@collimarco Threads could help, yes. I tend to ignore them when calculating how much load Puma can handle, since I normally have no idea which requests spend time on I/O and which don't. But in this case I guess you are right. As for the difference between PCs and servers, that's because the browser is downloading from Cloudflare, which does its best to keep the latency between request and download as short as possible. Your dynos, on the other hand, are dealing with S3, which does not care about latency at all. There might also be something in the AWS gem code, or the APIs it uses, that introduces extra latency. It's not really a level playing field.
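The capacity arithmetic in this exchange can be made explicit with Little's law (requests in flight = throughput × average time per request). The helper names and the 0.5 s average time per proxied image are hypothetical; the 200 images, 2 dynos, 6-8 req/s per dyno, and 2 workers × 5 threads figures come from the comments above.

```ruby
# Little's law rearranged: sustained requests/sec = concurrent slots / avg seconds per request.
def throughput_per_dyno(workers:, threads:, avg_seconds:)
  (workers * threads) / avg_seconds.to_f
end

# Seconds needed to drain a burst of uncached image requests through all dynos.
def drain_seconds(images:, dynos:, rps_per_dyno:)
  images / (dynos * rps_per_dyno).to_f
end

# 200 images, 2 dynos at 6-8 req/s each (the estimate quoted above).
slow = drain_seconds(images: 200, dynos: 2, rps_per_dyno: 6)
fast = drain_seconds(images: 200, dynos: 2, rps_per_dyno: 8)
puts format("backlog drains in %.1f-%.1f s", fast, slow) # backlog drains in 12.5-16.7 s

# Counting threads: 2 workers x 5 threads, hypothetical 0.5 s per proxied image.
puts throughput_per_dyno(workers: 2, threads: 5, avg_seconds: 0.5) # 20.0
```

The first result lines up with the "12-16 seconds" estimate. The second shows why threads matter when most of each request is spent waiting on S3: the per-dyno rate is driven by the average time per request, which is exactly the number that is hard to know in advance.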
Even after "Disable session in ActiveStorage blobs and representations proxy controllers" (#48869) was implemented, or any monkey patch applied, that only covers session cookies. Other cookies could still be set; for example, if you use Ahoy, you get another cookie called … I found a more future-proof approach here:
@silva96 Is this still necessary with the latest Rails?

    config.middleware.insert_after(
      ActionDispatch::Cookies,
      Rack::StripCookies,
      paths: %w[/rails/active_storage/representations/proxy /rails/active_storage/blobs/proxy],
    )
@navidemad I'm not familiar with the newest Rails changes. If it's only the #48869 changes, then yes: that PR only disables session cookies, not other cookies such as the Ahoy visits cookie. If you are not using Ahoy, you may be good to go; otherwise you will need the middleware.
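For anyone who would rather not depend on a gem for this, the idea can be sketched as a small hand-rolled Rack middleware. This is a hypothetical illustration (the class name `StripCookiesOnPaths` is made up), not the implementation of `Rack::StripCookies`: it drops the incoming `Cookie` header and any `Set-Cookie` response header for the listed path prefixes. Because `ActionDispatch::Cookies` writes its `Set-Cookie` headers after the inner app returns, a middleware that edits response headers like this one has to sit outside it (for example at position 0) to see them.

```ruby
# Hypothetical middleware (not Rack::StripCookies): strip cookies in both
# directions for requests whose path starts with one of the given prefixes.
class StripCookiesOnPaths
  def initialize(app, prefixes)
    @app = app
    @prefixes = prefixes
  end

  def call(env)
    return @app.call(env) unless matches?(env["PATH_INFO"].to_s)

    # Downstream code (session store, Ahoy, etc.) sees no incoming cookies.
    env.delete("HTTP_COOKIE")
    status, headers, body = @app.call(env)
    # Rack 3 requires lowercase header names; older apps may still use "Set-Cookie".
    headers.delete("set-cookie")
    headers.delete("Set-Cookie")
    [status, headers, body]
  end

  private

  # Match the prefix itself or anything below it, not lookalikes.
  def matches?(path)
    @prefixes.any? { |prefix| path == prefix || path.start_with?("#{prefix}/") }
  end
end
```

It could be registered with something like `config.middleware.insert_before 0, StripCookiesOnPaths, %w[/rails/active_storage/blobs/proxy /rails/active_storage/representations/proxy]`.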

Summary
Added the option to globally change how Active Storage files are delivered. Users can now set `app.config.active_storage.delivery_method` to `:redirect` (the default) or `:proxy`. There's also the option to override at the model level using the `has_one_attached :avatar, delivery_method: :proxy` syntax. This is just a rough draft; it could still be DRYed up a bit and needs tests and documentation. I just want to get some feedback before I go too far.