
ActiveStorage: Allow access to backing file from Service API #31419

Closed
jduff opened this issue Dec 12, 2017 · 78 comments

@jduff
Contributor

jduff commented Dec 12, 2017

Steps to reproduce

Currently, the ActiveStorage::Service API only lets you get a link through the url method, which, for most services, gives back a public URL that expires after some timeframe. It would be very useful if there were a file method that returned the backing file object from the service, so that you have more flexibility in how you expose those files.
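
To make this concrete, a rough sketch of the kind of accessor I mean (the file method and the return values shown are hypothetical, not existing API):

blob.service.file(blob.key)
#=> #<Aws::S3::Object> for the S3 service
#=> #<Google::Cloud::Storage::File> for the GCS service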

System configuration

Rails version: 5.2.0.beta2

@rafaelfranca rafaelfranca added this to the 5.2.0 milestone Dec 12, 2017
@rafaelfranca
Member

Sounds good to me. @georgeclaghorn thoughts?

@georgeclaghorn
Contributor

I’m 👎 on the same method having a different signature in each Service class. It defeats the purpose of the Service abstraction.

@rafaelfranca
Member

Would a new file method require a different signature in each Service class?

@georgeclaghorn
Contributor

georgeclaghorn commented Dec 12, 2017

If I understand the proposal here correctly, it’d return a different type of object in every class. The file classes provided by the clients are different enough that any application using ActiveStorage::Service#file could not be indifferent to the configured service.

@rafaelfranca
Member

Ah yeah, got it. I understand your concerns, but I don't think this defeats the purpose of the service abstraction. Take the Active Job adapters as an example: each implementation returns a different object for each job, but with the same API. Or the Active Record connection adapters, where ActiveRecord::Base.connection can return similar objects with slightly different APIs.

This proposal is not asking us to have different methods in the Service layer; it is asking us to allow the inner implementation to be accessible via the common API in the Service layer. Of course, as soon as the application starts to use the inner implementation it becomes coupled to that provider, but at least people can then leverage the framework to build more complex cases that we don't support.

@georgeclaghorn
Contributor

I’m still 👎. If you want, say, a Google Cloud Storage client, you don’t need to go through Active Storage to get one.

@rafaelfranca
Member

Of course not, in the same way that if you want a MySQL connection you don't need to go through Active Record. But why make it harder if we can make it easy? Isn't that the whole point of Rails?

@georgeclaghorn
Contributor

The point of Active Storage is to permit Rails applications to maintain a healthy indifference to various storage services. It was born of a specific need to keep Basecamp at arm’s length from heterogeneous storage backends.

I’ve already stated my objections on those grounds and would suggest that you find a different way to accomplish your still-unstated purpose. Nonetheless, it sounds like you’re going to proceed against my objections. It’s your call.

@rafaelfranca
Member

Nonetheless, it sounds like you’re going to proceed against my objections.

I'm not, this is why I asked your opinion.

I really understand your reasoning about this, but I think we can make a compromise: if someone wants to couple their application to the storage service, that is their choice, and I believe Rails could help those users too. Basecamp and other applications that use heterogeneous backends would not need any change and should not care about this new method. But I can also understand your view of this as a sharp knife that could hurt applications.

If you still feel strongly about it, I'm totally fine with not exposing it.

@georgeclaghorn
Contributor

Sorry, I didn’t mean to imply you’re not listening to me. I know you are and I appreciate it. ❤️

If you have a use for this, let’s add it. You’re right that we don’t have to use it.

@jduff
Contributor Author

jduff commented Dec 12, 2017

I can work around this if I really need to, but I was thinking along the lines of @rafaelfranca when he said "why make it harder if we can make it easy".

The case I have for this: we have an internal application that stores somewhat sensitive documents, and we're hesitant about having a public URL, even a random one that expires, available for the files. I already added a custom controller to redirect to the file so that I can authenticate the users from the application side before exposing the URL, but they could still share that expiring URL with someone who doesn't have access.

In this case the storage service can also build protected URLs that will use the ACL we have set up, but I need the client and the file object to call those.
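
Concretely, with GCS I'd want to do something like this (hypothetical: file is the accessor proposed above, and the storage.cloud.google.com form is GCS's browser endpoint that requires a signed-in Google account):

# Hypothetical use of the proposed accessor with the GCS service:
file = blob.service.file(blob.key) # would return a Google::Cloud::Storage::File
url  = "https://storage.cloud.google.com/#{file.bucket}/#{file.name}"
# Access is then governed by the bucket ACL plus Google sign-in,
# rather than by an expiring signature.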

Again, I can patch around this, but it sounded like a case that others could run into as well. If there is another way to get the same result, or a different API that could be added, that works too; I'm not married to the idea of exposing the file object. I would rather come up with the right API that fits the overall goal of ActiveStorage than just the thing that meets our specific need.

Thanks for talking through this with us @georgeclaghorn ❤️

@meinac
Contributor

meinac commented Dec 13, 2017

There is the download method, and as far as I can see from the implementation it returns the content of the file. Doesn't that work for you?

@jduff
Contributor Author

jduff commented Dec 13, 2017

That would be a possible solution: download the file and proxy it through the app. I was hoping to get a direct URL to the file that would require Google account authentication, but I could handle that part in the app and proxy the file as a workaround.
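
For the record, a minimal sketch of that proxy workaround (the controller and authentication hook are assumptions about the app, not Active Storage API):

class DocumentsController < ApplicationController
  before_action :authenticate_user! # your application-side auth

  def show
    blob = ActiveStorage::Blob.find_signed(params[:signed_id])
    # Stream the file body through the app instead of redirecting to an
    # expiring service URL. Note this loads the whole file into memory,
    # so it only suits reasonably small files.
    send_data blob.download,
              filename: blob.filename.to_s,
              content_type: blob.content_type,
              disposition: "inline"
  end
end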

@dwightwatson
Contributor

I had a crack at proxying files through the app in #30465. I would like to see a way of directly accessing certain attachments rather than being given an expiring link to them.

My use case is that some of the images we upload through ActiveStorage are intended to be public. We are seeing lag when it has to generate a new expiring link, and because the links expire they are more difficult to cache.

In addition we're using CloudFront as a CDN which caches the redirect to the asset, not the end result of the asset itself. I don't know how other CDNs tackle this sort of thing, but it effectively makes CloudFront incompatible with ActiveStorage URLs.

@georgeclaghorn
Contributor

As with the OP’s case, I think there’s a general solution to that problem that doesn’t require apps to couple themselves to the underlying clients of the various services. (It might even be the same problem.) Please Do Investigate. 😄

@matthewd
Member

matthewd commented Dec 18, 2017

I do think there's value, as a general principle, in providing an escape hatch to access the underlying layer: if someone can use the abstract ASt API for 95% of their needs, better we allow them to do so, without forcing them to choose between a fully custom no-ASt implementation, or manually reimplementing ASt's knowledge of how models map to entries in the store.

IMO, we lean pretty heavily to the pragmatic, rather than perfect, abstraction... an AR model will hand you the raw database connection; it'll also accept a backend-dialect-specific where condition. A goal of AR is certainly to allow an application to remain at as much of an arm's length as practical from the data store, but not so much that it obstructs people when a generic solution is unavailable.

All of that said, if we offered such a method, I think I'd want it named something slightly sharper-edged than file, more in line with AR's raw_connection -- just enough to make it sound like you're piercing a layer of abstraction.

@dhh dhh removed this from the 5.2.0 milestone Jan 8, 2018
@dhh
Member

dhh commented Jan 8, 2018

(Not rendering an opinion on this specifically, just don't think it's a blocker for 5.2.0. I'm inclined too to open a syntactically vinegar'ed backdoor for people to do whatever they want.)

@koenpunt
Contributor

koenpunt commented Jan 9, 2018

Not the same as what @jduff is asking for, but: I'm using Rails as the backend of a relatively high-traffic website whose main purpose is serving/displaying images. All these images are allowed to be public (and are configured that way on S3), so there's no need for signed URLs. And as mentioned by @dwightwatson, caching with signed URLs is an issue.
So for me using the url method is not a problem, but having the option to get the URL without signing parameters would be nice.

Of course this would be solvable with a custom implementation if the backing file would be exposed from the service.

Then, related to this: how would one configure a CDN host for AS-stored files (instead of the S3 URLs)? Or is that currently not possible?

@wadestuart

I have to say I am in the same boat as @koenpunt. As it stands, the architecture of AS really forces one into a very specific use case. There are large portions of apps out there that conflict entirely with the signed-access/temp-URL endpoint approach.

Is there a doc somewhere that goes over the reasoning behind some of these conventions? Perhaps stating the intended usage when a site has what seem like common cases: public assets, CDN fronts, static file delivery (without requests to Rails per asset per user per time period).

IMHO the whole signed-and-managed-URL scheme seems like it should be the optional behavior, not the default.

@dhh
Member

dhh commented Jan 29, 2018 via email

@dhh
Member

dhh commented Jan 29, 2018

You can see how little code there actually is in the default controllers here: https://github.com/rails/rails/blob/master/activestorage/app/controllers/active_storage/blobs_controller.rb

Adding your own controller that wraps this in Google Authentication or whatever scheme you please should be trivial.
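
For example, something along these lines (a sketch modeled on the default controller above; the authentication hook and route name are whatever your app uses):

class AuthenticatedBlobsController < ApplicationController
  before_action :authenticate_user! # swap in Google authentication or similar

  def show
    blob = ActiveStorage::Blob.find_signed(params[:signed_id])
    redirect_to blob.service_url(disposition: params[:disposition])
  end
end

# config/routes.rb, mirroring the shape of the built-in route:
# get "/protected/blobs/:signed_id/*filename" => "authenticated_blobs#show"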

@dhh
Member

dhh commented Jan 29, 2018

@koenpunt I'd be happy to see a patch that generated a URL without signature if false was passed to expires_in: 👍

@koenpunt
Contributor

We have a full explanation in the docs about how you can make your own URLs that use a different authentication scheme by using your own controllers.

I seem to be unable to find that. Can you point me in the right direction?

@dhh
Member

dhh commented Jan 31, 2018

@koenpunt Here's the default controller: https://github.com/rails/rails/blob/master/activestorage/app/controllers/active_storage/blobs_controller.rb. You basically just do that, but with your own wrapping, and then you'll have your own URLs for it 👍

@koenpunt
Contributor

I'd be happy to see a patch that generated a URL without signature if false was passed to expires_in: 👍

I looked into this, and started with S3, but there the content-disposition headers do not work for unsigned (public) URLs, so you end up with a URL like https://rails-as-test-1.s3.eu-central-1.amazonaws.com/rwFanLfBnQC831WjCaCMAEbh.
So unless the content type and other headers are set when uploading the file, I doubt this is going to work.
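
Something like this might work, though I haven't verified it; a service subclass that bakes the headers in at upload time (the extra keyword arguments are my own plumbing, the stock 5.2 upload signature only takes a checksum):

class PublicS3Service < ActiveStorage::Service::S3Service
  def upload(key, io, checksum: nil, content_type: nil, disposition: nil)
    # object_for is a private helper on S3Service; put comes from aws-sdk-s3.
    object_for(key).put(
      body: io,
      content_md5: checksum,
      content_type: content_type,
      content_disposition: disposition,
      acl: "public-read"
    )
  end
end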

It would be nice if the storage path could be configured, so that for publicly accessible items the service URL can be used directly, instead of routing through Rails.

So the actual keys in the bucket would become something like:

  • rwFanLfBnQC831WjCaCMAEbh/myfile.jpg
  • variants/rwFanLfBnQC831WjCaCMAEbh/5ab8a8648821d31837ecfa2b3b5ae85f52c099af95308cd4f5c01f76882427b7/myfile.jpg

@koenpunt
Contributor

Here's the default controller:

I've seen that, but I expected there to be more, since you mentioned "a full explanation" 😅

@dhh
Member

dhh commented Jan 31, 2018 via email

@kylefox

kylefox commented Dec 2, 2018

I think it would be useful to support multiple services within an application, instead of a single application-wide service.

There would still be a default (application-wide) storage service. But allowing different services per attachment enables flexible behaviour like this:

class User < ApplicationRecord
  # Store profile photos on S3 publicly, serve from CDN, etc.
  has_one_attached :photo, storage: :public_s3

  # Store birth certificates on S3 privately, require signed URLs, etc.
  has_one_attached :birth_certificate, storage: :private_s3
end

Django has had a mechanism like this for years, and it's a delight to work with — it feels like the right balance of indirection and simplicity.

@dhh
Member

dhh commented Dec 3, 2018 via email

@joeczucha

Just to add another possible use-case for this...

My application has a number of user-submitted articles. An email is periodically generated and sent out to registered users showing the latest articles that have been added to the site; I've been including the avatar of the author with the article summary in the email by just linking to the asset on S3.

However, I can't think of a way to keep doing this if I migrate to ActiveStorage, given that there is no way to disable the expiry for certain assets.

@EnziinSystem

EnziinSystem commented Jan 30, 2019

@akshaysharma096 No, it is just a quick patch, sorry. You need to class_eval the Variant model too, based on my override of the Blob model.

Sorry, but if you can't override Variant, then the patch on ActiveStorage::Service::S3Service is pointless, because in practice variants are used very often.
If I use the line:
<%= image_tag @course.image.service_url %>
it works, but if I use:
<%= image_tag @course.image.variant(resize: "850x480").service_url %>
it doesn't work.

@sc0ttman

I have a use case where I would like all the originals to be private but the variants to be public (essentially preview images of a digital product).
I have taken what @georgeclaghorn said here and subclassed the S3Service.
I overrode the upload and url methods to store and retrieve assets as public if key.start_with?("variants").
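
Roughly like this, in case it helps anyone (untested sketch; public_url comes from the aws-sdk-s3 object, not from Active Storage, and upload needs a matching branch to set the public ACL on variant keys):

class MixedAclS3Service < ActiveStorage::Service::S3Service
  # Serve variant keys as plain public URLs; everything else stays signed.
  def url(key, **options)
    if key.start_with?("variants")
      object_for(key).public_url
    else
      super
    end
  end
end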

@Manfred
Contributor

Manfred commented Mar 21, 2019

From my experience in working with applications with multiple storage providers I would like to add some observations to this issue which might be relevant.

When referencing an object in storage you generally need a storage provider (e.g. Amazon S3 or Google Cloud Storage), a region (availability regions are usually geographically tied), a container (e.g. bucket on S3 or container on Rackspace Cloud Files), and a path (e.g. key on S3 or path on Cloud Storage). These four are pretty consistent across providers.

A representation for an S3 object could be:

{
  service: 's3',
  region: 'eu-central-1',
  container: 'my-bucket-name',
  path: 'path/to/object.jpg'
}

Storing metadata with an object reference is crucial for performance reasons, but we'll leave that aside for now.

Note that the first three are pretty static so we want to denormalize and store them centrally somewhere. Active Storage got this right and stores that information in storage.yml.

The only thing missing to reference objects on different storage providers is to add a service_name column to the blobs table and then implement ActiveStorage::Blob#service.

This allows for a great API to move or copy files between providers because the Blob always references both of them.
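
A sketch of that plumbing (the column name and the lookup method are my suggestion, nothing that exists today):

class AddServiceNameToActiveStorageBlobs < ActiveRecord::Migration[6.0]
  def change
    # Which configured service (from storage.yml) this blob lives on.
    add_column :active_storage_blobs, :service_name, :string
  end
end

# ActiveStorage::Blob#service would then resolve per record, e.g.:
# def service
#   self.class.services.fetch(service_name)
# end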


Code that generates URLs or paths (e.g. https://…, file://…, /path/to/file) operates almost like a view layer on the Blob. In 99% of cases Blob#url is good enough, but there are always one or two places in an application where you need to do something special.

URLs can have wildly different requirements because content delivery networks, asset servers, and everlasting URLs all use different additional information.

Everlasting URLs are URLs back to the website itself, carrying some stable token, which can be used in archived media like PDFs, emails, and chat.

  • Serving private files through a CDN might require additional credentials or encryption keys to generate the URL.
  • Asset servers can be dependent on the actual request (e.g. server pinning).
  • Everlasting URLs might need to know the tenant domain to generate the URL (e.g. company-name.myproduct.com).

You never want to support all these options through one url method. I believe we've learnt from other Rails APIs that methods with an incredible number of options become completely unmaintainable.

Direct access to the ‘low-level’ storage objects can be useful for ops, but it is not the ideal API for generating URLs because it adds coupling to the underlying libraries.

class Book < ApplicationRecord
  has_one_attached :cover_image
end

book.cover_image.storage_object #=> #<Aws::S3::Object>

if book.cover_image.on?('s3')
  book.cover_image.variant(
    resize_to_limit: [100, 100]
  ).storage_object.public_url #=> "https://…"
end

Active Storage could implement URL formatters which take a blob or variant, some configuration, and optional additional arguments to generate a URL.

For example, using this configuration (not sure where this would be stored).

{
  cdn: {
    service: 'CloudFront',
    origin: 's3:eu-central-1:my-bucket-name',
    domain_name: 'example.cloudfront.net',
    private_key: '…'
  }
}

You could implement plumbing similar to Service to fetch the configuration and initialize a formatter with static information.

formatter = ActiveStorage::UrlFormatter.configure(:cdn)
variant = book.cover_image.variant(
  resize_to_limit: [100, 100]
)
formatter.signed_url(key: variant.key)
formatter.public_url(key: variant.key)
formatter.url(key: variant.key)

Methods on a formatter don't have to conform to a specific interface, but it might be useful for all of them to expose a url method.

So if we go back to the book example, we would get the following model.

class Book < ApplicationRecord
  has_one_attached :cover_image

  def cover_image_url
    cdn_url_formatter.signed_url(
      key: cover_image_small_variant.key
    )
  end

  private

  def cover_image_small_variant
    cover_image.variant(
      resize_to_limit: [512, 512]
    )
  end

  def cdn_url_formatter
    ActiveStorage::UrlFormatter.configure(:cdn)
  end
end

Or in the case of a provider specific formatter for a publisher which requires request information for some reason.

class Book < ApplicationRecord
  has_one_attached :pdf_file

  def pdf_file_download_url(request_hostname)
    download_url_formatter(request_hostname).url
  end

  private

  def download_url_formatter(request_hostname)
    ActiveStorage::UrlFormatter.configure(
      :assets, request_hostname: request_hostname
    )
  end
end

@mengqing

@akshaysharma096 No, it is just a quick patch, sorry. You need to class_eval the Variant model too, based on my override of the Blob model.

Sorry, but if you can't override Variant, then the patch on ActiveStorage::Service::S3Service is pointless, because in practice variants are used very often.
If I use the line:
<%= image_tag @course.image.service_url %>
it works, but if I use:
<%= image_tag @course.image.variant(resize: "850x480").service_url %>
it doesn't work.

I've commented in the gist with a version that works with variants / previews

https://gist.github.com/dinatih/dbfdfd4e84faac4037448a06c9fdc016#gistcomment-2940505

@dhh
Member

dhh commented Aug 15, 2019

Your mind boggled that people who write and share software had different needs than you do? I’m not sure that open source is a safe environment for you then. It can’t be healthy to have your mind boggled so frequently 😄

Active Storage didn’t set out to be a replacement for anything. It, like every other major framework in Rails, was extracted from actual use in a real application. It wasn’t designed to a spec of what other gems or other users might offer or expect.

That’s the beauty of open source! We share what we built to serve our own needs, others then share their enhancements based on their needs, and together we get to enjoy the combined fruits of our labor.

I spoke about how I view that whole process at RailsConf this year. Including a diagnosis and prescription for the acute case of vendoritis you’re exhibiting. Feel free to self medicate for an hour or so: https://m.youtube.com/watch?v=VBwWbFpkltg

Then, hopefully endowed with a healthier perspective, you can come back and help us make software together. Free of the misconception that you’re a customer who bought something and is owed anything ❤️

@dhh
Member

dhh commented Aug 15, 2019

You're right. Working PRs is not the place to pontificate on how your mind boggles that "basic features" aren't implemented as you would have done them. Thanks for your understanding ✌️

@8vius

8vius commented Aug 20, 2019

I tried @dinatih's solution (thanks for posting it, btw) but it doesn't seem to work for me. My image links are still expiring. Anyone else having the same problem?

@yboulkaid
Contributor

This issue has been addressed in #36729 and will be available in Rails 6.1.

So we can now close this issue, thank @peterzhu2118, and marvel at the beauty of open source :)

@collimarco

@yboulkaid @peterzhu2118 Wow, that's great news! Will it be possible to migrate a bucket from private to public?

@peterzhu2118
Contributor

peterzhu2118 commented Nov 29, 2019

@collimarco Yes, you should be able to change the access level of a bucket in S3, GCS, or Azure. See the Public access section in the edge guides, which contains instructions on how to change the access level for your bucket.
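
With that change you opt in per service in config/storage.yml; the bucket and credential names below are placeholders:

amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: my-public-bucket
  public: true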

@collimarco

Here's my final solution:
https://stackoverflow.com/a/59107484/51387

It works great for my use case, where we have public pages with many images.

@dhh
Member

dhh commented Nov 29, 2019 via email

@collimarco

Update: my solution above works, but I discovered an issue: currently you cannot block all public access to the bucket in the S3 settings, because Rails tries to set a public ACL on uploads.

I am looking to build a similar solution that also allows blocking all public access to the bucket. In order to achieve that, I need to find a solution to this question:

If I keep the Rails private behavior / private bucket, is there any way to get a non-signed URL to a variant? Something like service_url(signed: false) would be great. Then you could add a CDN in front of the bucket, block all public access to the bucket, allow the IP ranges of the CDN (e.g. Cloudflare) in the bucket policy, and finally use the non-signed URLs.

@collimarco

Solved! I can use the Rails default private bucket / private ACL and simply allow the Cloudflare IP ranges in the bucket policy. Then, when I want to display the image from the CDN without signatures, I use:

"https://storage.example.com/" + variant.key

storage.example.com points to Cloudflare, which proxies to the actual bucket on s3. I have updated my tutorial here.
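
In code, that boils down to a tiny helper (the domain is from my setup; calling processed first ensures the variant file actually exists in the bucket):

module CdnHelper
  def cdn_variant_url(variant)
    # processed uploads the variant if it hasn't been generated yet.
    "https://storage.example.com/" + variant.processed.key
  end
end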

@rails-bot

rails-bot bot commented Feb 29, 2020

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 6-0-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@rails-bot rails-bot bot added the stale label Feb 29, 2020
@rails-bot rails-bot bot closed this as completed Mar 7, 2020
@mingshenggan

mingshenggan commented Mar 31, 2020

Did something similar for AWS CloudFront.

Final approach: same as @collimarco

Configure CloudFront to handle assets

  1. Create a distribution.
  2. Create origin group(s) - if you have 2 paths for CloudFront, you'll need 2 origin groups. For me, I did one for /assets and another for S3. For S3, I created an origin access identity so that nobody can access S3 directly; instead, everything has to go through CloudFront, and CF will query S3 to cache.
  3. Modify behaviors. That's like a routing table: put the obvious routes first, then a default route.

Configure the application to return the URL
This part is quite foolproof.

  1. Configure the asset hosts, as I have done so far:

# config/environments/production.rb
config.action_controller.asset_host = ENV["CLOUDFRONT_ENDPOINT"]
config.action_mailer.asset_host = ENV["CLOUDFRONT_ENDPOINT"]

  2. For home-grown assets, use asset_path. For S3, use "https://#{ENV["CLOUDFRONT_ENDPOINT"]}/#{activestorage_image.key}".

Other thoughts

  • I considered signing all my CF URLs; that is possible if you reference what ActiveStorage did for S3, basically using Aws::CloudFront directly. If I were to need this, I might build it as a separate service in ActiveStorage. Pretty much overkill for me right now, so I skipped it.
