ActiveStorage: Allow access to backing file from Service API #31419

Open
jduff opened this Issue Dec 12, 2017 · 59 comments

@jduff
Contributor

jduff commented Dec 12, 2017

Steps to reproduce

Currently, with the ActiveStorage::Service API you can only get a link through the url method, which, for most services, gives back a public URL that expires after some timeframe. It would be very useful if there were a file method that returned the backing file object from the service, so that you have more flexibility in how you expose those files.
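
A hypothetical sketch of what the proposal could look like (Service#file does not exist in Active Storage; the method name and return types here are illustrative only):

# Hypothetical API -- not part of Active Storage.
# user.avatar is an assumed attachment on some model.
blob = user.avatar.blob
file = blob.service.file(blob.key)
# => an Aws::S3::Object, Google::Cloud::Storage::File, or similar,
#    depending on the configured service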

System configuration

Rails version: 5.2.0.beta2

@rafaelfranca rafaelfranca added this to the 5.2.0 milestone Dec 12, 2017

@rafaelfranca

Member

rafaelfranca commented Dec 12, 2017

Sounds good to me. @georgeclaghorn thoughts?

@georgeclaghorn

Member

georgeclaghorn commented Dec 12, 2017

I’m 👎 on the same method having a different signature in each Service class. It defeats the purpose of the Service abstraction.

@rafaelfranca

Member

rafaelfranca commented Dec 12, 2017

Would a new file method need to have a different signature in each Service class?

@georgeclaghorn

Member

georgeclaghorn commented Dec 12, 2017

If I understand the proposal here correctly, it’d return a different type of object in every class. The file classes provided by the clients are different enough that any application using ActiveStorage::Service#file could not be indifferent to the configured service.

@rafaelfranca

Member

rafaelfranca commented Dec 12, 2017

Ah yeah, got it. I understand your concerns, but I don't think this defeats the purpose of the service abstraction. Take the Active Job adapters as an example: each implementation returns a different job object, but with the same API. Or the Active Record connection adapters, where ActiveRecord::Base.connection can return similar objects with slightly different APIs.

This proposal is not asking us to have different methods in the Service layer; it is asking us to allow the inner implementation to be accessible via the common API in the Service layer. Of course, as soon as the application starts to use the inner implementation it is coupled to that provider, but at least people can leverage the framework to build more complex cases that we don't support.

@georgeclaghorn

Member

georgeclaghorn commented Dec 12, 2017

I’m still 👎. If you want, say, a Google Cloud Storage client, you don’t need to go through Active Storage to get one.

@rafaelfranca

Member

rafaelfranca commented Dec 12, 2017

Of course not, in the same way that if you want a MySQL connection you don't need to go through Active Record. But why make it harder if we can make it easy? Isn't that the whole point of Rails?

@georgeclaghorn

Member

georgeclaghorn commented Dec 12, 2017

The point of Active Storage is to permit Rails applications to maintain a healthy indifference to various storage services. It was born from a specific need to keep Basecamp at arm's length from heterogeneous storage backends.

I’ve already stated my objections on those grounds and would suggest that you find a different way to accomplish your still-unstated purpose. Nonetheless, it sounds like you’re going to proceed against my objections. It’s your call.

@rafaelfranca

Member

rafaelfranca commented Dec 12, 2017

Nonetheless, it sounds like you’re going to proceed against my objections.

I'm not, this is why I asked your opinion.

I really understand your reasoning here, but I think we can compromise: if someone wants to couple their application to the storage service, that is their choice, and I believe Rails could help those users too. Basecamp and other applications that use heterogeneous backends would not need any change and should not care about this new method. But I can also see your side: this could be a sharp knife that hurts applications.

If you are still strong about it I'm totally fine with not exposing it.

@georgeclaghorn

Member

georgeclaghorn commented Dec 12, 2017

Sorry, I didn't mean to imply you're not listening to me. I know you are and I appreciate it. ❤️

If you have a use for this, let’s add it. You’re right that we don’t have to use it.

@jduff

Contributor

jduff commented Dec 12, 2017

I can work around this if I really need to, but I was thinking along the lines of @rafaelfranca when he said "why make it harder if we can make it easy".

The case I have for this is an internal application that stores somewhat sensitive documents, and we're hesitant about having a public URL to the files, even a random one that expires. I already added a custom controller that redirects to the file, so I can authenticate users on the application side before exposing the URL, but they could still share that expiring URL with someone who doesn't have access.

In this case the storage service can also build protected URLs that use the ACL we have set up, but I need the client and the file object to call those.

Again, I can patch around this, but it sounded like a case that others could run into as well. If there is another way to get the same result, or a different API that could be added, that works too; I'm not married to the idea of exposing the file object. I would rather come up with the right API that fits the overall goal of ActiveStorage than just the thing that meets our specific need.

Thanks for talking through this with us @georgeclaghorn ❤️

@meinac

Contributor

meinac commented Dec 13, 2017

There is the download method, and as far as I can see from the implementation it returns the content of the file. Doesn't that work for you?
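
For reference, a minimal sketch of that existing API as of Rails 5.2 (blob is an assumed ActiveStorage::Blob in hand):

# Service#download returns the whole file as a binary String,
# or streams it in chunks when given a block.
data = blob.service.download(blob.key)

blob.service.download(blob.key) do |chunk|
  io.write(chunk) # io is assumed: a tempfile, response stream, etc.
end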

@jduff

Contributor

jduff commented Dec 13, 2017

That would be a possible solution: download the file and proxy it through the app. I was hoping to get a direct URL to the file that would require Google account authentication, but I could handle that part in the app and proxy the file as a workaround.

@dwightwatson

Contributor

dwightwatson commented Dec 18, 2017

I had a crack at proxying files through the app in #30465. I would like to see a way of directly accessing certain attachments rather than being given an expiring link to them.

My use case is that some of the images we upload through ActiveStorage are intended to be public. We see lag whenever a new expiring link has to be generated, and because the links expire they're more difficult to cache.

In addition, we're using CloudFront as a CDN, which caches the redirect to the asset rather than the asset itself. I don't know how other CDNs tackle this sort of thing, but it effectively makes CloudFront incompatible with ActiveStorage URLs.

@georgeclaghorn

Member

georgeclaghorn commented Dec 18, 2017

As with the OP’s case, I think there’s a general solution to that problem that doesn’t require apps to couple themselves to the underlying clients of the various services. (It might even be the same problem.) Please Do Investigate. 😄

@matthewd

Member

matthewd commented Dec 18, 2017

I do think there's value, as a general principle, in providing an escape hatch to access the underlying layer: if someone can use the abstract ASt API for 95% of their needs, better we allow them to do so, without forcing them to choose between a fully custom no-ASt implementation, or manually reimplementing ASt's knowledge of how models map to entries in the store.

IMO, we lean pretty heavily toward the pragmatic, rather than perfect, abstraction... an AR model will hand you the raw database connection; it'll also accept a backend-dialect-specific where condition. A goal of AR is certainly to allow an application to remain at as much arm's length as practical from the data store, but not so much that it obstructs people when a generic solution is unavailable.

All of that said, if we offered such a method, I think I'd want it named something slightly sharper-edged than file, more in line with AR's raw_connection -- just enough to make it sound like you're piercing a layer of abstraction.
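
For comparison, the Active Record escape hatch, alongside a hypothetical Active Storage spelling (the raw_client name is invented here for illustration; no such method exists):

# Active Record today: piercing the abstraction is deliberate and visible.
raw = ActiveRecord::Base.connection.raw_connection
# => Mysql2::Client, PG::Connection, SQLite3::Database, ...

# A hypothetical equivalent in the same vein:
client = ActiveStorage::Blob.service.raw_client
# => Aws::S3::Resource, Google::Cloud::Storage::Project, ...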

@dhh dhh removed this from the 5.2.0 milestone Jan 8, 2018

@dhh

Member

dhh commented Jan 8, 2018

(Not rendering an opinion on this specifically, just don't think it's a blocker for 5.2.0. I'm inclined too to open a syntactically vinegar'ed backdoor for people to do whatever they want.)

@koenpunt

Contributor

koenpunt commented Jan 9, 2018

Not the same as what @jduff is asking for, but: I'm using Rails as the backend of a relatively high-traffic website whose main purpose is serving/displaying images. All these images are allowed to be public (and are configured like that on S3), so there's no need for signed URLs. And as @dwightwatson mentioned, caching with signed URLs is an issue.
So for me using the url method is not a problem, but having the option to get it without signing parameters would be nice.

Of course this would be solvable with a custom implementation if the backing file were exposed by the service.

Related to this: how would one configure a CDN host for AS-stored files (instead of the S3 URLs)? Or is that currently not possible?

@wadestuart

wadestuart commented Jan 29, 2018

I have to say I am in the same boat as @koenpunt. As it stands, the architecture of AS really forces one into a very specific use case. There are large portions of apps out there that conflict entirely with the signed-access/temp-URL endpoint approach.

Is there a doc somewhere that explains the reasoning behind some of these conventions? Perhaps stating the intended usage when a site has what seem like common cases: public assets, CDN fronts, static file delivery (without requests to Rails per asset per user per time period).

IMHO the whole signed-and-managed URL behavior seems like it should be optional, not the default.

@dhh

Member

dhh commented Jan 29, 2018

You can see how little code there actually is in the default controllers here: https://github.com/rails/rails/blob/master/activestorage/app/controllers/active_storage/blobs_controller.rb

Adding your own controller that wraps this in Google Authentication or whatever scheme you please should be trivial.
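
A minimal sketch of such a wrapper against the Rails 5.2 API (the route, controller name, and authenticate_user! helper are assumptions; the body mirrors the default controller's lookup and redirect):

# config/routes.rb -- sketch
get "/protected/blobs/:signed_id/*filename" => "protected_blobs#show", as: :protected_blob

# app/controllers/protected_blobs_controller.rb -- sketch
class ProtectedBlobsController < ApplicationController
  before_action :authenticate_user! # assumed to come from the app, e.g. Devise

  def show
    blob = ActiveStorage::Blob.find_signed(params[:signed_id])
    redirect_to blob.service_url(disposition: params[:disposition])
  rescue ActiveSupport::MessageVerifier::InvalidSignature
    head :not_found
  end
end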

@dhh

Member

dhh commented Jan 29, 2018

@koenpunt I'd be happy to see a patch that generated a URL without signature if false was passed to expires_in: 👍
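
What that call site might look like (hypothetical; not implemented at the time of this comment):

# Hypothetical, per the proposal above:
user.avatar.blob.service_url(expires_in: false)
# => an unsigned, non-expiring URL for the stored object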

@koenpunt

Contributor

koenpunt commented Jan 31, 2018

We have a full explanation in the docs about how you can make your own URLs that use a different authentication scheme by using your own controllers.

I seem to be unable to find that... Can you point me in the right direction?

@dhh

Member

dhh commented Jan 31, 2018

@koenpunt Here's the default controller: https://github.com/rails/rails/blob/master/activestorage/app/controllers/active_storage/blobs_controller.rb. You basically just do that, but with your own wrapping, and then you'll have your own URLs for it 👍

@koenpunt

Contributor

koenpunt commented Jan 31, 2018

I'd be happy to see a patch that generated a URL without signature if false was passed to expires_in: 👍

I looked into this and started with S3, but there the content-disposition headers do not work for unsigned (public) URLs, and thus you end up with a URL like https://rails-as-test-1.s3.eu-central-1.amazonaws.com/rwFanLfBnQC831WjCaCMAEbh.
So unless the content type and other headers are set when uploading the file, I doubt this is going to work.

It would be nice if the storage path could be configured, so that for publicly accessible items the service URL can be used directly, instead of routing through Rails.

So the actual keys in the bucket would become something like:

  • rwFanLfBnQC831WjCaCMAEbh/myfile.jpg
  • variants/rwFanLfBnQC831WjCaCMAEbh/5ab8a8648821d31837ecfa2b3b5ae85f52c099af95308cd4f5c01f76882427b7/myfile.jpg
@koenpunt

Contributor

koenpunt commented Jan 31, 2018

Here's the default controller:

I've seen that, but I expected there to be more, since you mentioned "a full explanation" 😅

@gregblass

gregblass commented Mar 6, 2018

EDIT: Alrighty, after consulting the ole rubber duck for a while, I've deleted the multiple comments I posted here and summarized them into one.

For my use case, I've got a multi-tenant Rails app that allows administrators to sign in and upload their company logo, which gets served through a GraphQL API to save app-wide settings on many mobile devices linked to that company account. So I just need to send along a public-facing URL of the logo to save in local settings on those mobile devices.

2ND EDIT: @dinatih's solution worked perfectly for me. I'd recommend anyone trying to expose public, non-expiring URLs use his code below!

@dinatih

dinatih commented Mar 6, 2018

Update:
Gist patch to add an :acl (:public or :private) option:
https://gist.github.com/dinatih/dbfdfd4e84faac4037448a06c9fdc016
(it uses expires_in: false when public, as proposed by @dhh in #31419 (comment))

class User < ActiveRecord::Base
  has_one_attached :avatar, acl: :public
  has_many_attached :documents, acl: :private
end

For those who want ActiveStorage with a public ACL and no expiry time for S3, you can do this:

# config/initializers/active_storage.rb
Rails.application.config.to_prepare do
  if defined?(ActiveStorage::Service::S3Service)
    # Make ActiveStorage as public-read. @dinatih
    ActiveStorage::Service::S3Service.class_eval do
      def upload(key, io, checksum: nil)
        instrument :upload, key: key, checksum: checksum do
          begin
            object_for(key)
              .put(upload_options.merge(body: io, content_md5: checksum, acl: 'public-read'))
          rescue Aws::S3::Errors::BadDigest
            raise ActiveStorage::IntegrityError
          end
        end
      end

      def url(key, expires_in:, filename:, disposition:, content_type:)
        instrument :url, key: key do |payload|
          generated_url = object_for(key).public_url
          payload[:url] = generated_url
          generated_url
        end
      end
    end
  end
end
@gregblass

gregblass commented Mar 8, 2018

Thanks @dinatih!!!

EDIT: This worked perfectly for me. Much appreciated!!

@MarkMurphy

MarkMurphy commented Mar 10, 2018

I landed on this thread looking for examples of how I might use AS with S3 and CloudFront. Currently, it doesn't look like there's any support for using a CDN.

I wonder if we could expose something similar to what Paperclip does:
https://stackoverflow.com/questions/32077746/rails-4-use-cloudfront-with-paperclip

This is essentially to allow the S3 service to accept a host parameter (or alternatively a callback method) for constructing the URL. The DiskService kind of already does this.

Thoughts?

@nukeproof

nukeproof commented Mar 10, 2018

To work with CloudFront, here's a hacky way, building on the example @dinatih provided above, to generate public, unsigned URLs, with some extra code to replace the host and bucket portion of the URLs with your CloudFront host.

# config/initializers/activestorage.rb
module MyApp
  class Application < Rails::Application

    config.cloudfront_host = ENV["CLOUDFRONT_HOST"]
    
    config.after_initialize do
      if defined?(ActiveStorage::Service::S3Service)
        ActiveStorage::Service::S3Service.class_eval do
          def cloudfront_host
            @cloudfront_host ||= Rails.configuration.cloudfront_host
          end

          def proxy_url(url)
            return url unless cloudfront_host
            uri = URI(url)
            uri.host = cloudfront_host
            uri.path.gsub!("/#{bucket.name}","")
            uri.to_s
          end

          def upload(key, io, checksum: nil)
            instrument :upload, key: key, checksum: checksum do
              begin
                object_for(key).put(upload_options.merge(body: io, content_md5: checksum, acl: 'public-read'))
              rescue Aws::S3::Errors::BadDigest
                raise ActiveStorage::IntegrityError
              end
            end
          end

          def url(key, expires_in: nil, filename: nil, disposition: nil, content_type: nil)
            instrument :url, key: key do |payload|
              generated_url = proxy_url object_for(key).public_url
              payload[:url] = generated_url
              generated_url
            end
          end
        end
      end
    end
  end
end
@wbotelhos

wbotelhos commented May 18, 2018

Hi @nukeproof ,

I tried your patched version, but at that point ActiveStorage::Service::S3Service is still not defined. Using to_prepare in an initializer does not see S3Service yet either.

Could someone point me in the right direction? I'm trying these steps:

  1. Change s3.amazonaws.com/files.bucket.com to files.bucket.com.s3.amazonaws.com inside AS;
  2. Create a CNAME on Cloudflare from files to files.bucket.com.s3.amazonaws.com;
  3. Configure config.action_controller.asset_host = 'https://files.bucket.com';
  4. Avoid expiry parameters on the URL, so the path doesn't change and miss the cache.

Has anyone managed or needed to do these steps to cache the uploaded files on a CDN?

Thanks! (:

@nukeproof

nukeproof commented May 20, 2018

@wbotelhos have you defined a configuration in config/storage.yml that uses the S3 service? Something like this:

# config/storage.yml
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: "ca-central-1"
  bucket: "your.bucketname"

And use that service in your config/environments/[development|test|production].rb like this:

 # config/environments/production.rb
 config.active_storage.service = :amazon
@wbotelhos

wbotelhos commented May 21, 2018

Hi @nukeproof ,

Using it in production, it worked.

For those who want to use Cloudflare:

  • Attach a certificate on the ALB;
  • Configure SSL as Full; Full (strict) will raise error 526;
  • Create a bucket with a subdomain name like assets.example.com;
  • Create a CNAME to your bucket, including the region, like: assets -> assets.example.com.s3-sa-east-1.amazonaws.com;
  • Use @nukeproof's solution with assets.example.com as the CLOUDFRONT_HOST.

Thanks @nukeproof and @dinatih for the solution.

@collimarco

collimarco commented Jun 21, 2018

+1. I don't like the choice to force the use of redirects. For many apps it is not a viable solution.

For example, when you send web push notifications (W3C Push API), those redirects create a lot of extra load on servers, and in any case you must provide a URL that is valid for at least some weeks; you can't use a link that expires.

@m0n9oose

m0n9oose commented Jun 21, 2018

I've found a pretty simple solution to generate permalinks without an expiring key, though it still serves the files through a Rails controller. Hope this helps.

# config/routes.rb

get '/files/:key/*filename', to: 'files#show', as: :rails_public_blob
direct :public_blob do |blob, options|
  route_for(:rails_public_blob, blob.key, blob.filename, options)
end
resource :files, only: :show
# app/controllers/files_controller.rb

# frozen_string_literal: true

class FilesController < ActionController::API
  include ActionController::Live
  before_action :setup_response_headers

  def show
    disk_service.download(params[:key]) do |chunk|
      response.stream.write(chunk)
    end
  ensure
    response.stream.close
  end

  private

  def setup_response_headers
    # DEFAULT_SEND_FILE_TYPE and DEFAULT_SEND_FILE_DISPOSITION are inherited
    # from ActionController::DataStreaming, which ActionController::API includes
    response.headers['Content-Type'] = params[:content_type] || DEFAULT_SEND_FILE_TYPE
    response.headers['Content-Disposition'] = params[:disposition] || DEFAULT_SEND_FILE_DISPOSITION
  end

  def disk_service
    ActiveStorage::Blob.service
  end
end

And that's it. No redefines, no monkeypatching.

# view
json.link public_blob_url(model.file)
@pskl

Contributor

pskl commented Jul 10, 2018

For public assets, using the answer from @georgeclaghorn in this post:

class ActiveStorage::Service::CloudfrontS3Service < ActiveStorage::Service::S3Service
  def url(key, **)
    uri = URI(super)
    uri.host = RunEnv.var!('S3_DISTRIBUTION') # RunEnv is an app-specific helper for required env vars
    uri.to_s
  end
end
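
For a subclass like this to be picked up, Active Storage resolves the service: name in config/storage.yml by requiring active_storage/service/<name>_service and looking up ActiveStorage::Service::<Name>Service, so the class needs to live on the load path. A sketch of the wiring (file location, credentials, region, and bucket are assumptions):

# config/storage.yml -- assumes the class above is saved at
# lib/active_storage/service/cloudfront_s3_service.rb
cloudfront_s3:
  service: CloudfrontS3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: your-bucket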
@programrails

programrails commented Jul 15, 2018

@dinatih
I applied your code, but after (later) purging ActiveStorage attachments (made with your code), the S3 (non-expiring) links remained alive (which is probably wrong).

@dinatih

dinatih commented Jul 17, 2018

@programrails
I do not see why; it should be OK, since I do not override the purge method...
Sorry, I can't help you.

@kwent

kwent commented Jul 17, 2018

@dinatih Any idea how we can upload the content type as well? CloudFront (or even a direct link to S3) resolves my image as content-type: binary/octet-stream, but it should be image/png in my case.

@dinatih

dinatih commented Jul 18, 2018

@kwent
You could pass the content_type in upload(key, io, checksum: nil):

object_for(key).put(
  upload_options.merge(
    body: io, 
    content_md5: checksum, 
    acl: 'public-read', 
    content_type: 'image/png')
)

Not tested. Hope it works!

@dinatih

dinatih commented Jul 18, 2018

@kwent
That's what I was thinking at first 🙃. Take a look at my gist https://gist.github.com/dinatih/dbfdfd4e84faac4037448a06c9fdc016 and see how my :acl option is passed through:

  • In Blob#upload(io):

    service.upload(key, io, checksum: checksum, acl: metadata[:acl], content_type: content_type)

  • In Service::S3Service:

    def upload(key, io, checksum: nil, acl: 'private', content_type: 'image/png')
      # ...
      object_for(key).put(
        upload_options.merge(
          body: io,
          content_md5: checksum,
          acl: acl == 'public' ? 'public-read' : 'private',
          content_type: content_type)
      )
      # ...
    end

Hope it helps.

@kwent

kwent commented Jul 25, 2018

@dinatih That helps! Here is a version of your gist where the content type is passed along when uploading the file to S3: https://gist.github.com/kwent/0fb55eedb7eb45c9cea2676ab7760a6a/revisions

Regards

@dinatih

dinatih commented Aug 30, 2018

@akshaysharma096 No, it is just a quick patch, sorry. You need to class_eval the Variant model too, based on my override of the Blob model.

@yogodoshi

yogodoshi commented Sep 18, 2018

I'm trying to find a workaround for my use case, but since I think it's a pretty common one I'm sharing it here: applications that need image URLs in meta tags, including variants.

Right now I can't find a way to get a non-expiring URL for a variant to use in the og:image meta tag. And it needs to be a variant, as I can't just link a 5 MB image in the meta tag.

@jeffmcfadden

jeffmcfadden commented Sep 18, 2018

@yogodoshi We created a VariantsController that handles requests specifically for variant images in cases like this, and proxies the data back to our CDN.

So the request path is Browser -> CDN -> Variants Controller -> Read File Data In S3, Return That Data -> CDN -> Browser

We return long-expiring headers to the CDN as well, which it passes on, to limit load on our app.

Our site is hosted on EC2; we've had no trouble with performance or load with this setup.
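
A minimal sketch of that kind of proxying controller against the Rails 5.2 variant API (the controller name, route parameter, and resize geometry are assumptions):

# app/controllers/variants_controller.rb -- sketch only
class VariantsController < ApplicationController
  def show
    blob = ActiveStorage::Blob.find_signed(params[:signed_blob_id])
    variant = blob.variant(resize: "100x100").processed # processes and uploads once

    # Long-lived cache headers so the CDN caches the image body itself,
    # not a redirect
    expires_in 1.year, public: true

    send_data blob.service.download(variant.key),
              type: blob.content_type, disposition: "inline"
  end
end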

@ConfusedVorlon

ConfusedVorlon commented Sep 27, 2018

I'm a fairly unsophisticated Rails user, so I thought I'd add my perspective on the failure of Active Storage image variants to 'just work'.

I just fired up Active Storage for the first time.
My index page is essentially a product list with a bunch of product thumbnails.

The Active Storage docs show how easy it is to:

  1. upload images
  2. access variants

All this is true, except that the default implementation (for S3 at least):

  a) can't cache image URLs
  b) adds an ~500ms delay per image while AS checks whether the variant has been processed

All this is great for secret attachments.

For the 'just works' case of serving images, though, it seems nuts (and the docs imply that ActiveStorage is a suitable solution for images and resized variants).

We don't have signing on images served through the asset pipeline; why do we have (required) signing on images uploaded through Active Storage?

I honestly didn't expect to be looking at monkeypatching, custom variants controllers, etc. in order to show an image and a bunch of thumbnails sensibly. I expected Rails to deliver the customary magic of 'just works'.

For me, a syntax like

has_one_attached :hero_image, public: true

would make sense, and for variants, some solution whereby Rails doesn't need to query S3 for each image (at least for some defined sizes).

Thanks as ever for the great work.

@xiobot

xiobot commented Sep 27, 2018

Looks like I'll have to give up trying to get ActiveStorage to work for my Rails app.

It just doesn't suit the context of what I need, specifically loading images behind a CDN. CarrierWave as it stands is a much simpler implementation, and that's clearly not great news for what should be a nice out-of-the-box Rails feature.

I'll switch to ActiveStorage when simple permalink functionality is enabled and this convoluted approach is deprecated (which I hope it will be).

@collimarco

collimarco commented Sep 27, 2018

@xiobot I agree with you that ActiveStorage is not suited to all applications at the moment.

@huacnlee

Contributor

huacnlee commented Nov 1, 2018

I wrote an Active Storage usage example with public URL generation:

https://github.com/huacnlee/rails-activestorage-example

@fleck

fleck commented Nov 17, 2018

I have a PR to add direct linking and proxying to Active Storage (#34477). It still needs work, but feedback would be appreciated.
