Figure out how to and implement the app to properly and securely enforce restrictions on access to files on S3 #832

Closed
11 tasks done
jrochkind opened this issue Aug 3, 2020 · 5 comments
jrochkind commented Aug 3, 2020

We could switch to all "signed" S3 URLs, with the app gatekeeping whether a user can get a signed URL. But this has some potential problems:

  • It could interfere with Google indexing of our images and other assets, if their URLs are different on every page load.
    • Right now, we include non-changing "public" S3 URLs in our Google sitemap, which we think is what has led to successful indexing of lots of our images in Google Images. These are also the URLs we actually use on pages to display thumbs (in multiple resolutions per thumb).
    • For originals, we have an app URL that redirects to a signed (different on every access) S3 URL with an HTTP 302 (temporary redirect). Will this still allow Google to index them? Well, Google is still indexing 40 of our OH PDFs that are delivered like this, so it's not an absolute barrier. Could it be related to why Google is not indexing the other ~200, though? Hard to say.
    • The redirect approach might be too slow/expensive for thumb src in pages anyway, when we have hundreds of thumb URLs on a page. We don't really want those hundreds of extra requests to our app per page load.
  • It could be a performance problem for pages with hundreds of asset URLs.
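
For reference, the "app URL that redirects to a signed S3 URL" pattern can be sketched roughly like this. This is a minimal Rack-style illustration, not our actual controller code: `SIGNER`, `may_download?`, the hash-based user/asset records, and the bucket host are all made-up stand-ins, and a real signer would call the S3 SDK rather than append a random token.

```ruby
# Rack-style sketch: the app decides whether the requester may see a file,
# then answers with a 302 to a short-lived signed URL.
# Every name here (SIGNER, may_download?, the bucket host) is hypothetical.

# Stand-in for a real S3 presigner; returns a different URL on every call,
# which is the property that worries us for search-engine indexing.
SIGNER = lambda do |key|
  token = rand(36**8).to_s(36)
  "https://example-bucket.s3.amazonaws.com/#{key}?signature=#{token}"
end

# Hypothetical permission rule: published assets are open to everyone,
# unpublished ones only to staff users.
def may_download?(user, asset)
  return true if asset[:published]

  !user.nil? && user[:staff] == true
end

# Returns a Rack-style [status, headers, body] triple.
def download_response(user, asset, signer: SIGNER)
  unless may_download?(user, asset)
    return [403, { "content-type" => "text/plain" }, ["Forbidden"]]
  end

  # Temporary redirect: the target changes on every request.
  [302, { "location" => signer.call(asset[:key]) }, []]
end
```

Every file request goes through the app once before reaching S3, which is fine for an occasional original download but is exactly the per-thumbnail cost described in the bullets above.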

Plan

Our originals are already treated as non-public. We will make sure the originals bucket has public access blocked; the app is already set up to work with that.

Derivatives are the problem, due to performance and efficiency problems with delivering all the thumbnails.

After much analysis, the least bad thing to do is: Leave existing derivatives how they are on a public S3 bucket, but provide an additional flag on Assets for "secure derivative storage type". If an asset has that flag set, derivatives will be stored in a different location, in a public-access-blocked S3 bucket.

Such assets will, for now, generally not have thumbnails that can be delivered. Our use case is restricted OH content (PDF and MP3) that won't be shown to the public with thumbnails, and doesn't really have useful thumbs to show staff either. So we'll work with what we've got.
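
A rough sketch of how the per-asset flag could drive storage selection. The names are illustrative assumptions only: `Asset` as a plain Struct, the `derivative_storage_type` attribute with values `"public"` and `"restricted"`, and both bucket names are hypothetical, not the app's real model or configuration.

```ruby
# Hypothetical names throughout: Asset, derivative_storage_type,
# and both bucket names are placeholders for illustration.
Asset = Struct.new(:id, :derivative_storage_type, keyword_init: true)

DERIVATIVE_STORAGE = {
  # Existing derivatives stay where they are, with stable public URLs.
  "public"     => { bucket: "example-derivatives", public_urls: true },
  # Flagged assets go to a bucket with public access blocked.
  "restricted" => { bucket: "example-restricted-derivatives", public_urls: false }
}.freeze

# Assets without the flag set fall back to the existing public storage.
def derivative_storage_for(asset)
  DERIVATIVE_STORAGE.fetch(asset.derivative_storage_type || "public")
end

# Returns a stable public URL, or nil when the derivative lives in the
# restricted bucket and so has no directly deliverable thumbnail.
def thumbnail_url(asset, filename)
  storage = derivative_storage_for(asset)
  return nil unless storage[:public_urls]

  "https://#{storage[:bucket]}.s3.amazonaws.com/#{asset.id}/#{filename}"
end
```

Under these assumptions, existing assets need no migration: only assets explicitly flagged as restricted change where their derivatives are written, and the UI has to cope with `thumbnail_url` returning nil for them.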

Steps

@jrochkind

Don't forget about combined audio derivatives too

@jrochkind

jrochkind commented Aug 13, 2020

Some outstanding questions:

@jrochkind

Also recall that we can't break the OHMS editor, which may not follow redirects and isn't okay with query parameters in URLs for audio files. Doh!

@jrochkind

Another challenge with putting presigned URLs directly on the page as an img src is that it breaks HTTP caching, as the cached page may have expired URLs on it.

This is probably why ActiveStorage does the redirect technique.
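
To make the caching problem concrete, here is a small sketch that reads the expiry out of a SigV4-style presigned URL (the `X-Amz-Date` and `X-Amz-Expires` query parameters) and checks whether the URL would already be dead when a cached page is served. The helper names and the example URL in the note below are made up for illustration.

```ruby
require "uri"

# Computes when a SigV4-style presigned URL stops working, from its
# X-Amz-Date (signing time, UTC) and X-Amz-Expires (lifetime in seconds).
def presigned_url_expires_at(url)
  params = URI.decode_www_form(URI.parse(url).query.to_s).to_h
  stamp = params.fetch("X-Amz-Date")
  match = stamp.match(/\A(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z\z/)
  raise ArgumentError, "unexpected X-Amz-Date: #{stamp}" unless match

  signed_at = Time.utc(*match.captures.map(&:to_i))
  signed_at + Integer(params.fetch("X-Amz-Expires"))
end

# True when a page rendered earlier, and served from cache at
# page_served_at, would contain an img src that no longer works.
def stale_in_cached_page?(url, page_served_at)
  page_served_at >= presigned_url_expires_at(url)
end
```

For example, a URL signed at `20200813T120000Z` with `X-Amz-Expires=3600` is still usable in a page served from cache at 12:30 UTC, but broken in the same cached page served at 14:00 UTC. So the page's cache lifetime would have to be kept shorter than the shortest URL lifetime on it.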

@jrochkind

The "every img src a redirect" approach in ActiveStorage DOES give some people trouble with too much traffic to the Rails server: rails/rails#34552

There are some discussions on the web about what to do about this. Most of the solutions proposed for ActiveStorage assume all your files are public; they are about getting ActiveStorage to do something more like what we do right now.
