Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding validations for size and content_type #35390

Closed
wants to merge 1 commit into from
Closed

Adding validations for size and content_type #35390

wants to merge 1 commit into from

Conversation

abhchand
Copy link
Contributor

Background

Adds ActiveModel validations for Size and Content Type to Active Storage attachments.

This feature was discussed here on the rails-core mailing list in late 2018.

The above discussion also mentions why it is difficult to validate Presence on Active Storage attachments. A separate change can/will be filed after that discussion reaches a resolution.

Scope

  • Adds Active Storage validations on Size
  • Adds Active Storage validations on Content Type
  • Adds validation helper: validates_attachment()
  • Minor fix when loading the _blob association (more information inline)
  • Tests
  • Updating Active Storage README
  • Updating Rails Edge Guides
  • Updating CHANGELOG

Validating Size

Adds validators to validate the size (in bytes) of the attached Blob object:

validates :avatar, attachment_size: { in: 0..1.megabyte }
validates :avatar, attachment_size: { minimum: 1.kilobyte }
validates :avatar, attachment_size: { maximum: 12.megabytes }

Also accepts a Range as a shortcut option for :in:

validates :avatar, attachment_size: 0..1.megabyte

Validating Content Type

Validates the content type of the attached Blob object:

validates :avatar, attachment_content_type: { in: %w[image/jpeg image/png] }
validates :avatar, attachment_content_type: { not: %w[application/pdf] }

Also accepts a Array or String as a shortcut option for :in:

validates :avatar, attachment_content_type: %w[image/jpeg image/png]
validates :avatar, attachment_content_type: "image/jpeg"

Validation Helper

Provides a more readable validation helper named validates_attachment() which provides the same functionality as validates() but does not require the attachment_ prefix on keys:

validates_attachment :avatar, size: { in: 0..1.megabyte }, content_type: "image/jpeg"

The inspiration here came from Thoughtbot's Paperclip library which implements similar functionality.

Thanks

Copy link
Contributor Author

@abhchand abhchand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @igorkasyanchuk. 👋 I opened this PR as I couldn't get in contact with the previous PR owner.

Also as I outlined there I think it needed several test scenarios included here since validation functionality should be something end users can depend on not breaking.

Also tagging @connorshea since he's been reviewing some of this functionality too. 🔎

Really appreciate everyone's feedback here, thanks!

```

See the [rails guides](https://edgeguides.rubyonrails.org/active_storage_overview.html#validations) for more information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the trend of keeping the README. short with basic examples. It links to the rails guide as a more official source for documentation.


ActiveSupport.on_load(:i18n) do
I18n.load_path << File.expand_path("active_storage/locale/en.yml", __dir__)
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loads the translation file. Similar pattern used in other places in Rails (e.g. see ActiveModel)

@@ -30,6 +30,7 @@ def upload

def save
record.public_send("#{name}_attachment=", attachment)
record.public_send("#{name}_blob=", blob)
Copy link
Contributor Author

@abhchand abhchand Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a bug or not. Realized that the #{name}_blob association is not getting properly loaded when attaching a single attachment.

I wrote out a much more detailed description of this in the commit message description itself: see 139e4b4

It's a very small change but I added as its own commit since it's not fully related to the core PR (although it was impacting my tests for it).

I'm happy to remove it if needed. Looking forward to thoughts and feedback here.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break this out into its own PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgeclaghorn - Done! Opened #35412 and removed the commit from here.

messages:
in_between: "must be between %{minimum} and %{maximum}"
minimum: "must be greater than or equal to %{minimum}"
maximum: "must be less than or equal to %{maximum}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size needed default translations for :minimum, :maximum and :in (key name :in_between).

Content Type validation leverages exiting translations in ActiveModel for :inclusion and :exclusion


def parse_shortcut_options(options)
_parse_validates_options(options)
end
Copy link
Contributor Author

@abhchand abhchand Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leverage existing logic inherited from ActiveModel, just aliasing it to a name that better conveys what is being done here.

@@ -0,0 +1,91 @@
module ActiveStorage
module Validations
class AttachmentContentTypeValidator < BaseValidator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attachment Validators share a lot of common code, so I abstracted it to a common BaseValidator.

when marked_for_deletion? then []
else
@record.send(blob_association)
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Active Storage attachments are external to the record itself, they have safeguards that prevent them from uploading to the storage service until the record itself is succcesfully saved (i.e. attachments are saved/applied in an after_save callback)

Because of this, record model validations run before active storage blobs get saved. So validation here becomes a bit tricky: we have to validate based on what the attachment changes we expect to be applied once the record is saved.

We do this by looking at the contents of @attachment_changes.

  • If it's marked for creation then we pull the list of blobs we have already create or expect to create
  • If it's marked for deletion we wont have any blobs


assert_not_empty @user.highlights_attachments
assert_equal @user.highlights_blobs.count, 2
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the test directly below are part of the separate commit mentioned above.

Happy to modify as needed, but I thought it would be good to have a test to ensure that the associations get loaded when creating an attachment. I suppose this would be the default expected behavior for a model field, but attachments work differently from regular model fields, so good to have confirmation here.

@@ -0,0 +1,420 @@
# frozen_string_literal: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests the following scenarios.

  1. This may seem like a lot of scenarios but even a "simple" validation covers a lot of these use cases. Plus the end users of Rails are expecting Active Storage validations to be as flexible and robust as validation functionality provided by other Rails libraries.

  2. Each test tests both one and many attachments, as well as a validation failure followed by a succesful save. Consolidated all these into one test to make runtime faster, reasonably group similar functionality, and prevent a very long list of tests.

"record has no attachment"
"new record, creating attachments"
"persisted record, creating attachments"
"persisted record, updating attachments"
"persisted record, updating some other field"
"persisted record, destroying attachments"
"destroying record with attachments"
"new record, with no attachment"
"persisted record, with no attachment"
"destroying record, with no attachment"
"specifying :in option as String"
"specifying :not option"
"specifying :not option as a String"
"specifying no options"
"specifying redundant options"
"validating with `validates()`"
"validating with `validates()`, String shortcut option"
"validating with `validates()`, Array shortcut option"
"validating with `validates()`, invalid shortcut option"
"validating with `validates_attachment()`"
"validating with `validates_attachment()`, String shortcut option"
"validating with `validates_attachment()`, Array shortcut option"
"validating with `validates_attachment()`, invalid shortcut option"
"validating with `validates_attachment_content_type()`"
"specifying a :message option"
"inheritance of default ActiveModel options"

Run it with

cd activestorage/
bundle exec ruby -Itest test/models/validations/attachment_content_type_validator_test.rb

@@ -0,0 +1,395 @@
# frozen_string_literal: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, we test the following scenarios here:

"record has no attachment"
"new record, creating attachments"
"persisted record, creating attachments"
"persisted record, updating attachments"
"persisted record, updating some other field"
"persisted record, destroying attachments"
"destroying record with attachments"
"new record, with no attachment"
"persisted record, with no attachment"
"destroying record, with no attachment"
"specifying :minimum option"
"specifying :maximum option"
"specifying both :minimum and :maximum options"
"specifying no options"
"specifying redundant options"
"validating with `validates()`"
"validating with `validates()`, Range shortcut option"
"validating with `validates()`, invalid shortcut option"
"validating with `validates_attachment()`"
"validating with `validates_attachment()`, Range shortcut option"
"validating with `validates_attachment()`, invalid shortcut option"
"validating with `validates_attachment_size()`"
"specifying a :message option"
"inheritance of default ActiveModel options"

Run it with

cd activestorage/
bundle exec ruby -Itest test/models/validations/attachment_size_validator_test.rb

=== Size

Validates the size (in bytes) of the attached `Blob` object:

    validates :avatar, attachment_size: { in: 0..1.megabyte }
    validates :avatar, attachment_size: { minimum: 17.kilobytes }
    validates :avatar, attachment_size: { maximum: 38.megabytes }

Also accepts a `Range` as a shortcut option for `:in`:

    validates :avatar, attachment_size: 0..1.megabyte

=== Content Type

Validates the content type of the attached `Blob` object:

    validates :avatar, attachment_content_type: { in: %w[image/jpeg image/png] }
    validates :avatar, attachment_content_type: { not: %w[application/pdf] }

Also accepts a `Array` or `String` as a shortcut option for `:in`:

    validates :avatar, attachment_content_type: %w[image/jpeg image/png]
    validates :avatar, attachment_content_type: "image/jpeg"

=== Validation Helper

This commit also provides a more readable validation helper named `validates_attachment()`
which provides the same functionality as `validates()` but does not require the
`attachment_` prefix on keys:

    validates_attachment :avatar, size: { in: 0..1.megabyte }, content_type: "image/jpeg"
@abhchand
Copy link
Contributor Author

@connorshea - Thanks so much for proofreading! Incorporated all your suggested changes 👍

@reckerswartz
Copy link

@abhchand
do you want help in adding ?

  • validates if file(s) attached / Validation Helper
  • validates number of uploaded files (min/max required) / Validation Helper

@connorshea
Copy link
Contributor

I'd say for this PR we should keep the scope limited, adding more validations in separate PRs would make it much easier to review and probably more likely to be merged.

@abhchand
Copy link
Contributor Author

abhchand commented Mar 4, 2019

Agreed with @connorshea on keeping this PR small, but those are great ideas for a follow up PR @reckerswartz 💯

@abhchand
Copy link
Contributor Author

abhchand commented Mar 8, 2019

Hey @georgeclaghorn 👋 - You had reviewed this once before, just checking back in to see if you had any further requested changes? Thanks!

@reckerswartz
Copy link

any update on getting this merged?

@connorshea
Copy link
Contributor

@georgeclaghorn apologies for another ping, could you comment on whether this could make it into Rails 6? I'm fine with waiting, I just want confirmation :)

@georgeclaghorn
Copy link
Contributor

Sorry, I’m afraid it probably isn’t going to make Rails 6. The first RC is coming any day now.

@connorshea
Copy link
Contributor

I'll look forward to 6.1 then :) Thanks for answering.

@krbullock
Copy link

@abhchand @connorshea What do you think of releasing this as a gem until it can be merged into core? The existing active_storage_validations gem has some shortcomings as you've mentioned, including (in my limited testing) a bug that allows an invalid attachment to replace an existing one. I'm willing to help with publishing and maintaining it as a gem.

@reckerswartz
Copy link

@krbullock if possible please open issue active_storage_validations about the bug. I am sure @igorkasyanchuk will help you get the bug fix until this merge into rails, there are a lot of users using active_storage_validations gem who are running rails 5. we are happy to provide support for them too instead of creating a new gem. 👨

@abhchand
Copy link
Contributor Author

👍 on keeping this in rails core versus a new gem. In fact the initial reasoning for opening this was for the validations to be part of rails core and community maintained.

Anyone know what the timeline for 6.1 is? Just curious, and wasn't sure where to look that up.

Thanks again to everyone's contributions here so far!

@connorshea
Copy link
Contributor

@georgeclaghorn since Rails 6.0.0 final seems like it should be landing quite soon and 6.1 development is (I think?) in full swing now, could this get a review? :)

@rails-bot
Copy link

rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 17, 2019
@connorshea
Copy link
Contributor

@georgeclaghorn @rafaelfranca Hi, sorry for the ping. I wanted to ask again about getting this PR reviewed since it's been a few months since I asked. It has a conflict in the Changelog, but other than that it should work fine.

cc: @abhchand in case you're around to rebase it.

@rails-bot rails-bot bot removed the stale label Dec 18, 2019
@georgeclaghorn
Copy link
Contributor

Sorry for the delay. I’ve been procrastinating on a longer response to this.

The long and short of it is that @dhh and I discussed Active Storage validations a while ago and weren’t comfortable with shipping an implementation that doesn’t cover direct uploads, which are supposed to be the common case. We’re unlikely to merge PRs that implement Active Storage validations until we at least have a plan to address direct uploads.

@murdoch
Copy link

murdoch commented Dec 20, 2019

What can people do to help get this moving?
How do you validate direct uploads in Basecamp?
Could we borrow code from there?
Is there an existing front-end technology we can borrow tests from?

rjlynch added a commit to DFE-Digital/curriculum-materials that referenced this pull request Feb 21, 2020
Note:
AS validations aren't built into rails as they won't work with the AS
direct upload flow (client js puts file to signed url)
See this comment
rails/rails#35390 (comment)
I don't think this will be an issue for us as, at the minute atleast, it
seems like we'll be pushing file into our api, rather than havign client
js push it into our bucket.
Worth noting this potential gotcha around validations though.
rjlynch added a commit to DFE-Digital/curriculum-materials that referenced this pull request Feb 21, 2020
Cmb 476/create resource model

Context
We needed a model to represent an activity's resources.

Changes proposed in this pull request
Adds teacher and pupil resources as has_many_attached on the activity model.

Guidance to review
Based on the current state of the prototype it looks like active_storage models give us all the attributes we need for displaying information about an activities resources.
I've also added some basic validations for attachments, it's worth noting that these validations wont run if using the active storage direct client upload, but given what we know about the app so far I don't expect us to have js doing direct uploads from the client (see rails/rails#35390 (comment)).

I'll open a separate pr and issue(#72) to handle adding file uploads to the api.
@bsodmike
Copy link
Contributor

Bump, this would be nice to have :)

@georgeclaghorn
Copy link
Contributor

No bumps needed here, "just" a champion. If you’re interested, put together a well-considered pitch for a validation implementation that covers both direct and indirect uploads. Share here or in the rubyonrails-core mailing list.

@rails-bot
Copy link

rails-bot bot commented May 28, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label May 28, 2020
@rails-bot rails-bot bot closed this Jun 4, 2020
@stevenbuccini
Copy link

stevenbuccini commented Jun 5, 2020

Hello all, I would like to volunteer myself as tribute. I would love to hear your thoughts on the following proposal, specifically @georgeclaghorn in his role as the maintainer of this project.

Background

There is demonstrated interest among Rails users in adding validations to uploaded files. Developers seem most interested in validating content type and size requirements, so we will use this as our starting point. However, any implementation should provide a generic framework so additional validations can be provided in the future.

One major issue is that there is no way to validate files before they are uploaded to the service. Any implementation would ideally analyze the file before beginning the upload process.

Indirect uploads

I propose using this PR as the basis for validating uploads. It is based on battle-tested concepts and code from Paperclip (ActiveStorage's predecessor) and the active_storage_validations. Seeing that there haven't been any strong objections to either the technical approach or implementation, I do not believe we need to spend additional time discussing this aspect of the change.

Direct uploads

Rails maintainers have stated that this supposed to be the common path for handling file uploads in Rails going forward. We note that client-side validations are helpful regardless of whether or not direct uploads are being used.

However, we also note that client-side validations are inherently insecure. The user has complete control over the code and can construct adversarial payloads that hit the upload endpoint directly. Individual cloud providers may provide ways to enforce certain validations (including content type and size, which are the subject of these validations), but we will not address or rely on those features here as we wish to remain platform-agnostic.

In summary, we propose a method for doing "best-effort" client-side validations with the understanding they will not provide any security guarantees.

We propose extending the file_field form helper to accept a validates key with a value of type Hash.

<%= form.file_field :file,
     validates: { 
         size: {
             in: <numeric range>
             min: <integer> || default: 0
             max: <integer> || default: Infinity
         },
         content_type: <string> || <Array of strings>
      }
 %>

This would allow the developer to re-use the same constants they might use on the backend (e.g. MAX_FILE_SIZE = 1.megabyte) in their client-side validation. We would endeavor to replicate the inputs and behavior already laid out for the indirect upload case in this PR.

The code for these validations would be included in the @rails/activestorage NPM package in a new file, validators.js. When the input is being rendered by ActiveView, we can parse these arguments. For each validation, we will add an event listener onto the already-specified direct-upload:initialize event, with the callback being the appropriate function in validators.js.

If the validators fail, we fire a direct-upload:error event with an appropriate message so that the client can notify the user appropriately.

Path Forward

Indirect and direct uploads provide inherently different security postures. Indirect uploads take a latency and compute hit in order to guarantee certain assumptions. Direct uploads can never be relied upon to provide any sort of guarantee. For that reason, I view lack of validators for indirect uploads as a more pressing issue than the lack of direct upload validators. I suggest we attempt to move ahead with getting this approved and merged while we proceed on developing direct upload validations.

Additionally, if it is the view of the maintainers is indeed that the direct uploads "are supposed to be the common case", I would advocate for explicitly mentioning this in the documentation, especially if that will be guiding Rails-core development and design choices going forward as it is in this PR. In my opinion, with the way the ActiveStorage guide is currently laid out, direct uploads appears to be the alternative to the indirect upload option.

Thanks for your consideration, and I welcome any and all feedback.

@leesmith
Copy link

leesmith commented Jun 5, 2020

@stevenbuccini Indirect and direct uploads provide inherently different security postures. Indirect uploads take a latency and compute hit in order to guarantee certain assumptions. Direct uploads can never be relied upon to provide any sort of guarantee.

For those very reasons, maybe it's best that these solutions be separate gems. Even if this PR doesn't get merged, you can still use @igorkasyanchuk's gem that doesn't have all the complexity and overhead that validating direct uploads will require. That's complexity I won't need so that's a win in my book.

@SampsonCrowley
Copy link
Contributor

SampsonCrowley commented Nov 6, 2020

@here sorry to necrobump, but I don't see why client-side validations should be involved at all for implementing server-side validations. DirectUpload client validations should be their own separate issue/PR.

Regardless of direct or indirect, at save time, the validations on the given blob should be the same for the server.

@georgeclaghorn
Copy link
Contributor

My previous comment was unclear. Direct and indirect upload validation don’t need to be implemented together. But direct uploads are intended to be the golden path, so we need to have a straight shot at direct upload validation before we’re going to accept a PR for indirect upload validation.

I have some thoughts on the previously suggested approaches that I haven’t written up yet.

@SampsonCrowley
Copy link
Contributor

SampsonCrowley commented Nov 6, 2020

@georgeclaghorn my point is that indirect vs direct is completely irrelevant to the only feature missing that is absolutely needed.

Client-side validation of files is already trivial, and mostly dependent on your front end. Really as simple as accept=mime_types for even those without javascript knowledge. not to mention only icing on the cake, as all front end validation is insecure by definition.

The only missing feature here is that when it comes time to save the records, we don't have a "rails" way to verify that what was uploaded is valid; and validations in the model shouldn't need to know at all whether the file was indirectly or directly uploaded. The only validations that matter in terms of security are at the server level, and those validations shouldn't care how the file was uploaded, because the server has access to the file in it's uploaded location

If someone wants to add an easier way to validate on the client side, then great, but it doesn't have anything to do with what's needed to have validations for attachments at the server level

@SampsonCrowley
Copy link
Contributor

I guess I just don't see why an inherently insecure feature should be blocking a glaring gap in what is validatable by the server

@georgeclaghorn
Copy link
Contributor

The only missing feature here is that when it comes time to save the records, we don't have a "rails" way to verify that what was uploaded is valid; and validations in the model shouldn't need to know at all whether the file was indirectly or directly uploaded.

We want to validate blobs before uploading them, so as to avoid storing invalid blobs. The goal isn’t just refusing to create DB records for invalid blobs, it’s refusing to store the invalid blobs themselves.

The only validations that matter in terms of security are at the server level

I agree. “Direct upload validation” and “client-side-only validation” aren’t synonyms, and any proposal for direct upload validation will need to speak to this.

@SampsonCrowley
Copy link
Contributor

We want to validate blobs before uploading them, so as to avoid storing invalid blobs. The goal isn’t just refusing to create DB records for invalid blobs, it’s refusing to store the invalid blobs themselves.

The only way to do that is on the client side. And while I'm in absolutely no way against that being an additional feature, I don't agree that something that can only be validated on the client side should block the core validations that would still need to be ran after it was validated and uploaded from being implemented.

In terms of efficiency, being able to prevent bad uploads before they start is absolutely a goal to work toward, but doesn't change the fact that the files that would be allowed in those cases would still need to then be validated post-upload the same way as indirect uploads.

The base server-side validations are the building blocks and much more important than an efficiency upgrade

@SampsonCrowley
Copy link
Contributor

Start with protecting the database, then add the ability to avoid the upload before it starts

@georgeclaghorn
Copy link
Contributor

I’ve already clarified that parity between direct and indirect uploads is important to us. We’re not going to ship a validation feature that straight-up ignores the golden upload path, so this will continue to wait on a proposal for direct upload validation.

@SampsonCrowley
Copy link
Contributor

Server level validations that work for both methods is not ignoring the "golden path" in any way whatsoever. It is implementing the more important protections first

@SampsonCrowley
Copy link
Contributor

Preventing uploads before they start is an optimization.

You shouldnt prevent all validations in the name of an optimization

@georgeclaghorn
Copy link
Contributor

I don’t agree that the database is the most important resource to protect here, or that refusing to store invalid blobs is an optimization, for the reasons I’ve outlined above. That means I also don’t agree this “work[s] for both methods.” It’s just not going to fly for the golden path to store invalid blobs and only refuse to create DB records.

I’ve already said that I had a proposal in mind to write up. Here are the high-level points:

  • To start, we’d offer validations for the most commonly-validated blob attributes, content_type and byte_size. We could support validating filename, too, but I can’t think of a reason I’d want to validate it. Rough API sketch:

    validates_attached :avatar, content_type: { format: { with: /\Aimage\//, message: "must be an image" } }
    validates_attached :avatar, byte_size: { maximum: 50.megabytes, message: "can't be larger than 50 MB" }
  • The direct upload preflight blob creation request already sends content_type and byte_size attributes. We’d validate these attributes upfront, refusing to create a blob record and return a signed direct upload URL if they’re invalid. (This part is what you proposed, basically.)

  • Yes, because only server-side validation is secure, we’d need to validate content_type again on attach, after sniffing the uploaded blob’s magic number. If the sniffed content_type is invalid, we’d refuse to attach the blob and purge it.

  • I don’t think we need to validate byte_size again on attach because signed direct upload URLs account for the Content-Length header. If I’m wrong about that and we have to validate byte_size again, fine. Same as above.

Now that I wrote this summary, I think this may be pretty close to what you had in mind. Sorry if I was talking past you.

I’m focused on other things at the moment, so I won’t be able to flesh this out into a fuller proposal or a PR for a while. If someone else wants to run with it, please do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants