Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store newly-uploaded files on save rather than assignment #33303

Merged
merged 7 commits into from Jul 8, 2018

Conversation

georgeclaghorn
Copy link
Contributor

@georgeclaghorn georgeclaghorn commented Jul 6, 2018

At present, when an uploaded file is assigned to a record via a writer method added by has_one_attached or has_many_attached, it’s persisted to storage immediately. For example, the following causes an uploaded file in params[:avatar] to be stored:

@user.avatar = params[:avatar]

We’d like to support validating blobs attached to records, but for validations to be effective, they ought to prevent invalid files from being stored. It’s of little use to identify an invalid file after it’s already been shipped off to storage: you might use a size validation to limit the cost that a single file can add to your AWS bill, but if the file is stored before validations run, you incur its cost regardless.

This PR changes Active Storage to store attachables assigned to a record after the record is saved rather than immediately. To that end, it introduces change tracking for Active Storage attachments. When you use a writer method added by has_one_attached or has_many_attached, Active Storage tracks a pending change to the record’s attachments. The change is persisted in the application’s database when the record is saved, and the attachables involved are uploaded to storage only after the save transaction commits.

Included are supplementary reorganization and expansion of the attachment API’s tests.

@georgeclaghorn georgeclaghorn force-pushed the activestorage-change-tracking branch 3 times, most recently from b8fbd4c to 37ab8f3 Compare July 6, 2018 22:04
Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I attempted a cursory review 😄

end
CODE

has_one :"#{name}_attachment", -> { where(name: name) }, class_name: "ActiveStorage::Attachment", as: :record, inverse_of: :record, dependent: false
has_one :"#{name}_attachment", -> { where(name: name) }, class_name: "ActiveStorage::Attachment", as: :record, inverse_of: :record, dependent: :destroy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a deprecation cycle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn’t change behavior. We did this before by calling purge_later or detach on destroy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, so it's just more obvious now.


after_commit(on: [:create, :update]) do
begin
public_send("#{name}_change").try(:upload)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use &.upload like above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Delete* changes don’t respond to upload. They could implement upload and do nothing, and I had them do that originally, but it seemed odd to me.

@@ -10,11 +10,11 @@ class Attached::One < Attached
# You don't have to call this method to access the attachment's methods as
# they are all available at the model level.
def attachment
record.public_send("#{name}_attachment")
change.present? ? change.attachment : record.public_send("#{name}_attachment")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change&.attachment || record.public_send("#{name}_attachment")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky: we don’t want to fall back to the attachment in the DB when a Delete* change is pending.


blob_was.purge_later if blob_was && dependent == :purge_later
if !attached? || new_blob != blob
write_attachment build_attachment(blob: new_blob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this write a "change" to the accessor? I don't understand how calling attach would now ensure the Attachment got uploaded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can explore implementing attach in terms of the writer in the future. I chose to preserve its existing behavior for now. It uses create_blob_from, which uploads as necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it 👍 There's a fair amount of different flows here 😄

def build_attachment(blob:)
ActiveStorage::Attachment.new(record: record, name: name, blob: blob)
Attachment.new(record: record, name: name, blob: blob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also strip ActiveStorage:: from the change classes.

@@ -15,17 +15,18 @@ class ActiveStorage::Attachment < ActiveRecord::Base
delegate_missing_to :blob

after_create_commit :analyze_blob_later, :identify_blob
after_destroy_commit :purge_dependent_blob_later
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're switching to delete below because of this, right? Is this callback just to ensure that we always honor the dependent: :destroy from the macros?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This, along with the switch to dependent: :destroy on attachment associations, keeps changes from having to track removed attachments and purge them on commit: they can just replace the associated attachments and let AR destroy the removed ones.

end

def clear_#{name}_change
@active_storage_attached_#{name}_change = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to replace these ivars with a private/nodoc'ed attachment_changes hash in the vein of Active Model's changes. Then the change classes could use that and skip the need for public_send.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I'll hook that up.

@georgeclaghorn georgeclaghorn merged commit e8682c5 into master Jul 8, 2018
@georgeclaghorn georgeclaghorn deleted the activestorage-change-tracking branch July 8, 2018 03:25

def upload
case attachable
when ActionDispatch::Http::UploadedFile, Rack::Test::UploadedFile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we first check if these constants are defined? Are we sure they're guaranteed to be required by now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure they're guaranteed to be required by now?

Yes. They’re required in lib/active_storage/attached.rb.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks!

@yves-vogl
Copy link

Thanks!

@tobscher
Copy link

@georgeclaghorn That is some excellent work to improve ActiveStorage. It looks like this would potentially fix issue #31985.

If this fixes the issue I mentioned before is there any way to benefit from this fix prior to Rails 6?

cbothner added a commit to galahq/gala that referenced this pull request Oct 12, 2018
In Rails < 6, attachment changes are persisted as soon as they are
assigned, regardless of whether the record they’re attached to is
successfully saved. That’s why Rails < 6 doesn’t support validation of
attachments. We need to, but the model is left in an inconsistent state
if validation fails: the old blob is deleted as soon as the new
attachment is assigned, but if validation fails the old attachment is
preserved, and points to a dead blob.

Therefore, we’re monkey patching rails to proactively delete any
extant attachments that point to a blob before purging it. This will
only be necessary until Rails 6 comes out, because of
rails/rails#33303
abhaynikam referenced this pull request Apr 23, 2019
In Rails updating an Active Storage relation will now replace the entire
association instead of merely adding to it.
#35817 (comment)

Fixes #35817

cc @georgeclaghorn
@@ -9,7 +9,7 @@ class Attached::Many < Attached
#
# All methods called on this proxy object that aren't listed here will automatically be delegated to +attachments+.
def attachments
record.public_send("#{name}_attachments")
change.present? ? change.attachments : record.public_send("#{name}_attachments")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this either returns an Array or ActiveStorage::Attachment::ActiveRecord_Associations_CollectionProxy which makes the interface quite a moving target. with the latter i can do something like attachments.where(...) whereas that breaks when getting an array.

suketa added a commit to suketa/rails_sandbox that referenced this pull request May 18, 2019
Store newly-uploaded files on save rather than assignment
rails/rails#33303
suketa added a commit to suketa/rails_sandbox that referenced this pull request May 24, 2019
Store newly-uploaded files on save rather than assignment
rails/rails#33303
@pjmartorell
Copy link

@georgeclaghorn Is there any plan to add this fix/feature to Rails 5.2 as well? it would be really appreciated 🙏

claudioclutter added a commit to clutter/mdma that referenced this pull request Jun 6, 2019
claudioclutter added a commit to clutter/mdma that referenced this pull request Jun 6, 2019
In Rails 6, the internals of Active Storage have changed: files
are only uploaded after a model is saved. Specifically the are
uploaded inside an `after_commit_update`, see:

rails/rails#33303

Since our code also uses `after_commit_update` to run the job
that analyzes that file to extract the manifest, we have to
make sure that our own callback is run **after** the Active Storage
one. Adding `prepend: true` accomplishes that.
@inopinatus
Copy link
Contributor

This PR broke the usual cycle of validation before save.

The justification appears to be cost optimisation for folks like Github and Basecamp.

There is now no official way to access content prior to upload.

I’m now seeing repeated “wtf?”-level questions as a result in a variety of public and private fora.

There’s a job to finish here, because out in meat-and-potatoes business apps land where we just want to validate content before save as normal, programmer happiness has been diminished.

@georgeclaghorn
Copy link
Contributor Author

georgeclaghorn commented Sep 10, 2019

This PR broke the usual cycle of validation before save.

This PR made it possible to validate file uploads at all. Previously, files were indiscriminately uploaded before validation.

The justification appears to be cost optimisation for folks like Github and Basecamp.

No.

There is now no official way to access content prior to upload.

There wasn’t a way to access pre-upload file data in 5.2, either.

I’m now seeing repeated “wtf?”-level questions as a result in a variety of public and private fora.

There’s a job to finish here, because out in meat-and-potatoes business apps land where we just want to validate content before save as normal, programmer happiness has been diminished.

I’m going to take a look eventually, but I’ve been repeatedly annoyed by your tone on here. Adjust it if you’d like to participate further.

Edit: I misread Josh’s comment as a demand directed at me. I apologize for reacting so defensively.

@inopinatus
Copy link
Contributor

inopinatus commented Sep 11, 2019

There wasn’t a way to access pre-upload file data in 5.2, either.

Just to be clear, since there might be a terminology misunderstanding here:

  • By "save", I mean precisely calling #save on a model object, and not any more general notion, and definitely not as as a synonym for uploading a blob; and

  • By "validate", I mean using Active Model validation callbacks to check the validity of a model's attributes prior to its insertion into the database, and preventing that insertion if validation fails, and including in this the desire to check the validity and acceptability of the content of attachment fields on that model.

In 5.2.3, there was no moment of "pre-upload" at all. As you say, attachments are always uploaded straight away. As a result, blob#download always works if there is an attachment at all, and this validation has been working perfectly well in 5.2.3:

class ImageValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    unless value.attached?
      record.errors.add attribute, :blank
      return
    end

    if options[:size]
      record.errors.add attribute, :size unless options[:size] === value.byte_size
    end

    if options[:content]
      record.errors.add attribute, :content unless options[:content] === value.download
    end

    record.errors.add attribute, :unrepresentable unless value.representable?
  end
end

class User < ApplicationRecord
  class ContentCheck
    # Dummy content check for demonstration purposes.
    def ===(content)
      Digest::SHA256.hexdigest(content) =~ /\A[a-f]/
    end
  end

  has_one_attached :avatar
  validates :avatar, image: { size: ...1.megabyte, content: ContentCheck.new }

  #...
end

In 6.0.0, this throws an exception in the validator when it hits value.download. The context here is super mundane: submitting a form with a file field to a #create action which does @user = User.new(params) and then if @user.save and so on. As with so many business apps, it really is that dull and ordinary a case.

To put it simply, I am only concerned about having access to the content, whether it is in its final resting place or not, and perceive the loss of that as a regression. It is the model's validation-before-save that matters to us, and not whether or not a loose blob made it to S3 and needs cleaning up later.

I can work around this in a few ways; the simplest, especially for those doing a 5.2 -> 6.0 upgrade, seems to be by using an undocumented internal API, c.f. #37005.

@inopinatus
Copy link
Contributor

Edit: I misread Josh’s comment as a demand directed at me. I apologize for reacting so defensively.

George, my remark was evidently ambiguous in that regard and I apologise both for the miscommunication and that it stung you.

@jbielick
Copy link

jbielick commented Feb 8, 2022

It would have been helpful for this information to be in the guides if not the changelog / upgrade guide itself. These changes were breaking and had no deprecation cycle. The work here is beneficial for many I'm sure, but the loop isn't closed on this one.

In my case, validations included analyzing the attached blob. Like other issues linked here, a FileNotFound became the error raising across our application/test suite.

@trevorturk
Copy link
Contributor

@jbielick please do consider improving the documentation if you're able! https://guides.rubyonrails.org/v5.1/contributing_to_ruby_on_rails.html#contributing-to-the-rails-documentation

@jbielick
Copy link

jbielick commented Feb 8, 2022

@trevorturk The last PR to add some information about this was closed #37005

I think closing the loop in the sense that I meant it would be providing a public API to retain existing behavior. Is there a workaround you know could be documented?

@trevorturk
Copy link
Contributor

No, I was just reacting to you saying "it would have been helpful for this information to be in the guides" and suggesting you improve the documentation if you're able. If you think changes to the code should be made, you should probably consider opening a new issue referencing this one, or ideally a pull request with your proposed changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants