Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store newly-uploaded files on save rather than assignment #33303

Merged
merged 7 commits into from Jul 8, 2018

Conversation

@georgeclaghorn
Copy link
Contributor

@georgeclaghorn georgeclaghorn commented Jul 6, 2018

At present, when an uploaded file is assigned to a record via a writer method added by has_one_attached or has_many_attached, it’s persisted to storage immediately. For example, the following causes an uploaded file in params[:avatar] to be stored:

@user.avatar = params[:avatar]

We’d like to support validating blobs attached to records, but for validations to be effective, they ought to prevent invalid files from being stored. It’s of little use to identify an invalid file after it’s already been shipped off to storage: you might use a size validation to limit the cost that a single file can add to your AWS bill, but if the file is stored before validations run, you incur its cost regardless.

This PR changes Active Storage to store attachables assigned to a record after the record is saved rather than immediately. To that end, it introduces change tracking for Active Storage attachments. When you use a writer method added by has_one_attached or has_many_attached, Active Storage tracks a pending change to the record’s attachments. The change is persisted in the application’s database when the record is saved, and the attachables involved are uploaded to storage only after the save transaction commits.

Included are supplementary reorganization and expansion of the attachment API’s tests.

@georgeclaghorn georgeclaghorn force-pushed the activestorage-change-tracking branch Jul 6, 2018
@georgeclaghorn georgeclaghorn force-pushed the activestorage-change-tracking branch 3 times, most recently Jul 6, 2018
@georgeclaghorn georgeclaghorn force-pushed the activestorage-change-tracking branch to 62856d1 Jul 6, 2018
Copy link
Member

@kaspth kaspth left a comment

Nice! I attempted a cursory review 😄

end
CODE

has_one :"#{name}_attachment", -> { where(name: name) }, class_name: "ActiveStorage::Attachment", as: :record, inverse_of: :record, dependent: false
has_one :"#{name}_attachment", -> { where(name: name) }, class_name: "ActiveStorage::Attachment", as: :record, inverse_of: :record, dependent: :destroy

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Does this need a deprecation cycle?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

No, it doesn’t change behavior. We did this before by calling purge_later or detach on destroy.

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Got it, so it's just more obvious now.


after_commit(on: [:create, :update]) do
begin
public_send("#{name}_change").try(:upload)

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Why not use &.upload like above?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

The Delete* changes don’t respond to upload. They could implement upload and do nothing, and I had them do that originally, but it seemed odd to me.

@@ -10,11 +10,11 @@ class Attached::One < Attached
# You don't have to call this method to access the attachment's methods as
# they are all available at the model level.
def attachment
record.public_send("#{name}_attachment")
change.present? ? change.attachment : record.public_send("#{name}_attachment")

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

change&.attachment || record.public_send("#{name}_attachment")?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

Tricky: we don’t want to fall back to the attachment in the DB when a Delete* change is pending.


blob_was.purge_later if blob_was && dependent == :purge_later
if !attached? || new_blob != blob
write_attachment build_attachment(blob: new_blob)

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Should this write a "change" to the accessor? I don't understand how calling attach would now ensure the Attachment got uploaded?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

We can explore implementing attach in terms of the writer in the future. I chose to preserve its existing behavior for now. It uses create_blob_from, which uploads as necessary.

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Ah, got it 👍 There's a fair amount of different flows here 😄

def build_attachment(blob:)
ActiveStorage::Attachment.new(record: record, name: name, blob: blob)
Attachment.new(record: record, name: name, blob: blob)

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

Could also strip ActiveStorage:: from the change classes.

@@ -15,17 +15,18 @@ class ActiveStorage::Attachment < ActiveRecord::Base
delegate_missing_to :blob

after_create_commit :analyze_blob_later, :identify_blob
after_destroy_commit :purge_dependent_blob_later

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

We're switching to delete below because of this, right? Is this callback just to ensure that we always honor the dependent: :destroy from the macros?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

Yes. This, along with the switch to dependent: :destroy on attachment associations, keeps changes from having to track removed attachments and purge them on commit: they can just replace the associated attachments and let AR destroy the removed ones.

end
def clear_#{name}_change
@active_storage_attached_#{name}_change = nil

This comment has been minimized.

@kaspth

kaspth Jul 7, 2018
Member

It might be nice to replace these ivars with a private/nodoc'ed attachment_changes hash in the vein of Active Model's changes. Then the change classes could use that and skip the need for public_send.

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 7, 2018
Author Contributor

Agree. I'll hook that up.

@georgeclaghorn georgeclaghorn force-pushed the activestorage-change-tracking branch to 5ff2904 Jul 8, 2018
@georgeclaghorn georgeclaghorn merged commit e8682c5 into master Jul 8, 2018
3 checks passed
3 checks passed
codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@georgeclaghorn georgeclaghorn deleted the activestorage-change-tracking branch Jul 8, 2018

def upload
case attachable
when ActionDispatch::Http::UploadedFile, Rack::Test::UploadedFile

This comment has been minimized.

@exviva

exviva Jul 13, 2018
Contributor

Shouldn't we first check if these constants are defined? Are we sure they're guaranteed to be required by now?

This comment has been minimized.

@georgeclaghorn

georgeclaghorn Jul 13, 2018
Author Contributor

Are we sure they're guaranteed to be required by now?

Yes. They’re required in lib/active_storage/attached.rb.

This comment has been minimized.

@exviva

exviva Jul 13, 2018
Contributor

Cool, thanks!

@yves-vogl
Copy link

@yves-vogl yves-vogl commented Jul 17, 2018

Thanks!

@tobscher
Copy link

@tobscher tobscher commented Sep 14, 2018

@georgeclaghorn That is some excellent work to improve ActiveStorage. It looks like this would potentially fix issue #31985.

If this fixes the issue I mentioned before is there any way to benefit from this fix prior to Rails 6?

cbothner added a commit to galahq/gala that referenced this pull request Oct 12, 2018
In Rails < 6, attachment changes are persisted as soon as they are
assigned, regardless of whether the record they’re attached to is
successfully saved. That’s why Rails < 6 doesn’t support validation of
attachments. We need to, but the model is left in an inconsistent state
if validation fails: the old blob is deleted as soon as the new
attachment is assigned, but if validation fails the old attachment is
preserved, and points to a dead blob.

Therefore, we’re monkey patching rails to proactively delete any
extant attachments that point to a blob before purging it. This will
only be necessary until Rails 6 comes out, because of
rails/rails#33303
abhaynikam referenced this pull request Apr 23, 2019
In Rails updating an Active Storage relation will now replace the entire
association instead of merely adding to it.
#35817 (comment)

Fixes #35817

cc @georgeclaghorn
@@ -9,7 +9,7 @@ class Attached::Many < Attached
#
# All methods called on this proxy object that aren't listed here will automatically be delegated to +attachments+.
def attachments
record.public_send("#{name}_attachments")
change.present? ? change.attachments : record.public_send("#{name}_attachments")

This comment has been minimized.

@glaszig

glaszig May 2, 2019
Contributor

this either returns an Array or ActiveStorage::Attachment::ActiveRecord_Associations_CollectionProxy which makes the interface quite a moving target. with the latter i can do something like attachments.where(...) whereas that breaks when getting an array.

suketa added a commit to suketa/rails_sandbox that referenced this pull request May 18, 2019
Store newly-uploaded files on save rather than assignment
rails/rails#33303
suketa added a commit to suketa/rails_sandbox that referenced this pull request May 24, 2019
Store newly-uploaded files on save rather than assignment
rails/rails#33303
@pjmartorell
Copy link

@pjmartorell pjmartorell commented Jun 4, 2019

@georgeclaghorn Is there any plan to add this fix/feature to Rails 5.2 as well? it would be really appreciated 🙏

claudioclutter added a commit to clutter/mdma that referenced this pull request Jun 6, 2019
claudioclutter added a commit to clutter/mdma that referenced this pull request Jun 6, 2019
In Rails 6, the internals of Active Storage have changed: files
are only uploaded after a model is saved. Specifically the are
uploaded inside an `after_commit_update`, see:

rails/rails#33303

Since our code also uses `after_commit_update` to run the job
that analyzes that file to extract the manifest, we have to
make sure that our own callback is run **after** the Active Storage
one. Adding `prepend: true` accomplishes that.
@inopinatus
Copy link
Contributor

@inopinatus inopinatus commented Sep 10, 2019

This PR broke the usual cycle of validation before save.

The justification appears to be cost optimisation for folks like Github and Basecamp.

There is now no official way to access content prior to upload.

I’m now seeing repeated “wtf?”-level questions as a result in a variety of public and private fora.

There’s a job to finish here, because out in meat-and-potatoes business apps land where we just want to validate content before save as normal, programmer happiness has been diminished.

@georgeclaghorn
Copy link
Contributor Author

@georgeclaghorn georgeclaghorn commented Sep 10, 2019

This PR broke the usual cycle of validation before save.

This PR made it possible to validate file uploads at all. Previously, files were indiscriminately uploaded before validation.

The justification appears to be cost optimisation for folks like Github and Basecamp.

No.

There is now no official way to access content prior to upload.

There wasn’t a way to access pre-upload file data in 5.2, either.

I’m now seeing repeated “wtf?”-level questions as a result in a variety of public and private fora.

There’s a job to finish here, because out in meat-and-potatoes business apps land where we just want to validate content before save as normal, programmer happiness has been diminished.

I’m going to take a look eventually, but I’ve been repeatedly annoyed by your tone on here. Adjust it if you’d like to participate further.

Edit: I misread Josh’s comment as a demand directed at me. I apologize for reacting so defensively.

@inopinatus
Copy link
Contributor

@inopinatus inopinatus commented Sep 11, 2019

There wasn’t a way to access pre-upload file data in 5.2, either.

Just to be clear, since there might be a terminology misunderstanding here:

  • By "save", I mean precisely calling #save on a model object, and not any more general notion, and definitely not as as a synonym for uploading a blob; and

  • By "validate", I mean using Active Model validation callbacks to check the validity of a model's attributes prior to its insertion into the database, and preventing that insertion if validation fails, and including in this the desire to check the validity and acceptability of the content of attachment fields on that model.

In 5.2.3, there was no moment of "pre-upload" at all. As you say, attachments are always uploaded straight away. As a result, blob#download always works if there is an attachment at all, and this validation has been working perfectly well in 5.2.3:

class ImageValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    unless value.attached?
      record.errors.add attribute, :blank
      return
    end

    if options[:size]
      record.errors.add attribute, :size unless options[:size] === value.byte_size
    end

    if options[:content]
      record.errors.add attribute, :content unless options[:content] === value.download
    end

    record.errors.add attribute, :unrepresentable unless value.representable?
  end
end

class User < ApplicationRecord
  class ContentCheck
    # Dummy content check for demonstration purposes.
    def ===(content)
      Digest::SHA256.hexdigest(content) =~ /\A[a-f]/
    end
  end

  has_one_attached :avatar
  validates :avatar, image: { size: ...1.megabyte, content: ContentCheck.new }

  #...
end

In 6.0.0, this throws an exception in the validator when it hits value.download. The context here is super mundane: submitting a form with a file field to a #create action which does @user = User.new(params) and then if @user.save and so on. As with so many business apps, it really is that dull and ordinary a case.

To put it simply, I am only concerned about having access to the content, whether it is in its final resting place or not, and perceive the loss of that as a regression. It is the model's validation-before-save that matters to us, and not whether or not a loose blob made it to S3 and needs cleaning up later.

I can work around this in a few ways; the simplest, especially for those doing a 5.2 -> 6.0 upgrade, seems to be by using an undocumented internal API, c.f. #37005.

@inopinatus
Copy link
Contributor

@inopinatus inopinatus commented Sep 13, 2019

Edit: I misread Josh’s comment as a demand directed at me. I apologize for reacting so defensively.

George, my remark was evidently ambiguous in that regard and I apologise both for the miscommunication and that it stung you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants