Skip to content

Always save ActiveStorage::Blob before uploading to service #34827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

julik
Copy link
Contributor

@julik julik commented Dec 30, 2018

Summary

Closes #34805

This PR will be contentious as there is an assumption made that it is possible, with a sufficiently random key, to permit the user to upload to the storage without reserving that key first. I believe the assumption is flawed because it makes the following situation possible:

  • User A uploads a PDF of an important legal document (I'm exaggerating, on purpose)
  • 4 months pass
  • User B edits his comment on something totally unrelated and ponders adding a NSFW picture as an attachment to that comment. Because the user A is unlucky, the picture gets issued the same key as the legal document user A has already uploaded.
  • The text editing control user B is using preemptively uploads the file under the key so that saving the comment is fast once the user is done writing. It does receive an error from the ActiveStorage endpoint and tries again. The legal document of the user A is at this point overwritten, and at this stage has been irrecoverably lost. The legal document has been replaced with the NSFW picture.

So I believe, however unlikely (the value spread for the key is very very large indeed) this introduces a possibility of a very sneaky data corruption, which would overwrite data uploaded by the users of the application. This is exacerbated by the fact that when you generate a pre-signed URL to an external service the service will most likely not verify the checksum for you. This means that a person meaning to download the legal document of User A is going to receive the NSFW picture from User B, without any warning.

I know the solution proposed here was probably not the one desired (not holding a DB transaction during upload is a valid requirement) but I believe having risks of silent user data clobbering is worse.

There are a few ways to solve this problem:

  • Only use storage backends which support "upload here or error if already taken". Off the top of my head only Redis supports this with SETNX, and then again you need to hold the entire set in memory in Redis for this to work. S3 will happily overwrite on any PUT (it will also overwrite your checksum value)
  • Pre-reserve keys using a deterministic sequence number obtained from the database. I tried this and it causes deadlocks on MySQL and is hard to do if your database does not have native cheap sequences. PostgreSQL does but most other databases do not.
  • And the last option: insert the row into the database before allowing the upload to take place, so that the database observes the constraint and refuses if the key is taken.

I think part of the reason for this is the method naming for generate_unique_secret_token, because despite it is name it offers no actual guarantees of uniqueness - and it does recommend adding a database constraint on top. The problem here is rather we deliberately allow a state change on an external service which cannot be rolled back before allowing this index to fire.

I've left the test failing for now because they make assumptions about how build... is supposed to work in this case. I do believe that these assumptions should be changed to prevent user data corruption, but would like to poll first.

If the concern is holding a DB transaction open maybe the scope of this transaction should somehow be reduced to key reservation only, I just don't know if ActiveRecord has this facility or not.

Empirical evidence - I work for a company that does lots of file uploads, and I did see wrong things happen, more than once. Some of those wrong things did involve probabilities and preemptive key generation.

@@ -53,19 +53,21 @@ def find_signed(id)
# When providing a content type, pass <tt>identify: false</tt> to bypass automatic content type inference.
def build_after_upload(io:, filename:, content_type: nil, metadata: nil, identify: true)
new(filename: filename, content_type: content_type, metadata: metadata).tap do |blob|
blob.reserve_key!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this makes the documentation for this method ("returns new, unsaved blob instance") incorrect and the method name misleading.

I think that if the possible key collision is an issue, the build_* methods should be removed - and we should instead corresponding create_* methods.

I If we are creating before uploading I don't think that the reserve_key! method would still be required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if the possible key collision is an issue, the build_* methods should be removed - and we should instead corresponding create_* methods.

I would vote for that too. Don't know how much "public-ness" is attached to those build methods though, so not my call to make. "Create before upload always" is the proposed solution anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I’m okay with removing build_after_upload. build_after_unfurling will likely need to stay to preserve desirable behaviors of the attaching API, but it’s strictly internal and used safely.

blob.upload(io, identify: identify)
end
end

def build_after_unfurling(io:, filename:, content_type: nil, metadata: nil, identify: true) #:nodoc:
new(filename: filename, content_type: content_type, metadata: metadata).tap do |blob|
blob.reserve_key!
Copy link
Contributor

@georgeclaghorn georgeclaghorn Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn’t need to reserve the resulting blob’s key. In fact, we don’t want it to do so: it’s used to build a blob that is saved with an attached record and uploaded only after the create transaction commits. If that transaction rolls back, the unused blob should not be left behind in the DB.

We can “just” skip reserving the key here since our usage of this method means we’re protected by the uniqueness constraint on the key column.

# time), while having an open database transaction.
# Returns a saved blob instance after the +io+ has been uploaded to the service. Note, the blob is saved twice,
# first right before the +io+ is uploaded, and then once the metadata of the blob becomes known. This is necessary to
# correctly reserve keys on the destination storage.
# When providing a content type, pass <tt>identify: false</tt> to bypass automatic content type inference.
def create_after_upload!(io:, filename:, content_type: nil, metadata: nil, identify: true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method’s name no longer matches its behavior. create_and_upload!?

@julik
Copy link
Contributor Author

julik commented Jan 2, 2019

@georgeclaghorn I'll try to decipher

This method doesn’t need to reserve the resulting blob’s key. In fact, we don’t want it to do so: it’s used to build a blob that is saved with an attached record and uploaded only after the create transaction commits. If that transaction rolls back, the unused blob should not be left behind in the DB.

Do I understand correctly that in case of build_after_unfurling the actual upload takes place in an after_commit?

I think I’m okay with removing build_after_upload. build_after_unfurling will likely need to stay to preserve desirable behaviors of the attaching API, but it’s strictly internal and used safely.

I don't see it used from anywhere but the Blob class itself, so that's a 👍 from me if agreement has been reached.

@georgeclaghorn
Copy link
Contributor

Do I understand correctly that in case of build_after_unfurling the actual upload takes place in an after_commit?

That’s correct.

@julik
Copy link
Contributor Author

julik commented Jan 2, 2019

Ok, so 🔪 build_after_upload is what we go for?

@georgeclaghorn
Copy link
Contributor

👍

@julik
Copy link
Contributor Author

julik commented Jan 2, 2019

Will do. What can we do for those still on Rails 5 in this situation?

@georgeclaghorn
Copy link
Contributor

We can backport this without the method removals: add create_and_upload! and prefer it internally.

@julik julik changed the title [WIP/RFC] Reserve ActiveStorage keys before upload Always save ActiveStorage::Blob before uploading to service Jan 2, 2019
@julik
Copy link
Contributor Author

julik commented Jan 15, 2019

@georgeclaghorn if there is a RecordNotUnique should recovery be attempted? Or do we consider it a "once in a lifetime" event?

@georgeclaghorn
Copy link
Contributor

I’m fine with not recovering.

@julik julik force-pushed the activestorage-reserve-key-before-upload branch from c3d2ca4 to 67794c4 Compare January 15, 2019 02:10
@rails-bot rails-bot bot added the actiontext label Jan 15, 2019
@julik
Copy link
Contributor Author

julik commented Jan 15, 2019

@georgeclaghorn Hopefully all good now. I rebased to take the create blob calls in ActionText tests along.

@julik
Copy link
Contributor Author

julik commented Feb 16, 2019

Any chance of this making 6.0?

@julik
Copy link
Contributor Author

julik commented Apr 26, 2019

@georgeclaghorn Any chance of getting this to move forward?

@cpsoinos
Copy link

Any word on when this will be merged? It's been causing a lot of problems in my app.

@julik
Copy link
Contributor Author

julik commented Jun 1, 2019

@cpsoinos on Rails 5 I use this patch for now but indeed would be great if this could get merged (and hopefully even backported). https://gist.github.com/julik/4195b74774dc21dcd2cbd694a25102ce

@cpsoinos
Copy link

cpsoinos commented Jun 1, 2019

@julik thanks for the gist! After seeing this PR I implemented a patch myself that essentially runs the logic you have in create_and_upload!, though that's in the business-logic layer of my app. Currently monitoring how it does.

@julik
Copy link
Contributor Author

julik commented Aug 26, 2019

@georgeclaghorn is there anything I can do?

@julik
Copy link
Contributor Author

julik commented Sep 14, 2019

@georgeclaghorn Would you like me to rebase and update?

@georgeclaghorn
Copy link
Contributor

Hi Julik,

Sorry for the delay here. Please do rebase.

Last thing: on revisiting this, I don’t think we can remove .build_after_upload and .create_after_upload without deprecation. Can you restore them and add deprecation warnings? Then we can remove them in 6.2.

@georgeclaghorn georgeclaghorn self-assigned this Sep 22, 2019
@georgeclaghorn georgeclaghorn added this to the 6.1.0 milestone Sep 22, 2019
@julik
Copy link
Contributor Author

julik commented Sep 23, 2019

Glad we can back to 🔧 on this.

Can I make create_after_upload an alias for create_and_upload + deprecation (which it factually is)?

@georgeclaghorn
Copy link
Contributor

I would personally prefer to restore them with their original behavior, but I don’t feel strongly either way.

Change implementation of create_after_upload!, and rename
it to create_and_upload!

Always save the Blob to the database first, and upload to the
service afterwards - to prevent randomly-generating storage
keys overwriting existing files before the database constraint
can fire.
@julik julik force-pushed the activestorage-reserve-key-before-upload branch from 67794c4 to 15f603a Compare September 23, 2019 20:26
@julik
Copy link
Contributor Author

julik commented Sep 23, 2019

I opted for aliasing because the original behaviour of create_after_upload leads to the original issue (overwriting), while I believe just having it do something sensible instead and not break API consumers should be good enough ;-)

Copy link
Contributor

@georgeclaghorn georgeclaghorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a changelog entry?

@georgeclaghorn georgeclaghorn merged commit 3c35de7 into rails:master Sep 24, 2019
@georgeclaghorn
Copy link
Contributor

Backported to 6-0-stable (without the deprecations) in 7e23c05.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Activestorage does not reserve keys which can lead to overwritten blob contents
4 participants