-
Notifications
You must be signed in to change notification settings - Fork 21.8k
Always save ActiveStorage::Blob before uploading to service #34827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always save ActiveStorage::Blob before uploading to service #34827
Conversation
@@ -53,19 +53,21 @@ def find_signed(id) | |||
# When providing a content type, pass <tt>identify: false</tt> to bypass automatic content type inference. | |||
def build_after_upload(io:, filename:, content_type: nil, metadata: nil, identify: true) | |||
new(filename: filename, content_type: content_type, metadata: metadata).tap do |blob| | |||
blob.reserve_key! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this makes the documentation for this method ("returns new, unsaved blob instance") incorrect and the method name misleading.
I think that if the possible key collision is an issue, the build_*
methods should be removed - and we should instead corresponding create_*
methods.
I If we are creating before uploading I don't think that the reserve_key!
method would still be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if the possible key collision is an issue, the build_* methods should be removed - and we should instead corresponding create_* methods.
I would vote for that too. Don't know how much "public-ness" is attached to those build methods though, so not my call to make. "Create before upload always" is the proposed solution anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I’m okay with removing build_after_upload
. build_after_unfurling
will likely need to stay to preserve desirable behaviors of the attaching API, but it’s strictly internal and used safely.
blob.upload(io, identify: identify) | ||
end | ||
end | ||
|
||
def build_after_unfurling(io:, filename:, content_type: nil, metadata: nil, identify: true) #:nodoc: | ||
new(filename: filename, content_type: content_type, metadata: metadata).tap do |blob| | ||
blob.reserve_key! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn’t need to reserve the resulting blob’s key. In fact, we don’t want it to do so: it’s used to build a blob that is saved with an attached record and uploaded only after the create transaction commits. If that transaction rolls back, the unused blob should not be left behind in the DB.
We can “just” skip reserving the key here since our usage of this method means we’re protected by the uniqueness constraint on the key
column.
# time), while having an open database transaction. | ||
# Returns a saved blob instance after the +io+ has been uploaded to the service. Note, the blob is saved twice, | ||
# first right before the +io+ is uploaded, and then once the metadata of the blob becomes known. This is necessary to | ||
# correctly reserve keys on the destination storage. | ||
# When providing a content type, pass <tt>identify: false</tt> to bypass automatic content type inference. | ||
def create_after_upload!(io:, filename:, content_type: nil, metadata: nil, identify: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method’s name no longer matches its behavior. create_and_upload!
?
@georgeclaghorn I'll try to decipher
Do I understand correctly that in case of
I don't see it used from anywhere but the Blob class itself, so that's a 👍 from me if agreement has been reached. |
That’s correct. |
Ok, so 🔪 |
👍 |
Will do. What can we do for those still on Rails 5 in this situation? |
We can backport this without the method removals: add |
@georgeclaghorn if there is a |
I’m fine with not recovering. |
c3d2ca4
to
67794c4
Compare
@georgeclaghorn Hopefully all good now. I rebased to take the create blob calls in ActionText tests along. |
Any chance of this making 6.0? |
@georgeclaghorn Any chance of getting this to move forward? |
Any word on when this will be merged? It's been causing a lot of problems in my app. |
@cpsoinos on Rails 5 I use this patch for now but indeed would be great if this could get merged (and hopefully even backported). https://gist.github.com/julik/4195b74774dc21dcd2cbd694a25102ce |
@julik thanks for the gist! After seeing this PR I implemented a patch myself that essentially runs the logic you have in |
@georgeclaghorn is there anything I can do? |
@georgeclaghorn Would you like me to rebase and update? |
Hi Julik, Sorry for the delay here. Please do rebase. Last thing: on revisiting this, I don’t think we can remove |
Glad we can back to 🔧 on this. Can I make |
I would personally prefer to restore them with their original behavior, but I don’t feel strongly either way. |
Change implementation of create_after_upload!, and rename it to create_and_upload! Always save the Blob to the database first, and upload to the service afterwards - to prevent randomly-generating storage keys overwriting existing files before the database constraint can fire.
67794c4
to
15f603a
Compare
I opted for aliasing because the original behaviour of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a changelog entry?
Backported to 6-0-stable (without the deprecations) in 7e23c05. |
Summary
Closes #34805
This PR will be contentious as there is an assumption made that it is possible, with a sufficiently random
key
, to permit the user to upload to the storage without reserving that key first. I believe the assumption is flawed because it makes the following situation possible:key
as the legal document user A has already uploaded.key
so that saving the comment is fast once the user is done writing. It does receive an error from the ActiveStorage endpoint and tries again. The legal document of the user A is at this point overwritten, and at this stage has been irrecoverably lost. The legal document has been replaced with the NSFW picture.So I believe, however unlikely (the value spread for the
key
is very very large indeed) this introduces a possibility of a very sneaky data corruption, which would overwrite data uploaded by the users of the application. This is exacerbated by the fact that when you generate a pre-signed URL to an external service the service will most likely not verify the checksum for you. This means that a person meaning to download the legal document of User A is going to receive the NSFW picture from User B, without any warning.I know the solution proposed here was probably not the one desired (not holding a DB transaction during upload is a valid requirement) but I believe having risks of silent user data clobbering is worse.
There are a few ways to solve this problem:
I think part of the reason for this is the method naming for
generate_unique_secret_token
, because despite it is name it offers no actual guarantees of uniqueness - and it does recommend adding a database constraint on top. The problem here is rather we deliberately allow a state change on an external service which cannot be rolled back before allowing this index to fire.I've left the test failing for now because they make assumptions about how
build...
is supposed to work in this case. I do believe that these assumptions should be changed to prevent user data corruption, but would like to poll first.If the concern is holding a DB transaction open maybe the scope of this transaction should somehow be reduced to key reservation only, I just don't know if ActiveRecord has this facility or not.
Empirical evidence - I work for a company that does lots of file uploads, and I did see wrong things happen, more than once. Some of those wrong things did involve probabilities and preemptive key generation.