Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return hashes of uploaded content for dav uploads #19351

Merged
merged 2 commits into from
Apr 15, 2020
Merged

Conversation

icewind1991
Copy link
Member

hashes are set in "X-Hash-MD5", "X-Hash-SHA1" and "X-Hash-SHA256" headers.

these headers are set for file uploads and the MOVE request at the end of a multipart upload.

cc @tobiasKaminsky

@icewind1991 icewind1991 added the 3. to review Waiting for reviews label Feb 7, 2020
@icewind1991 icewind1991 added this to the Nextcloud 19 milestone Feb 7, 2020
@tobiasKaminsky
Copy link
Member

Nice :-)
I unfortunately cannot answer my questions via looking into code:

  • will this fail upload, if checksum mismatch? (I do not think so)
  • how can I get the hash value for existing files?

@icewind1991
Copy link
Member Author

  • no
  • this pr doesn't add that

This only returns the hash of the uploaded content so clients can detect corrupted transfers

@rullzer
Copy link
Member

rullzer commented Feb 13, 2020

To give a bit more context ;)

will this fail upload, if checksum mismatch? (I do not think so)

No but you can detect it at least client side. And do something.

how can I get the hash value for existing files?

Adding this is tricky. Just imagine on a big instance and everybody starts to request the hash of several GB files. This could easily kill the server.

Long story short. This is a step in the right direction but not the full fix at all.

@tobiasKaminsky
Copy link
Member

will this fail upload, if checksum mismatch? (I do not think so)

No but you can detect it at least client side. And do something.

So I would upload the file, then download info of it, compare hash.
If not same, I have to remove it, and show an error to user.

That sounds a bit too complicated… Can't we adjust it so that server will compare hash and if it is not the same refuses the file to be moved and returns an error.

Adding this is tricky. Just imagine on a big instance and everybody starts to request the hash of several GB files. This could easily kill the server.

True. So we can reliable use this for all files, once each upload and change on server computes a hash sum?
(background of this this to have a reliable way that a file to upload does not exist in a folder)

@icewind1991
Copy link
Member Author

Keeping hashes for files stored on the server is a much more complex problem then just returning a hash from uploaded content.
In order to keep hashes for stored files we need to be 100% sure that we update (or invalidate) the hash for every change made to the file on the server

@rullzer
Copy link
Member

rullzer commented Feb 13, 2020

That sounds a bit too complicated… Can't we adjust it so that server will compare hash and if it is not the same refuses the file to be moved and returns an error.

That is a next step. But a lot more complicated and not covered in this PR.

@tobiasKaminsky
Copy link
Member

Keeping hashes for files stored on the server is a much more complex problem then just returning a hash from uploaded content.
In order to keep hashes for stored files we need to be 100% sure that we update (or invalidate) the hash for every change made to the file on the server

Totally understandable.

That is a next step. But a lot more complicated and not covered in this PR.

Then I will wait until the next step, as we otherwise would implement the error handling logic on client side and then later can remove it again.
But really nice to see improvement in this area :-)

@rullzer
Copy link
Member

rullzer commented Apr 4, 2020

@icewind1991 mind to fix ci so we can get this in as well?

@rullzer rullzer mentioned this pull request Apr 4, 2020
80 tasks
@icewind1991 icewind1991 force-pushed the dav-upload-hash branch 2 times, most recently from 71ccecf to f9999c6 Compare April 9, 2020 10:52
@rullzer rullzer mentioned this pull request Apr 9, 2020
59 tasks
@kesselb
Copy link
Contributor

kesselb commented Apr 12, 2020

So I would upload the file, then download info of it, compare hash.

@tobiasKaminsky the calculated hash is part of the response of the move or put operation. No need to download the file info.

Copy link
Contributor

@kesselb kesselb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Needs a rebase due those code style changes.

hashes are set in "X-Hash-MD5", "X-Hash-SHA1" and "X-Hash-SHA256" headers.

these headers are set for file uploads and the MOVE request at the end of a multipart upload.

Signed-off-by: Robin Appelman <robin@icewind.nl>
Signed-off-by: Robin Appelman <robin@icewind.nl>
@rullzer rullzer merged commit d63abeb into master Apr 15, 2020
@rullzer rullzer deleted the dav-upload-hash branch April 15, 2020 08:23
@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation. No need to download the file info.

True, but still this needs to be done on client side.
In my opinion a wrong hash sum must lead to an upload error. But as Roeland said, this will happen hopefully in next step.

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

5 similar comments
@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

@tobiasKaminsky
Copy link
Member

@tobiasKaminsky the calculated hash is part of the response of the move or put operation.

I only see it as a response header in put operation, but not in final move (of chunked upload)?

Nevermind. I did something wrong, but now it works…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants