New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store client's file checksum as metadata and return it again in downloads and PROPFIND #18716

Closed
karlitschek opened this Issue Aug 31, 2015 · 21 comments

Comments

Projects
None yet
@karlitschek
Member

karlitschek commented Aug 31, 2015

We want to handle checksums in webdav requests.

Goals:

  • secure the webdav/http transport layer
  • partly secure again corruption on the storage layer

Limitations:

  • files are uploaded and created from several different sides. for example directly on the external storage. Because of that checksum are not always available or there generation would be too expensive. Because of that there are sometimes available and sometimes not. So checksums will be optional

Approach:

  • Everytime a clients who supports this feature. (the ownCloud Desktop Client in the first step) can send a checksum of the file in parallel with the payload.
  • The ownCloud server stores this checksum in the file cache table
  • If a checksum for a file is available in the filecache table then this value is returned with every propfind.
  • The client can then compare the checksum and detect possible corruption of the file.
  • In this first step of the checksumming no checksums are generated on the server side. The server acts purely as a storage at the moment. In the later stage this could be extended.

Because checksums are not always available and are completely optional the clients can't rely on them for the actual syncing algorithm.

This approach can be extended in different directions in the future but we should start with this step.


QA ticket: owncloud/QA#113
Documentation ticket: owncloud/documentation#2132

@karlitschek

This comment has been minimized.

Show comment
Hide comment
Member

karlitschek commented Aug 31, 2015

@karlitschek

This comment has been minimized.

Show comment
Hide comment
@karlitschek
Member

karlitschek commented Aug 31, 2015

@MTRichards MTRichards added this to the 9.0-next milestone Aug 31, 2015

@aspdye

This comment has been minimized.

Show comment
Hide comment
@aspdye

aspdye commented Aug 31, 2015

👍

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Sep 16, 2015

Member

Sounds great 😄

Member

PVince81 commented Sep 16, 2015

Sounds great 😄

@ckamm

This comment has been minimized.

Show comment
Hide comment
@ckamm

ckamm Oct 1, 2015

Member

The client-side ticket is owncloud/client#3735. When this is added to the server, please also add a list of supported checksumming algorithms to the capabilities API. The client currently supports "Adler32", "MD5" and "SHA1".

You can already ask the client to send checksums for uploads by adding transmissionChecksum=MD5 to the [General] section of owncloud.cfg.

Member

ckamm commented Oct 1, 2015

The client-side ticket is owncloud/client#3735. When this is added to the server, please also add a list of supported checksumming algorithms to the capabilities API. The client currently supports "Adler32", "MD5" and "SHA1".

You can already ask the client to send checksums for uploads by adding transmissionChecksum=MD5 to the [General] section of owncloud.cfg.

@guruz guruz changed the title from checksum support to Store client's file checksum as metadata and return it again Oct 2, 2015

@guruz guruz changed the title from Store client's file checksum as metadata and return it again to Store client's file checksum as metadata and return it again in downloads and PROPFIND Oct 2, 2015

@rullzer rullzer self-assigned this Oct 14, 2015

@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Oct 19, 2015

Contributor

I would suggest that both client and server work with a HTTP header of the form:

OC-Checksum: <checksum-type>:<sum>

Currently we support the checksum types MD5, SHA1 and Adler32

Contributor

dragotin commented Oct 19, 2015

I would suggest that both client and server work with a HTTP header of the form:

OC-Checksum: <checksum-type>:<sum>

Currently we support the checksum types MD5, SHA1 and Adler32

@PVince81 PVince81 added the overview label Nov 26, 2015

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 7, 2015

Member

@dragotin is the client already sending something ?

TODO for the server:

  • enhance the FilesPlugin to return the stored checksum in a new "oc:checksum" attribute (for PROPFIND)
  • make sure the attribute CANNOT be set using PROPPATCH, but only through a legitimate upload
  • at upload time, if "OC-Checksum" header is set, store it
  • where to store it ? new filecache table column ? CC @icewind1991
Member

PVince81 commented Dec 7, 2015

@dragotin is the client already sending something ?

TODO for the server:

  • enhance the FilesPlugin to return the stored checksum in a new "oc:checksum" attribute (for PROPFIND)
  • make sure the attribute CANNOT be set using PROPPATCH, but only through a legitimate upload
  • at upload time, if "OC-Checksum" header is set, store it
  • where to store it ? new filecache table column ? CC @icewind1991
@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Dec 7, 2015

Contributor

yes, since 2.1.0 the client sends either Adler32 (probably not useful), MD5 or SHA1 if you enable it in the config file by setting:

transmissionChecksum=MD5

in the [General] section.

Contributor

dragotin commented Dec 7, 2015

yes, since 2.1.0 the client sends either Adler32 (probably not useful), MD5 or SHA1 if you enable it in the config file by setting:

transmissionChecksum=MD5

in the [General] section.

@icewind1991

This comment has been minimized.

Show comment
Hide comment
@icewind1991

icewind1991 Dec 7, 2015

Member

where to store it ? new filecache table column

I would say so yes

Member

icewind1991 commented Dec 7, 2015

where to store it ? new filecache table column

I would say so yes

@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Dec 7, 2015

Contributor

I'd suggest to be able to store different types of checksums for one file. We will have different checksum types for securing the transmission versus checksums over the content.

Contributor

dragotin commented Dec 7, 2015

I'd suggest to be able to store different types of checksums for one file. We will have different checksum types for securing the transmission versus checksums over the content.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 7, 2015

Member

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time. So we might also need to consider a separate table.
But I'm not database expert. 😄

Member

PVince81 commented Dec 7, 2015

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time. So we might also need to consider a separate table.
But I'm not database expert. 😄

@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Dec 7, 2015

Contributor

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

Contributor

dragotin commented Dec 7, 2015

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

@icewind1991

This comment has been minimized.

Show comment
Hide comment
@icewind1991

icewind1991 Dec 8, 2015

Member

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

Makes more sense yes

Member

icewind1991 commented Dec 8, 2015

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

Makes more sense yes

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Dec 8, 2015

Member

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time.

should be of lesser issue if the column is nullable

Member

DeepDiver1975 commented Dec 8, 2015

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time.

should be of lesser issue if the column is nullable

@rullzer

This comment has been minimized.

Show comment
Hide comment
@rullzer

rullzer Feb 3, 2016

Contributor

#21997 is in.

Documentation ticket in owncloud/documentation#2132
And ticket for QA intergration tests in owncloud/QA#113

Contributor

rullzer commented Feb 3, 2016

#21997 is in.

Documentation ticket in owncloud/documentation#2132
And ticket for QA intergration tests in owncloud/QA#113

@rullzer

This comment has been minimized.

Show comment
Hide comment
@rullzer

rullzer Feb 8, 2016

Contributor

Closing time :)

Contributor

rullzer commented Feb 8, 2016

Closing time :)

@rullzer rullzer closed this Feb 8, 2016

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Feb 9, 2016

Member

Proper return values for multiple checksums went in as #22199

Member

MorrisJobke commented Feb 9, 2016

Proper return values for multiple checksums went in as #22199

@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Feb 18, 2016

Contributor

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a result, the checksum is never returned in a PROPFIND result and thus does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content checksum at all. So far we agreed to only use the checksum for transportation verification. For that, the checksum is contained in a HTTP header of the PUT or GET request, and from there the client is reading it.

For change detection of server files, we continue to rely on the ETags which makes sense. So I currently see no reason why the client would EVER ask for the checksum of the whole file in a PROPFIND. It is very good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a detail table rather than in one column in oc_filecache given these facts.

Contributor

dragotin commented Feb 18, 2016

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a result, the checksum is never returned in a PROPFIND result and thus does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content checksum at all. So far we agreed to only use the checksum for transportation verification. For that, the checksum is contained in a HTTP header of the PUT or GET request, and from there the client is reading it.

For change detection of server files, we continue to rely on the ETags which makes sense. So I currently see no reason why the client would EVER ask for the checksum of the whole file in a PROPFIND. It is very good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a detail table rather than in one column in oc_filecache given these facts.

@bugsyb

This comment has been minimized.

Show comment
Hide comment
@bugsyb

bugsyb Feb 18, 2016

Even if it is outside of this feature, it would be good to have
implemented checksum verification by client as an additional option on
top of ETags (or in parallel).

Dreaming further, it would be good to be able to sync differences within
files only (using rolling checksum window, similar to DropBox and rsync).

Regards,
David

On 2016-02-18 11:26, Klaas Freitag wrote:

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a
result, the checksum is never returned in a PROPFIND result and thus
does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content
checksum at all. So far we agreed to only use the checksum for
transportation verification. For that, the checksum is contained in a
HTTP header of the PUT or GET request, and from there the client is
reading it.

For change detection of server files, we continue to rely on the ETags
which makes sense. So I currently see no reason why the client would
EVER ask for the checksum of the whole file in a PROPFIND. It is very
good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a
detail table rather than in one column in oc_filecache given these facts.


Reply to this email directly or view it on GitHub
#18716 (comment).

bugsyb commented Feb 18, 2016

Even if it is outside of this feature, it would be good to have
implemented checksum verification by client as an additional option on
top of ETags (or in parallel).

Dreaming further, it would be good to be able to sync differences within
files only (using rolling checksum window, similar to DropBox and rsync).

Regards,
David

On 2016-02-18 11:26, Klaas Freitag wrote:

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a
result, the checksum is never returned in a PROPFIND result and thus
does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content
checksum at all. So far we agreed to only use the checksum for
transportation verification. For that, the checksum is contained in a
HTTP header of the PUT or GET request, and from there the client is
reading it.

For change detection of server files, we continue to rely on the ETags
which makes sense. So I currently see no reason why the client would
EVER ask for the checksum of the whole file in a PROPFIND. It is very
good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a
detail table rather than in one column in oc_filecache given these facts.


Reply to this email directly or view it on GitHub
#18716 (comment).

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 18, 2016

Member

After further discussion with @dragotin we will for now keep the current approach because it is simply enough for what we are trying to accomplish for 9.0

Even if storage backends can provide multiple checksums - the question is if we really need to store them in out database.

Assuming we once want to verify the checksums between owncloud and the storage system - I see no need in storing them because the checksums are only evaluated on upload to the storage.

The checksum send from/to the client is for a different scenarios/part of the workflow.

Member

DeepDiver1975 commented Feb 18, 2016

After further discussion with @dragotin we will for now keep the current approach because it is simply enough for what we are trying to accomplish for 9.0

Even if storage backends can provide multiple checksums - the question is if we really need to store them in out database.

Assuming we once want to verify the checksums between owncloud and the storage system - I see no need in storing them because the checksums are only evaluated on upload to the storage.

The checksum send from/to the client is for a different scenarios/part of the workflow.

@dragotin

This comment has been minimized.

Show comment
Hide comment
@dragotin

dragotin Mar 29, 2016

Contributor

I think this is the important bit to fix #11811

Contributor

dragotin commented Mar 29, 2016

I think this is the important bit to fix #11811

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment