Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store checksum in journal and verifiy upload for eml-files and others #3235

Closed
dragotin opened this issue May 13, 2015 · 26 comments

Comments

@dragotin
Copy link
Contributor

commented May 13, 2015

Once #2542 is implement, the checksum should be stored in local sync journal. We could use it to verify if an upload is needed, or if just the files metadata (mtime specifically) has changed. This would solve trouble we have with eml files on windows, and this could be a nice first usage case.

Problems: If we detect that only the mtime has changed, we could propagate the new mtime to the server, using PROPPATCH (needs to be verified). But that will probably not change the ETag of the file and also not the ETags of the parent directories. If it would, a client seeing a changed ETag would not understand that only the mtime has changed, and not the content of the file.

Note to self: If the server could do file checksums as well, it should also maintain directory checksums. The checksums only would change if the file contents or the contents of files within a directory changes. The ETag would just be an indication that something changed, either content or metadata. The client would need to consider the checksum to find that out.

@dragotin dragotin added this to the 1.8.2 - Bugfix 2 milestone May 13, 2015

@guruz

This comment has been minimized.

Copy link
Collaborator

commented May 15, 2015

It would be good in a way to have it support multiple checksum algos at the same time.
For example Adler32 is only good for error or change detection but not for deduplication (e.g. for #230)

@dragotin dragotin self-assigned this May 19, 2015

@MTRichards MTRichards modified the milestones: 2.1-next, 2.0 - Multi-account Jun 19, 2015

@MTRichards

This comment has been minimized.

Copy link

commented Jun 19, 2015

Moving to 2.1 to keep 2.0 reasonable size.

@cdamken cdamken added the blue-ticket label Jun 26, 2015

@cdamken

This comment has been minimized.

Copy link
Member

commented Jun 26, 2015

@MorrisJobke

00002839

@dragotin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2015

@cdamken please elaborate. Pasting numbers and adding people is not enough.

@MorrisJobke

This comment has been minimized.

Copy link
Member

commented Jun 26, 2015

@cdamken please elaborate. Pasting numbers and adding people is not enough.

I think this is just for reference. I was mentioned, because it became a blue ticket and the number is the backlink to their system ;)

@dragotin

This comment has been minimized.

Copy link
Contributor Author

commented Jun 26, 2015

@MorrisJobke yes, I know, but I would like to know the content of the blue-ticket as well.

@cdamken

This comment has been minimized.

Copy link
Member

commented Jun 29, 2015

@dragotin The content of the ticket: *.EML files can't be deleted, even they are deleted in one client will reappear again and again, additionally generates a lot of network traffic.

@guruz

This comment has been minimized.

Copy link
Collaborator

commented Oct 2, 2015

@dragotin @ogoffart @ckamm @moscicki I was in general wondering if we should have a csync_journal table
phash, mtime, checksumtype, checksum that is only lazily filled.
Else we risk taking a lot of time to checksum everything..

@ckamm

This comment has been minimized.

Copy link
Member

commented Oct 14, 2015

Storing the checksums in the DB will be part of #3735, this ticket can then be about using them to avoid unnecessary uploads.

@ckamm

This comment has been minimized.

Copy link
Member

commented Nov 11, 2015

This has a couple of steps

  • Calculate checksums on upload even if the server doesn't support them, store them in the local db
  • If the new checksum is the same as the old one, assume the content is the same.
    • This could either be done while reconciling the local and db trees or in the upload propagation job.
  • Do a remote metadata update instead of a full upload.
    • Check that PROPPATCH works
@moscicki

This comment has been minimized.

Copy link
Contributor

commented Nov 11, 2015

I would guess you'd want md5 for that. The interesting bit is to see if it
is possible to marry it with adler32 in the server used to check the
transfers.

On Wed, Nov 11, 2015 at 4:40 PM, ckamm notifications@github.com wrote:

This has a couple of steps

  • Calculate checksums on upload even if the server doesn't support
    them, store them in the local db
  • If the new checksum is the same as the old one, assume the content
    is the same.
    • This could either be done while reconciling the local and db
      trees or in the upload propagation job.
  • Do a remote metadata update instead of a full upload.
    • Check that PROPPATCH works


Reply to this email directly or view it on GitHub
#3235 (comment).


Best regards,
Kuba

@ckamm

This comment has been minimized.

Copy link
Member

commented Nov 12, 2015

I've tested PROPPATCH for updating file mtimes on the server, but it currently makes the file etag change (owncloud/core#20474).

@dragotin and @ogoffart agree that in that case we are better off not telling the server about the new mtime, as it'd cause all other clients to re-download the unchanged file.

ckamm added a commit to ckamm/owncloud-client that referenced this issue Nov 23, 2015

ckamm added a commit to ckamm/owncloud-client that referenced this issue Nov 23, 2015

@ckamm

This comment has been minimized.

Copy link
Member

commented Nov 23, 2015

Current pull request is #4180

ckamm added a commit that referenced this issue Nov 23, 2015

ckamm added a commit that referenced this issue Nov 23, 2015

Checksums: Don't reupload if size and checksum are unchanged #3235
* Compute the content checksum (in addition to the optional
  transmission checksum) during upload (.eml files only)

* Add hook to compute and compare the checksum in csync_update

* Add content checksum to database, remove transmission checksum

@ckamm ckamm added the ReadyToTest label Nov 23, 2015

@Dianafg76

This comment has been minimized.

Copy link

commented Nov 25, 2015

@ckamm what are the steps to reproduce this issue?

@ckamm

This comment has been minimized.

Copy link
Member

commented Nov 25, 2015

@Dianafg76

  • Create a new .eml file locally, wait for it to upload
  • Update the mtime of the file without changing the contents (touch command)
  • Verify that the client does not re-upload the file. (log file or activity log)
  • Verify that the client still uploads the file if the content changes.
  • Verify that the client still uploads other files. :)
@Dianafg76

This comment has been minimized.

Copy link

commented Nov 27, 2015

I tested this issue and is working OK
Desktop v ownCloud-2.1.0.5655-nightly20151126-setup.exe
Server v {"installed":true,"maintenance":false,"version":"8.2.1.4","versionstring":"8.2.1","edition":"Enterprise"}

Closed

@mocolini

This comment has been minimized.

Copy link

commented Jan 19, 2016

It seems that this bug affects owncloud-client yet.
If I don't remove ".eml" in "HKEY_LOCAL_MACHINESOFTWAREMicrosoftWindowsCurrentVersionPropertySystemPropertyHandlers" the client keep uploading (unchanged) *.eml files.
So, every time windows update itself (and so the ".eml" registry key is set back to default), hundred of .eml files (and MB) are re-uploaded to the server.

EDIT:
O.S.: windows 10 Home
Client version: 2.1.0 (build 5683)

@guruz

This comment has been minimized.

Copy link
Collaborator

commented Jan 19, 2016

@mocolini Are you saying the content of those .eml files does not change but they are still reuploaded?

@mocolini

This comment has been minimized.

Copy link

commented Jan 19, 2016

@guruz exactly.
Today windows 10 upgrade itserlf and re-set the ".eml" key in the register (the one I've removed days ago), after that owncloud-client has begun to re-upload all the (unchanged) *.eml files.

@ogoffart

This comment has been minimized.

Copy link
Collaborator

commented Jan 19, 2016

I think it needs to be uploaded at least once after upgrading before the checksum are stored in the databse and are compared.

Does it re-upload several time the same file?

@mocolini

This comment has been minimized.

Copy link

commented Jan 19, 2016

To be onest I don't know, 'cause I suddenly remove the ".eml" key .
So the upload is performed just one time to "populate" the db? Am 'I right?
And this happen for each windows PC, right?

@ogoffart

This comment has been minimized.

Copy link
Collaborator

commented Jan 19, 2016

yes, previous version of the client did not store the checksum in the database. and we only store it after we complete an upload or a download.

@mocolini

This comment has been minimized.

Copy link

commented Jan 19, 2016

Ok, perfect, I misunderstood the client behavior.
I'll check if the problem persist after complete upload/download, in that case I'll bother you again!
:-)

Cheers
Michelino

@guruz

This comment has been minimized.

Copy link
Collaborator

commented Jan 19, 2016

Sounds good :)
If you re-added the registry key, you can also try to remove it..

@mocolini

This comment has been minimized.

Copy link

commented Jan 19, 2016

Sure, I'll let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.