document ignore-checksum behavior on modTime changes + same file size #690

Closed
tkeith opened this issue Jul 11, 2016 · 15 comments
Comments

@tkeith

tkeith commented Jul 11, 2016

  1. Create a new file: echo 1 > test.txt
  2. Push it: drive push test.txt
  3. Modify it: echo 2 > test.txt
  4. Push it: drive push test.txt
  5. View it in the Google Drive web interface
    Result: I still see "1" as the content, but expected to see "2"
  6. Modify it with a different length: echo 2x > test.txt
  7. Push it: drive push test.txt
  8. View it on the web interface
    Result: I see the updated version containing "2x"
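The steps above hinge on file size. A minimal shell sketch shows why. It runs no drive commands; it only recreates the three versions of test.txt in a temporary directory, and `size_of`/`crc_of` are hypothetical helper names. "1" and "2" (each plus a newline) produce files of identical size, so a size-only comparison sees no change even though their checksums differ, while "2x" changes the size.

```shell
#!/bin/sh
# Recreate the three versions of test.txt from the steps above and record
# each version's byte size and CRC (POSIX cksum). No drive commands run.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/test.txt"

# Hypothetical helpers: byte count and CRC of a file, read via stdin so
# that no file name is printed alongside the number.
size_of() { wc -c < "$1" | tr -d ' '; }
crc_of()  { cksum < "$1" | cut -d ' ' -f 1; }

echo 1 > "$f";  size1=$(size_of "$f"); crc1=$(crc_of "$f")
echo 2 > "$f";  size2=$(size_of "$f"); crc2=$(crc_of "$f")
echo 2x > "$f"; size3=$(size_of "$f"); crc3=$(crc_of "$f")

echo "v1 \"1\":  size=$size1 crc=$crc1"
echo "v2 \"2\":  size=$size2 crc=$crc2"
echo "v3 \"2x\": size=$size3 crc=$crc3"
```

Versions 1 and 2 both report size=2 with different CRCs; version 3 reports size=3.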
@odeke-em
Owner

Hello there @tkeith. Thank you for reporting this issue and welcome to drive!

Aha, so you've run into this. Yes, checksumming is turned off by default: in issue #117 we deemed it too aggressive, and in most cases size comparisons are sufficient. You can turn it on by passing the flag --ignore-checksum=false to a push or pull.
It is detailed in the README here: https://github.com/odeke-em/drive#note-checksum-verification
[Screenshot: the "Checksum Verification" note from the README]

@CountryBumkin

I thought the change in the file's modtime would have caused the new version to be uploaded?

@odeke-em
Owner

@CountryBumkin, that just causes a 'touch' of the file (a separate operation that only changes its modTime). There is no need to re-upload a 1GB file if only the modTime changed, the size is the same, and checksums aren't being compared.
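To illustrate the distinction, here is a small shell sketch that changes only a file's modTime, the same thing `touch` does in your shell, and confirms that its bytes are untouched. It assumes POSIX `touch -t`, `cp -p`, `find -newer`, `cmp`, and `cksum`; the file names are hypothetical scratch files, and no drive commands are run.

```shell
#!/bin/sh
# Change only a file's modTime (what `touch` does) and confirm that its
# content is unchanged. Paths are hypothetical scratch files.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/data.bin"
snapshot="$dir/snapshot.bin"

printf 'same content\n' > "$f"
touch -m -t 202001010000 "$f"   # give the file a known, old modTime
cp -p "$f" "$snapshot"          # copy with identical bytes and modTime

crc_before=$(cksum < "$f")
touch -m "$f"                   # bump modTime to now; bytes untouched
crc_after=$(cksum < "$f")

# find -newer prints the path only if $f was modified after $snapshot.
if [ -n "$(find "$f" -newer "$snapshot")" ]; then
  mtime_changed=yes
else
  mtime_changed=no
fi
if [ "$crc_before" = "$crc_after" ] && cmp -s "$f" "$snapshot"; then
  content_changed=no
else
  content_changed=yes
fi

echo "modTime changed: $mtime_changed, content changed: $content_changed"
```

This prints `modTime changed: yes, content changed: no`, which is the case where updating the remote modTime is enough and re-uploading the contents would be wasted work.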

@CountryBumkin

That makes the help text you posted ambiguous. The phrase in your screenshot, 'common case in which size + modTime differences are sufficient to detect file changes', could imply that a size change and a modtime change produce the same action, but that is not the case?

@odeke-em
Owner

@CountryBumkin, file changes include both modTime changes and content changes. drive can update just the modTime of a file, since that is a file change of its own, just like touch in your shell. In fact, conflicts are detected when modTimes differ between the localFS, the remote, and the localDB. Maybe, to resolve the ambiguity, we could improve the documentation to state explicitly what happens on a modTime change? Let me know what you think.

@CountryBumkin

I think it would be better to spell out how the conflict is resolved and to indicate whether only a touch is performed. One idea would be to resolve the conflict by checksumming that particular file; this is the safe solution.

@tkeith
Author

tkeith commented Jul 11, 2016

@odeke-em Thanks for the explanation of this behavior. I did read the "Checksum Verification" info you posted before opening this issue, but I had the same understanding as @CountryBumkin: I took it to mean that the full file would be updated if either the size or the modtime had changed. It seems to me that going by size alone is a risky way to determine whether content is the same; I've encountered many file types that are often a constant size regardless of content. Instances where the modtime changes without the content changing seem rare, so wouldn't it make more sense to re-upload the file when the modtime changes?

Anyway, now that I understand this issue, I will be able to work around it by using checksum verification. Thanks for the clarification, and I appreciate your work on this project!

@tkeith
Author

tkeith commented Jul 11, 2016

@odeke-em Do you think the rsync approach would be a reasonable alternative to the current strategy? rsync first checks if size and modtime match between the client & server. If both size & modtime are equal, the file is considered to be up to date. If modtime is different, checksum is calculated to check if the files match. This reduces the checksum verifications to only files where modtime is different and mitigates the risk of failing to push an updated file because the size hasn't changed.

@odeke-em
Owner

@tkeith, that was the old behavior, and it is what happens when you turn off --ignore-checksum (i.e. pass --ignore-checksum=false). With the old behavior, if the modTime was different but the size was the same, it compared checksums; that behavior was deemed too aggressive, so it became opt-in rather than the default.
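The two behaviors described in this thread can be summarized as a decision function. This is a hypothetical shell sketch, not drive's implementation: two local files stand in for the local and remote copies, `decide` and its `skip`/`touch`/`upload` results are illustrative names, and it assumes that when checksums are compared and match, only the modTime is updated.

```shell
#!/bin/sh
# Hypothetical sketch of the push decision discussed in this thread; NOT
# drive's source. Two local files stand in for the local and remote copies,
# and the third argument mirrors --ignore-checksum (drive's default: true).
set -eu

size_of() { wc -c < "$1" | tr -d ' '; }
crc_of()  { cksum < "$1" | cut -d ' ' -f 1; }

# decide LOCAL REMOTE IGNORE_CHECKSUM -> prints skip, touch or upload.
decide() {
  local_f=$1 remote_f=$2 ignore_checksum=$3
  if [ "$(size_of "$local_f")" != "$(size_of "$remote_f")" ]; then
    echo upload                  # size differs: content must have changed
  elif [ -z "$(find "$local_f" -newer "$remote_f")" ] &&
       [ -z "$(find "$remote_f" -newer "$local_f")" ]; then
    echo skip                    # same size, same modTime: up to date
  elif [ "$ignore_checksum" = true ]; then
    echo touch                   # default: only the modTime is updated
  elif [ "$(crc_of "$local_f")" = "$(crc_of "$remote_f")" ]; then
    echo touch                   # assumed: matching checksums need no upload
  else
    echo upload                  # same size, different content
  fi
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1\n' > "$dir/remote"
touch -m -t 202001010000 "$dir/remote"    # older remote copy
printf '2\n' > "$dir/local"               # same size, newer, new content

default=$(decide "$dir/local" "$dir/remote" true)
strict=$(decide "$dir/local" "$dir/remote" false)
echo "default: $default; --ignore-checksum=false: $strict"
```

For the same-size edit from the original report ("1" to "2"), the default path only touches the file, while enabling checksums turns the decision into an upload. Checksums are computed only for files whose size matches but whose modTime differs, which is the trade-off being debated here.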

@tkeith
Author

tkeith commented Jul 12, 2016

@odeke-em Oh ok, I thought --ignore-checksum=false forced checksum validation on all files regardless of size & modtime. If --ignore-checksum=false will only perform checksum validation when size or modtime is different, that is perfect for my usage.

I have read through #117, and I still don't understand where the idea originated of only updating the remote mtime when the local mtime has changed but the size is the same. To me, this feels like very dangerous behavior. The rsync approach makes much more sense, and checksumming only when the mtime differs doesn't seem excessive (how often do mtimes change without the file content changing?). Anyway, now that I am aware of this, I will be careful to always use --ignore-checksum=false.

Either way, I'm glad that the tool offers this functionality, and thank you for taking the time to correct my misconceptions. I really appreciate the work you have put into keeping the project active!

@tkeith tkeith closed this as completed Jul 12, 2016
@odeke-em
Owner

Gotcha. Yes and no. IMO it seems dangerous empirically, but in practice many people are aiming for faster pushes: they are used to fast pushes and to sync clients that sit running for a long time, and mimicking that same speed with a selective push is hard. The people who actually need their files synced carefully should be using --ignore-checksum=false. For a long time I was opposed to the opt-in behavior, but everyone else convinced me to satisfy the common case. Hopefully we'll come to common ground in the future.
Thank you for the discussion, for the kind words and for using drive!

@odeke-em odeke-em added this to the v0.3.8 milestone Jul 12, 2016
@odeke-em
Owner

I've marked it as needing documentation for milestone v0.3.8. I'll reopen it so that we can document this behavior and be clearer, as suggested by @CountryBumkin and @tkeith.

@odeke-em odeke-em reopened this Jul 12, 2016
@odeke-em odeke-em changed the title "pushing modified files doesn't work if size is the same" to "document ignore-checksum behavior on modTime changes + same file size" Jul 12, 2016
@tkeith
Author

tkeith commented Sep 14, 2016

Sorry for bringing this up again... Is there a configuration option to change the behavior? Or a simple source code change I can make? I want to ensure that I don't accidentally forget to use "--ignore-checksum=false" for a push or pull.

@odeke-em
Owner

No worries, @tkeith. drive reads in custom configurations found along the path. These are .driverc files, similar to your ~/.bashrc or ~/.bash_profile, that you can place in your home directory (i.e. ~/), in the root of your mounted drive, or in the relevant directory that you are pushing or pulling from. Please see https://github.com/odeke-em/drive#driverc for more information.

Here is a sample .driverc

# Append ignore-checksum=false to the .driverc in the current directory
cat << ! >> .driverc
ignore-checksum=false
!

You can have a different .driverc file in every single directory, but please keep track of those files.

odeke-em added a commit that referenced this issue Sep 17, 2016
Fixes #690.

Document that modTime on its own is an operation that doesn't
necessarily warrant resyncing the contents of a file.
@tkeith
Author

tkeith commented Oct 9, 2016

Thank you, I wasn't aware of the driverc feature, this is just what I need! I appreciate all your help on this issue.
