-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid calculating checksum if size differs (performance optimization opportunity) #49
Comments
checksumDiffers just checks if the DifferMd5Checksum bit is set in a mask. However you do have a point that the precomputation of differences if the size is not the same, then naturally the checksum is different. Before the code to skip this check was in but it was a little messy and got replaced, just need to revert it. I'll jump on this when I get back home in about 9 hours and ping you up. Thank you for the suggestion. |
Thank you @tkjacobsen for bringing this up. It is modest of you to call this a minor enhancement, it is actually a major one if you consider that for the common case, mostly when sizes are equal the files are mostly the same. Please take a look at c281591 |
Well the file size difference check itself is minor - the ignore-checksum is not :). Comment to the change: However, when setting the ignoreChecksum flag introduces one (possibly unintended) side effect: If a change is just a modTime difference, the DifferentMd5Checksum mask will not be set causing a re-download of a potentially identical file. Not sure what right solution would be. Potential solution might be to do a checksum only if date differes (when the flag is set). |
I see. However, ModTime changes on their own should cause just the receipient's time to be modified, no need for re download/upload of a file. Also if DifferMd5Checksum is not set, this cuts out directly the need to even think about downloading the file. |
Good point. This would only cause erroneous updates if a file changes but keeps same size. In this case modTime will be updated, but if checksum is not calculated, file will not be updated. My point is (I think) that if modTime has changed (even if file size is the same) there is an increased likelihood that the file content has changed as well (see use case below). (If neither modTime nor size changed, there is a little chance that content - and thus checksum - changed -- unless someone actively updated the file and went back to set modTime to original value). I would therefore suggest one of the following:
Use case:
|
In deed, however the point of -ignore-checksum is that a user is explicitly making a risk for example as highlighted in the use case that prompted that flag where computing the checksum might be quite expensive take a look at issue #47 .
I can give you two cases for which this is may not be true:
Hmm, see suggestion 1 is a bit misleading because -ignore-checksum is meant to handle the case where the user does not need to perform full checksum verification, if you want checksum verification don't use -ignore-checksum. Suggestion 2 introduces a bit of ambiguity in the flag name itself.
Unfortunately this use case cannot work for |
yea, probably not worth it to introduce this complexity and ambiguity for such a corner case. Anyway, thanks a bunch for the original patch! It works really well on on pull requests containing large files (esp on a low powered device as my Raspberry Pi): [18:01] user@alarmpi Large Files $ time drive pull && time drive pull -ignore-checksum
Resolving...
md5Checksum: `[snip].zip` (407.53MB)
might take time to checksum.
md5Checksum: `[snip].zip` (462.36MB)
might take time to checksum.
Everything is up-to-date.
real 4m2.485s
user 1m39.450s
sys 1m55.420s
Resolving...
Everything is up-to-date.
real 0m4.350s
user 0m1.150s
sys 0m0.090s I'll make sure to learn GO and send a working patch next time :) |
Good to know that this improves things. Thank you for the great suggestion [ keep 'em honest ;) ] as well as for the thought experiment. For sure, your suggestions, feedback and ideas are always welcome :) |
Seems function "checksumDiffers" in types.go calculates the checksum of a local file even if it differs in size from the remote file. This seem unnecessary and increases resolving time if many files have been changed.
Edit: Enhancement
The text was updated successfully, but these errors were encountered: