Using rclone 1.28 on an Ubuntu ARM machine (Raspberry Pi).
I am trying to sync my OneDrive contents to Google Drive, i.e. cloud to cloud. Out of the 24k or so files, 465 are consistently reported as corrupted and thus not transferred. I know that those files are not corrupted on OneDrive or on my local drive (which uses Microsoft's OneDrive client to sync files), yet I am simply unable to sync them. I started out with an empty Google Drive folder, so there are no duplicates.
See attached log files. The only correlation I can see is that these files are all small: on average 56 KB (median 57 KB), with a minimum of 67 bytes and a maximum of 190 KB. It does not matter how many times I retry; these files are consistently corrupted.
Full log file here (10 MB, result of -v): http://darkvater.homenet.org/rclone_server_6.txt
Update: I am running a test now to sync OneDrive to local, and although it hasn't finished yet, I can already see corrupted files. I'll update in the morning, but it could be a OneDrive issue instead, as these files are correct in the cloud.
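For reference, here is a minimal sketch of how such a mismatch list can be produced by comparing two size listings (the destination size below is made up for illustration; only the local sizes come from the actual listing):

```python
def find_size_mismatches(src_sizes, dst_sizes):
    """Return names present in both listings whose reported sizes disagree."""
    return sorted(
        name for name, size in src_sizes.items()
        if name in dst_sizes and dst_sizes[name] != size
    )

# Hypothetical listings: surrealistic.png's size disagrees between the two sides.
src = {"trees-1.png": 42497, "surrealistic.png": 169251, "595x395.png": 324590}
dst = {"trees-1.png": 42497, "surrealistic.png": 97258, "595x395.png": 324590}
print(find_size_mismatches(src, dst))  # → ['surrealistic.png']
```

Running this over the source and destination listings is how a consistent set of suspect files can be collected across retries.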
Is there any chance you could send me one of those files that does get corrupted?
Can you also post the command line you are using just for completeness?
I've uploaded 1,000 random files with mean size 64k to Drive, then copied them to OneDrive, but I didn't manage to reproduce the problem :-( I'm using Linux on ubuntu/amd64.
Sure, no problem, here's a link to a OneDrive folder (Public/openttd/goofs) with corrupted files in it: https://onedrive.live.com/redir?resid=8A5D53472BD1F293!586&authkey=!ANUn7GClDXvuwSQ&ithint=folder%2cpng
Command executed: tomi@alexandria:/mnt/sdb2$ work/rclone-v1.28-linux-arm/rclone --delete-excluded --filter-from ~/filter_file --dump-filters -v sync onedrive: local:stuff: 2>rclone_server_8.log
Local folder contents:
tomi@alexandria:/mnt/sdb2/stuff:/Public/openttd/goofs$ ls -l
-rw-r--r-- 1 tomi users 324590 Jan 22 2008 595x395.png
-rw-r--r-- 1 tomi users 169251 Jan 22 2008 surrealistic.png
-rw-r--r-- 1 tomi users 42497 Jan 22 2008 trees-1.png
-rw-r--r-- 1 tomi users 132091 Jan 22 2008 trees-alot.png
List of corrupted files in that folder: http://darkvater.homenet.org/corrupted_goofs.txt
I did a test on an ubuntu 13.19 64-bit VM I have at home and did a copy from OneDrive with it. Same, or at least very similar, results: many of the corrupted files are the same, with the same corrupted sizes.
I've managed to replicate the bug with your corrupted goofs - thank you very much for those.
Just downloading the files from OneDrive with rclone is enough to cause the problem, so we can rule Google Drive out of the equation.
I downloaded the files using the web interface and they were correct.
I haven't worked out what is going on yet but I will :-)
To take one file as an example - on disk it looks this big:
-rw-rw-r-- 1 ncw ncw 97258 Feb 15 2014 HIGH-bridges.png
However, according to rclone it is a different size:
$ rclone ls onedrive:goofs
The web interface agrees that it is 200k.
So somehow onedrive has its metadata in a twist...
Which gives me an idea...
When I upload the files with --no-gzip-encoding they appear properly, so somehow OneDrive has decompressed and recompressed the file on the fly.
Can you see if re-uploading the bad files to OneDrive with --no-gzip-encoding and downloading them the same way fixes the problem for you?
I think this is probably a bug in onedrive, but possibly one that can be worked around!
I uploaded the same file to OneDrive 100 times, and it comes out as 200k half the time and 97k half the time!
After a lot of investigation, I've discovered that it is updating the modification time, which we do after the file is uploaded, that triggers the problem. If I stop doing that, the file no longer has the wrong size after upload. I'm reasonably convinced this is some sort of race condition in OneDrive - I've been trying to reproduce it with the Python SDK so I can report it as a bug.
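The triggering sequence, in sketch form. The `client` and its method names are purely illustrative (the real code is rclone's OneDrive backend, not this); the point is the call order - the metadata patch lands while the server is apparently still processing the upload:

```python
def upload_with_mtime(client, local_path, remote_path, mtime):
    """Upload a file, then patch its modification time - the second call
    appears to race with OneDrive's server-side processing of the first."""
    client.upload(local_path, remote_path)
    # Removing this step makes the wrong-size problem disappear:
    client.set_modified_time(remote_path, mtime)

# A fake client that just records the call order, to show the sequence:
class RecordingClient:
    def __init__(self):
        self.calls = []
    def upload(self, src, dst):
        self.calls.append("upload")
    def set_modified_time(self, path, mtime):
        self.calls.append("set_modified_time")

c = RecordingClient()
upload_with_mtime(c, "HIGH-bridges.png", "/goofs/HIGH-bridges.png", 1392422400)
print(c.calls)  # → ['upload', 'set_modified_time']
```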
After a few more hours of experimentation, I've discovered that it seems to be very sensitive to something I haven't worked out yet.
Since I managed to reproduce the bug using the official SDK I reported it as a bug.
Hopefully someone from Microsoft will escalate the problem to the right person.
Same bug: https://onedrive.uservoice.com/forums/262982-onedrive/suggestions/6711029-fix-size-metadata-bug. A dev responded here: http://stackoverflow.com/a/27031491/681490 1½ years ago, so they don't seem to be in a hurry to fix it.
Thanks for finding that Klaus. Not quite sure what we should do for rclone. Setting the size in the object returned by Put will allow the upload without the corruption report, but the file will then be re-uploaded on every sync - which is maybe OK, since it doesn't happen to many files.
Hey ncw, thanks a lot for the investigation! Really appreciate it. I'm currently in India, but if you need anything I'll happily help in two weeks when I'm back.
I have received an official response from Microsoft on OneDrive/onedrive-sdk-python#27
It is a known bug that Microsoft haven't fixed and if you would like it fixed then vote on Uservoice - I did!
I can't think of a sensible work-around for this - rclone needs to make sure the size of the file is correct, and if it can't rely on the size of the file, it will assume the file has been corrupted.
There is a response by the Python SDK developer - I believe rightly - that file size should not be used to check file integrity. I see that rclone has a --checksum option. Does this still also verify file size, as the documentation suggests it does? Maybe we can adapt the behaviour to check only the checksum when this option is set, and not the file size.
Checksum will still check file size. I'd need to add a --no-check-size option which you would use with --checksum - that might be worth a try.
Hi ncw, I'm not sure how the program is structured internally, but I think that if we are making changes, --checksum should not check file size at all. A checksum comparison will fail anyway if the file sizes differ, so the size check is nothing more than a quick-failure shortcut. I would change the checksum option to ignore file size. For the really brave, an option such as --no-check-size could be added to disable all checks, since file size is the default mechanism.
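The redundancy argument in code form - a minimal sketch (not rclone's actual implementation) of a sync equality test. Once checksums are compared, the size comparison adds nothing, because files of different sizes necessarily have different checksums:

```python
import hashlib

def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def files_match(src: bytes, dst: bytes, checksum: bool, check_size: bool = True) -> bool:
    """Sketch of a sync equality test: the size check is a fast shortcut,
    but with checksums enabled it can be skipped (an --ignore-size style flag)."""
    if check_size and len(src) != len(dst):
        return False
    if checksum:
        return md5(src) == md5(dst)
    return True  # size-only mode: sizes matched (or the check was skipped)

data_ok = b"same bytes"
data_bad = b"same bytes!"  # different length, so necessarily a different checksum
print(files_match(data_ok, data_bad, checksum=True, check_size=False))  # → False
print(files_match(data_ok, data_ok, checksum=True, check_size=False))   # → True
```

Skipping the size check with checksums on still catches every mismatch; it only gives up the cheap early exit.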
I've implemented an --ignore-size flag which you can use to work-around this issue. I've verified it does the right thing.
Here is a beta with that fix in for you to try.
Please re-open the issue if you have any problems with it.
Add --ignore-size flag - fixes #399
Hi ncw, thanks a lot for the patch! I tried it and can verify that the corruption I experienced is now gone. Good stuff!
I see a new version has already been released with these changes. I'd just like to add that the changelog description only talks about "corrupted images". This is incorrect: the corruption also happens to other files, such as Word or PDF documents - I believe anything that OneDrive generates previews for.