Signal rclone is doing deletions better #2723

Open
lucasyvas opened this issue Nov 4, 2018 · 14 comments

@lucasyvas commented Nov 4, 2018

What is the problem you are having with rclone?

The rclone sync command appears to never exit, even after all the checks and transfers appear to be complete. I have let it loop at the message below for about an hour and it has not exited, no matter how many times I try.

What is your rclone version (output from rclone version)

rclone v1.44
- os/arch: linux/amd64
- go version: go1.11.1

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux (Ubuntu 18.04), 64-bit

Which cloud storage system are you using? (eg Google Drive)

OneDrive Business

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync -vvL "${BACKUP_SRC}" "${BACKUP_DST}"

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Prior to this block, there's the usual large stream of output from every file being checked. We have a lot of files, but none have changed between runs of the script. Everything appears successful, except for...

2018/11/04 11:28:48 INFO  : One drive root 'Backup': Waiting for checks to finish
2018/11/04 11:28:48 INFO  : One drive root 'Backup': Waiting for transfers to finish
2018/11/04 11:28:49 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:            588581 / 588581, 100%
Transferred:            0 / 0, -
Elapsed time:     25m0.7s

[... the same stats block repeats once a minute, with only the timestamp and elapsed time changing, until ...]

2018/11/04 11:34:49 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Errors:                 0
Checks:            588581 / 588581, 100%
Transferred:            0 / 0, -
Elapsed time:     31m0.7s
@lucasyvas (Author) commented Nov 4, 2018

Well, I left it for another run! It does indeed finish. Turns out some things had been deleted and it just took a really long time to figure out what to delete. There are a lot of files, so I'm not judging 😄

Maybe the output could be improved to make it more obvious what is going on? I can see how there's almost no issue when you don't have many files, but with a lot you can spend a lot of time sitting around wondering what's going on.

@ncw (Collaborator) commented Nov 4, 2018

> Well, I left it for another run! It does indeed finish. Turns out some things had been deleted and it just took a really long time to figure out what to delete. There are a lot of files, so I'm not judging 😄

Ah I see what is happening...

The default delete mode for rclone is --delete-after so rclone will effectively pause at the end while it deletes lots of files.
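For anyone hitting this in the meantime, the pause can be moved earlier by choosing a different delete mode. A sketch using rclone's existing delete-mode flags (`--delete-before`, `--delete-during`, `--delete-after`); `BACKUP_SRC`/`BACKUP_DST` are the variables from the original command:

```shell
# Default: deletions happen only after all checks and transfers,
# which on a big sync looks like a silent hang at the end.
rclone sync --delete-after -v "${BACKUP_SRC}" "${BACKUP_DST}"

# Delete files as the sync progresses instead, so the deletion work
# is interleaved with the checks rather than bunched at the end.
rclone sync --delete-during -v "${BACKUP_SRC}" "${BACKUP_DST}"
```

Note that `--delete-after` is the default precisely because it is the safest: nothing is removed until the copy phase has succeeded.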

> Maybe the output could be improved to make it more obvious what is going on? I can see how there's almost no issue when you don't have many files, but with a lot you can spend a lot of time sitting around wondering what's going on.

rclone is quite good at telling you about files it is checking or transferring, but not so much deleting.

I think I should do two things

  • Put a Debug log in when files are deleted
  • Put a deleted files counter in the stats

What do you think?

@ncw ncw added the enhancement label Nov 4, 2018

@ncw ncw added this to the v1.45 milestone Nov 4, 2018

@lucasyvas (Author) commented Nov 4, 2018

I think those sound like they could be really helpful additions for sure! Please feel free to completely disregard the original intention of the issue and rename/close it as you see fit. Thanks so much for your work on this fantastic tool by the way.

@ncw ncw changed the title rclone sync waiting for checks to finish forever (OneDrive Business) Signal rclone is doing deletions better Nov 4, 2018

@Cnly (Collaborator) commented Nov 5, 2018

This might be a little off-topic but @lucasyvas do you have over 500,000 files on your OneDrive? And you're managing them with no problem using rclone? Other than the long waiting time, do you get anything else such as an error?

I'm asking because we have #2707 where @gualdo0 is seeing problems with ~100,000 files, and your info might be helpful! Thanks.

@lucasyvas (Author) commented Nov 5, 2018

We have just started using rclone for this and it is far and away the largest file set I've ever attempted with it. The significance of this is that I have not been managing such a large file set long enough to confidently say how reliable it is over time.

Most of the past few days have been spent restarting the rclone sync routine whenever a failure was encountered. That said, many of our failures were not clearly rclone's fault; they appeared instead to be session timeouts on the machine when running our script over SSH.

But there are a huge number of files and we did achieve sync over a few days. I have not seen the majority of errors cited in that issue.

I'm not sure if it's worth mentioning, but I did opt to set up our own client_id and secret instead of using whatever is embedded in rclone - from experience this just seemed like a safer course of action.

The only thing that appears to be an actual problem is that I'm having a hell of a time syncing ".one" OneNote files (I may open an issue for this soon, as I can't make sense of it).

It seems to hate these files and claims throttling errors on them - they appear to transfer to 100% but stall at the full X/X bytes transferred status. Just yesterday when I tried, this produced I/O errors similar to the ones in that issue, though I'm not sure whether that's anything more than anecdotal evidence of the described problem.

I think it's fair to say that rclone is capable of this many files, but there is something going on that makes it a less than smooth experience. Now that we are mostly synced up (and it's a "production" environment) I can't muck around with it too much, but I'd be happy to try some things if it helps and I am able to.

Edit: It's also worth mentioning that our source is just the local disk and the destination is a folder named "Backup" at the root of the OneDrive.

@gualdo0 commented Nov 5, 2018

Thanks for having a look at this. I'm also using my own client_id and secret.
One important point (in my current view) is that all of the files (>100,000) are in just one folder - the problematic folder. Is this also your case? I also have many other folders, a few of them with up to ~30,000 files, that sync with no problem.

@lucasyvas (Author) commented Nov 5, 2018

@gualdo0 That may be the differentiating factor. I don't think we've got anything near that in our setup. We have a pretty "impressive" tree and scores of files and folders all over the place, but we do not have anything like a flat 100,000+ in a single directory. That is madness 😄

It's on behalf of a firm, so they have lots of existing folder structure and it fits a practical "business" use case. It's organized with humans in mind, so the counts are not too high in any one directory.

@gualdo0 commented Nov 5, 2018

You are probably right, some of us astrophysicists may be a bit crazy :-)) but to analyze some data in detail we have to generate a grid of models. 3000 models, each one yielding 36 files, gives the number. The files are used by programs that fit a number of sources (galaxies), which is why they are all in the same folder - it's the easiest way. Now I'm trying to split that folder into two to see if the error disappears (which involves some work on the programs that fit the galaxies); I will let you know. In any case, it would be good for the community to know the limit on the number of files in one folder that rclone can manage. Thanks for your answer!

@lucasyvas (Author) commented Nov 5, 2018

@gualdo0 We're all a little crazy! It seems like a totally valid use case to me though - I think we are both pushing this thing pretty hard in our own right. I'd be interested to hear the outcome so I know if I should try to avoid such a scenario in the short term if the occasion rises.

@gualdo0 commented Nov 6, 2018

@lucasyvas After deleting some files and moving files to another folder, the monstrous folder has "only" 48,082 files and the problem has disappeared. It has 3 sub-folders with 23800, 10640, and 360 files, which are also synced successfully. Conclusion: ~50,000 files in a single folder is OK, but >100,000 files in a single folder gives timeout problems.

@ncw (Collaborator) commented Nov 7, 2018

@gualdo0 wrote

> After deleting some files and moving files to another folder, the monstrous folder has "only" 48,082 files and the problem has disappeared. It has 3 sub-folders with 23800, 10640, and 360 files, which are also synced successfully. Conclusion: ~50,000 files in a single folder is OK, but >100,000 files in a single folder gives timeout problems.

Great work! If we do nothing else, we should write that in the onedrive docs! Do you want to send a PR with that in docs/content/onedrive.md?

@gualdo0 commented Nov 11, 2018

@ncw sorry for my late answer - very busy week.
Yes, I think this should be shared somewhere, but I'm not sure how to do it. Would you mind doing it yourself? Or please let me know how... when searching for docs/content/onedrive.md, I get something like an official rclone page for OneDrive, and I'm not sure I should edit that... Thanks!

@ncw (Collaborator) commented Nov 11, 2018

@gualdo0 no worries. I'll put this in the limitations section of the docs:

> OneDrive seems to be OK with up to 50,000 files in a folder, but at 100,000 rclone will get errors listing the directory like couldn’t list files: UnknownError:. See #2707 for more info.

@gualdo0

This comment has been minimized.

Copy link

commented Nov 11, 2018

Great @ncw, many thanks! Just one note: we know there is no problem with 50,000 files, and that there is a problem with >100,000, so the limiting number is somewhere between the two - but I couldn't check the actual value ("at least 50,000 files" would be better than "up to 50,000 files"). On the other hand, I'm not sure whether some other parameter (e.g. connection speed) also matters...

@ncw ncw modified the milestones: v1.45, v1.46 Nov 24, 2018

@ncw ncw modified the milestones: v1.46, v1.47 Feb 9, 2019

@ncw ncw modified the milestones: v1.47, v1.48 Apr 15, 2019

@ncw ncw modified the milestones: v1.48, v1.49 Jun 19, 2019

@ncw ncw modified the milestones: v1.49, v1.50 Aug 27, 2019
