Parallelizing Rclone with --checkers thread when using --files-from #2835

Closed
nishkalprakash opened this Issue Dec 13, 2018 · 8 comments

nishkalprakash commented Dec 13, 2018

What is your current rclone version (output from rclone version)?

rclone v1.45

  • os/arch: windows/amd64
  • go version: go1.11

What problem are you trying to solve?

I am trying to transfer files from an HTTP remote to a GD remote using a --files-from list which has over 10k entries. Rclone seems to be checking whether each file in the --files-from list exists, one at a time.

How do you think rclone should be changed to solve that?

This process of checking if files exist could be parallelized with checkers.
This would make this process run much faster.

@ncw ncw added the enhancement label Dec 13, 2018

@ncw ncw added this to the v1.46 milestone Dec 13, 2018

ncw added a commit that referenced this issue Dec 13, 2018

filter: parallelise reading of --files-from - fixes #2835
Before this change rclone would read the list of files from the
files-from parameter and check they existed one at a time.  This could
take a very long time for lots of files.

After this change, rclone will check up to --checkers in parallel.
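
The idea in the commit message above can be sketched as a bounded worker pool. This is only a minimal illustration, not rclone's actual code: `checkAll` and `existsFunc` are hypothetical names, and the `exists` callback stands in for whatever per-file lookup the remote really performs. It assumes the lookups are independent of each other, which is what makes running up to `checkers` of them at once safe.

```go
package main

import (
	"fmt"
	"sync"
)

// existsFunc is a hypothetical stand-in for a per-file lookup on a remote.
type existsFunc func(path string) bool

// checkAll checks every path using at most `checkers` concurrent workers
// and returns the paths that exist, in their original order.
func checkAll(paths []string, checkers int, exists existsFunc) []string {
	if checkers < 1 {
		checkers = 1
	}
	found := make([]bool, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < checkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so no two goroutines write the same element.
				found[i] = exists(paths[i])
			}
		}()
	}

	// Feed the work, then wait for every worker to finish.
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	var out []string
	for i, ok := range found {
		if ok {
			out = append(out, paths[i])
		}
	}
	return out
}

func main() {
	paths := []string{"a.txt", "missing.txt", "b.txt"}
	present := map[string]bool{"a.txt": true, "b.txt": true}
	got := checkAll(paths, 2, func(p string) bool { return present[p] })
	fmt.Println(got) // prints [a.txt b.txt]
}
```

With a slow remote, wall-clock time for the checking phase should fall roughly in proportion to the number of workers, until the remote's own rate limits or latency become the bottleneck.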

Owner

ncw commented Dec 13, 2018

I've had a go at this here

https://beta.rclone.org/branch/v1.45-033-ga3a64a74-fix-2835-parallel-files-from-beta/ (uploaded in 15-30 mins)

Can you test please?

Try different numbers in --checkers

nishkalprakash commented Dec 13, 2018

Will test.

The link is not found.
"404 Not Found"

Owner

ncw commented Dec 13, 2018

OK, that needs a fix! Will reply with another URL shortly!

Owner

ncw commented Dec 13, 2018

nishkalprakash commented Dec 13, 2018

Right now it's not found,
so I should wait 15-30 mins, right?

Owner

ncw commented Dec 13, 2018

It is there now! I paste the above message when the CI build starts. It takes ~30 mins and usually works, so I can go off and do other things at that point.

nishkalprakash commented Dec 13, 2018

I started the transfer 20 mins ago and the actual copying still hadn't begun,
but then I realised that the file had 50k links, so I stopped it.

So I tried a test file with 1k links instead:
--checkers 50 completed checking in 14 sec,
--checkers 4 completed checking in 107 sec.

So i guess it was a success.
Great Job.
Thank you.

@ncw ncw closed this in 5ee1816 Dec 13, 2018

Owner

ncw commented Dec 13, 2018

Thanks for testing :-)

I've merged that to master now - you can find it in the latest beta in 15-30 mins and in v1.46 when it is released!
