
Copy in serial? #34

Closed
jamestalmage opened this issue Jul 22, 2016 · 10 comments · Fixed by #69

Comments

@jamestalmage

It seems really unlikely to me that copying a bunch of files in parallel would speed anything up. Indeed, I would not be surprised to find it slowed things down. Increasing the number of files read simultaneously seems likely to cause lots of seek delays on HDDs, and SSDs should be able to saturate their bandwidth regardless of how many files are being copied at once.

It's probably worth running some benchmarks to prove this, but if there were any advantage to parallel copies, I think you would have seen operating-system commands and GUIs doing them.

I guess technically you could be copying from multiple slow network-mounted drives, in which case parallelization might make sense. But I think that's a very small (possibly non-existent) percentage of users.

@sindresorhus
Owner

I think it really depends on what you copy. My theory is that lots of small files would be faster to copy serially, especially with sync fs methods, but with large files we could see a speed-up with parallelization. Only a wild guess though, so would need some benchmarks. At the very least, we should limit the concurrency.

@YurySolovyov
Contributor

> At the very least, we should limit the concurrency.

We can start by replacing Promise.all with https://github.com/sindresorhus/p-all and choosing some finite number for concurrency.
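The idea can be sketched with a hand-rolled limiter standing in for p-all; `copyOne` and the concurrency value of 8 below are illustrative, not the project's actual API:

```javascript
// A minimal concurrency limiter, mimicking what p-all provides.
// `tasks` is an array of functions that each return a promise;
// at most `concurrency` of them are in flight at any one time.
function pAll(tasks, {concurrency = Infinity} = {}) {
	return new Promise((resolve, reject) => {
		const results = new Array(tasks.length);
		let nextIndex = 0;
		let active = 0;
		let settled = 0;

		if (tasks.length === 0) {
			resolve(results);
			return;
		}

		const launch = () => {
			// Start tasks until the cap is reached or none remain.
			while (active < concurrency && nextIndex < tasks.length) {
				const index = nextIndex++;
				active++;
				Promise.resolve()
					.then(() => tasks[index]())
					.then(value => {
						results[index] = value;
						active--;
						settled++;
						if (settled === tasks.length) {
							resolve(results);
						} else {
							launch();
						}
					}, reject);
			}
		};

		launch();
	});
}

// Instead of Promise.all(files.map(copyOne)), cap the parallelism:
//   pAll(files.map(file => () => copyOne(file)), {concurrency: 8});
```

Note the tasks must be *functions* returning promises, not promises themselves; a promise created eagerly is already running, so wrapping in a thunk is what makes the cap meaningful.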

@sindresorhus
Owner

@YurySolovyov Yup. We should do some benchmarking on this, but I think (os.cpus().length || 1) * 2 or (os.cpus().length || 1) * 3 is generally a good default.

@YurySolovyov
Contributor

@sindresorhus I'm just not sure where the bottleneck is. If it's CPU threads, then yes, that's reasonable; if it's the number of file descriptors, it should be something much higher, like 64 or 128.

@sindresorhus
Owner

Hence why we need benchmarks.

@YurySolovyov
Contributor

YurySolovyov commented Oct 26, 2016

@sindresorhus do you have experience writing good ones? (I'd like to learn if so.)

@kevva
Contributor

kevva commented Oct 26, 2016

@YurySolovyov, you could use https://github.com/logicalparadox/matcha for example.

@schnittstabil
Collaborator

schnittstabil commented Oct 26, 2016

@YurySolovyov If you want to read some code: globby uses matcha (bench.js)

I don't fully trust matcha's results, but I can neither remember why nor name a good alternative – good benchmarking tools are extremely rare, in every language and on every platform.

Maybe @jamestalmage had similar concerns when he handcrafted the AVA benchmarks (great work, btw). However, I would prefer some library, even if it ends up being matcha.

@kevva
Contributor

kevva commented Oct 26, 2016

There is https://github.com/bestiejs/benchmark.js too.

@sindresorhus
Owner

I think the easiest solution here is to just add p-all and expose a concurrency option, which defaults to (os.cpus().length || 1) * 2.
