
Not as fast on Macs #2

Open
bengl opened this issue Mar 21, 2018 · 8 comments


@bengl (Member) commented Mar 21, 2018

As first reported by @zkat, and verified by others, the final number in the benchmark (the primed-cache speed of qdd) does not look significantly better on Macs. While I'm getting times of 0.5s on my machine (Linux), Mac users seem to be getting closer to 4s or 5s.

I collected .cpuprofile data from my machine and a Mac, and found that around 80% of the time on Macs is being spent in (idle), leading me to believe it's simply waiting on filesystem operations the whole time. On Linux the (idle) time is closer to 20%-25%, so while this might not account for all of the overhead, it at least accounts for a huge chunk of it.

Since almost all of the primed-cache operation consists of a recursive copy, I modified the copy code to use standard fs module operations (the qdd version calls the binding directly), and also added perf_hooks marks to see which of the filesystem calls were taking up the most time. The resulting script is in this gist, which can be run from any arbitrary empty directory. The test downloads a tarball, unpacks it, and measures the time to copy recursively from one directory to another. In qdd these operations happen many, many times in parallel.

Here are the results (time in ms):

My Arch Linux Lenovo X1 Carbon (from 2016):

$ node copytest.js 
readdir 1.995347
stat 16.328005827783088
mkdir 7.106336499999999
copyfile 15.482173730219255
---
77.179121

A Google Compute Cloud instance (TODO put specs in here):

$ node copytest.js 
readdir 1.7026445
stat 14.900154024738319
mkdir 9.860568500000001
copyfile 15.350753629170656
---
71.24901

A macincloud.com Pay-as-you-Go instance (OS X High Sierra):

$ node copytest.js 
readdir 4.9309635
stat 83.9008870846811
mkdir 38.07782
copyfile 78.84369494852221
---
353.116088

While this is still pretty inconclusive, fs.stat and fs.copyFile seem to be taking considerably longer on a Mac than on Linux. All tests used node@8.9.4. For both my machine and the Google instance the filesystem is ext4, and for the Mac it's HFS+.

@evanlucas

Maybe compare ulimit -n? I think it is 256 on macOS by default... not sure what benchmark you are using, but that is pretty low if you are opening a bunch of files.
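For reference, the soft descriptor limit can be checked (and raised for the current shell session) like this; the 4096 below is just an example value:

```shell
# Soft limit on open file descriptors for this shell (256 by default on macOS)
ulimit -n
# Hard ceiling the soft limit can be raised to
ulimit -Hn
# Raise the soft limit for this session, e.g.:
#   ulimit -n 4096
```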

@addaleax

Does the perf difference carry over to sync calls as well?

@bengl (Member Author) commented Mar 22, 2018

@evanlucas For ulimit -n I'm seeing 4096 on my Linux machine, 32768 on macOS.

@addaleax It looks like it does. I added a sync version of the test to the gist. Here are the results:

Linux:

$ node copytestsync.js 
readdir 0.44465699999999997
stat 0.01158147002854424
mkdir 0.056488000000000003
copyfile 0.02846236224976165
---
75.75296

Mac:

$ node copytestsync.js
readdir 0.9944815
stat 0.050528007611798355
mkdir 0.222148
copyfile 0.48516409151572953
---
673.984851

@bengl (Member Author) commented Mar 22, 2018

Since the test code here is effectively doing the same thing as a cp -r, I thought I'd try timing that in both environments. The script used is in the gist as copytest.sh.

Linux:

real    0m0.022s
user    0m0.003s
sys     0m0.019s

Mac:

real    0m0.318s
user    0m0.026s
sys     0m0.281s
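The gist's copytest.sh isn't reproduced above, but the shape of the comparison is presumably something like the following; the tree built here (50 × 1 KiB files) is a placeholder:

```shell
# Build a small tree and time a recursive copy with the system cp.
src=$(mktemp -d)
mkdir -p "$src/sub"
for i in $(seq 1 50); do
  head -c 1024 /dev/zero > "$src/sub/file$i"
done
dst=$(mktemp -d)
time cp -r "$src/." "$dst/"
```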

@addaleax

I think that rules out overhead from the threadpool mechanisms (which is ultimately implemented in a very platform-dependent way). (Btw, file system writes are protected by a global lock on OS X, so they can’t use the threadpool effectively there – but that seems unlikely, too, if it also affects sync code and other functions.)

If you want to hear my best guess, it’s probably an actual perf difference in the OS or the file system. I guess trying to reproduce this with C code using the raw syscalls could prove or disprove that?

@tlhunter

@bengl, if you spend the entire night debugging syscalls, which I suspect you will, you should take notes and turn it into a talk ;)

@LarsJK commented Jul 18, 2018

Is filevault enabled?

@bengl (Member Author) commented Jul 18, 2018

@LarsJK AFAIK yes, but note also that my Linux system is using LUKS, which I'd imagine is pretty similar in terms of overhead.
