Increase the string chunk size to increase performance #9139
Conversation
This is a *tiny* commit code-wise, but the explanation is a bit longer.

When I made string read in chunks, I picked a chunk size from bash's read, under the assumption that they had picked a good one. It turns out, on the (linux) systems I've tested, that's simply not true.

My tests show that a bigger chunk size of up to 4096 is better *across the board*:

- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early

My test setup:

0. Create various fish builds with various sizes for STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:

   ```fish
   jq '.[]' names.json | string trim -c '"' >/tmp/all
   ```

3. Create various sizes of random extracts:

   ```fish
   for f in 10000 1000 500 50
       shuf /tmp/all | head -n $f > /tmp/$f
   end
   ```

   (the idea here is to defeat any form of pattern in the input)

4. Run benchmarks:

       hyperfine -w 3 ./fish-{128,512,1024,2048,4096}" -c 'for i in (seq 1000)
       string match -re foot < $f
       end; true'"

   (reduce the seq size for the larger files so you don't have to wait for hours - the idea here is to have some time running string and not just fish startup time)

This shows results pretty much like

```
Summary
  './fish-2048 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true'' ran
    1.01 ± 0.02 times faster than './fish-4096 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
    1.02 ± 0.03 times faster than './fish-1024 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
    1.08 ± 0.03 times faster than './fish-512 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
    1.47 ± 0.07 times faster than './fish-128 -c 'for i in (seq 1000)
string match -re foot < /tmp/500
end; true''
```

So we see that up to 1024 there's a difference, and after that the returns are marginal. So we stick with 1024 because of the memory trade-off.

----

Fun extra:

Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both get

```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
   11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```

and

```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
   66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
  100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2); string match -re foot < /tmp/all; end; true''
```

Basically, if you *can* give grep a lot of work at once (~40MB in this case), it'll churn through it like butter. But if you have to call it a lot, string beats it by virtue of cheating.
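To make the trade-off behind the change concrete: each chunk costs one `read(2)` plus one narrow-to-wide conversion, so a bigger buffer amortizes that per-call overhead at the price of a slightly larger allocation. Below is a minimal standalone sketch (an illustration only, not the fish code) that reads a file in fixed-size chunks and reports how many `read(2)` calls it took:

```cpp
// Illustration only - not fish's implementation. Reads a file in fixed-size
// chunks via read(2) and counts the calls, to show how the chunk size trades
// per-call overhead (syscall + conversion) against buffer memory.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        std::fprintf(stderr, "usage: %s <file> <chunk-size>\n", argv[0]);
        return 1;
    }
    size_t chunk = std::strtoul(argv[2], nullptr, 10);  // e.g. 128, 1024, 4096
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        std::perror("open");
        return 1;
    }

    std::vector<char> buf(chunk);
    size_t calls = 0, total = 0;
    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0) {
        calls++;
        total += static_cast<size_t>(n);
        // A real consumer would convert/process the chunk here (fish converts
        // each chunk from narrow to wide characters); that per-chunk work is
        // exactly what a larger chunk amortizes.
    }
    close(fd);
    std::printf("%zu bytes in %zu read(2) calls of up to %zu bytes each\n",
                total, calls, chunk);
    return 0;
}
```

On the 46.8MB /tmp/all file, a 1024-byte chunk needs roughly 8x fewer calls than a 128-byte one; fewer calls to `read(2)` and to the string conversion is what the speedup above comes down to.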
// Empirically determined.
// This is probably down to some pipe buffer or some such,
// but too small means we need to call `read(2)` and str2wcstring a lot.
#define STRING_CHUNK_SIZE 1024
Makes sense, LGTM.
It's probably important that the buffer doesn't exceed the pagesize (4096 here).
READ_CHUNK_SIZE is still at 128 but that's okay.
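As an aside, the page size referred to above can be checked with `sysconf`; a tiny illustrative snippet (not part of the patch):

```cpp
// Illustrative only: print the system page size mentioned above.
#include <cstdio>
#include <unistd.h>

int main() {
    long page = sysconf(_SC_PAGESIZE);  // commonly 4096 on x86-64 Linux
    std::printf("page size: %ld bytes\n", page);
    return 0;
}
```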
> `jq '.[]' names.json | string trim -c '"' >/tmp/all`

can probably use `jq --raw-output` instead of `string trim`
> READ_CHUNK_SIZE is still at 128 but that's okay.

Oh, sure, my idea was that `read` isn't really useful on long inputs, because it only ever reads up to the next newline/NULL. So you need one line that exceeds the 128 bytes to trigger it. That's not impossible, but it seems it's not too common.
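To put a rough number on that, a back-of-the-envelope illustration (assuming one `read(2)` per chunk and stopping at the first newline; not fish's actual read logic):

```cpp
// Back-of-the-envelope only: how many chunk-sized reads a single line needs,
// assuming one read(2) per chunk and stopping at the first newline.
#include <cstdio>
#include <initializer_list>

int main() {
    const long chunk = 128;  // READ_CHUNK_SIZE
    for (long line_len : {40L, 127L, 128L, 500L, 5000L}) {
        long reads = (line_len + 1 + chunk - 1) / chunk;  // +1 for the trailing newline
        std::printf("line of %5ld bytes -> %2ld read(s) of up to %ld bytes\n",
                    line_len, reads, chunk);
    }
    return 0;
}
```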
> can probably use `jq --raw-output` instead of `string trim`

To be honest I use jq once every blue moon or so, and I find the syntax hard to remember. I happened to have the `'.[]'` bit stored in my history, so I tried it and noticed it had quotes so I trimmed them.
For those trying to follow along at home who don't have GNU utilities,
If anyone is curious what the absolute timings actually look like over the various input sizes without trying it themselves, here's how it broke down for macOS/arm64. (I didn't adjust seq as suggested - it doesn't actually take long, really. As I ran this a lot of times over a few hours, I'd sometimes see 2048 actually come out on top barely - but at least when I ran it just now and had it formatting for markdown, 1024 was usually fastest except for the smallest one, and you can tell it's really close.) So thumbs up to 1024. In the worst case 1024 bytes is ~1.17x faster than 128 bytes, and for big inputs almost 1.5x faster.
Oh I hadn't tried
I'm not getting 3x-4x, I'm getting 1.5x. (That 11x/60x was in comparison to grep - if you have to call it lots of times, we beat it just because of the overhead of starting a new process. This isn't really relevant to the patch at hand - it changes little in that regard. But while I was measuring this anyway, I thought it fun to slot in.)
Huh, don't know where I got that 3-4 figure, weird, nevermind.
Okay, since this appears to be faster on linux and macos, and is unlikely to be a lot slower elsewhere, let's just merge it.
- The determination is from commit 7988cff - See PR fish-shell#9139