Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the string chunk size to increase performance #9139

Merged
merged 1 commit into from
Aug 15, 2022

Conversation

faho
Copy link
Member

@faho faho commented Aug 12, 2022

This is a tiny commit code-wise, but the explanation is a bit
longer.

When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.

It turns out, on the (linux) systems I've tested, that's simply not true, for our use case.

My tests show that a bigger chunk size of up to 4096 is better across
the board
:

  • It's better with very large inputs
  • It's equal-to-slightly-better with small inputs
  • It's equal-to-slightly-better even if we quit early

My test setup:

  1. Create various fish builds with various sizes for
    STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
  2. Download the npm package names from
    https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
    used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
  3. Extract the names so we get a line-based version:
jq '.[]' names.json | string trim -c '"' >/tmp/all
  1. Create various sizes of random extracts:
for f in 10000 1000 500 50
    shuf /tmp/all | head -n $f > /tmp/$f
end

(the idea here is to defeat any form of pattern in the input).

  1. Run benchmarks:
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
    -c 'for i in (seq 1000)
            string match -re foot < $f
        end; true'"

(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)

This shows results pretty much like

Summary
'./fish-2048     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true'' ran
  1.01 ± 0.02 times faster than './fish-4096     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.02 ± 0.03 times faster than './fish-1024     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.08 ± 0.03 times faster than './fish-512     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.47 ± 0.07 times faster than './fish-128     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''

So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory trade-off.


Fun extra:

Comparisons with grep (GNU grep 3.7) are weird. Because you both get

'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''

and

'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''

Basically, if you can give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.

TODOs:

  • [N/A] Changes to fish usage are reflected in user documentation/manpages.
  • [N/A] Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

This is a *tiny* commit code-wise, but the explanation is a bit
longer.

When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.

It turns out, on the (linux) systems I've tested, that's simply not
true.

My tests show that a bigger chunk size of up to 4096 is better *across
the board*:

- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early

My test setup:

0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:

```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```

3. Create various sizes of random extracts:

```fish
for f in 10000 1000 500 50
    shuf /tmp/all | head -n $f > /tmp/$f
end
```

(the idea here is to defeat any form of pattern in the input).

4. Run benchmarks:

hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
    -c 'for i in (seq 1000)
            string match -re foot < $f
        end; true'"

(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)

This shows results pretty much like

```
Summary
'./fish-2048     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true'' ran
  1.01 ± 0.02 times faster than './fish-4096     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.02 ± 0.03 times faster than './fish-1024     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.08 ± 0.03 times faster than './fish-512     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.47 ± 0.07 times faster than './fish-128     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
```

So we see that up to 1024 there's a difference, and after that the
returns are marginal. So we stick with 1024 because of the memory
trade-off.

----

Fun extra:

Comparisons with `grep` (GNU grep 3.7) are *weird*. Because you both
get

```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```

and

```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```

Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating.
@faho faho added enhancement performance Purely performance-related enhancement without any changes in black box output labels Aug 12, 2022
@faho faho added this to the fish 3.6.0 milestone Aug 12, 2022
// Empirically determined.
// This is probably down to some pipe buffer or some such,
// but too small means we need to call `read(2)` and str2wcstring a lot.
#define STRING_CHUNK_SIZE 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, LGTM.
It's probably important that the buffer doesn't exceed the pagesize (4096 here).

READ_CHUNK_SIZE is still at 128 but that's okay.

jq '.[]' names.json | string trim -c '"' >/tmp/all

can probably use jq --raw-output instead of string trim

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

READ_CHUNK_SIZE is still at 128 but that's okay.

Oh, sure, my idea was that read isn't really useful on long inputs, because it only ever reads up to the next newline/NULL.

So you need one line that exceeds the 128 bytes to trigger it. That's not impossible, but it seems it's not too common.

can probably use jq --raw-output instead of string trim

To be honest I use jq once every blue moon or so, and I find the syntax hard to remember. I happened to have the '.[]' bit stored in my history, so I tried it and noticed it had quotes so I trimmed them.

@floam
Copy link
Member

floam commented Aug 13, 2022

For those trying to follow along at home, and don't have GNU utilities, sort -R, at least the one in my current version of macOS should do the same thing as shuf in his example.

@floam
Copy link
Member

floam commented Aug 13, 2022

If anyone is curious what the absolute timings actually look like without trying it themselves, over the various input sizes, here's how it broke down for macOS/arm64. (I didn't adjust seq as suggested - it doesn't actually take long really.

As I ran this a lot of times over a few hours, I'd sometimes see 2048 actually come out on top barely - but at least when I ran it just now and had it formatting for markdown, 1024 was usually fastest except for the smallest one and you can tell it's really close. So thumbs up to 1024. In the worst case 1024 bytes is ~1.17x faster than 128 bytes, and for big inputs almost 1.5x faster.

Command (50 lines) Mean [ms] Min [ms] Max [ms] Relative
./fish-128 -c 'for i in (seq 1000); string match -re foot < 50; end' 27.6 ± 0.8 26.6 30.4 1.18 ± 0.04
./fish-256 -c 'for i in (seq 1000); string match -re foot < 50; end' 25.7 ± 0.6 24.9 28.0 1.10 ± 0.03
./fish-512 -c 'for i in (seq 1000); string match -re foot < 50; end' 24.8 ± 0.8 23.7 28.7 1.06 ± 0.04
./fish-1024 -c 'for i in (seq 1000); string match -re foot < 50; end' 23.7 ± 0.5 23.1 25.7 1.01 ± 0.03
./fish-2048 -c 'for i in (seq 1000); string match -re foot < 50; end' 23.6 ± 0.8 22.7 26.9 1.01 ± 0.04
./fish-4096 -c 'for i in (seq 1000); string match -re foot < 50; end' 23.4 ± 0.4 22.8 24.5 1.00
Command (100 lines) Mean [ms] Min [ms] Max [ms] Relative
./fish-128 -c 'for i in (seq 1000); string match -re foot < 100; end' 37.9 ± 0.5 36.9 39.4 1.26 ± 0.03
./fish-256 -c 'for i in (seq 1000); string match -re foot < 100; end' 33.7 ± 0.8 32.5 38.6 1.12 ± 0.04
./fish-512 -c 'for i in (seq 1000); string match -re foot < 100; end' 31.0 ± 0.7 30.0 34.2 1.03 ± 0.03
./fish-1024 -c 'for i in (seq 1000); string match -re foot < 100; end' 30.1 ± 0.6 29.2 33.0 1.00
./fish-2048 -c 'for i in (seq 1000); string match -re foot < 100; end' 30.3 ± 0.5 29.3 32.0 1.01 ± 0.03
./fish-4096 -c 'for i in (seq 1000); string match -re foot < 100; end' 30.6 ± 0.7 29.4 33.1 1.02 ± 0.03
Command (500 lines) Mean [ms] Min [ms] Max [ms] Relative
./fish-128 -c 'for i in (seq 1000); string match -re foot < 500; end' 115.6 ± 0.6 114.8 117.0 1.39 ± 0.01
./fish-256 -c 'for i in (seq 1000); string match -re foot < 500; end' 96.4 ± 0.9 94.9 99.7 1.16 ± 0.01
./fish-512 -c 'for i in (seq 1000); string match -re foot < 500; end' 86.6 ± 1.1 85.4 91.7 1.04 ± 0.02
./fish-1024 -c 'for i in (seq 1000); string match -re foot < 500; end' 83.0 ± 0.6 81.7 84.5 1.00
./fish-2048 -c 'for i in (seq 1000); string match -re foot < 500; end' 83.6 ± 0.8 81.7 85.7 1.01 ± 0.01
./fish-4096 -c 'for i in (seq 1000); string match -re foot < 500; end' 89.4 ± 1.0 87.8 92.5 1.08 ± 0.01
Command (1000 lines) Mean [ms] Min [ms] Max [ms] Relative
./fish-128 -c 'for i in (seq 1000); string match -re foot < 1000; end' 221.6 ± 1.2 219.9 224.4 1.45 ± 0.02
./fish-256 -c 'for i in (seq 1000); string match -re foot < 1000; end' 180.3 ± 1.5 178.8 184.6 1.18 ± 0.01
./fish-512 -c 'for i in (seq 1000); string match -re foot < 1000; end' 160.3 ± 1.0 158.3 161.9 1.05 ± 0.01
./fish-1024 -c 'for i in (seq 1000); string match -re foot < 1000; end' 153.1 ± 1.4 151.2 156.4 1.00
./fish-2048 -c 'for i in (seq 1000); string match -re foot < 1000; end' 154.0 ± 1.6 150.3 156.7 1.01 ± 0.01
./fish-4096 -c 'for i in (seq 1000); string match -re foot < 1000; end' 164.1 ± 1.2 162.3 167.2 1.07 ± 0.01
Command (10000 lines) Mean [s] Min [s] Max [s] Relative
./fish-128 -c 'for i in (seq 1000); string match -re foot < 10000; end' 2.079 ± 0.013 2.051 2.094 1.47 ± 0.02
./fish-256 -c 'for i in (seq 1000); string match -re foot < 10000; end' 1.682 ± 0.021 1.660 1.735 1.19 ± 0.02
./fish-512 -c 'for i in (seq 1000); string match -re foot < 10000; end' 1.483 ± 0.010 1.468 1.497 1.05 ± 0.01
./fish-1024 -c 'for i in (seq 1000); string match -re foot < 10000; end' 1.416 ± 0.013 1.395 1.440 1.00
./fish-2048 -c 'for i in (seq 1000); string match -re foot < 10000; end' 1.418 ± 0.018 1.400 1.460 1.00 ± 0.02
./fish-4096 -c 'for i in (seq 1000); string match -re foot < 10000; end' 1.527 ± 0.008 1.518 1.542 1.08 ± 0.01

@floam
Copy link
Member

floam commented Aug 13, 2022

it doesn't actually take long really.

Oh I hadn't tried all. Yeah that takes about a second, so we'd be talking hours to do that 1000 times for each binary. Try as I may though, I can't achieve the huge 3x-4x speedups you're getting, seems really to top out around 1.5x still, but we're on different platforms with different hardware.

Command (all 2041543 lines) Mean [s] Min [s] Max [s] Relative
./fish-128 -c 'for i in (seq 10); string match -re foot < all; end' 4.087 ± 0.024 4.059 4.131 1.49 ± 0.01
./fish-256 -c 'for i in (seq 10); string match -re foot < all; end' 3.291 ± 0.019 3.258 3.322 1.20 ± 0.01
./fish-512 -c 'for i in (seq 10); string match -re foot < all; end' 2.872 ± 0.019 2.846 2.904 1.05 ± 0.01
./fish-1024 -c 'for i in (seq 10); string match -re foot < all; end' 2.743 ± 0.019 2.721 2.778 1.00
./fish-2048 -c 'for i in (seq 10); string match -re foot < all; end' 2.774 ± 0.008 2.761 2.790 1.01 ± 0.01
./fish-4096 -c 'for i in (seq 10); string match -re foot < all; end' 3.007 ± 0.022 2.968 3.040 1.10 ± 0.01

@faho
Copy link
Member Author

faho commented Aug 14, 2022

I can't achieve the huge 3x-4x speedups you're getting, seems really to top out around 1.5x still

I'm not getting 3x-4x, I'm getting 1.5x.

(that 11x/60x was in comparison to grep - if you have to call it lots of times we beat it just because of the overhead of starting a new process. this isn't really relevant to the patch at hand - it changes little in that regard. but while I was measuring this anyway I thought it fun to slot in grep and see how that works out)

@floam
Copy link
Member

floam commented Aug 14, 2022

Huh, don't know where I got that 3-4 figure, weird, nevermind.

@faho faho merged commit 7988cff into fish-shell:master Aug 15, 2022
@faho
Copy link
Member Author

faho commented Aug 15, 2022

Okay, since this appears to be faster on linux and macos, and is unlikely to be a lot slower elsewhere, let's just merge it.

@faho faho deleted the string-bigger-chunks branch September 11, 2022 08:00
henrikhorluck added a commit to henrikhorluck/fish-shell that referenced this pull request Jul 27, 2023
- The dermination is from commit 7988cff
- See PR fish-shell#9139
ridiculousfish pushed a commit that referenced this pull request Jul 28, 2023
- The dermination is from commit 7988cff
- See PR #9139
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement performance Purely performance-related enhancement without any changes in black box output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants