Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce write()s for builtins #9229

Merged
merged 4 commits into from
Sep 22, 2022
Merged

Conversation

faho
Copy link
Member

@faho faho commented Sep 21, 2022

Description

Our io_streams_t is, when it comes to pipes (including the terminal), an extremely thin wrapper over write(2).

Whenever you do streams.out.append() or streams.out.push_back() that corresponds to one write(2). This means doing something like

for (wchar_t c : somestring) {
    streams.out.push_back(c);
}

is extremely slow. The pattern we see a lot is

streams.out.append(somestring);
streams.out.push_back(L'\n');

So, this PR tries to reduce how often we do that, sometimes by constructing strings on-the-fly like

streams.out.append(somestring + L"\n");

Or doing some buffering.

I'd like to call out one egregious case, which is printf. This one would out.push_back for every character escape, so

printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)

would issue 400000 (four hundred thousand) write(2) calls. Now it has been reduced to 2000 (one per argument excluding the format string - I experimented with doing one big write but that turned out to be slower in some cases and would of course have worse latency and memory requirements). This used to take ~400ms, now it takes 16ms.

The rest of the wins are more modest, ranging from ~1.1x to 1.7x.

TODOs:

  • [N/A] Changes to fish usage are reflected in user documentation/manpages.
  • [N/A] Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

@faho faho added the performance Purely performance-related enhancement without any changes in black box output label Sep 21, 2022
@faho faho added this to the fish-future milestone Sep 21, 2022
This at least halves the number of "write()" calls we do if it goes to
a pipe or the terminal, or reduces them by 75% if there is a
description.

This makes

```fish
complete -c foo -xa "(seq 50000)"
complete -C"foo "
```

faster by 1.33x.
This writes the output once per argument instead of once per format or
escaped char.

An egregious case:

```fish
printf (string repeat -n 200 \\x7f)%s\n (string repeat -n 2000 aaa\n)
```

Has been sped up by ~20x by reducing write() calls from 40000 to 200.

Even a simple

```fish
printf %s\n (string repeat -n 2000 aaa\n)
```

should now be ~1.2x faster by issuing 2000 instead of 4000 write
calls (the `\n` was written separately!).
This basically immediately issues a "write()" if it's to a pipe or the
terminal.

That means we can reduce syscalls and improve performance, even by
doing something like

```c++
streams.out.append(somewcstring + L"\n");
```

instead of

```c++
streams.out.append(somewcstring);
streams.out.push_back(L'\n');
```

Some benchmarks of the

```fish
for i in (string repeat -n 2000 \n)
    $thing
end
```

variety:

1. `set` (printing variables) sped up 1.75x
2. `builtin -n` 1.60x
3. `jobs` 1.25x (with 3 jobs)
4. `functions` 1.20x
5. `math 1 + 1` 1.1x
6. `pwd` 1.1x

Piping yields similar results, there is no real difference when
outputting to a command substitution.
The impact here depends on the command and how much output it
produces.

It's possible to get up to 1.5x - `string upper` being a good example,
or a no-op `string match '*'`.

But the more the command actually needs to do, the less of an effect
this has.
@faho faho force-pushed the builtin-reduce-writes branch from 39529e0 to f98e2f9 Compare September 21, 2022 16:38
@faho faho changed the title Builtin reduce writes Reduce write()s for builtins Sep 21, 2022
@mqudsi
Copy link
Contributor

mqudsi commented Sep 22, 2022

These changes look good and the wins are appreciable. I had a concern about excessive buffering but we already eschew builtins for cases where fish's internal pipeline buffering causes burdensome delays (such as using sed instead of string ... in some completions) so I don't think there's any risk of any long-running processes piping to/from builtins effectively becoming less usable if we increase buffering (and I think the additional buffering won't even matter in the cases where all output is buffered in one go, anyway).

@faho
Copy link
Member Author

faho commented Sep 22, 2022

had a concern about excessive buffering but we already eschew builtins

For the most part, this is about "buffering" a line. That's hardly "excessive".

This is about instead of this:

write("completion");
write("\t");
write("description");
write("\n");

doing this:

string line = "completion" + "\t" + "description" + "\n";
write(line);

Additionally, this is about what happens when the builtin writes into a pipe - that means either to an external process or the terminal.

These will be changed:

string match "*" | command grep

printf %s\n foo bar baz

string repeat -n 5000 aaa

# Only the `string length` writes to an fd directly
string repeat -n 5000 aaa | string length

math 5 + 5 >/tmp/ten

These cases won't be:

string repeat -n 5000 aaa | while read -l foo; true; end

set -l fivethousandAs (string repeat -n 5000 A)

@ridiculousfish
Copy link
Member

This looks fine to me, but it does impose some awkwardness on the builtins.

A more pleasant-to-use but complex alternative would be to support buffering directly in output_stream_t, example:

 {
      autobuffer_t buffer = streams.out.autobuffer();
      buffer.append(name);
      buffer.append('\n');
 }

and in buffer's destructor, it flushes its data to its origin stream, in one go.

Either that approach or what you've done here is fine with me, just wanted to toss that out as an alternative.

@faho
Copy link
Member Author

faho commented Sep 22, 2022

A more pleasant-to-use but complex alternative would be to support buffering directly in output_stream_t, example:

I have thought about that, but it's not really any less effort if you want that line-buffering. Basically you would replace the streams.out.append(line) with buffer.flush() anyway.

I don't really want to buffer by length, because I'm imagining the latency characteristics might be weird? What if we have 490 bytes ready, we flush every 500 bytes and the next 10 bytes take a second because this is a super tricky regex match?

What would be possible would be a "append_line" that appends the \n itself, for simple cases. But even that's barely clearer than append(str + "\n")

@faho faho merged commit e69be38 into fish-shell:master Sep 22, 2022
@faho faho modified the milestones: fish-future, fish 3.6.0 Sep 22, 2022
@mqudsi
Copy link
Contributor

mqudsi commented Sep 22, 2022

Regarding a possible append_line(): it is/was my intention to add writev() support (where available) and expose that via a variadic append(...) and append_line() functions.

@faho faho deleted the builtin-reduce-writes branch September 23, 2022 05:10
faho added a commit that referenced this pull request Nov 7, 2022
This reverts commit 3739c53.

It misses the point of e69be38 and reintroduces a lot of write calls.

See #9229
faho referenced this pull request Nov 7, 2022
prefer
    streams.out.append(foo)
    streams.out.push_back(L'\n')

vs e.g.
    foo.append(L"\n");
    streams.out.append(foo)
henrikhorluck added a commit to henrikhorluck/fish-shell that referenced this pull request Aug 13, 2023
This is an alternative to the very common pattern of

```rust
streams.err.append(output);
streams.err.append1('\n');
```

Which has negative performance implications, see fish-shell#9229

It takes `Into<WString>` to hopefully avoid allocating anew when the argument is
a WString with leftover capacity
faho pushed a commit that referenced this pull request Aug 16, 2023
This is an alternative to the very common pattern of

```rust
streams.err.append(output);
streams.err.append1('\n');
```

Which has negative performance implications, see #9229

It takes `Into<WString>` to hopefully avoid allocating anew when the argument is
a WString with leftover capacity
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance Purely performance-related enhancement without any changes in black box output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants