Improving the performance of a 'typical' bat run #2545

sharkdp · 2023-04-17T16:05:58Z

I found this cool performance analysis tool called magic-trace (https://github.com/janestreet/magic-trace) and used a trace recording of bat to try it out. I actually spotted a few things that could potentially be improved.

Recently, we made fantastic progress on reducing bats startup speed (see #951), mostly by lazy-loading assets thanks to @Enselic changes. Here, I'm not focusing on startup-speed per-se, but rather on a "typical" run of bat, i.e. input files which are not gigantic or weird in any sense. Let's use bats Cargo.toml as an example. I'm disabling the pager because I want to focus on bat. I'm deliberately not using --no-config to simulate a real world use case (my config only contains comments and --italic-text=always). I don't have any custom assets.

First, let's measure the full execution time using hyperfine:

▶ hyperfine --warmup=20 "bat --paging=never --force-colorization Cargo.toml"
Benchmark 1: bat --paging=never --force-colorization Cargo.toml
  Time (mean ± σ):       8.5 ms ±   1.8 ms    [User: 7.5 ms, System: 1.6 ms]
  Range (min … max):     4.7 ms …  14.6 ms    271 runs

That's pretty fast, but cat only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm running bat once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):

▶ bat Cargo.toml > /dev/null; magic-trace run -full-execution bat -- --paging=never --force-colorization Cargo.toml

The resulting trace.fct is attached here: trace.zip

Now let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:

The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):

Next, we have some fast initialization stuff, config file loading, parsing and command-line option handling. Alltogether, this part only takes 600µs, so probably not much room for improvement:

Up next, we see the first surprise. Calling App::config ("get the final Config object based on config, command-line options, etc") takes 3.5 ms:

Zooming in closer:

Most of this time is being spent compiling regexes (for glob patterns) for our syntax mappings (not syntax highlighting!), i.e. things like this:

This is certainly something that could be optimized. Either by somehow pre-compiling syntax mappings (although this would only work for the builtin mappings). Or by parallelization. Or by using a faster glob matcher(?).

Next up, we query the Git status (pretty cool that we can see libgit2 internals here), which takes another 2 ms:

There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which also takes 2.5 ms:

Finally, we can start printing something to the terminal:

Notice how the first few calls to print_line take a bit longer as some regexes are being compiled lazily (I suppose):

There's also this weird part here, which I don't understand:

To summarize:

magic-trace is cool
Our SyntaxMapping initialization is much slower than it could be
I think we can optimize a few things in bats startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)

The text was updated successfully, but these errors were encountered:

eth-p · 2023-04-18T21:41:23Z

Does magic-trace cover how much time is spent in syscalls and the allocator?

I have some ideas on how to improve performance in the interactive printer, but I'm not sure if the gains warrant the effort (i.e. not less than 1%) required to refactor parts of the code at this point in time:

Re-using allocated String objects that were being used as buffers, rather than re-allocating a string for each line.
Writing line contents into a String before calling write!(handle, ...) to reduce the number of write(2) syscalls to one per printed line.

In either case—albeit not a "typical" bat run—one area I would like to improve is that we're re-emitting ANSI SGR sequences when we don't need to be. To quote my findings from my recent PR:

The syntax highlighting style is being emitted for every single text chunk (rather than highlight region).

Essentially, for a string like Text^[33mMore Text^[1mEven More Text what we do is:

For Text

Emit highlight style: ^[38;2;100;100;100m

Emit text: Text

For ^[33m

Passthrough detected ANSI sequence: ^[33m

For More Text

Emit highlight style: ^[38;2;100;100;100m

Re-emit detected ANSI attributes as sequence: ^[33m

Emit text: More Text

For ^[1m

Passthrough detected ANSI sequence: ^[1m

For Even More Text

Emit highlight style: ^[38;2;100;100;100m

Re-emit detected ANSI attributes as sequence: ^[33m^[1m

Emit text: Even More Text

This gives us: ^[38;2;100;100;100mText^[33m^[38;2;100;100;100m^[33mMore Text^[1m^[38;2;100;100;100m^[33m^[1mEven More Text
Which could be simplified to: ^[38;2;100;100;100mText^[33mMore Text^[1mEven More Text

For the plain text syntax we don't actually have a style to emit (i.e. no ^[38;2;100;100;100m everywhere), but it still acts as though there is one and double-prints the passed through styles.

sharkdp added the performance label Apr 17, 2023

nisbet-hubbard mentioned this issue Jul 17, 2023

Improving the performance of search #2623

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving the performance of a 'typical' bat run #2545

Improving the performance of a 'typical' bat run #2545

sharkdp commented Apr 17, 2023

eth-p commented Apr 18, 2023 •

edited

Loading

Improving the performance of a 'typical' bat run #2545

Improving the performance of a 'typical' bat run #2545

Comments

sharkdp commented Apr 17, 2023

eth-p commented Apr 18, 2023 • edited Loading

eth-p commented Apr 18, 2023 •

edited

Loading