Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the performance of a 'typical' bat run #2545

Open
sharkdp opened this issue Apr 17, 2023 · 1 comment
Open

Improving the performance of a 'typical' bat run #2545

sharkdp opened this issue Apr 17, 2023 · 1 comment

Comments

@sharkdp
Copy link
Owner

sharkdp commented Apr 17, 2023

I found this cool performance analysis tool called magic-trace (https://github.com/janestreet/magic-trace) and used a trace recording of bat to try it out. I actually spotted a few things that could potentially be improved.

Recently, we made fantastic progress on reducing bats startup speed (see #951), mostly by lazy-loading assets thanks to @Enselic changes. Here, I'm not focusing on startup-speed per-se, but rather on a "typical" run of bat, i.e. input files which are not gigantic or weird in any sense. Let's use bats Cargo.toml as an example. I'm disabling the pager because I want to focus on bat. I'm deliberately not using --no-config to simulate a real world use case (my config only contains comments and --italic-text=always). I don't have any custom assets.

First, let's measure the full execution time using hyperfine:

▶ hyperfine --warmup=20 "bat --paging=never --force-colorization Cargo.toml"
Benchmark 1: bat --paging=never --force-colorization Cargo.toml
  Time (mean ± σ):       8.5 ms ±   1.8 ms    [User: 7.5 ms, System: 1.6 ms]
  Range (min … max):     4.7 ms …  14.6 ms    271 runs

That's pretty fast, but cat only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm running bat once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):

▶ bat Cargo.toml > /dev/null; magic-trace run -full-execution bat -- --paging=never --force-colorization Cargo.toml

The resulting trace.fct is attached here: trace.zip

Now let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:

image

The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):

image

Next, we have some fast initialization stuff, config file loading, parsing and command-line option handling. Alltogether, this part only takes 600µs, so probably not much room for improvement:

image

Up next, we see the first surprise. Calling App::config ("get the final Config object based on config, command-line options, etc") takes 3.5 ms:

image

Zooming in closer:

image

Most of this time is being spent compiling regexes (for glob patterns) for our syntax mappings (not syntax highlighting!), i.e. things like this:

image

This is certainly something that could be optimized. Either by somehow pre-compiling syntax mappings (although this would only work for the builtin mappings). Or by parallelization. Or by using a faster glob matcher(?).

Next up, we query the Git status (pretty cool that we can see libgit2 internals here), which takes another 2 ms:

image

There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which also takes 2.5 ms:

image

Finally, we can start printing something to the terminal:

image

Notice how the first few calls to print_line take a bit longer as some regexes are being compiled lazily (I suppose):

image

image

There's also this weird part here, which I don't understand:

image

To summarize:

  • magic-trace is cool
  • Our SyntaxMapping initialization is much slower than it could be
  • I think we can optimize a few things in bats startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)
@eth-p
Copy link
Collaborator

eth-p commented Apr 18, 2023

Does magic-trace cover how much time is spent in syscalls and the allocator?

I have some ideas on how to improve performance in the interactive printer, but I'm not sure if the gains warrant the effort (i.e. not less than 1%) required to refactor parts of the code at this point in time:

  1. Re-using allocated String objects that were being used as buffers, rather than re-allocating a string for each line.
  2. Writing line contents into a String before calling write!(handle, ...) to reduce the number of write(2) syscalls to one per printed line.

In either case—albeit not a "typical" bat run—one area I would like to improve is that we're re-emitting ANSI SGR sequences when we don't need to be. To quote my findings from my recent PR:

The syntax highlighting style is being emitted for every single text chunk (rather than highlight region).

Essentially, for a string like Text^[33mMore Text^[1mEven More Text what we do is:

  1. For Text
    • Emit highlight style: ^[38;2;100;100;100m
    • Emit text: Text
  2. For ^[33m
    • Passthrough detected ANSI sequence: ^[33m
  3. For More Text
    • Emit highlight style: ^[38;2;100;100;100m
    • Re-emit detected ANSI attributes as sequence: ^[33m
    • Emit text: More Text
  4. For ^[1m
    • Passthrough detected ANSI sequence: ^[1m
  5. For Even More Text
    • Emit highlight style: ^[38;2;100;100;100m
    • Re-emit detected ANSI attributes as sequence: ^[33m^[1m
    • Emit text: Even More Text

This gives us: ^[38;2;100;100;100mText^[33m^[38;2;100;100;100m^[33mMore Text^[1m^[38;2;100;100;100m^[33m^[1mEven More Text
Which could be simplified to: ^[38;2;100;100;100mText^[33mMore Text^[1mEven More Text

For the plain text syntax we don't actually have a style to emit (i.e. no ^[38;2;100;100;100m everywhere), but it still acts as though there is one and double-prints the passed through styles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants