You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I found this cool performance analysis tool called magic-trace (https://github.com/janestreet/magic-trace) and used a trace recording of bat to try it out. I actually spotted a few things that could potentially be improved.
Recently, we made fantastic progress on reducing bats startup speed (see #951), mostly by lazy-loading assets thanks to @Enselic changes. Here, I'm not focusing on startup-speed per-se, but rather on a "typical" run of bat, i.e. input files which are not gigantic or weird in any sense. Let's use bats Cargo.toml as an example. I'm disabling the pager because I want to focus on bat. I'm deliberately not using --no-config to simulate a real world use case (my config only contains comments and --italic-text=always). I don't have any custom assets.
First, let's measure the full execution time using hyperfine:
▶ hyperfine --warmup=20 "bat --paging=never --force-colorization Cargo.toml"
Benchmark 1: bat --paging=never --force-colorization Cargo.toml
Time (mean ± σ): 8.5 ms ± 1.8 ms [User: 7.5 ms, System: 1.6 ms]
Range (min … max): 4.7 ms … 14.6 ms 271 runs
That's pretty fast, but cat only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm running bat once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):
▶ bat Cargo.toml > /dev/null; magic-trace run -full-execution bat -- --paging=never --force-colorization Cargo.toml
The resulting trace.fct is attached here: trace.zip
Now let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:
The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):
Next, we have some fast initialization stuff, config file loading, parsing and command-line option handling. Alltogether, this part only takes 600µs, so probably not much room for improvement:
Up next, we see the first surprise. Calling App::config ("get the final Config object based on config, command-line options, etc") takes 3.5 ms:
Zooming in closer:
Most of this time is being spent compiling regexes (for glob patterns) for our syntax mappings (not syntax highlighting!), i.e. things like this:
This is certainly something that could be optimized. Either by somehow pre-compiling syntax mappings (although this would only work for the builtin mappings). Or by parallelization. Or by using a faster glob matcher(?).
Next up, we query the Git status (pretty cool that we can see libgit2 internals here), which takes another 2 ms:
There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which also takes 2.5 ms:
Finally, we can start printing something to the terminal:
Notice how the first few calls to print_line take a bit longer as some regexes are being compiled lazily (I suppose):
There's also this weird part here, which I don't understand:
To summarize:
magic-trace is cool
Our SyntaxMapping initialization is much slower than it could be
I think we can optimize a few things in bats startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)
The text was updated successfully, but these errors were encountered:
Does magic-trace cover how much time is spent in syscalls and the allocator?
I have some ideas on how to improve performance in the interactive printer, but I'm not sure if the gains warrant the effort (i.e. not less than 1%) required to refactor parts of the code at this point in time:
Re-using allocated String objects that were being used as buffers, rather than re-allocating a string for each line.
Writing line contents into a String before calling write!(handle, ...) to reduce the number of write(2) syscalls to one per printed line.
In either case—albeit not a "typical" bat run—one area I would like to improve is that we're re-emitting ANSI SGR sequences when we don't need to be. To quote my findings from my recent PR:
The syntax highlighting style is being emitted for every single text chunk (rather than highlight region).
Essentially, for a string like Text^[33mMore Text^[1mEven More Text what we do is:
For Text
Emit highlight style: ^[38;2;100;100;100m
Emit text: Text
For ^[33m
Passthrough detected ANSI sequence: ^[33m
For More Text
Emit highlight style: ^[38;2;100;100;100m
Re-emit detected ANSI attributes as sequence: ^[33m
Emit text: More Text
For ^[1m
Passthrough detected ANSI sequence: ^[1m
For Even More Text
Emit highlight style: ^[38;2;100;100;100m
Re-emit detected ANSI attributes as sequence: ^[33m^[1m
Emit text: Even More Text
This gives us: ^[38;2;100;100;100mText^[33m^[38;2;100;100;100m^[33mMore Text^[1m^[38;2;100;100;100m^[33m^[1mEven More Text
Which could be simplified to: ^[38;2;100;100;100mText^[33mMore Text^[1mEven More Text
For the plain text syntax we don't actually have a style to emit (i.e. no ^[38;2;100;100;100m everywhere), but it still acts as though there is one and double-prints the passed through styles.
I found this cool performance analysis tool called
magic-trace
(https://github.com/janestreet/magic-trace) and used a trace recording ofbat
to try it out. I actually spotted a few things that could potentially be improved.Recently, we made fantastic progress on reducing
bat
s startup speed (see #951), mostly by lazy-loading assets thanks to @Enselic changes. Here, I'm not focusing on startup-speed per-se, but rather on a "typical" run ofbat
, i.e. input files which are not gigantic or weird in any sense. Let's usebat
sCargo.toml
as an example. I'm disabling the pager because I want to focus onbat
. I'm deliberately not using--no-config
to simulate a real world use case (my config only contains comments and--italic-text=always
). I don't have any custom assets.First, let's measure the full execution time using hyperfine:
That's pretty fast, but
cat
only takes 1.0 ms ± 0.5 ms, so there is some room for improvement. Let's record a trace. I'm runningbat
once beforehand to warm up the disk cache (in order to be comparable with the benchmark above):The resulting
trace.fct
is attached here: trace.zipNow let's look at the results on https://magic-trace.org/. The full run (16ms with tracing) looks like this:
The first 2.3ms are occupied with low-level startup procedures that we can't influence(?):
Next, we have some fast initialization stuff, config file loading, parsing and command-line option handling. Alltogether, this part only takes 600µs, so probably not much room for improvement:
Up next, we see the first surprise. Calling
App::config
("get the finalConfig
object based on config, command-line options, etc") takes 3.5 ms:Zooming in closer:
Most of this time is being spent compiling regexes (for glob patterns) for our syntax mappings (not syntax highlighting!), i.e. things like this:
This is certainly something that could be optimized. Either by somehow pre-compiling syntax mappings (although this would only work for the builtin mappings). Or by parallelization. Or by using a faster glob matcher(?).
Next up, we query the Git status (pretty cool that we can see
libgit2
internals here), which takes another 2 ms:There's probably nothing to optimize here, but potentially we could already start other things in parallel! Like what comes next: loading and deserializing the theme and syntax sets, which also takes 2.5 ms:
Finally, we can start printing something to the terminal:
Notice how the first few calls to
print_line
take a bit longer as some regexes are being compiled lazily (I suppose):There's also this weird part here, which I don't understand:
To summarize:
SyntaxMapping
initialization is much slower than it could bebat
s startup procedure by parallelizing individual tasks (e.g. querying Git information and loading assets)The text was updated successfully, but these errors were encountered: