Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce startup time in loop-through mode by 80%-90% #1747

Merged
merged 2 commits into from Jul 29, 2021

Conversation

Enselic
Copy link
Collaborator

@Enselic Enselic commented Jul 22, 2021

HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat output to a file (see Config::loop_through), so by loading the SyntaxSet only when needed, we radically improve startup time when it is not needed.

On my low-end machine, I get the following numbers when doing
time bat tests/examples/multiline.txt > /tmp/output.txt:

git master

0,09s user 0,00s system 93% cpu 0,103 total

git master + this PR

0,01s user 0,00s system 63% cpu 0,026 total

We can also benchmark with hyperfine , in which case we don't need to redirect the output manually, because the internal redirection of hyperfine is enough to trigger loop-through mode:
hyperfine --export-markdown /dev/tty 'bat tests/examples/multiline.txt'

git master

Command Mean [ms] Min [ms] Max [ms] Relative
bat tests/examples/multiline.txt 96.1 ± 2.2 93.3 104.3 1.00

git master + this PR

Command Mean [ms] Min [ms] Max [ms] Relative
bat tests/examples/multiline.txt 11.2 ± 1.1 9.7 15.9 1.00

So on my system, the speedup amounts to (1-11.2/96.1)*100 ~ 88%.

Lazy-loading was one of the key aspects of my prototype on how to improve the startup speed of bat, and this PR is one step on the journey to improve startup speed in general.

I have done sanity checking of this change, and all regression tests pass, so this PR should not be far off production-quality code. But don't hesitate to give criticism when doing code review :) And if you think my approach is completely off track, I would gladly like to know!

I also want to clarify that I think this is a good move long term, even if we start to load partial syntax sets for improved performance. The ability to load the full syntax set is still useful, both for fallback purposes, and for implementing things like —list-languages

Cargo.toml Show resolved Hide resolved
src/assets.rs Outdated Show resolved Hide resolved
@Enselic
Copy link
Collaborator Author

Enselic commented Jul 22, 2021

Also starting to lazy-load theme_set should also improve startup speed, although not as dramatic in absolute numbers since it is much smaller.

I plan on looking into that once this PR has landed.

Edit: I couldn't resist, so I did a quick copy-paste job to see what difference it would make, and it reduces startup time to 5.8 ms on my system. Now the bottleneck is constructing SyntaxMapping, and skipping that takes the time down to 2.8 ms. But let's take one step at a time.

Copy link
Collaborator

@keith-hall keith-hall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice achievement, great job!

@sharkdp
Copy link
Owner

sharkdp commented Jul 25, 2021

This looks great - thank you very much!

Just for my understanding: The use of LazyCell (instead of a simple Option, for example) is needed in order to make the get_syntax_set method &self (and not &mut self), right?

I can confirm the benchmark results.

hyperfine \
  --warmup 5 \
  -L bat bat-master,bat-feature \
  --export-markdown results-loopthrough.md \
  './{bat} tests/examples/multiline.txt' 
Command Mean [ms] Min [ms] Max [ms] Relative
./bat-master tests/examples/multiline.txt 48.5 ± 1.2 47.1 52.6 7.61 ± 1.04
./bat-feature tests/examples/multiline.txt 6.4 ± 0.9 5.3 12.8 1.00

I thought it would also be good to make sure that normal syntax set loading is not (significantly) slower, even if I wouldn't expect it to be. This can also be confirmed:

hyperfine \
  --warmup 5 \
  -L bat bat-master,bat-feature \
   --export-markdown results-normal.md \
  './{bat} -f src/less.rs'
Command Mean [ms] Min [ms] Max [ms] Relative
./bat-master -f src/less.rs 64.6 ± 0.9 63.4 67.9 1.00
./bat-feature -f src/less.rs 64.8 ± 1.0 63.5 69.0 1.00 ± 0.02

@sharkdp sharkdp mentioned this pull request Jul 25, 2021
11 tasks
@Enselic
Copy link
Collaborator Author

Enselic commented Jul 25, 2021

Just for my understanding: The use of LazyCell (instead of a simple Option, for example) is needed in order to make the get_syntax_set method &self (and not &mut self), right?

Yes, the use of LazyCell is to enable get_syntax_set() on &self, which is a recommended pattern (with some variation) to implement "hidden" caching, or lazy-loading. But that is not the full story. I also wanted to avoid breaking the public bat-as-a-library API. In particular HighlightingAssets::syntaxes(). That method assume things can never go wrong, so there would be no sensible way to implement it if get_syntax_set() would return, say, an Option.

When it is time to look over the public API and make breaking API changes, we could perhaps make it necessary for clients to handle failure to load the cache. Not sure it's worth the complexity though... I suggest we wait and see how much a problem my .expect("cache corrupt ... code is in practice.

Thanks for the code reviews. I will try to merge this shortly. I just need some more time to explore how the code behaves when the cache is corrupt.

@Enselic
Copy link
Collaborator Author

Enselic commented Jul 26, 2021

So, the PR in its current form introduces a behavioral change when syntaxes.bin is missing from the cache, and I can't make up my mind about if this is a problem or not. I would love to get a second opinion on it.

Step-by-step

  1. Build a cache. One easy way is to do this:
    bat cache --build --blank --source ~/src/bat/assets
  2. Remove only syntaxes.bin from the cache:
    rm ~/.cache/bat/syntaxes.bin
  3. Ask bat to highlight a file:
    bat examples/simple.rs

git master behavior:

Bat silently falls back to using assets (both themes and syntaxes) from the binary.

PR behavior:

Bat panics:

% bat-pr examples/simple.rs 
thread 'main' panicked at 'cache corrupt, consider rebuilding or clearing, see https://github.com/sharkdp/bat#adding-new-syntaxes--language-definitions on how: Error(Msg("Could not load cached syntax set '/home/martin/.cache/bat/syntaxes.bin'"), State { next_error: Some(Os { code: 2, kind: NotFound, message: "No such file or directory" }), backtrace: InternalBacktrace })', src/assets.rs:324:55
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Alternative approach

It is straightforward to also make bat in this PR fallback to binary assets in the scenario above. We just need to (try to) read the file during startup, and only defer parsing (which is the slow part) of it. The code for that looks like this: Enselic@4197558a35

In terms of performance with a fully valid cache and with no cache, it doesn't matter much what route we take. Here are the numbers, where I compare "this PR" with "this PR + above commit", with and without a cache, and with and without loop-through mode:

% bat cache --build --blank --source ~/src/bat/assets
% hyperfine --export-markdown /dev/tty -L bat bat-pr,bat-pr-prime -L color never,always '{bat} --color {color} examples/simple.rs'
% mv ~/.cache/bat ~/.cache/bat-disabled
% hyperfine --export-markdown /dev/tty -L bat bat-pr,bat-pr-prime -L color never,always '{bat} --color {color} examples/simple.rs'

With cache:

Command Mean [ms] Min [ms] Max [ms] Relative
bat-pr --color never examples/simple.rs 16.8 ± 1.1 15.0 21.8 1.00
bat-pr-prime --color never examples/simple.rs 17.5 ± 1.2 15.7 22.5 1.04 ± 0.10
bat-pr --color always examples/simple.rs 113.5 ± 1.7 109.1 117.7 6.74 ± 0.47
bat-pr-prime --color always examples/simple.rs 115.2 ± 2.3 111.4 122.2 6.84 ± 0.49

Without cache:

Command Mean [ms] Min [ms] Max [ms] Relative
bat-pr --color never examples/simple.rs 11.8 ± 1.4 10.1 21.8 1.00 ± 0.16
bat-pr-prime --color never examples/simple.rs 11.8 ± 1.2 10.2 17.1 1.00
bat-pr --color always examples/simple.rs 109.0 ± 2.5 105.5 117.3 9.24 ± 0.97
bat-pr-prime --color always examples/simple.rs 112.9 ± 2.4 109.1 121.1 9.57 ± 1.01

I'm starting to love hyperfine more and more btw, such a great tool! :D You really have a talent for coming up with useful tools, David :)

Anyway, the question is basically: Do you think we need to worry about people poking around in the cache like this? (I currently lean more towards “no, we don’t”.)

@sharkdp
Copy link
Owner

sharkdp commented Jul 26, 2021

Hm. I usually try to invest a lot of time to get rid of possible panics because I don't want users to see these messages. It would really turn me away from a tool. This sounds like a weird edge case where it might be okay for us to show a panic message to the user, but it does look rather scary for the average user ("panicked?", "corrupt?", "what the hell is a backtrace"?). If we can be sure that this is the only scenario where it will be shown, I'm okay with it. Otherwise, I would prefer your alternative solution.

@sharkdp
Copy link
Owner

sharkdp commented Jul 26, 2021

I'm starting to love hyperfine more and more btw, such a great tool! :D You really have a talent for coming up with useful tools, David :)

Thank you for the feedback! If you are interested, I encourage you to learn about Warmup runs and prepare commands which can be really useful to compare warm disk cache with cold disk cache. In this scenario here, it could actually be really useful to compare the two, as we are actually loading additional files from disk. And they might not be in the (disk) cache, if a user runs bat from time to time only.

@Enselic
Copy link
Collaborator Author

Enselic commented Jul 26, 2021

As far as I know, the panic will only happen in the scenario I described. But I do understand your concerns, so I have decided to go the extra mile and incorporate the extra change into this PR. Thanks for pushing me in the right direction 👍 (That will also be a good opportunity for me to play around with --prepare etc of hyperfine to explore performance numbers with cold file caches.)

Btw, I have noticed you seem to prefer hyperfine --export-markdown results.md .... I was curious if you picked up on my trick you use hyperfine --export-markdown /dev/tty ... instead? That way the results ends up directly in the terminal for immediate copy-paste. In your case, don't you need an extra step to access the contents of results.md?

@sharkdp
Copy link
Owner

sharkdp commented Jul 26, 2021

Btw, I have noticed you seem to prefer hyperfine --export-markdown results.md .... I was curious if you picked up on my trick you use hyperfine --export-markdown /dev/tty ... instead? That way the results ends up directly in the terminal for immediate copy-paste. In your case, don't you need an extra step to access the contents of results.md?

I have used this in the past as well. I didn't know /dev/tty, so I used /dev/stdout. I didn't use it here, as it doesn't work anymore with the latest master version of hyperfine(!), which I just noticed when I commented here. This is probably due to sharkdp/hyperfine#339 which is also very useful. Will have to think about whether to restore that old export-to-stdout "feature" somehow 😄

@sharkdp
Copy link
Owner

sharkdp commented Jul 26, 2021

That way the results ends up directly in the terminal for immediate copy-paste.

For bonus points:

hyperfine 'sleep 0.1' 'sleep 0.2' --export-markdown >(xclip)

Enselic added a commit to Enselic/bat that referenced this pull request Jul 27, 2021
To enable robust and user-friendly support for lazy-loading, we need variants of
get_syntax_set() and syntaxes() that can fail (see discussion about panics in
sharkdp#1747).

This commit deprecates old public syntaxes() and introduces a failable version
called get_syntaxes().
Enselic added a commit to Enselic/bat that referenced this pull request Jul 28, 2021
Or rather, introduce new versions of these methods and deprecate the old ones.

This is preparation to enable robust and user-friendly support for lazy-loading.
With lazy-loading, we don't know if the SyntaxSet is valid until after we try to
use it, so wherever we try to use it, we need to return a Result. See discussion
about panics in sharkdp#1747.
Enselic added a commit that referenced this pull request Jul 29, 2021
Or rather, introduce new versions of these methods and deprecate the old ones.

This is preparation to enable robust and user-friendly support for lazy-loading.
With lazy-loading, we don't know if the SyntaxSet is valid until after we try to
use it, so wherever we try to use it, we need to return a Result. See discussion
about panics in #1747.
They are just a way to get access to data embedded in the binary, so they don't
conceptually belong inside HighlightingAssets.

This has the nice side effect of getting HighlightingAssets::from_cache() and
::from_binary(), that are highly related, next to each other.
@Enselic Enselic changed the title Improve loop-through startup speed by 80%-90% Reduce startup time in loop-through mode with 80%-90% Jul 29, 2021
@Enselic
Copy link
Collaborator Author

Enselic commented Jul 29, 2021

I have now rebased this PR on top of the merged PRs #1758, #1755 and #1755, and here is updated performance numbers and behavior.

First, let's compare startup time between current git master bat and bat from this PR:
hyperfine --export-markdown /dev/tty -L bat bat-new,bat -L color never,always '{bat} --color {color} examples/simple.rs'

Command Mean [ms] Min [ms] Max [ms] Relative
bat-new --color never examples/simple.rs 11.1 ± 0.9 9.8 14.7 1.00
bat --color never examples/simple.rs 95.9 ± 1.2 93.9 98.6 8.62 ± 0.74
bat-new --color always examples/simple.rs 107.6 ± 1.4 105.3 110.8 9.67 ± 0.83
bat --color always examples/simple.rs 108.0 ± 1.6 105.7 112.1 9.71 ± 0.84

And we can confirm that the PR bat is still significantly faster in loop-through mode, and same as before in normal mode. Now let's use the new bat in this PR to compare performance with and without custom assets (that are otherwise byte-by-byte identical):
hyperfine --export-markdown /dev/tty -L bat bat-new -L assets --no-custom-assets, -L color never,always '{bat} --color {color} examples/simple.rs {assets}'

Command Mean [ms] Min [ms] Max [ms] Relative
bat-new --color never examples/simple.rs --no-custom-assets 11.7 ± 0.9 10.3 14.3 1.01 ± 0.11
bat-new --color never examples/simple.rs 11.6 ± 0.8 10.3 14.1 1.00
bat-new --color always examples/simple.rs --no-custom-assets 110.5 ± 1.3 107.8 113.8 9.53 ± 0.68
bat-new --color always examples/simple.rs 108.9 ± 1.4 106.5 112.2 9.39 ± 0.67

And as expected, it practically does not matter if custom or integrated assets are used.

Finally, let's now see what happens if the cache is corrupt. First, let's keep syntaxes.bin around but put garbage in it:

% echo garbage >! ~/.cache/bat/syntaxes.bin
% bat-new examples/simple.rs
[bat error]: Could not parse cached syntax set

Instead of panicing, bat now explains that the cached syntax set could not be parsed. And if we remove the file completely:

% rm ~/.cache/bat/syntaxes.bin
% bat-new examples/simple.rs
[bat error]: Could not load cached syntax set
'/home/martin/.cache/bat/syntaxes.bin'

bat now says that the file can not be found. Note that this is changed behavior compared to before, where we silently used integrated assets if the cached assets were corrupt. But I think the new behavior is better. If the cache is corrupt, the user probably wants to know.

If we remove the cache completely, then of course integrated assets are used:

% rm -rf ~/.cache/bat
% bat-new examples/simple.rs
───────┬─────────────────────────────
       │ File: examples/simple.rs
───────┼─────────────────────────────
[...]

Would be great if you could take a renewed look on this PR before I merge.

@Enselic Enselic changed the title Reduce startup time in loop-through mode with 80%-90% Reduce startup time in loop-through mode by 80%-90% Jul 29, 2021
@Enselic Enselic requested a review from sharkdp July 29, 2021 17:02
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -thank you!

Note that this is changed behavior compared to before, where we silently used integrated assets if the cached assets were corrupt. But I think the new behavior is better. If the cache is corrupt, the user probably wants to know.

I completely agree 👍

Instead of 100 ms - 50 ms, startup takes 10 ms - 5 ms.

HighlightingAssets::get_syntax_set() is never called when e.g. piping the bat
output to a file (see Config::loop_through), so by loading the SyntaxSet only
when needed, we radically improve startup time when it is not needed.
@Enselic Enselic merged commit 6acec2c into sharkdp:master Jul 29, 2021
@Enselic Enselic deleted the lazy-load-syntaxes.bin branch July 29, 2021 18:36
@Enselic
Copy link
Collaborator Author

Enselic commented Jul 29, 2021

CI problems are unrelated. Merged. (:tada: )

CHANGELOG.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants