
Remove repeated contains calls #1421

Merged
merged 2 commits into sharkdp:master on Dec 21, 2020

Conversation

CosmicHorrorDev (Contributor)

(I can edit the changelog if you want, but this should result in no usage difference, nor is it a bugfix.)

Background

First off, thanks for the great program (not to mention the others you've made that I also use)! I've already recommended it to several of my friends.

To give some background: I was using bat to read a file and pipe the output through some other programs. I was confused about why changes to those programs weren't making any difference until I realized that bat itself was limiting the throughput. I decided to see if there were any obvious changes that could help, and from cargo flamegraph I saw a surprising amount of time spent in HashSet contains, which I narrowed down to checking the styles for different values. Long story short, the check was being repeated in the hot loop and accounted for a significant share of the runtime. Moving it outside the loop (sketched below) yielded a decent performance increase and shouldn't affect the program's behavior, since the style values shouldn't change while the program is displaying output (at least AFAIK).
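For illustration, the shape of the change looks something like this (a minimal sketch with made-up names; bat's real style handling is more involved):

```rust
use std::collections::HashSet;

// Before: the membership check runs once per line, hashing the key on
// every iteration of the hot loop.
fn print_lines(lines: &[String], styles: &HashSet<&'static str>) {
    for (i, line) in lines.iter().enumerate() {
        if styles.contains("numbers") {
            print!("{:4} ", i + 1);
        }
        println!("{line}");
    }
}

// After: the set can't change while we're printing, so hoist the lookup.
fn print_lines_hoisted(lines: &[String], styles: &HashSet<&'static str>) {
    let show_numbers = styles.contains("numbers");
    for (i, line) in lines.iter().enumerate() {
        if show_numbers {
            print!("{:4} ", i + 1);
        }
        println!("{line}");
    }
}

fn main() {
    let styles: HashSet<&'static str> = ["numbers"].into_iter().collect();
    let lines = vec!["fn main() {}".to_string()];
    print_lines(&lines, &styles);
    print_lines_hoisted(&lines, &styles);
}
```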

(Partially off topic, but does it really make sense to store the styles in a HashSet? I would think that a Vec would be faster unless there are going to be more styles than I would expect.)

Potential Remaining Improvements

The other change that looks like low-hanging fruit would be modifying the line range checks, since they now account for roughly 10% of the runtime on random.file below. I think the check could be restructured to determine ahead of time the next line at which the in-range answer will change (see the sketch below), but I haven't dug into it too much.
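One way that could look (a toy sketch, not bat's actual line-range API): track the next line number at which the in-range answer can flip, and only re-evaluate the ranges when that boundary is reached.

```rust
struct LineRange {
    start: usize,
    end: usize,
}

// Smallest line number greater than `line` at which the answer could change:
// either the start of a range or one past the end of a range.
fn next_boundary(ranges: &[LineRange], line: usize) -> usize {
    ranges
        .iter()
        .flat_map(|r| [r.start, r.end + 1])
        .filter(|&b| b > line)
        .min()
        .unwrap_or(usize::MAX)
}

fn in_any_range(ranges: &[LineRange], line: usize) -> bool {
    ranges.iter().any(|r| r.start <= line && line <= r.end)
}

fn main() {
    let ranges = [LineRange { start: 3, end: 5 }, LineRange { start: 9, end: 9 }];
    let mut line = 1;
    let mut included = in_any_range(&ranges, line);
    let mut boundary = next_boundary(&ranges, line);
    while line <= 12 {
        // Only do the full range scan when we cross a precomputed boundary.
        if line == boundary {
            included = in_any_range(&ranges, line);
            boundary = next_boundary(&ranges, line);
        }
        println!("line {line}: included = {included}");
        line += 1;
    }
}
```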

After that, the vast majority of the time is spent either reading or writing. I've already read that you don't want to switch to a buffered writer, for valid reasons. A large portion of the reading seems to be taken up by searching for the newline with read_until. Nothing too notable really stuck out on the writing side; it looks like stdout isn't locked from OutputType, but my attempt at locking it didn't seem to improve performance from what I could tell. Another possible way to improve throughput would be to move reading and writing into separate threads, passing values over some sync_channel (sketched below). Switching reading and writing to async might essentially accomplish the same thing, though I'm not very familiar with async code, if I'm being honest.
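A rough sketch of the reader/writer split over a sync_channel (untested against bat itself; the channel capacity and per-line allocation here are guesses):

```rust
use std::io::{self, BufRead, Write};
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() -> io::Result<()> {
    // Bounded channel so the reader can't run arbitrarily far ahead of the writer.
    let (tx, rx) = sync_channel::<Vec<u8>>(64);

    let reader = thread::spawn(move || -> io::Result<()> {
        let stdin = io::stdin();
        let mut input = stdin.lock();
        loop {
            let mut line = Vec::new();
            if input.read_until(b'\n', &mut line)? == 0 {
                break; // EOF
            }
            if tx.send(line).is_err() {
                break; // writer hung up
            }
        }
        Ok(())
    });

    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Iterating the receiver ends once the reader drops its sender.
    for line in rx {
        out.write_all(&line)?;
    }

    reader.join().expect("reader thread panicked")?;
    Ok(())
}
```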

Basic Benchmarks

All results were run on my rather lightweight laptop: Linux 5.9.11, Intel i5-5200U @ 2.7GHz. I used three different 1GB files, all in a tmpfs (I only have 8GB RAM total, so that's pretty much my limit). zero.file is entirely 0x00 bytes, random.file is from /dev/urandom, and controller_1gb.rs is controller.rs duplicated until it reached 1GB.

```
# Note this one caused a spike in memory usage due to extending the buffer to
# cover the whole file, since there's no newline.
❯ ./bat_orig zero.file | pv > /dev/null
1.00GiB 0:00:02 [ 490MiB/s]

❯ ./bat_orig random.file | pv > /dev/null
1.00GiB 0:00:03 [ 272MiB/s]

❯ ./bat_orig controller_1gb.rs | pv > /dev/null
1.00GiB 0:00:25 [40.1MiB/s]
```

~10% of the time on random.file is spent in HashSet::contains. The remaining time all looks to be from reading and writing.

```
❯ ./bat_less_hashing zero.file | pv > /dev/null
1.00GiB 0:00:02 [ 493MiB/s]

❯ ./bat_less_hashing random.file | pv > /dev/null
1.00GiB 0:00:03 [ 294MiB/s]

❯ ./bat_less_hashing controller_1gb.rs | pv > /dev/null
1.00GiB 0:00:23 [44.5MiB/s]
```

Essentially no change for zero.file, as expected. The files that iterate through the loop more show a moderate improvement.

eth-p (Collaborator) commented Dec 2, 2020

(Partially off topic, but does it really make sense to store the styles in a HashSet? I would think that a Vec would be faster unless there are going to be more styles than I would expect.)

On my own fork, I toyed around with the idea of using a bitfield for this. I didn't end up doing anything with it, but it might be worth revisiting if the HashSet makes that much of a performance difference.
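Something along these lines (a toy version with illustrative flag names, not bat's actual style components):

```rust
// Each style component becomes one bit, so a membership check compiles
// down to a mask-and-compare instead of a hash lookup.
#[derive(Clone, Copy)]
struct StyleFlags(u8);

const NUMBERS: u8 = 1 << 0;
const GRID: u8 = 1 << 1;
const HEADER: u8 = 1 << 2;

impl StyleFlags {
    fn contains(self, flag: u8) -> bool {
        self.0 & flag != 0
    }
}

fn main() {
    let styles = StyleFlags(NUMBERS | GRID);
    assert!(styles.contains(NUMBERS));
    assert!(!styles.contains(HEADER));
}
```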

keith-hall (Collaborator) left a comment


Thanks for your contribution and the detailed description/benchmark findings. This PR looks good to me.

sharkdp (Owner) commented Dec 21, 2020

Thank you very much for your contribution and feedback!

After that, the vast majority of the time is spent either reading or writing. I've already read that you don't want to switch to a buffered writer, for valid reasons. A large portion of the reading seems to be taken up by searching for the newline with read_until. Nothing too notable really stuck out on the writing side; it looks like stdout isn't locked from OutputType, but my attempt at locking it didn't seem to improve performance from what I could tell. Another possible way to improve throughput would be to move reading and writing into separate threads, passing values over some sync_channel. Switching reading and writing to async might essentially accomplish the same thing, though I'm not very familiar with async code, if I'm being honest.

I'm happy for any kind of exploratory work in this respect 👍. See also #712, which I sadly haven't had time to look at so far. Also, if we would like to have a really fast throughput mode (i.e. only when syntax highlighting is disabled, of course): https://endler.dev/2018/fastcat/
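For reference, the core of the fastcat trick linked above is Linux's splice(2), which moves data into a pipe without copying it through userspace. A bare-bones sketch using the libc crate (splice requires at least one end to be a pipe, which happens to hold in the `| pv` benchmarks above; error handling is minimal):

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let file = File::open("random.file")?;
    let stdout_fd = 1; // STDOUT_FILENO

    loop {
        // Move up to 1 MiB per call directly from the file into stdout.
        let n = unsafe {
            libc::splice(
                file.as_raw_fd(),
                std::ptr::null_mut(),
                stdout_fd,
                std::ptr::null_mut(),
                1 << 20,
                libc::SPLICE_F_MOVE,
            )
        };
        match n {
            0 => break, // EOF
            n if n < 0 => return Err(std::io::Error::last_os_error()),
            _ => {}
        }
    }
    Ok(())
}
```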

sharkdp merged commit 73cff42 into sharkdp:master on Dec 21, 2020