New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix quadratic runtime when updating region highlighter matches #4717
Conversation
7c6c288
to
89993ba
Compare
89993ba
to
4c6b44d
Compare
src/highlighters.cc
Outdated
@@ -2210,7 +2210,7 @@ struct RegionsHighlighter : public Highlighter | |||
m_regexes.insert({key, Regex{str, flags}}); | |||
} | |||
|
|||
void add_matches(const Buffer& buffer, LineRange range, Cache& cache) const | |||
void add_matches(const Buffer& buffer, const LineRangeList& ranges, Cache& cache) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider avoiding having to build up that temporary LineRangeList by builiding and passing the Matchers externally ? add_matches would then just contain the loop at line 2229 and we would not need to refactor line_modifications. (We could introduce a Matchers struct that would own the Vector and could provide a method to do the inplace_merge, add_matches could actually also be a method of that struct).
Not sure if I missed something here, but if we can avoid building a temporary vector and refactoring line_modifications I'd prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I want to avoid the vector by using coroutines with this commit on top but that made CI fail for clang<14), so I'm waiting with that commit
a domain-specific solution with a custom struct sounds possible as well, I'll check later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the BatchMatchAdder to avoid the temporary vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ! Sorry for how long this took to review.
…dification LineRangeSet::add_range() calls Vector::erase() in a loop over the same vector. This could cause performance problems when there are many selections. Fix this by only calling Vector::erase() once. I didn't measure anything because my benchmark is dominated by another issue (see next commit). LineRangeSet::remove_range() also has a suspicious call to erase() but that one is only used in test code, so it doesn't matter.
Running %sYeti<ret>casdf on file [example.journal.txt](mawww#4685 (comment)) can cause noticeable lag. This is because we insert text at 6000 selections, which means we need to update highlighters in those lines. The runtime for updating range highlighters is quadratic in the number of selections: for each selection, we call on_new_range(), which calls add_matches(), which calls std::rotate(), which needs needs linear time. Fix the quadratic runtime by calling std::inplace_merge() once instead of repeatedly calling std::rotate(). This is works because ranges are already sorted. I used this script to benchmark the improvements. (In hindsight I could have just used "-ui json" instead of tmux). #!/bin/sh set -ex N=${1:-100} kak=${2:-./kak.opt} for i in $(seq "$N") do echo -n "\ 2022-02-06 * Earth expense:electronics:audio 116.7 USD liability:card -116.7 USD 2022-02-06 * Blue Yeti USB Microphone expense:electronics:audio 116.7 USD liability:card -116.7 USD " done > big-journal.ledger echo > .empty-tmux.conf 'set -sg escape-time 5' test_tmux() { tmux -S .tmux-socket -f .empty-tmux.conf "$@" } test_tmux new-session -d "$kak" big-journal.ledger test_tmux send-keys '%sYeti' Enter c 1234567890 sleep .2 test_tmux send-keys Escape while ! test_tmux capture-pane -p | grep 123 do sleep .1 done test_tmux send-keys ':wq' Enter while test_tmux ls do sleep .1 done rm -f .tmux-socket .empty-tmux.conf This script's runtime used to grow super-linearly but now it grows linearly: kak.old kak.new N=10000 1.142 0.897 N=20000 2.879 1.400 Detailed results: $ hyperfine -w 1 './bench.sh 10000 ./kak.opt.'{old,new} Benchmark 1: ./bench.sh 10000 ./kak.opt.old Time (mean ± σ): 1.142 s ± 0.072 s [User: 0.252 s, System: 0.059 s] Range (min … max): 1.060 s … 1.242 s 10 runs Benchmark 2: ./bench.sh 10000 ./kak.opt.new Time (mean ± σ): 897.2 ms ± 19.3 ms [User: 241.6 ms, System: 57.4 ms] Range (min … max): 853.9 ms … 923.6 ms 10 runs Summary './bench.sh 10000 ./kak.opt.new' ran 1.27 ± 0.09 times faster than './bench.sh 10000 ./kak.opt.old' $ hyperfine -w 1 './bench.sh 20000 ./kak.opt.'{old,new} Benchmark 1: ./bench.sh 20000 ./kak.opt.old Time (mean ± σ): 2.879 s ± 0.065 s [User: 0.553 s, System: 0.126 s] Range (min … max): 2.768 s … 2.963 s 10 runs Benchmark 2: ./bench.sh 20000 ./kak.opt.new Time (mean ± σ): 1.400 s ± 0.018 s [User: 0.428 s, System: 0.083 s] Range (min … max): 1.374 s … 1.429 s 10 runs Summary './bench.sh 20000 ./kak.opt.new' ran 2.06 ± 0.05 times faster than '../repro.sh 20000 ./kak.opt.old'
4c6b44d
to
803873c
Compare
No description provided.