Fix quadratic runtime when updating region highlighter matches #4717

krobelus · 2022-08-16T22:56:13Z

No description provided.

mawww · 2022-09-16T12:40:00Z

src/highlighters.cc

@@ -2210,7 +2210,7 @@ struct RegionsHighlighter : public Highlighter
        m_regexes.insert({key, Regex{str, flags}});
    }

-    void add_matches(const Buffer& buffer, LineRange range, Cache& cache) const
+    void add_matches(const Buffer& buffer, const LineRangeList& ranges, Cache& cache) const


Did you consider avoiding having to build up that temporary LineRangeList by building and passing the Matchers externally? add_matches would then just contain the loop at line 2229, and we would not need to refactor line_modifications. (We could introduce a Matchers struct that would own the Vector and provide a method to do the inplace_merge; add_matches could also be a method of that struct.)

Not sure if I missed something here, but if we can avoid building a temporary vector and refactoring line_modifications I'd prefer that.

Yeah, I want to avoid the vector by using coroutines with this commit on top, but that made CI fail (for clang<14), so I'm waiting with that commit.

a domain-specific solution with a custom struct sounds possible as well, I'll check later

added the BatchMatchAdder to avoid the temporary vector
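The idea of a batch adder can be sketched roughly as follows. This is not the code from the pull request: `Match`, `BatchAdder`, and their members are hypothetical names chosen for illustration, and the sketch assumes that new matches arrive in ascending order. It only shows how a small helper can append to the existing vector and merge once at the end, so that no temporary vector is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical match record; a real one would carry more fields.
struct Match { int line; int column; };

inline bool operator<(const Match& a, const Match& b)
{
    return a.line != b.line ? a.line < b.line : a.column < b.column;
}

// Hypothetical helper: collects new matches at the tail of an already sorted
// vector and merges them into place when the batch goes out of scope.
class BatchAdder
{
public:
    explicit BatchAdder(std::vector<Match>& matches)
        : m_matches(matches), m_pivot(matches.size())
    {
        assert(std::is_sorted(matches.begin(), matches.end()));
    }

    BatchAdder(const BatchAdder&) = delete;
    BatchAdder& operator=(const BatchAdder&) = delete;

    // Assumption: callers add matches in ascending order.
    void add(Match m) { m_matches.push_back(m); }

    ~BatchAdder()
    {
        // Everything before the pivot is the old sorted data, everything
        // after it is the new sorted batch; merge the two halves once.
        auto middle = m_matches.begin() + static_cast<std::ptrdiff_t>(m_pivot);
        std::inplace_merge(m_matches.begin(), middle, m_matches.end());
    }

private:
    std::vector<Match>& m_matches;
    std::size_t m_pivot;
};
```

Doing the merge in the destructor means a caller cannot forget it, at the cost of hiding work in scope exit; an explicit finish method would be an equally reasonable choice.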

Thanks ! Sorry for how long this took to review.

…dification

LineRangeSet::add_range() calls Vector::erase() in a loop over the same vector. This could cause performance problems when there are many selections. Fix this by only calling Vector::erase() once.

I didn't measure anything because my benchmark is dominated by another issue (see next commit).

LineRangeSet::remove_range() also has a suspicious call to erase() but that one is only used in test code, so it doesn't matter.
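As an illustration of the "erase once" pattern described above, here is a minimal sketch using `std::vector`. The `Range` type and this `add_range` function are hypothetical stand-ins, not Kakoune's `LineRangeSet`; the sketch assumes the list holds sorted, disjoint ranges and that touching ranges are merged.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a line range.
struct Range { int begin; int end; };

// Insert `r` into a sorted list of disjoint ranges, merging every range it
// overlaps or touches. The absorbed ranges are removed with a single erase()
// instead of one erase() per absorbed range.
void add_range(std::vector<Range>& ranges, Range r)
{
    assert(r.begin <= r.end);
    // First range whose end reaches r.begin (it overlaps or touches r).
    auto first = std::find_if(ranges.begin(), ranges.end(),
                              [&](const Range& x) { return x.end >= r.begin; });
    // First range that starts strictly after r.end (it is left untouched).
    auto last = std::find_if(first, ranges.end(),
                             [&](const Range& x) { return x.begin > r.end; });
    if (first != last)
    {
        r.begin = std::min(r.begin, first->begin);
        r.end = std::max(r.end, (last - 1)->end);
    }
    // One erase for the whole absorbed run, then one insert.
    first = ranges.erase(first, last);
    ranges.insert(first, r);
}
```

Each `erase()` on a vector shifts every element behind the erased position, so erasing k ranges one at a time costs up to k shifts of the tail, while erasing the run `[first, last)` shifts the tail once.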

Running %sYeti<ret>casdf on file [example.journal.txt](mawww#4685 (comment)) can cause noticeable lag. This is because we insert text at 6000 selections, which means we need to update highlighters in those lines. The runtime for updating range highlighters is quadratic in the number of selections: for each selection, we call on_new_range(), which calls add_matches(), which calls std::rotate(), which needs linear time.

Fix the quadratic runtime by calling std::inplace_merge() once instead of repeatedly calling std::rotate(). This works because ranges are already sorted.

I used this script to benchmark the improvements. (In hindsight I could have just used "-ui json" instead of tmux.)

    #!/bin/sh
    set -ex
    N=${1:-100}
    kak=${2:-./kak.opt}
    for i in $(seq "$N")
    do
        echo -n "\
    2022-02-06 * Earth
        expense:electronics:audio    116.7 USD
        liability:card              -116.7 USD
    2022-02-06 * Blue Yeti USB Microphone
        expense:electronics:audio    116.7 USD
        liability:card              -116.7 USD
    "
    done > big-journal.ledger
    echo > .empty-tmux.conf 'set -sg escape-time 5'
    test_tmux() {
        tmux -S .tmux-socket -f .empty-tmux.conf "$@"
    }
    test_tmux new-session -d "$kak" big-journal.ledger
    test_tmux send-keys '%sYeti' Enter c 1234567890
    sleep .2
    test_tmux send-keys Escape
    while ! test_tmux capture-pane -p | grep 123
    do
        sleep .1
    done
    test_tmux send-keys ':wq' Enter
    while test_tmux ls
    do
        sleep .1
    done
    rm -f .tmux-socket .empty-tmux.conf

This script's runtime used to grow super-linearly but now it grows linearly (mean times in seconds):

              kak.old  kak.new
    N=10000   1.142    0.897
    N=20000   2.879    1.400

Detailed results:

    $ hyperfine -w 1 './bench.sh 10000 ./kak.opt.'{old,new}
    Benchmark 1: ./bench.sh 10000 ./kak.opt.old
      Time (mean ± σ):      1.142 s ±  0.072 s    [User: 0.252 s, System: 0.059 s]
      Range (min … max):    1.060 s …  1.242 s    10 runs

    Benchmark 2: ./bench.sh 10000 ./kak.opt.new
      Time (mean ± σ):     897.2 ms ±  19.3 ms    [User: 241.6 ms, System: 57.4 ms]
      Range (min … max):   853.9 ms … 923.6 ms    10 runs

    Summary
      './bench.sh 10000 ./kak.opt.new' ran
        1.27 ± 0.09 times faster than './bench.sh 10000 ./kak.opt.old'

    $ hyperfine -w 1 './bench.sh 20000 ./kak.opt.'{old,new}
    Benchmark 1: ./bench.sh 20000 ./kak.opt.old
      Time (mean ± σ):      2.879 s ±  0.065 s    [User: 0.553 s, System: 0.126 s]
      Range (min … max):    2.768 s …  2.963 s    10 runs

    Benchmark 2: ./bench.sh 20000 ./kak.opt.new
      Time (mean ± σ):      1.400 s ±  0.018 s    [User: 0.428 s, System: 0.083 s]
      Range (min … max):    1.374 s …  1.429 s    10 runs

    Summary
      './bench.sh 20000 ./kak.opt.new' ran
        2.06 ± 0.05 times faster than './bench.sh 20000 ./kak.opt.old'
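The difference between the two strategies can be sketched with plain integers. This is a simplified illustration, not the highlighter code: both function names are made up for this example, and the batched version assumes the added values are already sorted, as the commit message says the ranges are.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Quadratic pattern: each new value is appended and then rotated into its
// sorted position, which can move O(n) elements per insertion.
void insert_each_with_rotate(std::vector<int>& sorted, const std::vector<int>& added)
{
    for (int value : added)
    {
        sorted.push_back(value);
        // Find the insertion point among the elements before the new one.
        auto pos = std::upper_bound(sorted.begin(), sorted.end() - 1, value);
        // Rotate the new last element down to that position.
        std::rotate(pos, sorted.end() - 1, sorted.end());
    }
}

// Batched pattern: append everything, then merge the two sorted halves once.
void insert_all_with_merge(std::vector<int>& sorted, const std::vector<int>& added)
{
    assert(std::is_sorted(added.begin(), added.end()));
    const auto pivot = sorted.size();
    sorted.insert(sorted.end(), added.begin(), added.end());
    std::inplace_merge(sorted.begin(), sorted.begin() + pivot, sorted.end());
}
```

Both functions produce the same sorted vector; the second touches each element a bounded number of times when `std::inplace_merge` can allocate its temporary buffer, which is what turns m insertions into n elements from O(n·m) into roughly O(n + m).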

krobelus mentioned this pull request Aug 17, 2022

[BUG] Syntax highlighting is slow and inefficient. #4685

Closed

krobelus force-pushed the faster-update-matches branch 4 times, most recently from 7c6c288 to 89993ba Compare August 28, 2022 14:43

krobelus force-pushed the faster-update-matches branch from 89993ba to 4c6b44d Compare September 10, 2022 05:57

mawww reviewed Sep 16, 2022


krobelus added 2 commits September 17, 2022 06:44

krobelus force-pushed the faster-update-matches branch from 4c6b44d to 803873c Compare September 17, 2022 11:48

mawww merged commit 24d6072 into mawww:master Sep 17, 2022
