Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix quadratic runtime when updating region highlighter matches #4717

Merged
merged 2 commits into from Sep 17, 2022

Conversation

krobelus
Copy link
Contributor

@krobelus krobelus commented Aug 16, 2022

No description provided.

@@ -2210,7 +2210,7 @@ struct RegionsHighlighter : public Highlighter
m_regexes.insert({key, Regex{str, flags}});
}

void add_matches(const Buffer& buffer, LineRange range, Cache& cache) const
void add_matches(const Buffer& buffer, const LineRangeList& ranges, Cache& cache) const
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider avoiding having to build up that temporary LineRangeList by builiding and passing the Matchers externally ? add_matches would then just contain the loop at line 2229 and we would not need to refactor line_modifications. (We could introduce a Matchers struct that would own the Vector and could provide a method to do the inplace_merge, add_matches could actually also be a method of that struct).

Not sure if I missed something here, but if we can avoid building a temporary vector and refactoring line_modifications I'd prefer that.

Copy link
Contributor Author

@krobelus krobelus Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I want to avoid the vector by using coroutines with this commit on top but that made CI fail for clang<14), so I'm waiting with that commit

a domain-specific solution with a custom struct sounds possible as well, I'll check later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the BatchMatchAdder to avoid the temporary vector

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! Sorry for how long this took to review.

…dification

LineRangeSet::add_range() calls Vector::erase() in a loop over the
same vector. This could cause performance problems when there are many
selections. Fix this by only calling Vector::erase() once.  I didn't
measure anything because my benchmark is dominated by another issue
(see next commit).

LineRangeSet::remove_range() also has a suspicious call to erase()
but that one is only used in test code, so it doesn't matter.
Running %sYeti<ret>casdf on file
[example.journal.txt](mawww#4685 (comment))
can cause noticeable lag.  This is because we insert text at 6000
selections, which means we need to update highlighters in those lines.
The runtime for updating range highlighters is quadratic in the
number of selections: for each selection, we call on_new_range(),
which calls add_matches(), which calls std::rotate(), which needs
needs linear time.

Fix the quadratic runtime by calling std::inplace_merge() once instead
of repeatedly calling std::rotate().  This is works because ranges
are already sorted.

I used this script to benchmark the improvements.
(In hindsight I could have just used "-ui json" instead of tmux).

	#!/bin/sh
	set -ex
	N=${1:-100}
	kak=${2:-./kak.opt}
	for i in $(seq "$N")
	do
		echo -n "\
	2022-02-06 * Earth
	    expense:electronics:audio    116.7 USD
	    liability:card              -116.7 USD

	2022-02-06 * Blue Yeti USB Microphone
	    expense:electronics:audio    116.7 USD
	    liability:card              -116.7 USD

	"
	done > big-journal.ledger
	echo > .empty-tmux.conf 'set -sg escape-time 5'
	test_tmux() {
		tmux -S .tmux-socket -f .empty-tmux.conf "$@"
	}
	test_tmux new-session -d "$kak" big-journal.ledger
	test_tmux send-keys '%sYeti' Enter c 1234567890
	sleep .2
	test_tmux send-keys Escape
	while ! test_tmux capture-pane -p | grep 123
	do
		sleep .1
	done
	test_tmux send-keys ':wq' Enter
	while test_tmux ls
	do
		sleep .1
	done
	rm -f .tmux-socket .empty-tmux.conf

This script's runtime used to grow super-linearly but now it grows
linearly:

	         kak.old  kak.new
	N=10000    1.142    0.897
	N=20000    2.879    1.400

Detailed results:

	$ hyperfine -w 1 './bench.sh 10000 ./kak.opt.'{old,new}
	Benchmark 1: ./bench.sh 10000 ./kak.opt.old
	  Time (mean ± σ):      1.142 s ±  0.072 s    [User: 0.252 s, System: 0.059 s]
	  Range (min … max):    1.060 s …  1.242 s    10 runs

	Benchmark 2: ./bench.sh 10000 ./kak.opt.new
	  Time (mean ± σ):     897.2 ms ±  19.3 ms    [User: 241.6 ms, System: 57.4 ms]
	  Range (min … max):   853.9 ms … 923.6 ms    10 runs

	Summary
	  './bench.sh 10000 ./kak.opt.new' ran
	    1.27 ± 0.09 times faster than './bench.sh 10000 ./kak.opt.old'
	$ hyperfine -w 1 './bench.sh 20000 ./kak.opt.'{old,new}
	Benchmark 1: ./bench.sh 20000 ./kak.opt.old
	  Time (mean ± σ):      2.879 s ±  0.065 s    [User: 0.553 s, System: 0.126 s]
	  Range (min … max):    2.768 s …  2.963 s    10 runs

	Benchmark 2: ./bench.sh 20000 ./kak.opt.new
	  Time (mean ± σ):      1.400 s ±  0.018 s    [User: 0.428 s, System: 0.083 s]
	  Range (min … max):    1.374 s …  1.429 s    10 runs

	Summary
	  './bench.sh 20000 ./kak.opt.new' ran
	    2.06 ± 0.05 times faster than '../repro.sh 20000 ./kak.opt.old'
@mawww mawww merged commit 24d6072 into mawww:master Sep 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants