This repository has been archived by the owner on May 3, 2024. It is now read-only.
Motivation
A segment tree (tournament tree) always makes at most $\log_2(\text{size})$ comparisons per step. The pessimistic heap (our previous approach) makes $X$ comparisons, where:

- $X = \log_2(\text{size} + 1) - 1$ when the iterator is exhausted;
- $\log_2(\text{size} + 1) \le X \le 2\log_2(\text{size} + 1) - 2$ otherwise.

Also, the tree layout is more cache-friendly than the heap layout, and our previous code was even worse than described above.
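A minimal sketch of the tournament-tree idea, assuming plain `int` keys in place of iterator keys: a winner tree stored in a flat array (root at index 1, children of node `n` at `2n` and `2n + 1`, leaves padded to a power of two). When the lead stream advances, only its leaf-to-root path is replayed, so one step costs at most $\lceil\log_2(\text{size})\rceil$ comparisons. `TournamentTree`, `Update`, `Merge` and `kNone` are hypothetical names for illustration, not the iresearch implementation, and this layout is not necessarily the one used in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sentinel for "no stream" (padding leaf or all streams exhausted).
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Illustrative winner tree for a k-way merge (sketch, not iresearch code).
class TournamentTree {
 public:
  // heads[i] is the current key of stream i; alive[i] is false once it is exhausted.
  TournamentTree(std::vector<int> heads, std::vector<bool> alive)
      : heads_(std::move(heads)), alive_(std::move(alive)) {
    assert(!heads_.empty() && heads_.size() == alive_.size());
    while (leaves_ < heads_.size()) leaves_ *= 2;  // pad leaves to a power of two
    tree_.assign(2 * leaves_, kNone);              // tree_[1] is the root
    for (std::size_t i = 0; i < heads_.size(); ++i) tree_[leaves_ + i] = i;
    for (std::size_t n = leaves_ - 1; n >= 1; --n) {
      tree_[n] = Play(tree_[2 * n], tree_[2 * n + 1]);
    }
  }

  // Stream holding the smallest key, or kNone if every stream is exhausted.
  std::size_t Lead() const {
    const std::size_t w = tree_[1];
    return (w != kNone && alive_[w]) ? w : kNone;
  }
  int Key(std::size_t stream) const { return heads_[stream]; }

  // Gives `stream` a new key (or marks it exhausted) and replays only its
  // leaf-to-root path. Returns the number of comparisons this step cost.
  std::size_t Update(std::size_t stream, bool exhausted, int key) {
    const std::size_t before = comparisons_;
    if (exhausted) alive_[stream] = false; else heads_[stream] = key;
    for (std::size_t n = (leaves_ + stream) / 2; n >= 1; n /= 2) {
      tree_[n] = Play(tree_[2 * n], tree_[2 * n + 1]);
    }
    return comparisons_ - before;
  }

 private:
  // Winner of one match; padding or exhausted entries lose without a comparison.
  std::size_t Play(std::size_t a, std::size_t b) {
    if (a == kNone || !alive_[a]) return (b != kNone && alive_[b]) ? b : kNone;
    if (b == kNone || !alive_[b]) return a;
    ++comparisons_;
    return heads_[b] < heads_[a] ? b : a;  // ties go to the left (lower) stream
  }

  std::vector<int> heads_;
  std::vector<bool> alive_;
  std::vector<std::size_t> tree_;
  std::size_t leaves_ = 1;
  std::size_t comparisons_ = 0;
};

// Merges sorted inputs; *max_cmp receives the largest per-step comparison count.
std::vector<int> Merge(const std::vector<std::vector<int>>& in, std::size_t* max_cmp) {
  std::vector<int> heads(in.size(), 0);
  std::vector<bool> alive(in.size(), false);
  std::vector<std::size_t> pos(in.size(), 0);
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!in[i].empty()) { heads[i] = in[i][0]; alive[i] = true; }
  }
  TournamentTree tree(heads, alive);
  std::vector<int> out;
  *max_cmp = 0;
  for (std::size_t s = tree.Lead(); s != kNone; s = tree.Lead()) {
    out.push_back(tree.Key(s));
    const bool done = ++pos[s] == in[s].size();
    const std::size_t c = tree.Update(s, done, done ? 0 : in[s][pos[s]]);
    if (c > *max_cmp) *max_cmp = c;
  }
  return out;
}
```

Plugging numbers into the bounds above: for size = 15 the tree needs at most 4 comparisons per step, while the pessimistic heap needs between 4 and 6 whenever the iterator is not exhausted.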
Are there any drawbacks?
Yes.
I don't think these are issues because
Results
I expect a speedup for consolidation and for querying inverted indexes with a primary sort.
Benchmark iresearch:
tree:
heap:
So we see a speedup even with a very cheap comparator (just a `memcmp` of 6 bytes).
I expect larger benefits with a more expensive comparator or a read-only workload.
Benchmark arangodb:
TODO
Alternatives
Optimistic heap (only a siftdown, instead of the unconditional siftdown + siftup of the pessimistic heap).
It makes $X$ comparisons, where:

- $X = 1$ when the lead iterator has not changed;
- $X = \log_2(\text{size} + 1) - 1$ when the iterator is exhausted;
- $3 \le X \le 2\log_2(\text{size} + 1) - 2$ otherwise. Important note: here $X$ is discrete and grows in steps of 2, unlike the bounds above.

From experiments, and from plain common sense, we can see:
So yes, in lucky cases it can beat the tree, but in all other cases it is even worse than the pessimistic heap.
So I want to use the tree, because it is an improvement for almost any data.
If we switched to the optimistic heap instead, it would be better for some data and worse for other data.
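To make the difference between the two heap update strategies concrete, here is a minimal sketch assuming a binary min-heap of `int` keys (standing in for iterator keys) with a comparison counter. `CountingHeap`, `ReplaceTopPessimistic`, `ReplaceTopOptimistic` and `PopTop` are hypothetical names, not iresearch code. This textbook siftdown spends up to 2 comparisons per level, so its exact counts may differ by a constant from the figures above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical min-heap of stream keys that counts key comparisons (sketch).
struct CountingHeap {
  std::vector<int> a;   // a[0] is the lead (smallest) key
  std::size_t cmp = 0;  // key comparisons performed so far

  explicit CountingHeap(std::vector<int> keys) : a(std::move(keys)) { assert(Valid()); }

  bool Valid() const { return std::is_heap(a.begin(), a.end(), std::greater<int>()); }
  bool Less(std::size_t i, std::size_t j) { ++cmp; return a[i] < a[j]; }

  void SiftDown(std::size_t i) {
    for (;;) {
      const std::size_t l = 2 * i + 1, r = l + 1;
      std::size_t m = i;
      if (l < a.size() && Less(l, m)) m = l;
      if (r < a.size() && Less(r, m)) m = r;
      if (m == i) return;
      std::swap(a[i], a[m]);
      i = m;
    }
  }
  void SiftUp(std::size_t i) {
    while (i > 0 && Less(i, (i - 1) / 2)) {
      std::swap(a[i], a[(i - 1) / 2]);
      i = (i - 1) / 2;
    }
  }

  // Exhausted lead: remove the root (same for both variants).
  void PopTop() {
    assert(!a.empty());
    a[0] = a.back();
    a.pop_back();
    if (!a.empty()) SiftDown(0);
  }
  // Pessimistic update: unconditionally remove the root (siftdown), then
  // re-insert the lead's new key at the bottom (siftup).
  void ReplaceTopPessimistic(int key) {
    PopTop();
    a.push_back(key);
    SiftUp(a.size() - 1);
  }
  // Optimistic update: overwrite the root in place and only sift it down.
  void ReplaceTopOptimistic(int key) {
    assert(!a.empty());
    a[0] = key;
    SiftDown(0);
  }
};
```

In this sketch, for the heap `{1, 5, 6, 7, 8, 9, 10}` where the lead's new key `2` stays on top (the lucky case), the optimistic update spends 2 comparisons and the pessimistic one spends 6.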