perf: O(log n) sorted set rank and incremental memory tracking by kacy · Pull Request #224 · kacy/ember

kacy · 2026-02-19T23:41:11Z

summary

two related improvements to data structure performance:

sorted set rank: O(n) → O(log n)

the previous rank() used BTreeMap::range(..key).count() which walks every node up to the target — O(n) for a 100k-member leaderboard. the BTreeMap has been replaced with a sorted Vec<(OrderedFloat<f64>, Arc<str>)>, making rank() a binary search: O(log n).

range_by_rank() is also simplified — it's a plain slice rather than skip(n).take(m) over a BTreeMap iterator. the Vec is more cache-friendly for iteration.

as a side effect, each member now costs ~24 bytes for its Vec slot vs ~64 for a BTreeMap node. the tradeoff is O(n) Vec insert/remove, but cache-friendly memmove is faster than BTreeMap pointer-chasing for typical sorted set sizes.
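
a minimal sketch of the sorted-Vec idea, not the code from this PR: `SortedVec` and its method signatures are hypothetical, and `f64::total_cmp` stands in for `OrderedFloat` so the example needs only the standard library. entries are ordered by (score, member), so a rank lookup that already knows the member's score is a single binary search, and a rank range is a slice.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the PR's sorted storage: entries are kept
/// ordered by (score, member) at all times.
pub struct SortedVec {
    entries: Vec<(f64, Arc<str>)>,
}

impl SortedVec {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Binary search for (score, member): Ok(index) if present,
    /// Err(insertion point) otherwise.
    fn find(&self, score: f64, member: &str) -> Result<usize, usize> {
        self.entries
            .binary_search_by(|(s, m)| s.total_cmp(&score).then_with(|| (**m).cmp(member)))
    }

    /// O(log n) search plus an O(n) shift. Returns false if already present.
    pub fn insert(&mut self, score: f64, member: &str) -> bool {
        match self.find(score, member) {
            Ok(_) => false,
            Err(pos) => {
                self.entries.insert(pos, (score, Arc::from(member)));
                true
            }
        }
    }

    /// O(log n): the index in the sorted Vec is the rank.
    pub fn rank(&self, score: f64, member: &str) -> Option<usize> {
        self.find(score, member).ok()
    }

    /// Inclusive rank range as a plain slice, clamped to the Vec's length.
    pub fn range_by_rank(&self, start: usize, stop: usize) -> &[(f64, Arc<str>)] {
        let len = self.entries.len();
        if start >= len || start > stop {
            return &self.entries[..0];
        }
        let end = stop.min(len - 1) + 1;
        &self.entries[start..end]
    }
}
```

the sketch assumes the caller can supply the member's current score (for example from a separate member-to-score map); without that, the binary search has no key to search on.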

also adds a data_bytes field that caches the sum of member string lengths, making memory_usage() O(1) rather than iterating all members on every call.
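
a rough illustration of the cached-counter approach, assuming a simplified set type: `TrackedSet`, `Slot`, and `memory_usage_scan()` are hypothetical names, and the accounting formula (slot size plus member bytes) is a guess at what the real one covers. the counter is adjusted on insert and remove, so the O(1) answer always matches what a full scan would compute.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// One Vec slot: a score plus a shared member string (hypothetical layout).
type Slot = (f64, Arc<str>);

pub struct TrackedSet {
    entries: Vec<Slot>,
    /// Cached sum of member string lengths, maintained on every mutation.
    data_bytes: usize,
}

impl TrackedSet {
    pub fn new() -> Self {
        Self { entries: Vec::new(), data_bytes: 0 }
    }

    fn find(&self, score: f64, member: &str) -> Result<usize, usize> {
        self.entries
            .binary_search_by(|(s, m)| s.total_cmp(&score).then_with(|| (**m).cmp(member)))
    }

    pub fn insert(&mut self, score: f64, member: &str) -> bool {
        match self.find(score, member) {
            Ok(_) => false,
            Err(pos) => {
                self.entries.insert(pos, (score, Arc::from(member)));
                self.data_bytes += member.len(); // keep the cache in step
                true
            }
        }
    }

    pub fn remove(&mut self, score: f64, member: &str) -> bool {
        match self.find(score, member) {
            Ok(pos) => {
                let (_, removed) = self.entries.remove(pos);
                self.data_bytes -= removed.len();
                true
            }
            Err(_) => false,
        }
    }

    /// O(1): slot count times slot size, plus the cached string bytes.
    pub fn memory_usage(&self) -> usize {
        self.entries.len() * size_of::<Slot>() + self.data_bytes
    }

    /// O(n) reference computation, shown only to check the cached value.
    pub fn memory_usage_scan(&self) -> usize {
        self.entries.iter().map(|(_, m)| size_of::<Slot>() + m.len()).sum()
    }
}
```

on a typical 64-bit target `size_of::<Slot>()` is 24 (an 8-byte score plus a 16-byte fat pointer), which lines up with the ~24 bytes per slot quoted above.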

incremental memory tracking for collection mutations

every list push/pop previously called entry_size() twice (before and after the mutation), each of which iterates all elements in the VecDeque. for a 10k-element list, a single LPUSH triggered 20k element-scans just for memory bookkeeping.

the fix uses the delta that is already computed:

list_push: the total element increase is precomputed for reserve_memory; it's now applied with memory.grow_by() instead of track_size()
list_pop: the popped element size is known; no scan needed for either the shrink or the removal path
zrem, hdel, srem: capture the byte cost of each actually-removed entry while iterating, eliminating the rescan in cleanup_after_remove()
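
the removal path can be sketched roughly as follows. this is illustrative only: the `srem` signature, the `HashSet<String>` representation, and `PER_MEMBER_OVERHEAD` are assumptions, not ember's actual types or constants. the point is that the byte cost is summed while removing, so nothing needs to rescan the surviving members afterwards.

```rust
use std::collections::HashSet;

/// Hypothetical fixed per-member bookkeeping cost.
const PER_MEMBER_OVERHEAD: usize = 16;

/// Removes each listed member that is actually present and returns
/// (number removed, bytes freed). Members that are missing or repeated
/// in the request contribute nothing.
pub fn srem(set: &mut HashSet<String>, members: &[&str]) -> (usize, usize) {
    let mut removed = 0;
    let mut removed_bytes = 0;
    for m in members {
        // `take` hands back the owned String, so its length is known here.
        if let Some(old) = set.take(*m) {
            removed += 1;
            removed_bytes += old.len() + PER_MEMBER_OVERHEAD;
        }
    }
    (removed, removed_bytes)
}
```

the returned byte count is what a caller would hand to the memory bookkeeping in place of a before/after size comparison.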

adds MemoryTracker::grow_by() / shrink_by() for callers that know their exact delta.
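
a minimal sketch of delta-based accounting for list push/pop. only the names `MemoryTracker`, `grow_by()`, and `shrink_by()` come from this PR; the struct's fields, the function signatures, `PER_ELEMENT_OVERHEAD`, and `scan_size()` are hypothetical and exist only to show that the tracked total stays equal to a full scan without performing one on each mutation.

```rust
use std::collections::VecDeque;

/// Simplified tracker: a single running byte count (assumed layout).
#[derive(Default)]
pub struct MemoryTracker {
    used: usize,
}

impl MemoryTracker {
    pub fn grow_by(&mut self, bytes: usize) {
        self.used = self.used.saturating_add(bytes);
    }

    pub fn shrink_by(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }

    pub fn used(&self) -> usize {
        self.used
    }
}

/// Hypothetical fixed per-element bookkeeping cost.
const PER_ELEMENT_OVERHEAD: usize = 32;

fn element_cost(value: &[u8]) -> usize {
    value.len() + PER_ELEMENT_OVERHEAD
}

/// Pushes values at the head; the delta depends only on the new values.
pub fn list_push(list: &mut VecDeque<Vec<u8>>, mem: &mut MemoryTracker, values: &[&[u8]]) {
    let increase: usize = values.iter().map(|v| element_cost(v)).sum();
    for v in values {
        list.push_front(v.to_vec());
    }
    mem.grow_by(increase);
}

/// Pops from the head; the popped element's size is the exact delta.
pub fn list_pop(list: &mut VecDeque<Vec<u8>>, mem: &mut MemoryTracker) -> Option<Vec<u8>> {
    let popped = list.pop_front()?;
    mem.shrink_by(element_cost(&popped));
    Some(popped)
}

/// O(n) reference computation, the kind of scan the delta approach avoids.
pub fn scan_size(list: &VecDeque<Vec<u8>>) -> usize {
    list.iter().map(|v| element_cost(v)).sum()
}
```

the saturating arithmetic is a defensive choice for the sketch; whether the real tracker saturates, panics, or wraps on a mismatched delta is not stated here.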

what was tested

cargo test -p emberkv-core: 344 passed
cargo test -p ember-server: 105 passed
cargo test -p ember-cluster: 105 passed
existing sorted set tests (rank ordering, equal scores, score updates, range queries) all pass with the new Vec-based implementation

design considerations

the Vec-based sorted set is a good fit for the read-heavy access pattern common in leaderboards: inserts pay O(n) shifting, but rank and range queries (which usually outnumber writes in these workloads) pay O(log n) and O(k) respectively. for workloads that are ZADD-heavy with rare ZRANK, the BTreeMap may be faster, especially at large set sizes; this is worth revisiting if profiling shows insert cost dominating.

the previous implementation used a BTreeMap for score-ordered storage, making rank() O(n): it counted elements with range(..).count(), which walks every node up to the target. for a leaderboard with 100k members, ZRANK required 100k comparisons.

the new implementation uses a sorted Vec instead:

- rank() is now O(log n) via binary_search_by
- range_by_rank() is a simple slice instead of skip()+take() over a btree
- iteration is more cache-friendly (contiguous memory vs pointer-chasing)
- each member costs ~24 bytes for the Vec slot vs ~64 for a BTreeMap node

the tradeoff is O(n) insert/remove (Vec shifting), but Vec's memmove is cache-friendly and faster than BTreeMap's O(log n) with high constant factor for typical sorted set sizes.

also adds data_bytes: usize to cache the sum of member string lengths, making memory_usage() O(1) rather than iterating all members on every call.

every list push or pop previously called entry_size() twice (before and after the mutation), each of which iterates all elements in the VecDeque to sum their lengths. for a list with 10k elements, a single LPUSH triggered 20k iterations just for memory accounting.

the fix is to use the element delta that is already computed:

- list_push: element_increase is precomputed for reserve_memory; apply it directly with memory.grow_by() instead of calling track_size()
- list_pop: the popped element's size is known; apply it directly with memory.shrink_by() for non-empty lists, or compute exact old_size from constants for the empty-list removal path, with no list scan at either step
- zrem, hdel, srem: capture the byte cost of each actually-removed member/field while iterating, then pass it to cleanup_after_remove() so the second O(n) rescan is eliminated

adds MemoryTracker::grow_by() and shrink_by() for callers that know their exact delta. cleanup_after_remove() now takes removed_bytes instead of rescanning the remaining collection.

sorted set mutations were already improved by the SortedSet rewrite (memory_usage() is now O(1) there).

kacy added 2 commits February 19, 2026 18:38

kacy merged commit aa03441 into main Feb 19, 2026
4 of 7 checks passed

kacy deleted the perf/data-structure-improvements branch February 19, 2026 23:41
