Quadratic behavior in topic_table::notify_waiters #7665
Labels
area/controller
kind/bug
performance
sev/medium
Version & Environment
Redpanda version: 22.2.2
(however, it also applies to the tip of dev as of 22/12/08)
What went wrong?
Reactor stalls, high CPU use and slow performance when topics with many controller deltas are recovered.
What should have happened instead?
None of the above.
How to reproduce the issue?
Additional information
The primary hint we have about the behavior is from the following reactor stall backtrace:
The time is being spent inside the `copy_if` here:

Consider what happens when recovering a large topic with many deltas while `_waiters` is empty. The deltas are applied from the log one by one, and `notify_waiters` is called after each addition. The `copy_if` then traverses all of the pending deltas on every call, so we get O(n^2) behavior in the number of deltas.

We could avoid this by searching `_pending_deltas` in reverse for the first delta where `d.offset > _last_consumed_by_notifier`, which reduces the overall complexity to O(n).

Another optimization is to skip this work entirely if both `_notifiers` and `_waiters` are empty, though I don't know whether that is the case during replay.