RollingQuantile performance with large windows #906
Comments
Hi @fox-ds, this is indeed problematic from a computational point of view. However, there is not much we can do currently. IMHO, the main problem is that although multiple quantile estimators enable adding new points, removing old data is problematic. For instance, the P2 algorithm supports incrementally adding new observations, but not removing old ones from the estimate.

Edit: as @AdilZouitine pointed out, the major bottleneck comes from handling the updates in the sorted list. Even though the element to delete can be found efficiently using binary search (and that already gives us a huge boost in time), removing an element from a Python list is still O(n).
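The asymmetry described above can be sketched with plain `bisect` from the standard library: locating an element in a sorted list is O(log n), but deleting or inserting is O(n) because every element after the index gets shifted. A minimal illustration (the values are invented for the example):

```python
import bisect

# A sorted window of values, like the one utils.SortedWindow keeps internally.
window = [1, 3, 4, 7, 9, 12]

# Locating the element to evict is cheap: binary search, O(log n).
idx = bisect.bisect_left(window, 7)
assert idx == 3

# Removing it is the expensive part: `del` on a Python list is O(n),
# because every element after the index is shifted left in memory.
del window[idx]
assert window == [1, 3, 4, 9, 12]

# Inserting the incoming value keeps the list sorted, but insort is
# also O(n) overall, for the same shifting reason.
bisect.insort_left(window, 5)
assert window == [1, 3, 4, 5, 9, 12]
```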
Hi @fox-ds,

Context:

```python
from river.stats import RollingQuantile
from tqdm import tqdm

q = 0.997
rolling_q = RollingQuantile(window_size=500_000, q=q)
```

Under the hood, while the sliding window is not yet full there is no performance problem:

```python
for x in tqdm(range(500_000)):
    rolling_q.update(x)
```

However, once the sliding window is full, the time to remove an item from the sorted list is high. Curious, isn't it?

```python
for x in tqdm(range(100_000)):
    rolling_q.update(x)
```

One of the culprits is there! `river/river/utils/sorted_window.py`, line 49 in c0c88d3.
`self` is a list, so the average complexity of deleting an element is O(n), and the search for the element to delete is linear too. The algorithm is quite naive. So I coded a fix that replaces the linear search with a binary search; below you will find some benchmarks.

```python
import bisect
import collections
import math


class SortedWindowFix(collections.UserList):
    def __init__(self, size: int):
        super().__init__()
        self.unsorted_window = collections.deque(maxlen=size)

    @property
    def size(self):
        return self.unsorted_window.maxlen

    def append(self, x):
        if len(self) >= self.size:
            # Find the oldest element with a binary search instead of
            # a linear scan, then evict it.
            start_deque = bisect.bisect_left(self, self.unsorted_window[0])
            del self[start_deque]
        bisect.insort_left(self, x)
        self.unsorted_window.append(x)
        return self


class RollingQuantileFix(RollingQuantile):
    def __init__(self, q, window_size):
        self.window = SortedWindowFix(size=window_size)
        self.q = q
        idx = self.q * (self.window.size - 1)
        self._lower = int(math.floor(idx))
        self._higher = self._lower + 1
        if self._higher > self.window.size - 1:
            self._higher = self.window.size - 1
        self._frac = idx - self._lower
```

```python
rqf = RollingQuantileFix(window_size=500_000, q=q)

for x in tqdm(range(500_000)):
    rqf.update(x)

for x in tqdm(range(100_000)):
    rqf.update(x)
```

Tadaaaa, x3 speed boost! If the sorted binary tree turns out to be better I will make another comment.
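As a self-contained sanity check of the bisect-based window, the class from the snippet above can be reproduced without importing river and compared against `sorted()` of the most recent `size` stream items:

```python
import bisect
import collections
import random


class SortedWindowFix(collections.UserList):
    """Copy of the fix above: a sorted sliding window where the oldest
    element is located via binary search before being evicted."""

    def __init__(self, size: int):
        super().__init__()
        self.unsorted_window = collections.deque(maxlen=size)

    @property
    def size(self):
        return self.unsorted_window.maxlen

    def append(self, x):
        if len(self) >= self.size:
            # The oldest value sits at the left of the deque.
            oldest = self.unsorted_window[0]
            del self[bisect.bisect_left(self, oldest)]
        bisect.insort_left(self, x)
        self.unsorted_window.append(x)
        return self


rng = random.Random(42)
stream = [rng.randint(0, 1_000) for _ in range(5_000)]
window = SortedWindowFix(size=100)
for x in stream:
    window.append(x)

# The window must always equal the sorted view of the last `size` items.
assert list(window) == sorted(stream[-100:])
```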
Hey @AdilZouitine, would it be possible to keep a full-size binary tree in memory, even if the window is not full? Then instead of removing elements, we could mark them for removal. The tree would need to be rebalanced/rebuilt from time to time, which might become a problem. I am not so sure about the rebuild/rebalancing frequency and whether or not this would become an extra parameter (which is not a good thing, IMHO), but we could think about something like that for the future.
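The mark-for-removal idea can be illustrated with a lazy-deletion structure from the standard library. This is a hypothetical sketch (class name and API invented here, and it uses a heap rather than a binary tree for brevity): evicting a value costs O(1) bookkeeping, and dead entries are physically purged only when they surface at the top.

```python
import collections
import heapq


class LazyDeleteMinHeap:
    """Hypothetical sketch: deletions are deferred ("marked") instead of
    performed in place, so evicting a window element is O(1)."""

    def __init__(self):
        self._heap = []                      # live + dead entries
        self._dead = collections.Counter()   # values marked for removal

    def push(self, x):
        heapq.heappush(self._heap, x)

    def mark_deleted(self, x):
        # O(1): just remember that one copy of x should be ignored.
        self._dead[x] += 1

    def top(self):
        # Purge dead entries lazily, only when they reach the root.
        while self._heap and self._dead[self._heap[0]] > 0:
            self._dead[heapq.heappop(self._heap)] -= 1
        return self._heap[0]


h = LazyDeleteMinHeap()
for v in [5, 1, 7, 3]:
    h.push(v)
assert h.top() == 1
h.mark_deleted(1)      # no O(n) list removal, no tree rebalancing
assert h.top() == 3
```

The rebuild-frequency concern raised above applies here too: if many dead entries accumulate without reaching the top, memory grows, so a periodic compaction (or a threshold on the dead count) would still be needed.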
Thanks for the suggestions all, the x3 speed boost already helps a lot! @smastelini @AdilZouitine I think that indeed keeping a binary tree in memory would be the ideal solution, perhaps we could use the
I found this blog post which is similar -- it's based on heaps. But it looks complex, so someone needs to dive into it.
Thanks for the link @MaxHalford. The past week I worked on what at first glance looks like a similar solution - using two heaps that are 'balanced' around the quantile that you want to estimate. I managed to get it up and running really efficiently with
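For reference, the two-heap idea can be sketched in stdlib Python. The following is a hypothetical, non-windowed illustration (names invented here; this is not the code being benchmarked in the thread, and it omits the eviction logic a rolling version needs): a max-heap holds the smallest ≈ q·n values, a min-heap holds the rest, and the estimate is read off the top of the lower heap.

```python
import heapq
import random


class TwoHeapQuantile:
    """Hypothetical sketch: a streaming quantile estimate from two heaps
    balanced around the target quantile q (no sliding window)."""

    def __init__(self, q: float):
        self.q = q
        self._lower = []  # max-heap via negated values: the smallest ~q*n items
        self._upper = []  # min-heap: the remaining items

    def update(self, x):
        if self._lower and x > -self._lower[0]:
            heapq.heappush(self._upper, x)
        else:
            heapq.heappush(self._lower, -x)
        # Rebalance so the lower heap holds floor(q * (n - 1)) + 1 items,
        # matching a "lower" quantile estimator on the sorted data.
        n = len(self._lower) + len(self._upper)
        target = int(self.q * (n - 1)) + 1
        while len(self._lower) > target:
            heapq.heappush(self._upper, -heapq.heappop(self._lower))
        while len(self._lower) < target and self._upper:
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def get(self):
        # The estimate is the largest value in the lower heap.
        return -self._lower[0]


tq = TwoHeapQuantile(q=0.5)
for v in [5, 1, 7, 3, 9]:
    tq.update(v)
assert tq.get() == 5  # median of {1, 3, 5, 7, 9}

vals = list(range(1, 101))
random.Random(0).shuffle(vals)
tq90 = TwoHeapQuantile(q=0.9)
for v in vals:
    tq90.update(v)
assert tq90.get() == 90
```

Each update is O(log n). The catch is exactly what the thread discusses: removing the oldest window element from an arbitrary position in a heap needs extra machinery, such as the mark-for-removal scheme suggested earlier.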
Interesting @fox-ds! Do you have some comparison figures to share?
Not yet, but I want to continue with it this week and probably share some code/figures once I'm satisfied with the results.
Hey @fox-ds and @MaxHalford, I've read both blog posts and I really like those ideas! The more I learn, the more I realize I know nothing about anything. The reference C# implementation @MaxHalford sent also generalizes to arbitrary quantiles :D
I did continue working on this a few weeks ago, but still need some more time to finalize the results.
We've just released version 0.13, which includes faster implementations of quantile algorithms, thanks to @AdilZouitine. I think we can now close this issue.
Nice!!! |
Versions
river version: 0.10.1
Python version: 3.10.2
Describe your task
Keeping track of a rolling quantile over a large window. Once the `RollingQuantile` reaches the window size, it removes the oldest value and appends the new value. For 'small' windows (~< 100k) this is no problem and the code runs really fast, but with larger window sizes (say, 500k), once the window is full there is a significant performance drop when updating the `RollingQuantile`. I think the issue is primarily in the underlying `utils.SortedWindow`.

Furthermore, though the `RollingQuantile` is indeed a streaming algorithm, there may be some downsides on the memory and computational complexity side when handling large windows (O(n)). Because the entire window is kept in memory and used to calculate the quantile, it's really accurate. However, maybe we could trade some of this accuracy for performance, e.g. by estimating the quantile just like the regular `Quantile` class, but then windowed (although I currently don't know of a specific algorithm that solves this issue, there are a few papers out there).

What kind of performance are you expecting?
Until the window is full: 347195 iterations/s
Once the window is full: 279 iterations/s
Steps/code to reproduce