Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of rolling_median() #12609

Closed
etiennebacher opened this issue Nov 21, 2023 · 15 comments · Fixed by #12704
Closed

Performance of rolling_median() #12609

etiennebacher opened this issue Nov 21, 2023 · 15 comments · Fixed by #12704
Assignees
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature performance Performance issues or improvements

Comments

@etiennebacher
Copy link

etiennebacher commented Nov 21, 2023

Description

An important contributor to data.table made a few benchmarks for rolling functions (only mean and median so far): https://github.com/jangorecki/rollbench

While the performance of rolling mean seems acceptable (still 2x slower than data.table though), the performance of rolling median degrades very quickly as the number of observations and/or window size increases. Note that r-polars is used (with bindings to rust-polars 0.35), not py-polars, but I got very similar timings with py-polars. The data.table implementation is not merged yet but can be found in this PR: Rdatatable/data.table#5692

The author links to the paper that implements the "sort-median" algorithm: https://arxiv.org/abs/1406.1717

I have no clue how the rolling median is implemented in polars, but I guess there's room for improvements here.


Here's what I have with py-polars 0.19.15:

import sys
import time
import numpy as np
import polars as pl

np.random.seed(108)


def test_rolling_median(N, W):
    for n in N:
        n = int(n)
        x = pl.DataFrame({"x": np.random.normal(0, 1, n)})
        for w in W:
            w = int(w)
            t = time.time()
            ans = x.select(pl.col("x").rolling_median(window_size=w))
            t = time.time() - t
            sys.stdout.write("polars,median,single,{},{},{:.3f}\n".format(n, w, t))
            sys.stdout.flush()


test_rolling_median(N=[1e6, 1e7, 1e8], W=[1e2, 1e3, 1e4])
polars,median,single,1000000,100,0.146
polars,median,single,1000000,1000,0.197
polars,median,single,1000000,10000,9.717
polars,median,single,10000000,100,1.013
polars,median,single,10000000,1000,2.120
polars,median,single,10000000,10000,113.191
polars,median,single,100000000,100,11.841
polars,median,single,100000000,1000,24.743
polars,median,single,100000000,10000,1148.248
@etiennebacher etiennebacher added the enhancement New feature or an improvement of an existing feature label Nov 21, 2023
@ritchie46
Copy link
Member

@ritchie46
Copy link
Member

@etiennebacher can you take a look?

@etiennebacher
Copy link
Author

etiennebacher commented Nov 22, 2023

Sorry, I'm barely a beginner in Rust, I just got news of these benchmarks because we implemented $rolling() (for Expr only) in r-polars a couple of days ago

@etiennebacher

This comment was marked as outdated.

@stinodego stinodego added performance Performance issues or improvements accepted Ready for implementation labels Nov 23, 2023
@ritchie46
Copy link
Member

ritchie46 commented Nov 24, 2023

Taking a look.

EDIT: didn't find a quick win. Someone/we can take a look at that paper.

@jangorecki
Copy link

Worth to note that polars is by no means slow in this field. Pandas is exceptionally fast for rolling functions, relatively to other stuff in pandas, so being faster than pandas is already pretty good. For example unoptimized base R (on 1e5 input) is way! slower: Rdatatable/data.table#5692 (comment)

@sorhawell
Copy link
Contributor

sorhawell commented Nov 27, 2023

I implemented fast rolling for r-polars and it matches quite closely the data.table::froll_x in speed.
I will try to see if I can translate it into a rust-polars PR.

@ritchie46
Copy link
Member

Thank you @jangorecki. I am certainly not happy about our current state. The first goal was to get the functionality in, but after some looking we see that especially for min, max and median there is still much left on the table.

For the rolling sum/mean I think there is still something left w.r.t. fast paths for the single window increment case. We currently made it generic for large window shifts, but this is not most common. I shall post back here, when we are a bit further.

@jangorecki
Copy link

jangorecki commented Nov 27, 2023

@ritchie46 this article is quite related https://duckdb.org/2021/11/12/moving-holistic.html

For median there are two approaches that scales well, min-max-heap and sort-median.
Former one is implemented in base R, called "Turlach": https://stat.ethz.ch/R-manual/R-devel/library/stats/html/runmed.html The base R one is limited to uneven window size.
Latter one, is very novel, implemented in data.table. I have failed to implemented NA support in it, so for NAs function falls back to much slower (parallel) partial sort.

@sorhawell
Copy link
Contributor

@ritchie46 My simple prototype for fast rolling using vanilla rust-polars iter tools is as fast as data.table and 2x faster than rust-polars impl for the same benchmark. Not sure exactly why. I use .collect_ca_trusted() and perhaps a bit fewer runtime branching.

@orlp
Copy link
Collaborator

orlp commented Nov 30, 2023

@sorhawell I don't see a median implementation in your linked code?

@sorhawell
Copy link
Contributor

sorhawell commented Nov 30, 2023

@sorhawell I don't see a median implementation in your linked code?

No there is none. I was only referring to rolling_sum mean ... etc.

Polars median is very much slower the data.table impl and scales poorly.

My point was, that besides rolling_median polars is also a bit slower (2x) for rolling_mean etc. A vanilla rolling_sum re-impl written with rust-polars is 2x faster then what rust_polars provides itself.

@ritchie46
Copy link
Member

The rolling sum is something we can trivially improve. I think we should just add a fast path in our sum code as we are now generic over different window jumps.

Rolling median is underway. Rollin extramas will also be improved.

@ritchie46
Copy link
Member

@jangorecki nice!.

I am almost finished.
The rolling median in the paper is implemented in #12704. It is generic over any window size, quantile and quantile interpolation and nulls.

I do think that in the future we want a different algorithm for smaller windows sizes.

@etiennebacher
Copy link
Author

FYI the benchmarks have been updated with the latest version of r-polars, which contains the performance improvements of #12704: https://github.com/jangorecki/rollbench

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature performance Performance issues or improvements
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants