feat(rust!): separate `rolling_*_by` from `rolling_*(..., by=...)` in Rust #16102
- msg = "in `rolling_min` operation, `by` argument of dtype `i64` is not supported"
+ msg = r"in `rolling_\*_by` operation, `by` argument of dtype `i64` is not supported"
  with pytest.raises(InvalidOperationError, match=msg):
-     df.select(pl.col("b").rolling_min_by("a", 2))  # type: ignore[arg-type]
+     df.select(pl.col("b").rolling_min_by("a", "2i"))
now that `rolling_min_by` is split from `rolling_min`, it no longer accepts integers as input for `window_size`, and raises a different error
I've slightly modified this one to keep the spirit of the existing test (i.e. asserting that the `by` column can't be of integer dtype)
- msg = "if `by` argument is passed, then `window_size` must be a temporal window"
+ msg = "`window_size` duration may not be a parsed integer"
  with pytest.raises(InvalidOperationError, match=msg):
-     df.select(pl.col("a").rolling_min_by("b", 2))  # type: ignore[arg-type]
+     df.select(pl.col("a").rolling_min_by("b", "2i"))
same as above
- msg = "if `by` argument is passed, then `window_size` must be a temporal window"
+ msg = "`window_size` duration may not be a parsed integer"
  with pytest.raises(InvalidOperationError, match=msg):
-     df.with_columns(pl.col("a").rolling_sum_by("b", 2, closed="left"))  # type: ignore[arg-type]
+     df.with_columns(pl.col("a").rolling_sum_by("b", "2i", closed="left"))
here too
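The three test changes above all turn on one rule: a `_by` call no longer takes a bare integer, and an index-style window string such as `"2i"` is rejected with the new message. A minimal plain-Python sketch of that rule follows. It is not Polars code: `check_by_window`, the regex, and the stand-in exception class are hypothetical, and only the error message text is taken from the diff. It also only handles single-unit strings like `"2i"` or `"2d"`.

```python
import re


class InvalidOperationError(Exception):
    """Stand-in for the Polars exception of the same name (assumption)."""


# Hypothetical pattern: an integer count followed by one unit suffix.
_WINDOW = re.compile(r"^(\d+)([a-z]+)$")


def check_by_window(window_size: str) -> str:
    """Reject index-based windows ("2i") for a rolling_*_by-style call."""
    match = _WINDOW.match(window_size)
    if match is None:
        raise ValueError(f"cannot parse window_size {window_size!r}")
    if match.group(2) == "i":
        # Message text copied from the updated tests in this PR.
        raise InvalidOperationError(
            "`window_size` duration may not be a parsed integer"
        )
    return window_size
```

Under these assumptions, `check_by_window("2d")` passes the string through unchanged, while `check_by_window("2i")` raises the stand-in `InvalidOperationError`.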
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #16102 +/- ##
==========================================
+ Coverage 80.95% 80.96% +0.01%
==========================================
Files 1386 1387 +1
Lines 178423 178629 +206
Branches 2881 2886 +5
==========================================
+ Hits 144437 144632 +195
- Misses 33493 33503 +10
- Partials 493 494 +1
- #[cfg(feature = "rolling_window")]
+ #[cfg(feature = "rolling_window_by")]
getting this to work was oddly satisfying 😌
- by: str,
+ by: IntoExpr,
the great expressification continues!
CodSpeed Performance Report
Merging #16102 will not alter performance
So only moving around, no execution logic touched?
the execution logic is untouched, that's right
Great! Thanks. That's quite a churn. ^^
Dumb question -- I definitely see the benefits of splitting Also documentation on
i'll update the docs, thanks
This is quite big. But it's quite rewarding, and will definitely be a better user experience. It's just quite hard to review.
So, summary of changes:
- `rolling_mean_by` now calls `self._pyexpr.rolling_mean_by`, instead of `self._pyexpr.rolling_mean`. There's a thin validation layer before getting to the Rust function, to make sure that deprecations/errors are shown
- `PyExpr.rolling_mean` and `PyExpr.rolling_mean_by` now each only have the arguments which are relevant to them (i.e. the former doesn't have `closed`, the latter doesn't have `weights`)
- `RollingOptions` and `RollingOptionsImpl` are gone 🔥 There are now just two such structs:
  - `RollingOptionsFixedWindow`
  - `RollingOptionsDynamicWindow`
- `convert` from `crates/polars-plan/src/dsl/function_expr/rolling.rs` is gone
- `rolling_*(..., by=....)` would take an integer window size, convert it to string, parse it as `Duration`, and then get the window size out again from `window_size.duration_ns()`. All those gymnastics can be skipped: the user passes in an integer, and it stays as integer, no need to go via `Duration`
- `rolling_window` and `rolling_window_by` features are now separate

As a bonus, `rolling_*_by` now accepts expressions for `by`!

The rest is just moving things around
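To make the integer round trip from the summary concrete, here is a toy Python model of the old and new paths. It is a sketch under stated assumptions, not the Rust implementation: the `Duration` class below is a simplified stand-in, and `parse_duration`, `window_size_old`, `window_size_new`, and the `parsed_int` field are hypothetical names. It assumes an index-style string such as `"3i"` parses to a duration whose `duration_ns()` returns the original count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duration:
    """Toy stand-in for a parsed window; only what this sketch needs."""

    nanoseconds: int
    parsed_int: bool  # True when the string used the index unit "i"

    def duration_ns(self) -> int:
        return self.nanoseconds


def parse_duration(text: str) -> Duration:
    """Parse "<n>ns" (nanoseconds) or "<n>i" (index count); nothing else."""
    if text.endswith("ns"):
        return Duration(int(text[:-2]), parsed_int=False)
    if text.endswith("i"):
        return Duration(int(text[:-1]), parsed_int=True)
    raise ValueError(f"unsupported duration string {text!r}")


def window_size_old(window_size: int) -> int:
    """Old path as described: int -> str -> Duration -> int."""
    return parse_duration(f"{window_size}i").duration_ns()


def window_size_new(window_size: int) -> int:
    """New path as described: the integer is used directly."""
    return window_size
```

In this model both paths return the same value for any non-negative integer, which is why the detour through `Duration` could be dropped without changing results.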
Things that I'd like to do as follow-ups:
- `rolling_window` from `rolling_window_by` features - I think ideally the former should have no knowledge of `polars-time` at all
- `rolling_*_by`. I see no theoretical reason why not, so...I'll work on this
- `_by` ones