fix(rust, python)!: Formalize list aggregation difference between groupbys, selection and window functions #6487
In the example you give, shouldn't the groupby result in the values of group 2 each being part of an individual list?
Also, interesting change and I agree that it is needed.
Nope, that's not what
Right, makes sense now that I think about it. Thanks!
We may need some changes from this version to tackle these issues:
- elixir-explorer#498
- elixir-explorer#499
See: pola-rs/polars#6487
And also the release: https://github.com/pola-rs/polars/releases/tag/rs-0.27.0
With this update we also had to change the `Explorer.DataFrame.describe/2` function to adhere to the changes from Polars. I changed the backend to simplify, since we shouldn't have a lazy version.
This is quite a serious breaking change, but one that has to be made. This was a serious inconsistency in the query engine regarding expressions.
Context
Expressions should adhere to the following rules regarding their context.
Within these rules you should be able to reason about what an expression does. E.g. how an expression behaves in `select` should tell you how it behaves in `agg` as well.

This was not the case.
A `col().list()` turned an `i64` column into a `list<i64>`.

Based on this, I would assume that an aggregation with `col().list()` turns an `i64` into a `list<list<i64>>`. The outer list being the groups, the inner list being the `list` aggregation passed by the user. This was not the case.

But now it is:
Selection
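As a rough illustration, here is a plain-Python model of the selection context. This is not Polars code: `select` and `list_agg` are hypothetical stand-ins for `select` and `col().list()`, and a column is modelled as a Python list with one entry per row.

```python
def list_agg(column):
    """Hypothetical stand-in for col().list(): collapse a column into one list value."""
    return [list(column)]  # a new column with a single row


def select(column, expr):
    """Hypothetical stand-in for select: evaluate the expression on the whole column."""
    return expr(column)


a = [1, 2, 3]                 # conceptually an i64 column with three rows
result = select(a, list_agg)  # conceptually a list<i64> column with one row
print(result)
```

Here `result` has a single row holding `[1, 2, 3]`.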
Groupby
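A similar plain-Python sketch for the groupby context, assuming the same modelling as for selection (a column is a Python list). The helpers `groupby_agg`, `identity` and `list_agg` are hypothetical and only mimic the nesting behaviour described in this PR, not the real engine.

```python
def identity(group):
    """Hypothetical stand-in for a bare col(): the group's values, unchanged."""
    return list(group)


def list_agg(group):
    """Hypothetical stand-in for col().list(): one list value holding the group."""
    return [list(group)]


def groupby_agg(column, keys, expr):
    """Hypothetical stand-in for groupby().agg(): one output row per group."""
    groups = {}
    for key, value in zip(keys, column):
        groups.setdefault(key, []).append(value)
    # The groupby context supplies the outer level: each row holds the
    # expression's result for that group.
    return {key: expr(values) for key, values in groups.items()}


keys = ["x", "y", "x", "y", "x"]
a = [3, 9, 1, 7, 2]

plain = groupby_agg(a, keys, identity)   # per group: conceptually list<i64>
nested = groupby_agg(a, keys, list_agg)  # per group: conceptually list<list<i64>>
print(plain)
print(nested)
```

The outer level comes from the groupby context, the inner one from the explicit list aggregation. Reducing aggregations such as `sum` are deliberately left out of this model.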
Window functions
For window functions we must have a slightly different mental model. If we don't reduce an aggregation in a window function, e.g. bring it down to one element via `mean`, `sum`, `first`, etc., we map the groups back to their positions in the `DataFrame`.

For instance, a `col().sort().over()` on an `i64` returns the same `i64` dtype.

So window functions, though very similar to groupby operations, are always a single level less nested than a groupby result.
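To make that mental model concrete, here is a plain-Python sketch of how window results land back on the original rows. The helpers `over_map` and `over_broadcast` are hypothetical names and not part of Polars. The sketch assumes a non-reduced result is written back element by element, while a single per-group value (including a list aggregation) is repeated on every row of its group.

```python
from collections import defaultdict


def group_positions(keys):
    """Map each key to the row positions where it occurs."""
    positions = defaultdict(list)
    for i, key in enumerate(keys):
        positions[key].append(i)
    return positions


def over_map(values, keys, fn):
    """Non-reduced window: apply fn per group, write results back to the group's rows."""
    out = [None] * len(values)
    for rows in group_positions(keys).values():
        result = fn([values[i] for i in rows])
        for i, value in zip(rows, result):
            out[i] = value
    return out


def over_broadcast(values, keys, agg):
    """Reduced window: one value per group, repeated on every row of that group."""
    out = [None] * len(values)
    for rows in group_positions(keys).values():
        reduced = agg([values[i] for i in rows])
        for i in rows:
            out[i] = reduced
    return out


keys = ["x", "y", "x", "y", "x"]
a = [3, 9, 1, 7, 2]

sorted_over = over_map(a, keys, sorted)      # stays flat: conceptually i64
sum_over = over_broadcast(a, keys, sum)      # stays flat: conceptually i64
list_over = over_broadcast(a, keys, list)    # one list per row: conceptually list<i64>
print(sorted_over)
print(sum_over)
print(list_over)
```

In every case the output has as many rows as the input, so there is no outer per-group list as there would be in a groupby result.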
A `col().list().over()` will return a `list<i64>`, where the list is the list aggregation and the groups are processed by putting that aggregation at the appropriate location in the `DataFrame`.

This wraps it up. This will likely break some examples in the polars-book, so we will have to look at that as well.
Quite breaking, but for the better. :)
Migration
This is easy. Remove any `.list()` call in a `groupby().agg([...])` context.