feat(python): additional multi-column support for pl.<function> entries #13336
I've been meaning to add this support, so having this is nice!
With regards to the implementation: we already have the parse_as_expression / parse_as_list_of_expressions utils. Those should be utilized here.
Nope; there's a reason I did it this way - it keeps the return type and usage completely seamless, while still expanding it to handle multiple columns (including bare
Using
Before and after this PR, all the enhanced functions continue to return a
✅ can chain
Returning a list of expressions instead results in...
❌ cannot chain
...which would make these functions dramatically less seamless/useful; if they're going to return a list of expressions, you may as well just write and use those expressions directly instead of using these convenience methods.
We would need a new first-class primitive (something like
Right, that makes a lot of sense. Using
However, between the existing expression parsing utils, the column selectors, and this new parsing utility, it feels like our input parsing is becoming way too disjointed. I don't really feel comfortable adding more complexity before we address some of this technical debt.
I also really dislike allowing some expressions but not others. That goes against the idea of expressions being composable.
To continue improvements to input parsing, I think addressing #10594 would be most important, as well as #12262 for this specific case.
If we want to keep it as a pure extension of what is currently there then we can omit the "added value" of supporting
I'm completely fine with that; supporting
I'm all in favor of accepting multiple columns where possible - indeed let's just take strings for now, and we'll revisit when we have a better way to bundle expressions.
5952d03 to d7a675c
Done ;)
If we accept that people should not use keyword arguments in these shortcut functions, the whole thing can be a lot simpler, I believe. Curious what you think.
The two-arg approach matches what we have in |
fde1485 to 6979a96
6979a96 to fbcfad6
Looks great - this is going to be very useful!
All of the following pl.<function> entries have been extended with support for multiple column names (and remain backwards compatible with existing single-column calling syntax, so there is no breakage):
- approx_n_unique
- count
- first
- implode
- last
- mean
- median
- n_unique
In addition, all functions in the updated module (and vertical aggregations) can now take bare pl.col("name") expressions as well as "name" strings, bringing them a little closer to more generic functions such as pl.all_horizontal.
Added new docstring examples for each, and extended test coverage accordingly.
Examples
Before:
After: