Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): Add syntactic sugar for col("foo") -> col.foo #10874

Merged
merged 5 commits into from Sep 4, 2023

Conversation

stinodego
Copy link
Member

@stinodego stinodego commented Sep 2, 2023

Closes #10866, closes #7398

Changes

  • Implement col as a factory object rather than a function. It supports col.foo as an alias for col("foo").

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Sep 2, 2023
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 2, 2023

@alexander-beedie Is this how you would implement this functionality? I don't really see a way to satisfy mypy here without making col subclass Expr, but I think that will have some bad side effects 🤔

Not quite; no need for the metaclass, and we can pacify the type-checkers without needing Expr subclassing. I'll update this with a tweak that gets rid of all the mypy errors; one sec... ;)

Screenshot 2023-09-02 at 18 10 04

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 2, 2023

Done; modified the implementation and made the concise c and cs shortcuts available too:

from polars import c, cs

c.colx
# col("colx")

cs.by_name("colx", "coly")
# cs.by_name('colx', 'coly')

@ritchie46
Copy link
Member

I am not really sure a about pre-determining the shortcuts for users. Especially single char c is something will collide a lot.

I think we should just document our idiomatic important names.

@stinodego
Copy link
Member Author

stinodego commented Sep 2, 2023

modified the implementation

Thanks! I missed the idea of instantiating the object and exporting that instead. That makes it a lot cleaner indeed.

I noticed the _Expr_ object wasn't necessary, so I removed it. And I'm also not sure about exporting c and cs directly. I'll leave it out of this PR for now.

I'm also wondering what your intention is with naming the helper class _column_? Default Python style Column seems fully appropriate here. It will be exposed to the user in the end, though they will not instantiate it themselves.

I switched __getattr__ to __getattribute__ as we are certain there are no other things to look up on this class. That will make sure stuff like col.__repr__ will actually fetch the column named "__repr__".

Now, to figure out how to make Sphinx happy 😅

Idiomatic usage

We should think about what we think idiomatic usage should look like here, and what we want to use in our own docs.
I'd like to adhere to a single idiomatic style ourselves. Do we just keep using pl.col("foo") in all our docstrings / book examples? I added a comment to the linked issue to discuss this.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 3, 2023

I think we should just document our idiomatic important names.

Yup, makes sense - c turns up too much to just claim it as our own.

I noticed the _Expr_ object wasn't necessary

Oh, nice - in my first iteration mypy complained hundreds of times about col not being callable, so I introduced the shadow Expr; must have fixed the underlying issue in the second iteration without noticing :)

I'm also wondering what your intention is with naming the helper class column?

Pure habit; anything I don't want the users instantiating themselves I generally name such that they can't claim they didn't know, haha. No objections to changing it to something else, though I'd still probably have "col" in an explicit __all__ 🤷

I switched getattr to getattribute

😎👍

@ritchie46
Copy link
Member

We should think about what we think idiomatic usage should look like here, and what we want to use in our own docs.
I'd like to adhere to a single idiomatic style ourselves. Do we just keep using pl.col("foo") in all our docstrings / book examples? I added a comment to the linked issue to discuss this.

I want to strongly push for keeping that. It is the most complete. It allows for all possible names in a DataFrame. For instance spaces and - are pretty common in header names.

@stinodego
Copy link
Member Author

I want to strongly push for keeping that.

Great, then we're in agreement!

I'd still probably have "col" in an explicit __all__

Yeah on second thought I agree - added it back!

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 3, 2023

Hmm, just tested the latest updates and the move to use __getattribute__ may be a bit too aggressive; it blows-up a lot of typical object-introspection interaction from IDEs; for instance, if you type col and hit . in PyCharm it will immediately throw an error stack at you:

Screenshot 2023-09-03 at 17 05 10

I can imagine Notebooks are going to probe objects/variables in similar ways, as is VSCode, and so on.

At the bare minimum we'd need to special-case and pass-through __dict__, __name__, and __class__, but likely more on a per-IDE/use-case basis. We'd probably be better off moving back to __getattr__ and accepting that some dunder methods are just not going to make good column names 😅

@stinodego
Copy link
Member Author

stinodego commented Sep 4, 2023

Yeah, considering this is not intended as a new idiomatic way of doing things, but rather for quick prototyping that's not intended for supporting every single use case, getattr is probably fine.

Meanwhile, I am down the rabbithole of getting things to look nice in Sphinx... 🙃

EDIT: Figured it out, just making things look nice now :)

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 4, 2023

World's smallest nitpick - rename Column as ColumnFactory?
It creates columns, but definitely isn't one itself.

@stinodego
Copy link
Member Author

It creates columns, but definitely isn't one itself.

That's true - renamed to ColumnFactory!

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks guys!

@ritchie46 ritchie46 merged commit fda2307 into main Sep 4, 2023
13 checks passed
@ritchie46 ritchie46 deleted the column-class branch September 4, 2023 15:58
@stinodego stinodego added the highlight Highlight this PR in the changelog label Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature highlight Highlight this PR in the changelog python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dispatch attributes on col as col("foo") Add syntax sugar over pl.col() using symbol parsing
3 participants