refactor[python]: Dispatch Series namespace methods to Expr using a decorator #4423

Merged: 26 commits into pola-rs:master from dispatch-to-expr on Aug 26, 2022


stinodego
Member

@stinodego stinodego commented Aug 15, 2022

Relates to #4422

Changes:

  • Added a decorator for dispatching Series methods to the Expr equivalent.
  • Created a module series.utils to house the decorator. Moved the get_ffi_func here as well.
  • Applied the decorator to all Series namespace methods. Only a handful of methods did not have a directly equivalent expression.

I like that it's now very explicit that these methods do not implement anything fancy - they only dispatch to another implementation.
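To make the idea concrete, here is a minimal, self-contained sketch of name-based dispatch. It does not use Polars: `ToyExpr`, `ToySeries`, and `dispatch_to_expr` are hypothetical stand-ins invented for illustration, and the actual decorator in this PR may differ in its details.

```python
from functools import wraps
from typing import Any, Callable, List


class ToyExpr:
    """Hypothetical stand-in for an expression: holds the real implementations."""

    def __init__(self, values: List[int]) -> None:
        self.values = values

    def double(self) -> List[int]:
        return [v * 2 for v in self.values]

    def clip_min(self, lower: int) -> List[int]:
        return [max(v, lower) for v in self.values]


def dispatch_to_expr(func: Callable[..., Any]) -> Callable[..., Any]:
    """Ignore the decorated body and call the same-named ToyExpr method instead."""

    @wraps(func)
    def wrapper(self: "ToySeries", *args: Any, **kwargs: Any) -> "ToySeries":
        expr_method = getattr(ToyExpr(self.values), func.__name__)
        return ToySeries(expr_method(*args, **kwargs))

    return wrapper


class ToySeries:
    def __init__(self, values: List[int]) -> None:
        self.values = values

    # The method bodies are intentionally empty: only the signature and
    # docstring live here, the behaviour comes from ToyExpr.
    @dispatch_to_expr
    def double(self) -> "ToySeries":
        """Double every value."""

    @dispatch_to_expr
    def clip_min(self, lower: int) -> "ToySeries":
        """Clip values below ``lower``."""


s = ToySeries([-2, 1, -3])
print(s.double().values)     # [-4, 2, -6]
print(s.clip_min(0).values)  # [0, 1, 0]
```

The point of the pattern is that each decorated method carries only a signature and a docstring, so there is a single implementation to maintain.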

If you like this approach, I will try to apply this to the Series non-namespace methods next, and then see if I can do something similar for DataFrame/LazyFrame.

@github-actions bot added the python (Related to Python Polars) label Aug 15, 2022
@codecov-commenter

codecov-commenter commented Aug 15, 2022

Codecov Report

Merging #4423 (8a66cc2) into master (98a6121) will decrease coverage by 0.00%.
The diff coverage is 97.81%.

@@            Coverage Diff             @@
##           master    #4423      +/-   ##
==========================================
- Coverage   79.36%   79.35%   -0.01%     
==========================================
  Files         494      495       +1     
  Lines       78075    77882     -193     
==========================================
- Hits        61962    61802     -160     
+ Misses      16113    16080      -33     
Impacted Files                                    Coverage Δ
py-polars/src/series.rs                           77.31% <ø> (-2.48%) ⬇️
py-polars/polars/internals/series/categorical.py  90.90% <83.33%> (+15.90%) ⬆️
py-polars/polars/internals/series/series.py       93.00% <91.66%> (-0.26%) ⬇️
py-polars/polars/internals/series/utils.py        98.30% <98.30%> (ø)
py-polars/polars/internals/expr/categorical.py    100.00% <100.00%> (ø)
py-polars/polars/internals/expr/datetime.py       95.89% <100.00%> (+0.11%) ⬆️
py-polars/polars/internals/expr/list.py           100.00% <100.00%> (ø)
py-polars/polars/internals/expr/meta.py           100.00% <100.00%> (ø)
py-polars/polars/internals/expr/string.py         100.00% <100.00%> (+2.56%) ⬆️
py-polars/polars/internals/expr/struct.py         100.00% <100.00%> (ø)
... and 25 more

@ritchie46
Member

Now that is elegant! <3

@stinodego
Member Author

Now that is elegant! <3

I know, right! I was very pleased when I put this together 😄

I'll give some thought on how we can define the decorator nicely so that it's reusable across modules, then I'll ask for a review.

@stinodego stinodego changed the title from "PoC: Dispatching to Expr using a decorator" to "refactor[python]: Dispatch Series namespace methods to Expr using a decorator" Aug 15, 2022
@stinodego stinodego marked this pull request as ready for review August 15, 2022 15:15
@stinodego stinodego marked this pull request as draft August 15, 2022 19:16
@stinodego stinodego marked this pull request as ready for review August 15, 2022 19:42
Collaborator

@zundertj zundertj left a comment

Looks good to me!

@stinodego
Member Author

Did a small update to use a private class variable for the namespace accessor instead of a property; like zundertj originally suggested. I think that makes more sense.

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

Clever :)

There is one unfortunate downside - in its current form the decorator eats all of the wrapped-function docstrings, so the nice tooltip help & inline documentation gets replaced like this:

...or...

>>> help( df['strcol'].str.contains )

# Help on method wrapper in module polars.internals.series.utils:
#  wrapper(*args: 'P.args', **kwargs: 'P.kwargs') -> 'pli.Series' 
#  method of polars.internals.series.string.StringNameSpace instance

Maybe wrapt, or one of the other more powerful decorator-util libraries has a solution?
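The effect described above can be reproduced without Polars. The sketch below uses a hypothetical `naive_dispatch` decorator and a toy `contains` function (both invented for illustration) to show how a plain wrapper replaces the original name, docstring, and signature.

```python
import inspect
from typing import Any, Callable


def naive_dispatch(func: Callable[..., Any]) -> Callable[..., Any]:
    # No functools.wraps here, so the returned function keeps its own metadata.
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    return wrapper


@naive_dispatch
def contains(pattern: str, literal: bool = False) -> bool:
    """Check whether the pattern occurs."""
    return True


print(contains.__name__)                             # wrapper
print(contains.__doc__)                              # None
print(list(inspect.signature(contains).parameters))  # ['args', 'kwargs']
```

This is why `help()` reports `wrapper(*args, **kwargs)` instead of the documented method.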

@stinodego
Member Author

There is one unfortunate downside - in its current form the decorator eats all of the wrapped-function docstrings, so all the nice tooltip help & inline documentation gets replaced like this:

Good catch - let me look into that.

@matteosantama
Contributor

@alexander-beedie @stinodego this should fix it

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

@alexander-beedie @stinodego this should fix it

You'd think so (I did), but my suspicion is that it might not, fully... I had a similar scenario at work a few weeks ago (which is why I thought the same thing might happen here), and ended up working around it by just not decorating until I could come back to it in more detail, heh. I was using functools.wraps, but it wasn't sufficient. Will have to take another look now that it has turned up in two places :)

[update]

right, so when adding functools.wraps the tooltip and IDE help/inline docs go from detailing the wrapper function to being blank instead, but at least an explicit call to help returns the right thing... so, gets halfway there 💭

@stinodego
Member Author

For reference, in the current form (without @wraps), VSCode shows the correct info just fine.

image

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

For reference, in the current form (without @wraps), VSCode shows the correct info just fine.

Looks like VSCode is doing a better job than PyCharm in this situation ;)

@stinodego
Member Author

Looks like VSCode is doing a better job than PyCharm in this situation ;)

Or a worse job, and it's not recognizing the decorator 😄

@wraps definitely fixes the __name__ and __doc__ attributes, so that's nice.
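For reference, a small standard-library-only check of what `functools.wraps` restores at runtime. The `dispatch` decorator and `contains` function are toy names made up for this sketch; it says nothing about how a given editor's static analysis behaves.

```python
import inspect
from functools import wraps
from typing import Any, Callable


def dispatch(func: Callable[..., Any]) -> Callable[..., Any]:
    @wraps(func)  # copies __name__, __doc__, etc. and sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    return wrapper


@dispatch
def contains(pattern: str, literal: bool = False) -> bool:
    """Check whether the pattern occurs."""
    return True


print(contains.__name__)  # contains
print(contains.__doc__)   # Check whether the pattern occurs.

# inspect.signature follows __wrapped__ by default, so runtime tools such as
# help() see the original parameters again.
print(list(inspect.signature(contains).parameters))  # ['pattern', 'literal']
```

Tools that do not follow `__wrapped__` will still only see `*args, **kwargs`, which is consistent with the different results reported in this thread.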

Have you tried plugging the example from the @wraps docs into PyCharm? What happens then? If that fails too, it seems like a shortcoming of PyCharm in this case.

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

Have you tried plugging the example from the @wraps docs into Pycharm? What happens then? Seems like a shortcoming of PyCharm then, in this case.

Different results depending on whether you're writing code in the editor or are in the interactive console.

Adding @wraps(func) to def wrapper gets you the following:

  • In the editor - you get tooltip & quick docs help (with correct params) BUT the autocomplete / function signature remains messed-up (shows ParamSpec, as above).

  • In the interactive console you get the tooltip & quick docs back BUT without correct params (they show as *args and **kwargs), like so ...

    ... and without proper function autocomplete (also shows as *args and **kwargs):

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

Argh. Looks like there is a long-term known issue in PyCharm about it not properly following @functools.wraps :(

https://youtrack.jetbrains.com/issue/PY-23067

@stinodego
Member Author

stinodego commented Aug 16, 2022

I meant the functools doc example; not the decorator I wrote for polars. If the example from the functools docs doesn't work in PyCharm, that's a PyCharm problem, not a decorator problem. (I am not a PyCharm user so can't check)

EDIT:

Argh. Looks like there is a long-term known issue in PyCharm about it not properly following @functools.wraps :(

https://youtrack.jetbrains.com/issue/PY-23067

Right, that's what I was suspecting!

So then the question becomes, how bad is it that PyCharm users don't get the docs support, and is the code quality improvement in this PR worth it?

Quite unfortunate that we've gone from a sleek improvement with only pluses to having to weigh downsides due to a bug in a popular editor 😞

@alexander-beedie
Collaborator

alexander-beedie commented Aug 16, 2022

So then the question becomes, how bad is it that PyCharm users don't get the docs support, and is the code quality improvement in this PR worth it?

The docs are mostly ok (eg: correct, aside from the generic *args, **kwargs param signature) with functools.wraps in place - it's losing the autocomplete that's probably more painful (though looking up the params in the source code or online each time isn't going to be awesome) :(

Wondering if wrapt or one of the other decorator libraries might help. I'll experiment tomorrow if I can; might be some way to rework it that satisfies all of the editors...

@matteosantama
Contributor

matteosantama commented Aug 16, 2022

@alexander-beedie what version of PyCharm are you on? I'm seeing something different

Screen Shot 2022-08-16 at 7 20 36 PM

EDIT: I'm on 2022.2

Member Author

@stinodego stinodego left a comment

Thanks a lot for the improvements you made, @alexander-beedie ! I rebased and added a commit to make type checking in Python 3.7 work properly.

The class decorator makes a lot of sense. I was already pondering this, and you went ahead and got it done 👍

I do worry a bit that the result feels a bit 'black magick-y'; there's some stuff in there that I haven't seen before in the 12+ years I've been programming Python 😄 which could be hard for newbies to read and could be error-prone. Maybe we should include a few tests specifically for this decorator? And maybe we could include a bit more explanation in the docstring.

Otherwise, I think the implementation makes a lot of sense. I left a few small comments about the _is_empty_method check.

I am still left wondering why PyCharm accepts this form of decorator but not the direct method decorator. Did you learn more about the nitty gritty here?

py-polars/polars/internals/series/utils.py (outdated)
@alexander-beedie
Collaborator

alexander-beedie commented Aug 24, 2022

I am still left wondering why PyCharm accepts this form of decorator but not the direct method decorator. Did you learn more about the nitty gritty here?

The class decorator works because static analysis (mypy/editor/etc) does not know what that class decorator is going to do; so, static analysis sees the methods exactly as they are written in the code (undecorated - until runtime, when the class decorator automagically applies the method-level dispatch decorator the first time the class is imported). And then, at runtime, the additional __signature__ injection in the decorator seems to help PyCharm finish the job (it gets almost all the way there without it, but that was the missing piece of the puzzle to get correct autocomplete behavior).
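A rough, Polars-free sketch of that mechanism follows. All names (`ToyExprStr`, `ToyStringNameSpace`, `dispatch_all`, `_dispatch`) are hypothetical. As a simplifying assumption, the selection rule here wraps any public method that has a same-named counterpart, whereas the PR uses an `_is_empty_method` check whose details are not shown in this thread.

```python
import inspect
from functools import wraps
from typing import Any, Callable, List, Type, TypeVar

T = TypeVar("T")


class ToyExprStr:
    """Hypothetical expression-side namespace holding the real implementations."""

    def __init__(self, values: List[str]) -> None:
        self.values = values

    def contains(self, pattern: str) -> List[bool]:
        return [pattern in v for v in self.values]

    def lengths(self) -> List[int]:
        return [len(v) for v in self.values]


def _dispatch(func: Callable[..., Any]) -> Callable[..., Any]:
    @wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        return getattr(ToyExprStr(self.values), func.__name__)(*args, **kwargs)

    # Pin the written signature on the wrapper so runtime introspection
    # reports the documented parameters rather than *args/**kwargs.
    wrapper.__signature__ = inspect.signature(func)  # type: ignore[attr-defined]
    return wrapper


def dispatch_all(cls: Type[T]) -> Type[T]:
    """Class decorator: wrap each public method that has a ToyExprStr counterpart."""
    for name, attr in list(vars(cls).items()):
        if (
            inspect.isfunction(attr)
            and not name.startswith("_")
            and hasattr(ToyExprStr, name)
        ):
            setattr(cls, name, _dispatch(attr))
    return cls


@dispatch_all
class ToyStringNameSpace:
    def __init__(self, values: List[str]) -> None:
        self.values = values

    # Written undecorated: static analysis sees exactly these signatures.
    def contains(self, pattern: str) -> List[bool]:
        """Check whether each value contains the pattern."""

    def lengths(self) -> List[int]:
        """Return the length of each value."""


ns = ToyStringNameSpace(["x!", "y!", "z?"])
print(ns.contains("?"))  # [False, False, True]
print(list(inspect.signature(ToyStringNameSpace.contains).parameters))
# ['self', 'pattern']
```

Because the wrapping happens inside `dispatch_all` at class-creation time, nothing in the class body itself hints at the dispatch, which is the trade-off in readability raised in the review above.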

As for why PyCharm can't/doesn't follow the decorator properly and proxy the decorator's __wrapped__ property/function signature (as VSCode does) when you use @functools.wraps, I cannot say - let's just hope they eventually fix it :)

@alexander-beedie
Collaborator

FYI: slightly noisy test timing for the parametric tests triggered the previous checkin error; it's not a real error, so I just slightly increased the deadline for those (I added/wrote all the parametric testing stuff, so this is ok ;)

@stinodego
Member Author

stinodego commented Aug 24, 2022

I think this PR is good to go now!

Very curious what @ritchie46 thinks of all this!

@ritchie46
Member

ritchie46 commented Aug 25, 2022

This all looks very interesting and it seems really clean to be able to dispatch this all with a decorator. Thanks for all the help on this. I do only want to be certain that we do not break autocomplete/parameter hinting in:

  • vscode
  • pycharm/jetbrains
  • jupyter notebooks/ ipython

These are the ones most used by our users and are really important for ergonomics. I am checking the notebooks and JetBrains here atm.

I checked the notebooks, that seems to work fine. 👍

from typing import TYPE_CHECKING

import polars.internals as pli

if TYPE_CHECKING:
    from polars.internals.type_aliases import CategoricalOrdering

if sys.version_info >= (3, 8):
Member

Could we do this one level higher, so that every module below can simply import Final and the sys.version check only has to be done once? I see we do this check in 9 files.

Member Author

I don't like the idea of re-exporting typing imports, but if this bothers you we can simply get rid of the Final designation on the accessor variables. Then they will just be strings. I'll add a commit for this.
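To illustrate the two options being weighed, here is a hedged sketch. The class names are hypothetical, and the body of the version check is an assumption, since the diff excerpt above is cut off after the `if` line.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Final
else:
    # typing_extensions is a third-party backport, only needed on Python 3.7.
    from typing_extensions import Final


class ToyStringNameSpace:
    # Final tells type checkers that the accessor name must not be reassigned.
    _accessor: Final = "str"


class ToyPlainNameSpace:
    # Without Final, the accessor is just an ordinary string attribute and
    # no version-dependent import is needed.
    _accessor = "str"


print(ToyStringNameSpace._accessor, ToyPlainNameSpace._accessor)  # str str
```

At runtime both variants behave identically; `Final` only affects static type checking.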

Comment on lines +82 to +90
@wraps(func)  # type: ignore[arg-type]
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series:
    s = pli.wrap_s(self._s)
    expr = pli.col(s.name)
    namespace = getattr(self, "_accessor", None)
    if namespace is not None:
        expr = getattr(expr, namespace)
    f = getattr(expr, func.__name__)
    return s.to_frame().select(f(*args, **kwargs)).to_series()
Contributor

@JakobGM JakobGM Aug 25, 2022

This could possibly be altered in order to preserve the class of self in the case of end-users sub-classing Series. The main change is to use self._from_pyseries rather than pli.wrap_s and wrap the final series produced by DataFrame.to_series with self._from_pyseries(X._s).

Suggested change
- @wraps(func)  # type: ignore[arg-type]
- def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series:
-     s = pli.wrap_s(self._s)
-     expr = pli.col(s.name)
-     namespace = getattr(self, "_accessor", None)
-     if namespace is not None:
-         expr = getattr(expr, namespace)
-     f = getattr(expr, func.__name__)
-     return s.to_frame().select(f(*args, **kwargs)).to_series()
+ @wraps(func)  # type: ignore[arg-type]
+ def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series:
+     s = self._from_pyseries(self._s)
+     expr = pli.col(s.name)
+     namespace = getattr(self, "_accessor", None)
+     if namespace is not None:
+         expr = getattr(expr, namespace)
+     f = getattr(expr, func.__name__)
+     return self._from_pyseries(
+         s.to_frame().select(f(*args, **kwargs)).to_series()._s
+     )

In that case perhaps wrapper's return type could (should?) be annotated with typing.Self (python 3.11+) or typing_extensions.Self (python <= 3.10). What do you think?
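As a toy illustration of the subclass-preservation idea (not Polars code), the sketch below contrasts a method that hard-codes the base class with one that rebuilds through the instance's own class. `ToySeries` and `_from_values` are hypothetical; `_from_values` only plays the role that `_from_pyseries` plays in the suggestion, and the bound `TypeVar` is one common way to annotate this where `Self` is not available.

```python
from typing import List, Type, TypeVar

S = TypeVar("S", bound="ToySeries")


class ToySeries:
    def __init__(self, values: List[int]) -> None:
        self.values = values

    @classmethod
    def _from_values(cls: Type[S], values: List[int]) -> S:
        # Hypothetical counterpart of a `_from_pyseries`-style constructor.
        return cls(values)

    def abs_plain(self) -> "ToySeries":
        # Hard-codes the base class, so subclasses are lost.
        return ToySeries([abs(v) for v in self.values])

    def abs_preserving(self: S) -> S:
        # Rebuilds through the instance's own class, so subclasses survive.
        return self._from_values([abs(v) for v in self.values])


class XSeries(ToySeries):
    """Custom Series class"""


s = XSeries([1, -2, 3])
print(type(s.abs_plain()).__name__)       # ToySeries
print(type(s.abs_preserving()).__name__)  # XSeries
```

Note that this only works when `self` is the series itself; a namespace object would first need a way back to the class of the series that owns it.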

Collaborator

@alexander-beedie alexander-beedie Aug 25, 2022

Good catch! Now you've drawn my attention to it, I notice that _from_pyseries isn't actually accessible from the *NameSpace classes, resulting in the following exception:

class XSeries(pl.Series):
    """Custom Series class"""

s = XSeries("s", ["x!", "y!", "z?"])
res = s.str.contains("?", literal=True)

# AttributeError: 'StringNameSpace' object has no attribute '_from_pyseries'

Further... it doesn't actually seem like _from_pyseries works for a large number of existing (non-decorated) functions anyway? I may be misunderstanding the scope, but it seems the intent is to preserve the user-defined class through Series operations in the general case (which certainly sounds desirable).

class XSeries(pl.Series):
    """Custom Series class"""

s = XSeries("s", [1, -2, 3])

# for example, all of the following return Series instead of XSeries
s.shrink_to_fit()
s.reinterpret()
s.hash()
s.abs()

# ...and so on

Collaborator

@alexander-beedie alexander-beedie Aug 25, 2022

Just made a small Series repr PR that could help show this a bit more clearly, by including the custom series class name: #4571

Member Author

@stinodego stinodego Aug 26, 2022

In that case perhaps wrapper's return type could (should?) be annotated with typing.Self (python 3.11+) or typing_extensions.Self (python <= 3.10). What do you think?

Unfortunately, mypy doesn't support the Self type yet: python/mypy#11871

Also, the self in this wrapper function is usually a NameSpace, not a Series. And neither the namespace methods nor the Series methods have been set up to preserve type (all these methods have a return type annotation of Series).

So I propose we save this improvement for a future PR. This PR is already big enough! Maybe you could make a separate issue for this?

Collaborator

@alexander-beedie alexander-beedie Aug 26, 2022

So I propose we save this improvement for a future PR. This PR is already big enough! Maybe you could make a separate issue for this?

My thoughts exactly; it doesn't seem to work at the moment anyway, so no real advantage holding off on this PR. I don't mind volunteering for the follow-up (I've already got a reasonable idea how it can be done with minimal changes) ;)

@alexander-beedie
Collaborator

alexander-beedie commented Aug 25, 2022

This all looks very interesting and it seems really clean to be able to dispatch this all with a decorator. Thanks for all the help on this. I do only want to be certain that we do not break autocomplete/parameter hinting in:

  • vscode
  • pycharm/jetbrains
  • jupyter notebooks/ ipython

I can confirm that PyCharm/DataGrip (Jetbrains) are in good shape, along with .ipynb running under PyCharm.

@stinodego
Member Author

This all looks very interesting and it seems really clean to be able to dispatch this all with a decorator. Thanks for all the help on this. I do only want to be certain that we do not break autocomplete/parameter hinting in:

  • vscode
  • pycharm/jetbrains
  • jupyter notebooks/ ipython

I can confirm that PyCharm/DataGrip (Jetbrains) are in good shape, along with .ipynb running under PyCharm.

Everything looks good in VSCode, as far as I can tell!

@ritchie46
Member

This all looks very interesting and it seems really clean to be able to dispatch this all with a decorator. Thanks for all the help on this. I do only want to be certain that we do not break autocomplete/parameter hinting in:

  • vscode
  • pycharm/jetbrains
  • jupyter notebooks/ ipython

I can confirm that PyCharm/DataGrip (Jetbrains) are in good shape, along with .ipynb running under PyCharm.

Everything looks good in VSCode, as far as I can tell!

Alright, good to hear. Then here we go :shipit: :)

@ritchie46 ritchie46 merged commit a77665a into pola-rs:master Aug 26, 2022
@stinodego stinodego deleted the dispatch-to-expr branch August 30, 2022 18:45