-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor[python]: Dispatch Series namespace methods to Expr using a decorator #4423
Conversation
a1949cd
to
28f9575
Compare
Codecov Report
@@ Coverage Diff @@
## master #4423 +/- ##
==========================================
- Coverage 79.36% 79.35% -0.01%
==========================================
Files 494 495 +1
Lines 78075 77882 -193
==========================================
- Hits 61962 61802 -160
+ Misses 16113 16080 -33
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Now that is elegant! <3 |
I know, right! I was very pleased when I put this together 😄 I'll give some thought on how we can define the decorator nicely so that it's reusable across modules, then I'll ask for a review. |
f6dfa72
to
6ce8683
Compare
6ce8683
to
8734e4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Did a small update to use a private class variable for the namespace accessor instead of a property; like zundertj originally suggested. I think that makes more sense. |
Clever :) There is one unfortunate downside - in its current form the decorator eats all of the wrapped-function docstrings, so the nice tooltip help & inline documentation gets replaced like this: ...or... >>> help( df['strcol'].str.contains )
# Help on method wrapper in module polars.internals.series.utils:
# wrapper(*args: 'P.args', **kwargs: 'P.kwargs') -> 'pli.Series'
# method of polars.internals.series.string.StringNameSpace instance Maybe |
Good catch - let me look into that. |
@alexander-beedie @stinodego this should fix it |
You'd think so (I did), but my suspicion is that it might not, fully... I had a similar scenario at work a few weeks ago (which is why I thought the same thing might happen here), and ended up working around it by just not decorating until I could come back to it in more detail, heh. I was using [update] right, so when adding |
Or a worse job, and it's not recognizing the decorator 😄
Have you tried plugging the example from the |
Argh. Looks there is a long-term known issue in PyCharm about it not properly following |
I meant the functools doc example; not the decorator I wrote for polars. If the example from the functool docs doesn't work in Pycharm, that's a Pycharm problem, not a decorator problem. (I am not a Pycharm user so can't check) EDIT:
Right, that's what I was suspecting! So then the question becomes, how bad is it that Pycharm users don't get the docs support, and is the code quality improvement in this PR worth it? Quite unfortunate that from a sleek improvement with only pluses that we're now having to weigh downsides due to a bug in a popular editor 😞 |
The docs are mostly ok (eg: correct, aside from the generic Wondering if |
@alexander-beedie what version of PyCharm are you on? I'm seeing something different EDIT: I'm on 2022.2 |
…on dispatch ("call_expr")
a5f0f54
to
206aa2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the improvements you made, @alexander-beedie ! I rebased and added a commit to make type checking in Python 3.7 work properly.
The class decorator makes a lot of sense. I was already pondering this, and you went ahead and got it done 👍
I do worry a bit that the result feels a bit 'black magick-y'; there's some stuff in there that I haven't seen before in the 12+ years I've been programming Python 😄 that could be hard for newbies to read and be error prone. Maybe we should include a few tests specifically for this decorator? And maybe we could include a bit more explanation in the docstring.
Otherwise, I think the implementation makes a lot of sense. I left a few small comments about the _is_empty_method
check.
I am still left wondering why PyCharm accepts this form of decorator but not the direct method decorator. Did you learn more about the nitty gritty here?
The class decorator works because static analysis (mypy/editor/etc) does not know what that class decorator is going to do; so, static analysis sees the methods exactly as they are written in the code (undecorated - until runtime, when the class decorator automagically applies the method-level dispatch decorator the first time the class is imported). And then, at runtime, the additional As for why PyCharm can't/doesn't follow the decorator properly and proxy the decorator's |
…onal inline explanations relating to class decorator behaviour
FYI: slightly noisy test timing for the parametric tests triggered the previous checkin error; it's not a real error, so I just slightly increased the deadline for those (I added/wrote all the parametric testing stuff, so this is ok ;) |
I think this PR is good to go now! Very curious what @ritchie46 thinks of all this! |
This all looks very interesting and it seems really clean to be able to dispatch this all with a decorator. Thanks for all the help op this. I only do want to be certain that we do not break autocomplete/ parameter hinting in:
As these are most used by our users and are really important to ergonomics. I am checking the notebooks and jetrbains here atm. I checked the notebooks, that seems to work fine. 👍 |
from typing import TYPE_CHECKING | ||
|
||
import polars.internals as pli | ||
|
||
if TYPE_CHECKING: | ||
from polars.internals.type_aliases import CategoricalOrdering | ||
|
||
if sys.version_info >= (3, 8): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this one level higher? So that every module below import Final
and this branch with checking the sys.version
can only be done once. I see we do this check in 9 files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of re-exporting typing imports, but if this bothers you we can simply get rid of the Final
designation on the accessor variables. Then they will just be strings. I'll add a commit for this.
@wraps(func) # type: ignore[arg-type] | ||
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series: | ||
s = pli.wrap_s(self._s) | ||
expr = pli.col(s.name) | ||
namespace = getattr(self, "_accessor", None) | ||
if namespace is not None: | ||
expr = getattr(expr, namespace) | ||
f = getattr(expr, func.__name__) | ||
return s.to_frame().select(f(*args, **kwargs)).to_series() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could possibly be altered in order to preserve the class of self
in the case of end-users sub-classing Series
. The main change is to use self._from_pyseries
rather than pli.wrap_s
and wrap the final series produced by DataFrame.to_series
with self._from_pyseries(X._s)
.
@wraps(func) # type: ignore[arg-type] | |
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series: | |
s = pli.wrap_s(self._s) | |
expr = pli.col(s.name) | |
namespace = getattr(self, "_accessor", None) | |
if namespace is not None: | |
expr = getattr(expr, namespace) | |
f = getattr(expr, func.__name__) | |
return s.to_frame().select(f(*args, **kwargs)).to_series() | |
@wraps(func) # type: ignore[arg-type] | |
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> pli.Series: | |
s = self._from_pyseries(self._s) | |
expr = pli.col(s.name) | |
namespace = getattr(self, "_accessor", None) | |
if namespace is not None: | |
expr = getattr(expr, namespace) | |
f = getattr(expr, func.__name__) | |
return self._from_pyseries( | |
s.to_frame().select(f(*args, **kwargs)).to_series()._s | |
) |
In that case perhaps wrapper
's return type could (should?) be annotated with typing.Self
(python 3.11+) or typing_extensions.Self
(python <= 3.10). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Now you've drawn my attention to it, I notice that _from_pyseries
isn't actually accessible from the *NameSpace
classes, resulting in the following exception:
class XSeries(pl.Series):
"""Custom Series class"""
s = XSeries("s", ["x!", "y!", "z?"])
res = s.str.contains("?", literal=True)
# AttributeError: 'StringNameSpace' object has no attribute '_from_pyseries'
Further... it doesn't actually seem like _from_pyseries
works for a large number of existing (non-decorated) functions anyway? I may be misunderstanding the scope, but it seems the intent is to preserve the user-defined class through Series operations in the general case (which certainly sounds desirable).
class XSeries(pl.Series):
"""Custom Series class"""
s = XSeries("s", [1, -2, 3])
# for example, all of the following return Series instead of XSeries
s.shrink_to_fit()
s.reinterpret()
s.hash()
s.abs()
# ...and so on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just made a small Series repr
PR that could help show this a bit more clearly, by including the custom series class name: #4571
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case perhaps wrapper's return type could (should?) be annotated with typing.Self (python 3.11+) or typing_extensions.Self (python <= 3.10). What do you think?
Unfortunately, mypy doesn't support the Self
type yet: python/mypy#11871
Also, the self
in this wrapper function is usually a NameSpace
, not a Series
. And neither namespaces or Series methods have been set up to preserve type (all these methods have a return type annotation of Series
).
So I propose we save this improvement for a future PR. This PR is already big enough! Maybe you could make a separate issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I propose we save this improvement for a future PR. This PR is already big enough! Maybe you could make a separate issue for this?
My thoughts exactly; it doesn't seem to work at the moment anyway, so no real advantage holding off on this PR. I don't mind volunteering for the follow-up (I've already got a reasonable idea how it can be done with minimal changes) ;)
I can confirm that PyCharm/DataGrip (Jetbrains) are in good shape, along with |
Everything looks good in VSCode, as far as I can tell! |
Alright good to hear. Then hear we go |
Relates to #4422
Changes:
Series
methods to theExpr
equivalent.series.utils
to house the decorator. Moved theget_ffi_func
here as well.I like that it's now very explicit that these methods do not implement any fancy - they only dispatch to another implementation.
If you like this approach, I will try to apply this to the
Series
non-namespace methods next, and then see if I can do something similar for DataFrame/LazyFrame.