DEPR: non-keyword arguments in any #44896

yadav-sachin · 2021-12-15T11:54:31Z

precursor to #44802

What does this implement/fix? Explain your changes.
I used @overload definitions to specify that DataFrame.any should return type Series and Series.any should have return type bool.

Any other comments?
I am currently getting errors on checking with mypy, The erased type of self "pandas.core.frame.DataFrame" is not a supertype of its class "pandas.core.generic.NDFrame".

So I am currently trying to figure out how to resolve this error and whether using @overload is the correct direction to do this. Mentioning this pull request as WIP.

twoertwein · 2021-12-15T14:20:17Z

Any other comments? I am currently getting errors on checking with mypy, The erased type of self "pandas.core.frame.DataFrame" is not a supertype of its class "pandas.core.generic.NDFrame".

If I understand the error correctly: any is a method of NDArray, but your overloads do now only cover DataFrame and Series - NDFrame is not covered by the overloads (since neither of them is a super-class for NDFrame).

I think you will need a third overload that is compatible with the return type of the Series and DataFrame overload. Not sure whether this works:

    @overload
    def any(
        self: NDFrame,
        axis: Axis = ...,
        bool_only: bool_t | None = ...,
        skipna: bool_t = ...,
        level: Level | None = ...,
        **kwargs,
    ) -> Series | bool_t:
        ...

yadav-sachin · 2021-12-15T18:40:28Z

@twoertwein Thanks for the suggestion.
But the error still persists after I make the suggested changes.

pandas/core/generic.py

MarcoGorelli · 2021-12-15T19:43:31Z

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

yadav-sachin · 2021-12-15T19:58:31Z

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

Thanks @MarcoGorelli
I was working on this approach exactly, once the @overload approach didn't seem to work.

twoertwein · 2021-12-15T20:21:27Z

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

If the mypy error is a false-positive, I would prefer adding two ignore comments compared to "implementing" any in the sub-classes. This has the advantage that as soon as mypy is updated, we can remove the ignore comments. We don't get an automatic reminder if we implement any in the sub-classes.

At least in the toy example of python/mypy#11759, the return types are correctly inferred even though mypy complains.

MarcoGorelli · 2021-12-15T21:11:50Z

@twoertwein true, but there's the same construct for replace, where both Series.replace and DataFrame.replace just call super().replace, if I recall correctly

This has the advantage that as soon as mypy is updated, we can remove the ignore comments.

Sure, but even then, what's the advantage of doing this via overloads rather than by overriding any in DataFrame and Series?

Even without the mypy false-positive, using overloads requires adding 3 overloads, while overriding in the child classes only requires 2 extra function definitions.

In terms of lines of code which need adding, overriding any adds fewer

twoertwein · 2021-12-15T21:23:40Z

Thank you @MarcoGorelli , I agree super().any(...) is the better solution. I was hoping that typing should ideally not change the implementation.

…_type

MarcoGorelli · 2021-12-16T12:55:11Z

Nice, thanks! Could you also post screenshots of the docs for Series.any and DataFrame.any?

You can use --single for this, see https://pandas.pydata.org/docs/development/contributing_documentation.html#building-the-documentation

# compile the reference docs for a single function
python make.py clean
python make.py --single pandas.DataFrame.join

yadav-sachin · 2021-12-16T13:51:37Z

DataFrame.any

Series.any

twoertwein · 2021-12-16T13:56:26Z

There is a mismatch between the doc-string and the type annotations: The type annotations say that DataFrame.any returns a Series (doc: Series or DataFrame) and Seriesn.any returns a bool (doc: Series or scalar).

MarcoGorelli · 2021-12-16T14:14:57Z

There is a mismatch between the doc-string and the type annotations: The type annotations say that DataFrame.any returns a Series (doc: Series or DataFrame) and Seriesn.any returns a bool (doc: Series or scalar).

Yes, you're right!

So, the this needs some overloads too, to indicate that the return type depends on whether level was specified

yadav-sachin · 2021-12-17T07:41:20Z

Overloaded function signatures 1 and 2 overlap with incompatible return types
What could be a way to define these overload definitions without overlapping signatures as per our requirement?

MarcoGorelli · 2021-12-17T10:13:20Z

ah, it's because level takes default values, and can be passed both positionally and as keyword, and it has default arguments which come before it. This makes overloading complicated, check https://medium.com/analytics-vidhya/making-sense-of-typing-overload-437e6deecade

So, I'd suggest:

deprecate non-keyword arguments in any, see https://github.com/pandas-dev/pandas/pull/41523/files as an example
for now, the return types can be DataFrame | Series and Series | bool
in a follow-up PR (or, in the next version of pandas, once non-keyword arguments in any have been deprecated), overload the return types depending on the value of level

This has been a bit more challenging that I originally thought, but I hope you find it worthwhile and learn a few new things!

twoertwein · 2021-12-17T13:27:12Z

Level is defined as Union[Hashable, int] but None is also hashable, so they actually overlap.

yadav-sachin · 2021-12-18T10:13:46Z

@MarcoGorelli
Do I also need to add a what's new note and test?

MarcoGorelli

Nice work! And yes, this needs a whatsnew note (and ideally, a couple of little tests too)

MarcoGorelli · 2021-12-18T10:31:55Z

pandas/core/generic.py

-    ) -> Series | bool_t:
+    ):


Can you keep the return type here? -> DataFrame | Series | bool_t?

Also, as this is only used internally, we can make its arguments keyword-only right away without warning

By making arguments keyword-only, do you mean placing arguments after ,*,? (context: https://medium.com/analytics-vidhya/keyword-only-arguments-in-python-3c1c00051720)

yadav-sachin · 2022-04-06T09:55:49Z

What needs to be changed for the failing checks?

MarcoGorelli · 2022-04-06T10:05:20Z

What needs to be changed for the failing checks?

You'll need to update those tests such that they don't throw a warning

=================================== FAILURES ===================================
__________________ TestDataFrameAnalytics.test_any_all_extra ___________________
[gw1] linux -- Python 3.10.4 /usr/share/miniconda/envs/pandas-dev/bin/python

self = <pandas.tests.frame.test_reductions.TestDataFrameAnalytics object at 0x7feb4634aa40>

    def test_any_all_extra(self):
        df = DataFrame(
            {
                "A": [True, False, False],
                "B": [True, True, False],
                "C": [True, True, True],
            },
            index=["a", "b", "c"],
        )
>       result = df[["A", "B"]].any(1)

pandas/tests/frame/test_reductions.py:1058: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (       A      B
a   True   True
b  False   True
c  False  False, 1)
kwargs = {}, arguments = ''

    @wraps(func)
    def wrapper(*args, **kwargs):
        arguments = _format_argument_list(allow_args)
        if len(args) > num_allow_args:
>           warnings.warn(
                msg.format(arguments=arguments),
                FutureWarning,
                stacklevel=stacklevel,
            )
E           FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.

pandas/util/_decorators.py:[31](https://github.com/pandas-dev/pandas/runs/5830982510?check_suite_focus=true#step:11:31)2: FutureWarning

So, change result = df[["A", "B"]].any(1) to result = df[["A", "B"]].any(axis=1), and likewise for the others

MarcoGorelli

What needs to be changed for the failing checks?

You'll need to update those tests such that they don't throw a warning

=================================== FAILURES ===================================
__________________ TestDataFrameAnalytics.test_any_all_extra ___________________
[gw1] linux -- Python 3.10.4 /usr/share/miniconda/envs/pandas-dev/bin/python

self = <pandas.tests.frame.test_reductions.TestDataFrameAnalytics object at 0x7feb4634aa40>

    def test_any_all_extra(self):
        df = DataFrame(
            {
                "A": [True, False, False],
                "B": [True, True, False],
                "C": [True, True, True],
            },
            index=["a", "b", "c"],
        )
>       result = df[["A", "B"]].any(1)

pandas/tests/frame/test_reductions.py:1058: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (       A      B
a   True   True
b  False   True
c  False  False, 1)
kwargs = {}, arguments = ''

    @wraps(func)
    def wrapper(*args, **kwargs):
        arguments = _format_argument_list(allow_args)
        if len(args) > num_allow_args:
>           warnings.warn(
                msg.format(arguments=arguments),
                FutureWarning,
                stacklevel=stacklevel,
            )
E           FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.

pandas/util/_decorators.py:[31](https://github.com/pandas-dev/pandas/runs/5830982510?check_suite_focus=true#step:11:31)2: FutureWarning

So, change result = df[["A", "B"]].any(1) to result = df[["A", "B"]].any(axis=1), and likewise for the others

Looks like there's also a warning in the docs that needs updating:

WARNING: 
>>>-------------------------------------------------------------------------
looking for now-outdated files... none found
Warning in /home/runner/work/pandas/pandas/doc/source/whatsnew/v0.13.0.rst at block ending on line 682
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
<ipython-input-92-c0a7f4d519f6>:1: FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.
  dfi[mask.any(1)]
<<<-------------------------------------------------------------------------

MarcoGorelli

you'll also need to change the .any usage in https://github.com/pandas-dev/pandas/blob/361021b56f3159afb71d690fac3a1f3b381b0da6/doc/source/whatsnew/v0.13.0.rst

MarcoGorelli

still got some failing tests

https://github.com/pandas-dev/pandas/runs/5869362121?check_suite_focus=true

yadav-sachin · 2022-04-08T09:28:28Z

What is the tool used in the tests for docstring validation?

MarcoGorelli · 2022-04-08T09:29:27Z

What is the tool used in the tests for docstring validation?

numpydoc

There was a failure on the main branch but it's been fixed now (see #46690)

yadav-sachin · 2022-04-08T10:29:22Z

I don't know how to solve the current failing checks.
Some of them show FileNotFoundError. Could the given paths be wrong 🤔 ? (in doc/source/user_guide/io.rst)

MarcoGorelli · 2022-04-08T10:33:06Z

They're unrelated, don't worry about them

… any_return_type

jreback · 2022-04-10T19:28:30Z

pandas/core/generic.py

+        @deprecate_nonkeyword_arguments(
+            version=None,
+            allowed_args=["self"],
+            stacklevel=find_stack_level() - 1,


ok i c, yeah we out to fix this but ok for here

jreback · 2022-04-10T19:29:09Z

lgm @MarcoGorelli over to you

MarcoGorelli

yadav-sachin · 2022-04-10T19:50:43Z

Thanks @MarcoGorelli @twoertwein @jreback for the reviews. 😄

* tighten return type in any * correct overload definitions * add overload def for NDFrame input * remove overload defs and define function any in sub-classes * add overload defs for level * correct default val in overload defs * deprecate non-keyword args * add whatsnew note * modify return types and add tests * move non-keyword deprecation to generic * correct deprecation decorators * remove imports in test assertions * place deprecate_nonkeyword at correct place * remove changes from frame.py, series.py * readd changes in frame, series without actual implementations * place deprecate_nonkeyword at other place * add name argument to deprecate_non_keyword_args decorator * add test for name in deprecate_nonkeyword_args * remove changes from frame.py, series.py * correct stacklevel in warning * correct stacklevel * set stacklevel to default * move deprecation message to whatsnew v1.5.0.rst * add name parameter in deprecate_non_keyword_args docstring * correct whitespace in deprecate_nonkeyword_args docstring * update any non-keyword args in other tests * update any in doc * update remaining any() calls in pandas/core * correct docstring of isocalendar in pandas/core/indexes/accessors.py Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>

From pandas 2.0 any only accepts keyworkd arguments ref pandas-dev/pandas#44896

* Transform positional argument into keyword argument From pandas 2.0 any only accepts keyworkd arguments ref pandas-dev/pandas#44896 * Change how reciprocal is computed I have not fully understood why this solve the problem, but splitting the operation in 2 lines does not seem to work * Catch warnings from pandas.to_datetime Now pandas.to_datetime raises a warning when the column cannot be converted * check_dtype=False in tests datetime features Pandas dataframes created from python integers are created with int column types `int64` but the operation tested returns `int32` which caused issues * Use droplevel before merging Merging dfs with different column lelvels has been disallowed ref pandas-dev/pandas#34862 * Change expected values for months I am not sure why this caused an issue, maybe due to type casting? * run black * run black on tests * isort _variable_type_checks.py * Fix datetime_subtraction --------- Co-authored-by: Claudio Salvatore Arcidiacono <claudio.arcidiacono@mollie.com>

yadav-sachin added 2 commits December 15, 2021 16:55

tighten return type in any

59bd617

correct overload definitions

389cb03

twoertwein added the Typing type annotations, mypy/pyright type checking label Dec 15, 2021

add overload def for NDFrame input

076efc0

twoertwein reviewed Dec 15, 2021

View reviewed changes

pandas/core/generic.py Outdated Show resolved Hide resolved

yadav-sachin added 2 commits December 16, 2021 17:32

remove overload defs and define function any in sub-classes

7f643d6

Merge branch 'master' of github.com:pandas-dev/pandas into any_return…

d6df993

…_type

yadav-sachin changed the title ~~[WIP]: TYP tighten return type in function any~~ [MRG]: TYP tighten return type in function any Dec 16, 2021

yadav-sachin changed the title ~~[MRG]: TYP tighten return type in function any~~ TYP: tighten return type in function any Dec 16, 2021

add overload defs for level

d2653da

correct default val in overload defs

0f15b62

deprecate non-keyword args

831481c

yadav-sachin force-pushed the any_return_type branch from f5412ed to 831481c Compare December 18, 2021 10:11

MarcoGorelli requested changes Dec 18, 2021

View reviewed changes

add whatsnew note

284efd3

MarcoGorelli requested changes Apr 6, 2022

View reviewed changes

update any non-keyword args in other tests

b4b3a1f

MarcoGorelli requested changes Apr 7, 2022

View reviewed changes

update any in doc

11917aa

MarcoGorelli requested changes Apr 7, 2022

View reviewed changes

yadav-sachin added 3 commits April 8, 2022 00:55

update remaining any() calls in pandas/core

35c71b6

correct docstring of isocalendar in pandas/core/indexes/accessors.py

0e6d5eb

Merge remote-tracking branch 'upstream/main' into any_return_type

025c493

MarcoGorelli and others added 3 commits April 8, 2022 14:40

Merge branch 'main' into any_return_type

59ac830

Merge remote-tracking branch 'upstream/main' into any_return_type

d9d14e9

Merge branch 'any_return_type' of github.com:yadav-sachin/pandas into…

15bbc86

… any_return_type

MarcoGorelli self-requested a review April 9, 2022 15:33

Merge remote-tracking branch 'upstream/main' into any_return_type

a8dd741

jreback mentioned this pull request Apr 10, 2022

DEPR: log of deprecations in 1.x (to be removed in 2.0) #30228

Closed

jreback approved these changes Apr 10, 2022

View reviewed changes

MarcoGorelli approved these changes Apr 10, 2022

View reviewed changes

MarcoGorelli merged commit a59582d into pandas-dev:main Apr 10, 2022

yadav-sachin deleted the any_return_type branch April 10, 2022 19:51

MarcoGorelli mentioned this pull request Sep 20, 2022

REGR: df.apply('any', axis=1) fails with ValueError #48656

Closed

3 tasks

ClaudioSalvatoreArcidiacono pushed a commit to ClaudioSalvatoreArcidiacono/feature_engine that referenced this pull request Apr 18, 2023

Transform positional argument into keyword argument

2601aad

From pandas 2.0 any only accepts keyworkd arguments ref pandas-dev/pandas#44896

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPR: non-keyword arguments in any #44896

DEPR: non-keyword arguments in any #44896

yadav-sachin commented Dec 15, 2021 •

edited by MarcoGorelli

twoertwein commented Dec 15, 2021

yadav-sachin commented Dec 15, 2021

MarcoGorelli commented Dec 15, 2021

yadav-sachin commented Dec 15, 2021

twoertwein commented Dec 15, 2021

MarcoGorelli commented Dec 15, 2021

twoertwein commented Dec 15, 2021

MarcoGorelli commented Dec 16, 2021

yadav-sachin commented Dec 16, 2021

twoertwein commented Dec 16, 2021

MarcoGorelli commented Dec 16, 2021

yadav-sachin commented Dec 17, 2021

MarcoGorelli commented Dec 17, 2021

twoertwein commented Dec 17, 2021

yadav-sachin commented Dec 18, 2021 •

edited

MarcoGorelli left a comment

MarcoGorelli Dec 18, 2021

yadav-sachin Dec 21, 2021 •

edited

yadav-sachin commented Apr 6, 2022

MarcoGorelli commented Apr 6, 2022

MarcoGorelli left a comment •

edited

MarcoGorelli left a comment

MarcoGorelli left a comment

yadav-sachin commented Apr 8, 2022

MarcoGorelli commented Apr 8, 2022

yadav-sachin commented Apr 8, 2022

MarcoGorelli commented Apr 8, 2022

jreback Apr 10, 2022

jreback commented Apr 10, 2022

MarcoGorelli left a comment

yadav-sachin commented Apr 10, 2022

DEPR: non-keyword arguments in any #44896

DEPR: non-keyword arguments in any #44896

Conversation

yadav-sachin commented Dec 15, 2021 • edited by MarcoGorelli

twoertwein commented Dec 15, 2021

yadav-sachin commented Dec 15, 2021

MarcoGorelli commented Dec 15, 2021

yadav-sachin commented Dec 15, 2021

twoertwein commented Dec 15, 2021

MarcoGorelli commented Dec 15, 2021

twoertwein commented Dec 15, 2021

MarcoGorelli commented Dec 16, 2021

yadav-sachin commented Dec 16, 2021

twoertwein commented Dec 16, 2021

MarcoGorelli commented Dec 16, 2021

yadav-sachin commented Dec 17, 2021

MarcoGorelli commented Dec 17, 2021

twoertwein commented Dec 17, 2021

yadav-sachin commented Dec 18, 2021 • edited

MarcoGorelli left a comment

Choose a reason for hiding this comment

MarcoGorelli Dec 18, 2021

Choose a reason for hiding this comment

yadav-sachin Dec 21, 2021 • edited

Choose a reason for hiding this comment

yadav-sachin commented Apr 6, 2022

MarcoGorelli commented Apr 6, 2022

MarcoGorelli left a comment • edited

Choose a reason for hiding this comment

MarcoGorelli left a comment

Choose a reason for hiding this comment

MarcoGorelli left a comment

Choose a reason for hiding this comment

yadav-sachin commented Apr 8, 2022

MarcoGorelli commented Apr 8, 2022

yadav-sachin commented Apr 8, 2022

MarcoGorelli commented Apr 8, 2022

jreback Apr 10, 2022

Choose a reason for hiding this comment

jreback commented Apr 10, 2022

MarcoGorelli left a comment

Choose a reason for hiding this comment

yadav-sachin commented Apr 10, 2022

yadav-sachin commented Dec 15, 2021 •

edited by MarcoGorelli

yadav-sachin commented Dec 18, 2021 •

edited

yadav-sachin Dec 21, 2021 •

edited

MarcoGorelli left a comment •

edited