Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: non-keyword arguments in any #44896

Merged
merged 42 commits into from Apr 10, 2022

Conversation

yadav-sachin
Copy link
Contributor

@yadav-sachin yadav-sachin commented Dec 15, 2021

precursor to #44802

What does this implement/fix? Explain your changes.
I used @overload definitions to specify that DataFrame.any should return type Series and Series.any should have return type bool.

Any other comments?
I am currently getting errors on checking with mypy, The erased type of self "pandas.core.frame.DataFrame" is not a supertype of its class "pandas.core.generic.NDFrame".

So I am currently trying to figure out how to resolve this error and whether using @overload is the correct direction to do this. Mentioning this pull request as WIP.

@twoertwein
Copy link
Member

Any other comments? I am currently getting errors on checking with mypy, The erased type of self "pandas.core.frame.DataFrame" is not a supertype of its class "pandas.core.generic.NDFrame".

If I understand the error correctly: any is a method of NDArray, but your overloads do now only cover DataFrame and Series - NDFrame is not covered by the overloads (since neither of them is a super-class for NDFrame).

I think you will need a third overload that is compatible with the return type of the Series and DataFrame overload. Not sure whether this works:

    @overload
    def any(
        self: NDFrame,
        axis: Axis = ...,
        bool_only: bool_t | None = ...,
        skipna: bool_t = ...,
        level: Level | None = ...,
        **kwargs,
    ) -> Series | bool_t:
        ...

@twoertwein twoertwein added the Typing type annotations, mypy/pyright type checking label Dec 15, 2021
@yadav-sachin
Copy link
Contributor Author

@twoertwein Thanks for the suggestion.
But the error still persists after I make the suggested changes.

pandas/core/generic.py Outdated Show resolved Hide resolved
@MarcoGorelli
Copy link
Member

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

@yadav-sachin
Copy link
Contributor Author

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

Thanks @MarcoGorelli
I was working on this approach exactly, once the @overload approach didn't seem to work.

@twoertwein
Copy link
Member

Why not define any in both pandas/core/frame.py and pandas/core/series.py, in which they each have the correct return type and call super().any?

If the mypy error is a false-positive, I would prefer adding two ignore comments compared to "implementing" any in the sub-classes. This has the advantage that as soon as mypy is updated, we can remove the ignore comments. We don't get an automatic reminder if we implement any in the sub-classes.

At least in the toy example of python/mypy#11759, the return types are correctly inferred even though mypy complains.

@MarcoGorelli
Copy link
Member

@twoertwein true, but there's the same construct for replace, where both Series.replace and DataFrame.replace just call super().replace, if I recall correctly

This has the advantage that as soon as mypy is updated, we can remove the ignore comments.

Sure, but even then, what's the advantage of doing this via overloads rather than by overriding any in DataFrame and Series?

Even without the mypy false-positive, using overloads requires adding 3 overloads, while overriding in the child classes only requires 2 extra function definitions.

In terms of lines of code which need adding, overriding any adds fewer

@twoertwein
Copy link
Member

Thank you @MarcoGorelli , I agree super().any(...) is the better solution. I was hoping that typing should ideally not change the implementation.

@yadav-sachin yadav-sachin changed the title [WIP]: TYP tighten return type in function any [MRG]: TYP tighten return type in function any Dec 16, 2021
@yadav-sachin yadav-sachin changed the title [MRG]: TYP tighten return type in function any TYP: tighten return type in function any Dec 16, 2021
@MarcoGorelli
Copy link
Member

Nice, thanks! Could you also post screenshots of the docs for Series.any and DataFrame.any?

You can use --single for this, see https://pandas.pydata.org/docs/development/contributing_documentation.html#building-the-documentation

# compile the reference docs for a single function
python make.py clean
python make.py --single pandas.DataFrame.join

@yadav-sachin
Copy link
Contributor Author

DataFrame.any
image

Series.any
image

@twoertwein
Copy link
Member

There is a mismatch between the doc-string and the type annotations: The type annotations say that DataFrame.any returns a Series (doc: Series or DataFrame) and Seriesn.any returns a bool (doc: Series or scalar).

@MarcoGorelli
Copy link
Member

There is a mismatch between the doc-string and the type annotations: The type annotations say that DataFrame.any returns a Series (doc: Series or DataFrame) and Seriesn.any returns a bool (doc: Series or scalar).

Yes, you're right!

So, the this needs some overloads too, to indicate that the return type depends on whether level was specified

@yadav-sachin
Copy link
Contributor Author

Overloaded function signatures 1 and 2 overlap with incompatible return types
What could be a way to define these overload definitions without overlapping signatures as per our requirement?

@MarcoGorelli
Copy link
Member

ah, it's because level takes default values, and can be passed both positionally and as keyword, and it has default arguments which come before it. This makes overloading complicated, check https://medium.com/analytics-vidhya/making-sense-of-typing-overload-437e6deecade

So, I'd suggest:

  1. deprecate non-keyword arguments in any, see https://github.com/pandas-dev/pandas/pull/41523/files as an example
  2. for now, the return types can be DataFrame | Series and Series | bool
  3. in a follow-up PR (or, in the next version of pandas, once non-keyword arguments in any have been deprecated), overload the return types depending on the value of level

This has been a bit more challenging that I originally thought, but I hope you find it worthwhile and learn a few new things!

@twoertwein
Copy link
Member

Level is defined as Union[Hashable, int] but None is also hashable, so they actually overlap.

@yadav-sachin
Copy link
Contributor Author

yadav-sachin commented Dec 18, 2021

@MarcoGorelli
Do I also need to add a what's new note and test?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! And yes, this needs a whatsnew note (and ideally, a couple of little tests too)

) -> Series | bool_t:
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep the return type here? -> DataFrame | Series | bool_t?

Also, as this is only used internally, we can make its arguments keyword-only right away without warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By making arguments keyword-only, do you mean placing arguments after ,*,? (context: https://medium.com/analytics-vidhya/keyword-only-arguments-in-python-3c1c00051720)

@yadav-sachin
Copy link
Contributor Author

What needs to be changed for the failing checks?

@MarcoGorelli
Copy link
Member

What needs to be changed for the failing checks?

You'll need to update those tests such that they don't throw a warning

=================================== FAILURES ===================================
__________________ TestDataFrameAnalytics.test_any_all_extra ___________________
[gw1] linux -- Python 3.10.4 /usr/share/miniconda/envs/pandas-dev/bin/python

self = <pandas.tests.frame.test_reductions.TestDataFrameAnalytics object at 0x7feb4634aa40>

    def test_any_all_extra(self):
        df = DataFrame(
            {
                "A": [True, False, False],
                "B": [True, True, False],
                "C": [True, True, True],
            },
            index=["a", "b", "c"],
        )
>       result = df[["A", "B"]].any(1)

pandas/tests/frame/test_reductions.py:1058: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (       A      B
a   True   True
b  False   True
c  False  False, 1)
kwargs = {}, arguments = ''

    @wraps(func)
    def wrapper(*args, **kwargs):
        arguments = _format_argument_list(allow_args)
        if len(args) > num_allow_args:
>           warnings.warn(
                msg.format(arguments=arguments),
                FutureWarning,
                stacklevel=stacklevel,
            )
E           FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.

pandas/util/_decorators.py:[31](https://github.com/pandas-dev/pandas/runs/5830982510?check_suite_focus=true#step:11:31)2: FutureWarning

So, change result = df[["A", "B"]].any(1) to result = df[["A", "B"]].any(axis=1), and likewise for the others

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What needs to be changed for the failing checks?

You'll need to update those tests such that they don't throw a warning

=================================== FAILURES ===================================
__________________ TestDataFrameAnalytics.test_any_all_extra ___________________
[gw1] linux -- Python 3.10.4 /usr/share/miniconda/envs/pandas-dev/bin/python

self = <pandas.tests.frame.test_reductions.TestDataFrameAnalytics object at 0x7feb4634aa40>

    def test_any_all_extra(self):
        df = DataFrame(
            {
                "A": [True, False, False],
                "B": [True, True, False],
                "C": [True, True, True],
            },
            index=["a", "b", "c"],
        )
>       result = df[["A", "B"]].any(1)

pandas/tests/frame/test_reductions.py:1058: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (       A      B
a   True   True
b  False   True
c  False  False, 1)
kwargs = {}, arguments = ''

    @wraps(func)
    def wrapper(*args, **kwargs):
        arguments = _format_argument_list(allow_args)
        if len(args) > num_allow_args:
>           warnings.warn(
                msg.format(arguments=arguments),
                FutureWarning,
                stacklevel=stacklevel,
            )
E           FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.

pandas/util/_decorators.py:[31](https://github.com/pandas-dev/pandas/runs/5830982510?check_suite_focus=true#step:11:31)2: FutureWarning

So, change result = df[["A", "B"]].any(1) to result = df[["A", "B"]].any(axis=1), and likewise for the others


Looks like there's also a warning in the docs that needs updating:

WARNING: 
>>>-------------------------------------------------------------------------
looking for now-outdated files... none found
Warning in /home/runner/work/pandas/pandas/doc/source/whatsnew/v0.13.0.rst at block ending on line 682
Specify :okwarning: as an option in the ipython:: block to suppress this message
----------------------------------------------------------------------------
<ipython-input-92-c0a7f4d519f6>:1: FutureWarning: In a future version of pandas all arguments of DataFrame.any and Series.any will be keyword-only.
  dfi[mask.any(1)]
<<<-------------------------------------------------------------------------

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yadav-sachin
Copy link
Contributor Author

What is the tool used in the tests for docstring validation?

@MarcoGorelli
Copy link
Member

What is the tool used in the tests for docstring validation?

numpydoc

There was a failure on the main branch but it's been fixed now (see #46690)

@yadav-sachin
Copy link
Contributor Author

I don't know how to solve the current failing checks.
Some of them show FileNotFoundError. Could the given paths be wrong 🤔 ? (in doc/source/user_guide/io.rst)

@MarcoGorelli
Copy link
Member

They're unrelated, don't worry about them

@MarcoGorelli MarcoGorelli self-requested a review April 9, 2022 15:33
@deprecate_nonkeyword_arguments(
version=None,
allowed_args=["self"],
stacklevel=find_stack_level() - 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i c, yeah we out to fix this but ok for here

@jreback
Copy link
Contributor

jreback commented Apr 10, 2022

lgm @MarcoGorelli over to you

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli MarcoGorelli merged commit a59582d into pandas-dev:main Apr 10, 2022
@yadav-sachin
Copy link
Contributor Author

Thanks @MarcoGorelli @twoertwein @jreback for the reviews. 😄

@yadav-sachin yadav-sachin deleted the any_return_type branch April 10, 2022 19:51
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* tighten return type in any

* correct overload definitions

* add overload def for NDFrame input

* remove overload defs and define function any in sub-classes

* add overload defs for level

* correct default val in overload defs

* deprecate non-keyword args

* add whatsnew note

* modify return types and add tests

* move non-keyword deprecation to generic

* correct deprecation decorators

* remove imports in test assertions

* place deprecate_nonkeyword at correct place

* remove changes from frame.py, series.py

* readd changes in frame, series without actual implementations

* place deprecate_nonkeyword at other place

* add name argument to deprecate_non_keyword_args decorator

* add test for name in deprecate_nonkeyword_args

* remove changes from frame.py, series.py

* correct stacklevel in warning

* correct stacklevel

* set stacklevel to default

* move deprecation message to whatsnew v1.5.0.rst

* add name parameter in deprecate_non_keyword_args docstring

* correct whitespace in deprecate_nonkeyword_args docstring

* update any non-keyword args in other tests

* update any in doc

* update remaining any() calls in pandas/core

* correct docstring of isocalendar in pandas/core/indexes/accessors.py

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
ClaudioSalvatoreArcidiacono pushed a commit to ClaudioSalvatoreArcidiacono/feature_engine that referenced this pull request Apr 18, 2023
From pandas 2.0 any only accepts keyworkd arguments
ref pandas-dev/pandas#44896
solegalli pushed a commit to feature-engine/feature_engine that referenced this pull request Apr 22, 2023
* Transform positional argument into keyword argument

From pandas 2.0 any only accepts keyworkd arguments
ref pandas-dev/pandas#44896

* Change how reciprocal is computed

I have not fully understood why this solve the problem, but splitting
the operation in 2 lines does not seem to work

* Catch warnings from pandas.to_datetime

Now pandas.to_datetime raises a warning when the column cannot be converted

* check_dtype=False in tests datetime features

Pandas dataframes created from python integers are created with int
column types `int64` but the operation tested returns `int32` which
caused issues

* Use droplevel before merging

Merging dfs with different column lelvels has been disallowed
ref pandas-dev/pandas#34862

* Change expected values for months

I am not sure why this caused an issue, maybe due to type casting?

* run black

* run black on tests

* isort _variable_type_checks.py

* Fix datetime_subtraction

---------

Co-authored-by: Claudio Salvatore Arcidiacono <claudio.arcidiacono@mollie.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants