Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: Overload concat #41184

Merged
merged 15 commits into from
Aug 22, 2021
Merged

Conversation

pckSF
Copy link
Contributor

@pckSF pckSF commented Apr 27, 2021

mypy Setup and Sutput

# t.py
import pandas as pd
from pandas._typing import Axis
from pandas.core.generic import NDFrame

axis: Axis

ndf1: NDFrame
ndf2: NDFrame

s1 = pd.Series
s2 = pd.Series

df1 = pd.DataFrame
df2 = pd.DataFrame

reveal_type(pd.concat([s1, s2]))
reveal_type(pd.concat([s1, s2], axis=0))
reveal_type(pd.concat([s1, s2], axis=1))
reveal_type(pd.concat([s1, s2], axis="index"))
reveal_type(pd.concat([s1, s2], axis="columns"))

reveal_type(pd.concat([df1, df2]))
reveal_type(pd.concat([df1, df2], axis=0))
reveal_type(pd.concat([df1, df2], axis=1))
reveal_type(pd.concat([df1, df2], axis="index"))
reveal_type(pd.concat([df1, df2], axis="columns"))

reveal_type(pd.concat([ndf1, ndf2]))
reveal_type(pd.concat([ndf1, ndf2], axis=0))
reveal_type(pd.concat([ndf1, ndf2], axis=1))
reveal_type(pd.concat([ndf1, ndf2], axis="index"))
reveal_type(pd.concat([ndf1, ndf2], axis="columns"))

which results in the following mypy output:

$ mypy t.py
  t.py:16: note: Revealed type is 'pandas.core.series.Series'
  t.py:17: note: Revealed type is 'pandas.core.series.Series'
  t.py:18: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:19: note: Revealed type is 'pandas.core.series.Series'
  t.py:20: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:22: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:23: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:24: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:25: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:26: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:28: note: Revealed type is 'Union[pandas.core.frame.DataFrame, pandas.core.series.Series]'
  t.py:29: note: Revealed type is 'Union[pandas.core.frame.DataFrame, pandas.core.series.Series]'
  t.py:30: note: Revealed type is 'pandas.core.frame.DataFrame'
  t.py:31: note: Revealed type is 'Union[pandas.core.frame.DataFrame, pandas.core.series.Series]'
  t.py:32: note: Revealed type is 'pandas.core.frame.DataFrame'

@pckSF
Copy link
Contributor Author

pckSF commented Apr 27, 2021

The only issue left seems to be

$ mypy t.py
...
  pandas/core/reshape/concat.py:58: error: Overloaded function signatures 1 and 3 overlap with incompatible return types  [misc]
...

but signature 1 has objs: Iterable[DataFrame] | Mapping[Hashable, DataFrame], while signature 3 has objs: Iterable[Series] | Mapping[Hashable, Series], so it should not overlap?

I played around with it but could not get the desired mypy t.py output without also encountering this error :(

@MarcoGorelli
Copy link
Member

Thanks for looking into this - I also wouldn't have expected them to overlap, so I'm not sure why this is happening, but I'll look into it later

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Apr 28, 2021
@pckSF
Copy link
Contributor Author

pckSF commented May 1, 2021

Updated overloads as I just realized that a passed objs of type NDFrame should also return a DataFrame if axis = Literal[1, "columns"] and only else a Union[pandas.core.frame.DataFrame, pandas.core.series.Series].

@MarcoGorelli MarcoGorelli self-requested a review May 3, 2021 10:17
@MarcoGorelli
Copy link
Member

I can't make sense of the mypy error you're getting:

from __future__ import annotations
from typing import overload, Iterable, Mapping, Hashable

@overload
def foo(myarg: Iterable[str] | Mapping[Hashable, str]) -> str: ...

@overload
def foo(myarg: Iterable[bool] | Mapping[Hashable, bool]) -> bool: ...

def foo(myarg) -> str | bool: ...
$ mypy t.py
t.py:5: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [misc]
Found 1 error in 1 file (checked 1 source file)

I'll ask on StackOverflow, then open an issue in mypy if I don't get a response - will get back to you as soon as I have an answer!

@MarcoGorelli
Copy link
Member

MarcoGorelli commented May 3, 2021

OK, I see the issue: it's because Mapping[TypeA, TypeB] is an Iterable[TypeA]. E.g. the following type checks fine:

from typing import Iterable


a: Iterable[str]
a = {'foo': 0}

So...gonna have to think about what to do about this


e.g.

from typing import Hashable
from pandas import DataFrame


a: Hashable
a = DataFrame()

type checks fine.

So, if you pass a mapping in which the keys are Series, then it could match Mapping[Hashable, DataFrame], but also Iterable[Series]

@simonjayhawkins simonjayhawkins added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 22, 2021
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 22, 2021
@MarcoGorelli
Copy link
Member

This is pending on #41283 being merged first

@simonjayhawkins
Copy link
Member

This is pending on #41283 being merged first

#41283 is now merged.

@MarcoGorelli
Copy link
Member

Hey @pckSF - if you fetch and merge upstream/master, then I'd expect the mypy issue to be fixed

@pckSF
Copy link
Contributor Author

pckSF commented Jul 11, 2021

Hey @pckSF - if you fetch and merge upstream/master, then I'd expect the mypy issue to be fixed

Done, and the error is gone

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fixup the remaining lint issues?

mypy --version
mypy 0.910
Performing static analysis using mypy
pandas/core/reshape/reshape.py:1053: error: Redundant cast to "DataFrame"  [redundant-cast]
pandas/core/reshape/melt.py:139: error: Redundant cast to "Series"  [redundant-cast]
pandas/core/groupby/generic.py:312: error: unused "type: ignore" comment
pandas/core/groupby/generic.py:551: error: unused "type: ignore" comment
Found 4 errors in 3 files (checked 1299 source files)
Performing static analysis using mypy DONE
Error: Process completed with exit code 1.

pandas/core/reshape/concat.py Outdated Show resolved Hide resolved
@MarcoGorelli
Copy link
Member

nice - is this ready for review?

@pckSF
Copy link
Contributor Author

pckSF commented Jul 18, 2021

nice - is this ready for review?

Thanks for the reminder, I always forget that I have them in preview...

@pckSF pckSF marked this pull request as ready for review July 18, 2021 12:02
@MarcoGorelli MarcoGorelli self-requested a review July 18, 2021 12:05
@lithomas1 lithomas1 removed the Stale label Jul 18, 2021
@simonjayhawkins simonjayhawkins added this to the 1.4 milestone Jul 19, 2021
@MarcoGorelli
Copy link
Member

Nice, looks good - the typing failure is unrelated, might need to merge again when that's fixed on master

@simonjayhawkins
Copy link
Member

Nice, looks good - the typing failure is unrelated, might need to merge again when that's fixed on master

just merged #42614

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a couple of these might be unnecessary, because the return type is always the same if axis is 1/"columns"

Could you check that you check the same result by removing them? (if not I can do it later in the week)

pandas/core/reshape/concat.py Outdated Show resolved Hide resolved
pandas/core/reshape/concat.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli self-requested a review July 20, 2021 08:33
@MarcoGorelli
Copy link
Member

Performing static analysis using mypy
pandas/core/arrays/categorical.py:2314: error: Incompatible types in assignment (expression has type "List[str]", variable has type "Index")  [assignment]
pandas/core/generic.py:5752: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
pandas/core/generic.py:6115: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
Found 3 errors in 2 files (checked 1301 source files)
Performing static analysis using mypy DONE

🤔 maybe those overloads weren't redundant...I'm not sure why though, I'll look into it

@pckSF
Copy link
Contributor Author

pckSF commented Jul 20, 2021

Performing static analysis using mypy
pandas/core/arrays/categorical.py:2314: error: Incompatible types in assignment (expression has type "List[str]", variable has type "Index")  [assignment]
pandas/core/generic.py:5752: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
pandas/core/generic.py:6115: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
Found 3 errors in 2 files (checked 1301 source files)
Performing static analysis using mypy DONE

🤔 maybe those overloads weren't redundant...I'm not sure why though, I'll look into it

Question regarding pandas/core/generic.py:5752: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries") [return-value]:

    def astype(
        self: FrameOrSeries, dtype, copy: bool_t = True, errors: str = "raise"
    ) -> FrameOrSeries:
    ...
    
    ...
    result = concat(results, axis=1, copy=False)
    result.columns = self.columns
    return result

given that it uses concat along axis=1 shoudn't the returnd value be a DataFrame?
(Similar case for pandas/core/generic.py:6115: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries") [return-value])

@MarcoGorelli
Copy link
Member

If I do

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index 3b8458fac2..db6f213b3b 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -5747,7 +5747,9 @@ class NDFrame(PandasObject, indexing.IndexingMixin):
             return self.copy()
 
         # GH 19920: retain column metadata after concat
+        reveal_type(results)
         result = concat(results, axis=1, copy=False)
+        reveal_type(result)
         result.columns = self.columns
         return result

then I get

$ mypy pandas
pandas/core/arrays/categorical.py:2318: error: Incompatible types in assignment (expression has type "List[str]", variable has type "Index")  [assignment]
pandas/core/generic.py:5750: note: Revealed type is "builtins.list[Any]"
pandas/core/generic.py:5752: note: Revealed type is "pandas.core.frame.DataFrame"
pandas/core/generic.py:5754: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
pandas/core/generic.py:6117: error: Incompatible return value type (got "DataFrame", expected "FrameOrSeries")  [return-value]
Found 3 errors in 2 files (checked 1299 source files)

, so the revealed type at pandas/core/generic.py:5752 looks correct. However, if I roll back to the previous commit, then that is revealed to be Any:

$ mypy pandas
pandas/core/generic.py:5750: note: Revealed type is "builtins.list[Any]"
pandas/core/generic.py:5752: note: Revealed type is "Any"

So, it worked in the previous commit because the type checker was being disabled on the result.

Likewise, on master:

$ mypy pandas
pandas/core/generic.py:5750: note: Revealed type is "builtins.list[Any]"
pandas/core/generic.py:5752: note: Revealed type is "Any"

So, what you have in the current commit seems correct, although I can't figure out why in your previous commit the type of result was not inferred correctly

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jul 24, 2021

OK, looks like we're running into python/mypy#8354

So, I'd suggest you do something like

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index 3b8458fac2..6205463aad 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -5749,7 +5749,8 @@ class NDFrame(PandasObject, indexing.IndexingMixin):
         # GH 19920: retain column metadata after concat
         result = concat(results, axis=1, copy=False)
         result.columns = self.columns
-        return result
+        # https://github.com/python/mypy/issues/8354
+        return cast(FrameOrSeries, result)
 
     @final
     def copy(self: FrameOrSeries, deep: bool_t = True) -> FrameOrSeries:
@@ -6112,7 +6113,8 @@ class NDFrame(PandasObject, indexing.IndexingMixin):
                 for col_name, col in self.items()
             ]
             if len(results) > 0:
-                return concat(results, axis=1, copy=False)
+                # https://github.com/python/mypy/issues/8354
+                return cast(FrameOrSeries, concat(results, axis=1, copy=False))
             else:
                 return self.copy()

The other issue seems correct though, for that I'd suggest

diff --git a/pandas/core/arrays/categorical.py b/pandas/core/arrays/categorical.py
index 3fdb52a73d..57e7c0af52 100644
--- a/pandas/core/arrays/categorical.py
+++ b/pandas/core/arrays/categorical.py
@@ -2315,7 +2315,7 @@ class Categorical(NDArrayBackedExtensionArray, PandasObject, ObjectStringArrayMi
         from pandas.core.reshape.concat import concat
 
         result = concat([counts, freqs], axis=1)
-        result.columns = ["counts", "freqs"]
+        result.columns = Index(["counts", "freqs"])
         result.index.name = "categories"
 
         return result

@MarcoGorelli
Copy link
Member

thanks - now the CI is failing because of #42716 , once that's fixed I'd like to think that this PR will be fine

@MarcoGorelli
Copy link
Member

thanks - now the CI is failing because of #42716 , once that's fixed I'd like to think that this PR will be fine

should be good to fetch/merge upstream/master now

@@ -2311,7 +2311,7 @@ def describe(self):
from pandas.core.reshape.concat import concat

result = concat([counts, freqs], axis=1)
result.columns = ["counts", "freqs"]
result.columns = Index(["counts", "freqs"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll need to import Index in/after line 2311 for this to work

@MarcoGorelli MarcoGorelli self-requested a review August 12, 2021 11:17
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@MarcoGorelli
Copy link
Member

cc @simonjayhawkins in case you have comments

@MarcoGorelli
Copy link
Member

Thanks @pckSF !

@MarcoGorelli MarcoGorelli merged commit cbec30e into pandas-dev:master Aug 22, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
* TYP: Overload concat

* Fix NDFrame columns overload

* Fix FrameOrSeries

* Fix redundant casts

* Import Axis outside of TYPE_CHECKING

* Remove redundant overloads

* Mark mypy/issues/8354 return type issues

* Fix missing imports

* noop

Co-authored-by: Marco Gorelli <marcogorelli@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants