
TYP: Typ part of python_parser #44406

Merged: 18 commits into pandas-dev:master on Nov 28, 2021

Conversation

@phofl (Member) commented Nov 12, 2021

@simonjayhawkins I am wondering about mypy with overloads.

The overload for _do_date_conversions could be more specific, e.g.

    @overload
    def _do_date_conversions(
        self,
        names: list[Scalar | tuple],
        data: dict[Scalar | tuple, ArrayLike] | dict[Scalar | tuple, np.ndarray],
    ) -> tuple[
        list[Scalar | tuple],
        dict[Scalar | tuple, ArrayLike] | dict[Scalar | tuple, np.ndarray],
    ]:
        ...

could be transformed to

    @overload
    def _do_date_conversions(
        self,
        names: list[Scalar | tuple],
        data: dict[Scalar | tuple, ArrayLike],
    ) -> tuple[
        list[Scalar | tuple],
        dict[Scalar | tuple, ArrayLike],
    ]:
        ...

    @overload
    def _do_date_conversions(
        self,
        names: list[Scalar | tuple],
        data: dict[Scalar | tuple, np.ndarray],
    ) -> tuple[
        list[Scalar | tuple],
        dict[Scalar | tuple, np.ndarray],
    ]:
        ...

But in this case mypy complains about:
error: Overloaded function signature 3 will never be matched: signature 2's parameter type(s) are the same or broader [misc]

On the other hand, if typing this only with

    @overload
    def _do_date_conversions(
        self,
        names: list[Scalar | tuple],
        data: dict[Scalar | tuple, ArrayLike],
    ) -> tuple[
        list[Scalar | tuple],
        dict[Scalar | tuple, ArrayLike],
    ]:
        ...

and passing a dict[Scalar | tuple, np.ndarray] mypy complains with

pandas/io/parsers/python_parser.py:283: error: Argument 2 to "_do_date_conversions" of "ParserBase" has incompatible type "Dict[Union[Union[Union[str, int, float, bool], Union[Period, Timestamp, Timedelta, Any]], Tuple[Any, ...]], ndarray[Any, Any]]"; expected "Dict[Union[Union[Union[str, int, float, bool], Union[Period, Timestamp, Timedelta, Any]], Tuple[Any, ...]], Union[ExtensionArray, ndarray[Any, Any]]]"  [arg-type]
pandas/io/parsers/python_parser.py:283: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
pandas/io/parsers/python_parser.py:283: note: Consider using "Mapping" instead, which is covariant in the value type

This looks inconsistent. I think the overloads should be able to distinguish between ArrayLike and np.ndarray inside dicts or lists?

Technically we could probably use Mapping, but we would lose some strictness in this case; the object is always a dict, and we know that if we pass only np.ndarray we will get np.ndarray back.

On another topic: Should we use an alias for Scalar | tuple? We need this throughout the code to indicate column names

# Conflicts:
#	pandas/io/parsers/base_parser.py
#	pandas/io/parsers/c_parser_wrapper.py
#	pandas/io/parsers/python_parser.py
@phofl phofl added IO CSV read_csv, to_csv Typing type annotations, mypy/pyright type checking labels Nov 12, 2021
@twoertwein (Member) commented Nov 13, 2021

The overload for _do_date_conversions could be more specific, e.g.

ArrayLike is defined as Union["ExtensionArray", np.ndarray]. I think you don't need an overload:

_ArrayLikeT = TypeVar("_ArrayLikeT", bound=ArrayLike)

def _do_date_conversions(
    self,
    names: list[Scalar | tuple],
    data: dict[Scalar | tuple, _ArrayLikeT],
) -> tuple[
    list[Scalar | tuple],
    dict[Scalar | tuple, _ArrayLikeT],
]:
    ...

@phofl (Member, Author) commented Nov 13, 2021

Unfortunately, mypy does not accept this if I want to pass a dict[Scalar | tuple, np.ndarray] in, because a dict is invariant. Minimal example:

    def test(x: dict[str, int | float]) -> None:
        print(x)


    x: dict[str, int] = {"a": 1}
    test(x)

This raises

error: Argument 1 to "test" has incompatible type "Dict[str, int]"; expected "Dict[str, Union[int, float]]"  [arg-type]
note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
note: Consider using "Mapping" instead, which is covariant in the value type

@@ -233,7 +243,7 @@ def _read():
# TextIOWrapper, mmap, None]")
self.data = reader # type: ignore[assignment]

def read(self, rows=None):
def read(self, rows: int | None = None):
Member:

Not for this PR: we have variable: int | None = None in a few places, using variable: int = -1 looks nicer to me (but would need changing the implementation).

Member Author:

Would be more readable I think, yes

# error: Incompatible types in assignment (expression has type
# "Union[List[Union[Union[str, int, float, bool], Union[Period, Timestamp,
# Timedelta, Any]]], Index]", variable has type "Index") [assignment]
frame.columns, frame = self._do_date_conversions( # type: ignore[assignment]
Member:

Not related to typing: can _do_date_conversions return a new DataFrame object? If yes, this line should maybe be rewritten as

columns, frame = self._do_date_conversions(...)
frame.columns = columns

otherwise, Python will set columns on the old frame object and then replace the frame object.
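The evaluation order being described can be demonstrated with a minimal, self-contained sketch (the Frame class and convert function are hypothetical stand-ins, not pandas code):

```python
class Frame:
    """Stand-in for a DataFrame-like object holding a columns attribute."""

    def __init__(self):
        self.columns = None


old, new = Frame(), Frame()


def convert():
    # stand-in for _do_date_conversions returning (columns, new frame)
    return ["a", "b"], new


frame = old
# after the right-hand side is evaluated, targets are assigned left to
# right: columns land on the OLD object, then `frame` is rebound
frame.columns, frame = convert()

assert old.columns == ["a", "b"]
assert new.columns is None
assert frame is new
```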

Member Author:

Yes, but only when self.parse_dates is None, and in that case it returns the input without modification, so this should be fine.

# Conflicts:
#	pandas/io/parsers/base_parser.py
#	pandas/io/parsers/python_parser.py
@phofl (Member, Author) commented Nov 18, 2021

@simonjayhawkins would you mind having a look?

@simonjayhawkins (Member) left a comment

Thanks @phofl

@@ -237,7 +240,9 @@ def _open_handles(
errors=kwds.get("encoding_errors", "strict"),
)

def _validate_parse_dates_presence(self, columns: list[str]) -> None:
def _validate_parse_dates_presence(
self, columns: list[Scalar] | list[tuple[Scalar, ...]]
Member:

just an observation... no action required.

we should generally avoid using mutable types in function signatures to prevent inadvertent mutation and generally avoid invariant collection types altogether.

from https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#conventions

avoid invariant collection types (list, dict) in argument positions, in favor of covariant types like Mapping or Sequence;

I appreciate there is another school of thought within the team that we should be more restrictive and lockdown the types being passed around.

Within the function body we only do an "in" check on columns, so the passed type does not need to be a list and only needs the __contains__ protocol. So using typing.Container (deprecated since py3.9 in favor of collections.abc.Container) would be most permissive.

from https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md#best-practices-for-inlined-types...

In general, a function input parameter should be annotated with the widest possible type supported by the implementation.
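The "widest possible parameter type" point can be illustrated with a hedged sketch (the function and its signature are hypothetical illustrations, not the actual pandas code): annotating the parameter as Container means any object supporting __contains__ is accepted, list and set alike.

```python
from __future__ import annotations

from collections.abc import Container


def check_parse_dates_presence(parse_dates: list[str], columns: Container[str]) -> None:
    # only membership tests are performed, so Container is wide enough
    missing = [name for name in parse_dates if name not in columns]
    if missing:
        raise ValueError(f"Missing column provided to parse_dates: {missing}")


# both a list and a set satisfy Container[str]
check_parse_dates_presence(["date"], ["date", "value"])
check_parse_dates_presence(["date"], {"date", "value"})
```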

Member Author:

Used Sequence

self,
header,
index_names: list | None,
passed_names: bool = False,
Member:

When a function contains as much logic as this one, I imagine it takes a bit of effort to establish the types and, in the process, work out what the function is actually doing. In these cases, it may be worthwhile improving the docstrings while that knowledge is fresh.

So making the first line a one-liner, adding descriptions (or where they are passed from) for the function parameters and also descriptions of the return types is most welcome.

For the internal functions, we don't need the types for the function parameters and return values in the docstrings as we are adding the type annotations.

Member Author:

This sounds good. Will add descriptions for the more complicated functions in the future

Member Author:

Done, will enhance when fully typing the function

@@ -384,15 +392,17 @@ def extract(r):
return names, index_names, col_names, passed_names

@final
def _maybe_dedup_names(self, names):
def _maybe_dedup_names(
self, names: list[Scalar] | list[tuple[Scalar, ...]]
Member:

If these are index labels, don't we just use Hashable elsewhere to represent those? (Hashable also includes None, which is a valid index label; is that the reason for being more restrictive here?)

Member Author:

I think we can do that, will have a look tomorrow

pandas/io/parsers/base_parser.py (resolved comment thread)
def _check_data_length(
self,
columns: list[Scalar] | list[tuple[Scalar, ...]],
data: list[ArrayLike] | list[np.ndarray],
Member:

this is confusing at first glance since np.ndarray is ArrayLike. Can splitting aliases like this be avoided?

Does ArrayLike as a typevar work? Maybe list[ExtensionArray] | list[np.ndarray] would be clearer? Could a covariant collection type be used instead?

the last one is the recommended one from the mypy error...

note: Consider using "Sequence" instead, which is covariant
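Mypy's note can be seen in a minimal, hedged sketch (the function is hypothetical, purely for illustration): Sequence is covariant in its element type, so a list[int] is accepted where a Sequence of a wider union is expected, whereas a list parameter would trigger the invariance error.

```python
from __future__ import annotations

from collections.abc import Sequence


def first_positive(values: Sequence[int | float]) -> int | float | None:
    # Sequence is covariant, so list[int] and list[float] both type-check
    for value in values:
        if value > 0:
            return value
    return None


ints: list[int] = [-3, 0, 2, 5]
print(first_positive(ints))  # 2
```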

Member Author:

Yeah, Sequence should work here, thanks

Member Author:

Done

"""
try:
# assert for mypy, data is Iterator[str] or None, would error in next
assert self.data is not None
return next(self.data)
line = next(self.data)
assert isinstance(line, list)
Member:

can you add a "# for mypy" inline comment?

Member Author:

Done

@simonjayhawkins (Member) commented:

This looks inconsistent. I think the overloads should be able to distinguish between ArrayLike and np.ndarray inside dicts or lists?

probably because ArrayLike includes np.ndarray

Technically we could probably use Mapping, but we would lose some strictness in this case; the object is always a dict, and we know that if we pass only np.ndarray we will get np.ndarray back.

see my loosely related comments above, wide types for function parameters and narrow types for return types, even though this seems counter-intuitive if passing the return types from one function onto another.

On another topic: Should we use an alias for Scalar | tuple? We need this throughout the code to indicate column names

commented above. why is this not just Hashable?

@phofl (Member, Author) commented Nov 20, 2021

Thanks for the review. Regarding the ArrayLike thing, this comes basically down to this: #44406 (comment)

Since this is not accepted, the overload should allow the redefinition.

@phofl (Member, Author) commented Nov 23, 2021

@simonjayhawkins addressed comments

@jreback jreback added this to the 1.4 milestone Nov 24, 2021
@jreback (Contributor) commented Nov 28, 2021

I think this now conflicts; can you merge master and then this can get in?

@phofl (Member, Author) commented Nov 28, 2021

Yep, most of the relevant stuff should be in now

# Conflicts:
#	pandas/io/parsers/base_parser.py
#	pandas/io/parsers/python_parser.py
@phofl (Member, Author) commented Nov 28, 2021

Merged master and resolved conflicts

@jreback jreback merged commit 3d99081 into pandas-dev:master Nov 28, 2021
@phofl phofl deleted the typ branch November 29, 2021 11:38