CLN: Add typing for dtype argument in io/sql.py #38680

avinashpancham · 2020-12-24T14:55:29Z

Follow up PR for #37546:

Added typing Optional[DtypeArg] for dtype arg in pandas/io, since those functions accept single values and dicts as dtype args
Added typing Optional[Dtype] for dtype arg in pandas/core, since those functions only accept a single value as dtype args
Added typing Optional[NpDtype] for dtype arg in pandas/core for functions that only accept numpy dtypes as dtype args
closes #xxxx
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
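A minimal sketch of how the two annotation styles differ, assuming simplified stand-ins for `Dtype` and `DtypeArg`. Their shape is taken from the mypy error messages quoted later in this thread. The real aliases live in `pandas._typing` and also cover `ExtensionDtype` and numpy dtypes. `NpDtype` is left out for brevity, and `io_style`/`core_style` are hypothetical helpers.

```python
from typing import Dict, Hashable, Optional, Type, Union

# Simplified stand-ins (assumption), modeled on the unions shown in the mypy errors.
Dtype = Union[str, Type[Union[str, float, int, complex, bool, object]]]
DtypeArg = Union[Dtype, Dict[Optional[Hashable], Dtype]]  # one dtype, or {column: dtype}


def io_style(dtype: Optional[DtypeArg] = None) -> str:
    """pandas/io-style parameter: accepts a single dtype or a per-column dict."""
    if dtype is None:
        return "default"
    if isinstance(dtype, dict):
        return "per-column"
    return "single"


def core_style(dtype: Optional[Dtype] = None) -> str:
    """pandas/core-style parameter: accepts only a single dtype."""
    return "default" if dtype is None else "single"
```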

jreback

one comment

jreback · 2020-12-24T17:52:23Z

pandas/core/generic.py

@@ -2639,7 +2643,7 @@ def to_sql(
        index: bool_t = True,
        index_label=None,
        chunksize=None,
-        dtype=None,
+        dtype: DtypeArg = None,


jbrockmendel · 2020-12-24T18:03:23Z

pandas/core/arrays/base.py

@@ -211,7 +211,9 @@ def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        raise AbstractMethodError(cls)

    @classmethod
-    def _from_sequence_of_strings(cls, strings, *, dtype=None, copy=False):
+    def _from_sequence_of_strings(
+        cls, strings, *, dtype: Optional[Dtype] = None, copy=False


avinashpancham · 2020-12-24T21:16:28Z

This will probably take some time, since mypy reports multiple union-attr errors when DtypeArg or Dtype is passed to functions. At the moment I've managed to reduce them to 30, but I have to look into every case.

To give you an idea of one such issue:

pandas/io/sql.py:1947: error: Item "str" of "Union[ExtensionDtype, Any, str, Type[object], Dict[Optional[Hashable], Union[ExtensionDtype, Union[str, Any, Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]], Union[IntervalDtype, DatetimeTZDtype]]]]" has no attribute "items"  [union-attr]

The problem is that above line 1947 we already check whether dtype is a dict, so a string can never reach line 1947, but mypy still reports it as an issue. I think it is related to overwriting variables with a different type.
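A stdlib-only sketch of the narrowing problem, assuming a simplified `DtypeArg` (one type name or a `{column: type name}` dict) and a hypothetical `is_dict_like` stand-in modeled on `pandas.api.types.is_dict_like`. Both functions behave identically at runtime. The difference is only in what mypy can infer: a call that returns a plain `bool` does not narrow the union, while an `isinstance` check does. mypy also narrows a union-typed variable on assignment.

```python
from typing import Dict, Iterable, Optional, Union

# Simplified stand-in for DtypeArg (assumption).
DtypeArg = Union[str, Dict[str, str]]


def is_dict_like(obj: object) -> bool:
    # Hypothetical stand-in; returns a plain bool, so mypy learns nothing about obj.
    return (
        all(hasattr(obj, a) for a in ("__getitem__", "keys", "__contains__"))
        and not isinstance(obj, type)
    )


def normalize_via_helper(columns: Iterable[str], dtype: Optional[DtypeArg]) -> Dict[str, str]:
    if dtype and not is_dict_like(dtype):
        dtype = {c: dtype for c in columns}
    # On the path that skipped the branch, mypy still sees Optional[DtypeArg],
    # so `.items()` is flagged as union-attr even though it works at runtime.
    return dict(dtype.items()) if dtype is not None else {}


def normalize_via_isinstance(columns: Iterable[str], dtype: Optional[DtypeArg]) -> Dict[str, str]:
    if isinstance(dtype, str):
        # Narrowed to str here; the assignment then narrows it to Dict[str, str].
        dtype = {c: dtype for c in columns}
    # Both paths leave Optional[Dict[str, str]], so `.items()` type-checks.
    return dict(dtype.items()) if dtype is not None else {}
```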

avinashpancham · 2020-12-24T21:34:30Z

@jreback, I'm a bit stuck here. According to python/mypy#1174, mypy does not like overwriting variables with different types, and we tend to do that a lot in pandas (see the example below). Any ideas how to solve this, or should I remove the typing until we have a better solution?

def to_sql(
    self,
    frame,
    name,
    if_exists="fail",
    index=True,
    index_label=None,
    schema=None,
    chunksize=None,
    dtype: Optional[DtypeArg] = None,
    method=None,
):
    if dtype and not is_dict_like(dtype):
        dtype = {col_name: dtype for col_name in frame}

    if dtype is not None:
        for col, my_type in dtype.items():  # mypy error here: not every type in Optional[DtypeArg] (e.g. str) has an items attribute
            if not isinstance(my_type, str):
                raise ValueError(f"{col} ({my_type}) not a string")

jreback · 2020-12-24T21:56:07Z

@avinashpancham this is why we need to do this in a small, incremental way to avoid issues

in other words, just do one class of things

then do the others in another PR

you cannot use the same variable name if it changes type;
simply create a new one

but again, small incremental PRs, otherwise these will bog down

avinashpancham · 2020-12-24T21:57:21Z

Another option would be to modify the code in each of these cases so that we don't overwrite variables (see below). But this would take time and would make this PR far too long. I would then propose to close this PR and open an issue listing all files that need to be changed, so that people can contribute on a per-file basis.

def to_sql(
    self,
    frame,
    name,
    if_exists="fail",
    index=True,
    index_label=None,
    schema=None,
    chunksize=None,
    dtype: Optional[DtypeArg] = None,
    method=None,
):
    if dtype and not is_dict_like(dtype):
        dtype_dict = {col_name: dtype for col_name in frame}
    else:
        dtype_dict = dtype

    if dtype_dict is not None:
        for col, my_type in dtype_dict.items():
            if not isinstance(my_type, str):
                raise ValueError(f"{col} ({my_type}) not a string")

avinashpancham · 2020-12-24T21:59:50Z

Ah, I see we are thinking the same. I will then open an issue listing all the files that need changing, so that other people can also contribute, and limit this PR to just the io/sql.py file.

jreback · 2020-12-28T18:49:15Z

pandas/io/sql.py

@@ -1483,7 +1496,7 @@ def to_sql(
        if dtype and not is_dict_like(dtype):
            dtype = {col_name: dtype for col_name in frame}

-        if dtype is not None:
+        if dtype is not None and isinstance(dtype, dict):


Interesting that you have to do this, as L1496 explicitly converts this.

So is_dict_like will pass through a Series, for example, which will fail in other places that we are likely not testing. I would change L1499 to

if dtype is not None:
    if not is_dict_like(...):
        ...
    else:
        dtype = dict(dtype)

> Interesting that you have to do this, as L1496 explicitly converts this.

Yes, the problem is that we already define the type of dtype in the function at L1452. After that you cannot change the type of dtype, not even by overwriting the variable; isinstance checks are the only way to narrow it down. So the proposed solution (see below) will not work, since it overwrites the dtype variable:

if dtype is not None:
    if not is_dict_like(dtype):
        dtype = {col_name: dtype for col_name in frame}
    else:
        dtype = dict(dtype)  # This line gives a mypy error

Mypy error

error: Argument 1 to "dict" has incompatible type "Union[ExtensionDtype, Any, str, Type[object], Dict[Optional[Hashable], Union[ExtensionDtype, str, Any, Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]]]]"; expected "Mapping[Any, Any]" [arg-type]

> isinstance checks are the only way to narrow it down.

You can also use cast or assert. In this case, the is_dict_like function will not narrow types, so it is OK to use a cast following the is_* call. So something like:

if is_dict_like(dtype):
    dtype = cast(dict, dtype)
    ...
else:
    ...

You can always replace dict with Mapping, or with a Union if more than dict is accepted for the dict-like parameter.
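A sketch of the cast-after-`is_dict_like` pattern, assuming a simplified `DtypeArg` annotated with `Mapping` rather than `dict`, and the same hypothetical `is_dict_like` stand-in. `per_column_dtypes` is a hypothetical helper, not pandas code. It returns a new, plainly typed dict instead of reusing `dtype`. That sidesteps the reassignment problem. Also, `cast` changes only what mypy believes and checks nothing at runtime.

```python
from typing import Dict, Hashable, Iterable, Mapping, Optional, Union, cast

# Simplified stand-ins (assumption): a single dtype or a {column: dtype} mapping.
Dtype = Union[str, type]
DtypeArg = Union[Dtype, Mapping[Hashable, Dtype]]


def is_dict_like(obj: object) -> bool:
    # Hypothetical stand-in modeled on pandas.api.types.is_dict_like (returns bool).
    return (
        all(hasattr(obj, a) for a in ("__getitem__", "keys", "__contains__"))
        and not isinstance(obj, type)
    )


def per_column_dtypes(
    columns: Iterable[Hashable], dtype: Optional[DtypeArg]
) -> Dict[Hashable, Dtype]:
    if dtype is None:
        return {}
    if is_dict_like(dtype):
        # is_dict_like cannot narrow the Union, so tell mypy what was just checked.
        # `assert isinstance(dtype, collections.abc.Mapping)` would also narrow,
        # and would additionally check at runtime.
        return dict(cast(Mapping[Hashable, Dtype], dtype))
    # Single dtype: repeat it for every column.
    return {col: cast(Dtype, dtype) for col in columns}
```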

Thanks @simonjayhawkins, I did not know the cast function also helps narrow the type down. With cast it works.

jreback · 2020-12-28T23:21:30Z

pandas/io/sql.py

-        if col.name in dtype:
-            return self.dtype[col.name]
+        dtype: DtypeArg = self.dtype or {}
+        if isinstance(dtype, dict) and col.name in dtype:


should be is_dict_like

jreback · 2020-12-28T23:21:48Z

pandas/io/sql.py

-        dtype = self.dtype or {}
-        if col.name in dtype:
+        dtype: DtypeArg = self.dtype or {}
+        if isinstance(dtype, dict) and col.name in dtype:


use is_dict_like
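A sketch of why a duck-typed check is preferred over `isinstance(dtype, dict)` here. `types.MappingProxyType` stands in for a mapping that is not a `dict` subclass. The thread mentions a Series as the real-world case. `is_dict_like` is the same hypothetical stand-in, and `column_sql_type` is a hypothetical helper loosely shaped like the reviewed lookup, not pandas' method.

```python
from types import MappingProxyType
from typing import Any, Hashable, Optional


def is_dict_like(obj: object) -> bool:
    # Hypothetical stand-in modeled on pandas.api.types.is_dict_like.
    return (
        all(hasattr(obj, a) for a in ("__getitem__", "keys", "__contains__"))
        and not isinstance(obj, type)
    )


def column_sql_type(col_name: Hashable, dtype: Optional[Any]) -> Optional[Any]:
    dtype = dtype or {}
    # The duck-typed check accepts any mapping. Without a guard, `col_name in "TEXT"`
    # would silently run a substring test on a plain string dtype.
    if is_dict_like(dtype) and col_name in dtype:
        return dtype[col_name]
    return None


read_only = MappingProxyType({"a": "TEXT"})  # a mapping that is not a dict
```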

jreback · 2020-12-29T14:08:26Z

thanks @avinashpancham

avinashpancham · 2020-12-29T14:10:58Z

Thanks @jreback, learnt some new mypy things when working on this PR :)

Will make a general issue to also update the typing of dtype in the remainder of the codebase.

jreback · 2020-12-29T14:56:38Z

kk great thanks!

jreback requested changes Dec 24, 2020


jreback added the "Dtype Conversions" (Unexpected or buggy dtype conversions) and "Typing" (type annotations, mypy/pyright type checking) labels Dec 24, 2020

jreback added this to the 1.3 milestone Dec 24, 2020

jbrockmendel reviewed Dec 24, 2020


Add typing for io/sql.py

e1bc42a

avinashpancham force-pushed the typing_dtype branch from e9b49c6 to e1bc42a December 24, 2020 23:29

avinashpancham changed the title from "CLN: Add typing for dtype argument in codebase" to "CLN: Add typing for dtype argument in io/sql.py" Dec 24, 2020

Merge remote-tracking branch 'upstream/master' into typing_dtype

1aaa6a2

jreback requested changes Dec 28, 2020


Cast dtype to dict

2c70647

jreback requested changes Dec 28, 2020


avinashpancham added 4 commits December 29, 2020 01:45

Replace isinstance checks with is_dict_like

a82a6c1

Merge remote-tracking branch 'upstream/master' into typing_dtype

41138e3

Revert check to original form

8dc13bd

Remove superfluous if statement

ae41d91

jreback approved these changes Dec 29, 2020


jreback merged commit 5a6a0f7 into pandas-dev:master Dec 29, 2020

avinashpancham mentioned this pull request Dec 30, 2020

CLN: Add typing for dtype argument in codebase #38808

Closed

32 tasks

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

CLN: Add typing for dtype argument in io/sql.py (pandas-dev#38680)

2fb1e47
