Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Add typing for dtype argument in io/sql.py #38680

Merged
merged 7 commits into from
Dec 29, 2020

Conversation

avinashpancham
Copy link
Contributor

Follow up PR for #37546:

  • Added typing Optional[DtypeArg] for dtype arg in pandas/io, since those functions accept single values and dicts as dtype args

  • Added typing Optional[Dtype] for dtype arg in pandas/core, since those functions only accept a single value as dtype args

  • Added typing Optional[NpDtype] for dtype arg in pandas/core for functions that only accept numpy dtypes as dtype args

  • closes #xxxx

  • tests added / passed

  • passes black pandas

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

  • whatsnew entry

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2639,7 +2643,7 @@ def to_sql(
index: bool_t = True,
index_label=None,
chunksize=None,
dtype=None,
dtype: DtypeArg = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Typing type annotations, mypy/pyright type checking labels Dec 24, 2020
@jreback jreback added this to the 1.3 milestone Dec 24, 2020
@@ -211,7 +211,9 @@ def _from_sequence(cls, scalars, *, dtype=None, copy=False):
raise AbstractMethodError(cls)

@classmethod
def _from_sequence_of_strings(cls, strings, *, dtype=None, copy=False):
def _from_sequence_of_strings(
cls, strings, *, dtype: Optional[Dtype] = None, copy=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy: bool

@avinashpancham
Copy link
Contributor Author

avinashpancham commented Dec 24, 2020

This will probaby take some time since Mypy gives multiple union-attr errors when providing DtypeArg or Dtype to functions. Atm Ive managed to reduce it to 30, but I have to look into every case.

To give you an idea of one such issue:

pandas/io/sql.py:1947: error: Item "str" of "Union[ExtensionDtype, Any, str, Type[object], Dict[Optional[Hashable], Union[ExtensionDtype, Union[str, Any, Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]], Union[IntervalDtype, DatetimeTZDtype]]]]" has no attribute "items"  [union-attr]

The problem is that above line 1947 we already do a check to see whether dtype is a dict or not, so a string can never rearch line 1947, but Mypy still gives it as an issue. I think it is related to overwriting variables with a different dtype

@avinashpancham
Copy link
Contributor Author

avinashpancham commented Dec 24, 2020

@jreback, I'm bit stuck here. According to python/mypy#1174, Mypy does not like overwriting variables with different dtypes and we tend to do that a lot in Pandas, see for example below. Any ideas how to solve this or should I remove the dtyping untill we have a better solution.

 def to_sql(
        self,
        frame,
        name,
        if_exists="fail",
        index=True,
        index_label=None,
        schema=None,
        chunksize=None,
        dtype: Optional[DtypeArg] = None,
        method=None,
    ):
        if dtype and not is_dict_like(dtype):
            dtype = {col_name: dtype for col_name in frame}

        if dtype is not None:
            for col, my_type in dtype.items(): # Get an error here, since not all types in Optional[DtypeArg] (e.g. str) have items attr
                if not isinstance(my_type, str):
                    raise ValueError(f"{col} ({my_type}) not a string")

@jreback
Copy link
Contributor

jreback commented Dec 24, 2020

@avinashpancham this is why need to do this in a small incremental way to avoid issues

iow just do one class of things

then in another PR do others

you cannot use the same variable names if they change type
simply create a new one

but again small incremental PRa otherwise these will bog down

@avinashpancham
Copy link
Contributor Author

Other option would be to modify the code for each of these cases, such that we dont overwrite variables (see below). But this would take time and would make this PR way too long. I would then propose to close this PR and make an issue with all files that need to be changed such that people can contribute on a per file base.

 def to_sql(
        self,
        frame,
        name,
        if_exists="fail",
        index=True,
        index_label=None,
        schema=None,
        chunksize=None,
        dtype: Optional[DtypeArg] = None,
        method=None,
    ):
        if dtype and not is_dict_like(dtype):
            dtype_dict = {col_name: dtype for col_name in frame}
       else:
            dtype_dict = dtype
                   
        if dtype_dict is not None:
            for col, my_type in dtype_dict.items(): 
                if not isinstance(my_type, str):
                    raise ValueError(f"{col} ({my_type}) not a string")

@avinashpancham
Copy link
Contributor Author

Ah I see we are thinking the same. I will then make an issue with all the files that need changing so that also other people can contribute. Will limit this PR to just the io/sql.py file then

@avinashpancham avinashpancham changed the title CLN: Add typing for dtype argument in codebase CLN: Add typing for dtype argument in io/sql.py Dec 24, 2020
pandas/io/sql.py Outdated
@@ -1483,7 +1496,7 @@ def to_sql(
if dtype and not is_dict_like(dtype):
dtype = {col_name: dtype for col_name in frame}

if dtype is not None:
if dtype is not None and isinstance(dtype, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting you have to do this as L1496 explicity converts this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so is_dict_like will pass thrue a Series for example which will fail in other places which we are not likely testing. I would change L1499 to

if dtype is not None:
     if not is_dict_like(...):
       ..
    else:
        dtype = dict(dtype)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting you have to do this as L1496 explicity converts this.

Yes, the problem is that we define the type of dtype already in the function at L1452. After that you cannot change the type of dtype, even not by overwriting the variable. isinstance checks are the only way to narrow it down. So the provided solution (see below) will not work, since we are overwriting the dtype variable

        if dtype is not None:
            if not is_dict_like(dtype):
                dtype = {col_name: dtype for col_name in frame}
            else:
                dtype = dict(dtype) # This line gives a mypy error

Mypy error

error: Argument 1 to "dict" has incompatible type "Union[ExtensionDtype, Any, str, Type[object], Dict[Optional[Hashable], Union[ExtensionDtype, str, Any, Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]]]]"; expected "Mapping[Any, Any]"  [arg-type]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance checks are the only way to narrow it down.

also can use cast or assert. In this case, is_dict_like function will not narrow types, so ok to use a cast following the is_* call. so something like

            if is_dict_like(dtype):
                dtype = cast(dict, dtype)
                ...
            else:
                ...

can always replace dict with Mapping, or union if more than dict is accepted for dict-like parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simonjayhawkins did not know the cast function also helps in narrowing the type down. With the cast function it works.

pandas/io/sql.py Outdated
if col.name in dtype:
return self.dtype[col.name]
dtype: DtypeArg = self.dtype or {}
if isinstance(dtype, dict) and col.name in dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoudl is is_dict_like

pandas/io/sql.py Outdated
dtype = self.dtype or {}
if col.name in dtype:
dtype: DtypeArg = self.dtype or {}
if isinstance(dtype, dict) and col.name in dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_dict_like

@jreback jreback merged commit 5a6a0f7 into pandas-dev:master Dec 29, 2020
@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

thanks @avinashpancham

@avinashpancham
Copy link
Contributor Author

avinashpancham commented Dec 29, 2020

Thanks @jreback, learnt some new mypy things when working on this PR :)

Will make a general issue to also update the typing of dtype in the remainder of the codebase.

@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

kk great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants