Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: core/dtypes/cast.py::maybe_downcast_to_dtype #37050

Merged
merged 10 commits into from
Oct 14, 2020

Conversation

arw2019
Copy link
Member

@arw2019 arw2019 commented Oct 11, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@arw2019 arw2019 marked this pull request as draft October 11, 2020 04:24
@arw2019 arw2019 marked this pull request as ready for review October 11, 2020 05:47
@arw2019 arw2019 changed the title [WIP] CLN: core/dtypes/cast.py::maybe_downcast_to_dtype CLN: core/dtypes/cast.py::maybe_downcast_to_dtype Oct 11, 2020
if dtype.kind in ["M", "m"] and result.dtype.kind in ["i", "f"]:
if hasattr(dtype, "tz"):
# GH12821, iNaT is cast to float
if is_datetime_or_timedelta_any_dtype(dtype) and result.dtype.kind in ["i", "f"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dtype.kind checks are much more performant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, just leave this as is i think for L163 (and can remove the comment which isold)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! Done both

@jreback jreback added Clean Dtype Conversions Unexpected or buggy dtype conversions labels Oct 11, 2020
if dtype.kind in ["M", "m"] and result.dtype.kind in ["i", "f"]:
if hasattr(dtype, "tz"):
# GH12821, iNaT is cast to float
if is_datetime_or_timedelta_any_dtype(dtype) and result.dtype.kind in ["i", "f"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, just leave this as is i think for L163 (and can remove the comment which isold)

@arw2019 arw2019 force-pushed the maybe_downcast_to_dtype_cleanup branch from a5c9169 to 3aad776 Compare October 11, 2020 16:23
if dtype.kind in ["M", "m"] and result.dtype.kind in ["i", "f"]:
if hasattr(dtype, "tz"):

if is_datetime_or_timedelta_dtype(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this code here can be greatly simplified, see what the incoming inputs are and i bet you can adjust it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOW the way this is written L173 is always hit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tried this. There are three tests that use the else block:

FAILED pandas/tests/dtypes/cast/test_downcast.py::test_datetime_with_timezone[True]
FAILED pandas/tests/frame/methods/test_combine_first.py::TestDataFrameCombineFirst::test_combine_first_timezone
FAILED pandas/tests/groupby/test_function.py::test_arg_passthru - TypeError: ...

Reverting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you have L177. so i would revert to the original code here as well (except for the period change);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jreback jreback added this to the 1.2 milestone Oct 12, 2020
if dtype.kind in ["M", "m"] and result.dtype.kind in ["i", "f"]:
if hasattr(dtype, "tz"):

if is_datetime_or_timedelta_dtype(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you have L177. so i would revert to the original code here as well (except for the period change);

@arw2019 arw2019 force-pushed the maybe_downcast_to_dtype_cleanup branch from 51557f4 to e46b10d Compare October 12, 2020 17:06
@jreback jreback merged commit fad14fd into pandas-dev:master Oct 14, 2020
@jreback
Copy link
Contributor

jreback commented Oct 14, 2020

thanks @arw2019

@simonjayhawkins
Copy link
Member

@arw2019 this solution could be to revert this and re-open? The mypy errors are

pandas\core\dtypes\cast.py:173: error: Item "type" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "type"  [union-attr]
pandas\core\dtypes\cast.py:179: error: Item "ExtensionDtype" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "freq"  [union-attr]
pandas\core\dtypes\cast.py:179: error: Item "type" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "freq"  [union-attr]

so i'm a bit reluctant to ignore to get mypy to green.

The issue is related to #37024 (comment)

so maybe only str and dtypeObj are accepted. so could also change to Union[str, DtypeObj] but this I think is unusual and should normally be DtypeObj or Dtype.

@arw2019
Copy link
Member Author

arw2019 commented Oct 14, 2020

@arw2019 this solution could be to revert this and re-open? The mypy errors are

pandas\core\dtypes\cast.py:173: error: Item "type" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "type"  [union-attr]
pandas\core\dtypes\cast.py:179: error: Item "ExtensionDtype" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "freq"  [union-attr]
pandas\core\dtypes\cast.py:179: error: Item "type" of "Union[ExtensionDtype, Any, Type[object]]" has no attribute "freq"  [union-attr]

so i'm a bit reluctant to ignore to get mypy to green.

Your call since I'm still pretty new to the code base.

The issue is related to #37024 (comment)

so maybe only str and dtypeObj are accepted. so could also change to Union[str, DtypeObj] but this I think is unusual and should normally be DtypeObj or Dtype.

Again I'm fairly new but I think it's possible Union[str, DtypeObj] is right, at least based on the tests we've got for this:

(pandas-dev) root@27041e356f25:/workspaces/pandas-arw2019# grep -r -n maybe_downcast_to_dtype pandas/tests
pandas/tests/dtypes/cast/test_downcast.py:6:from pandas.core.dtypes.cast import maybe_downcast_to_dtype
pandas/tests/dtypes/cast/test_downcast.py:40:    result = maybe_downcast_to_dtype(arr, dtype)
pandas/tests/dtypes/cast/test_downcast.py:47:    result = maybe_downcast_to_dtype(ser, np.dtype(np.float64))
pandas/tests/dtypes/cast/test_downcast.py:58:    result = maybe_downcast_to_dtype(arr, "infer")
pandas/tests/dtypes/cast/test_downcast.py:69:    result = maybe_downcast_to_dtype(arr, "infer")
pandas/tests/dtypes/cast/test_downcast.py:76:    result = maybe_downcast_to_dtype(arr, "int64")
pandas/tests/dtypes/cast/test_downcast.py:86:    res = maybe_downcast_to_dtype(arr, dtype)
pandas/tests/dtypes/cast/test_downcast.py:97:    res = maybe_downcast_to_dtype(obj, exp.dtype)

For reference actually only use this method in a few places in the code:

pandas/core/frame.py:88:    maybe_downcast_to_dtype,
pandas/core/frame.py:6247:            arr = maybe_downcast_to_dtype(arr, this_dtype)
pandas/core/indexes/interval.py:22:    maybe_downcast_to_dtype,
pandas/core/indexes/interval.py:1301:            breaks = maybe_downcast_to_dtype(breaks, "int64")
pandas/core/tools/numeric.py:5:from pandas.core.dtypes.cast import maybe_downcast_to_dtype
pandas/core/tools/numeric.py:182:                    values = maybe_downcast_to_dtype(values, dtype)
pandas/core/reshape/pivot.py:18:from pandas.core.dtypes.cast import maybe_downcast_to_dtype
pandas/core/reshape/pivot.py:128:                agged[v] = maybe_downcast_to_dtype(agged[v], data[v].dtype)
pandas/core/reshape/pivot.py:275:                maybe_downcast_to_dtype, args=(dtype,)
pandas/core/dtypes/cast.py:122:def maybe_downcast_to_dtype(result, dtype):
pandas/core/dtypes/cast.py:189:    Subset of maybe_downcast_to_dtype restricted to numeric dtypes.
pandas/core/dtypes/cast.py:298:            result = maybe_downcast_to_dtype(result, dtype)
pandas/core/internals/blocks.py:24:    maybe_downcast_to_dtype,
pandas/core/internals/blocks.py:533:            nv = maybe_downcast_to_dtype(values, dtypes)
pandas/core/internals/blocks.py:550:            val = maybe_downcast_to_dtype(val, dtype="infer")

@simonjayhawkins
Copy link
Member

we'll go with the revert for now even though the fix is straightfoward to allow unpressured discussion on the topic. OK with reopening this PR?

@arw2019
Copy link
Member Author

arw2019 commented Oct 14, 2020

we'll go with the revert for now even though the fix is straightfoward to allow unpressured discussion on the topic. OK with reopening this PR?

Sounds good to me!

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants