Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Remove/deprecate unused/misleading dtype functions #23917

Merged
merged 25 commits into from Nov 27, 2018

Conversation

Projects
None yet
5 participants
@jbrockmendel
Copy link
Member

commented Nov 26, 2018

For "misleading" the leading case is is_int_or_datetime_dtype, which includes timedelta64 but excludes datetime64tz.

Several branches in lib.infer_dtype are unreachable, are removed.

is_datetimetz is redundant with is_datetime64tz_dtype, is deprecated, and internal uses of it are changed.

is_period is both redundant and misleading, is deprecated (no internal usages outside of tests)

maybe_convert_string_to_object and maybe_convert_scalar are never used outside of tests, removed

xref #22137

cc: @h-vetinari IIRC you've been looking at other functions in core.dtypes.cast with an eye towards cleanup or de-duplication. Suggestions welcome.

@pep8speaks

This comment has been minimized.

Copy link

commented Nov 26, 2018

Hello @jbrockmendel! Thanks for submitting the PR.

jbrockmendel added some commits Nov 26, 2018

@gfyoung gfyoung added the Dtypes label Nov 26, 2018

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 26, 2018

@jbrockmendel @h-vetinari : Have a look at #16242 as well. This issue might be resolvable based on what you guys are doing here...

@gfyoung gfyoung added the Clean label Nov 26, 2018

assert is_datetimetz(s.dtype)
assert not is_datetimetz(np.dtype('float64'))
assert not is_datetimetz(1.0)
with tm.assert_produces_warning(FutureWarning):

This comment has been minimized.

Copy link
@h-vetinari

h-vetinari Nov 26, 2018

Contributor

If each call is supposed to generate a warning, this will need more context managers.

@h-vetinari

This comment has been minimized.

Copy link
Contributor

commented Nov 26, 2018

Thanks for tagging me.

There's a couple of things I encountered in #23796, mainly:

  • astype_intsafe from _libs.lib is only used once in dtypes.cast.astype_nansafe on the object branch, where there's some confusing/contradictory/redundant branches (related to your integer/datetime/timedelta point)
    • the branch for if np.issubdtype(dtype.type, np.integer): catches all the timedeltas (but not the datetimes), both of which would have a dedicated branch further down. However, letting the timedeltas actually hit their dedicated branch means re-calling astype_nansafe after explicitly casting object dtype to timedelta, hitting a dedicated timedelta-dtype-branch, which casts to float. This then contradicts/fails the TestMerge::test_other_timedelta_unit test, because it expects timedelta output.
    • furthermore, that integer-condition is actually relevant for some IntegerArray cases, because there's no dedicated branch for extension array types (on the side of the caller) yet.
  • maybe_promote and maybe_upcast_putmask have some issues, #23823 / #23833 (actually many more than listed in those issues). Both are also completely untested directly, AFAICT. I'm currently in the process of writing tests for the former, and then hoping to fix the behaviour.
Show resolved Hide resolved doc/source/whatsnew/v0.24.0.rst
@@ -1125,6 +1127,7 @@ Removal of prior version deprecations/changes
- :meth:`SparseSeries.to_dense` has dropped the ``sparse_only`` parameter (:issue:`14686`)
- :meth:`DataFrame.astype` and :meth:`Series.astype` have renamed the ``raise_on_error`` argument to ``errors`` (:issue:`14967`)
- ``is_sequence``, ``is_any_int_dtype``, and ``is_floating_dtype`` have been removed from ``pandas.api.types`` (:issue:`16163`, :issue:`16189`)
- ``is_floating_dtype`` has been removed (:issue:`????`)

This comment has been minimized.

Copy link
@jreback

jreback Nov 26, 2018

Contributor

this looks like a dupe of the line above

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 26, 2018

Author Member

The existing whatsnew makes it look like this has already been removed, but it is still present in master, and not listed (as deprecated or removed) in #6581.

The whatsnew also says is_sequence and is_any_int_dtype have been removed, but each of these are still used in a few places internally. Get rid of those usages? (any_int_dtype only has two usages so that one will be easy)

Show resolved Hide resolved pandas/_libs/lib.pyx Outdated
Show resolved Hide resolved pandas/core/dtypes/common.py Outdated
Show resolved Hide resolved pandas/core/dtypes/common.py
@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,

This comment has been minimized.

Copy link
@jreback

jreback Nov 26, 2018

Contributor

this is not right, these should be in separate branches here, the datetime / timedeltas can be in a single elif clause.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 26, 2018

Author Member

If the behavior here is wrong then we should also add a test for it. I'll open a separate issue to address this, since I'm not sure off the top of my head what the correct behavior is.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 26, 2018

Author Member

opened #23929

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

right these actually this should be 2 clauses of

elif is_integer_dtype(....)
....
elif needs_i8_conversion(...)
...

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

i don't really like changing the way you are doing here.

i was objecting to the object conversion below, not really sure why that is done at all. if you can separate to 2 elif's would be better here

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

OK. I'd really rather these be separated in a dedicated PR, but I guess #23929 can be a Needs Test Issue... will change.

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

yeah that's fine. I think there is another PR about adding EA types here as well (though of course orthogonal). see if this passes and can add tests / perf checking later.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 27, 2018

Author Member

had to revert use of needs_i8_conversion since it breaks on period_dtype

Show resolved Hide resolved pandas/tests/dtypes/test_common.py Outdated

@jsexauer jsexauer referenced this pull request Nov 26, 2018

Open

DEPR: deprecations from prior versions #6581

0 of 94 tasks complete

jbrockmendel added some commits Nov 26, 2018

@codecov

This comment has been minimized.

Copy link

commented Nov 26, 2018

Codecov Report

Merging #23917 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23917      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51488    51471      -17     
==========================================
- Hits        47530    47515      -15     
+ Misses       3958     3956       -2
Flag Coverage Δ
#multiple 90.7% <100%> (ø) ⬆️
#single 42.43% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 98.37% <ø> (ø) ⬆️
pandas/io/formats/format.py 97.76% <100%> (ø) ⬆️
pandas/core/indexes/datetimes.py 96.47% <100%> (ø) ⬆️
pandas/core/dtypes/common.py 94.77% <100%> (+0.39%) ⬆️
pandas/core/frame.py 97.02% <100%> (ø) ⬆️
pandas/core/algorithms.py 95.11% <100%> (ø) ⬆️
pandas/core/dtypes/concat.py 96.66% <100%> (ø) ⬆️
pandas/core/nanops.py 95.05% <100%> (ø) ⬆️
pandas/core/internals/blocks.py 93.71% <100%> (ø) ⬆️
pandas/core/reshape/merge.py 94.27% <100%> (+0.03%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d53a4cc...5bdc157. Read the comment docs.

@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

right these actually this should be 2 clauses of

elif is_integer_dtype(....)
....
elif needs_i8_conversion(...)
...
@@ -1604,7 +1604,10 @@ def _factorize_keys(lk, rk, sort=True):

lk = ensure_int64(lk.codes)
rk = ensure_int64(rk)
elif is_int_or_datetime_dtype(lk) and is_int_or_datetime_dtype(rk):
elif (issubclass(lk.dtype.type, (np.integer, np.timedelta64,

This comment has been minimized.

Copy link
@jreback

jreback Nov 27, 2018

Contributor

i don't really like changing the way you are doing here.

i was objecting to the object conversion below, not really sure why that is done at all. if you can separate to 2 elif's would be better here

@jreback
Copy link
Contributor

left a comment

lgtm. merge on green.

@jreback jreback added this to the 0.24.0 milestone Nov 27, 2018

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 27, 2018

rebased, also fixed a docstring mistake I made in #23937.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 27, 2018

ok! ping on green.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 27, 2018

Ping

@jreback jreback merged commit ee283fa into pandas-dev:master Nov 27, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181127.60 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 27, 2018

thanks!

@jbrockmendel jbrockmendel deleted the jbrockmendel:dtypes branch Nov 27, 2018

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.