implement astype portion of #24024 #24405

jbrockmendel · 2018-12-24T02:58:27Z

and accompanying tests

Uses _eadata like #24394

Constitutes ~10% of the diff of #24024, more after that gets rebased.

codecov · 2018-12-24T03:35:25Z

Codecov Report

Merging #24405 into master will increase coverage by 0.01%.
The diff coverage is 98.57%.

@@            Coverage Diff             @@
##           master   #24405      +/-   ##
==========================================
+ Coverage    92.3%   92.31%   +0.01%     
==========================================
  Files         163      163              
  Lines       51943    51965      +22     
==========================================
+ Hits        47946    47972      +26     
+ Misses       3997     3993       -4

Flag	Coverage Δ
#multiple	`90.72% <98.57%> (+0.01%)`	⬆️
#single	`43.01% <57.14%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/missing.py	`93.1% <ø> (ø)`	⬆️
pandas/core/arrays/datetimelike.py	`96.48% <100%> (+0.12%)`	⬆️
pandas/core/indexes/datetimelike.py	`97.57% <100%> (-0.01%)`	⬇️
pandas/core/arrays/datetimes.py	`98.56% <100%> (+0.78%)`	⬆️
pandas/core/indexes/timedeltas.py	`90.44% <100%> (+0.02%)`	⬆️
pandas/core/arrays/period.py	`98.41% <100%> (-0.06%)`	⬇️
pandas/core/indexes/period.py	`93.09% <100%> (+0.01%)`	⬆️
pandas/core/indexes/datetimes.py	`96.28% <100%> (-0.05%)`	⬇️
pandas/core/arrays/timedeltas.py	`87.22% <92.3%> (+0.36%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc7bc3f...6a5c216.

codecov · 2018-12-24T03:35:26Z

Codecov Report

Merging #24405 into master will increase coverage by 0.01%.
The diff coverage is 98.59%.

@@            Coverage Diff             @@
##           master   #24405      +/-   ##
==========================================
+ Coverage    92.3%   92.31%   +0.01%     
==========================================
  Files         165      165              
  Lines       52176    52194      +18     
==========================================
+ Hits        48161    48183      +22     
+ Misses       4015     4011       -4

Flag	Coverage Δ
#multiple	`90.73% <98.59%> (+0.01%)`	⬆️
#single	`42.96% <50.7%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/missing.py	`93.1% <ø> (ø)`	⬆️
pandas/core/indexes/base.py	`96.28% <ø> (ø)`	⬆️
pandas/core/arrays/datetimelike.py	`96.08% <100%> (+0.15%)`	⬆️
pandas/core/indexes/datetimelike.py	`97.53% <100%> (-0.04%)`	⬇️
pandas/core/arrays/datetimes.py	`98.39% <100%> (+0.78%)`	⬆️
pandas/core/indexes/timedeltas.py	`90.25% <100%> (-0.03%)`	⬇️
pandas/core/arrays/period.py	`98.42% <100%> (-0.06%)`	⬇️
pandas/core/indexes/period.py	`92.69% <100%> (-0.02%)`	⬇️
pandas/core/indexes/datetimes.py	`96.14% <100%> (-0.07%)`	⬇️
pandas/core/arrays/timedeltas.py	`87.36% <94.44%> (+0.44%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1b2a52...eac662b.

jreback · 2018-12-24T18:26:06Z

is this contingent on #24394 ? should that be first?

jbrockmendel · 2018-12-24T18:30:24Z

> is this contingent on #24394 ? should that be first?

No, they are independent. They just both implement+use _eadata.

jbrockmendel · 2018-12-24T22:47:34Z

added requested assertion in test_period and passed copy=copy to numpy copy in the DatetimeLikeArrayMixin.astype case where it was obvious. For the rest I'm going to leave that to Tom in #24024 post-rebase (and if this goes through I'll make a PR on his branch to help rebase)
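The copy handling mentioned above can be illustrated with a minimal pure-Python sketch. `TinyArray` and its dtype labels are hypothetical stand-ins, not pandas classes; the only point is that `astype` honours the caller's `copy` flag when the dtype already matches and relies on the conversion itself to allocate new storage otherwise.

```python
class TinyArray:
    """Hypothetical stand-in for an array wrapper; not a pandas class."""

    _CASTERS = {"int": int, "float": float, "str": str}

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def astype(self, dtype, copy=True):
        if dtype == self.dtype:
            # Same dtype: respect the caller's copy flag rather than
            # always copying or never copying.
            return TinyArray(self.values, dtype) if copy else self
        # Different dtype: the conversion already builds new storage,
        # so copying again afterwards would be a wasted second copy.
        caster = self._CASTERS[dtype]
        return TinyArray([caster(v) for v in self.values], dtype)


arr = TinyArray([1, 2, 3], "int")
same = arr.astype("int", copy=False)   # no copy requested, same object back
fresh = arr.astype("int", copy=True)   # new object with equal values
as_float = arr.astype("float")         # conversion allocates on its own
```

Forwarding the flag in one place keeps the no-copy fast path available to callers without risking a double copy on conversion.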

jreback

@jbrockmendel it's much simpler if you respond to each comment and resolve it if you are addressing it; otherwise, leave a comment.

jreback · 2018-12-25T16:42:26Z

pandas/core/indexes/datetimelike.py

+        #  dtype=object to disable inference. But, DTA.astype ignores
+        #  integer sign and size, so we need to detect that case and
+        #  just choose int64.
+        dtype = pandas_dtype(dtype)


not sure this is necessary as it already coerces properly, doing it here is very weird.

In [2]: pd.Index([1,2,3],dtype='int32')
Out[2]: Int64Index([1, 2, 3], dtype='int64')

did you address this?

Not in the last 8 hours, no. May need to wait on Tom to clarify, since all of this was taken from 24024.

(the fact that these things get closer attention in smaller doses reassures me that splitting is a good idea, even if it does cause rebasing hassles in the parent PR)

yep, sounds ok then.

What’s the question here? Why we do the integer check? Astype ignores the sign and size. I suppose the index constructor just ignores the size?

I'm not sure if you saw the part about uint vs. int.

So I'm just going to decide that the expected behavior for {Datetime,Timedelta,Period}Index.astype("uint{8,16,32,64}") is to return a UInt64Index. That means we can remove this check and just pass new_values through with the original dtype.

@jbrockmendel do you want to do that here? It's not at all tested, and will need a release note.

ok by me (of course it's weird to do this, but hey)

> do you want to do that here? It's not at all tested, and will need a release note.

I tried this, pretty much just deleting ten lines here, and ended up getting two failures in pandas/tests/indexes/interval/test_astype.py. I can fix this by changing dtype=dtype to dtype=new_values.dtype in the call that wraps self._eadata.astype. Is that what you have in mind?

Attempt #2 at this also failed. Any other ideas?

Sorry I missed this note last night. I implemented this in 3fca810 if you could take a look.
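The rule this thread settles on (integer size is ignored, but signed and unsigned requests stay distinct) can be sketched in plain Python. `resolve_integer_dtype` is a hypothetical helper written for illustration; it is not the code from 3fca810 and works on dtype names only.

```python
import re

# Matches "int", "uint", and the sized variants such as "int32" or "uint8".
_INT_DTYPE = re.compile(r"^(u?)int(8|16|32|64)?$")


def resolve_integer_dtype(name):
    """Hypothetical helper: keep the sign of an integer dtype name, drop its size."""
    match = _INT_DTYPE.match(name)
    if match is None:
        raise TypeError(f"{name!r} is not an integer dtype name")
    # Group 1 is "u" for unsigned names and empty for signed ones.
    return "uint64" if match.group(1) else "int64"


resolved = {
    name: resolve_integer_dtype(name)
    for name in ("int32", "int64", "uint8", "uint")
}
```

Under this rule a request for `int32` resolves to `int64`, matching the `pd.Index` output quoted earlier in the thread, while any `uint*` request resolves to `uint64`.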

TomAugspurger · 2018-12-27T02:43:37Z

What are the new parts that were discovered in this PR?

jbrockmendel · 2018-12-27T17:04:46Z

> What are the new parts that were discovered in this PR?

Primarily the int32 casting stuff and some copy-related topics that I think have been resolved.

jreback · 2018-12-27T22:11:34Z

pandas/core/indexes/datetimelike.py

+        #  dtype=object to disable inference. But, DTA.astype ignores
+        #  integer sign and size, so we need to detect that case and
+        #  just choose int64.
+        dtype = pandas_dtype(dtype)


having code here is just another inconsistency and maintenance burden. The constructor does the inference.

TomAugspurger · 2018-12-28T16:35:28Z

pandas/tests/arrays/test_period.py

@@ -88,24 +87,29 @@ def test_take_raises():
        arr.take([0, -1], allow_fill=True, fill_value='foo')


-@pytest.mark.parametrize('dtype', [int, np.int32, np.int64])
+@pytest.mark.parametrize('dtype', [int, np.int32, np.int64, 'uint'])


The addition of uint will cause 3fca810 to fail. Removing, since it's tested elsewhere now.

Though those tests are just index. I'll add ones for arrays as well...

Fixed in 5fa32e9. The test is slightly more complicated now.

TomAugspurger · 2018-12-28T16:48:49Z

I think 04efd45 broke TimedeltaIndex.astype(str). Looking into it now

TomAugspurger · 2018-12-28T16:56:37Z

Though now that I think about it, this is some pretty strange behavior

In [3]: pd.timedelta_range('2000', periods=2)._data.astype(str)[0]
Out[3]: Timedelta('0 days 00:00:00.000002')

Looks like we need to bring in the _format_native_types changes for TimedeltaArray. That should clear all this up.

TomAugspurger · 2018-12-28T17:03:24Z

5d718e6 (hopefully) fixed TimedeltaArray._format_native_types to return an array of strings.
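As a rough illustration of what "return an array of strings" means here, the sketch below formats standard-library `timedelta` values. `format_native_types` is a hypothetical free function, not the pandas method, and `None` stands in for a missing value.

```python
from datetime import timedelta


def format_native_types(values, na_rep="NaT"):
    """Hypothetical helper: render every element as str, missing ones as na_rep."""
    # Each element becomes a plain str, so indexing the result never
    # hands back a timedelta-like object.
    return [na_rep if v is None else str(v) for v in values]


formatted = format_native_types([timedelta(days=1), None, timedelta(hours=2)])
```

Because `na_rep` defaults to a plain `str`, every element of the result has the same type, including the placeholder for missing values.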

TomAugspurger · 2018-12-28T17:06:01Z

pandas/core/arrays/datetimes.py

+            return self
+        elif is_period_dtype(dtype):
+            return self.to_period(freq=dtype.freq)
+        return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)


Just noticed... it'd be nice to leave a bunch of TODO: Use super for places like this.

Actually... I think Python2 will force us to make these changes when we switch inheritance to composition, since we won't be able to call the unbound method with a DatetimeIndex anymore (I think).
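The two call styles being compared can be sketched with hypothetical classes (none of these names are pandas classes). Python 2 type-checks the first argument of an unbound method against the class, which is why naming the base class directly becomes a problem once the caller no longer inherits from it; Python 3 has no unbound methods, so both styles run there.

```python
class DatetimeLikeMixin:
    """Hypothetical base class standing in for a shared mixin."""

    def astype(self, dtype, copy=True):
        return f"generic astype to {dtype}"


class ExplicitBase(DatetimeLikeMixin):
    def astype(self, dtype, copy=True):
        if dtype == "self":
            return self
        # Names the base class directly and passes self by hand.
        return DatetimeLikeMixin.astype(self, dtype, copy)


class UsesSuper(DatetimeLikeMixin):
    def astype(self, dtype, copy=True):
        if dtype == "self":
            return self
        # Lets the method resolution order choose the next implementation.
        return super().astype(dtype, copy)


explicit_result = ExplicitBase().astype("int64")
super_result = UsesSuper().astype("int64")
```

Both calls produce the same result while the subclass relationship holds; the `super()` form simply avoids hard-coding which class supplies the fallback.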

TomAugspurger · 2018-12-28T17:47:03Z

Py2 failure: https://travis-ci.org/pandas-dev/pandas/jobs/473068039#L2035. Also have some linting failures.

I'll assume you have other things to work on @jbrockmendel, and continue to push changes here if that's OK.

TomAugspurger · 2018-12-28T18:42:20Z

e29d898 hopefully fixes the Py2 failure. At some point, the type for the na_rep parameter of the datetimelike _format_native_types became unicode. On the base class it's str (on whatever version of Python), and the underlying formatter seems to expect str (again, on whatever version of Python).

jreback · 2018-12-28T19:05:26Z

lgtm. assume you just rebased? and if you want to add TODO have at it. ping on green.

TomAugspurger · 2018-12-28T19:26:26Z

By "just" rebased do you mean recently? If so I think the last one was 207ffb9.

I'll merge master again when I fix this last py2 error.

If you meant "is rebasing the only thing you did?" then no, there are some real changes here.

It looks like the period tests aren't happy with my changes to format arrays. On 0.23.4 we were inconsistent

In [5]: type(pd.period_range('2000', periods=2).astype(str)[0])
Out[5]: numpy.unicode_

In [6]: type(pd.date_range('2000', periods=2).astype(str)[0])
Out[6]: str

I'll revert the period change here and open an issue.

TomAugspurger · 2018-12-28T19:29:22Z

Merged master and fixed the py2 issue (a3c42f0).

I decided not to open an issue because the behavior is correct for Python 3 and we're not going to change this for 0.24, so who cares :)

jbrockmendel · 2018-12-28T19:31:30Z

> and continue to push changes here if that's OK.

Extremely happy to hand this one off, thanks for figuring it out.

jreback · 2018-12-28T19:31:40Z

ok lgtm. ping on green.

TomAugspurger · 2018-12-28T20:30:37Z

All green.

jreback · 2018-12-28T20:31:36Z

thanks! nice @jbrockmendel and @TomAugspurger

TomAugspurger · 2018-12-28T21:19:36Z

pandas/tests/indexes/timedeltas/test_astype.py

        with pytest.raises(TypeError, match=msg):
            idx.astype(dtype)

+    @pytest.mark.parametrize('tz', [None, 'US/Central'])


Whoops. This should probably be in indexes/datetimes/test_astype.py. Or rather, we should be using timedelta_range here.

implement astype portion of pandas-dev#24024

6a5c216

jbrockmendel added 2 commits December 23, 2018 19:51

fixup unused import

1a9f30b

isort fixup

1b109b8

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions), Clean, and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Dec 24, 2018

jreback added this to the 0.24.0 milestone Dec 24, 2018

jreback requested changes Dec 24, 2018

jbrockmendel added 2 commits December 24, 2018 14:41

Merge branch 'master' of https://github.com/pandas-dev/pandas into le…

f271005

…ss24024b

pass copy kwarg

5615b9f

jbrockmendel mentioned this pull request Dec 24, 2018

POC: _eadata #24394

Merged

jbrockmendel added 2 commits December 25, 2018 07:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into le…

d5cca5a

…ss24024b

revert change that brokethe world

184f59f

jreback requested changes Dec 25, 2018

jbrockmendel added 3 commits December 25, 2018 12:53

Merge branch 'master' of https://github.com/pandas-dev/pandas into le…

df39bd7

…ss24024b

comments, typo

e41068a

avoid double-copy

6f108dd

jbrockmendel mentioned this pull request Dec 26, 2018

REF: DatetimeLikeArray #24024

Merged

12 tasks

jbrockmendel mentioned this pull request Dec 27, 2018

Datetimelike Array Refactor #23185

Closed

jreback requested changes Dec 27, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into le…

b123d08

…ss24024b

jreback requested changes Dec 28, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into le…

207ffb9

…ss24024b

jbrockmendel and others added 2 commits December 27, 2018 20:18

sidestep int sign/size astype issues

04efd45

Implement UInt64 handling, tests, and docs

3fca810

TomAugspurger reviewed Dec 28, 2018

Handle uint in astype tests

5fa32e9

Fixed TimedeltaArray._format_native_types

5d718e6

TomAugspurger approved these changes Dec 28, 2018

TomAugspurger added 2 commits December 28, 2018 11:48

Linting

33b5434

Change default to str

e29d898

This makes the default na repr match the expected type for the formatter.

jreback approved these changes Dec 28, 2018

TomAugspurger added 2 commits December 28, 2018 13:27

revert for period

a3c42f0

Merge remote-tracking branch 'upstream/master' into jbrockmendel-less…

eac662b

…24024b

jreback merged commit f4f37f4 into pandas-dev:master Dec 28, 2018

TomAugspurger reviewed Dec 28, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

implement astype portion of pandas-dev#24024 (pandas-dev#24405)

0277ee7

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

implement astype portion of pandas-dev#24024 (pandas-dev#24405)

83b902c

jbrockmendel deleted the less24024b branch April 5, 2020 17:43
