Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement astype portion of #24024 #24405

Merged
merged 20 commits into from Dec 28, 2018

Conversation

Projects
None yet
3 participants
@jbrockmendel
Copy link
Member

commented Dec 24, 2018

and accompanying tests

Uses _eadata like #24394

Constitutes ~10% of the diff of #24024, more after that gets rebased.

@codecov

This comment has been minimized.

Copy link

commented Dec 24, 2018

Codecov Report

Merging #24405 into master will increase coverage by 0.01%.
The diff coverage is 98.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24405      +/-   ##
==========================================
+ Coverage    92.3%   92.31%   +0.01%     
==========================================
  Files         163      163              
  Lines       51943    51965      +22     
==========================================
+ Hits        47946    47972      +26     
+ Misses       3997     3993       -4
Flag Coverage Δ
#multiple 90.72% <98.57%> (+0.01%) ⬆️
#single 43.01% <57.14%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/missing.py 93.1% <ø> (ø) ⬆️
pandas/core/arrays/datetimelike.py 96.48% <100%> (+0.12%) ⬆️
pandas/core/indexes/datetimelike.py 97.57% <100%> (-0.01%) ⬇️
pandas/core/arrays/datetimes.py 98.56% <100%> (+0.78%) ⬆️
pandas/core/indexes/timedeltas.py 90.44% <100%> (+0.02%) ⬆️
pandas/core/arrays/period.py 98.41% <100%> (-0.06%) ⬇️
pandas/core/indexes/period.py 93.09% <100%> (+0.01%) ⬆️
pandas/core/indexes/datetimes.py 96.28% <100%> (-0.05%) ⬇️
pandas/core/arrays/timedeltas.py 87.22% <92.3%> (+0.36%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc7bc3f...6a5c216. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Dec 24, 2018

Codecov Report

Merging #24405 into master will increase coverage by 0.01%.
The diff coverage is 98.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24405      +/-   ##
==========================================
+ Coverage    92.3%   92.31%   +0.01%     
==========================================
  Files         165      165              
  Lines       52176    52194      +18     
==========================================
+ Hits        48161    48183      +22     
+ Misses       4015     4011       -4
Flag Coverage Δ
#multiple 90.73% <98.59%> (+0.01%) ⬆️
#single 42.96% <50.7%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/missing.py 93.1% <ø> (ø) ⬆️
pandas/core/indexes/base.py 96.28% <ø> (ø) ⬆️
pandas/core/arrays/datetimelike.py 96.08% <100%> (+0.15%) ⬆️
pandas/core/indexes/datetimelike.py 97.53% <100%> (-0.04%) ⬇️
pandas/core/arrays/datetimes.py 98.39% <100%> (+0.78%) ⬆️
pandas/core/indexes/timedeltas.py 90.25% <100%> (-0.03%) ⬇️
pandas/core/arrays/period.py 98.42% <100%> (-0.06%) ⬇️
pandas/core/indexes/period.py 92.69% <100%> (-0.02%) ⬇️
pandas/core/indexes/datetimes.py 96.14% <100%> (-0.07%) ⬇️
pandas/core/arrays/timedeltas.py 87.36% <94.44%> (+0.44%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1b2a52...eac662b. Read the comment docs.

jbrockmendel added some commits Dec 24, 2018

@jreback jreback added this to the 0.24.0 milestone Dec 24, 2018

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 24, 2018

is this contingent on #24394 ? should that be first?

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 24, 2018

is this contingent on #24394 ? should that be first?

No, they are independent. They just both implement+use _eadata.

Show resolved Hide resolved pandas/core/arrays/datetimes.py
Show resolved Hide resolved pandas/core/arrays/datetimelike.py
Show resolved Hide resolved pandas/core/arrays/datetimelike.py
Show resolved Hide resolved pandas/core/indexes/datetimelike.py Outdated
Show resolved Hide resolved pandas/core/indexes/period.py
Show resolved Hide resolved pandas/tests/arrays/test_period.py
Show resolved Hide resolved pandas/tests/indexes/datetimes/test_astype.py
Show resolved Hide resolved pandas/tests/indexes/timedeltas/test_astype.py
@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 24, 2018

added requested assertion in test_period and passed copy=copy to numpy copy in the DatetimeLikeArrayMixin.astype case where it was obvious. For the rest I'm going to leave that to Tom in #24024 post-rebase (and if this goes through I'll make a PR on his branch to help rebase)

@jbrockmendel jbrockmendel referenced this pull request Dec 24, 2018

Merged

POC: _eadata #24394

@jreback
Copy link
Contributor

left a comment

@jbrockmendel its much simpler if you actually respond to each comment and resolve if you are doing it, otherwise comment.

Show resolved Hide resolved pandas/core/arrays/timedeltas.py Outdated
Show resolved Hide resolved pandas/core/arrays/timedeltas.py
Show resolved Hide resolved pandas/core/arrays/timedeltas.py Outdated
# dtype=object to disable inference. But, DTA.astype ignores
# integer sign and size, so we need to detect that case and
# just choose int64.
dtype = pandas_dtype(dtype)

This comment has been minimized.

Copy link
@jreback

jreback Dec 25, 2018

Contributor

not sure this is necessary as it already coerces properly, doing it here is very weird.

In [2]: pd.Index([1,2,3],dtype='int32')
Out[2]: Int64Index([1, 2, 3], dtype='int64')

This comment has been minimized.

Copy link
@jreback

jreback Dec 26, 2018

Contributor

did you address this?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 26, 2018

Author Member

Not in the last 8 hours, no. May need to wait on Tom to clarify, since all of this was taken from 24024.

(the fact that these things get closer attention in smaller doses reassures me that splitting is a good idea, even if it does cause rebasing hassles in the parent PR)

This comment has been minimized.

Copy link
@jreback

jreback Dec 26, 2018

Contributor

yep, sounds ok then.

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 26, 2018

Contributor

What’s the question here? Why we do the integer check? Astype ignores the sign and size. I suppose the index constructor just ignores the size?

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

I'm not sure if you saw the part about uint vs. int.

So I'm just going to decide that the expected behavior for {Datetime,Timedelta,Period}Index.astype("uint{8,16,32,64}") is to return a UInt64Index. That means we can remove this check and just pass new_values through with the original dtype.

@jbrockmendel do you want to do that here? It's not at all tested, and will need a release note.

This comment has been minimized.

Copy link
@jreback

jreback Dec 28, 2018

Contributor

ok by me (of course its weird to do this, but hey)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 28, 2018

Author Member

do you want to do that here? It's not at all tested, and will need a release note.

I tried this, pretty much just deleting ten lines here, and ended up getting two failures in pandas/tests/indexes/interval/test_astype.py. I can fix this by changing dtype=dtype to dtype=new_values.dtype in the call that wraps self._eadata.astype. Is that what you have in mind?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 28, 2018

Author Member

Attempt #2 at this also failed. Any other ideas?

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

Sorry I missed this note last night. I implemented this in 3fca810 if you could take a look.

Show resolved Hide resolved pandas/core/indexes/datetimelike.py Outdated
Show resolved Hide resolved pandas/core/indexes/period.py Outdated
Show resolved Hide resolved pandas/core/indexes/period.py
Show resolved Hide resolved pandas/tests/indexes/datetimes/test_astype.py
Show resolved Hide resolved pandas/tests/indexes/timedeltas/test_astype.py

@jbrockmendel jbrockmendel referenced this pull request Dec 26, 2018

Merged

REF: DatetimeLikeArray #24024

7 of 12 tasks complete
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

What are the new parts that was discovered in this PR?

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 27, 2018

What are the new parts that was discovered in this PR?

Primarily the int32 casting stuff and some copy-related topics that I think have been resolved.

# dtype=object to disable inference. But, DTA.astype ignores
# integer sign and size, so we need to detect that case and
# just choose int64.
dtype = pandas_dtype(dtype)

This comment has been minimized.

Copy link
@jreback

jreback Dec 27, 2018

Contributor

having code here is just another inconsistency and maintenance burden. The constructor does the inference.

jbrockmendel and others added some commits Dec 28, 2018

@@ -88,24 +87,29 @@ def test_take_raises():
arr.take([0, -1], allow_fill=True, fill_value='foo')


@pytest.mark.parametrize('dtype', [int, np.int32, np.int64])
@pytest.mark.parametrize('dtype', [int, np.int32, np.int64, 'uint'])

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

The addition of uint will cause 3fca810
to fail. Removing, since it's tested elsewhere now.

Though those tests are just index. I'll add ones for arrays as well...

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

Fixed in 5fa32e9. The test is slightly more complicated now.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

I think
04efd45 broke TimedeltaIndex.astype(str). Looking into it now

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

Though now that I think about it, this is some pretty strange behavior

In [3]: pd.timedelta_range('2000', periods=2)._data.astype(str)[0]
Out[3]: Timedelta('0 days 00:00:00.000002')

Looks like we need to bring in the _format_native_types changes for TimedeltaArray. That should clear all this up.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

5d718e6 (hopefully) fixed TImedeltaArray._format_native_types to return an array of strings.

return self
elif is_period_dtype(dtype):
return self.to_period(freq=dtype.freq)
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

Just noticed... it'd be nice to leave a bunch of TODO: Use super for places like this.

Actually... I think Python2 will force us to make this changes when we switch inheritance to composition, since we won't be able to call the unbound method with a DatetimeIndex anymore (I think).

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

Py2 failure: https://travis-ci.org/pandas-dev/pandas/jobs/473068039#L2035. Also have some linting failures.

I'll assume you have other things to work on @jbrockmendel, and continue to push changes here if that's OK.

TomAugspurger added some commits Dec 28, 2018

Change default to str
This makes the default na repr match the expected type
for the formatter.
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

e29d898 hopefully fixes the Py2 failure. At some point, the type for the na_rep parameter of the datetimelike _format_native_types became unicode. On the base class it's str (on whatever version of Python), and the underlying formatter seems to expect str (again, on whatever version of Python).

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

lgtm. assume you just rebased? and if you want to add TODO have at it. ping on green.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

By "just" rebased do you mean recently? If so I think the last one was
207ffb9.

I'll merge master again when I fix this last py2 error.

If you meant "is rebasing the only thing you did?" then no, there are some real changes here.


It looks like the period tests aren't happy with my changes to format arrays. On 0.23.4 we were inconsistent

In [5]: type(pd.period_range('2000', periods=2).astype(str)[0])
Out[5]: numpy.unicode_

In [6]: type(pd.date_range('2000', periods=2).astype(str)[0])
Out[6]: str

I'll revert the period change here and open an issue.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

Merged master and fixed the py2 issue (
a3c42f0).

I decided not to open an issue because the behavior is correct for Python 3 and we're not going to change this for 0.24, so who cares :)

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 28, 2018

and continue to push changes here if that's OK.

Extremely happy to hand this one off, thanks for figuring it out.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

ok lgtm. ping on green.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

All green.

@jreback jreback merged commit f4f37f4 into pandas-dev:master Dec 28, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181228.61 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 28, 2018

thanks! nice @jbrockmendel and @TomAugspurger

with pytest.raises(TypeError, match=msg):
idx.astype(dtype)

@pytest.mark.parametrize('tz', [None, 'US/Central'])

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 28, 2018

Contributor

Whoops. This should probably be in indexes/datetimes/test_astype.py. Or rather, we should be using timedelta_range here.

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.