Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index #23524

Merged
merged 17 commits into from Nov 11, 2018

Conversation

Projects
None yet
5 participants
@jbrockmendel
Copy link
Member

commented Nov 6, 2018

Also fixes bug with DateOffset == "infer" incorrectly raising instead of returning False.

Also fixes bug(?) with pd.Index(dtindex, dtype=object) returning an index of datetimes instead of Timestamps, potentially losing nanoseconds.

  • closes #23491
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@pep8speaks

This comment has been minimized.

Copy link

commented Nov 6, 2018

Hello @jbrockmendel! Thanks for submitting the PR.

Show resolved Hide resolved pandas/core/indexes/base.py Outdated
@codecov

This comment has been minimized.

Copy link

commented Nov 6, 2018

Codecov Report

Merging #23524 into master will decrease coverage by <.01%.
The diff coverage is 91.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23524      +/-   ##
==========================================
- Coverage   92.25%   92.25%   -0.01%     
==========================================
  Files         161      161              
  Lines       51262    51278      +16     
==========================================
+ Hits        47292    47304      +12     
- Misses       3970     3974       +4
Flag Coverage Δ
#multiple 90.63% <91.3%> (-0.01%) ⬇️
#single 42.32% <43.47%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 98.44% <100%> (+0.02%) ⬆️
pandas/core/indexes/base.py 96.45% <100%> (ø) ⬆️
pandas/tseries/offsets.py 97.07% <85.71%> (-0.15%) ⬇️
pandas/io/pytables.py 92.34% <0%> (-0.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efd1844...cc7f5cd. Read the comment docs.

Show resolved Hide resolved pandas/core/arrays/datetimes.py
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

Also fixes bug(?) with pd.Index(dtindex, dtype=object) returning an index of datetimes instead of Timestamps, potentially losing nanoseconds.

Seems like a fine change, but needs a release note.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 6, 2018

Seems like a fine change, but needs a release note.

Good point; done.

Show resolved Hide resolved doc/source/whatsnew/v0.24.0.txt Outdated
Show resolved Hide resolved pandas/_libs/tslibs/offsets.pyx
Show resolved Hide resolved pandas/core/arrays/datetimes.py
Show resolved Hide resolved pandas/core/indexes/base.py Outdated
Show resolved Hide resolved pandas/tseries/offsets.py
Show resolved Hide resolved doc/source/whatsnew/v0.24.0.txt Outdated
Show resolved Hide resolved pandas/core/arrays/datetimes.py
Show resolved Hide resolved pandas/core/indexes/base.py Outdated

@jreback jreback added this to the 0.24.0 milestone Nov 8, 2018

@jorisvandenbossche
Copy link
Member

left a comment

Small comment


# TODO: warn that dtype is not used?
# warn that conversion may be lossy?
return self._data.view(np.ndarray) # follow Index.__array__

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 8, 2018

Member

Other question: what is the .view(np.ndarray) part doing if it is already an array? Can we remove it?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 8, 2018

Author Member

It can probably be removed; this is taken directly from the Index.__array__ implementation, so I think the maybe-removing this should be done at the same time those methods are overhauled (ill be opening an Issue shortly)

@@ -57,6 +57,54 @@ def timedelta_index(request):

class TestDatetimeArray(object):

def test_array_object_dtype(self, tz_naive_fixture):

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 8, 2018

Member

It think it would be good to add such a test to the base extension tests as well?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 8, 2018

Author Member

I don't know those tests well enough to have an informed opinion. AFAIK ExtensionArray doesn't implement __array__, so it isn't clear that this is supported in the general case.

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Nov 8, 2018

Contributor

EA implements __iter__, which should be sufficient.

This test would be slightly opinionated for a base test, in case an EA wants to be converted to a specific NumPy type, but I think it's OK.

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Nov 8, 2018

Contributor

I guess we have base.interface.BaseInterfaceTests.test_array_interface which checks

    def test_array_interface(self, data):
        result = np.array(data)
        assert result[0] == data[0]

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 9, 2018

Member

Ah, yes, that's already a generic test. OK, since that does not actually test the return dtype, it's good to have more explicit tests here.

Should we expect from EA that np.array(EA, dtype=object) always works (returns an object array of scalars)?
That seems like an OK assumption to me, since this already happens if you don't implement __array__, so we can expect this as well if the EA author implements a custom __array__ I think.

Show resolved Hide resolved pandas/core/arrays/datetimes.py Outdated
if is_object_dtype(dtype):
return np.array(list(self), dtype=object)
elif is_int64_dtype(dtype):
return self.asi8

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 9, 2018

Member

I think we can remove this elif branch. Numpy will afterwards convert the M8[ns] data to int, and in that way ensure the semantics of np.asarray regarding copy/no copy is followed.

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Nov 9, 2018

Member

@jbrockmendel I opened #23593, a PR doing the __array__ for all datetimelike EAs, not only DatetimeArray (but so there is a bit of overlap with this PR)

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 9, 2018

Author Member

Great, I'll take a look at 23593.

jbrockmendel added some commits Nov 9, 2018

@jreback jreback merged commit 58a59bd into pandas-dev:master Nov 11, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181109.29 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 11, 2018

thanks @jbrockmendel

#23593 is the followup to address __array__.

@jbrockmendel jbrockmendel deleted the jbrockmendel:index_bugs branch Nov 11, 2018

thoo added a commit to thoo/pandas that referenced this pull request Nov 11, 2018

Merge remote-tracking branch 'upstream/master' into to_html-to_string
* upstream/master:
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)

thoo added a commit to thoo/pandas that referenced this pull request Nov 12, 2018

Merge remote-tracking branch 'upstream/master' into io_csv_docstring_…
…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.