Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: consistent __array__ for datetime-like ExtensionArrays #23593

Merged
merged 18 commits into from
Jan 5, 2019

Conversation

jorisvandenbossche
Copy link
Member

closes #23569

Already makes the __array__ for the datetimelike EAs consistent with each other and with Series/Index (for the default case of not specifying a dtype)

@pep8speaks
Copy link

pep8speaks commented Nov 9, 2018

Hello @jorisvandenbossche! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 05, 2019 at 20:26 Hours UTC

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 9, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Nov 9, 2018
@TomAugspurger
Copy link
Contributor

isort failure:

Check import format using isort
ERROR: /home/travis/build/pandas-dev/pandas/pandas/tests/extension/base/interface.py Imports are incorrectly sorted.

@jorisvandenbossche
Copy link
Member Author

Yeah, I will still need to get used to it :-)
Do you have it as a commit hook?

@TomAugspurger
Copy link
Contributor

I have for dask-ml. I'm experimenting with it in pandas today.

  1. install pre-commit
  2. paste this to .pre-commit-config.yaml
-   repo: https://github.com/pre-commit/mirrors-isort
    rev: 'f35773e46d096de5c45365f1a47eeeef36fc83ed'  # Use the revision sha / tag you want to point at
    hooks:
    -   id: isort

-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.0.0  # Use the ref you want to point at
    hooks:
    -   id: flake8
  1. run pre-commit install

I'll open an issue suggesting that for the docs if it works well.

@codecov
Copy link

codecov bot commented Nov 9, 2018

Codecov Report

Merging #23593 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23593      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52379    52377       -2     
==========================================
  Hits        48386    48386              
+ Misses       3993     3991       -2
Flag Coverage Δ
#multiple 90.8% <100%> (ø) ⬆️
#single 43.01% <77.77%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 88.24% <ø> (+0.05%) ⬆️
pandas/core/arrays/datetimelike.py 97.86% <100%> (+0.01%) ⬆️
pandas/core/arrays/period.py 98.54% <100%> (ø) ⬆️
pandas/core/arrays/datetimes.py 97.84% <100%> (-0.01%) ⬇️
pandas/util/testing.py 88.09% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f074abe...4120586. Read the comment docs.

@@ -126,6 +126,24 @@ def test_int_properties(self, datetime_index, propname):

tm.assert_numpy_array_equal(result, expected)

def test_array(self, datetime_index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you're at it, can you make sure there are analogous tests for the DTI/TDI/PI index classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I only have to see then a bit how that overlaps with your PR then

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there are any existing tests on this somewhere (for the index classes)? Not directly finding something

@jbrockmendel
Copy link
Member

jbrockmendel commented Nov 9, 2018

Implement view while at it?
Wrong button, woops

@jbrockmendel jbrockmendel reopened this Nov 9, 2018
@jorisvandenbossche
Copy link
Member Author

Implement view while at it?

On EAs? Is there a reason to do that?

@jbrockmendel
Copy link
Member

On EAs? Is there a reason to do that?

IIRC from other branches we'll need to do it eventually. Since I don't have the precise reason on hand, it's reasonable to wait until actually needed.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 9, 2018

From #23569 (comment)

You questioned whether we should implement np.array(period_array, dtype=int). You haven't done that right?

I'm not really clear what's best here. Do we publicly document exactly what an ordinal is, beyond just saying that it's an integer (though maybe that's enough justify making int)?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

@jbrockmendel can you rebase this.

@TomAugspurger
Copy link
Contributor

This one is from Joris :)

@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

ok then @jorisvandenbossche lgtm.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

@jorisvandenbossche can you rebase

@jorisvandenbossche
Copy link
Member Author

Fixed the linting error.

But, we still need to decide on #23593 (comment): should np.array(period_array, dtype='int') work?
PeriodIndex currently does this (but also without making a copy, which I think we should change to conform to numpy behaviour).

I'm not really clear what's best here. Do we publicly document exactly what an ordinal is, beyond just saying that it's an integer (though maybe that's enough justify making int)?

I don't really care that much. If "power users" want to work with the integers, I think the asi8 attribute is more explicit to use? (and is already public)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 23, 2018 via email

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

yeah this is ok

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

need to merge master on this

@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

@jorisvandenbossche @TomAugspurger @jbrockmendel should rebase this and see where things are.

@jreback
Copy link
Contributor

jreback commented Jan 4, 2019

@jorisvandenbossche can you merge master

@jorisvandenbossche
Copy link
Member Author

@jreback any other comments?

@TomAugspurger fine to merge this? There is some overlap with your other PR on __array__, but also still some content that is not in there.

@TomAugspurger
Copy link
Contributor

Yep, this is fine. I'd prefer this goes in first.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comments

pandas/tests/extension/base/interface.py Show resolved Hide resolved
pandas/core/arrays/period.py Show resolved Hide resolved
# used for Timedelta/DatetimeArray, overwritten by PeriodArray
if is_object_dtype(dtype):
return np.array(list(self), dtype=object)
return self._data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would raise if dtype not in [None, object]

@jorisvandenbossche
Copy link
Member Author

I would raise if dtype not in [None, object]

Why? Eg for Datetime/TimedeltaArray, now np.asarray(.., dtype='M8[ns] / 'int64') now works as expected.
In case of PeriodArray, numpy will already raise an error if it cannot convert to the specified dtype.

@jreback
Copy link
Contributor

jreback commented Jan 4, 2019

Why? Eg for Datetime/TimedeltaArray, now np.asarray(.., dtype='M8[ns] / 'int64') now works as expected.
In case of PeriodArray, numpy will already raise an error if it cannot convert to the specified dtype.

outside of the specified dtypes, you must raise for a dtype if you are not passing to np.array() which I don't think you are.

but this is really easy to answer. provide tests that use non-valid dtypes and show that it raises.

@jorisvandenbossche
Copy link
Member Author

provide tests that use non-valid dtypes and show that it raises.

This is already tested for PeriodArray: https://github.com/pandas-dev/pandas/pull/23593/files#diff-fff2a1331dd7864425d7581505b95e58R622

@TomAugspurger
Copy link
Contributor

but this is really easy to answer. provide tests that use non-valid dtypes and show that it raises.

And that can't be a base test since we don't know what is and isn't valid for subtypes.

Agreed with @jorisvandenbossche that NumPy should handle the exception raising if the values aren't valid for the requested dtype.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

can you merge master

@TomAugspurger
Copy link
Contributor

Merged master and

  • Updated DatetimeArray.__array__ to change the default dtype to object for DatetimeTZ
  • Fixed a lint error in the tests.

@jreback jreback merged commit 280a88f into pandas-dev:master Jan 5, 2019
@jreback
Copy link
Contributor

jreback commented Jan 5, 2019

thanks @jorisvandenbossche and @TomAugspurger

@jorisvandenbossche jorisvandenbossche deleted the ea-array-protocol branch January 6, 2019 21:08
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review/Overhaul __array__ and view methods
5 participants