REF: Make PeriodArray an ExtensionArray #22862

Merged: 149 commits, Oct 25, 2018
eaadcbc
WIP: PeriodArray
TomAugspurger Sep 26, 2018
a05928a
WIP
TomAugspurger Sep 27, 2018
3c0d9ee
Just moves
TomAugspurger Sep 27, 2018
63fc3fa
PeriodArray.shift definition
TomAugspurger Sep 27, 2018
7d5d71c
_data type
TomAugspurger Sep 27, 2018
e5caac6
clean
TomAugspurger Sep 27, 2018
c194407
accessor wip
TomAugspurger Sep 27, 2018
eb4506b
some more wip
TomAugspurger Sep 27, 2018
1b9fd7a
tshift, shift
TomAugspurger Sep 28, 2018
0fa0ed1
Arithmetic
TomAugspurger Sep 28, 2018
3247ea8
repr changes
TomAugspurger Sep 28, 2018
c162cdd
wip
TomAugspurger Sep 28, 2018
611d378
freq setter
TomAugspurger Sep 28, 2018
fb2ff82
Added disabled ops
TomAugspurger Sep 28, 2018
25a380f
copy
TomAugspurger Sep 28, 2018
1b2c4ec
Support concat
TomAugspurger Sep 28, 2018
d04293e
object ctor
TomAugspurger Sep 28, 2018
eacad39
Updates
TomAugspurger Sep 28, 2018
70cd3b8
lint
TomAugspurger Sep 28, 2018
9b22889
lint
TomAugspurger Sep 28, 2018
87d289a
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 1, 2018
6369c7f
wip
TomAugspurger Oct 1, 2018
01551f0
more wip
TomAugspurger Oct 1, 2018
0437940
array-setitem
TomAugspurger Oct 1, 2018
42ab137
wip
TomAugspurger Oct 1, 2018
298390f
wip
TomAugspurger Oct 1, 2018
23e5cfc
Use ._tshift internally for datetimelike ops
TomAugspurger Oct 2, 2018
9d17fd2
deep
TomAugspurger Oct 2, 2018
959cd72
Squashed commit of the following:
TomAugspurger Oct 2, 2018
b66f617
Squashed commit of the following:
TomAugspurger Oct 2, 2018
5669675
fixup
TomAugspurger Oct 2, 2018
2c0311c
The rest of the EA tests
TomAugspurger Oct 2, 2018
012be1c
docs
TomAugspurger Oct 2, 2018
c3a96d0
Merge remote-tracking branch 'upstream/master' into datetimelike-tshift
TomAugspurger Oct 3, 2018
67faabc
rename to time_shift
TomAugspurger Oct 3, 2018
ff7c06c
Squashed commit of the following:
TomAugspurger Oct 3, 2018
c2d57bd
Squashed commit of the following:
TomAugspurger Oct 3, 2018
fbde770
Squashed commit of the following:
TomAugspurger Oct 3, 2018
1c4bbe7
Squashed commit of the following:
TomAugspurger Oct 3, 2018
b395c90
fixed merge conflict
TomAugspurger Oct 3, 2018
d68a5c5
Handle divmod test
TomAugspurger Oct 3, 2018
0c7b704
extension tests passing
TomAugspurger Oct 3, 2018
d26d3d2
Squashed commit of the following:
TomAugspurger Oct 4, 2018
e4babea
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 4, 2018
7f6c144
merge conflict
TomAugspurger Oct 4, 2018
b4aa4ca
wip
TomAugspurger Oct 4, 2018
6a70131
indexes passing
TomAugspurger Oct 4, 2018
9aa077c
op names
TomAugspurger Oct 4, 2018
411738c
extension, arrays passing
TomAugspurger Oct 4, 2018
8e0fb69
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 9, 2018
6d98e85
fixup
TomAugspurger Oct 9, 2018
6d9e150
lint
TomAugspurger Oct 9, 2018
4899479
Fixed to_timestamp
TomAugspurger Oct 9, 2018
634def1
Same error message for index, series
TomAugspurger Oct 9, 2018
1f18452
Fix freq handling in to_timestamp
TomAugspurger Oct 9, 2018
2f92b22
dtype update
TomAugspurger Oct 9, 2018
23f232c
accept kwargs
TomAugspurger Oct 9, 2018
dd3b8cd
fixups
TomAugspurger Oct 9, 2018
1a7c360
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 9, 2018
87ecb64
updates
TomAugspurger Oct 9, 2018
0bde329
explicit
TomAugspurger Oct 9, 2018
2d85a82
add to assert
TomAugspurger Oct 9, 2018
438e6b5
wip period_array
TomAugspurger Oct 10, 2018
a9456fd
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 10, 2018
ac05365
wip period_array
TomAugspurger Oct 10, 2018
36ed547
order
TomAugspurger Oct 10, 2018
4652ca7
sort order
TomAugspurger Oct 10, 2018
a047a1b
test for hashing
TomAugspurger Oct 10, 2018
a4a30d7
update
TomAugspurger Oct 10, 2018
1441ae6
lint
TomAugspurger Oct 10, 2018
8003808
boxing
TomAugspurger Oct 10, 2018
5f43753
fix fixtures
TomAugspurger Oct 10, 2018
1c13d0f
infer
TomAugspurger Oct 10, 2018
bae6b3d
Remove seemingly unreachable code
TomAugspurger Oct 10, 2018
f422cf0
lint
TomAugspurger Oct 10, 2018
0229d74
wip
TomAugspurger Oct 12, 2018
aa40cf4
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 12, 2018
29085e1
Updates for master
TomAugspurger Oct 12, 2018
00ffddf
simplify
TomAugspurger Oct 12, 2018
e81fa9c
wip
TomAugspurger Oct 12, 2018
0c8925f
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 15, 2018
96204a1
remove view
TomAugspurger Oct 15, 2018
82930f7
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 17, 2018
8d24582
simplify
TomAugspurger Oct 17, 2018
1fc7744
lint
TomAugspurger Oct 17, 2018
6cd428c
Removed add_comparison_methods
TomAugspurger Oct 17, 2018
21693e0
xfail op
TomAugspurger Oct 17, 2018
b65ffad
remove some
TomAugspurger Oct 17, 2018
1f438e3
constructors
TomAugspurger Oct 17, 2018
f3928fb
Constructor cleanup
TomAugspurger Oct 17, 2018
089f8ab
misc fixups
TomAugspurger Oct 17, 2018
700650a
more xfails
TomAugspurger Oct 17, 2018
452c229
typo
TomAugspurger Oct 17, 2018
e3e0e57
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
78751c2
Added asi8
TomAugspurger Oct 18, 2018
203d561
Allow setting nan
TomAugspurger Oct 18, 2018
eb1c67d
revert breaking docs
TomAugspurger Oct 18, 2018
e08aa79
Override _add_sub_int_array
TomAugspurger Oct 18, 2018
c1ee04b
lint
TomAugspurger Oct 18, 2018
827e563
Update PeriodIndex._simple_new
TomAugspurger Oct 18, 2018
ca4a7fd
Clean up uses of .values, ._values, ._ndarray_values, ._data
TomAugspurger Oct 18, 2018
ed185c0
one more values
TomAugspurger Oct 18, 2018
b3407ac
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
a4011eb
remove xfails
TomAugspurger Oct 18, 2018
fc1ca3c
Fixed freq handling in _shallow_copy with a freq
TomAugspurger Oct 18, 2018
1b1841f
test updates
TomAugspurger Oct 18, 2018
b3b315a
API: Keep PeriodIndex.values an ndarray
TomAugspurger Oct 18, 2018
3ab4176
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
8102475
BUG: Raise for non-equal freq in take
TomAugspurger Oct 18, 2018
8c329eb
Punt on DataFrame.replace specializing
TomAugspurger Oct 18, 2018
78d4960
lint
TomAugspurger Oct 18, 2018
4e3d914
fixed xfail message
TomAugspurger Oct 18, 2018
5e4aaa7
TST: _from_datetime64
TomAugspurger Oct 19, 2018
7f77563
Fixups
TomAugspurger Oct 19, 2018
f88d6f7
escape
TomAugspurger Oct 19, 2018
7aa78ba
dtype
TomAugspurger Oct 19, 2018
2d737f8
revert and unxfail values
TomAugspurger Oct 19, 2018
833899a
error catching
TomAugspurger Oct 19, 2018
236b49c
isort
TomAugspurger Oct 19, 2018
8230347
Avoid PeriodArray.values
TomAugspurger Oct 19, 2018
bf33a57
clarify _box_func usage
TomAugspurger Oct 19, 2018
738acfe
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 19, 2018
032ec02
TST: unxfail ops tests
TomAugspurger Oct 19, 2018
77e389a
Avoid use of .values
jorisvandenbossche Oct 19, 2018
61031d7
__setitem__ type
TomAugspurger Oct 19, 2018
a094b3d
Misc cleanups
TomAugspurger Oct 19, 2018
ace4856
lint
TomAugspurger Oct 19, 2018
fc6a1c7
API: remove ordinal from period_array
TomAugspurger Oct 19, 2018
900afcf
catch exception
TomAugspurger Oct 19, 2018
0baa3e9
misc cleanup
TomAugspurger Oct 19, 2018
f95106e
Handle astype integer size
TomAugspurger Oct 19, 2018
e57e24a
Bump test coverage
TomAugspurger Oct 19, 2018
ce1c970
remove partial test
TomAugspurger Oct 19, 2018
a7e1216
close bracket
TomAugspurger Oct 19, 2018
2548d6a
change the test
TomAugspurger Oct 19, 2018
02e3863
isort
TomAugspurger Oct 19, 2018
1997cff
consistent _data
TomAugspurger Oct 19, 2018
af2d1de
lint
TomAugspurger Oct 19, 2018
64f5778
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 20, 2018
4151510
ndarray_values -> asi8
TomAugspurger Oct 20, 2018
ac9bd41
colocate ops
TomAugspurger Oct 20, 2018
5462bd7
refactor PeriodIndex.item
TomAugspurger Oct 20, 2018
c1c6428
return NotImplemented for Series / Index
TomAugspurger Oct 20, 2018
7ab2736
remove xpass
TomAugspurger Oct 20, 2018
bd6f966
release note
TomAugspurger Oct 22, 2018
8068daf
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 23, 2018
5691506
types, use data
TomAugspurger Oct 23, 2018
575d61a
remove ufunc xpass
TomAugspurger Oct 24, 2018
4065bdb
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 25, 2018
29 changes: 19 additions & 10 deletions doc/source/whatsnew/v0.24.0.txt
@@ -145,33 +145,41 @@ Current Behavior:

.. _whatsnew_0240.enhancements.interval:

Storing Interval Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Storing Interval and Period Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Interval data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
:class:`IntervalIndex` like previously (:issue:`19453`).
Interval and Period data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
:class:`IntervalIndex` and :class:`PeriodIndex` like previously (:issue:`19453`, :issue:`22862`).

.. ipython:: python

ser = pd.Series(pd.interval_range(0, 5))
ser
ser.dtype

Previously, these would be cast to a NumPy array of ``Interval`` objects. In general,
this should result in better performance when storing an array of intervals in
a :class:`Series`.
And for periods:

.. ipython:: python

pser = pd.Series(pd.period_range("2000", freq="D", periods=5))
pser
pser.dtype

Previously, these would be cast to a NumPy array with object dtype. In general,
this should result in better performance when storing an array of intervals or periods
in a :class:`Series` or column of a :class:`DataFrame`.

Note that the ``.values`` of a ``Series`` containing intervals is no longer a NumPy
Note that the ``.values`` of a ``Series`` containing one of these types is no longer a NumPy
array, but rather an ``ExtensionArray``:

.. ipython:: python

ser.values
pser.values

This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.


.. _whatsnew_0240.enhancements.other:

Other Enhancements
@@ -360,7 +368,7 @@ New Behavior:
This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.

For situations where you need an ``ndarray`` of ``Interval`` objects, use
:meth:`numpy.asarray` or ``idx.astype(object)``.
:meth:`numpy.asarray`.

.. ipython:: python

@@ -810,6 +818,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`)
- :func:`ExtensionArray.isna` is allowed to return an ``ExtensionArray`` (:issue:`22325`).
- Support for reduction operations such as ``sum``, ``mean`` via opt-in base class method override (:issue:`22762`)
- :meth:`Series.unstack` no longer converts extension arrays to object-dtype ndarrays. The output ``DataFrame`` will now have the same dtype as the input. This changes behavior for Categorical and Sparse data (:issue:`23077`).

.. _whatsnew_0240.api.incompatibilities:

2 changes: 1 addition & 1 deletion pandas/core/arrays/__init__.py
@@ -4,7 +4,7 @@
from .categorical import Categorical # noqa
from .datetimes import DatetimeArrayMixin # noqa
from .interval import IntervalArray # noqa
from .period import PeriodArrayMixin # noqa
from .period import PeriodArray, period_array # noqa
from .timedeltas import TimedeltaArrayMixin # noqa
from .integer import ( # noqa
IntegerArray, integer_array)
23 changes: 19 additions & 4 deletions pandas/core/arrays/categorical.py
@@ -29,6 +29,7 @@
is_categorical_dtype,
is_float_dtype,
is_integer_dtype,
is_object_dtype,
is_list_like, is_sequence,
is_scalar, is_iterator,
is_dict_like)
@@ -342,7 +343,6 @@ def __init__(self, values, categories=None, ordered=None, dtype=None,
# a.) use categories, ordered
# b.) use values.dtype
# c.) infer from values

if dtype is not None:
# The dtype argument takes precedence over values.dtype (if any)
if isinstance(dtype, compat.string_types):
@@ -2478,11 +2478,26 @@ def _get_codes_for_values(values, categories):
utility routine to turn values into codes given the specified categories
"""
from pandas.core.algorithms import _get_data_algo, _hashtables
if is_dtype_equal(values.dtype, categories.dtype):
dtype_equal = is_dtype_equal(values.dtype, categories.dtype)

if dtype_equal:
# To prevent erroneous dtype coercion in _get_data_algo, retrieve
# the underlying numpy array. gh-22702
values = getattr(values, 'values', values)
categories = getattr(categories, 'values', categories)
values = getattr(values, '_ndarray_values', values)
categories = getattr(categories, '_ndarray_values', categories)
elif (is_extension_array_dtype(categories.dtype) and
is_object_dtype(values)):
# Support inferring the correct extension dtype from an array of
Contributor

I would make this a function, I think, as we are likely to do this in other places as well (or maybe already do), e.g. in CSV parsing with category?

Contributor

This is basically what categorical_array (a generic constructor for Categoricals) should be doing anyhow.

Contributor Author

This is called from the categorical constructor.

Ideally we would have a single place that does all this. Right now it feels scattered over _sanitize_array, the Index constructor, and probably a few other places.

# scalar objects. e.g.
# Categorical(array[Period, Period], categories=PeriodIndex(...))
try:
values = (
categories.dtype.construct_array_type()._from_sequence(values)
)
except Exception:
# but that may fail for any reason, so fall back to object
values = ensure_object(values)
categories = ensure_object(categories)
else:
values = ensure_object(values)
categories = ensure_object(categories)
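
The try-then-fall-back branch in this hunk can be sketched without pandas. This is only an illustration, not the pandas implementation: `codes_for_values` and its `parse` argument are hypothetical names, and `parse` stands in for `categories.dtype.construct_array_type()._from_sequence`. ISO date strings play the role of the loosely typed object values.

```python
from datetime import date


def codes_for_values(values, categories, parse=None):
    """Return the position of each value in `categories`, or -1 if absent.

    `parse` is a hypothetical stand-in for the extension array's
    `_from_sequence`: it tries to convert loosely typed values into
    the same type as the categories.
    """
    if parse is not None:
        try:
            values = [parse(v) for v in values]
        except Exception:
            # Conversion may fail for any reason, so keep the raw objects,
            # mirroring the ensure_object fallback in the hunk.
            pass
    lookup = {cat: code for code, cat in enumerate(categories)}
    return [lookup.get(v, -1) for v in values]


categories = [date(2000, 1, 1), date(2000, 1, 2)]

# ISO strings are converted to dates before the lookup.
print(codes_for_values(["2000-01-02", "2000-01-01"], categories,
                       parse=date.fromisoformat))  # [1, 0]

# One unparseable value triggers the fallback, so no raw string matches.
print(codes_for_values(["2000-01-02", "not a date"], categories,
                       parse=date.fromisoformat))  # [-1, -1]
```

The second call shows the cost of the broad `except`: a single bad element silently downgrades the whole input to object comparison, which is why the review thread asks for one shared place to do this inference.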
18 changes: 4 additions & 14 deletions pandas/core/arrays/datetimelike.py
@@ -474,17 +474,8 @@ def _addsub_int_array(self, other, op):
result : same class as self
"""
assert op in [operator.add, operator.sub]
if is_period_dtype(self):
# easy case for PeriodIndex
if op is operator.sub:
other = -other
res_values = checked_add_with_arr(self.asi8, other,
arr_mask=self._isnan)
res_values = res_values.view('i8')
res_values[self._isnan] = iNaT
return self._from_ordinals(res_values, freq=self.freq)

elif self.freq is None:

if self.freq is None:
# GH#19123
raise NullFrequencyError("Cannot shift with no freq")

@@ -524,10 +515,9 @@ def _addsub_offset_array(self, other, op):
left = lib.values_from_object(self.astype('O'))

res_values = op(left, np.array(other))
kwargs = {}
if not is_period_dtype(self):
kwargs['freq'] = 'infer'
return type(self)(res_values, **kwargs)
return type(self)(res_values, freq='infer')
return self._from_sequence(res_values)

@deprecate_kwarg(old_arg_name='n', new_arg_name='periods')
def shift(self, periods, freq=None):
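
For reference, the period-specific branch that this diff drops from `_addsub_int_array` amounts to integer arithmetic on ordinals with missing values preserved. A minimal pure-Python sketch follows, assuming plain lists instead of ndarrays; `addsub_int`, `INAT`, and `INT64_MAX` are illustrative names, with `INAT` standing in for pandas' `iNaT` sentinel.

```python
import operator

INAT = -2**63          # stand-in for pandas' iNaT sentinel (minimum int64)
INT64_MAX = 2**63 - 1


def addsub_int(ordinals, other, op):
    """Add or subtract integers from period ordinals, keeping missing values."""
    assert op in (operator.add, operator.sub)
    result = []
    for ordinal, step in zip(ordinals, other):
        if ordinal == INAT:
            # Missing periods stay missing regardless of the offset.
            result.append(INAT)
            continue
        value = op(ordinal, step)
        if not INAT < value <= INT64_MAX:
            # Python ints do not overflow, so check the int64 range by hand.
            raise OverflowError("result does not fit in int64")
        result.append(value)
    return result


shifted = addsub_int([0, 1, INAT], [12, 12, 12], operator.add)
print(shifted[:2], shifted[2] == INAT)  # [12, 13] True
```

The masked entries are skipped before the range check, so a missing value can never raise a spurious overflow.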
4 changes: 2 additions & 2 deletions pandas/core/arrays/datetimes.py
@@ -832,7 +832,7 @@ def to_period(self, freq=None):
pandas.PeriodIndex: Immutable ndarray holding ordinal values
pandas.DatetimeIndex.to_pydatetime: Return DatetimeIndex as object
"""
from pandas.core.arrays import PeriodArrayMixin
from pandas.core.arrays import PeriodArray

if self.tz is not None:
warnings.warn("Converting to PeriodArray/Index representation "
@@ -847,7 +847,7 @@ def to_period(self, freq=None):

freq = get_period_alias(freq)

return PeriodArrayMixin(self.values, freq=freq)
return PeriodArray._from_datetime64(self.values, freq, tz=self.tz)

def to_perioddelta(self, freq):
"""
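
As a rough illustration of what converting datetimes to period ordinals involves, here is a stdlib-only sketch for monthly frequency. It assumes that monthly ordinals count whole months since January 1970; `month_ordinal` and `to_month_periods` are hypothetical helpers, not part of pandas, and other frequencies and time zones are ignored.

```python
from datetime import datetime


def month_ordinal(ts):
    """Number of whole months between January 1970 and the month of `ts`."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def to_month_periods(timestamps):
    """Map each timestamp to the ordinal of the month that contains it."""
    return [month_ordinal(ts) for ts in timestamps]


stamps = [
    datetime(1970, 1, 15),
    datetime(1970, 2, 1),
    datetime(2000, 1, 31, 23, 59),
]
print(to_month_periods(stamps))  # [0, 1, 360]
```

Every timestamp inside the same month maps to the same ordinal, which is the sense in which the period array holds integer ordinals plus a frequency rather than the timestamps themselves.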