REF: IntervalIndex[IntervalArray] #20611

Merged: 24 commits, Jul 13, 2018

22 changes: 17 additions & 5 deletions doc/source/basics.rst
@@ -1925,11 +1925,23 @@ untouched. If the data is modified, it is because you did so explicitly.
dtypes
------

The main types stored in pandas objects are ``float``, ``int``, ``bool``,
``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
``category`` and ``object``. In addition these dtypes have item sizes, e.g.
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
for more detail on ``datetime64[ns, tz]`` dtypes.
For the most part, pandas uses NumPy arrays and dtypes for Series or individual
columns of a DataFrame. The main types allowed in pandas objects are ``float``,
``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
timezone-aware datetimes).

In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
NumPy's type system for a few cases.

* :ref:`Categorical <categorical>`
* :ref:`Datetime with Timezone <timeseries.timezone_series>`
* Interval

Review comment (Member):
also Periods ?

Review comment (Contributor):
do we have a ref for this? (maybe should create one if not)?


Pandas uses the ``object`` dtype for storing strings.

Finally, arbitrary objects may be stored using the ``object`` dtype, but should
be avoided to the extent possible (for performance and interoperability with
other libraries and methods). See :ref:`basics.object_conversion`.

A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
with the data type of each column.
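
As a rough illustration of the paragraph above, the following sketch builds a DataFrame with one column per kind of dtype and inspects ``DataFrame.dtypes``. The column names are invented for the example, and it assumes a reasonably recent pandas in which interval data in a column keeps an interval extension dtype.

```python
import pandas as pd

# Hypothetical column names; each column exercises one family of dtypes.
df = pd.DataFrame({
    "floats": [1.0, 2.0, 3.0],
    "ints": [1, 2, 3],
    "bools": [True, False, True],
    "naive": pd.date_range("2018-01-01", periods=3),            # NumPy datetime64
    "aware": pd.date_range("2018-01-01", periods=3, tz="UTC"),  # pandas extension
    "cats": pd.Categorical(["a", "b", "a"]),                    # pandas extension
    "intervals": pd.interval_range(0, 3),                       # pandas extension
    "objects": [object(), object(), object()],                  # arbitrary objects
})

# ``dtypes`` is a Series indexed by column name.
print(df.dtypes)
```

The timezone-aware, categorical, and interval columns report pandas extension dtypes, while the remaining columns report plain NumPy dtypes.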
71 changes: 71 additions & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -299,6 +299,41 @@ Supplying a ``CategoricalDtype`` will make the categories in each column consist
   df['A'].dtype
   df['B'].dtype

.. _whatsnew_023.enhancements.interval:

Review comment (Member):
_whatsnew_023. --> _whatsnew_0230. and likewise on line 337.


Storing Interval Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Interval data may now be stored in a Series or DataFrame, in addition to an

Review comment (Contributor):
This is not very clear, as it's not new. This works now; it is just inefficient.

:class:`IntervalIndex` like before.

.. ipython:: python

   ser = pd.Series(pd.interval_range(0, 5))
   ser
   ser.dtype

Previously, these would be cast to a NumPy array of Interval objects. In general,
the new storage should result in better performance when storing an array of
intervals in a Series.

Note that the ``.values`` of a Series containing intervals is no longer a NumPy
array. Rather, it's an ``ExtensionArray``, composed of two arrays ``left`` and

Review comment (Member):
IntervalArray instead of ExtensionArray ?

``right``.

.. ipython:: python

   ser.values

To recover the NumPy array of Interval objects, use :func:`numpy.asarray`:

Review comment (Contributor):
I don't think you should show this (the recovering)


.. ipython:: python

   np.asarray(ser.values)

This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0230.api_breaking.interval_values` for more.

.. _whatsnew_023.enhancements.extension:

Extending Pandas with Custom Types
@@ -479,6 +514,42 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use
'Taxes': -200,
'Net result': 300}).sort_index()

.. _whatsnew_0230.api_breaking.interval_values:

``IntervalIndex.values`` is now an ``IntervalArray``

Review comment (Contributor):
I'd just put a single sentence for it. It's not that big of a deal that this changed.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``.values`` attribute of an :class:`IntervalIndex` now returns an
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects.

Previous Behavior:

.. code-block:: ipython

   In [1]: idx = pd.interval_range(0, 4)

   In [2]: idx.values
   Out[2]:
   array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
          Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
         dtype=object)

New Behavior:

.. ipython:: python

   idx = pd.interval_range(0, 4)
   idx.values

This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.

For situations where you need an ``ndarray`` of Interval objects, use
:func:`numpy.asarray` or ``idx.values.astype(object)``.

.. ipython:: python

   idx.values.astype(object)
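
The two conversion routes named above, and the parallel with ``CategoricalIndex``, can be compared side by side. This is a sketch under the assumption of a pandas version that exposes ``pd.arrays.IntervalArray``; ``cat_idx`` is a made-up example index.

```python
import numpy as np
import pandas as pd

idx = pd.interval_range(0, 4)

# New behavior: ``.values`` is an IntervalArray.
values = idx.values

# Two ways to get an object-dtype ndarray of Interval objects.
via_asarray = np.asarray(idx)
via_astype = idx.values.astype(object)

# The analogous case for categorical data: ``.values`` is a Categorical.
cat_idx = pd.CategoricalIndex(["a", "b", "a"])

print(type(values).__name__, type(cat_idx.values).__name__)
print(via_asarray.dtype, via_astype.dtype)
```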

.. _whatsnew_0230.api_breaking.deprecate_panel:

Deprecate Panel
12 changes: 10 additions & 2 deletions pandas/core/arrays/__init__.py
@@ -1,2 +1,10 @@
from .base import ExtensionArray # noqa
from .categorical import Categorical # noqa
from .base import ExtensionArray
from .categorical import Categorical
from .interval import IntervalArray


__all__ = [

Review comment (Contributor):
You don't need the `__all__`.

Review comment (Contributor Author):
That's so the #noqa aren't needed.

Review comment (Contributor):
We don't do this anywhere else. There is no need to introduce new ways of doing
things unless you are fixing it everywhere.

    'Categorical',
    'ExtensionArray',
    'IntervalArray',
]
3 changes: 3 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -19,6 +19,7 @@
    _ensure_int64,
    _ensure_object,
    _ensure_platform_int,
    is_extension_array_dtype,
    is_dtype_equal,
    is_datetimelike,
    is_datetime64_dtype,
@@ -1218,6 +1219,8 @@ def __array__(self, dtype=None):
        ret = take_1d(self.categories.values, self._codes)
        if dtype and not is_dtype_equal(dtype, self.categories.dtype):
            return np.asarray(ret, dtype)
        if is_extension_array_dtype(ret):

Review comment (Contributor Author):
`__array__` has to return an ndarray. Without this, `Categorical[ExtensionArray]` would fail, as `take_1d(...)` would be an ExtensionArray.

Review comment (Contributor):
comment this.

Review comment (Contributor):
I have an update on this section already in intna

            ret = np.asarray(ret)
        return ret

    def __setstate__(self, state):
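
The effect of the guard discussed in this hunk can be observed from the outside. The sketch below is not the PR's code; it only checks the resulting behavior on a ``Categorical`` whose categories are intervals, built here with ``pd.cut`` as a convenient (assumed) way to obtain one.

```python
import numpy as np
import pandas as pd

# pd.cut returns a Categorical whose categories form an IntervalIndex,
# i.e. a Categorical backed by extension-typed categories.
cat = pd.cut([0.5, 1.5, 2.5], bins=[0, 1, 2, 3])

# np.asarray goes through Categorical.__array__, which must hand back
# a real ndarray even though the categories are not NumPy-typed.
result = np.asarray(cat)

print(type(cat.categories).__name__)
print(type(result).__name__, result.dtype)
```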