API: added array #23581

Merged Dec 28, 2018 (57 commits)
Changes from 11 commits

Commits
bfefc96
added array
TomAugspurger Nov 8, 2018
51480a3
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 9, 2018
dcb7931
update registry test
TomAugspurger Nov 9, 2018
a635649
update doc examples
TomAugspurger Nov 9, 2018
fb0d8bc
wip
TomAugspurger Nov 9, 2018
d58a320
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 9, 2018
fe06de4
inference
TomAugspurger Nov 9, 2018
72f7f06
ia updates
TomAugspurger Nov 9, 2018
c02e183
test fixup
TomAugspurger Nov 10, 2018
a2d3146
isort
TomAugspurger Nov 10, 2018
37901b0
fixups
TomAugspurger Nov 10, 2018
4403010
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 12, 2018
9401dd3
wip
TomAugspurger Nov 12, 2018
838ce5e
dtype from ea
TomAugspurger Nov 12, 2018
5260b99
series, index tests
TomAugspurger Nov 12, 2018
248e9e0
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 12, 2018
cf07c80
added ndarray case
TomAugspurger Nov 12, 2018
22490a8
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 15, 2018
5e0dc62
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 17, 2018
fe40189
added test for a 2d array
TomAugspurger Nov 17, 2018
7eb9d08
TST: test for Series[EA]
TomAugspurger Nov 17, 2018
fa7b200
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 20, 2018
1ca14fe
Added test for period -> category
TomAugspurger Nov 20, 2018
4473899
copy
TomAugspurger Nov 20, 2018
382f57d
prefix for arrays
TomAugspurger Nov 20, 2018
dd76a2b
Added arrays
TomAugspurger Nov 20, 2018
159d3a2
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 21, 2018
5366950
update docstring
TomAugspurger Nov 21, 2018
c818a8f
docstring order
TomAugspurger Nov 21, 2018
ba8b807
Revert "docstring order"
TomAugspurger Nov 21, 2018
77cd782
Updates
TomAugspurger Nov 21, 2018
dfada7b
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 27, 2018
5eff701
Add docs for the types we infer
TomAugspurger Nov 27, 2018
9406400
API: disallow string alias for NumPy
TomAugspurger Nov 27, 2018
8eb07c3
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 28, 2018
ea3a118
Wrap long error message
TomAugspurger Nov 28, 2018
ecae340
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 29, 2018
fb814fc
updates
TomAugspurger Nov 29, 2018
a6f6d29
removed old test
TomAugspurger Nov 29, 2018
6c243f3
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Nov 29, 2018
86b81b5
formatting
TomAugspurger Nov 29, 2018
2c6cf97
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 8, 2018
50d4206
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 8, 2018
9e1b4e6
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 10, 2018
000967d
Raise on scalars
TomAugspurger Dec 10, 2018
bf829c3
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 11, 2018
faf114d
docs on raising
TomAugspurger Dec 11, 2018
3186ded
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 12, 2018
1c4da0e
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 28, 2018
36c6f00
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 28, 2018
932e119
Updates for PandasArray
TomAugspurger Dec 28, 2018
45d07eb
update docstring
TomAugspurger Dec 28, 2018
d1aba73
Updates
TomAugspurger Dec 28, 2018
981f735
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 28, 2018
1f3bb50
fixed test expected
TomAugspurger Dec 28, 2018
c8d3960
doc lint
TomAugspurger Dec 28, 2018
1b9e251
Merge remote-tracking branch 'upstream/master' into pd.array
TomAugspurger Dec 28, 2018
66 changes: 66 additions & 0 deletions doc/source/api.rst
@@ -702,6 +702,19 @@ strings and apply several methods to it. These can be accessed like
Series.dt
Index.str


.. _api.arrays:

Arrays
------

Pandas and third-party libraries can extend NumPy's type system (see :ref:`extending.extension-types`).

.. autosummary::
:toctree: generated/

array

.. _api.categorical:

Categorical
@@ -790,6 +803,56 @@ following usable methods and properties:
Series.cat.as_ordered
Series.cat.as_unordered

.. _api.arrays.integerna:

Integer-NA
~~~~~~~~~~

:class:`arrays.IntegerArray` can hold integer data, potentially with missing
values.

.. autosummary::
:toctree: generated/

IntegerArray
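
As a rough illustration of why a nullable integer array is useful, the sketch below compares default inference with the ``Int64`` extension dtype. It assumes a pandas release that ships the nullable integer dtypes (0.24 or later); the variable names are illustrative only.

```python
import pandas as pd

# Default inference: a missing value forces the integers to float64.
as_float = pd.Series([1, 2, None])

# With the nullable "Int64" dtype the values stay integers and
# the missing entry is tracked separately.
as_int = pd.Series([1, 2, None], dtype="Int64")

print(as_float.dtype)       # float64
print(as_int.dtype)         # Int64
print(as_int.isna().sum())  # 1
print(as_int.sum())         # 3 (the missing value is skipped)
```

Note the capitalized alias: ``"Int64"`` selects the extension dtype, while ``"int64"`` still refers to the NumPy dtype.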

.. _api.arrays.interval:

Interval
~~~~~~~~

:class:`IntervalArray` is an array for storing data representing intervals.
The scalar type is an :class:`Interval`. These may be stored in a :class:`Series`
or as an :class:`IntervalIndex`. An :class:`IntervalArray` can be closed on the
left, right, both, or neither side.

.. currentmodule:: pandas

.. autosummary::
:toctree: generated/

IntervalArray
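
The closed-side behaviour described above can be sketched as follows. This assumes a pandas release where ``IntervalArray`` is importable from ``pandas.arrays``; the break points and variable names are made up for illustration.

```python
import pandas as pd

# Two adjacent intervals built from break points, closed on the left:
# [0, 1) and [1, 2)
left_closed = pd.arrays.IntervalArray.from_breaks([0, 1, 2], closed="left")
first = left_closed[0]         # the scalar type is pandas.Interval

print(left_closed.closed)      # left
print(0 in first, 1 in first)  # True False

# set_closed returns a new array; the original keeps its closed side.
both_closed = left_closed.set_closed("both")
print(1 in both_closed[0])     # True

# Each interval as a (left, right) tuple.
print(list(both_closed.to_tuples()))
```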

.. _api.arrays.period:

Period
~~~~~~

.. autosummary::
:toctree: generated/

PeriodArray

.. _api.arrays.sparse:

Sparse
~~~~~~

.. autosummary::
:toctree: generated/

SparseArray

Plotting
~~~~~~~~

@@ -1675,6 +1738,7 @@ IntervalIndex Components
IntervalIndex.get_indexer
IntervalIndex.set_closed
IntervalIndex.overlaps
IntervalArray.to_tuples


.. _api.multiindex:
@@ -1905,6 +1969,8 @@ Methods
PeriodIndex.strftime
PeriodIndex.to_timestamp

.. _api.scalars:

Scalars
-------

12 changes: 12 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -101,6 +101,18 @@ Reduction and groupby operations such as 'sum' work.

The Integer NA support currently uses the capitalized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date.

.. _whatsnew_0240.enhancements.array:

A new top-level function :func:`array` has been added for creating arrays (:issue:`22860`).
This can be used to create NumPy arrays, or any :ref:`extension array <extending.extension-types>`, including
extension arrays registered by :ref:`third-party libraries <ecosystem.extensions>`.

.. ipython:: python

pd.array([1, 2, np.nan], dtype='Int64')
pd.array(['a', 'b', 'c'], dtype='category')
pd.array([1, 2])
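
To make the result types of the calls above concrete, here is a small sketch run against a released pandas (0.24 or later is assumed). Only the explicit-dtype cases are checked, since the type returned when no dtype is given has varied between versions.

```python
import numpy as np
import pandas as pd

# An explicit extension dtype is looked up by its string alias.
ints = pd.array([1, 2, np.nan], dtype="Int64")
cats = pd.array(["a", "b", "c"], dtype="category")

print(type(ints).__name__)    # IntegerArray
print(type(cats).__name__)    # Categorical
print(ints.isna().tolist())   # [False, False, True]
print(list(cats.categories))  # ['a', 'b', 'c']
```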

.. _whatsnew_0240.enhancements.read_html:

``read_html`` Enhancements
17 changes: 17 additions & 0 deletions pandas/arrays/__init__.py
@@ -0,0 +1,17 @@
"""
All of pandas' ExtensionArrays and ExtensionDtypes.

See :ref:`extending.extension-types` for more.
"""
from pandas.core.arrays import (
IntervalArray, PeriodArray, Categorical, SparseArray, IntegerArray,
)


__all__ = [
'Categorical',
'IntegerArray',
'IntervalArray',
'PeriodArray',
'SparseArray',
]
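
A possible use of the new ``pandas.arrays`` namespace is type-checking the result of ``pd.array``. The sketch below assumes a released pandas (0.24 or later) in which these classes are importable from ``pandas.arrays``; the sample values are invented.

```python
import pandas as pd
from pandas.arrays import Categorical, IntegerArray, PeriodArray

ints = pd.array([1, 2, None], dtype="Int64")
cats = pd.array(["a", "b", "a"], dtype="category")
# No dtype given: Period scalars with one frequency are inferred as a PeriodArray.
periods = pd.array([pd.Period("2018-01", freq="M"),
                    pd.Period("2018-02", freq="M")])

print(isinstance(ints, IntegerArray))    # True
print(isinstance(cats, Categorical))     # True
print(isinstance(periods, PeriodArray))  # True
```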
19 changes: 18 additions & 1 deletion pandas/core/api.py
@@ -4,9 +4,26 @@

import numpy as np

from pandas.core.arrays import IntervalArray
from pandas.core.arrays.integer import (
Int8Dtype,
Int16Dtype,
Int32Dtype,
Int64Dtype,
UInt8Dtype,
UInt16Dtype,
UInt32Dtype,
UInt64Dtype,
)
from pandas.core.algorithms import factorize, unique, value_counts
from pandas.core.dtypes.missing import isna, isnull, notna, notnull
from pandas.core.arrays import Categorical
from pandas.core.dtypes.dtypes import (
CategoricalDtype,
PeriodDtype,
IntervalDtype,
DatetimeTZDtype,
)
from pandas.core.arrays import Categorical, array
from pandas.core.groupby import Grouper
from pandas.io.formats.format import set_eng_float_format
from pandas.core.index import (Index, CategoricalIndex, Int64Index,
1 change: 1 addition & 0 deletions pandas/core/arrays/__init__.py
@@ -1,3 +1,4 @@
from .array_ import array # noqa
from .base import (ExtensionArray, # noqa
ExtensionOpsMixin,
ExtensionScalarOpsMixin)
86 changes: 86 additions & 0 deletions pandas/core/arrays/array_.py
@@ -0,0 +1,86 @@
import numpy as np

from pandas._libs import lib, tslibs

from pandas.core.dtypes.common import is_extension_array_dtype
from pandas.core.dtypes.dtypes import registry
from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries


def array(data, dtype=None, copy=False):
"""
Create an array.

.. versionadded:: 0.24.0

Parameters
----------
data : Sequence[object]
A sequence of scalar instances for `dtype`. The underlying
array will be extracted from a Series or Index object.

dtype : Union[str, np.dtype, ExtensionDtype], optional
The dtype to use for the array. This may be a NumPy
dtype, or an extension type registered with pandas using
:meth:`pandas.api.extensions.register_extension_dtype`.

By default, the dtype will be inferred from the data
with :meth:`numpy.array`.

copy : bool, default False
Member

NumPy has a default of True. Do we have a rationale for using False by default?

Contributor Author

Was just matching Series here. I don't have a strong preference. Let's just match NumPy then.

Whether to copy the data.

Returns
-------
Array : Union[ndarray, ExtensionArray]

Examples
--------
If a dtype is not specified, `data` is passed through to
:meth:`numpy.array`, and an ndarray is returned.

>>> pd.array([1, 2])
array([1, 2])

Or the NumPy dtype can be specified

>>> pd.array([1, 2], dtype=np.int32)
array([1, 2], dtype=int32)

You can use the string alias for `dtype`

>>> pd.array(['a', 'b', 'a'], dtype='category')
[a, b, a]
Categories (2, object): [a, b]

Or specify the actual dtype

>>> pd.array(['a', 'b', 'a'],
... dtype=pd.CategoricalDtype(['a', 'b', 'c'], ordered=True))
[a, b, a]
Categories (3, object): [a < b < c]
"""
from pandas.core.arrays import period_array

if isinstance(data, (ABCSeries, ABCIndexClass)):
data = data._values

# this returns None for not-found dtypes.
dtype = registry.find(dtype) or dtype

if is_extension_array_dtype(dtype):
cls = dtype.construct_array_type()
return cls._from_sequence(data, dtype=dtype, copy=copy)

if dtype is None:
inferred_dtype = lib.infer_dtype(data)
if inferred_dtype == 'period':
try:
return period_array(data)
Contributor

should pass copy= here

except tslibs.IncompatibleFrequency:
pass # we return an array below.

# TODO(DatetimeArray): handle this type
# TODO(BooleanArray): handle this type

return np.array(data, dtype=dtype, copy=copy)
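
Two behaviours of ``array`` are easy to check from the outside: a Series is unwrapped to its underlying values, and (per the later "Raise on scalars" commit in this PR) a scalar is rejected. The sketch below is run against a released pandas (0.24 or later is assumed), not this intermediate diff, so details may differ from the hunk above. ``copy=True`` is passed explicitly because the default is under discussion in the review.

```python
import pandas as pd

# A Series (or Index) is unwrapped to its underlying values before conversion.
series = pd.Series([1, 2, 3], name="x")
from_series = pd.array(series, dtype="Int64", copy=True)
print(type(from_series).__name__)  # IntegerArray

# With copy=True, mutating the result leaves the source Series untouched.
from_series[0] = 10
print(series.iloc[0])              # 1

# Scalars are rejected rather than silently wrapped.
scalar_rejected = False
try:
    pd.array(1)
except ValueError:
    scalar_rejected = True
print(scalar_rejected)             # True
```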
2 changes: 2 additions & 0 deletions pandas/core/arrays/interval.py
@@ -81,7 +81,9 @@
from_arrays
from_tuples
from_breaks
overlaps
set_closed
to_tuples
%(extra_methods)s\

%(examples)s\
9 changes: 7 additions & 2 deletions pandas/core/dtypes/dtypes.py
@@ -39,7 +39,12 @@ class Registry(object):
Registry for dtype inference

The registry allows one to map a string repr of an extension
dtype to an extenstion dtype.
dtype to an extension dtype. The string alias can be used in several
places, including

* Series and Index constructors
* :meth:`pandas.array`
* :meth:`pandas.Series.astype`

Multiple extension types can be registered.
These are tried in order.
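
The "tried in order" lookup described in this docstring can be sketched in plain Python. This is a toy model, not the pandas implementation: ``SketchRegistry``, ``ToyDtype`` and ``OtherToyDtype`` are hypothetical names, and the only assumption borrowed from the extension interface is that each dtype class exposes a ``construct_from_string`` classmethod that raises ``TypeError`` for an alias it does not recognise.

```python
class ToyDtype:
    """Hypothetical stand-in for an extension dtype with a string alias."""
    name = "toy"

    @classmethod
    def construct_from_string(cls, string):
        if string == cls.name:
            return cls()
        raise TypeError(
            "Cannot construct a '{}' from '{}'".format(cls.__name__, string))


class OtherToyDtype(ToyDtype):
    name = "other-toy"


class SketchRegistry:
    """Toy registry: asks each registered dtype class, in order, to parse an alias."""

    def __init__(self):
        self.dtypes = []

    def register(self, dtype_cls):
        self.dtypes.append(dtype_cls)
        return dtype_cls  # returning the class lets this double as a decorator

    def find(self, alias):
        if not isinstance(alias, str):
            return None
        for dtype_cls in self.dtypes:
            try:
                return dtype_cls.construct_from_string(alias)
            except TypeError:
                continue  # not this dtype's alias; try the next one
        return None       # nothing matched


registry = SketchRegistry()
registry.register(ToyDtype)
registry.register(OtherToyDtype)

print(type(registry.find("other-toy")).__name__)  # OtherToyDtype
print(registry.find("int64"))                     # None
```

Returning ``None`` for an unknown alias is what allows a caller to write ``registry.find(dtype) or dtype`` and fall back to the value it was given.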
@@ -592,6 +597,7 @@ def __eq__(self, other):
str(self.tz) == str(other.tz))


@register_extension_dtype
class PeriodDtype(ExtensionDtype, PandasExtensionDtype):
"""
A Period duck-typed class, suitable for holding a period with freq dtype.
@@ -854,4 +860,3 @@ def is_dtype(cls, dtype):
_pandas_registry = Registry()

_pandas_registry.register(DatetimeTZDtype)
_pandas_registry.register(PeriodDtype)
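
With ``PeriodDtype`` registered like the other extension dtypes, its string alias should be usable in the places the ``Registry`` docstring lists. A minimal sketch, assuming a released pandas (0.24 or later); the sample data is invented.

```python
import pandas as pd

# Series constructor
cat_series = pd.Series(["a", "b", "a"], dtype="category")
# Index constructor
cat_index = pd.Index(["a", "b"], dtype="category")
# Series.astype
int_series = pd.Series([1, 2]).astype("Int64")
# pandas.array, using the parametrized period alias
periods = pd.array(["2015", "2016"], dtype="period[D]")

print(cat_series.dtype)          # category
print(type(cat_index).__name__)  # CategoricalIndex
print(int_series.dtype)          # Int64
print(periods.dtype)             # period[D]
```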
10 changes: 8 additions & 2 deletions pandas/tests/api/test_api.py
@@ -45,7 +45,13 @@ class TestPDApi(Base):
'Period', 'PeriodIndex', 'RangeIndex', 'UInt64Index',
'Series', 'SparseArray', 'SparseDataFrame', 'SparseDtype',
'SparseSeries', 'Timedelta',
'TimedeltaIndex', 'Timestamp', 'Interval', 'IntervalIndex']
'TimedeltaIndex', 'Timestamp', 'Interval', 'IntervalIndex',
'IntervalArray',
'CategoricalDtype', 'PeriodDtype', 'IntervalDtype',
'DatetimeTZDtype',
'Int8Dtype', 'Int16Dtype', 'Int32Dtype', 'Int64Dtype',
'UInt8Dtype', 'UInt16Dtype', 'UInt32Dtype', 'UInt64Dtype',
]

# these are already deprecated; awaiting removal
deprecated_classes = ['TimeGrouper']
@@ -57,7 +63,7 @@ class TestPDApi(Base):
modules = ['np', 'datetime']

# top-level functions
funcs = ['bdate_range', 'concat', 'crosstab', 'cut',
funcs = ['array', 'bdate_range', 'concat', 'crosstab', 'cut',
'date_range', 'interval_range', 'eval',
'factorize', 'get_dummies',
'infer_freq', 'isna', 'isnull', 'lreshape',
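
The names this test adds to the expected top-level API can be exercised directly. The sketch below assumes a released pandas (0.24 or later) where the dtype classes are available on the ``pd`` namespace; it passes dtype instances rather than string aliases to ``pd.array``.

```python
import pandas as pd

# The nullable integer dtype classes listed in the test above.
integer_dtype_names = [
    "Int8Dtype", "Int16Dtype", "Int32Dtype", "Int64Dtype",
    "UInt8Dtype", "UInt16Dtype", "UInt32Dtype", "UInt64Dtype",
]
missing = [name for name in integer_dtype_names if not hasattr(pd, name)]
print(missing)  # []

# A dtype instance can be passed instead of its string alias.
ints = pd.array([1, 2, None], dtype=pd.Int64Dtype())
print(ints.dtype)  # Int64

ordered = pd.CategoricalDtype(["a", "b", "c"], ordered=True)
cats = pd.array(["a", "b", "a"], dtype=ordered)
print(list(cats.categories))  # ['a', 'b', 'c']
print(cats.ordered)           # True
```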