ENH GH20601 raise error when pivot table's number of levels > int32 #20709

Closed
wants to merge 34 commits into from
34 commits
a635140
ENH GH20601 raise an error when the number of levels in a pivot table…
anhqle Apr 16, 2018
ac224f5
TST add a test for pivot table large number of levels causing int32 o…
anhqle Apr 16, 2018
acbc4eb
CLN PEP8 compliance
anhqle Apr 16, 2018
662ce5f
DOC add whatsnew entry
anhqle Apr 16, 2018
804101c
Fix issue 17912 (#20705)
CianciuStyles Apr 16, 2018
1e4e04b
ENH: ExtensionArray.setitem (#19907)
TomAugspurger Apr 16, 2018
8756f55
DEP: Add 'python_requires' to setup.py to drop 3.4 support (#20698)
djhoese Apr 16, 2018
da33359
DOC: Correct documentation to GroupBy.rank (#20708)
gfyoung Apr 16, 2018
4a34497
API: rolling.apply will pass Series to function (#20584)
jreback Apr 16, 2018
6245e8c
TST: add tests for take() on empty arrays (#20582)
jorisvandenbossche Apr 17, 2018
75295e1
CLN: Replacing %s with .format in pandas/core/frame.py (#20461)
AaronCritchley Apr 17, 2018
bb095a6
change the indent for the pydoc of apply() function. (#20715)
zhao-zihao Apr 17, 2018
7ed1f53
PKG: remove pyproject.toml for now (#20718)
jorisvandenbossche Apr 18, 2018
b9f826f
DOC: use apply(raw=True) in docs to silence warning (#20741)
jorisvandenbossche Apr 19, 2018
07739aa
Fix more tests expecting little-endian (#20738)
ginggs Apr 19, 2018
ede11af
DOC: add coverage href to README.md (#20736)
wuhaochen Apr 19, 2018
78fee04
DEPR: Deprecate DatetimeIndex.offset in favor of DatetimeIndex.freq (…
jschendel Apr 19, 2018
3e691a4
ENH: DataFrame.append preserves columns dtype if possible (#19021)
topper-123 Apr 20, 2018
be057a1
DOC: Clean up badges in README (#20749)
wuhaochen Apr 20, 2018
3a2e9e6
BUG: fixes indexing with monotonic decreasing DTI (#19362) (#20677)
mapehe Apr 20, 2018
23bc217
DOC: Various EA docs (#20707)
TomAugspurger Apr 21, 2018
54470f3
BUG: unexpected assign by a single-element list (GH19474) (#20732)
kittoku Apr 21, 2018
669d9b2
Add interpolate to doc string (#20776)
topper-123 Apr 21, 2018
336fba7
TST: #20720
jreback Apr 21, 2018
7e75e4a
Fixed WOM offset when n=0 (#20549)
mmngreco Apr 21, 2018
0d199e4
BUG: Fix problems in group rank when both nans and infinity are prese…
peterpanmj Apr 21, 2018
8def649
TST: split test_groupby.py (#20781)
jreback Apr 21, 2018
466f90a
ENH GH20601 raise an error when the number of levels in a pivot table…
anhqle Apr 16, 2018
dc982de
TST add a test for pivot table large number of levels causing int32 o…
anhqle Apr 16, 2018
ea53feb
CLN PEP8 compliance
anhqle Apr 16, 2018
50d5e02
DOC add whatsnew entry
anhqle Apr 16, 2018
90b7624
ENH catch the int32 overflow error earlier and in two separate places…
anhqle Apr 22, 2018
8baba4b
CLN git merge clean up
anhqle Apr 22, 2018
2416db1
CLN edit whatsnew entry and remove old code
anhqle Apr 22, 2018
49 changes: 30 additions & 19 deletions README.md
@@ -9,18 +9,33 @@
<table>
<tr>
<td>Latest Release</td>
<td><img src="https://img.shields.io/pypi/v/pandas.svg" alt="latest release" /></td>
<td>
<a href="https://pypi.python.org/pypi/pandas/">
<img src="https://img.shields.io/pypi/v/pandas.svg" alt="latest release" />
</a>
</td>
</tr>
<td></td>
<td><img src="https://anaconda.org/conda-forge/pandas/badges/version.svg" alt="latest release" /></td>
<td>
<a href="https://anaconda.org/anaconda/pandas/">
<img src="https://anaconda.org/conda-forge/pandas/badges/version.svg" alt="latest release" />
</a>
</td>
</tr>
<tr>
<td>Package Status</td>
<td><img src="https://img.shields.io/pypi/status/pandas.svg" alt="status" /></td>
<td>
<a href="https://pypi.python.org/pypi/pandas/">
<img src="https://img.shields.io/pypi/status/pandas.svg" alt="status" /></td>
</a>
</tr>
<tr>
<td>License</td>
<td><img src="https://img.shields.io/pypi/l/pandas.svg" alt="license" /></td>
<td>
<a href="https://github.com/pandas-dev/pandas/blob/master/LICENSE">
<img src="https://img.shields.io/pypi/l/pandas.svg" alt="license" />
</a>
</td>
</tr>
<tr>
<td>Build Status</td>
@@ -48,35 +63,31 @@
</tr>
<tr>
<td>Coverage</td>
<td><img src="https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=master" alt="coverage" /></td>
</tr>
<tr>
<td>Conda</td>
<td>
<a href="https://pandas.pydata.org">
<img src="http://pubbadges.s3-website-us-east-1.amazonaws.com/pkgs-downloads-pandas.png" alt="conda default downloads" />
 <td>
<a href="https://codecov.io/gh/pandas-dev/pandas">
<img src="https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=master" alt="coverage" />
</a>
</td>
</tr>
<tr>
<td>Conda-forge</td>
<td>Downloads</td>
<td>
<a href="https://pandas.pydata.org">
<img src="https://anaconda.org/conda-forge/pandas/badges/downloads.svg" alt="conda-forge downloads" />
</a>
</td>
</tr>
<tr>
<td>PyPI</td>
<td>
<a href="https://pypi.python.org/pypi/pandas/">
<img src="https://img.shields.io/pypi/dm/pandas.svg" alt="pypi downloads" />
</a>
</td>
<td>Gitter</td>
<td>
<a href="https://gitter.im/pydata/pandas">
<img src="https://badges.gitter.im/Join%20Chat.svg"
</a>
</td>
</tr>
</table>

[![https://gitter.im/pydata/pandas](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)


## What is it

2 changes: 1 addition & 1 deletion ci/environment-dev.yaml
@@ -11,5 +11,5 @@ dependencies:
- python-dateutil>=2.5.0
- python=3
- pytz
- setuptools>=3.3
- setuptools>=24.2.0
- sphinx
4 changes: 2 additions & 2 deletions ci/requirements_dev.txt
@@ -7,5 +7,5 @@ moto
pytest>=3.1
python-dateutil>=2.5.0
pytz
setuptools>=3.3
sphinx
setuptools>=24.2.0
sphinx
2 changes: 2 additions & 0 deletions doc/source/api.rst
@@ -2106,6 +2106,7 @@ Standard moving window functions
Rolling.skew
Rolling.kurt
Rolling.apply
Rolling.aggregate
Rolling.quantile
Window.mean
Window.sum
@@ -2133,6 +2134,7 @@ Standard expanding window functions
Expanding.skew
Expanding.kurt
Expanding.apply
Expanding.aggregate
Expanding.quantile

Exponentially-weighted moving window functions
2 changes: 1 addition & 1 deletion doc/source/computation.rst
@@ -323,7 +323,7 @@ compute the mean absolute deviation on a rolling basis:

mad = lambda x: np.fabs(x - x.mean()).mean()
@savefig rolling_apply_ex.png
s.rolling(window=60).apply(mad).plot(style='k')
s.rolling(window=60).apply(mad, raw=True).plot(style='k')

.. _stats.rolling_window:

2 changes: 1 addition & 1 deletion doc/source/cookbook.rst
@@ -496,7 +496,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
def Red(x):
return functools.reduce(CumRet,x,1.0)

S.expanding().apply(Red)
S.expanding().apply(Red, raw=True)


`Replacing some values with mean of the rest of a group
25 changes: 25 additions & 0 deletions doc/source/extending.rst
@@ -57,6 +57,13 @@ If you write a custom accessor, make a pull request adding it to our
Extension Types
---------------

.. versionadded:: 0.23.0

.. warning::

The ``ExtensionDtype`` and ``ExtensionArray`` APIs are new and
experimental. They may change between versions without warning.

Pandas defines an interface for implementing data types and arrays that *extend*
NumPy's type system. Pandas itself uses the extension system for some types
that aren't built into NumPy (categorical, period, interval, datetime with
@@ -106,6 +113,24 @@ by some other storage type, like Python lists.
See the `extension array source`_ for the interface definition. The docstrings
and comments contain guidance for properly implementing the interface.

We provide a test suite for ensuring that your extension arrays satisfy the expected
behavior. To use the test suite, you must provide several pytest fixtures and inherit
from the base test class. The required fixtures are found in
https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py.

To use a test, subclass it:

.. code-block:: python

from pandas.tests.extension import base

class TestConstructors(base.BaseConstructorsTests):
pass


See https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/base/__init__.py
for a list of all the tests available.

.. _extension dtype source: https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/base.py
.. _extension array source: https://github.com/pandas-dev/pandas/blob/master/pandas/core/arrays/base.py

4 changes: 2 additions & 2 deletions doc/source/install.rst
@@ -15,7 +15,7 @@ Instructions for installing from source,
`PyPI <http://pypi.python.org/pypi/pandas>`__, `ActivePython <https://www.activestate.com/activepython/downloads>`__, various Linux distributions, or a
`development version <http://github.com/pandas-dev/pandas>`__ are also provided.

.. _install.dropping_27:
.. _install.dropping-27:

Plan for dropping Python 2.7
----------------------------
@@ -223,7 +223,7 @@ installed), make sure you have `pytest
Dependencies
------------

* `setuptools <https://setuptools.readthedocs.io/en/latest/>`__: 3.3.0 or higher
* `setuptools <https://setuptools.readthedocs.io/en/latest/>`__: 24.2.0 or higher
* `NumPy <http://www.numpy.org>`__: 1.9.0 or higher
* `python-dateutil <https://dateutil.readthedocs.io/en/stable/>`__: 2.5.0 or higher
* `pytz <http://pytz.sourceforge.net/>`__
55 changes: 50 additions & 5 deletions doc/source/whatsnew/v0.23.0.txt
@@ -11,7 +11,7 @@ version.
.. warning::

Starting January 1, 2019, pandas feature releases will support Python 3 only.
See :ref:`here <install.dropping_27>` for more.
See :ref:`install.dropping-27` for more.

.. _whatsnew_0230.enhancements:

@@ -65,6 +65,35 @@ The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtyp
pd.get_dummies(df, columns=['c'], dtype=bool).dtypes


.. _whatsnew_0230.enhancements.window_raw:

Rolling/Expanding.apply() accepts a ``raw`` keyword to pass a ``Series`` to the function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`Series.rolling().apply() <pandas.core.window.Rolling.apply>`, :func:`DataFrame.rolling().apply() <pandas.core.window.Rolling.apply>`,
:func:`Series.expanding().apply() <pandas.core.window.Expanding.apply>`, and :func:`DataFrame.expanding().apply() <pandas.core.window.Expanding.apply>` have gained a ``raw=None`` parameter.
This is similar to :func:`DataFrame.apply`. If ``True``, this parameter sends a ``np.ndarray`` to the applied function; if ``False``, a ``Series`` is passed. The
default is ``None``, which preserves backward compatibility by behaving like ``True`` and sending an ``np.ndarray``.
In a future version the default will be changed to ``False``, sending a ``Series``. (:issue:`5071`, :issue:`20584`)

.. ipython:: python

s = pd.Series(np.arange(5), np.arange(5) + 1)
s

Pass a ``Series``:

.. ipython:: python

s.rolling(2, min_periods=1).apply(lambda x: x.iloc[-1], raw=False)

Mimic the original behavior of passing an ndarray:

.. ipython:: python

s.rolling(2, min_periods=1).apply(lambda x: x[-1], raw=True)
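
As a rough illustration of the ``raw`` switch described above, the sketch below records which type the applied function receives in each mode. The helper ``last_value`` and the ``seen_types`` dictionary are hypothetical names introduced only for this example; they are not part of pandas or of this PR.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5, dtype="float64"), index=np.arange(5) + 1)

# Hypothetical bookkeeping: remember which type each mode passed in.
seen_types = {}


def last_value(window, label):
    """Record the window's type, then return its last element."""
    seen_types[label] = type(window).__name__
    if isinstance(window, pd.Series):
        return window.iloc[-1]  # Series: use positional indexing
    return window[-1]  # ndarray: plain indexing


as_array = s.rolling(2, min_periods=1).apply(
    lambda w: last_value(w, "raw=True"), raw=True
)
as_series = s.rolling(2, min_periods=1).apply(
    lambda w: last_value(w, "raw=False"), raw=False
)

print(seen_types)
print(as_array.equals(as_series))
```

Both calls should produce the same values; only the object handed to the function differs, which matters when the function relies on the index or on ``Series`` methods.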


.. _whatsnew_0230.enhancements.merge_on_columns_and_levels:

Merging on a combination of columns and index levels
@@ -192,6 +221,12 @@ Current Behavior:

s.rank(na_option='top')

These bugs were squashed:

- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``ascending='False'`` failed to return correct ranks for infinity if ``NaN`` were present (:issue:`19538`)
- Bug in :func:`DataFrameGroupBy.rank` where ranks were incorrect when both infinity and ``NaN`` were present (:issue:`20561`)

.. _whatsnew_0230.enhancements.round-trippable_json:

JSON read/write round-trippable with ``orient='table'``
@@ -306,8 +341,8 @@ Supplying a ``CategoricalDtype`` will make the categories in each column consist

.. _whatsnew_023.enhancements.extension:

Extending Pandas with Custom Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Extending Pandas with Custom Types (Experimental)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas now supports storing array-like objects that aren't necessarily 1-D NumPy
arrays as columns in a DataFrame or values in a Series. This allows third-party
@@ -380,6 +415,7 @@ Other Enhancements
- :class:`IntervalIndex` now supports time zone aware ``Interval`` objects (:issue:`18537`, :issue:`18538`)
- :func:`Series` / :func:`DataFrame` tab completion also returns identifiers in the first level of a :func:`MultiIndex`. (:issue:`16326`)
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`)
- :meth:`DataFrame.append` can now preserve the type of the calling dataframe's columns in more cases (e.g. if both are ``CategoricalIndex``) (:issue:`18359`)
- :func:`DataFrame.to_json` and :func:`Series.to_json` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`)
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`)
- ``Categorical.rename_categories``, ``CategoricalIndex.rename_categories`` and :attr:`Series.cat.rename_categories`
@@ -408,6 +444,7 @@ Other Enhancements
``SQLAlchemy`` dialects supporting multivalue inserts include: ``mysql``, ``postgresql``, ``sqlite`` and any dialect with ``supports_multivalues_insert``. (:issue:`14315`, :issue:`8953`)
- :func:`read_html` now accepts a ``displayed_only`` keyword argument to control whether or not hidden elements are parsed (``True`` by default) (:issue:`20027`)
- zip compression is supported via ``compression='zip'`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)
- :class:`WeekOfMonth` constructor now supports ``n=0`` (:issue:`20517`).
- :class:`DataFrame` and :class:`Series` now support the matrix multiplication (``@``) operator for Python>=3.5 (:issue:`10259`)
- Updated ``to_gbq`` and ``read_gbq`` signature and documentation to reflect changes from
the Pandas-GBQ library version 0.4.0. Adds intersphinx mapping to Pandas-GBQ
@@ -435,6 +472,8 @@ If installed, we now require:
+-----------------+-----------------+----------+---------------+
| beautifulsoup4 | 4.2.1 | | :issue:`20082`|
+-----------------+-----------------+----------+---------------+
| setuptools | 24.2.0 | | :issue:`20698`|
+-----------------+-----------------+----------+---------------+

.. _whatsnew_0230.api_breaking.dict_insertion_order:

@@ -815,6 +854,7 @@ Other API Changes
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`).
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`)
- A user-defined function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; only ``.apply()`` has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`)

.. _whatsnew_0230.deprecations:

@@ -843,6 +883,9 @@ Deprecations
- ``Index.summary()`` is deprecated and will be removed in a future version (:issue:`18217`)
- ``NDFrame.get_ftype_counts()`` is deprecated and will be removed in a future version (:issue:`18243`)
- The ``convert_datetime64`` parameter in :func:`DataFrame.to_records` has been deprecated and will be removed in a future version. The NumPy bug motivating this parameter has been resolved. The default value for this parameter has also changed from ``True`` to ``None`` (:issue:`18160`).
- :func:`Series.rolling().apply() <pandas.core.window.Rolling.apply>`, :func:`DataFrame.rolling().apply() <pandas.core.window.Rolling.apply>`,
:func:`Series.expanding().apply() <pandas.core.window.Expanding.apply>`, and :func:`DataFrame.expanding().apply() <pandas.core.window.Expanding.apply>` have deprecated passing an ``np.array`` by default. One will need to pass the new ``raw`` parameter to be explicit about what is passed (:issue:`20584`)
- ``DatetimeIndex.offset`` is deprecated. Use ``DatetimeIndex.freq`` instead (:issue:`20716`)
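
For the ``DatetimeIndex.offset`` deprecation listed above, a minimal sketch of the replacement attribute might look like this. The index ``idx`` is made up for illustration.

```python
import pandas as pd

# Illustrative index only; any DatetimeIndex with a set frequency works.
idx = pd.date_range("2018-01-01", periods=3, freq="D")

print(idx.freq)     # the frequency as an offset object
print(idx.freqstr)  # the same frequency as a string alias
```

Code that previously read ``idx.offset`` can read ``idx.freq`` instead.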

.. _whatsnew_0230.prior_deprecations:

@@ -1045,14 +1088,12 @@ Offsets

Numeric
^^^^^^^
- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
- Bug in :class:`DataFrame` flex arithmetic (e.g. ``df.add(other, fill_value=foo)``) with a ``fill_value`` other than ``None`` failed to raise ``NotImplementedError`` in corner cases where either the frame or ``other`` has length zero (:issue:`19522`)
- Multiplication and division of numeric-dtyped :class:`Index` objects with timedelta-like scalars returns ``TimedeltaIndex`` instead of raising ``TypeError`` (:issue:`19333`)
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``ascending='False'`` failed to return correct ranks for infinity if ``NaN`` were present (:issue:`19538`)
- Bug where ``NaN`` was returned instead of 0 by :func:`Series.pct_change` and :func:`DataFrame.pct_change` when ``fill_method`` is not ``None`` (:issue:`19873`)


@@ -1077,6 +1118,8 @@ Indexing
- Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` in presence of entire rows of NaNs in the middle of values (:issue:`20499`).
- Bug in :class:`IntervalIndex` where some indexing operations were not supported for overlapping or non-monotonic ``uint64`` data (:issue:`20636`)
- Bug in ``Series.is_unique`` where extraneous output in stderr is shown if Series contains objects with ``__ne__`` defined (:issue:`20661`)
- Bug in ``.loc`` assignment where a single-element list-like was incorrectly assigned as a list (:issue:`19474`)
- Bug in partial string indexing on a ``Series/DataFrame`` with a monotonic decreasing ``DatetimeIndex`` (:issue:`19362`)

MultiIndex
^^^^^^^^^^
@@ -1114,6 +1157,7 @@ I/O
- Bug in :meth:`pandas.io.json.json_normalize` where subrecords are not properly normalized if any subrecords values are NoneType (:issue:`20030`)
- Bug in ``usecols`` parameter in :func:`read_csv` where error is not raised correctly when passing a string. (:issue:`20529`)
- Bug in :func:`HDFStore.keys` where reading a file with a softlink raised an exception (:issue:`20523`)
- Bug in :func:`HDFStore.select_column` where a key which is not a valid store raised an ``AttributeError`` instead of a ``KeyError`` (:issue:`17912`)

Plotting
^^^^^^^^
@@ -1177,6 +1221,7 @@ Reshaping
- Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
- Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
- Bug in :class:`Series` constructor with a ``dtype=str``, previously raised in some cases (:issue:`19853`)
- Improved error message when the number of levels in a pivot table or an unstacked dataframe is too large, causing an int32 overflow (:issue:`20601`)
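
To make the int32 limit behind the entry above concrete, here is a minimal pure-Python sketch of the kind of size check involved. It is not the code from this PR: ``check_unstack_size`` and ``INT32_MAX`` are hypothetical names, and the sketch assumes the quantity that overflows is the number of rows times the number of columns in the reshaped result.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def check_unstack_size(num_rows, num_columns):
    """Hypothetical helper: raise if rows * columns cannot fit in int32."""
    num_cells = num_rows * num_columns  # Python ints do not overflow
    if num_cells > INT32_MAX:
        raise ValueError(
            "Unstacked result would need {} cells, more than int32 "
            "can hold ({})".format(num_cells, INT32_MAX)
        )
    return num_cells


# 50,000 distinct index values against 50,000 distinct column values
# would need 2.5 billion cells, which is past the int32 limit.
try:
    check_unstack_size(50000, 50000)
except ValueError as exc:
    print(exc)
```

Checking the product up front lets the caller see a clear ``ValueError`` instead of a failure deeper in the reshaping code.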

Other
^^^^^