COMPAT: Expand compatibility with fromnumeric.py #12810

Closed
35 changes: 34 additions & 1 deletion doc/source/whatsnew/v0.18.1.txt
@@ -357,6 +357,39 @@ New Behavior:
df.groupby('c', sort=False).nth(1)


.. _whatsnew_0181.numpy_compatibility:

Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
that they accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`). The issues addressed were:

- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accepts a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)

An example of this signature augmentation is illustrated below:

Previous behaviour:

.. code-block:: ipython

   In [1]: sp = pd.SparseDataFrame([1, 2, 3])
   In [2]: np.cumsum(sp, axis=0)
   ...
   TypeError: cumsum() takes at most 2 arguments (4 given)

New behaviour:

Contributor: make this an ipython-block

Member Author: Done.

Contributor: this didn't update, did you push?

Member Author: GitHub is being buggy here. You can see for yourself below.

Contributor: yeah weird.

.. code-block:: ipython

   In [1]: sp = pd.SparseDataFrame([1, 2, 3])
   In [2]: np.cumsum(sp, axis=0)
   Out[2]:
        0
   0  1.0
   1  3.0
   2  6.0

.. _whatsnew_0181.apply_resample:

Using ``.apply`` on groupby resampling
@@ -527,7 +560,6 @@ Bug Fixes
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)
- Bug in ``Series`` construction with ``Categorical`` and ``dtype='category'`` is specified (:issue:`12574`)
- Bug in which concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in ``float_format`` option with option not being validated as a callable. (:issue:`12706`)
@@ -547,6 +579,7 @@ Bug Fixes
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values (:issue:`10778`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)


- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)
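The ``TypeError`` in the "Previous behaviour" example suggests that the numpy function forwards its own arguments positionally to the object's method. The sketch below imitates that situation using only the standard library. `numpy_style_cumsum`, `OldStyle` and `NewStyle` are hypothetical names, and the forwarding function is a simplified assumption about the dispatch, not numpy's or pandas' actual code.

```python
def numpy_style_cumsum(obj, axis=None, dtype=None, out=None):
    # Assumption: the numpy-side function hands all of its arguments
    # to the object's own method, positionally.
    return obj.cumsum(axis, dtype, out)


class OldStyle(object):
    """Toy container whose cumsum only knows about 'axis'."""

    def __init__(self, values):
        self.values = list(values)

    def cumsum(self, axis=0):
        total, result = 0, []
        for value in self.values:
            total += value
            result.append(total)
        return result


class NewStyle(OldStyle):
    """Toy container with an augmented, numpy-tolerant signature."""

    def cumsum(self, axis=0, *args, **kwargs):
        # Extra numpy-only arguments are accepted, but only at their
        # defaults (None for both 'dtype' and 'out' in this sketch).
        extras = list(args) + list(kwargs.values())
        if any(extra is not None for extra in extras):
            raise ValueError("only default values are supported for "
                             "numpy-only arguments in this sketch")
        return OldStyle.cumsum(self, axis)


if __name__ == "__main__":
    print(numpy_style_cumsum(NewStyle([1, 2, 3]), axis=0))
    try:
        numpy_style_cumsum(OldStyle([1, 2, 3]), axis=0)
    except TypeError as err:
        print("old signature fails:", err)
```

The narrower signature fails as soon as the extra positional arguments arrive, while the augmented one accepts them and still rejects any non-default value.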
2 changes: 1 addition & 1 deletion pandas/__init__.py
@@ -19,7 +19,7 @@


# numpy compat
-from pandas.compat.numpy_compat import *
+from pandas.compat.numpy import *

try:
from pandas import hashtable, tslib, lib
File renamed without changes.
247 changes: 247 additions & 0 deletions pandas/compat/numpy/function.py
@@ -0,0 +1,247 @@
"""
For compatibility with numpy libraries, pandas functions or
methods have to accept '*args' and '**kwargs' parameters to
accommodate numpy arguments that are not actually used or
respected in the pandas implementation.

To ensure that users do not abuse these parameters, validation
is performed in 'validators.py' to make sure that any extra
parameters passed correspond ONLY to those in the numpy signature.
Part of that validation includes whether or not the user attempted
to pass in non-default values for these extraneous parameters. As we
want to discourage users from relying on these parameters when calling
the pandas implementation, we want them only to pass in the default values
for these parameters.

This module provides a set of commonly used default arguments for functions
and methods that are spread throughout the codebase. This module will make it
easier to adjust to future upstream changes in the analogous numpy signatures.
"""

from numpy import ndarray
from pandas.util.validators import (validate_args, validate_kwargs,
                                    validate_args_and_kwargs)
from pandas.core.common import is_integer
from pandas.compat import OrderedDict


class CompatValidator(object):
    def __init__(self, defaults, fname=None, method=None,
                 max_fname_arg_count=None):
        self.fname = fname
        self.method = method
        self.defaults = defaults
        self.max_fname_arg_count = max_fname_arg_count

    def __call__(self, args, kwargs, fname=None,
                 max_fname_arg_count=None, method=None):
        fname = self.fname if fname is None else fname
        max_fname_arg_count = (self.max_fname_arg_count if
                               max_fname_arg_count is None
                               else max_fname_arg_count)
        method = self.method if method is None else method

        if method == 'args':
            validate_args(fname, args, max_fname_arg_count, self.defaults)
        elif method == 'kwargs':
            validate_kwargs(fname, kwargs, self.defaults)
        elif method == 'both':
            validate_args_and_kwargs(fname, args, kwargs,
                                     max_fname_arg_count,
                                     self.defaults)
        else:
            raise ValueError("invalid validation method "
                             "'{method}'".format(method=method))

ARGMINMAX_DEFAULTS = dict(out=None)
validate_argmin = CompatValidator(ARGMINMAX_DEFAULTS, fname='argmin',
                                  method='both', max_fname_arg_count=1)
validate_argmax = CompatValidator(ARGMINMAX_DEFAULTS, fname='argmax',
                                  method='both', max_fname_arg_count=1)


def process_skipna(skipna, args):
    if isinstance(skipna, ndarray) or skipna is None:
        args = (skipna,) + args
        skipna = True

    return skipna, args


def validate_argmin_with_skipna(skipna, args, kwargs):
    """
    If 'Series.argmin' is called via the 'numpy' library,
    the third parameter in its signature is 'out', which
    takes either an ndarray or 'None', so check if the
    'skipna' parameter is either an instance of ndarray or
    is None, since 'skipna' itself should be a boolean
    """

    skipna, args = process_skipna(skipna, args)
    validate_argmin(args, kwargs)
    return skipna


def validate_argmax_with_skipna(skipna, args, kwargs):
    """
    If 'Series.argmax' is called via the 'numpy' library,
    the third parameter in its signature is 'out', which
    takes either an ndarray or 'None', so check if the
    'skipna' parameter is either an instance of ndarray or
    is None, since 'skipna' itself should be a boolean
    """

    skipna, args = process_skipna(skipna, args)
    validate_argmax(args, kwargs)
    return skipna

ARGSORT_DEFAULTS = OrderedDict()
ARGSORT_DEFAULTS['axis'] = -1
ARGSORT_DEFAULTS['kind'] = 'quicksort'
ARGSORT_DEFAULTS['order'] = None
validate_argsort = CompatValidator(ARGSORT_DEFAULTS, fname='argsort',
                                   max_fname_arg_count=0, method='both')


def validate_argsort_with_ascending(ascending, args, kwargs):
    """
    If 'Categorical.argsort' is called via the 'numpy' library, the
    first parameter in its signature is 'axis', which takes either
    an integer or 'None', so check if the 'ascending' parameter has
    either integer type or is None, since 'ascending' itself should
    be a boolean
    """

    if is_integer(ascending) or ascending is None:
        args = (ascending,) + args
        ascending = True

    validate_argsort(args, kwargs, max_fname_arg_count=1)
    return ascending

CLIP_DEFAULTS = dict(out=None)
validate_clip = CompatValidator(CLIP_DEFAULTS, fname='clip',
                                method='both', max_fname_arg_count=3)


def validate_clip_with_axis(axis, args, kwargs):
    """
    If 'NDFrame.clip' is called via the numpy library, the third
    parameter in its signature is 'out', which can take an ndarray,
    so check if the 'axis' parameter is an instance of ndarray, since
    'axis' itself should either be an integer or None
    """

    if isinstance(axis, ndarray):
        args = (axis,) + args
        axis = None

    validate_clip(args, kwargs)
    return axis

COMPRESS_DEFAULTS = OrderedDict()
COMPRESS_DEFAULTS['axis'] = None
COMPRESS_DEFAULTS['out'] = None
validate_compress = CompatValidator(COMPRESS_DEFAULTS, fname='compress',
                                    method='both', max_fname_arg_count=1)

CUM_FUNC_DEFAULTS = OrderedDict()
CUM_FUNC_DEFAULTS['dtype'] = None
CUM_FUNC_DEFAULTS['out'] = None
validate_cum_func = CompatValidator(CUM_FUNC_DEFAULTS, method='kwargs')
validate_cumsum = CompatValidator(CUM_FUNC_DEFAULTS, fname='cumsum',
                                  method='both', max_fname_arg_count=1)

LOGICAL_FUNC_DEFAULTS = dict(out=None)
validate_logical_func = CompatValidator(LOGICAL_FUNC_DEFAULTS, method='kwargs')

MINMAX_DEFAULTS = dict(out=None)
validate_min = CompatValidator(MINMAX_DEFAULTS, fname='min',
                               method='both', max_fname_arg_count=1)
validate_max = CompatValidator(MINMAX_DEFAULTS, fname='max',
                               method='both', max_fname_arg_count=1)

RESHAPE_DEFAULTS = dict(order='C')
validate_reshape = CompatValidator(RESHAPE_DEFAULTS, fname='reshape',
                                   method='both', max_fname_arg_count=1)

REPEAT_DEFAULTS = dict(axis=None)
validate_repeat = CompatValidator(REPEAT_DEFAULTS, fname='repeat',
                                  method='both', max_fname_arg_count=1)

ROUND_DEFAULTS = dict(out=None)
validate_round = CompatValidator(ROUND_DEFAULTS, fname='round',
                                 method='both', max_fname_arg_count=1)

SORT_DEFAULTS = OrderedDict()
SORT_DEFAULTS['axis'] = -1
SORT_DEFAULTS['kind'] = 'quicksort'
SORT_DEFAULTS['order'] = None
validate_sort = CompatValidator(SORT_DEFAULTS, fname='sort',
                                method='kwargs')

STAT_FUNC_DEFAULTS = OrderedDict()
STAT_FUNC_DEFAULTS['dtype'] = None
STAT_FUNC_DEFAULTS['out'] = None
validate_stat_func = CompatValidator(STAT_FUNC_DEFAULTS,
                                     method='kwargs')
validate_sum = CompatValidator(STAT_FUNC_DEFAULTS, fname='sum',
                               method='both', max_fname_arg_count=1)
validate_mean = CompatValidator(STAT_FUNC_DEFAULTS, fname='mean',
                                method='both', max_fname_arg_count=1)

STAT_DDOF_FUNC_DEFAULTS = OrderedDict()
STAT_DDOF_FUNC_DEFAULTS['dtype'] = None
STAT_DDOF_FUNC_DEFAULTS['out'] = None
validate_stat_ddof_func = CompatValidator(STAT_DDOF_FUNC_DEFAULTS,
                                          method='kwargs')

# Currently, numpy (v1.11) has backwards compatibility checks
# in place so that this 'kwargs' parameter is technically
# unnecessary, but in the long-run, this will be needed.
SQUEEZE_DEFAULTS = dict(axis=None)
validate_squeeze = CompatValidator(SQUEEZE_DEFAULTS, fname='squeeze',
                                   method='kwargs')

TAKE_DEFAULTS = OrderedDict()
TAKE_DEFAULTS['out'] = None
TAKE_DEFAULTS['mode'] = 'raise'
validate_take = CompatValidator(TAKE_DEFAULTS, fname='take',
                                method='kwargs')


def validate_take_with_convert(convert, args, kwargs):
    """
    If this function is called via the 'numpy' library, the third
    parameter in its signature is 'out', which takes either an
    ndarray or 'None', so check if the 'convert' parameter is either
    an instance of ndarray or is None
    """

    if isinstance(convert, ndarray) or convert is None:
        args = (convert,) + args
        convert = True

    validate_take(args, kwargs, max_fname_arg_count=3, method='both')
    return convert

TRANSPOSE_DEFAULTS = dict(axes=None)
validate_transpose = CompatValidator(TRANSPOSE_DEFAULTS, fname='transpose',
                                     method='both', max_fname_arg_count=0)


def validate_transpose_for_generic(inst, kwargs):
    try:
        validate_transpose(tuple(), kwargs)
    except ValueError as e:
        klass = type(inst).__name__
        msg = str(e)

        # the Panel class actually relies on the 'axes' parameter if called
        # via the 'numpy' library, so let's make sure the error is specific
        # about saying that the parameter is not supported for particular
        # implementations of 'transpose'
        if "the 'axes' parameter is not supported" in msg:
            msg += " for {klass} instances".format(klass=klass)

        raise ValueError(msg)
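The ``*_with_skipna`` helpers above handle a positional collision: when the call comes from numpy, the slot that pandas reserves for ``skipna`` may receive numpy's ``out`` value instead. A minimal sketch of that shift follows, using only the standard library. `FakeArray`, `validate_out_default` and `toy_argmin` are hypothetical names introduced for illustration, and the validation is a simplified assumption about what the shared validators do.

```python
class FakeArray(list):
    """Stand-in for numpy.ndarray so the sketch needs no third-party imports."""


def process_skipna(skipna, args):
    # Same shape as the helper in the diff: a 'skipna' that is an array
    # or None is really numpy's 'out', so push it back into args.
    if isinstance(skipna, FakeArray) or skipna is None:
        args = (skipna,) + args
        skipna = True
    return skipna, args


def validate_out_default(fname, args, kwargs):
    # Hypothetical, simplified check: 'out' may only be passed as None.
    if len(args) > 1:
        raise TypeError("{0}() received too many positional "
                        "arguments".format(fname))
    unknown = set(kwargs) - {"out"}
    if unknown or (args and "out" in kwargs):
        raise TypeError("{0}() received unexpected or duplicate "
                        "arguments".format(fname))
    out = args[0] if args else kwargs.get("out")
    if out is not None:
        raise ValueError("the 'out' parameter is not supported in this "
                         "sketch of {0}()".format(fname))


def toy_argmin(values, axis=None, skipna=True, *args, **kwargs):
    skipna, args = process_skipna(skipna, args)
    validate_out_default("argmin", args, kwargs)
    if not skipna and any(v is None for v in values):
        return None  # toy behaviour: a missing value poisons the result
    present = [(v, i) for i, v in enumerate(values) if v is not None]
    return min(present)[1]


if __name__ == "__main__":
    print(toy_argmin([3, 1, 2]))              # pandas-style call
    print(toy_argmin([3, 1, 2], None, None))  # numpy-style: out=None lands in 'skipna'
```

Both calls reach the same computation. Only a real output buffer in the ``skipna`` slot is rejected.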
1 change: 0 additions & 1 deletion pandas/compat/pickle_compat.py
@@ -3,7 +3,6 @@
# flake8: noqa

import sys
-import numpy as np
import pandas
import copy
import pickle as pkl
4 changes: 3 additions & 1 deletion pandas/core/base.py
@@ -7,6 +7,7 @@
from pandas.core import common as com
import pandas.core.nanops as nanops
import pandas.lib as lib
from pandas.compat.numpy import function as nv
from pandas.util.decorators import (Appender, cache_readonly,
                                    deprecate_kwarg, Substitution)
from pandas.core.common import AbstractMethodError
@@ -798,8 +799,9 @@ class IndexOpsMixin(object):
     # ndarray compatibility
     __array_priority__ = 1000

-    def transpose(self):
+    def transpose(self, *args, **kwargs):
         """ return the transpose, which is by definition self """
+        nv.validate_transpose(args, kwargs)
         return self

     T = property(transpose, doc="return the transpose, which is by "
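The ``transpose`` change follows the table-driven pattern used throughout ``function.py``: a mapping of numpy-only parameters to their defaults, checked against whatever the caller passed. The sketch below shows one way such a check could work for a one-dimensional container. `check_defaults` and `Index1D` are hypothetical names, and the logic is a simplified assumption, not the code in ``pandas.util.validators``.

```python
from collections import OrderedDict

# Mirrors TRANSPOSE_DEFAULTS in the diff; ordered so that positional
# values can be matched to parameter names.
TRANSPOSE_DEFAULTS = OrderedDict([("axes", None)])


def check_defaults(fname, args, kwargs, defaults):
    """Hypothetical, simplified stand-in for the shared validators."""
    if len(args) > len(defaults):
        raise TypeError("{0}() takes at most {1} numpy-only argument(s) "
                        "({2} given)".format(fname, len(defaults), len(args)))
    # Pair positional values with parameter names in signature order.
    passed = dict(zip(defaults, args))
    for key, value in kwargs.items():
        if key not in defaults:
            raise TypeError("{0}() got an unexpected keyword argument "
                            "'{1}'".format(fname, key))
        if key in passed:
            raise TypeError("{0}() got multiple values for keyword "
                            "argument '{1}'".format(fname, key))
        passed[key] = value
    # Only the numpy default is tolerated for each extra parameter.
    for key, value in passed.items():
        if value is not defaults[key] and value != defaults[key]:
            raise ValueError("the '{0}' parameter is not supported in this "
                             "sketch of {1}()".format(key, fname))


class Index1D(object):
    """Toy one-dimensional container; its transpose is itself."""

    def __init__(self, values):
        self.values = list(values)

    def transpose(self, *args, **kwargs):
        check_defaults("transpose", args, kwargs, TRANSPOSE_DEFAULTS)
        return self


if __name__ == "__main__":
    idx = Index1D([1, 2, 3])
    print(idx.transpose(axes=None) is idx)
```

Keeping the defaults in one table means that a change to the upstream numpy signature would only require updating that table, which is the motivation given in the module docstring.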