Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardize signature for Index reductions, implement nanmean for datetime64 dtypes #24293

Merged
merged 21 commits into from
Dec 29, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
f38ffe1
standardize signature for Index reductions, implement nanmean for dat…
jbrockmendel Dec 15, 2018
04cf1f7
suppress warnings
jbrockmendel Dec 15, 2018
3efab79
requested edits
jbrockmendel Dec 15, 2018
a652439
implement view
jbrockmendel Dec 15, 2018
ce760d3
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 19, 2018
2380af6
pass skipna, tests
jbrockmendel Dec 19, 2018
4157f0b
fixup isort
jbrockmendel Dec 20, 2018
50642ae
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 23, 2018
0baedf3
fixup rebase screwups
jbrockmendel Dec 23, 2018
6e0e69f
move len-self checks
jbrockmendel Dec 23, 2018
e2c301b
fixup more rebase screwups
jbrockmendel Dec 23, 2018
fce67ac
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 24, 2018
4b4979f
do values viewing in one place
jbrockmendel Dec 24, 2018
82b8cdf
unpack Series
jbrockmendel Dec 27, 2018
ffe6ada
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 27, 2018
7f7693f
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 28, 2018
07c3102
implement _extract_datetimelike_values_and_dtype
jbrockmendel Dec 28, 2018
6c93410
fix mask
jbrockmendel Dec 28, 2018
4620c56
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 28, 2018
4777b75
requested docstring edits, remoe duplicated view
jbrockmendel Dec 28, 2018
aa4028a
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Dec 28, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -483,6 +483,26 @@ def _maybe_mask_results(self, result, fill_value=iNaT, convert=None):
result[self._isnan] = fill_value
return result

# ------------------------------------------------------------------
# Additional array methods
# These are not part of the EA API, but we implement them because
# pandas currently assumes they're there.

def view(self, dtype=None):
jreback marked this conversation as resolved.
Show resolved Hide resolved
"""
New view on this array with the same data.

Parameters
----------
dtype : numpy dtype, optional

Returns
-------
ndarray
With the specified `dtype`.
"""
return self._data.view(dtype=dtype)

# ------------------------------------------------------------------
# Frequency Properties/Methods

Expand Down
36 changes: 32 additions & 4 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -920,10 +920,16 @@ def _ndarray_values(self):
def empty(self):
return not self.size

def max(self):
def max(self, axis=None, skipna=True):
jreback marked this conversation as resolved.
Show resolved Hide resolved
"""
Return the maximum value of the Index.

Parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we have similar things in pandas/core/base, these should change as well.

----------
axis : {None}
Dummy argument for consistency with Series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we use this terminology here

axis : {0)
    Axis for the function to be applied on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @jbrockmendel is listing the set of possible values, which we do use. In this case, that set is just the single value None, so it looks strange.

I think

axis : int, optional
    For compatibility with NumPy. Only 0 or None are allowed.

is clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger verbiage would be good

skipna : bool, default True

Returns
-------
scalar
Expand Down Expand Up @@ -951,22 +957,36 @@ def max(self):
>>> idx.max()
('b', 2)
"""
nv.validate_minmax_axis(axis)
return nanops.nanmax(self.values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't skipna passed? Is that a followup item?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had considered this a follow-up item, but am now leaning towards fixing these now. Good catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turns out to be more involved than expected, requires some changes in Series._reduce and IndexOpsMixin._reduce. Neither of which is all that invasive, but changing them entails changing/fixing/testing behavior for other reductions I hadn't planned on doing in this PR.

Currently going through to get that all done here. I'm also OK with doing this in two phases so as to take things out of #24024 sooner rather than later.


def argmax(self, axis=None):
def argmax(self, axis=None, skipna=True):
"""
Return a ndarray of the maximum argument indexer.

Parameters
----------
axis : {None}
Dummy argument for consistency with Series
skipna : bool, default True

See Also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be consisten about the See Also, e.g. make sure in all, and add a refernce to the Series.min function as well (as appropriate)

--------
numpy.ndarray.argmax
"""
nv.validate_minmax_axis(axis)
return nanops.nanargmax(self.values)
TomAugspurger marked this conversation as resolved.
Show resolved Hide resolved

def min(self):
def min(self, axis=None, skipna=True):
"""
Return the minimum value of the Index.

Parameters
----------
axis : {None}
Dummy argument for consistency with Series
skipna : bool, default True

Returns
-------
scalar
Expand Down Expand Up @@ -994,16 +1014,24 @@ def min(self):
>>> idx.min()
('a', 1)
"""
nv.validate_minmax_axis(axis)
return nanops.nanmin(self.values)

def argmin(self, axis=None):
def argmin(self, axis=None, skipna=True):
"""
Return a ndarray of the minimum argument indexer.

Parameters
----------
axis : {None}
Dummy argument for consistency with Series
skipna : bool, default True

See Also
--------
numpy.ndarray.argmin
"""
nv.validate_minmax_axis(axis)
jreback marked this conversation as resolved.
Show resolved Hide resolved
return nanops.nanargmin(self.values)

def tolist(self):
Expand Down
7 changes: 5 additions & 2 deletions pandas/core/dtypes/missing.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,8 @@
is_period_dtype, is_scalar, is_string_dtype, is_string_like_dtype,
is_timedelta64_dtype, needs_i8_conversion, pandas_dtype)
from .generic import (
ABCExtensionArray, ABCGeneric, ABCIndexClass, ABCMultiIndex, ABCSeries)
ABCDatetimeArray, ABCExtensionArray, ABCGeneric, ABCIndexClass,
ABCMultiIndex, ABCSeries)
from .inference import is_list_like

isposinf_scalar = libmissing.isposinf_scalar
Expand Down Expand Up @@ -108,7 +109,7 @@ def _isna_new(obj):
elif isinstance(obj, ABCMultiIndex):
raise NotImplementedError("isna is not defined for MultiIndex")
elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass,
ABCExtensionArray)):
ABCExtensionArray, ABCDatetimeArray)):
jreback marked this conversation as resolved.
Show resolved Hide resolved
return _isna_ndarraylike(obj)
elif isinstance(obj, ABCGeneric):
return obj._constructor(obj._data.isna(func=isna))
Expand Down Expand Up @@ -196,6 +197,8 @@ def _isna_ndarraylike(obj):
else:
values = obj
result = values.isna()
elif isinstance(obj, ABCDatetimeArray):
jreback marked this conversation as resolved.
Show resolved Hide resolved
return obj.isna()
elif is_string_dtype(dtype):
# Working around NumPy ticket 1542
shape = values.shape
Expand Down
9 changes: 5 additions & 4 deletions pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ class DatetimeIndexOpsMixin(DatetimeLikeArrayMixin):
copy = Index.copy
unique = Index.unique
take = Index.take
view = Index.view

# DatetimeLikeArrayMixin assumes subclasses are mutable, so these are
# properties there. They can be made into cache_readonly for Index
Expand Down Expand Up @@ -258,7 +259,7 @@ def tolist(self):
"""
return list(self.astype(object))

def min(self, axis=None, *args, **kwargs):
def min(self, axis=None, skipna=True, *args, **kwargs):
"""
Return the minimum value of the Index or minimum along
an axis.
Expand Down Expand Up @@ -286,7 +287,7 @@ def min(self, axis=None, *args, **kwargs):
except ValueError:
return self._na_value

def argmin(self, axis=None, *args, **kwargs):
def argmin(self, axis=None, skipna=True, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these inherit the doc-string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look. Might also get rid of *args while at it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the docstrings here and for the corresponding methods in base.IndexOpsMixin are kind of clunky. This may merit a separate look, @datapythonista ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know well what's the class hierarchy and whether makes sense to inherit. But we'll have to add Parameters, Returns and Examples section here if this is public.

"""
Returns the indices of the minimum values along an axis.

Expand All @@ -309,7 +310,7 @@ def argmin(self, axis=None, *args, **kwargs):
i8[mask] = np.iinfo('int64').max
return i8.argmin()

def max(self, axis=None, *args, **kwargs):
def max(self, axis=None, skipna=True, *args, **kwargs):
"""
Return the maximum value of the Index or maximum along
an axis.
Expand Down Expand Up @@ -337,7 +338,7 @@ def max(self, axis=None, *args, **kwargs):
except ValueError:
return self._na_value

def argmax(self, axis=None, *args, **kwargs):
def argmax(self, axis=None, skipna=True, *args, **kwargs):
"""
Returns the indices of the maximum values along an axis.

Expand Down
6 changes: 4 additions & 2 deletions pandas/core/indexes/range.py
Original file line number Diff line number Diff line change
Expand Up @@ -297,12 +297,14 @@ def _minmax(self, meth):

return self._start + self._step * no_steps

def min(self):
def min(self, axis=None, skipna=True):
"""The minimum value of the RangeIndex"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. why don't these inherit the doc-string

nv.validate_minmax_axis(axis)
return self._minmax('min')

def max(self):
def max(self, axis=None, skipna=True):
"""The maximum value of the RangeIndex"""
nv.validate_minmax_axis(axis)
return self._minmax('max')

def argsort(self, *args, **kwargs):
Expand Down
27 changes: 19 additions & 8 deletions pandas/core/nanops.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,14 @@

import numpy as np

from pandas._libs import lib, tslibs
from pandas._libs import iNaT, lib, tslibs
import pandas.compat as compat

from pandas.core.dtypes.cast import _int64_max, maybe_upcast_putmask
from pandas.core.dtypes.common import (
_get_dtype, is_any_int_dtype, is_bool_dtype, is_complex, is_complex_dtype,
is_datetime64_dtype, is_datetime_or_timedelta_dtype, is_float,
is_float_dtype, is_integer, is_integer_dtype, is_numeric_dtype,
is_datetime64_dtype, is_datetime64tz_dtype, is_datetime_or_timedelta_dtype,
is_float, is_float_dtype, is_integer, is_integer_dtype, is_numeric_dtype,
is_object_dtype, is_scalar, is_timedelta64_dtype)
from pandas.core.dtypes.missing import isna, na_value_for_dtype, notna

Expand Down Expand Up @@ -203,6 +203,7 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None,
if necessary copy and mask using the specified fill_value
copy = True will force the copy
"""
orig_values = values
values = com.values_from_object(values)

if mask is None:
Expand All @@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None,
mask = isna(values)

dtype = values.dtype
if is_datetime64tz_dtype(orig_values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very very odd place to do this as no other dtype specific things are here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi-agree. Most of nanops is numpy dtype/ndarray specific, so making datetime64tz work is klunky. That's why I thought it made more sense to implement these directly in DTA in #23890. But conditional on implementing this here in nanops, this check is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is this is in the wrong place. nanops already handles dtypes just not here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific as to where you want this? The place where dtype is currently defined is one line above this, so doing this anywhere else seems weird to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this should go in _view_if_needed (or maybe just remove that function and inline it here is ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't yet. view_if_needed is called ~25 lines below this. It isn't obvious to me that it is OK to move it up to here. If it is, then sure, its a one-liner that can be inlined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes move it, as I said above you have dtype inference in 2 places for no reason.

dtype = orig_values.dtype

values = getattr(values, 'asi8', values)
dtype_ok = _na_ok_dtype(dtype)

# get our fill value (in case we need to provide an alternative
Expand Down Expand Up @@ -268,12 +273,16 @@ def _view_if_needed(values):
def _wrap_results(result, dtype, fill_value=None):
""" wrap our results if needed """

if is_datetime64_dtype(dtype):
if is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype):
if fill_value is None:
# GH#24293
fill_value = iNaT
if not isinstance(result, np.ndarray):
tz = getattr(dtype, 'tz', None)
assert not isna(fill_value), "Expected non-null fill_value"
if result == fill_value:
result = np.nan
result = tslibs.Timestamp(result)
result = tslibs.Timestamp(result, tz=tz)
else:
result = result.view(dtype)
elif is_timedelta64_dtype(dtype):
Expand Down Expand Up @@ -426,7 +435,6 @@ def nansum(values, axis=None, skipna=True, min_count=0, mask=None):
return _wrap_results(the_sum, dtype)


@disallow('M8')
@bottleneck_switch()
def nanmean(values, axis=None, skipna=True, mask=None):
"""
Expand Down Expand Up @@ -457,7 +465,8 @@ def nanmean(values, axis=None, skipna=True, mask=None):
values, skipna, 0, mask=mask)
dtype_sum = dtype_max
dtype_count = np.float64
if is_integer_dtype(dtype) or is_timedelta64_dtype(dtype):
if (is_integer_dtype(dtype) or is_timedelta64_dtype(dtype) or
jreback marked this conversation as resolved.
Show resolved Hide resolved
is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype)):
dtype_sum = np.float64
elif is_float_dtype(dtype):
dtype_sum = dtype
Expand All @@ -466,7 +475,9 @@ def nanmean(values, axis=None, skipna=True, mask=None):
the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))

if axis is not None and getattr(the_sum, 'ndim', False):
the_mean = the_sum / count
with np.errstate(all="ignore"):
# suppress division by zero warnings
the_mean = the_sum / count
ct_mask = count == 0
if ct_mask.any():
the_mean[ct_mask] = np.nan
Expand Down
5 changes: 3 additions & 2 deletions pandas/tests/series/test_analytics.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,9 +517,10 @@ def _check_stat_op(self, name, alternate, string_series_,
string_series_[5:15] = np.NaN

# idxmax, idxmin, min, and max are valid for dates
if name not in ['max', 'min']:
if name not in ['max', 'min', 'mean']:
ds = Series(date_range('1/1/2001', periods=10))
pytest.raises(TypeError, f, ds)
with pytest.raises(TypeError):
f(ds)

# skipna or no
assert notna(f(string_series_))
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/test_nanops.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
from pandas import Series, isna
from pandas.compat.numpy import _np_version_under1p13
from pandas.core.dtypes.common import is_integer_dtype
from pandas.core.arrays import DatetimeArrayMixin as DatetimeArray

use_bn = nanops._USE_BOTTLENECK

Expand Down Expand Up @@ -998,6 +999,23 @@ def prng(self):
return np.random.RandomState(1234)


class TestDatetime64NaNOps(object):
@pytest.mark.parametrize('tz', [None, 'UTC'])
def test_nanmean(self, tz):
dti = pd.date_range('2016-01-01', periods=3, tz=tz)
expected = dti[1]

for obj in [dti, DatetimeArray(dti), Series(dti)]:
result = nanops.nanmean(obj)
assert result == expected

dti2 = dti.insert(1, pd.NaT)

for obj in [dti2, DatetimeArray(dti2), Series(dti2)]:
result = nanops.nanmean(obj)
assert result == expected


def test_use_bottleneck():

if nanops._BOTTLENECK_INSTALLED:
Expand Down