standardize signature for Index reductions, implement nanmean for datetime64 dtypes #24293

Merged: jreback merged 21 commits into pandas-dev:master from the reduction3 branch on Dec 29, 2018

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Dec 15, 2018

Same basic goal as #23890, but focusing on nanops (a module I'm not all that familiar with) first, with the array methods later, instead of vice-versa.

I think the only user-facing change (i.e. need for whatsnew) is that Series[datetime64].mean() now works.
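
As an illustration of that user-facing change, here is a minimal sketch. It assumes a pandas version that includes this PR (0.24.0 or later, per the milestone), and the sample dates are made up for the example.

```python
import pandas as pd

# Two timestamps two days apart (illustrative values only).
ser = pd.Series(pd.to_datetime(["2018-01-01", "2018-01-03"]))

# With this change, the mean of a datetime64 Series is a Timestamp
# at the midpoint instead of an error.
result = ser.mean()
print(result)
```

The midpoint of the two sample dates is 2018-01-02, which is what `result` should hold.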

I need to see if there are any more tests from #23890 or #24024 that can be copied here.

@pep8speaks

pep8speaks commented Dec 15, 2018

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 28, 2018 at 23:37 Hours UTC

@jbrockmendel
Member Author

possibly closes #21583

@jbrockmendel
Member Author

test_nanops is a PITA to troubleshoot. Code de-duplication at the expense of clarity may need re-thinking at some point.

@codecov

codecov bot commented Dec 15, 2018

Codecov Report

Merging #24293 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24293      +/-   ##
==========================================
+ Coverage   92.22%   92.22%   +<.01%     
==========================================
  Files         162      162              
  Lines       51824    51842      +18     
==========================================
+ Hits        47795    47813      +18     
  Misses       4029     4029
Flag Coverage Δ
#multiple 90.63% <100%> (ø) ⬆️
#single 43% <45.71%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/nanops.py 95.14% <100%> (+0.09%) ⬆️
pandas/core/indexes/datetimelike.py 97.29% <100%> (ø) ⬆️
pandas/core/base.py 97.66% <100%> (+0.02%) ⬆️
pandas/core/dtypes/missing.py 93.18% <100%> (+0.07%) ⬆️
pandas/core/indexes/range.py 97.34% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b0fa8e...04cf1f7. Read the comment docs.

@codecov

codecov bot commented Dec 15, 2018

Codecov Report

Merging #24293 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24293      +/-   ##
==========================================
- Coverage   92.32%   92.31%   -0.01%     
==========================================
  Files         166      166              
  Lines       52298    52320      +22     
==========================================
+ Hits        48285    48301      +16     
- Misses       4013     4019       +6
Flag Coverage Δ
#multiple 90.73% <100%> (-0.01%) ⬇️
#single 43.04% <31.03%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/nanops.py 94.9% <100%> (-0.16%) ⬇️
pandas/core/indexes/datetimelike.py 97.59% <100%> (+0.05%) ⬆️
pandas/core/base.py 97.72% <100%> (+0.02%) ⬆️
pandas/core/series.py 93.73% <100%> (+0.01%) ⬆️
pandas/core/dtypes/missing.py 93.18% <100%> (+0.07%) ⬆️
pandas/core/indexes/range.py 97.34% <100%> (+0.01%) ⬆️
pandas/core/dtypes/cast.py 88.72% <0%> (-0.67%) ⬇️
pandas/core/arrays/datetimelike.py 95.45% <0%> (-0.2%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f1c1dc...aa4028a. Read the comment docs.

@jreback
Contributor

jreback commented Dec 15, 2018

needs a parameterization pass

@@ -286,7 +286,7 @@ def min(self, axis=None, *args, **kwargs):
         except ValueError:
             return self._na_value

-    def argmin(self, axis=None, *args, **kwargs):
+    def argmin(self, axis=None, skipna=True, *args, **kwargs):
Contributor

can these inherit the doc-string?

Member Author

I'll take a look. Might also get rid of *args while at it

Member Author

Hmm the docstrings here and for the corresponding methods in base.IndexOpsMixin are kind of clunky. This may merit a separate look, @datapythonista ?

Member

I'm not familiar with the class hierarchy or whether it makes sense to inherit. But we'll have to add Parameters, Returns and Examples sections here if this is public.

@@ -297,12 +297,14 @@ def _minmax(self, meth):

         return self._start + self._step * no_steps

-    def min(self):
+    def min(self, axis=None, skipna=True):
         """The minimum value of the RangeIndex"""
Contributor

Same: why don't these inherit the doc-string?
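
To make the standardized signature concrete, the sketch below calls the reductions from the two hunks above with explicit `axis` and `skipna` arguments. It assumes a pandas version that includes this PR (0.24.0 or later); the index contents are made up for the example.

```python
import pandas as pd

rng = pd.RangeIndex(start=2, stop=10, step=3)  # holds 2, 5, 8
dti = pd.to_datetime(["2018-01-03", "2018-01-01", "2018-01-02"])

# Index reductions accept the same keywords as the Series versions.
rng_min = rng.min(axis=None, skipna=True)
rng_max = rng.max(axis=None, skipna=True)

# argmin returns the integer position of the smallest element.
dti_argmin = dti.argmin(axis=None, skipna=True)

print(rng_min, rng_max, dti_argmin)
```

Having the same keywords on every reduction is what lets generic code call `obj.min(skipna=...)` without checking whether `obj` is a Series or an Index.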

@@ -212,6 +213,10 @@ def _get_values(values, skipna, fill_value=None, fill_value_typ=None,
mask = isna(values)

dtype = values.dtype
if is_datetime64tz_dtype(orig_values):
Contributor

this is a very very odd place to do this as no other dtype specific things are here

Member Author

Semi-agree. Most of nanops is numpy dtype/ndarray specific, so making datetime64tz work is klunky. That's why I thought it made more sense to implement these directly in DTA in #23890. But conditional on implementing this here in nanops, this check is necessary.

Contributor

my point is this is in the wrong place. nanops already handles dtypes just not here.

Member Author

Can you be more specific as to where you want this? The place where dtype is currently defined is one line above this, so doing this anywhere else seems weird to me.

Contributor

so this should go in _view_if_needed (or maybe just remove that function and inline it here; that is ok)

Contributor

did you do this?

Member Author

Haven't yet. _view_if_needed is called ~25 lines below this. It isn't obvious to me that it is OK to move it up to here. If it is, then sure, it's a one-liner that can be inlined.

Contributor

yes move it, as I said above you have dtype inference in 2 places for no reason.
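
For readers unfamiliar with why a dtype check and an integer view are needed at all, here is a rough NumPy-only sketch of a NaT-aware mean for datetime64 data. It is not the code from this PR: `nanmean_datetime64` is a hypothetical helper, timezone handling is left out entirely, and the trick of averaging offsets from the minimum (to limit float rounding) is an assumption made for the example.

```python
import numpy as np


def nanmean_datetime64(values, skipna=True):
    """Hypothetical helper, not the pandas implementation."""
    values = np.asarray(values, dtype="M8[ns]")
    nat = np.datetime64("NaT", "ns")

    # Missing values are found on the datetime64 data itself.
    mask = np.isnat(values)
    if mask.all() or (mask.any() and not skipna):
        # Nothing valid to average (this also covers empty input),
        # or the caller asked for NaT to propagate.
        return nat

    # View the valid entries as int64 nanoseconds since the epoch.
    i8 = values.view("i8")[~mask]

    # Average offsets from the minimum so the float mean stays precise.
    base = i8.min()
    offset = np.int64(round(float((i8 - base).mean())))
    return np.datetime64(int(base + offset), "ns")


data = np.array(["2018-01-01", "NaT", "2018-01-03"], dtype="M8[ns]")
skipped = nanmean_datetime64(data)
propagated = nanmean_datetime64(data, skipna=False)
print(skipped, propagated)
```

With the sample data, skipping NaT gives the midpoint of the two valid dates, while `skipna=False` gives NaT.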

@jreback jreback added Datetime Datetime data dtype API Design labels Dec 15, 2018
@jbrockmendel
Member Author

If everyone is happy with this, I can get started on a follow-up that will trim reductions off of #24024.

@@ -951,22 +957,36 @@ def max(self):
>>> idx.max()
('b', 2)
"""
nv.validate_minmax_axis(axis)
return nanops.nanmax(self.values)
Contributor

Why isn't skipna passed? Is that a followup item?

Member Author

I had considered this a follow-up item, but am now leaning towards fixing these now. Good catch.

Member Author

This turns out to be more involved than expected: it requires some changes in Series._reduce and IndexOpsMixin._reduce. Neither is all that invasive, but changing them entails changing/fixing/testing behavior for other reductions that I hadn't planned on touching in this PR.

Currently going through to get that all done here. I'm also OK with doing this in two phases so as to take things out of #24024 sooner rather than later.
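
The behavior at stake in this thread can be sketched as follows. The example assumes a pandas version that includes this PR (0.24.0 or later), and the data is made up; NaT is placed in the middle so the index is not monotonic and no fast path applies.

```python
import pandas as pd

ser = pd.Series(
    [pd.Timestamp("2018-01-01"), pd.NaT, pd.Timestamp("2018-01-03")]
)
idx = pd.DatetimeIndex(ser)

# Default: missing values are ignored.
ser_skip = ser.max(skipna=True)

# With skipna=False, a missing value makes the result missing too.
ser_noskip = ser.max(skipna=False)
idx_noskip = idx.max(skipna=False)

print(ser_skip, ser_noskip, idx_noskip)
```

If `skipna` were not forwarded to the underlying reduction, the last two calls would silently behave like the first one.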

@jbrockmendel
Member Author

Updated to pass skipna in the appropriate places in Index and DatetimeIndexOpsMixin methods. Updated a handful of tests, but I'm not confident we have thorough coverage.

@jbrockmendel
Member Author

Travis fail is hypothesis

@jbrockmendel
Member Author

rebased

"""
Return the maximum value of the Index.

Parameters
Contributor

also we have similar things in pandas/core/base, these should change as well.

# quick check
if len(i8) and self.is_monotonic:
if i8[-1] != iNaT:
return self._box_func(i8[-1])

if not len(self):
Contributor

elif

# quick check
if len(i8) and self.is_monotonic:
if i8[-1] != iNaT:
return self._box_func(i8[-1])

if not len(self):
return self._na_value

if self.hasnans:
Contributor

same
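
One way to lay out the empty, no-missing, and has-missing cases as a single if/elif chain is sketched below. This is an illustration only, not the code under review: `i8_max` is a hypothetical function that works on raw int64 values, it omits the monotonic quick check and the boxing step (`_box_func`), and it returns the sentinel rather than `_na_value`.

```python
import numpy as np

# Smallest int64, the sentinel pandas uses for NaT in i8 data.
iNaT = np.iinfo(np.int64).min


def i8_max(i8, skipna=True):
    """Hypothetical stand-in for a datetimelike max on raw i8 values."""
    if len(i8) == 0:
        # Nothing to reduce.
        return iNaT

    mask = i8 == iNaT
    if not mask.any():
        # No missing values: plain maximum.
        return int(i8.max())
    elif skipna and not mask.all():
        # Some missing values, and the caller wants them ignored.
        return int(i8[~mask].max())
    else:
        # All missing, or skipna=False with at least one missing value.
        return iNaT


sample = np.array([3, iNaT, 7], dtype="i8")
print(i8_max(sample), i8_max(sample, skipna=False) == iNaT)
```

Each branch returns directly, so the order of the checks documents which condition wins when several apply.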

@@ -558,6 +599,8 @@ def test_minmax_nat_series(self, nat_ser):
# GH#23282
assert nat_ser.min() is pd.NaT
assert nat_ser.max() is pd.NaT
Contributor

can parameterize over skipna in a lot of the code you added to tests

Member Author

We just implemented test_reductions, still need to move over the reduction tests from tests.indexes and tests.frames. After they're all in one place is when to really focus on parametrization.
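
A parametrized version of a test like the one in the hunk above might look roughly like this. It is a sketch, not the test that was merged: the test name is hypothetical, the `nat_ser` fixture is replaced by an inline all-NaT Series, and it assumes pytest plus a pandas version that includes this PR.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("skipna", [True, False])
def test_minmax_all_nat_series(skipna):
    # An all-NaT Series has no valid values, so both reductions give NaT
    # whether or not missing values are skipped.
    nat_ser = pd.Series([pd.NaT, pd.NaT])
    assert nat_ser.min(skipna=skipna) is pd.NaT
    assert nat_ser.max(skipna=skipna) is pd.NaT
```

pytest runs the function once per value in the list, so each `skipna` setting shows up as its own test case in the report.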

@@ -3493,6 +3494,9 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
         # dispatch to ExtensionArray interface
         if isinstance(delegate, ExtensionArray):
             return delegate._reduce(name, skipna=skipna, **kwds)
+        elif is_datetime64_dtype(delegate):
Contributor

should this be first?

Member Author

it isn't clear to me that it would make a difference

@jbrockmendel
Member Author

"as I said above you have dtype inference in 2 places for no reason."

Just pushed, collected all of that at the top of _get_values.

Parameters
----------
axis : {None}
Dummy argument for consistency with Series
Contributor

not sure we use this terminology here

axis : {0}
    Axis for the function to be applied on.

Contributor

I think @jbrockmendel is listing the set of possible values, which we do use. In this case, that set is just the single value None, so it looks strange.

I think

axis : int, optional
    For compatibility with NumPy. Only 0 or None are allowed.

is clearer.

Contributor

@TomAugspurger verbiage would be good
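
Put together, the suggested wording could sit in a docstring like the one below. This is a sketch on a toy class: `ToyIndex` is hypothetical, missing values are represented by `None` for simplicity, and the `skipna` and Returns descriptions are assumptions added to make the docstring complete.

```python
class ToyIndex:
    """Hypothetical minimal container, used only to show the docstring."""

    def __init__(self, values):
        self._values = list(values)

    def min(self, axis=None, skipna=True):
        """
        Return the minimum value of the Index.

        Parameters
        ----------
        axis : int, optional
            For compatibility with NumPy. Only 0 or None are allowed.
        skipna : bool, default True
            Exclude missing values (None here) when computing the result.

        Returns
        -------
        scalar
            Minimum value, or None if there is nothing to reduce.
        """
        if axis not in (None, 0):
            raise ValueError("Only 0 or None are allowed for axis")
        has_missing = any(v is None for v in self._values)
        if has_missing and not skipna:
            return None
        valid = [v for v in self._values if v is not None]
        return min(valid) if valid else None


print(ToyIndex([3, None, 1]).min())
```

The validation at the top of the method mirrors the sentence in the docstring, so the documented restriction on `axis` is also enforced.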

axis : {None}
Dummy argument for consistency with Series
skipna : bool, default True

See Also
Contributor

should be consistent about the See Also, e.g. make sure it is in all of them, and add a reference to the Series.min function as well (as appropriate)


i8 = self.asi8
try:
# quick check
if len(i8) and self.is_monotonic:
Contributor

these quick checks make sense for Index as they are immutable, but may not make much sense here (but I guess we can evaluate later)

@jreback
Contributor

jreback commented Dec 28, 2018

code looks good. if you can update the doc-strings, that would be good.

@jbrockmendel
Member Author

I think docstring comments have been addressed; bears double-checking

@jreback jreback added this to the 0.24.0 milestone Dec 29, 2018
@jreback jreback merged commit fe696e4 into pandas-dev:master Dec 29, 2018
@jreback
Contributor

jreback commented Dec 29, 2018

thanks @jbrockmendel

I think we need a whatsnew for this #24265, if you can add it in the next PR

@jreback
Contributor

jreback commented Dec 29, 2018

@datapythonista @jorisvandenbossche @TomAugspurger it's worth investigating whether we can somehow share doc-strings between EA, Index, and Series for some operations by templating.
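
One possible shape for such templating is sketched below. Everything here is hypothetical: the `doc` decorator, the `_MIN_DOC` template, and the toy classes are made up for illustration and are not pandas APIs.

```python
def doc(template, **params):
    """Hypothetical decorator: fill a shared docstring template."""

    def decorator(func):
        func.__doc__ = template.format(**params)
        return func

    return decorator


# One template, shared by every class that implements the reduction.
_MIN_DOC = """
Return the minimum value of the {klass}.

Parameters
----------
axis : int, optional
    For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True

See Also
--------
{see_also}
"""


class ToySeries:
    @doc(_MIN_DOC, klass="Series", see_also="Index.min : Same reduction on an Index.")
    def min(self, axis=None, skipna=True):
        raise NotImplementedError


class ToyIndex:
    @doc(_MIN_DOC, klass="Index", see_also="Series.min : Same reduction on a Series.")
    def min(self, axis=None, skipna=True):
        raise NotImplementedError


print(ToyIndex.min.__doc__)
```

Because the template is filled with `str.format`, literal braces in the shared text (for example a numpydoc value set such as `{0}`) would need to be doubled, which is one design cost of this approach.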

@jbrockmendel jbrockmendel deleted the reduction3 branch December 29, 2018 15:31
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
API Design Datetime Datetime data dtype
Development

Successfully merging this pull request may close these issues.

Series.max incorrect for skipna=False with missing datetime64[ns] data.
5 participants