Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: implement reductions for DatetimeArray/TimedeltaArray/PeriodArray #23890

Closed
wants to merge 5 commits into from

Conversation

jbrockmendel
Copy link
Member

Some idiosyncrasies in signatures between Series vs Index make the parametrized test cases ugly. Do we want to adapt the Index reductions to have a skipna kwarg like the Series and EA methods?

Still needs tests for timedelta and period dtypes.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@gfyoung gfyoung added the Datetime Datetime data dtype label Nov 25, 2018
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your intent to share the same implementation between the arrays and index classes? Having _make_reduction and _get_reduction_vals be top-level functions instead of bound methods is a bit unfortunate, but OK if we get to have one implementation.


def method(self, skipna=True, **kwargs):
if only_timedelta:
raise TypeError('"{meth}" reduction is not valid for {cls}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. only_timedelta makes me think we shouldn't raise here if the array is timedelta dtype, but right now it looks like we raise unconditionally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, that should be if only_timedelta and not is_timedelta64_dtype(self)

if vals is NaT:
return NaT

# Try to minimize floating point error by rounding before casting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does DatetimeIndex do this casting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reductions DatetimeIndex has is min and max, for which it is not relevant.

pandas/core/series.py Show resolved Hide resolved
@@ -460,6 +460,14 @@ def nanmean(values, axis=None, skipna=True, mask=None):
elif is_float_dtype(dtype):
dtype_sum = dtype
dtype_count = dtype
elif is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_datetime64_any_dtype I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, but I'm kind of hoping to get rid of that

@@ -460,6 +460,14 @@ def nanmean(values, axis=None, skipna=True, mask=None):
elif is_float_dtype(dtype):
dtype_sum = dtype
dtype_count = dtype
elif is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype):
from pandas import DatetimeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be boxed in an index? Shouldn't values be a DatetimeArray right now? Or is it not yet since we haven't implemented DTA as an extension array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this module especially well, but my assumption is that it could be a numpy array at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Ideally nanmean will only be called via Series.mean and our ExtensionArray's .mean methods. Though I may be missing some cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the exception will be Series[datetime64[ns]], which may pass an ndarray here...


from pandas.tseries import frequencies
from pandas.tseries.offsets import DateOffset, Tick

from .base import ExtensionOpsMixin


def _get_reduction_vals(obj, skipna):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you seem to be reinventing the wheel here. we already do all of this in nanops.py for timedelta. I am not sure how this should be integrated here, but this is not the way. (meaning re-write all of the code we already have)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is definitely a Proof Of Concept. For now the main question for you is if you're on board with the idea of bringing the Index reduction signatures in line with everything else

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes about the signatures generally. but maybe let's start with min/max as more straightforward? To do what you are suggestnig here will require using nanops (the interface can certainly be here), but the implementation is already there (in nanops).

@TomAugspurger TomAugspurger mentioned this pull request Dec 4, 2018
12 tasks
@@ -795,7 +795,7 @@ def _ndarray_values(self):
def empty(self):
return not self.size

def max(self):
def max(self, skipna=True, axis=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Series, these are all (self, axis=None, skipna=None, ...).

masked_vals = values
if mask is not None:
masked_vals = values[~mask]
the_mean = DatetimeIndex(masked_vals).mean(skipna=skipna)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls try to follow the patterns in this module
_wrap_results exists for a reason

again i see lots of reinventing the wheel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this PR has been updated since previous conversation. I’ll get around to this suggestion before too long.

@jbrockmendel
Copy link
Member Author

Closing in favor of #24757.

@jbrockmendel jbrockmendel deleted the reduction2 branch April 5, 2020 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants