Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ValueError raised by cummin/cummax when datetime64 Series contains NaT. #8966

Merged
merged 1 commit into from
Dec 3, 2014

Conversation

unutbu
Copy link
Contributor

@unutbu unutbu commented Dec 2, 2014

closes #8965

@@ -4109,13 +4109,16 @@ def func(self, axis=None, dtype=None, out=None, skipna=True,
axis = self._get_axis_number(axis)

y = _values_from_object(self).copy()
if not issubclass(y.dtype.type, (np.integer, np.bool_)):

if skipna and com.is_datetime64_dtype(y):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can handle timedelta64[ns] here as well; just put pd.tslib.iNaT directly as the value (it will work! for both datetime64[ns] and timedelta64[ns])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I think this is a bit more complicated, e.g. for cummax this will work, but not cummin

see nanops/nanmin for what you can do (e.g. replace the values with the largest intmax value before cumming, then replace the result at the end)

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type Timeseries labels Dec 2, 2014
@jreback jreback added this to the 0.16.0 milestone Dec 2, 2014
@unutbu unutbu force-pushed the cummin-NaT branch 2 times, most recently from 71cc1b7 to 1394b3a Compare December 2, 2014 15:12
@unutbu
Copy link
Contributor Author

unutbu commented Dec 2, 2014

@jreback: Sorry, I don't understand what part of nanops I should be using, or what makes cummax more complicated. Could you give me some more guidance?

@jreback
Copy link
Contributor

jreback commented Dec 2, 2014

cummin with skipna=False will fail in your current impl because the NaT will always be the min (as its the smallest value)

expected = pd.Series(pd.to_datetime(
['NaT', '2000-1-2', '2000-1-2', '2000-1-1', '2000-1-1', '2000-1-1']))
result = s.cummin(skipna=False)
self.assert_series_equal(expected, result)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback: s.cummin(skipna=False) is passing this test. Can you give an example where it fails?

@jreback jreback modified the milestones: 0.15.2, 0.16.0 Dec 2, 2014
else:
result = accum_func(y, axis)

d = self._construct_axes_dict()
d['copy'] = False
s = self._constructor(result, **d).__finalize__(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like an extra line here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback: Yes, thanks for the correction.


if skipna and issubclass(y.dtype.type,
(np.datetime64, np.timedelta64)):
result = accum_func(y, axis)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling accum_func with NaTs present is okay because the NaTs are neither greater than nor less than any other datetime64:

In [6]: pd.NaT < np.datetime64(('2010-01-01'))
Out[6]: False

In [7]: np.datetime64(('2010-01-01')) < pd.NaT 
Out[7]: False

@jreback jreback merged commit 010741f into pandas-dev:master Dec 3, 2014
@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type Timeseries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series.cummin raises ValueError when datetime64 Series contains NaT
2 participants