BUG: datetime64 series reduces to nan when empty instead of nat #11245

Merged
jreback merged 1 commit into pandas-dev:master from llllllllll:dt64-reduce-to-nat Oct 11, 2015

2 participants
Contributor

llllllllll commented Oct 5, 2015

I ran into some strange behavior with an empty series of dtype datetime64[ns]: calling max on it returned nan. I think the correct behavior here is to return NaT. I looked through test_nanops, but I am not sure where the correct place to put the test for this is.

The new behavior is:

In [1]: pd.Series(dtype='datetime64[ns]').max()
Out[1]: NaT

where the old behavior was:

In [1]: pd.Series(dtype='datetime64[ns]').max()
Out[1]: nan
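
A quick way to check the new behavior across all the datetime-like dtypes mentioned in this thread (timedelta64, naive datetime64 and tz-aware datetime64) is sketched below. It assumes a reasonably recent pandas and uses only the public Series API; it is an illustration, not the test added in this PR.

```python
import pandas as pd

# Empty series of each datetime-like dtype discussed in the thread.
for dtype in ("m8[ns]", "M8[ns]", "M8[ns, UTC]"):
    s = pd.Series([], dtype=dtype)
    # min/max of an empty datetime-like series should be the NaT singleton, not float nan.
    assert s.min() is pd.NaT
    assert s.max() is pd.NaT

# For comparison, an empty float series still reduces to a (non-NaT) missing value.
empty_float_max = pd.Series([], dtype="float64").max()
assert pd.isna(empty_float_max) and empty_float_max is not pd.NaT
```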

llllllllll referenced this pull request in blaze/odo Oct 5, 2015: ENH: Adds nan->pd.NaT edge #331 (Merged)

Contributor

jreback commented Oct 5, 2015

Can you add a test in test_nanops and/or test_series?

Contributor

jreback commented Oct 5, 2015

Also want to test timedelta64 and datetime64[ns, tz].

Contributor

llllllllll commented Oct 5, 2015

Sure, I will update this tonight when I get home

jreback added this to the 0.17.1 milestone Oct 5, 2015

Contributor

jreback commented Oct 5, 2015

pls also add a note in the 0.17.1 whatsnew bug section

Contributor

llllllllll commented Oct 7, 2015

ping @jreback

jreback commented on the diff Oct 7, 2015

pandas/core/nanops.py
@@ -637,7 +619,7 @@ def _maybe_null_out(result, axis, mask):
else:
result = result.astype('f8')
result[null_mask] = np.nan
- else:
+ elif result is not tslib.NaT:

jreback Oct 7, 2015

Contributor

If we started with M8/m8 and then did a .view('i8'), this needs to be compared to pd.lib.iNaT; not sure why you are not hitting this here.
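
To see why a comparison against iNaT (rather than an identity check against the NaT object) is needed once data has gone through .view('i8'), here is a small NumPy-only sketch. It assumes NaT is stored as the smallest int64 value, which is how NumPy represents it; the pandas constant named in the comment (pd.lib.iNaT) is not used here.

```python
import numpy as np

arr = np.array(["2015-10-05", "NaT"], dtype="M8[ns]")
as_int = arr.view("i8")  # same memory, reinterpreted as int64

# Smallest int64: the sentinel that NaT turns into under the i8 view.
nat_sentinel = np.iinfo(np.int64).min

# After the view, the missing value is just a plain integer, so it has to be
# detected with == against the sentinel; an `is` check against a NaT object
# can never match it.
missing = as_int == nat_sentinel
print(missing)
```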

llllllllll Oct 7, 2015

Contributor

so this should be elif not (result is tslib.NaT or result is tslib.iNaT)?

jreback Oct 7, 2015

Contributor

Compare with == tslib.iNaT, not is. But I'm puzzled why a NaT is there, as I don't think the result is wrapped yet.

llllllllll Oct 7, 2015

Contributor

If you take a timeseries, view it as int, and then reduce, I think you still want nan. The only case where you would want a NaT is when the dtype of the sequence being reduced is datetime64.
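
The rule described here (NaT only when the values being reduced are themselves datetime-like, nan otherwise, including for an i8 view of a timeseries) can be expressed as a tiny dtype check. empty_reduction_fill is a hypothetical helper for illustration, not a pandas function.

```python
import numpy as np


def empty_reduction_fill(dtype):
    """Hypothetical helper: missing value an empty min/max should return for dtype."""
    dtype = np.dtype(dtype)
    if dtype.kind in "mM":  # 'M' = datetime64, 'm' = timedelta64
        unit, _ = np.datetime_data(dtype)
        return dtype.type("NaT", unit)  # NaT in the same unit as the input
    return np.nan  # ints (e.g. an i8 view), floats, etc.


ts = np.array(["2015-10-05"], dtype="M8[ns]")
print(empty_reduction_fill(ts.dtype))             # datetime-like: NaT
print(empty_reduction_fill(ts.view("i8").dtype))  # viewed as int: nan
```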

llllllllll Oct 7, 2015

Contributor

Oh, I see what you are saying; you were suggesting:

elif result.view('i8') == tslib.iNaT

It looks like result is already a Timestamp at this point, so it will be the NaT object. I don't know whether this is guaranteed.

jreback Oct 7, 2015

Contributor

No, I mean result should still be an int if it's M8, as wrapping is the last step.

jreback Oct 7, 2015

Contributor

ok then

jreback commented on an outdated diff Oct 7, 2015

pandas/tests/test_dtypes.py
@@ -148,6 +149,14 @@ def test_dst(self):
self.assertTrue(is_datetimetz(s2))
self.assertEqual(s1.dtype, s2.dtype)
+ def test_parser(self):
+ for tz, constructor in product(('UTC', 'US/Eastern'),
+ ('M8', 'datetime64')):

jreback Oct 7, 2015

Contributor

gr8 thanks!

jreback Oct 10, 2015

Contributor

add the PR number as a comment
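
What test_parser exercises (accepting the M8[ns, tz] alias alongside datetime64[ns, tz]) can be checked from the public API as below. This assumes a modern pandas where pandas.api.types.pandas_dtype and pd.DatetimeTZDtype are available; it is a sketch, not the test from the diff, and it only uses UTC to avoid depending on a timezone database.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# Both spellings of a tz-aware datetime dtype should parse to the same dtype.
long_form = pandas_dtype("datetime64[ns, UTC]")
short_form = pandas_dtype("M8[ns, UTC]")

assert isinstance(short_form, pd.DatetimeTZDtype)
assert long_form == short_form
print(short_form)
```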

jreback commented on an outdated diff Oct 7, 2015

pandas/tests/test_series.py
@@ -7960,6 +7960,11 @@ def test_datetime_timedelta_quantiles(self):
self.assertTrue(pd.isnull(Series([],dtype='M8[ns]').quantile(.5)))
self.assertTrue(pd.isnull(Series([],dtype='m8[ns]').quantile(.5)))
+ def test_empty_timeseries_redections_return_nat(self):
+ for dtype in ('m8[ns]', 'm8[ns]', 'M8[ns]', 'M8[ns, UTC]'):

jreback Oct 7, 2015

Contributor

add the issue number as a comment here

Contributor

jreback commented Oct 7, 2015

Question for you; otherwise LGTM. Pls squash to a single commit (I know this differs from other projects, it's just the convention here).

jreback commented on an outdated diff Oct 7, 2015

pandas/core/nanops.py
- result = ensure_float(values.sum(axis, dtype=dtype_max))
- result.fill(np.nan)
- except:
- result = np.nan
+def _nanminmax(meth, fill_value_typ):
+ builtin = getattr(builtins, meth)
+
+ @bottleneck_switch()
+ def reduction(values, axis=None, skipna=True):
+ values, mask, dtype, dtype_max = _get_values(
+ values,
+ skipna,
+ fill_value_typ=fill_value_typ,
+ )
+
+ # numpy 1.6.1 workaround in Python 3.x

jreback Oct 7, 2015

Contributor

Hmm, you might be able to take this workaround out entirely, as we no longer support numpy 1.6 (try it, and if Travis passes, then ok!).
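
The refactor in this diff replaces separate nanmin/nanmax bodies with a factory, _nanminmax(meth, fill_value_typ), that builds each reduction as a closure. A heavily simplified, float-only sketch of that pattern follows. make_nanminmax and its fill_value argument are hypothetical stand-ins; the sketch skips bottleneck, datetime wrapping and the builtins lookup used in the real diff, calling the ndarray method of the same name instead.

```python
import numpy as np


def make_nanminmax(meth, fill_value):
    """Hypothetical factory: build a NaN-aware min or max reduction."""

    def reduction(values, skipna=True):
        values = np.asarray(values, dtype="f8")
        mask = np.isnan(values)
        if values.size == 0 or mask.all():
            return np.nan  # nothing valid to reduce
        if skipna:
            # Replace NaN with a value that can never win the reduction.
            values = np.where(mask, fill_value, values)
        return getattr(values, meth)()  # values.min() or values.max()

    reduction.__name__ = "nan" + meth
    return reduction


nanmin = make_nanminmax("min", np.inf)
nanmax = make_nanminmax("max", -np.inf)
```

Generating both reductions from one closure keeps the masking and null-out logic in a single place, which is what makes a fix like the NaT check above apply to min and max at once.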

llllllllll commented on the diff Oct 7, 2015

pandas/core/nanops.py
- result = _wrap_results(result, dtype)
- return _maybe_null_out(result, axis, mask)
+ result = _wrap_results(result, dtype)
+ return _maybe_null_out(result, axis, mask)

llllllllll Oct 7, 2015

Contributor

We have already wrapped the types by the time we call _maybe_null_out. The result will already be coerced, so I think the identity (is) check is safe.

jreback Oct 7, 2015

Contributor

Ok, then your check is good. Thanks.
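
The ordering argument in this thread (reduce on the int64 values, wrap back to the datetime-like dtype, and only then run the null-out step, which must leave an existing NaT alone) can be sketched with NumPy as follows. The function names are hypothetical and the logic is heavily simplified compared to pandas.core.nanops; keep_nat=False mimics the old behavior and keep_nat=True the fixed one.

```python
import numpy as np

# NumPy stores NaT as the smallest int64 in the 'i8' view; pandas' iNaT is
# assumed here to be the same sentinel.
NAT_I8 = np.iinfo(np.int64).min


def wrap(raw_i8, dtype):
    """Turn an int64 reduction result back into a datetime64/timedelta64 scalar."""
    return np.array([raw_i8], dtype="i8").view(dtype)[0]


def maybe_null_out(result, all_missing, keep_nat):
    """Simplified null-out step, called AFTER wrapping."""
    if not all_missing:
        return result
    if keep_nat and isinstance(result, (np.datetime64, np.timedelta64)) and np.isnat(result):
        return result  # already NaT: leave it alone (the fix)
    return np.nan  # old behavior: always replace with float nan


def nanmax_sketch(values, keep_nat=True):
    """Hypothetical max over datetime64/timedelta64 data, skipping NaT."""
    i8 = values.view("i8")
    valid = i8[i8 != NAT_I8]
    raw = valid.max() if valid.size else NAT_I8  # nothing to reduce -> sentinel
    result = wrap(raw, values.dtype)  # wrapping happens before null-out
    return maybe_null_out(result, valid.size == 0, keep_nat)


empty = np.array([], dtype="M8[ns]")
print(nanmax_sketch(empty))                  # NaT
print(nanmax_sketch(empty, keep_nat=False))  # nan
```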

Contributor

llllllllll commented Oct 7, 2015

Also, I removed the workaround branch, so hopefully the tests pass; otherwise we can put that branch back.

Contributor

jreback commented Oct 7, 2015

Ok, looks good. I'll be merging things on Friday after releasing 0.17.0.

Contributor

llllllllll commented Oct 7, 2015

Thank you very much

Contributor

jreback commented Oct 7, 2015

@llllllllll no thank you!

Contributor

llllllllll commented Oct 10, 2015

@jreback just rebased against master, good to merge?

jreback and 1 other commented on an outdated diff Oct 10, 2015

doc/source/whatsnew/v0.17.1.txt
@@ -45,32 +48,10 @@ Bug Fixes
- Bug in ``.to_latex()`` output broken when the index has a name (:issue: `10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)
-
-
-

jreback Oct 10, 2015

Contributor

All of this space is here on purpose, pls revert.

llllllllll Oct 10, 2015

Contributor

Okay, sorry. What is the space for though?

llllllllll Oct 10, 2015

Contributor

reverted

Contributor

jreback commented Oct 10, 2015

We have lots of PRs that are worked on at the same time. If everyone adds their whatsnew note at the end of the list, merging one requires everyone else to rebase. But if the notes are inserted at different places in a big block of blank lines, git resolves them automatically, so we can merge many things without conflicts.

Contributor

llllllllll commented Oct 10, 2015

Ah, that's a cool trick

jreback commented on an outdated diff Oct 11, 2015

doc/source/whatsnew/v0.17.1.txt
@@ -74,3 +74,7 @@ Bug Fixes
- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
+- min and max reductions on ``datetime64`` and ``timedelta64`` dtyped series now
+ result in ``NaT`` and not ``nan`` (:issue:`11245`).

jreback Oct 11, 2015

Contributor

I would move this top comment to the API changes section (it's not a big deal, but in theory it gets highlighted to users in a slightly more useful way). (Leave the empty-series-of-dtype entry here.)

Contributor

jreback commented Oct 11, 2015

minor comment, ping when green

llllllllll BUG: datetime64 series reduces to nan when empty instead of nat
Fixes the parser for datetimetz to also allow the `M8[ns, tz]` alias.
40c8fcf
Contributor

llllllllll commented Oct 11, 2015

tests are passing

jreback added a commit that referenced this pull request Oct 11, 2015

jreback Merge pull request #11245 from llllllllll/dt64-reduce-to-nat
BUG: datetime64 series reduces to nan when empty instead of nat
a4843cb

jreback merged commit a4843cb into pandas-dev:master Oct 11, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Contributor

jreback commented Oct 11, 2015

thank you sir!
