Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: datetime64 series reduces to nan when empty instead of nat #11245

Merged
merged 1 commit into from Oct 11, 2015

Conversation

llllllllll
Copy link
Contributor

I ran into some strange behavior with a series of dtype datetime64[ns] where I called max and got back a nan. I think the correct behavior here is to return nat. I looked through test_nanops but I am not sure where the correct place to put the test for this is.

The new behavior is:

In [1]: pd.Series(dtype='datetime64[ns]').max()
Out[1]: NaT

where the old behavior was:

In [1]: pd.Series(dtype='datetime64[ns]').max()
Out[1]: nan

@jreback
Copy link
Contributor

jreback commented Oct 5, 2015

can I add a test in test_nanops and/or test_series

@jreback
Copy link
Contributor

jreback commented Oct 5, 2015

also want to test for timedelta64/datetime64[ns, tz] too

@llllllllll
Copy link
Contributor Author

Sure, I will update this tonight when I get home

@jreback jreback added Bug Timeseries Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Compat pandas objects compatability with Numpy or Python functions labels Oct 5, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 5, 2015
@jreback
Copy link
Contributor

jreback commented Oct 5, 2015

pls also add a note in the 0.17.1 whatsnew bug section

@llllllllll
Copy link
Contributor Author

ping @jreback

@@ -637,7 +619,7 @@ def _maybe_null_out(result, axis, mask):
else:
result = result.astype('f8')
result[null_mask] = np.nan
else:
elif result is not tslib.NaT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we started with M8/m8 and then do a .view('i8') this needs to be compared to pd.lib.iNaT, not sure why you are not hitting this here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this should be elif not (result is tslib.NaT or result is tslib.iNaT)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== tslib.iNaT

but puzzled why a NaT is there
as I don't think it's wrapped yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take a ts and take a view as an int and then reduce, I think you still want nan. The only reason that you would want a NaT is that the dtype of the sequence being reduced is datetime64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you are saying, you were suggesting:

elif result.view('i8') == tslib.iNaT

It looks like result is still just a timestamp at this point so it will be the NaT object. I don't know if this is a guarantee or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I mean result should already be an int if it's M8 as wrapping is the last step

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok then

@jreback
Copy link
Contributor

jreback commented Oct 7, 2015

question for you. otherwise lgtm, pls squash to a single commit (I know different from other projects, just convention here)

fill_value_typ=fill_value_typ,
)

# numpy 1.6.1 workaround in Python 3.x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, you might be able to take this workaround out entirely, as we don't support 1.6 any longer (you can try and if travis passes, then ok!)

result = _wrap_results(result, dtype)
return _maybe_null_out(result, axis, mask)
result = _wrap_results(result, dtype)
return _maybe_null_out(result, axis, mask)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have already wrapped the types by the time we call maybe_null_out. The result will already be coerced so I think the is check is safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then your check is good. thxs

@llllllllll
Copy link
Contributor Author

Also I removed the workaround branch so hopefully the tests pass, otherwise we can put that branch back.

@jreback
Copy link
Contributor

jreback commented Oct 7, 2015

ok looks good. I'll be merging things on friday after releasing 0.17.0

@llllllllll
Copy link
Contributor Author

Thank you very much

@jreback
Copy link
Contributor

jreback commented Oct 7, 2015

@llllllllll no thank you!

@llllllllll
Copy link
Contributor Author

@jreback just rebased against master, good to merge?

@@ -45,32 +48,10 @@ Bug Fixes

- Bug in ``.to_latex()`` output broken when the index has a name (:issue: `10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceded the max unencoded length (:issue:`11234`)



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this space is here on purpose
pls revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sorry. What is the space for though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@jreback
Copy link
Contributor

jreback commented Oct 10, 2015

we have lots of prs that are worked on at the same time
so when someone adds a note in whatsnew if they are all at the end
merging one requires everyone to rebase
but if the notes are inserted in a big list they are resolved by git
so we can merge many things w/o conflicts

@llllllllll
Copy link
Contributor Author

Ah, that's a cool trick

@@ -74,3 +74,7 @@ Bug Fixes


- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
- min and max reductions on ``datetime64`` and ``timedelta64`` dtyped series now
result in ``NaT`` and not ``nan`` (:issue:`11245`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this top comment to API change section (its not a big deal, but in theory get's highlited to a user in a slightly more useful way).
(leave the empty series of dtype here)

@jreback
Copy link
Contributor

jreback commented Oct 11, 2015

minor comment, ping when green

Fixes the parser for datetimetz to also allow the `M8[ns, tz]` alias.
@llllllllll
Copy link
Contributor Author

tests are passing

jreback added a commit that referenced this pull request Oct 11, 2015
BUG: datetime64 series reduces to nan when empty instead of nat
@jreback jreback merged commit a4843cb into pandas-dev:master Oct 11, 2015
@jreback
Copy link
Contributor

jreback commented Oct 11, 2015

thank you sir!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Timeseries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants