Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: nanops on empty Series have wrong type #7869

Closed
ischwabacher opened this issue Jul 29, 2014 · 12 comments · Fixed by #8184
Closed

BUG: nanops on empty Series have wrong type #7869

ischwabacher opened this issue Jul 29, 2014 · 12 comments · Fixed by #8184
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Timedelta Timedelta data type
Milestone

Comments

@ischwabacher
Copy link
Contributor

In [1]: import pandas as pd

In [2]: pd.Series(dtype='m8[ns]').sum()
Out[2]: 0

In [3]: pd.Series(dtype='m8[ns]').mean()
Out[3]: 
0   NaT
dtype: timedelta64[ns]

In [4]: pd.Series(dtype='m8[ns]').median()
Out[4]: nan

In [5]: pd.Series(dtype=float).sum()
Out[5]: 0

In [6]: pd.Series([0], dtype=float).sum()
Out[6]: 0.0

It looks like _wrap_results is doing double duty, both converting an output to the right time dtype, and wrapping that result in a series (I don't know why it does this), and in some cases results aren't getting wrapped. But the wrapper ignores other dtypes, which doesn't make sense to me; aside from the unfortunate fact that numpy provides no _NA_integer equivalent, I as a user expect the dtype of a result to depend only on the dtype of the inputs, not on their shapes or values, so it would make sense to have a more general wrapper somewhere.

I'm not sure how I'd implement such a wrapper, because ideally it would be able to handle situations like that the standard deviation of a Series of datetime64s is a timedelta64 or such. Maybe that's a bit ambitious, though, since it's inching towards a general notion of symbolic units; in such a case the user can unwrap, operate and wrap manually.

It looks like there are several points in pandas/core/nanops.py where a result is returned without wrapping, and I could submit a narrow PR for just this issue but I want to wait until I have time to really go over nanops.py and get all of these.

@jreback
Copy link
Contributor

jreback commented Jul 29, 2014

This is just an edge case in creation of the empty array.

as you can see these work correctly (just not on an empty array).

In [12]: pd.to_timedelta(np.arange(5),unit='s')
Out[12]: 
0   00:00:00
1   00:00:01
2   00:00:02
3   00:00:03
4   00:00:04
dtype: timedelta64[ns]

In [13]: pd.to_timedelta(np.arange(5),unit='s').sum()
Out[13]: 
0   00:00:10
dtype: timedelta64[ns]

In [14]: pd.to_timedelta(np.arange(5),unit='s').mean()
Out[14]: 
0   00:00:02
dtype: timedelta64[ns]

In [15]: pd.to_timedelta(np.arange(5),unit='s').median()
Out[15]: 
0   00:00:02
dtype: timedelta64[ns]

@jreback jreback added this to the 0.15.0 milestone Jul 29, 2014
@jreback
Copy link
Contributor

jreback commented Jul 29, 2014

tests for these types of things are in tseries/tests/tests_timeseries and/or tseries/test.tests_timedeltas

@ischwabacher
Copy link
Contributor Author

I still don't get why these tests don't belong in pandas/pandas/tests/test_nanops.py.

@jreback
Copy link
Contributor

jreback commented Jul 29, 2014

nanops only deals with very low level things. eg.. it knows nothing about wrapping the return results.

@ischwabacher
Copy link
Contributor Author

The thing that's confusing me is that _wrap_results is in nanops.py, and when I run this operation through pdb, I don't see any wrapping code other than _wrap_results. It's always possible that I ned past a decorator that did the wrapping, but Substitution doesn't seem to have the right signature to do the job, and I think that and functools.wraps are the only decorators I didn't check.

@ischwabacher ischwabacher changed the title BUG: sum and median nanops on empty timedelta Series have wrong type BUG: nanops on empty Series have wrong type Jul 29, 2014
@ischwabacher
Copy link
Contributor Author

Edited title and code sample to more accurately reflect the issue.

@ischwabacher
Copy link
Contributor Author

I was going back over this and noticed that _wrap_results contains this comment explaining why scalar timedelta64 results of reduction operations get wrapped in length-1 Series:

this is a scalar timedelta result!

we have series convert then take the element (scalar)

as series will do the right thing in py3 (and deal with numpy

1.6.2 bug in that it results dtype of timedelta64[us]

I don't know what the py3 issue is, but since we're dropping numpy <1.7 compatibility, can we get rid of this behavior? Having to strip the wrapper from a timedelta64 result but not other results causes a lot of friction and encourages using the "wrong" representation just to avoid having separate code paths.

@jreback
Copy link
Contributor

jreback commented Sep 12, 2014

this is all fixed in #8184 merging shortly

@jreback
Copy link
Contributor

jreback commented Sep 12, 2014

@ischwabacher sorry, what I mean is that that comment is removed (and a Timedelta) is returned. I think this is still a valid (and separate issue). But will need to be addressed after I merge.

@ischwabacher
Copy link
Contributor Author

I was not following that patch. I have a lot of reading to do on my bus ride home tonight.

@jreback
Copy link
Contributor

jreback commented Sep 12, 2014

haha, ok. Should be straightforward after that I think. In essecent, np.timedelta64 are not longer going to be returned anywhere, instead Timedelta, which is analagous to Timestamp in its construction.

@jreback
Copy link
Contributor

jreback commented Sep 12, 2014

covering this in #8184. pretty easy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants