BUG(?): Var of Timedelta with empty / NA #18880

Closed
TomAugspurger opened this issue Dec 20, 2017 · 11 comments · Fixed by #28289
Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations), Timedelta (Timedelta data type)
@TomAugspurger
Contributor

TomAugspurger commented Dec 20, 2017

edit: it seems like the bug is that var returns numeric values for the operation, instead of timedeltas like .std (#18880 (comment))


Working on the empty / all-NA stuff. This is related, but separate.

Currently, for an empty or all-NA Series of timedeltas, .var() returns NaT (Out[44] shows the empty case). I think this should be NaN, since the reduction otherwise returns numeric values, not timedeltas.

In [40]: s = pd.Series(pd.timedelta_range(0, periods=12, freq='S'))

In [41]: s[0] = np.nan

In [42]: s.var()
Out[42]: 1.1e+19

In [43]: s.var(skipna=False)
Out[43]: nan

In [44]: s[:0].var()
Out[44]: NaT

I've just noticed some other buggy behavior with aggregations on timedeltas, so maybe this can become a meta-issue about timedeltas and numeric aggregations:

In [57]: s.sum(skipna=False)
Out[57]: Timedelta('-106752 days +00:13:49.145223')

I'm guessing we improperly add the timedelta min here.
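
A rough way to check that guess with plain integer arithmetic rather than pandas: NaT is stored as the minimum int64 value, and the eleven non-NA entries of s sum to 66 seconds. Adding the two as nanoseconds lands within a microsecond of Out[57]. This is only a sketch of the suspected arithmetic, not the actual pandas code path, and the variable names are illustrative.

```python
# Sketch: what you get if the NaT sentinel is summed like an ordinary int64.
INT64_MIN = -2**63                      # integer value used to store NaT
NS_PER_SECOND = 10**9
NS_PER_DAY = 86_400 * NS_PER_SECOND

# s holds 0..11 seconds with s[0] replaced by NaT, so the valid part is 1..11 s.
valid_sum_ns = sum(range(1, 12)) * NS_PER_SECOND

total_ns = INT64_MIN + valid_sum_ns

# Break the total into the pieces a Timedelta repr would show.
days, rem_ns = divmod(total_ns, NS_PER_DAY)
hours, rem_ns = divmod(rem_ns, 3_600 * NS_PER_SECOND)
minutes, rem_ns = divmod(rem_ns, 60 * NS_PER_SECOND)
seconds = rem_ns / NS_PER_SECOND

print(f"{days} days +{hours:02d}:{minutes:02d}:{seconds:012.9f}")
```

The day count and the hour and minute fields agree with Out[57], which supports the idea that the NaT sentinel is being included in the sum.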

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Dec 20, 2017
@TomAugspurger TomAugspurger added the Difficulty Intermediate, Numeric Operations, and Timedelta labels Dec 20, 2017
@TomAugspurger
Contributor Author

Ahh, interesting: .std returns its result as a Timedelta, so maybe the bug is that .var isn't handling dtypes correctly:

In [11]: s.std()
Out[11]: Timedelta('0 days 00:00:03.316624')

In [12]: s[:0].std()
Out[12]: NaT
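
The .var() and .std() outputs are consistent with each other if Out[42] is read in squared nanoseconds. A quick check with the standard library, assuming the non-NA values of s are 1 through 11 seconds (as constructed in the original report) and the default sample variance (ddof=1):

```python
import math
import statistics

NS_PER_SECOND = 10**9

# Non-NA values of s, expressed as integer nanoseconds (1 s .. 11 s).
values_ns = [i * NS_PER_SECOND for i in range(1, 12)]

var_ns2 = statistics.variance(values_ns)  # sample variance, units of ns**2
std_ns = math.sqrt(var_ns2)               # back to plain ns

print(f"{float(var_ns2):.1e}")            # same magnitude as Out[42]
print(std_ns / NS_PER_SECOND)             # seconds, compare with Out[11]
```

So .var() appears to return the raw number in ns**2, while .std() wraps the square root of that number back into a Timedelta.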

@jreback
Contributor

jreback commented Dec 20, 2017

Why would this be numeric? Any reduction should keep the same dtype (except int -> float).

@jreback
Contributor

jreback commented Dec 20, 2017

Yeah, it probably doesn't handle the ufunc (sqrt) properly, e.g.

This works, but it's probably not what it's doing.

In [7]: pd.Timedelta(np.sqrt(s.std().value))
Out[7]: Timedelta('0 days 00:00:00.000057')

@TomAugspurger
Contributor Author

Yeah, I was just saying numeric because the non-NA versions were numeric, but that's the real bug. I'll edit the original bug report.

@JesperDramsch
Contributor

Should this be closed? @TomAugspurger

@TomAugspurger
Contributor Author

I don't think so, do you? Series[timedelta].var() is still numeric for me, rather than a Timedelta.

@jbrockmendel
Member

Shouldn't Series[timedelta].var() raise? std() makes sense.

@TomAugspurger
Contributor Author

TomAugspurger commented Jun 7, 2019

> Shouldn't Series[timedelta].var() raise? std() makes sense.

Can you clarify this? The meaning of std & var for timedeltas is a bit hazy to me, but why would one make sense and not the other?

@jbrockmendel
Member

> Can you clarify this? The meaning of std & var for timedeltas is a bit hazy to me, but why would one make sense and not the other?

The issue is the units. Var(td64ns) would have to have unit ns^2, which we don't have a way to represent.

@TomAugspurger
Contributor Author

TomAugspurger commented Jun 7, 2019 via email

@jbrockmendel
Member

i8values = np.array([1e9, 2e9, 3e9], dtype='i8')
tdi = pd.to_timedelta(i8values)
ser = pd.Series(tdi)

>>> ser.std()
Timedelta('0 days 00:00:01')

I think we're on the same page that ser.std() here makes sense. If we could do ser.var(), it would have to obey the identity ser.var() == (ser.std())**2. But we can't compute the right-hand side, since squaring a Timedelta isn't defined.

Does this help? Any way you do it, you need to get a ns^2 in there.
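
One way to see the unit problem with plain numbers, as a sketch that uses the standard library instead of pandas: take the same three durations in seconds and in nanoseconds. The standard deviation rescales linearly, so both results describe the same duration. The variance rescales with the square of the conversion factor, so reading it back as a duration gives a different answer for each unit.

```python
import statistics

NS_PER_SECOND = 10**9

# The same three durations as in the example above, in two units.
seconds = [1, 2, 3]
nanoseconds = [x * NS_PER_SECOND for x in seconds]

std_s = statistics.stdev(seconds)
std_ns = statistics.stdev(nanoseconds)
var_s = statistics.variance(seconds)        # units: s**2
var_ns2 = statistics.variance(nanoseconds)  # units: ns**2

# std converts between units like any duration does.
print(std_ns / NS_PER_SECOND == std_s)

# var does not: pretending the ns**2 number is a count of nanoseconds
# yields 1e9 seconds, while the s**2 number read as seconds is 1.
print(var_ns2 / NS_PER_SECOND, var_s)
```

Only dividing by NS_PER_SECOND**2 recovers the seconds-based variance, which is the ns^2 unit that a timedelta64[ns] value has no way to carry.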
