timedelta64[D] is always coerced to timedelta64[ns] #1143

Closed
hottwaj opened this issue Nov 29, 2016 · 5 comments

Comments

@hottwaj

hottwaj commented Nov 29, 2016

Hi guys, the following snippets show the issue...

xarray.DataArray([1,2,3,4]).astype('timedelta64[D]')

#output is
"""
<xarray.DataArray (dim_0: 4)>
array([ 86400000000000, 172800000000000, 259200000000000, 345600000000000], dtype='timedelta64[ns]')
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3
"""

Compare this with Pandas:

pandas.Series([1,2,3,4]).astype('timedelta64[D]')

#output is
"""
0   1 days
1   2 days
2   3 days
3   4 days
dtype: timedelta64[D]
"""

This behaviour becomes more problematic when trying to convert from timedelta64[ns] to e.g. days as ints:

xarray.DataArray(pandas.Series([1,2,3,4]).astype('timedelta64[D]')).astype(int)

#output is
"""
<xarray.DataArray (dim_0: 4)>
array([ 86400000000000, 172800000000000, 259200000000000, 345600000000000])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3
"""

Again contrast that with pandas:

pandas.Series([1,2,3,4]).astype('timedelta64[D]').astype(int)

#output is 
"""
0    1
1    2
2    3
3    4
dtype: int64
"""

Other variations of timedelta e.g. timedelta64[s], timedelta64[W] etc suffer from the same problem.

Thanks

@hottwaj
Author

hottwaj commented Nov 29, 2016

The conversion to timedelta64[ns] is done on this line of code:

data = np.asarray(data, 'timedelta64[ns]')

Is there a reason behind the conversion, or could it be removed?
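
For reference, here is a minimal sketch of what that coercion does in plain NumPy, outside of xarray. The variable names are only illustrative. NumPy itself keeps the day unit, and the integer representation only changes once the array is cast to nanoseconds:

```python
import numpy as np

# A day-resolution timedelta array, as NumPy stores it natively.
days = np.array([1, 2, 3, 4], dtype='timedelta64[D]')

# The same kind of cast as the line quoted above: the unit becomes
# nanoseconds, so the underlying integers are rescaled.
as_ns = np.asarray(days, 'timedelta64[ns]')

print(days.astype('int64'))   # integer day counts
print(as_ns.astype('int64'))  # integer nanosecond counts

# Casting back to day resolution in NumPy recovers the day counts.
round_trip = as_ns.astype('timedelta64[D]').astype('int64')
print(round_trip)
```

So the values themselves are preserved; what is lost is the unit, which is why a later `.astype(int)` returns nanoseconds.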

@shoyer
Member

shoyer commented Nov 29, 2016

Interesting. Pandas always uses nanosecond precision for TimedeltaIndex but not Series:

In [13]: s = pandas.Series([1,2,3,4]).astype('timedelta64[D]')

In [14]: s
Out[14]:
0   1 days
1   2 days
2   3 days
3   4 days
dtype: timedelta64[D]

In [16]: pandas.Index(s)
Out[16]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)

This might actually be a pandas bug -- as far as I recall, this goes against the documented behavior.

@hottwaj
Author

hottwaj commented Dec 1, 2016

The pandas docs do seem to say that conversion to timedelta64[D] (or other frequencies) is possible - see: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#frequency-conversion

Also, here's a more realistic example of why this is problematic for me. I have a sequence of dates and I want to calculate the difference between them in days. This is possible in pandas, but not in xarray without first reverting to pandas/numpy types:

dates = pandas.Series([datetime.date(2016, 1, 10), datetime.date(2016, 1, 20), datetime.date(2016, 1, 25)]).astype('datetime64[ns]')

dates.diff().astype('timedelta64[D]').astype(float)
#returns 
#0     NaN
#1    10.0
#2     5.0
#dtype: float64

xarray.DataArray(dates).diff(dim = 'dim_0').astype('timedelta64[D]').astype(float)
#returns
#<xarray.DataArray (dim_0: 2)>
#array([  8.64000000e+14,   4.32000000e+14])
#Coordinates:
#  * dim_0    (dim_0) int64 1 2

Again the xarray result is in ns rather than days.

Thanks
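
One way to get day counts without relying on the `timedelta64[D]` cast is to divide the nanosecond timedeltas by a one-day timedelta, which yields plain floats. This is a sketch using NumPy only, with dates that mirror the example above. I would assume the same division works on timedelta values held in a DataArray, but that is not verified here:

```python
import numpy as np

# Same three dates as in the example above, at nanosecond resolution.
dates = np.array(['2016-01-10', '2016-01-20', '2016-01-25'],
                 dtype='datetime64[ns]')

# Differences between consecutive dates; dtype is timedelta64[ns].
deltas = np.diff(dates)

# Dividing by a one-day timedelta gives a float array of day counts,
# regardless of the resolution the deltas are stored in.
delta_days = deltas / np.timedelta64(1, 'D')
print(delta_days)
```

The division sidesteps the unit question entirely, since the result is a dimensionless float rather than a timedelta.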

@DavidTsangHW

Pardon me for extending this discussion.

I encountered the same problem when calculating a timedelta in a dataframe. It even ended with an error when I tried to access the days attribute. I am using Numpy 1.6.1.

AttributeError: 'Series' object has no attribute 'days'

Problem

df_trans['DELTA'] = df_trans['DATE2'] - df_trans['DATE1']

print(df_trans['DELTA'].dtype)

timedelta64[ns]

print(df_trans['DELTA'])

0 8 days, 00:00:00
1 15 days, 00:00:00
2 5 days, 00:00:00

df_trans['DELTA'] = df_trans['DELTA'].astype('timedelta64[D]')
print(df_trans['DELTA'].dtype)

Name: DELTA, dtype: timedelta64[D]

print(df_trans['DELTA'])

0 8 days, 00:00:00
1 15 days, 00:00:00
2 5 days, 00:00:00
Nothing changed at all.

print(df_trans['DELTA'].days)

AttributeError: 'Series' object has no attribute 'days'

I worked around the problem by putting the values into a list for the conversion.

Ss_timedelta = df_trans['DATE2'] - df_trans['DATE1']
ls_timedelta = Ss_timedelta.values.astype('timedelta64[D]').tolist()
for i in range(0, len(ls_timedelta)):
    ls_timedelta[i] = ls_timedelta[i].days / 1000
df_trans['HOLDDAYS'] = pd.Series(ls_timedelta)

@shoyer
Member

shoyer commented Aug 30, 2018

df_trans['DELTA'].dt.days should work, in both pandas and xarray.
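
A small sketch of that accessor on a pandas column. The dates below are made up to reproduce the 8, 15 and 5 day deltas from the comment above; only `.dt.days` is the part being suggested:

```python
import pandas as pd

# Hypothetical data chosen so the deltas are 8, 15 and 5 days.
df_trans = pd.DataFrame({
    'DATE1': pd.to_datetime(['2018-01-01', '2018-01-05', '2018-01-10']),
    'DATE2': pd.to_datetime(['2018-01-09', '2018-01-20', '2018-01-15']),
})

# Subtracting two datetime columns gives a timedelta column.
df_trans['DELTA'] = df_trans['DATE2'] - df_trans['DATE1']

# The .dt accessor exposes the whole-day component as integers,
# with no list round trip needed.
df_trans['HOLDDAYS'] = df_trans['DELTA'].dt.days
print(df_trans['HOLDDAYS'])
```

The `days` attribute lives on the `.dt` accessor rather than on the Series itself, which is why `df_trans['DELTA'].days` raises the AttributeError.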
