ENH: TimeDeltaIndex and corresponding scalar #3009

Closed
6 tasks done
jreback opened this issue Mar 11, 2013 · 16 comments · Fixed by #8184
Labels
Dtype Conversions (Unexpected or buggy dtype conversions); Enhancement; Indexing (Related to indexing on series/frames, not to indexes themselves); Refactor (Internal refactoring of code); Timedelta (Timedelta data type)
Comments

@jreback
Contributor

jreback commented Mar 11, 2013

@cpcloud
Member

cpcloud commented Apr 17, 2013

This would be totally awesome, especially for any kind of experimental data.

@jreback
Contributor Author

jreback commented Apr 17, 2013

there is already quite a lot of support, see the time delta section (in the time series part of the docs)
this is just an extension to make it easier to do some things
e.g. the index works, but it's an Int64Index, so a bit non-intuitive

@cpcloud
Member

cpcloud commented Jul 27, 2013

ha! didn't realize i had seen this already

@hughesadam87

For reference (as #7640 was not the appropriate place to share), here's a hacked implementation of datetime --> timeinterval.

http://nbviewer.ipython.org/github/hugadams/pyuvvis/blob/master/examples/Notebooks/intervals.ipynb

I'd imagine the TimeDeltaIndex should supplant the need for such an API entirely.

@jreback
Contributor Author

jreback commented Sep 5, 2014

Almost have this working.

In [1]: pd.TimedeltaIndex(range(5),unit='s')
Out[1]: 
<class 'pandas.tseries.tdi.TimedeltaIndex'>
[00:00:00, ..., 00:00:04]
Length: 5, Freq: None

In [2]: pd.TimedeltaIndex(range(5),unit='d')
Out[2]: 
<class 'pandas.tseries.tdi.TimedeltaIndex'>
[00:00:00, ..., 4 days, 00:00:00]
Length: 5, Freq: None

I think I need something better for the repr, because unlike other formats, commas CAN be in a single element, so it gets confusing.

In [3]: pd.TimedeltaIndex(range(5),unit='d')+pd.offsets.Hour(1)
Out[3]: 
<class 'pandas.tseries.tdi.TimedeltaIndex'>
[01:00:00, ..., 4 days, 01:00:00]
Length: 5, Freq: None

any thoughts

@jorisvandenbossche @cpcloud @hayd @TomAugspurger
cc @shoyer

@shoyer
Member

shoyer commented Sep 5, 2014

@jreback Sweet! Is there a Timedelta scalar or are you just reusing np.timedelta64? Is there some support for missing values like NaT? (Maybe you have a PR or branch which answers these questions?)

@jreback
Contributor Author

jreback commented Sep 5, 2014

@shoyer just created a Timedelta scalar (pretty much like a Period, but no freq), just a .value really. I need it mainly to box (otherwise you get silly np.timedelta64 when displaying which is ugly).

In [1]: (pd.TimedeltaIndex(range(5),unit='d')+pd.offsets.Hour(1)).tolist()
Out[1]: 
[Timedelta('01:00:00'),
 Timedelta('1 days, 01:00:00'),
 Timedelta('2 days, 01:00:00'),
 Timedelta('3 days, 01:00:00'),
 Timedelta('4 days, 01:00:00')]

NaT should work as well.
not really any tests yet though :)

https://github.com/jreback/pandas/tree/tdi
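
The boxing and NaT behaviour described above can be sketched as follows. This is only an illustration, assuming a recent pandas release in which `pd.to_timedelta` returns a `TimedeltaIndex` for list input; the `tdi` branch linked here may have behaved differently.

```python
import pandas as pd

# Build a TimedeltaIndex that contains one missing value.
# (Assumption: a released pandas, not the in-progress branch.)
tdi = pd.to_timedelta(["01:00:00", "1 days 01:00:00", pd.NaT])

# tolist() boxes each element as a pandas Timedelta (or NaT)
# instead of exposing raw np.timedelta64 values.
boxed = tdi.tolist()

# Missing entries are reported by isna().
missing = tdi.isna()

print(boxed)
print(missing)
```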

@jreback
Contributor Author

jreback commented Sep 5, 2014

API issue:

should a Timedelta scalar be more timedelta like or np.timedelta64 like?

mainly this is for the constructor. should it be as close to timedelta as possible so that they are 'effectively' interchangeable (but with Timedelta improving on the oddities), much like Timestamp does?
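
To make the two constructor styles concrete, here is a small sketch comparing keyword-style construction (as in `datetime.timedelta`) with value-plus-unit construction (as in `np.timedelta64`). It assumes a recent pandas release where `pd.Timedelta` accepts both forms; it is not meant to show what the branch did at the time.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# The two existing scalars that Timedelta could imitate.
std_td = timedelta(days=1, hours=2)   # keyword style
np_td = np.timedelta64(26, "h")       # value + unit style

# pandas.Timedelta built in each style, and from each existing scalar.
from_kwargs = pd.Timedelta(days=1, hours=2)
from_unit = pd.Timedelta(26, unit="h")
from_std = pd.Timedelta(std_td)
from_np = pd.Timedelta(np_td)

# All four describe the same 26-hour duration.
print(from_kwargs == from_unit == from_std == from_np)

# The pandas scalar is also usable wherever a datetime.timedelta is expected.
print(isinstance(from_kwargs, timedelta))
```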

@hayd
Contributor

hayd commented Sep 5, 2014

If you can get back from the string to the timedelta, then maybe stringify the repr to remove comma ambiguity:

In [2]: pd.TimedeltaIndex(range(5),unit='d')
Out[2]: 
<class 'pandas.tseries.tdi.TimedeltaIndex'>
["00:00:00", ..., "4 days, 00:00:00"]
Length: 5, Freq: None

+1 This looks great!
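
The precondition mentioned above (being able to get back from the string to the timedelta) can be checked with a short round trip. This is a sketch under the assumption of a recent pandas release; the exact string format printed may differ from the outputs shown in this thread.

```python
import pandas as pd

# Five timedeltas, one per day (assumption: released pandas API).
tdi = pd.to_timedelta([0, 1, 2, 3, 4], unit="D")

# Render each element as text, then parse the text back.
as_text = [str(td) for td in tdi]
round_tripped = pd.to_timedelta(as_text)

# Every parsed value should equal the original element.
same = all(a == b for a, b in zip(round_tripped, tdi))

print(as_text)
print(same)
```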

@shoyer
Member

shoyer commented Sep 5, 2014

I assume that you will be able to query using any string that can be handled by pd.to_timedelta? e.g., s.loc['1 day']?

RE: timedelta like or np.timedelta64 like: I think Timestamp sets a clear precedent (even though I'm not sure that was the right decision -- there is something to be said for being similar to a numpy scalar, which Timestamp is not) so I would make Timedelta more timedelta like.

@jreback
Contributor Author

jreback commented Sep 5, 2014

yep that's pretty easy

a bit trickier (but not a lot) is partial string indexing

eg s.loc['1 day':]

is effectively >=1 day and < 2 days

@cpcloud
Member

cpcloud commented Sep 5, 2014

... s.loc['1 day':]
is effectively >=1 day and < 2 days

why wouldn't that be ">= 1 day"?
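
For reference, a minimal sketch of string lookup and an open-ended string slice on a timedelta-indexed Series. It assumes a recent pandas release, where (as far as I understand it) an open-ended slice is bounded only on the left; the behaviour on the branch under discussion may have differed. The names `idx` and `s` are illustrative.

```python
import pandas as pd

# A Series indexed by timedeltas at 12-hour steps: 0h, 12h, 24h, 36h, 48h.
idx = pd.to_timedelta([0, 12, 24, 36, 48], unit="h")
s = pd.Series(range(5), index=idx)

# Exact label lookup using a string that pandas parses as a Timedelta.
exact = s.loc["1 days"]

# Open-ended slice: only a lower bound is given, so everything
# from 1 day onwards is selected.
tail = s.loc["1 days":]

print(exact)
print(tail)
```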

@jorisvandenbossche
Member

For the repr, another option is to use ; to separate? But maybe quoting is better (; is not used for anything else)

@jorisvandenbossche
Member

@jreback What would both look like (what would be the differences) if Timedelta were more like datetime.timedelta or np.timedelta64?

And would a timedelta64 series element also return this (like you now also get Timestamps from a datetime64 series)?

@jreback
Contributor Author

jreback commented Sep 5, 2014

So I created Timedelta as an extension type almost exactly like Timestamp (it has a c-extension and a shadow python class). This makes it a sub-class of timedelta, which is nice, and it also has a numpy-like value, which makes it fast. Like Timestamp, though, it's not actually stored in an Index or Series; rather, the underlying data is (which is how it is currently).

In [2]: from pandas import Timedelta

In [3]: Timedelta('10 days, 00:00:10')
Out[3]: Timedelta('10 days, 00:00:10')

In [4]: type(Timedelta('10 days, 00:00:10'))
Out[4]: pandas.tslib.Timedelta

In [5]: Timedelta(days=10,milliseconds=10*1000)
Out[5]: Timedelta('10 days, 00:00:10')

In [6]: Timedelta('nat')
Out[6]: Timedelta('NaT')

In [7]: Timedelta(10,unit='d')
Out[7]: Timedelta('10 days, 00:00:00')

In [8]: isinstance(Timedelta(10,unit='d'),timedelta)
Out[8]: True

In [10]: pd.to_timedelta(range(5),unit='h')
Out[10]: 
<class 'pandas.tseries.td.TimedeltaIndex'>
[00:00:00, ..., 04:00:00]
Length: 5, Freq: None

In [11]: Series(pd.to_timedelta(range(5),unit='h'))
Out[11]: 
0   00:00:00
1   01:00:00
2   02:00:00
3   03:00:00
4   04:00:00
dtype: timedelta64[ns]

In [12]: list(Series(pd.to_timedelta(range(5),unit='h')))
Out[12]: 
[Timedelta('0 days, 00:00:00'),
 Timedelta('0 days, 01:00:00'),
 Timedelta('0 days, 02:00:00'),
 Timedelta('0 days, 03:00:00'),
 Timedelta('0 days, 04:00:00')]

The only API change in this entire thing really is that to_timedelta will now return a TimedeltaIndex rather than a Series by default (which makes it consistent with to_datetime as well).
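
A short sketch of the return types involved, assuming a recent pandas release (the variable names are illustrative, and the branch may have differed in detail): list input gives a `TimedeltaIndex`, Series input gives a Series back, and iterating a timedelta64 Series yields `Timedelta` scalars.

```python
import pandas as pd

# List-like input: to_timedelta returns a TimedeltaIndex.
tdi = pd.to_timedelta([0, 1, 2, 3, 4], unit="h")

# Wrapping it in a Series keeps a timedelta64 dtype.
ser = pd.Series(tdi)

# Series input: to_timedelta returns a Series.
from_series = pd.to_timedelta(pd.Series([0, 1, 2]), unit="h")

# Iterating the Series boxes each element as a pandas Timedelta.
scalars = list(ser)

print(type(tdi).__name__, ser.dtype, type(from_series).__name__)
print(scalars)
```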

Still need more tests / integration.

@jreback
Contributor Author

jreback commented Sep 5, 2014

PR is now #8184, further comments can go there
