Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: timedelta should write correctly to csv #4378

Closed
jreback opened this issue Jul 27, 2013 · 6 comments
Closed

ENH: timedelta should write correctly to csv #4378

jreback opened this issue Jul 27, 2013 · 6 comments
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string Timedelta Timedelta data type
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jul 27, 2013

Though read_csv can't read these in yet....
see this: https://github.com/playpauseandstop/kikola/blob/master/kikola/utils/timedelta.py

In [58]: data
Out[58]: '  completed             deadline\n15-07-2013 23:10    15-07-2013 23:15\n16-07-2013 00:20    16-07-2013 00:15\n16-07-2013 00:20    16-07-2013 00:15\n16-07-2013 21:04    16-07-2013 21:30\n16-07-2013 21:58    16-07-2013 22:00\n16-07-2013 23:21    16-07-2013 23:15\n16-07-2013 23:21    16-07-2013 23:15\n17-07-2013 00:19    17-07-2013 00:15\n17-07-2013 00:19    17-07-2013 00:15\n17-07-2013 21:18    17-07-2013 21:30\n17-07-2013 22:07    17-07-2013 22:00'

In [59]: df = read_csv(StringIO(data),index_col=None,skiprows=1,header=None,parse_dates=[[0,1],[2,3]],sep='\s+')

In [60]: df.columns = ['completed','deadline']

In [61]: df['delta'] = df['completed']-df['deadline']

In [62]: df
Out[62]: 
             completed            deadline     delta
0  2013-07-15 23:10:00 2013-07-15 23:15:00 -00:05:00
1  2013-07-16 00:20:00 2013-07-16 00:15:00  00:05:00
2  2013-07-16 00:20:00 2013-07-16 00:15:00  00:05:00
3  2013-07-16 21:04:00 2013-07-16 21:30:00 -00:26:00
4  2013-07-16 21:58:00 2013-07-16 22:00:00 -00:02:00
5  2013-07-16 23:21:00 2013-07-16 23:15:00  00:06:00
6  2013-07-16 23:21:00 2013-07-16 23:15:00  00:06:00
7  2013-07-17 00:19:00 2013-07-17 00:15:00  00:04:00
8  2013-07-17 00:19:00 2013-07-17 00:15:00  00:04:00
9  2013-07-17 21:18:00 2013-07-17 21:30:00 -00:12:00
10 2013-07-17 22:07:00 2013-07-17 22:00:00  00:07:00

In [63]: df.dtypes
Out[63]: 
completed     datetime64[ns]
deadline      datetime64[ns]
delta        timedelta64[ns]
dtype: object

In [64]: df.to_csv('test.csv')

In [65]: !cat test.csv
,completed,deadline,delta
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-300000000000
1,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
2,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-1560000000000
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-120000000000
5,2013-07-16 23:21:00,2013-07-16 23:15:00,360000000000
6,2013-07-16 23:21:00,2013-07-16 23:15:00,360000000000
7,2013-07-17 00:19:00,2013-07-17 00:15:00,240000000000
8,2013-07-17 00:19:00,2013-07-17 00:15:00,240000000000
9,2013-07-17 21:18:00,2013-07-17 21:30:00,-720000000000
10,2013-07-17 22:07:00,2013-07-17 22:00:00,420000000000

work-around (but should be handled in the datetimeblock though)

In [66]: df['delta'] = df['delta'].apply(pd.lib.repr_timedelta64)

In [67]: df.to_csv('test.csv')

In [68]: !cat test.csv
,completed,deadline,delta
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-00:05:00
1,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
2,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-00:26:00
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-00:02:00
5,2013-07-16 23:21:00,2013-07-16 23:15:00,00:06:00
6,2013-07-16 23:21:00,2013-07-16 23:15:00,00:06:00
7,2013-07-17 00:19:00,2013-07-17 00:15:00,00:04:00
8,2013-07-17 00:19:00,2013-07-17 00:15:00,00:04:00
9,2013-07-17 21:18:00,2013-07-17 21:30:00,-00:12:00
10,2013-07-17 22:07:00,2013-07-17 22:00:00,00:07:00

@cpcloud
Copy link
Member

cpcloud commented Jul 27, 2013

how are you thinking of handling the case of only positive timedelta?

@jreback
Copy link
Contributor Author

jreback commented Jul 27, 2013

I don't think it needs special handling

the main problem is read_csv will read as object
but then no easy way to reinterpret as timedeltas
(need to have to_datetime) parse these

@cpcloud
Copy link
Member

cpcloud commented Jul 27, 2013

it might be useful to have a Timedelta pandas scalar type

@cpcloud
Copy link
Member

cpcloud commented Jul 27, 2013

don't think that would help here...just thinking about it....maybe it would solve the py26 timedelta problem long term

@jreback
Copy link
Contributor Author

jreback commented Jul 27, 2013

see #3009

@jreback
Copy link
Contributor Author

jreback commented Sep 12, 2013

#4684 closed this

In [1]: df = DataFrame(dict(A = Timestamp('20130110'), B = [ Timestamp('20130101') + timedelta(seconds=i) for i in range(10) ]))

In [3]: df['C'] = df['B']-df['A']

In [4]: df
Out[4]: 
                    A                   B                 C
0 2013-01-10 00:00:00 2013-01-01 00:00:00 -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01 -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02 -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03 -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04 -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05 -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06 -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07 -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08 -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09 -8 days, 23:59:51

In [5]: df.to_csv('test.csv')

In [6]: !cat test.csv
,A,B,C
0,2013-01-10 00:00:00,2013-01-01 00:00:00,"-9 days, 00:00:00"
1,2013-01-10 00:00:00,2013-01-01 00:00:01,"-8 days, 23:59:59"
2,2013-01-10 00:00:00,2013-01-01 00:00:02,"-8 days, 23:59:58"
3,2013-01-10 00:00:00,2013-01-01 00:00:03,"-8 days, 23:59:57"
4,2013-01-10 00:00:00,2013-01-01 00:00:04,"-8 days, 23:59:56"
5,2013-01-10 00:00:00,2013-01-01 00:00:05,"-8 days, 23:59:55"
6,2013-01-10 00:00:00,2013-01-01 00:00:06,"-8 days, 23:59:54"
7,2013-01-10 00:00:00,2013-01-01 00:00:07,"-8 days, 23:59:53"
8,2013-01-10 00:00:00,2013-01-01 00:00:08,"-8 days, 23:59:52"
9,2013-01-10 00:00:00,2013-01-01 00:00:09,"-8 days, 23:59:51"

In [9]: df_in = read_csv('test.csv',parse_dates=['A','B'],index_col=0)

In [10]: df_in
Out[10]: 
                    A                   B                  C
0 2013-01-10 00:00:00 2013-01-01 00:00:00  -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01  -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02  -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03  -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04  -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05  -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06  -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07  -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08  -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09  -8 days, 23:59:51

In [11]: df_in.dtypes
Out[11]: 
A    datetime64[ns]
B    datetime64[ns]
C            object
dtype: object

Parsing the timedeltas (manually for now)

In [14]: df_in['C'] = pd.to_timedelta(df_in['C'])

In [15]: df_in
Out[15]: 
                    A                   B                 C
0 2013-01-10 00:00:00 2013-01-01 00:00:00 -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01 -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02 -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03 -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04 -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05 -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06 -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07 -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08 -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09 -8 days, 23:59:51

In [16]: df_in.dtypes
Out[16]: 
A     datetime64[ns]
B     datetime64[ns]
C    timedelta64[ns]
dtype: object

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

2 participants