ENH: timedelta should write correctly to csv #4378

jreback · 2013-07-27T13:06:05Z

Though read_csv can't read these in yet....
see this: https://github.com/playpauseandstop/kikola/blob/master/kikola/utils/timedelta.py

In [58]: data
Out[58]: '  completed             deadline\n15-07-2013 23:10    15-07-2013 23:15\n16-07-2013 00:20    16-07-2013 00:15\n16-07-2013 00:20    16-07-2013 00:15\n16-07-2013 21:04    16-07-2013 21:30\n16-07-2013 21:58    16-07-2013 22:00\n16-07-2013 23:21    16-07-2013 23:15\n16-07-2013 23:21    16-07-2013 23:15\n17-07-2013 00:19    17-07-2013 00:15\n17-07-2013 00:19    17-07-2013 00:15\n17-07-2013 21:18    17-07-2013 21:30\n17-07-2013 22:07    17-07-2013 22:00'

In [59]: df = read_csv(StringIO(data),index_col=None,skiprows=1,header=None,parse_dates=[[0,1],[2,3]],sep='\s+')

In [60]: df.columns = ['completed','deadline']

In [61]: df['delta'] = df['completed']-df['deadline']

In [62]: df
Out[62]: 
             completed            deadline     delta
0  2013-07-15 23:10:00 2013-07-15 23:15:00 -00:05:00
1  2013-07-16 00:20:00 2013-07-16 00:15:00  00:05:00
2  2013-07-16 00:20:00 2013-07-16 00:15:00  00:05:00
3  2013-07-16 21:04:00 2013-07-16 21:30:00 -00:26:00
4  2013-07-16 21:58:00 2013-07-16 22:00:00 -00:02:00
5  2013-07-16 23:21:00 2013-07-16 23:15:00  00:06:00
6  2013-07-16 23:21:00 2013-07-16 23:15:00  00:06:00
7  2013-07-17 00:19:00 2013-07-17 00:15:00  00:04:00
8  2013-07-17 00:19:00 2013-07-17 00:15:00  00:04:00
9  2013-07-17 21:18:00 2013-07-17 21:30:00 -00:12:00
10 2013-07-17 22:07:00 2013-07-17 22:00:00  00:07:00

In [63]: df.dtypes
Out[63]: 
completed     datetime64[ns]
deadline      datetime64[ns]
delta        timedelta64[ns]
dtype: object

In [64]: df.to_csv('test.csv')

In [65]: !cat test.csv
,completed,deadline,delta
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-300000000000
1,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
2,2013-07-16 00:20:00,2013-07-16 00:15:00,300000000000
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-1560000000000
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-120000000000
5,2013-07-16 23:21:00,2013-07-16 23:15:00,360000000000
6,2013-07-16 23:21:00,2013-07-16 23:15:00,360000000000
7,2013-07-17 00:19:00,2013-07-17 00:15:00,240000000000
8,2013-07-17 00:19:00,2013-07-17 00:15:00,240000000000
9,2013-07-17 21:18:00,2013-07-17 21:30:00,-720000000000
10,2013-07-17 22:07:00,2013-07-17 22:00:00,420000000000

work-around (but should be handled in the datetimeblock though)

In [66]: df['delta'] = df['delta'].apply(pd.lib.repr_timedelta64)

In [67]: df.to_csv('test.csv')

In [68]: !cat test.csv
,completed,deadline,delta
0,2013-07-15 23:10:00,2013-07-15 23:15:00,-00:05:00
1,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
2,2013-07-16 00:20:00,2013-07-16 00:15:00,00:05:00
3,2013-07-16 21:04:00,2013-07-16 21:30:00,-00:26:00
4,2013-07-16 21:58:00,2013-07-16 22:00:00,-00:02:00
5,2013-07-16 23:21:00,2013-07-16 23:15:00,00:06:00
6,2013-07-16 23:21:00,2013-07-16 23:15:00,00:06:00
7,2013-07-17 00:19:00,2013-07-17 00:15:00,00:04:00
8,2013-07-17 00:19:00,2013-07-17 00:15:00,00:04:00
9,2013-07-17 21:18:00,2013-07-17 21:30:00,-00:12:00
10,2013-07-17 22:07:00,2013-07-17 22:00:00,00:07:00

The text was updated successfully, but these errors were encountered:

cpcloud · 2013-07-27T13:39:09Z

how are you thinking of handling the case of only positive timedelta?

jreback · 2013-07-27T13:54:10Z

I don't think it needs special handling

the main problem is read_csv will read as object
but then no easy way to reinterpret as timedeltas
(need to have to_datetime) parse these

cpcloud · 2013-07-27T13:55:07Z

it might be useful to have a Timedelta pandas scalar type

cpcloud · 2013-07-27T13:56:44Z

don't think that would help here...just thinking about it....maybe it would solve the py26 timedelta problem long term

jreback · 2013-07-27T14:05:28Z

see #3009

jreback · 2013-09-12T16:15:04Z

#4684 closed this

In [1]: df = DataFrame(dict(A = Timestamp('20130110'), B = [ Timestamp('20130101') + timedelta(seconds=i) for i in range(10) ]))

In [3]: df['C'] = df['B']-df['A']

In [4]: df
Out[4]: 
                    A                   B                 C
0 2013-01-10 00:00:00 2013-01-01 00:00:00 -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01 -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02 -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03 -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04 -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05 -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06 -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07 -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08 -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09 -8 days, 23:59:51

In [5]: df.to_csv('test.csv')

In [6]: !cat test.csv
,A,B,C
0,2013-01-10 00:00:00,2013-01-01 00:00:00,"-9 days, 00:00:00"
1,2013-01-10 00:00:00,2013-01-01 00:00:01,"-8 days, 23:59:59"
2,2013-01-10 00:00:00,2013-01-01 00:00:02,"-8 days, 23:59:58"
3,2013-01-10 00:00:00,2013-01-01 00:00:03,"-8 days, 23:59:57"
4,2013-01-10 00:00:00,2013-01-01 00:00:04,"-8 days, 23:59:56"
5,2013-01-10 00:00:00,2013-01-01 00:00:05,"-8 days, 23:59:55"
6,2013-01-10 00:00:00,2013-01-01 00:00:06,"-8 days, 23:59:54"
7,2013-01-10 00:00:00,2013-01-01 00:00:07,"-8 days, 23:59:53"
8,2013-01-10 00:00:00,2013-01-01 00:00:08,"-8 days, 23:59:52"
9,2013-01-10 00:00:00,2013-01-01 00:00:09,"-8 days, 23:59:51"

In [9]: df_in = read_csv('test.csv',parse_dates=['A','B'],index_col=0)

In [10]: df_in
Out[10]: 
                    A                   B                  C
0 2013-01-10 00:00:00 2013-01-01 00:00:00  -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01  -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02  -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03  -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04  -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05  -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06  -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07  -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08  -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09  -8 days, 23:59:51

In [11]: df_in.dtypes
Out[11]: 
A    datetime64[ns]
B    datetime64[ns]
C            object
dtype: object

Parsing the timedeltas (manually for now)

In [14]: df_in['C'] = pd.to_timedelta(df_in['C'])

In [15]: df_in
Out[15]: 
                    A                   B                 C
0 2013-01-10 00:00:00 2013-01-01 00:00:00 -9 days, 00:00:00
1 2013-01-10 00:00:00 2013-01-01 00:00:01 -8 days, 23:59:59
2 2013-01-10 00:00:00 2013-01-01 00:00:02 -8 days, 23:59:58
3 2013-01-10 00:00:00 2013-01-01 00:00:03 -8 days, 23:59:57
4 2013-01-10 00:00:00 2013-01-01 00:00:04 -8 days, 23:59:56
5 2013-01-10 00:00:00 2013-01-01 00:00:05 -8 days, 23:59:55
6 2013-01-10 00:00:00 2013-01-01 00:00:06 -8 days, 23:59:54
7 2013-01-10 00:00:00 2013-01-01 00:00:07 -8 days, 23:59:53
8 2013-01-10 00:00:00 2013-01-01 00:00:08 -8 days, 23:59:52
9 2013-01-10 00:00:00 2013-01-01 00:00:09 -8 days, 23:59:51

In [16]: df_in.dtypes
Out[16]: 
A     datetime64[ns]
B     datetime64[ns]
C    timedelta64[ns]
dtype: object

jreback closed this as completed Sep 12, 2013

jreback mentioned this issue Sep 12, 2013

ENH: TimeDeltaIndex and corresponding scalar #3009

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: timedelta should write correctly to csv #4378

ENH: timedelta should write correctly to csv #4378

jreback commented Jul 27, 2013

cpcloud commented Jul 27, 2013

jreback commented Jul 27, 2013

cpcloud commented Jul 27, 2013

cpcloud commented Jul 27, 2013

jreback commented Jul 27, 2013

jreback commented Sep 12, 2013

ENH: timedelta should write correctly to csv #4378

ENH: timedelta should write correctly to csv #4378

Comments

jreback commented Jul 27, 2013

cpcloud commented Jul 27, 2013

jreback commented Jul 27, 2013

cpcloud commented Jul 27, 2013

cpcloud commented Jul 27, 2013

jreback commented Jul 27, 2013

jreback commented Sep 12, 2013