# Convert to timedelta

* [pandas.Timedelta](https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html)

> Represents a duration, the difference between two dates or times.
> Timedelta is the **pandas equivalent of python’s datetime.timedelta** and is interchangeable with it in most cases.
> ```
> td = pd.Timedelta(1, "d")
> td
> Timedelta('1 days 00:00:00')
> ```
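
The "interchangeable in most cases" claim can be checked directly. The following is a minimal sketch; the dates and variable names are made up for illustration and are not part of the pandas documentation.

```python
from datetime import datetime, timedelta

import pandas as pd

td = pd.Timedelta(days=1)

# pd.Timedelta subclasses datetime.timedelta and compares equal to it
is_stdlib_timedelta = isinstance(td, timedelta)
same_duration = td == timedelta(days=1)

# Either type can be added to a plain datetime or to a pandas Timestamp
shifted = datetime(2022, 9, 1, 8, 16) + td
stamp = pd.Timestamp("2022-09-01 08:16") + timedelta(hours=2)

# total_seconds() exists on both types
seconds = td.total_seconds()
```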

* [pandas.to_timedelta](https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html)
> Convert a single string.
> ```
> pd.to_timedelta('1 days 06:05:01.00003')
> Timedelta('1 days 06:05:01.000030')
> ```
> 
> Parse a list of strings.
> ```
> pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
> TimedeltaIndex(
>   ['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT],
>   dtype='timedelta64[ns]', freq=None
> )
> ```
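
Besides strings, `pd.to_timedelta` also accepts numbers together with a `unit`, and its `errors` parameter controls what happens to values it cannot parse. A small sketch with invented sample values:

```python
import pandas as pd

# Numbers need a unit; here each value is a count of seconds
from_numbers = pd.to_timedelta([90, 245, 3600], unit="s")

# errors="coerce" turns unparseable or missing entries into NaT instead of raising
from_strings = pd.to_timedelta(
    ["00:04:05", "not a duration", None],
    errors="coerce",
)
```

With the default `errors="raise"`, the second string would raise a `ValueError` instead.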

In [3]:
import numpy as np
import pandas as pd

In [21]:
df = pd.read_json(
    "./data/recovery.json",
    dtype={
        "facility": pd.CategoricalDtype(),
        "supplier": 'category',
        "supplierCode": 'category',
        "suppliedM3": np.float32,
        "recoveredM3": np.float32,
    },
    convert_dates=['date']
)

## Convert duration column to timedelta

In [13]:
df.insert(
    loc=3,
    column='elapsed',
    value=pd.to_timedelta('00:' + df['processTime'], errors='coerce'),
    allow_duplicates=False
)
df

| | facility | timeStart | processTime | elapsed | supplier | suppliedM3 | recoveredM3 | date | timeEnd | supplierCode |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bundaberg | 9/1/22 8:16 AM | 4:05 | 0 days 00:04:05 | Mary | 5.09 | 4.13 | NaT | | |
| 1 | Newcastle | 8:29:00 AM | | NaT | | 2.00 | 1.55 | 2022-09-01 | 9:07:00 AM | har |
| 2 | Newcastle | 9:27:00 AM | | NaT | | 6.80 | 4.15 | 2022-09-01 | 11:28:00 AM | dic |
| 3 | Newcastle | 11:38:00 AM | | NaT | | 1.95 | 1.55 | 2022-09-01 | 12:21:00 PM | har |
| 4 | Bundaberg | 9/1/22 12:34 PM | 1:50 | 0 days 00:01:50 | Mary Therese | 3.78 | 2.56 | NaT | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 227 | Newcastle | 11:40:00 AM | | NaT | | 3.70 | 2.35 | 2022-09-30 | 12:41:00 PM | tom |
| 228 | Newcastle | 12:52:00 PM | | NaT | | 6.35 | 4.55 | 2022-09-30 | 2:36:00 PM | dic |
| 229 | Bundaberg | 9/30/22 1:48 PM | 3:40 | 0 days 00:03:40 | Mary Therese | 4.53 | 2.73 | NaT | | |
| 230 | Newcastle | 3:02:00 PM | | NaT | | 2.00 | 1.45 | 2022-09-30 | 3:42:00 PM | har |

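The `'00:' +` prefix in the cell above is there because `processTime` holds `minutes:seconds` strings. Prepending a zero hour field puts them in the full `hh:mm:ss` form that `pd.to_timedelta` parses unambiguously. A minimal standalone sketch, using a made-up Series rather than the real `recovery.json` data:

```python
import pandas as pd

# Hypothetical sample of "minutes:seconds" strings, with one missing
# and one malformed entry
process_time = pd.Series(["4:05", None, "n/a", "1:50"])

# String concatenation leaves missing values missing; errors="coerce"
# then maps both the missing and the malformed entry to NaT
elapsed = pd.to_timedelta("00:" + process_time, errors="coerce")
```

This matches the rows shown above, where `4:05` becomes `0 days 00:04:05` and an empty `processTime` becomes `NaT`.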

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232 entries, 0 to 231
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype          
---  ------        --------------  -----          
 0   facility      232 non-null    category       
 1   timeStart     232 non-null    object         
 2   processTime   111 non-null    object         
 3   elapsed       111 non-null    timedelta64[ns]
 4   supplier      111 non-null    category       
 5   suppliedM3    232 non-null    float32        
 6   recoveredM3   232 non-null    float32        
 7   date          121 non-null    datetime64[ns] 
 8   timeEnd       121 non-null    object         
 9   supplierCode  121 non-null    category       
dtypes: category(3), datetime64[ns](1), float32(2), object(3), timedelta64[ns](1)
memory usage: 12.1+ KB


In [7]:
df.describe()

| | elapsed | suppliedM3 | recoveredM3 |
|---|---|---|---|
| count | 111 | 232.0 | 232.0 |
| mean | 0 days 00:01:50.090090090 | 4.141035 | 2.857543 |
| std | 0 days 00:00:56.612647085 | 1.369829 | 0.92911 |
| min | 0 days 00:00:40 | 1.9 | 1.2 |
| 25% | 0 days 00:01:05 | 3.08 | 2.1875 |
| 50% | 0 days 00:01:30 | 4.15 | 2.865 |
| 75% | 0 days 00:02:20 | 5.05 | 3.58 |
| max | 0 days 00:04:30 | 6.95 | 5.5 |


---
# Example 2

In [17]:
import numpy as np
import pandas as pd

data = {
    'Minutes': ['18:30', '24:50', '33:21', '28:39', '27:30'],
    'Team': ['team1', 'team2', 'team1', 'team1', 'team2']
}

df = pd.DataFrame(data)
# Treat empty strings as missing, then prefix '00:' so 'mm:ss' parses as 'hh:mm:ss'
df['Minutes'] = pd.to_timedelta('00:' + df['Minutes'].replace('', np.nan))
df

| | Minutes | Team |
|---|---|---|
| 0 | 0 days 00:18:30 | team1 |
| 1 | 0 days 00:24:50 | team2 |
| 2 | 0 days 00:33:21 | team1 |
| 3 | 0 days 00:28:39 | team1 |
| 4 | 0 days 00:27:30 | team2 |


In [19]:
df.groupby('Team')['Minutes'].mean()

Team
team1   0 days 00:26:50
team2   0 days 00:26:10
Name: Minutes, dtype: timedelta64[ns]
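
The grouped means are still timedeltas. For plotting or further arithmetic it can be handy to convert them to plain numbers. A sketch that rebuilds the same five rows and converts the means to minutes with `.dt.total_seconds()`; the names `mean_by_team` and `mean_minutes` are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    "Minutes": pd.to_timedelta(
        ["00:18:30", "00:24:50", "00:33:21", "00:28:39", "00:27:30"]
    ),
    "Team": ["team1", "team2", "team1", "team1", "team2"],
})

mean_by_team = df.groupby("Team")["Minutes"].mean()

# .dt.total_seconds() converts each timedelta to a float number of seconds
mean_minutes = mean_by_team.dt.total_seconds() / 60
```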