# Notebook with datetime methods

#### Explore Capital-Onebike dataset (2017 Q4) from data.world

In [1]:
import pandas as pd
rides = pd.read_csv('https://query.data.world/s/wgsnsk2czzssrmxxyffuzbrqkjseft')

In [2]:
rides.head()

Unnamed: 0,Duration (ms),Start date,End date,Start station number,Start station,End station number,End station,Bike number,Member type
0,197068,10/1/2017 12:00,10/1/2017 12:03,31214,17th & Corcoran St NW,31229,New Hampshire Ave & T St NW,W21022,Member
1,434934,10/1/2017 12:00,10/1/2017 12:07,31104,Adams Mill & Columbia Rd NW,31602,Park Rd & Holmead Pl NW,W00470,Member
2,955437,10/1/2017 12:00,10/1/2017 12:16,31221,18th & M St NW,31103,16th & Harvard St NW,W20206,Member
3,461619,10/1/2017 12:00,10/1/2017 12:08,31111,10th & U St NW,31102,11th & Kenyon St NW,W21014,Member
4,3357184,10/1/2017 12:00,10/1/2017 12:56,31260,23rd & E St NW,31260,23rd & E St NW,W22349,Casual


In [3]:
rides.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 815370 entries, 0 to 815369
Data columns (total 9 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   Duration (ms)         815370 non-null  int64 
 1   Start date            815370 non-null  object
 2   End date              815370 non-null  object
 3   Start station number  815370 non-null  int64 
 4   Start station         815370 non-null  object
 5   End station number    815370 non-null  int64 
 6   End station           815370 non-null  object
 7   Bike number           815370 non-null  object
 8   Member type           815370 non-null  object
dtypes: int64(3), object(6)
memory usage: 56.0+ MB


In [4]:
rides['Start date'].iloc[2]

'10/1/2017 12:00'

### Note the dates are imported as dtype "object", which here means they are plain strings
To import these columns as datetimes, pass a list of column names to the "parse_dates" keyword argument of pd.read_csv.

In [5]:
rides = pd.read_csv('https://query.data.world/s/wgsnsk2czzssrmxxyffuzbrqkjseft',
                   parse_dates = ['Start date', 'End date'])

In [6]:
rides.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 815370 entries, 0 to 815369
Data columns (total 9 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   Duration (ms)         815370 non-null  int64         
 1   Start date            815370 non-null  datetime64[ns]
 2   End date              815370 non-null  datetime64[ns]
 3   Start station number  815370 non-null  int64         
 4   Start station         815370 non-null  object        
 5   End station number    815370 non-null  int64         
 6   End station           815370 non-null  object        
 7   Bike number           815370 non-null  object        
 8   Member type           815370 non-null  object        
dtypes: datetime64[ns](2), int64(3), object(4)
memory usage: 56.0+ MB


In [7]:
rides['Start date'].iloc[2]

Timestamp('2017-10-01 12:00:00')

### Note that the value is no longer a string: the column dtype is now datetime64[ns] and each element is a pandas Timestamp
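
The difference between the two imports can be reproduced offline on a small inline sample. This is only a sketch: the two-row CSV text below is made up to mimic the layout of the rides file, and the variable names `csv_text`, `as_text` and `as_dates` are illustrative.

```python
import io
import pandas as pd

# Hypothetical two-row CSV mimicking the date layout of the rides file
csv_text = (
    "Start date,End date\n"
    "10/1/2017 12:00,10/1/2017 12:03\n"
    "10/1/2017 12:00,10/1/2017 12:07\n"
)

# Default import: the date columns stay as strings
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates: the listed columns are converted while reading
as_dates = pd.read_csv(io.StringIO(csv_text),
                       parse_dates=['Start date', 'End date'])

print(type(as_text['Start date'].iloc[0]))
print(type(as_dates['Start date'].iloc[0]))
```

The first print should report a string type and the second a pandas Timestamp, matching what cells 4 and 7 show for the full dataset.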

### Alternatively we could use:
```
# The format string must match the raw text, e.g. '10/1/2017 12:00'
rides['Start date'] = pd.to_datetime(rides['Start date'], format="%m/%d/%Y %H:%M")
```
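
When converting after import, the format string has to describe the text exactly as it appears in the file (month/day/year followed by hours and minutes, as seen in cell 4). A minimal sketch on a hypothetical two-row frame named `sample`:

```python
import pandas as pd

# Hypothetical sample in the same 'M/D/YYYY H:MM' layout as the raw CSV
sample = pd.DataFrame({'Start date': ['10/1/2017 12:00', '10/1/2017 12:16']})

# %m/%d/%Y matches '10/1/2017'; %H:%M matches '12:00'
sample['Start date'] = pd.to_datetime(sample['Start date'],
                                      format='%m/%d/%Y %H:%M')

print(sample['Start date'].dtype)
print(sample['Start date'].iloc[1])
```

Passing an explicit format avoids relying on pandas to guess the layout; a format that does not match the text raises an error by default, so mistakes surface early.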

## Perform datetime arithmetic (timezone-naive)

In [8]:
rides['Duration'] = rides['End date'] - rides['Start date']

In [9]:
rides['Duration'].head()

0   0 days 00:03:00
1   0 days 00:07:00
2   0 days 00:16:00
3   0 days 00:08:00
4   0 days 00:56:00
Name: Duration, dtype: timedelta64[ns]

In [10]:
rides['Duration'].describe()

count                       815370
mean     0 days 00:05:29.196070495
std      0 days 02:05:45.621270716
min              -1 days +12:01:00
25%                0 days 00:06:00
50%                0 days 00:10:00
75%                0 days 00:18:00
max                1 days 11:00:00
Name: Duration, dtype: object
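
The summary above reports a negative minimum, so at least one ride has an end time earlier than its start time. One way to locate such rows is to compare the timedelta column against a zero-length `pd.Timedelta`. The sketch below uses a hypothetical two-row frame rather than the real data, and does not attempt to explain why the negative values occur.

```python
import pandas as pd

# Hypothetical sample; the second row ends before it starts
sample = pd.DataFrame({
    'Start date': pd.to_datetime(['2017-10-01 12:00', '2017-10-01 12:30']),
    'End date': pd.to_datetime(['2017-10-01 12:03', '2017-10-01 12:10']),
})
sample['Duration'] = sample['End date'] - sample['Start date']

# Boolean mask: True where the ride appears to end before it begins
negative = sample[sample['Duration'] < pd.Timedelta(0)]

print(len(negative))
print(negative)
```

The same mask applied to `rides` would show how many rows are affected before deciding how to handle them.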

In [11]:
rides.Duration.dt.total_seconds().head()

0     180.0
1     420.0
2     960.0
3     480.0
4    3360.0
Name: Duration, dtype: float64
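
Because `total_seconds()` returns plain floats, ordinary arithmetic converts the result to other units. A small sketch, using a hypothetical Series built from two of the durations shown earlier:

```python
import pandas as pd

# Two durations taken from the head() output above (3 and 56 minutes)
durations = pd.Series(pd.to_timedelta(['0 days 00:03:00', '0 days 00:56:00']))

# Seconds as floats, then minutes by simple division
seconds = durations.dt.total_seconds()
minutes = seconds / 60

print(seconds.tolist())
print(minutes.tolist())
```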