# Agenda: Dates and times

1. Core concepts and data structures
2. Parsing of dates and times
3. Using `.dt`
4. Comparisons and the like
5. Timedelta, and comparisons with it
6. Grouper
7. Time series
8. Resampling 

# Concepts

- `datetime` or `timestamp` -- a single point in time, specified by year, month, day, hour, minute, second, etc.
- `timedelta` or `interval` -- a span of time (a duration), measured in days, hours, minutes, seconds

Date math:
- `datetime` - `datetime` = `timedelta`
- `datetime` + `timedelta` = `datetime`
- `datetime` - `timedelta` = `datetime`
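
The three rules above can be tried out on single values. This is a minimal sketch using `pd.Timestamp` and `pd.Timedelta`; the two timestamps are the pickup and dropoff times from the first row of the taxi data, and the variable names are only illustrative.

```python
import pandas as pd

# Pickup and dropoff times taken from row 0 of the taxi data
pickup = pd.Timestamp('2015-06-02 11:19:29')
dropoff = pd.Timestamp('2015-06-02 11:47:52')

# datetime - datetime = timedelta
duration = dropoff - pickup
print(type(duration).__name__, duration)

# datetime + timedelta = datetime
later = pickup + pd.Timedelta(minutes=30)
print(type(later).__name__, later)

# datetime - timedelta = datetime
earlier = pickup - pd.Timedelta(days=1)
print(type(earlier).__name__, earlier)
```

Subtracting two timestamps gives a `Timedelta`, while adding or subtracting a `Timedelta` gives back a `Timestamp`.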

In [1]:
import pandas as pd

filename = 'taxi.csv'
df = pd.read_csv(filename)

In [2]:
df.head()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,pickup_longitude,pickup_latitude,RateCodeID,store_and_fwd_flag,dropoff_longitude,dropoff_latitude,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount
0,2,2015-06-02 11:19:29,2015-06-02 11:47:52,1,1.63,-73.95443,40.764141,1,N,-73.974754,40.754093,2,17.0,0.0,0.5,0.0,0.0,0.3,17.8
1,2,2015-06-02 11:19:30,2015-06-02 11:27:56,1,0.46,-73.971443,40.758942,1,N,-73.978539,40.761909,1,6.5,0.0,0.5,1.0,0.0,0.3,8.3
2,2,2015-06-02 11:19:31,2015-06-02 11:30:30,1,0.87,-73.978111,40.738434,1,N,-73.990273,40.745438,1,8.0,0.0,0.5,2.2,0.0,0.3,11.0
3,2,2015-06-02 11:19:31,2015-06-02 11:39:02,1,2.13,-73.945892,40.773529,1,N,-73.971527,40.76033,1,13.5,0.0,0.5,2.86,0.0,0.3,17.16
4,1,2015-06-02 11:19:32,2015-06-02 11:32:49,1,1.4,-73.979088,40.776772,1,N,-73.982162,40.758999,2,9.5,0.0,0.5,0.0,0.0,0.3,10.3


In [3]:
df.dtypes

VendorID                   int64
tpep_pickup_datetime      object
tpep_dropoff_datetime     object
passenger_count            int64
trip_distance            float64
pickup_longitude         float64
pickup_latitude          float64
RateCodeID                 int64
store_and_fwd_flag        object
dropoff_longitude        float64
dropoff_latitude         float64
payment_type               int64
fare_amount              float64
extra                    float64
mta_tax                  float64
tip_amount               float64
tolls_amount             float64
improvement_surcharge    float64
total_amount             float64
dtype: object

In [4]:
# how much memory?

df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   VendorID               9999 non-null   int64  
 1   tpep_pickup_datetime   9999 non-null   object 
 2   tpep_dropoff_datetime  9999 non-null   object 
 3   passenger_count        9999 non-null   int64  
 4   trip_distance          9999 non-null   float64
 5   pickup_longitude       9999 non-null   float64
 6   pickup_latitude        9999 non-null   float64
 7   RateCodeID             9999 non-null   int64  
 8   store_and_fwd_flag     9999 non-null   object 
 9   dropoff_longitude      9999 non-null   float64
 10  dropoff_latitude       9999 non-null   float64
 11  payment_type           9999 non-null   int64  
 12  fare_amount            9999 non-null   float64
 13  extra                  9999 non-null   float64
 14  mta_tax                9999 non-null   float64
 15  tip_

In [5]:
# pd.to_datetime turns a column of strings into datetime values

pd.to_datetime(df['tpep_pickup_datetime'])

0      2015-06-02 11:19:29
1      2015-06-02 11:19:30
2      2015-06-02 11:19:31
3      2015-06-02 11:19:31
4      2015-06-02 11:19:32
               ...        
9994   2015-06-01 00:12:59
9995   2015-06-01 00:12:59
9996   2015-06-01 00:13:00
9997   2015-06-01 00:13:02
9998   2015-06-01 00:13:04
Name: tpep_pickup_datetime, Length: 9999, dtype: datetime64[ns]

In [6]:
# assign back to the data frame

df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])

In [7]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   VendorID               9999 non-null   int64         
 1   tpep_pickup_datetime   9999 non-null   datetime64[ns]
 2   tpep_dropoff_datetime  9999 non-null   datetime64[ns]
 3   passenger_count        9999 non-null   int64         
 4   trip_distance          9999 non-null   float64       
 5   pickup_longitude       9999 non-null   float64       
 6   pickup_latitude        9999 non-null   float64       
 7   RateCodeID             9999 non-null   int64         
 8   store_and_fwd_flag     9999 non-null   object        
 9   dropoff_longitude      9999 non-null   float64       
 10  dropoff_latitude       9999 non-null   float64       
 11  payment_type           9999 non-null   int64         
 12  fare_amount            9999 non-null   float64       
 13  ext
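
The point of comparing the two `df.info(memory_usage='deep')` outputs is that datetime values are stored far more compactly than the strings they were parsed from. The sketch below measures that on a single column. It does not read `taxi.csv`; it assumes a made-up series of 1,000 timestamp strings in the same format, and the names `as_text` and `as_dt` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a pickup column: 1,000 timestamp strings
stamps = pd.date_range('2015-06-01', periods=1000, freq='min')
as_text = pd.Series(stamps.strftime('%Y-%m-%d %H:%M:%S'))

# The same values, parsed into datetimes
as_dt = pd.to_datetime(as_text)

# deep=True counts the bytes of the string objects themselves
text_bytes = as_text.memory_usage(deep=True)
dt_bytes = as_dt.memory_usage(deep=True)

print('as strings:  ', text_bytes, 'bytes')
print('as datetimes:', dt_bytes, 'bytes')
```

The exact numbers depend on the pandas version and string storage, but the datetime column should come out noticeably smaller.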

# What format?

- `YYYY-MM-DD HH:MM:SS` -- this works automatically
- `DD/MM/YYYY HH:MM:SS` -- works fine, if we pass `dayfirst=True`
- `MM/DD/YYYY HH:MM:SS` -- this works automatically, because pandas assumes month-first when the date is ambiguous

For any other layout, we can pass the `format` keyword argument to `pd.to_datetime`, giving it a strftime-style format string (https://strftime.org/).
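
As a rough illustration of these parsing options, the sketch below parses the same moment written in each layout. The sample strings are made up for the example (they are not read from `taxi.csv`), and the variable names are hypothetical.

```python
import pandas as pd

# YYYY-MM-DD HH:MM:SS is recognized without any extra arguments
iso = pd.to_datetime('2015-06-02 11:19:29')

# DD/MM/YYYY needs dayfirst=True, otherwise 02/06 is read as February 6
day_first = pd.to_datetime('02/06/2015 11:19:29', dayfirst=True)

# MM/DD/YYYY is the default reading of an ambiguous date
month_first = pd.to_datetime('06/02/2015 11:19:29')

# An explicit strftime-style format string removes the guesswork
explicit = pd.to_datetime('02/06/2015 11:19:29', format='%d/%m/%Y %H:%M:%S')

print(iso, day_first, month_first, explicit, sep='\n')
```

All four calls should produce the same timestamp, June 2, 2015 at 11:19:29. Passing `format` is the safest choice when the layout is known, since nothing is left for pandas to infer.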

In [10]:
df = (
    pd.read_csv(filename)
    # replace both string columns with their parsed datetime versions
    .assign(
        tpep_pickup_datetime=lambda df_: pd.to_datetime(df_['tpep_pickup_datetime']),
        tpep_dropoff_datetime=lambda df_: pd.to_datetime(df_['tpep_dropoff_datetime']),
    )
)

In [11]:
df.dtypes

VendorID                          int64
tpep_pickup_datetime     datetime64[ns]
tpep_dropoff_datetime    datetime64[ns]
passenger_count                   int64
trip_distance                   float64
pickup_longitude                float64
pickup_latitude                 float64
RateCodeID                        int64
store_and_fwd_flag               object
dropoff_longitude               float64
dropoff_latitude                float64
payment_type                      int64
fare_amount                     float64
extra                           float64
mta_tax                         float64
tip_amount                      float64
tolls_amount                    float64
improvement_surcharge           float64
total_amount                    float64
dtype: object

In [13]:
# we can also use parse_dates

df = pd.read_csv(
    filename,
    usecols=['tpep_pickup_datetime', 'tpep_dropoff_datetime',
             'passenger_count', 'trip_distance', 'total_amount'],
    parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
)
df.dtypes

tpep_pickup_datetime     datetime64[ns]
tpep_dropoff_datetime    datetime64[ns]
passenger_count                   int64
trip_distance                   float64
total_amount                    float64
dtype: object
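
To try `parse_dates` without the CSV file at hand, the same call can be pointed at an in-memory string. This is a small sketch, assuming a two-row excerpt copied from the first rows of the taxi data; `csv_text` and `sample` are hypothetical names.

```python
import io

import pandas as pd

# Two rows copied from the taxi data, kept in memory instead of a file
csv_text = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,total_amount
2015-06-02 11:19:29,2015-06-02 11:47:52,1,1.63,17.8
2015-06-02 11:19:30,2015-06-02 11:27:56,1,0.46,8.3
"""

# parse_dates names the columns to convert while the file is being read
sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
)

print(sample.dtypes)
```

Both named columns should come back with a datetime dtype, so no separate `pd.to_datetime` step is needed afterward.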

In [14]:
df.head(10)

Unnamed: 0,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,total_amount
0,2015-06-02 11:19:29,2015-06-02 11:47:52,1,1.63,17.8
1,2015-06-02 11:19:30,2015-06-02 11:27:56,1,0.46,8.3
2,2015-06-02 11:19:31,2015-06-02 11:30:30,1,0.87,11.0
3,2015-06-02 11:19:31,2015-06-02 11:39:02,1,2.13,17.16
4,2015-06-02 11:19:32,2015-06-02 11:32:49,1,1.4,10.3
5,2015-06-02 11:19:33,2015-06-02 11:28:48,1,1.4,10.55
6,2015-06-02 11:19:34,2015-06-02 11:38:46,1,1.8,16.3
7,2015-06-02 11:19:35,2015-06-02 12:36:46,4,11.9,73.84
8,2015-06-02 11:19:36,2015-06-02 11:45:19,1,1.27,15.8
9,2015-06-02 11:19:38,2015-06-02 11:23:50,1,0.6,6.3


# Using `.dt`

We can use `.dt` on a `datetime` column and retrieve parts from it:

- `.d