# Agenda: Dates and times

1. Background theory
2. How to turn a column into datetime info
3. Calculations we can perform
4. Datetime columns as our index ("time series")

# Background

When we use the word "time" in our everyday speech, we might mean one of two different things:

1. One specific, unique point in time, specified with year/month/day and hour/minute/second. You can get even more precise than that, down to fractions of a second. Examples: time of birth, time of death, when you graduated from university, when a meeting starts. Each of these is a unique point in time, and you can specify precisely when it happened. When we keep track of these on a computer, they're known as either a `datetime` value or a `timestamp`.
2. A span of time, between two points. This isn't unique, and it doesn't have a specific year/month/day, but we do measure it with the same units. We can use this kind of value to measure a lifespan, or the time spent in school, or the time that people were married, or the time you were in a meeting. The data type used for this is known as a `timedelta` or an `interval`.
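To make the two ideas concrete, here is a small sketch using pandas' own scalar types: `pd.Timestamp` for a point in time and `pd.Timedelta` for a span. The variable names and values are made up for illustration.

```python
import pandas as pd

# A point in time (hypothetical meeting start): pandas calls this a Timestamp
meeting_start = pd.Timestamp('2015-06-02 11:19:29')

# A span of time (hypothetical meeting length): pandas calls this a Timedelta
meeting_length = pd.Timedelta(hours=1, minutes=30)

# A Timestamp has calendar fields; a Timedelta only has a length
print(meeting_start.year, meeting_start.month, meeting_start.day)  # 2015 6 2
print(meeting_length.total_seconds())                              # 5400.0
```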

Date math:
- `timestamp` - `timestamp` = `timedelta`
- `timestamp` + `timedelta` = `timestamp`
- `timestamp` - `timedelta` = `timestamp`
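The three rules can be checked directly with pandas scalars. This sketch uses the pickup and dropoff times of the first ride in the taxi data loaded below; adding two timestamps has no meaning, and pandas refuses it.

```python
import pandas as pd

# Pickup and dropoff times of the first taxi ride in the data below
start = pd.Timestamp('2015-06-02 11:19:29')
end = pd.Timestamp('2015-06-02 11:47:52')

# timestamp - timestamp = timedelta
elapsed = end - start
print(elapsed)          # 0 days 00:28:23

# timestamp + timedelta = timestamp
print(start + elapsed)  # 2015-06-02 11:47:52

# timestamp - timedelta = timestamp
print(end - elapsed)    # 2015-06-02 11:19:29

# timestamp + timestamp is meaningless, so pandas raises TypeError
try:
    start + end
except TypeError as err:
    print('TypeError:', err)
```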

In [13]:
import pandas as pd
filename = '../data/taxi.csv'

df = pd.read_csv(filename,
                 usecols=['tpep_pickup_datetime',
                          'tpep_dropoff_datetime',
                          'passenger_count',
                          'trip_distance',
                          'total_amount'])
                          

In [14]:
df.head()

   tpep_pickup_datetime  tpep_dropoff_datetime  passenger_count  trip_distance  total_amount
0   2015-06-02 11:19:29    2015-06-02 11:47:52                1           1.63         17.80
1   2015-06-02 11:19:30    2015-06-02 11:27:56                1           0.46          8.30
2   2015-06-02 11:19:31    2015-06-02 11:30:30                1           0.87         11.00
3   2015-06-02 11:19:31    2015-06-02 11:39:02                1           2.13         17.16
4   2015-06-02 11:19:32    2015-06-02 11:32:49                1           1.40         10.30


In [15]:
df.dtypes

tpep_pickup_datetime      object
tpep_dropoff_datetime     object
passenger_count            int64
trip_distance            float64
total_amount             float64
dtype: object
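`dtypes` confirms that both timestamp columns arrived as `object`, i.e. plain strings. One alternative, not used in this notebook, is to have `read_csv` parse them while loading, via its `parse_dates` parameter. So that the sketch runs without `../data/taxi.csv`, it feeds `read_csv` a few rows copied from the `head()` output through `io.StringIO`.

```python
import io
import pandas as pd

# A few rows copied from the head() output, standing in for the real CSV file
csv_text = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,total_amount
2015-06-02 11:19:29,2015-06-02 11:47:52,1,1.63,17.8
2015-06-02 11:19:30,2015-06-02 11:27:56,1,0.46,8.3
2015-06-02 11:19:31,2015-06-02 11:30:30,1,0.87,11.0
"""

# parse_dates lists the columns to convert to datetimes during loading
df_parsed = pd.read_csv(io.StringIO(csv_text),
                        parse_dates=['tpep_pickup_datetime',
                                     'tpep_dropoff_datetime'])
print(df_parsed.dtypes)
```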

In [16]:
# the main way to take a string column and get a datetime column based on it
# is with pd.to_datetime, a function that comes with Pandas.

pd.to_datetime(df['tpep_pickup_datetime'])

0      2015-06-02 11:19:29
1      2015-06-02 11:19:30
2      2015-06-02 11:19:31
3      2015-06-02 11:19:31
4      2015-06-02 11:19:32
               ...        
9994   2015-06-01 00:12:59
9995   2015-06-01 00:12:59
9996   2015-06-01 00:13:00
9997   2015-06-01 00:13:02
9998   2015-06-01 00:13:04
Name: tpep_pickup_datetime, Length: 9999, dtype: datetime64[ns]
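`pd.to_datetime` returns a new Series; it doesn't modify the column it was given. A small sketch with a hand-made Series (values copied from the data above) shows the original staying as strings, and the `.dt` accessor that becomes available once the values are real datetimes.

```python
import pandas as pd

# Hand-made Series of strings, explicitly stored as object dtype
s = pd.Series(['2015-06-02 11:19:29', '2015-06-02 11:19:30'], dtype=object)
converted = pd.to_datetime(s)

print(s.dtype)          # still object: the input is unchanged
print(converted.dtype)  # a datetime64 dtype

# The .dt accessor exposes datetime components, e.g. the hour
print(converted.dt.hour.tolist())  # [11, 11]
```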

In [17]:
# how much memory are we *really* using right now on this data frame?
df.memory_usage(deep=True).sum()    # 1,599,972

1599972

In [18]:
# pd.to_datetime returns a new series, so we assign the datetime versions back to the original data frame

df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
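With both columns converted, the date math from the background section works on whole columns: subtracting pickup from dropoff gives a `timedelta` per ride. This sketch rebuilds two rides from the `head()` output in a small data frame; the `trip_time` column name is hypothetical.

```python
import pandas as pd

# Two pickup/dropoff pairs copied from the head() output
trips = pd.DataFrame({
    'tpep_pickup_datetime': ['2015-06-02 11:19:29', '2015-06-02 11:19:30'],
    'tpep_dropoff_datetime': ['2015-06-02 11:47:52', '2015-06-02 11:27:56'],
})
trips['tpep_pickup_datetime'] = pd.to_datetime(trips['tpep_pickup_datetime'])
trips['tpep_dropoff_datetime'] = pd.to_datetime(trips['tpep_dropoff_datetime'])

# datetime column - datetime column = timedelta column (hypothetical name trip_time)
trips['trip_time'] = trips['tpep_dropoff_datetime'] - trips['tpep_pickup_datetime']

# Convert each duration to a number of seconds
print(trips['trip_time'].dt.total_seconds().tolist())  # [1703.0, 506.0]
```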

In [19]:
df.memory_usage(deep=True).sum()   

400092
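Why does memory drop so sharply? An `object` column stores a pointer to a separate Python string for every row, and `deep=True` adds the size of each of those strings; a `datetime64` column stores each value as a single 64-bit integer (8 bytes). A small sketch with made-up timestamps compares the per-value cost of the two representations.

```python
import pandas as pd

# 60 distinct, made-up timestamps stored as Python strings
stamps = [f'2015-06-02 11:19:{sec:02d}' for sec in range(60)]
as_text = pd.Series(stamps, dtype=object)
as_datetime = pd.to_datetime(as_text)

# index=False compares only the values; deep=True counts the strings themselves
text_bytes = as_text.memory_usage(index=False, deep=True)
dt_bytes = as_datetime.memory_usage(index=False, deep=True)

print(text_bytes / len(as_text), 'bytes per value as text')
print(dt_bytes / len(as_datetime), 'bytes per value as datetime64')  # 8.0
```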

In [ ]:
# dates and times come in *many* different formats.
# how did pd.to_datetime know how to convert our strings into datetime values?
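A partial answer, hedged: our strings use the ISO 8601 layout (year-month-day, 24-hour time), which `pd.to_datetime` recognizes without help, and recent pandas versions infer one format from the column's values and apply it throughout. For ambiguous layouts such as `02/06/2015`, pandas reads month-first by default, so it's safer to say what you mean with `format=` (strftime-style codes) or `dayfirst=True`. The sample strings below are made up.

```python
import pandas as pd

# ISO 8601-style strings, like the taxi data, parse with no hints
iso = pd.to_datetime(pd.Series(['2015-06-02 11:19:29']))

# Ambiguous day/month string: by default pandas reads it month-first (February 6)
default = pd.to_datetime(pd.Series(['02/06/2015 11:19']))

# Spelling out the layout explicitly gives June 2
explicit = pd.to_datetime(pd.Series(['02/06/2015 11:19']), format='%d/%m/%Y %H:%M')

# dayfirst=True is a looser hint that also gives June 2
hinted = pd.to_datetime(pd.Series(['02/06/2015 11:19']), dayfirst=True)

print(default[0], explicit[0], hinted[0])
```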

