# 1. Datetime and Timedelta

### Objectives

+ Define date, time, and datetime
+ A Pandas **`Timestamp`** is a datetime
+ **`pd.to_datetime`** is flexible and intelligent: it converts both strings and numbers to Timestamps
+ Timestamp and datetime refer to the same thing in Pandas
+ **`pd.to_timedelta`** converts strings and numbers to Timedeltas

### Resource
+ [Time series pandas documentation][1]
+ [Timedelta pandas documentation][2]

## Introduction
This notebook covers two distinct concepts, datetimes and timedeltas, and how they are created and used in Pandas. A datetime represents a specific **moment** in time. A timedelta represents an **amount** of time.

[1]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html
[2]: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html

# Date vs Time vs Datetime
The terms **date**, **time**, and **datetime** need to be distinguished. Each one means something different.

* **date** - Only the month, day, and year. So 2016-01-01 would represent January 1, 2016 and be considered a **date**.
* **time** - Only the hours, minutes, seconds and parts of a second (milli/micro/nano). 5 hours, 45 minutes and 6.74234 seconds for example would be considered a **time**.
* **datetime** - A combination of the above two. Has both date (Year, Month, Day) and time (Hour, Minute, Second) components. January 1, 2016 at 5:45 p.m. would be an example of a **datetime**.

The Python standard library contains the [datetime module][1]. It is a popular and important module but will not be covered here, since Pandas builds its own datetime and timedelta objects that are more powerful. However, there are some notes on the standard library's datetime module in the **extras** directory.

[1]: https://docs.python.org/3.5/library/datetime.html
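As a small illustration of the three terms, the sketch below splits one datetime into its date part and its time part. The variable names are made up for this example; `date()` and `time()` are Timestamp methods that return plain Python date and time objects.

```python
import pandas as pd

# A datetime: January 1, 2016 at 5:45 p.m.
moment = pd.Timestamp('2016-01-01 17:45:00')

# The date part only (year, month, day)
date_part = moment.date()

# The time part only (hour, minute, second, fraction of a second)
time_part = moment.time()

print(date_part)  # 2016-01-01
print(time_part)  # 17:45:00
```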

# Datetimes in NumPy
NumPy has its own datetime data type called [datetime64][1]. It is more powerful and flexible than core Python's datetime module. We won't be working with it either.

# Pandas Timestamp
Pandas has its own datetime data type called a **`Timestamp`**, which adds more functionality to NumPy's **`datetime64`**.

[1]: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html
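To make the relationship between the three datetime types a little more concrete, here is a minimal sketch that moves one value from NumPy to Pandas and back. The variable names are illustrative only.

```python
import datetime
import numpy as np
import pandas as pd

# Start from a NumPy datetime64 value
np_dt = np.datetime64('2016-01-10T05:34:43')

# Wrap it in a Pandas Timestamp
ts = pd.Timestamp(np_dt)

# The Timestamp offers conveniences that datetime64 lacks, such as day_name
print(ts.day_name())  # Sunday

# It converts back to datetime64 and also counts as a standard library datetime
back = ts.to_datetime64()
is_py_datetime = isinstance(ts, datetime.datetime)
print(back == np_dt, is_py_datetime)  # True True
```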

# Creating a single Timestamp with the `to_datetime` function

The **`pd.to_datetime`** function converts both strings and numbers to Timestamps, and it also accepts entire arrays or Series. It is intelligent and can detect a wide variety of string formats. Each of the following creates a single Pandas Timestamp object.

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.to_datetime('2016/1/10')

Timestamp('2016-01-10 00:00:00')

In [3]:
pd.to_datetime('2016-1-10')

Timestamp('2016-01-10 00:00:00')

In [4]:
pd.to_datetime('Jan 3, 2019 20:45.56')

Timestamp('2019-01-03 20:45:33')

In [5]:
pd.to_datetime('January 3, 2019 20:45.56')

Timestamp('2019-01-03 20:45:33')

In [6]:
pd.to_datetime('2016-01-05T05:34:43.123456789')

Timestamp('2016-01-05 05:34:43.123456789')
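The examples above all pass a single string. As a hedged sketch of the other inputs mentioned earlier, the block below passes a list and a Series, and also uses the `format` parameter to spell out an ambiguous layout. The sample strings are invented for illustration.

```python
import pandas as pd

# A list of strings becomes a DatetimeIndex
idx = pd.to_datetime(['2016-01-10', '2016-02-15', '2016-03-20'])

# A Series of strings becomes a Series of datetimes
s = pd.to_datetime(pd.Series(['2016-01-10', '2016-02-15']))

# An explicit format removes ambiguity: here the day comes before the month
ts = pd.to_datetime('10/01/2016', format='%d/%m/%Y')
print(ts)  # 2016-01-10 00:00:00
```

Supplying `format` is optional, but it avoids relying on pandas to guess whether `10/01` means October 1 or January 10.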

### Epoch
The term epoch refers to the origin of a particular era. Like many other programming languages, Python uses January 1, 1970 (also known as the Unix epoch) as its epoch for keeping track of datetime. In Pandas, integers are used to represent the number of nanoseconds that have elapsed since the epoch.

### Converting numbers to Timestamps
You can pass a number to the **`to_datetime`** function and it will convert it to a Timestamp. By default, the number is interpreted as nanoseconds after the epoch. The following creates the datetime that is 100 nanoseconds after Jan 1, 1970.

In [7]:
pd.to_datetime(100)

Timestamp('1970-01-01 00:00:00.000000100')

### Specify unit
The default unit is nanoseconds, but you can specify a different one with the **`unit`** parameter.

In [8]:
# 100 seconds after the epoch
pd.to_datetime(100, unit='s')

Timestamp('1970-01-01 00:01:40')

In [9]:
# 20,000 days after the epoch
pd.to_datetime(20000, unit='d')

Timestamp('2024-10-04 00:00:00')
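The link between a Timestamp and its nanosecond count can be checked in both directions. The sketch below uses the `value` attribute of a Timestamp, which holds the number of nanoseconds since the epoch; the variable names are illustrative.

```python
import pandas as pd

epoch = pd.Timestamp('1970-01-01')
ts = pd.to_datetime(100, unit='s')

# Nanoseconds since the epoch, as an integer
print(epoch.value)  # 0
print(ts.value)     # 100000000000

# Passing that integer back (default unit: nanoseconds) recovers the Timestamp
round_trip = pd.to_datetime(ts.value)
print(round_trip == ts)  # True
```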

### Not a Series or a DataFrame
When using Pandas, you are almost always working with either a Series or a DataFrame (and occasionally an Index). The Pandas Timestamp is another type unique to Pandas, but you will rarely be working with it directly.

# Why is `to_datetime` returning a `Timestamp` object?
It may look a bit odd to see a Timestamp object returned from the **`to_datetime`** function. The docstring for **`to_datetime`** even reads:

> **Convert argument to datetime.**

Technically, the object is definitely a Pandas Timestamp object. We can verify this with the **`type`** function:

In [10]:
dt = pd.to_datetime(20000, unit='d')
type(dt)

pandas._libs.tslibs.timestamps.Timestamp

### Datetime is common terminology in many languages
The term **datetime** is common in many programming languages and this is what the Pandas documentation is referring to. The technical name of the Pandas object is indeed Timestamp, but the common name for what it represents is a datetime.

## Timestamp and datetime refer to the same thing
The terms **Timestamp** and **datetime** refer to the exact same concept in pandas. Technically, each value is a Pandas **`Timestamp`** object but the term **datetime** is used to refer to it as well. Yes, that is extremely confusing, but hopefully now it is clear.

## Typical Timestamps in Pandas
Typically, you will encounter Timestamps within a column of a Pandas DataFrame, as we do below. Note that the data type is `datetime64`. This is confusing, but again, Timestamp and datetime are equivalent terms.

In [11]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp.dtypes

title                object
dept                 object
salary              float64
race                 object
gender               object
hire_date    datetime64[ns]
dtype: object

### Each individual value in the datetime columns is a Timestamp
If we extract the **`hire_date`** column as a Series and print out the first few rows, you will see that the data type (at the bottom of the output) is still written with the word **datetime**.

In [12]:
hire_date = emp['hire_date']
hire_date.head()

0   2015-02-03
1   1982-02-08
2   1984-11-26
3   2012-03-26
4   2013-11-04
Name: hire_date, dtype: datetime64[ns]

If we select the first value in the Series, we get a Timestamp.

In [13]:
hire_date.loc[0]

Timestamp('2015-02-03 00:00:00')
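The same behavior can be reproduced without the employee file. The sketch below builds a small stand-in Series (the values are copied from the output above, but the Series itself is an assumption for illustration) and checks both the column type and the element type.

```python
import pandas as pd

# Illustrative stand-in for the hire_date column
dates = pd.Series(pd.to_datetime(['2015-02-03', '1982-02-08', '1984-11-26']),
                  name='hire_date')

# The Series reports a datetime64 dtype
print(dates.dtype)

# A single element is a Pandas Timestamp
first = dates.loc[0]
print(type(first).__name__)  # Timestamp
```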

## Timestamp attributes
These Timestamp objects have attributes and methods similar to those of the **`dt`** Series accessor covered in a previous notebook. Let's see some of these again.

In [14]:
ts = pd.to_datetime('Jan 3, 2019 20:45.56')

In [15]:
ts.day

3

In [16]:
ts.day_name()

'Thursday'

In [17]:
ts.minute

45
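To show the parallel with the **`dt`** accessor side by side, here is a short sketch that reads the same attributes from one Timestamp and from a Series of Timestamps. The two sample values are chosen only for illustration.

```python
import pandas as pd

ts = pd.Timestamp('2019-01-03 20:45:33')
s = pd.Series([ts, pd.Timestamp('2016-01-10')])

# On a single Timestamp the attribute is read directly
print(ts.year, ts.day_name())  # 2019 Thursday

# On a Series the same names are reached through the dt accessor
years = s.dt.year
day_names = s.dt.day_name()
print(list(years))      # [2019, 2016]
print(list(day_names))  # ['Thursday', 'Sunday']
```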

# Timedelta - an amount of time
A timedelta is a specific amount of time such as 20 seconds, or 13 days 5 minutes and 10 seconds. Use the **`to_timedelta`** function to create a Timedelta object. It works analogously to the **`to_datetime`** function.

### Converting strings to a Timedelta with `to_timedelta`
A wide variety of strings are able to be converted to Timedeltas. [See the docs][1] for more info.

[1]: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#to-timedelta

In [18]:
pd.to_timedelta('5 days 03:12:45.123')

Timedelta('5 days 03:12:45.123000')

In [19]:
# 10 hours and 13 milliseconds
pd.to_timedelta('10h 13ms')

Timedelta('0 days 10:00:00.013000')
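Strings are not the only way to spell out a duration. As an alternative sketch, the `pd.Timedelta` constructor accepts keyword arguments for each component, and `total_seconds` collapses the result into a single number.

```python
import pandas as pd

from_string = pd.to_timedelta('5 days 03:12:45.123')

# The same duration written with explicit components
from_keywords = pd.Timedelta(days=5, hours=3, minutes=12,
                             seconds=45, milliseconds=123)

print(from_string == from_keywords)  # True

# The whole duration expressed in seconds
total = from_string.total_seconds()
print(total)
```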

### Converting numbers to Timedeltas with `to_timedelta`
As with **`to_datetime`**, a number passed to **`to_timedelta`** is treated as a number of nanoseconds by default. Use the **`unit`** parameter to change the time unit.

In [20]:
# 123,000 nanoseconds
pd.to_timedelta(123000)

Timedelta('0 days 00:00:00.000123')

In [21]:
# 500 days
pd.to_timedelta(500, unit='d')

Timedelta('500 days 00:00:00')

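Since a year is not a fixed amount of time, the largest unit a Timedelta reports is days. In the pandas version used for this notebook you can still pass 'y' to represent years, and the output is converted to days. Newer pandas releases reject 'y' as a unit because it is ambiguous.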

In [22]:
# 23 years
pd.to_timedelta(23, unit='y')

Timedelta('8400 days 13:51:36')

In [23]:
# 10 hours
pd.to_timedelta(10, 'h')

Timedelta('0 days 10:00:00')
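The year-based outputs above are consistent with a year of 365.2425 days (365 days 05:49:12). If the 'y' unit is not available in your pandas version, one possible workaround is to define that length yourself. This is a sketch under that assumption; the name `YEAR` is made up for this example.

```python
import pandas as pd

# Assumption: one "year" means the mean Gregorian year,
# 365.2425 days = 31,556,952 seconds
YEAR = pd.Timedelta(seconds=31_556_952)

print(YEAR)       # 365 days 05:49:12
print(23 * YEAR)  # 8400 days 13:51:36
```

Using whole seconds rather than a fractional number of days keeps the arithmetic exact.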

### No name confusion with Timedelta
The Timedelta data type is unique to pandas, just like the Timestamp object is. Pandas Timedelta is built upon NumPy's timedelta64 data type, which is superior to pure Python's timedelta. Fortunately, the Pandas developers used the name **Timedelta** for the data type, which matches NumPy's.

There is no name confusion here, unlike with **Timestamp/Datetime**.

In [24]:
td = pd.to_timedelta(3, 'y')
type(td)

pandas._libs.tslibs.timedeltas.Timedelta

## Timedelta attributes and methods
There are many attributes and methods available to Timedelta objects. Let's see some below:

In [25]:
td

Timedelta('1095 days 17:27:36')

In [26]:
td.days

1095

In [27]:
td.seconds

62856

In [28]:
td.components

Components(days=1095, hours=17, minutes=27, seconds=36, milliseconds=0, microseconds=0, nanoseconds=0)
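One detail worth checking is that `seconds` holds only the part of the duration beyond whole days, not the full length. The sketch below contrasts it with the `total_seconds` method, using the same value as `td` above written as a string.

```python
import pandas as pd

td = pd.Timedelta('1095 days 17:27:36')

print(td.days)             # 1095
print(td.seconds)          # 62856, only the 17:27:36 part
print(td.total_seconds())  # 94670856.0, the entire duration
```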

## Creating Timedeltas by subtracting Datetimes
Subtracting one datetime from another creates a Timedelta object.

In [29]:
dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')

In [30]:
dt1

Timestamp('2012-12-21 05:30:00')

In [31]:
dt2

Timestamp('2016-01-01 12:45:12')

Subtraction:

In [32]:
dt2 - dt1

Timedelta('1106 days 07:15:12')

### Negative Timedeltas
A negative amount of time is possible just like any negative number is.

In [33]:
dt1 - dt2

Timedelta('-1107 days +16:44:48')
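The display `-1107 days +16:44:48` can be hard to read: it means 1107 days backward and then 16:44:48 forward. The following sketch inspects the pieces and uses the built-in `abs` function to recover the positive amount.

```python
import pandas as pd

neg = pd.Timestamp('2012-12-21 05:30') - pd.Timestamp('2016-01-01 12:45:12')

# The days component is negative, the seconds component is always positive
print(neg.days)     # -1107
print(neg.seconds)  # 60288, i.e. 16:44:48

# The absolute value matches the positive difference
print(abs(neg))     # 1106 days 07:15:12
```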

### Math with Timedeltas
Many arithmetic operations work between two Timedeltas, and between a Timedelta and a number.

In [34]:
td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

In [35]:
td1 - td2

Timedelta('0 days 05:20:50')

In [36]:
td2 + 5 * td2

Timedelta('0 days 00:14:00')

Dividing two timedeltas will remove the units and return a number.

In [37]:
td1 / td2

138.5
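Division is also a handy way to express a Timedelta in a unit of your choice: divide by a Timedelta of one of that unit. The sketch below does this for hours and also shows floor division and remainder, which are assumed here to behave like their integer counterparts.

```python
import pandas as pd

td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

# Express td1 as a number of hours
hours = td1 / pd.Timedelta(hours=1)
print(hours)

# How many whole td2 intervals fit into td1, and what is left over
whole = td1 // td2
leftover = td1 % td2
print(whole)     # 138
print(leftover)  # 0 days 00:01:10
```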

## Creating Timedeltas in a DataFrame by subtracting two Datetime columns
The bikes dataset has two datetime columns, **`starttime`** and **`stoptime`**.

In [38]:
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head()

| | trip_id | usertype | gender | starttime | stoptime | tripduration | from_station_name | latitude_start | longitude_start | dpcapacity_start | to_station_name | latitude_end | longitude_end | dpcapacity_end | temperature | visibility | wind_speed | precipitation | events |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7147 | Subscriber | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | Lake Shore Dr & Monroe St | 41.88105 | -87.61697 | 11.0 | Michigan Ave & Oak St | 41.90096 | -87.623777 | 15.0 | 73.9 | 10.0 | 12.7 | -9999.0 | mostlycloudy |
| 1 | 7524 | Subscriber | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 41.88338 | -87.64117 | 31.0 | Wells St & Walton St | 41.89993 | -87.63443 | 19.0 | 69.1 | 10.0 | 6.9 | -9999.0 | partlycloudy |
| 2 | 10927 | Subscriber | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | Sheffield Ave & Kingsbury St | 41.909592 | -87.653497 | 15.0 | Dearborn St & Monroe St | 41.88132 | -87.629521 | 23.0 | 73.0 | 10.0 | 16.1 | -9999.0 | mostlycloudy |
| 3 | 12907 | Subscriber | Male | 2013-07-01 10:05:00 | 2013-07-01 10:16:00 | 667 | Carpenter St & Huron St | 41.894556 | -87.653449 | 19.0 | Clark St & Randolph St | 41.884576 | -87.63189 | 31.0 | 72.0 | 10.0 | 16.1 | -9999.0 | mostlycloudy |
| 4 | 13168 | Subscriber | Male | 2013-07-01 11:16:00 | 2013-07-01 11:18:00 | 130 | Damen Ave & Pierce Ave | 41.909396 | -87.677692 | 19.0 | Damen Ave & Pierce Ave | 41.909396 | -87.677692 | 19.0 | 73.0 | 10.0 | 17.3 | -9999.0 | partlycloudy |


Let's find the amount of time that elapsed between the start and stop times.

In [39]:
time_elapsed = bikes['stoptime'] - bikes['starttime']
time_elapsed.head()

0   00:16:00
1   00:10:00
2   00:18:00
3   00:11:00
4   00:02:00
dtype: timedelta64[ns]

Since both start and stop time are datetime columns, subtracting them resulted in a timedelta column. The maximum unit of time for timedelta is days.
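A timedelta column has its own **`dt`** accessor. As a hedged sketch, the block below rebuilds the first two trips by hand (so it runs without the bikes file) and converts the elapsed time to plain numbers in two ways. The DataFrame name `trips` is made up for this example.

```python
import pandas as pd

# Hand-built stand-in for the first two rows of the bikes data
trips = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']),
    'stoptime': pd.to_datetime(['2013-06-28 19:17:00', '2013-06-28 23:03:00']),
})

elapsed = trips['stoptime'] - trips['starttime']

# Total seconds for each trip through the dt accessor
seconds = elapsed.dt.total_seconds()
print(list(seconds))  # [960.0, 600.0]

# Minutes for each trip by dividing by a one-minute Timedelta
minutes = elapsed / pd.Timedelta(minutes=1)
print(list(minutes))  # [16.0, 10.0]
```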

# Exercises

## Problem 1
<span  style="color:green; font-size:16px">What day of the week was Jan 15, 1997?</span>

In [43]:
import pandas as pd
date = pd.to_datetime('Jan 15, 1997')

In [45]:
date.day_name()

'Wednesday'

## Problem 2
<span  style="color:green; font-size:16px">Was 1925 a leap year?</span>

In [49]:
dt = pd.to_datetime('Jan 15, 1925')
dt.is_leap_year  # True only when that year's February has 29 days

False

## Problem 3
<span  style="color:green; font-size:16px">What year will it be 1 million hours after the UNIX epoch?</span>

In [54]:
dt = pd.to_datetime(10 ** 6, unit='h')  # 10 ** 6 = 1,000,000
dt

Timestamp('2084-01-29 16:00:00')

In [56]:
dt.year

2084

## Problem 4
<span  style="color:green; font-size:16px">Create the datetime July 20, 1969 at 2:56 a.m. and 15 seconds.</span>

In [57]:
df = pd.to_datetime('July 20, 1969 2:56:15 AM')
df

Timestamp('1969-07-20 02:56:15')

## Problem 5
<span  style="color:green; font-size:16px">Neil Armstrong stepped on the moon at the time in the last problem. How many days have passed since that happened? Use the string 'today' when creating your datetime.</span>

In [59]:
df2 = pd.to_datetime('today')
df2

Timestamp('2018-12-22 09:50:43.160199')

In [60]:
days_passed = df2-df
days_passed

Timedelta('18052 days 06:54:28.160199')
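Since the question asks for a number of days, the `days` attribute of the resulting Timedelta gives the answer directly. The sketch below uses a fixed date in place of `'today'` so that the result is reproducible; that fixed date is taken from the output above.

```python
import pandas as pd

moon = pd.Timestamp('1969-07-20 02:56:15')

# Fixed stand-in for pd.to_datetime('today'), copied from the run above
now = pd.Timestamp('2018-12-22 09:50:43')

# Whole days elapsed between the two moments
whole_days = (now - moon).days
print(whole_days)  # 18052
```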

## Problem 6
<span  style="color:green; font-size:16px">Which is larger - 35 days or 700 hours?</span>

In [68]:
days = pd.to_timedelta(35, unit='d')
days  # a Timedelta, because 35 days is an amount of time

Timedelta('35 days 00:00:00')

In [69]:
hours = pd.to_timedelta(700, unit='h')
hours

Timedelta('29 days 04:00:00')
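The two Timedeltas can also be compared directly, which answers the question in one step. A minimal sketch:

```python
import pandas as pd

days = pd.to_timedelta(35, unit='D')
hours = pd.to_timedelta(700, unit='h')

# Timedeltas support the usual comparison operators
print(days > hours)  # True

# The difference shows by how much 35 days is larger
print(days - hours)  # 5 days 20:00:00
```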

## Problem 7
<span  style="color:green; font-size:16px">In a previous notebook, we were told that the employee data was retrieved on Dec 1, 2016. We used the simple calculation `2016 - emp['hire_date'].dt.year` to determine the years of experience. Can you improve upon this method to get the exact amount of years of experience and assign this as a new column named `experience`?</span>

In [72]:
2016 - emp['hire_date'].dt.year.head()

0     1
1    34
2    32
3     4
4     3
Name: hire_date, dtype: int64

In [79]:
day = pd.to_datetime('2016/1/1')
day

Timestamp('2016-01-01 00:00:00')

In [81]:
year = pd.to_timedelta(1, unit='y')
year

Timedelta('365 days 05:49:12')

In [86]:
# Add an experience column to emp. Dividing the elapsed Timedelta by a
# one-year Timedelta gives years; without the division the result would be in days.
emp['experience'] = (day - emp['hire_date']) / year
emp.head()

| | title | dept | salary | race | gender | hire_date | experience |
|---|---|---|---|---|---|---|---|
| 0 | POLICE OFFICER | Houston Police Department-HPD | 45279.0 | White | Male | 2015-02-03 | 0.908985 |
| 1 | ENGINEER/OPERATOR | Houston Fire Department (HFD) | 63166.0 | White | Male | 1982-02-08 | 33.895289 |
| 2 | SENIOR POLICE OFFICER | Houston Police Department-HPD | 66614.0 | Black | Male | 1984-11-26 | 31.097148 |
| 3 | ENGINEER | Public Works & Engineering-PWE | 71680.0 | Asian | Male | 2012-03-26 | 3.76736 |
| 4 | CARPENTER | Houston Airport System (HAS) | 42390.0 | White | Male | 2013-11-04 | 2.157471 |
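The solution above measures experience up to January 1, 2016. A possible variant measures it up to the retrieval date given in the problem statement, December 1, 2016. The sketch below uses a hand-built stand-in for the first few hire dates, and it assumes a year of 365.2425 days, which matches the `year` Timedelta shown above.

```python
import pandas as pd

# Illustrative stand-in for the first rows of emp['hire_date']
hire_date = pd.Series(pd.to_datetime(['2015-02-03', '1982-02-08', '1984-11-26']))

# Retrieval date stated in the problem
retrieved = pd.Timestamp('2016-12-01')

# Assumption: one year = 365.2425 days = 31,556,952 seconds
year = pd.Timedelta(seconds=31_556_952)

# Elapsed time divided by one year gives years of experience as floats
experience = (retrieved - hire_date) / year
print(experience.round(3))
```

With the real data, the last step would be assigned to `emp['experience']` instead of a standalone variable.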
