# Datetime and Timedelta

This chapter covers two distinct concepts, datetimes and timedeltas, and how they are created and used in Pandas. A datetime represents a specific **moment** in time. A timedelta represents an **amount** of time.

## Date vs Time vs Datetime
The terms **date**, **time**, and **datetime** are easy to conflate, but each one means something different.

* **date** - Only the month, day, and year. So 2016-01-01 would represent January 1, 2016 and be considered a **date**.
* **time** - Only the hours, minutes, seconds, and fractions of a second (milli/micro/nano). For example, 5 hours, 45 minutes, and 6.74234 seconds past midnight (05:45:06.74234) would be considered a **time**.
* **datetime** - A combination of the above two. Has both date (Year, Month, Day) and time (Hour, Minute, Second) components. January 1, 2016 at 5:45 p.m. would be an example of a **datetime**.
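
To make the distinction concrete, here is a minimal sketch that splits one datetime into its date and time parts. It relies on the pandas `Timestamp` type introduced later in this chapter, and the variable names are purely illustrative.

```python
import pandas as pd

# A datetime: January 1, 2016 at 5:45 p.m.
ts = pd.Timestamp('2016-01-01 17:45:00')

# The date part keeps only year, month, and day
date_part = ts.date()

# The time part keeps only hour, minute, second (and fractions)
time_part = ts.time()

print(date_part)
print(time_part)
```

The two pieces returned here are standard library `datetime.date` and `datetime.time` objects.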

The Python standard library contains the [datetime module][1]. It is a popular and important module but will not be covered here, since Pandas builds its own datetime and timedelta objects that are more powerful. However, some notes on the standard library's datetime module are available in the **extras** directory.

[1]: https://docs.python.org/3.5/library/datetime.html

### Datetimes in numpy

NumPy has its own datetime data type called [datetime64][2]. It is more powerful and flexible than core Python's datetime module. We won't be working with it directly either.

[2]: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html

### pandas Timestamp

pandas has its own datetime data type called a `Timestamp`, which adds more functionality to NumPy's `datetime64`.

### Creating a single Timestamp with the `to_datetime` function

The `to_datetime` function converts strings and numbers, as well as entire arrays or Series, to Timestamps. It recognizes a wide variety of string formats. Each of the following creates a single pandas Timestamp object.

In [None]:
import pandas as pd
import numpy as np

In [None]:
pd.to_datetime('2016/1/10')

In [None]:
pd.to_datetime('2016-1-10')

In [None]:
pd.to_datetime('Jan 3, 2019 20:45.56')

In [None]:
pd.to_datetime('January 3, 2019 20:45.56')

In [None]:
pd.to_datetime('2016-01-05T05:34:43.123456789')
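
The examples above each pass a single string. As a rough sketch of the array and Series case mentioned earlier, the following passes a list and a Series instead. It assumes every string shares one format; lists that mix formats may need extra arguments in recent pandas versions.

```python
import pandas as pd

date_strings = ['2016-01-10', '2019-01-03', '2016-01-05']

# A list of strings becomes a DatetimeIndex
dates = pd.to_datetime(date_strings)

# A Series of strings becomes a Series with a datetime64 data type
date_series = pd.to_datetime(pd.Series(date_strings))

print(type(dates))
print(date_series.dtype)
```

Each element of either result is still an individual Timestamp when selected on its own.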

### Epoch
The term epoch refers to the origin of a particular era. Like many other programming languages, Python uses January 1, 1970 (also known as the Unix epoch) as its epoch for keeping track of datetime. In Pandas, integers are used to represent the number of nanoseconds that have elapsed since the epoch.
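
One way to see this integer representation is the `value` attribute of a Timestamp, which holds the number of nanoseconds since the epoch. The sketch below uses two hand-picked moments for illustration.

```python
import pandas as pd

epoch = pd.Timestamp('1970-01-01')
one_second_later = pd.Timestamp('1970-01-01 00:00:01')

print(epoch.value)             # 0
print(one_second_later.value)  # 1000000000
```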

### Converting numbers to Timestamps

You can pass a number to the `to_datetime` function and it will convert it to a Timestamp. It assumes the number is a count of nanoseconds after the epoch. The following creates the datetime that is 100 nanoseconds after Jan 1, 1970.

In [None]:
pd.to_datetime(100)

### Specify unit

The default unit is nanoseconds, but you can specify a different one with the **`unit`** parameter.

In [None]:
# 100 seconds after the epoch
pd.to_datetime(100, unit='s')

In [None]:
# 20,000 days after the epoch
pd.to_datetime(20000, unit='d')
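
As a quick check on how the `unit` parameter works, the sketch below expresses the same moment, exactly one day after the epoch, in three different units. The unit strings `'D'`, `'s'`, and `'ms'` are the ones assumed here.

```python
import pandas as pd

one_day_in_days = pd.to_datetime(1, unit='D')
one_day_in_seconds = pd.to_datetime(86_400, unit='s')
one_day_in_millis = pd.to_datetime(86_400_000, unit='ms')

# All three describe January 2, 1970 at midnight
print(one_day_in_days)
print(one_day_in_days == one_day_in_seconds == one_day_in_millis)
```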

### Not a Series or a DataFrame

When using Pandas, you are almost always working with either a Series or a DataFrame (and occasionally an Index). The Pandas Timestamp is another type unique to Pandas, but you will rarely be working with it directly.

## Why is `to_datetime` returning a `Timestamp` object?

It may look a bit odd to see a Timestamp object being returned from the `to_datetime` function. The docstring for `to_datetime` even says:

> Convert argument to datetime.

Nevertheless, the returned object is a pandas Timestamp. We can verify this with the `type` function:

In [None]:
dt = pd.to_datetime(20000, unit='d')
type(dt)

### Datetime is common terminology in many languages

The term **datetime** is common in many programming languages and this is what the Pandas documentation is referring to. The technical name of the Pandas object is indeed Timestamp, but the common name for what it represents is a datetime.

### Timestamp and datetime refer to the same thing

The terms **Timestamp** and **datetime** refer to the exact same concept in pandas. Technically, each value is a Pandas `Timestamp` object but the term **datetime** is used to refer to it as well. Yes, that is extremely confusing, but hopefully now it is clear.

### Typical Timestamps in Pandas
Typically, you will encounter Timestamps within a column of a Pandas DataFrame, as we do below. Note that the data type is `datetime64`. This is confusing, but again, Timestamp and datetime are equivalent terms.

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp.dtypes

### Each individual value in the datetime columns is a Timestamp
If we extract the **`hire_date`** column as a Series and print out the first few rows, you will see that the data type (at the bottom of the output) is still written with the word **datetime**.

In [None]:
hire_date = emp['hire_date']
hire_date.head()

If we select the first value in the Series, we get a Timestamp.

In [None]:
hire_date.loc[0]
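
The same behavior can be reproduced without the employee file. The sketch below builds a small, hypothetical Series of dates in memory (the values are made up for illustration) and confirms that the column reports a datetime64 data type while a single element is a Timestamp.

```python
import pandas as pd

# Hypothetical stand-in for a parsed date column
hire_dates = pd.Series(pd.to_datetime(['2006-06-12', '2000-07-19', '2015-02-03']))

# The Series as a whole has a datetime64 data type
print(hire_dates.dtype)

# A single value pulled out of the Series is a pandas Timestamp
first = hire_dates.loc[0]
print(type(first))
```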

## Timestamp attributes
These Timestamp objects have attributes and methods similar to those of the **`dt`** Series accessor covered in a previous notebook. Let's see some of these again.

In [None]:
ts = pd.to_datetime('Jan 3, 2019 20:45.56')

In [None]:
ts.day

In [None]:
ts.day_name()

In [None]:
ts.minute

## Timedelta - an amount of time
A timedelta is a specific amount of time such as 20 seconds, or 13 days 5 minutes and 10 seconds. Use the **`to_timedelta`** function to create a Timedelta object. It works analogously to the **`to_datetime`** function.

### Converting strings to a Timedelta with `to_timedelta`
A wide variety of strings can be converted to Timedeltas. [See the docs][3] for more info.

[3]: http://pandas.pydata.org/pandas-docs/stable/timedeltas.html#to-timedelta

In [None]:
pd.to_timedelta('5 days 03:12:45.123')

In [None]:
# 10 hours and 13 milliseconds
pd.to_timedelta('10h 13ms')
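
To check how a string is interpreted, it can help to compare the result against a Timedelta built from explicit keyword arguments. The sketch below does that with the `pd.Timedelta` constructor and also shows, as an aside, that a list of strings yields a `TimedeltaIndex`.

```python
import pandas as pd

from_string = pd.to_timedelta('5 days 03:12:45.123')
from_keywords = pd.Timedelta(days=5, hours=3, minutes=12,
                             seconds=45, milliseconds=123)

# Both describe the same amount of time
print(from_string == from_keywords)

# A list of strings becomes a TimedeltaIndex
tds = pd.to_timedelta(['1 days', '2 days 06:00:00'])
print(type(tds))
```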

### Converting numbers to Timedeltas with `to_timedelta`
As with **`to_datetime`**, a number passed to **`to_timedelta`** is treated by default as a number of nanoseconds. Use the **`unit`** parameter to change the time unit.

In [None]:
# 123,000 nanoseconds
pd.to_timedelta(123000)

In [None]:
# 500 days
pd.to_timedelta(500, unit='d')

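Because a year is not a fixed length of time, the largest unit a Timedelta reports is days. Older versions of pandas accepted `'y'` as a unit and converted it to days, but current versions raise an error for it, so express years as a number of days instead.

In [None]:
# roughly 23 years, assuming 365.25 days per year
pd.to_timedelta(23 * 365.25, unit='D')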

In [None]:
# 10 hours
pd.to_timedelta(10, 'h')
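
As a small sanity check on units, the sketch below builds the same ten-hour Timedelta three ways and confirms that a bare number is read as nanoseconds. The unit strings `'h'`, `'min'`, and `'s'` are the ones assumed here.

```python
import pandas as pd

ten_hours = pd.to_timedelta(10, unit='h')
same_in_minutes = pd.to_timedelta(600, unit='min')
same_in_seconds = pd.to_timedelta(36_000, unit='s')

print(ten_hours == same_in_minutes == same_in_seconds)

# With no unit, the number is read as nanoseconds:
# 123,000 nanoseconds is 123 microseconds
default_ns = pd.to_timedelta(123_000)
print(default_ns == pd.Timedelta(microseconds=123))
```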

### No name confusion with Timedelta
The Timedelta data type is unique to pandas, just like the Timestamp object. It is built upon NumPy's timedelta64 data type, which is superior to pure Python's timedelta. Fortunately, the Pandas developers named it **Timedelta**, which matches NumPy's name for the type.

There is no name confusion here, unlike with **Timestamp/Datetime**.

In [None]:
# roughly 3 years; the 'y' unit is no longer supported in current pandas
td = pd.to_timedelta('1095 days 17:27:36')
type(td)

## Timedelta attributes and methods
There are many attributes and methods available to Timedelta objects. Let's see some below:

In [None]:
td

In [None]:
td.days

In [None]:
td.seconds

In [None]:
td.components
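
One detail worth a closer look is that `seconds` holds only the leftover seconds within the final partial day, not the full duration. The sketch below uses a separate, hand-picked Timedelta (named `example_td` here for illustration) and contrasts `seconds` with the `total_seconds` method.

```python
import pandas as pd

example_td = pd.Timedelta('5 days 03:12:45')

print(example_td.days)             # 5
print(example_td.seconds)          # 11565 (3 h 12 min 45 s, excluding the 5 days)
print(example_td.total_seconds())  # 443565.0 (the whole duration)

# components breaks the duration into named fields
print(example_td.components.hours)    # 3
print(example_td.components.minutes)  # 12
```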

## Creating Timedeltas by subtracting Datetimes
It is possible to create a Timedelta object by subtracting two Datetimes.

In [None]:
dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')

In [None]:
dt1

In [None]:
dt2

Subtraction:

In [None]:
dt2 - dt1

### Negative Timedeltas
A negative amount of time is possible just like any negative number is.

In [None]:
dt1 - dt2
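
The display of a negative Timedelta can be surprising, because pandas shows a negative number of days plus a positive time. The sketch below repeats the two datetimes from above and uses `abs` to recover the positive amount.

```python
import pandas as pd

dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')

forward = dt2 - dt1   # 1106 days 07:15:12
backward = dt1 - dt2  # displayed as -1107 days +16:44:48

# The days attribute is rounded down, and seconds is always non-negative
print(backward.days)
print(backward.seconds)

# Both differences have the same magnitude
print(abs(backward) == forward)
```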

### Math with Timedeltas
Timedeltas support many arithmetic operations with one another and with plain numbers.

In [None]:
td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

In [None]:
td1 - td2

In [None]:
td2 + 5 * td2

Dividing two timedeltas will remove the units and return a number.

In [None]:
td1 / td2
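
Division is also a convenient way to convert a Timedelta to a plain number in a chosen unit: divide by a Timedelta of one hour to get hours, for example. The sketch below repeats the two Timedeltas from above; `hours_as_number` is an illustrative name.

```python
import pandas as pd

td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')

# How many times does td2 fit into td1?
ratio = td1 / td2
print(ratio)  # 138.5

# Express td1 as a number of hours
hours_as_number = td1 / pd.Timedelta(hours=1)
print(hours_as_number)

# Multiplying by a number keeps the result a Timedelta
six_times = td2 + 5 * td2
print(six_times == pd.Timedelta(minutes=14))
```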

### Creating Timedeltas in a DataFrame by subtracting two Datetime columns
The bikes dataset has two datetime columns, **`starttime`** and **`stoptime`**.

In [None]:
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head()

Let's find the amount of time that elapsed between the start and stop times.

In [None]:
time_elapsed = bikes['stoptime'] - bikes['starttime']
time_elapsed.head()

Since both start and stop time are datetime columns, subtracting them resulted in a timedelta column. The maximum unit of time for timedelta is days.
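
A timedelta column is often easier to analyze as a plain number. The sketch below uses a tiny, made-up DataFrame in place of the bikes data (the rows are hypothetical) and converts the elapsed time to minutes with the `dt.total_seconds` method.

```python
import pandas as pd

# Hypothetical rides standing in for the bikes dataset
rides = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00']),
    'stoptime': pd.to_datetime(['2013-06-28 19:17:00', '2013-06-28 23:03:00']),
})

# Subtracting two datetime columns gives a timedelta column
time_elapsed = rides['stoptime'] - rides['starttime']
print(time_elapsed.dtype)

# Convert each timedelta to a number of minutes
minutes = time_elapsed.dt.total_seconds() / 60
print(minutes)
```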

## Exercises

### Exercise 1
<span  style="color:green; font-size:16px">What day of the week was Jan 15, 1997?</span>

### Exercise 2
<span  style="color:green; font-size:16px">Was 1925 a leap year?</span>

### Exercise 3
<span  style="color:green; font-size:16px">What year will it be 1 million hours after the UNIX epoch?</span>

### Exercise 4
<span  style="color:green; font-size:16px">Create the datetime July 20, 1969 at 2:56 a.m. and 15 seconds.</span>

### Exercise 5
<span  style="color:green; font-size:16px">Neil Armstrong stepped on the moon at the time in the last exercise. How many days have passed since that happened? Use the string 'today' when creating your datetime.</span>

### Exercise 6
<span  style="color:green; font-size:16px">Which is larger - 35 days or 700 hours?</span>

### Exercise 7
<span  style="color:green; font-size:16px">In a previous notebook, we were told that the employee data was retrieved on Dec 1, 2016. We used the simple calculation `2016 - emp['hire_date'].dt.year` to determine the years of experience. Can you improve upon this method to get the exact number of years of experience and assign it to a new column named `experience`?</span>