# Datetime, Timedelta, and Period Objects

Before analyzing time series datasets in pandas, we must learn about datetime, timedelta, and period objects. While we did cover these objects in the chapters in the Data Types part, this current chapter provides comprehensive coverage so that you can use them during an actual data analysis.

## Definitions

* **Datetime** - A specific **moment** in time. Has components **year**, **month**, **day**, **hour**, **minute**, **second**, and **part of second**.
* **Timedelta** - An **amount** of time. Has components **day**, **hour**, **minute**, **second**, and **part of second**. It is independent of any specific moment in time.
* **Period** - A specific **span** of time. A time period with a start and end time. Example: the entire month of December, 2002 (December 1, 2002 at midnight to December 31 at 11:59:59.999999999).
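
As a quick preview of how the three definitions relate, here is a minimal sketch. The variable names (`moment`, `amount`, `span`) are chosen only for illustration, and the constructors used are introduced properly later in this chapter.

```python
import pandas as pd

# A datetime: one specific moment in time
moment = pd.Timestamp('2002-12-15 08:30')

# A timedelta: an amount of time, not tied to any moment
amount = pd.Timedelta('2 days 03:00:00')

# A period: a span of time with a start and an end (all of December 2002)
span = pd.Period('2002-12', freq='M')

# Adding an amount of time to a moment gives another moment
later = moment + amount

# A moment can be tested against the boundaries of a span
inside = span.start_time <= moment <= span.end_time
print(later, inside)
```

Here, `later` is the moment 2 days and 3 hours after `moment`, and `inside` reports whether `moment` falls within the span.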


## Date vs Time vs Datetime

Within the term **datetime**, we have two separate terms, **date** and **time**, each of which means something specific.

* **date** - Only the month, day, and year. 2016-01-05 would represent January 5, 2016
* **time** - Only the hours, minutes, seconds, and parts of a second (millisecond, microsecond, nanosecond, etc...). For example, 5 hours, 45 minutes and 6.74234 seconds
* **datetime** - A combination of a date and time. It has both the date (year, month, day) and the time (hour, minute, second, part of second) components. January 5, 2016 at 5:45 p.m and 6.742344 seconds would be an example of a **datetime**.

The Python standard library contains the [datetime module][1]. It is a popular and important module, but will not be covered here since pandas builds its own datetime and timedelta objects that are more powerful.

### Time vs Timedelta

Notice that we've introduced two similar terms, **time**, and **timedelta**. These two terms are essentially the same thing and represent an amount of time. A timedelta typically allows for the use of a day component in addition to hour, minute, second, and part of second. Since years and months are not standard amounts of time, they are not part of the timedelta definition.

[1]: https://docs.python.org/3/library/datetime.html

### Datetimes in numpy

In the Data Types part, we covered the numpy datetime data type. It is more powerful and flexible than the identically named object from the standard library's datetime module, but does not have the features of the pandas datetime object. This chapter only covers datetimes in pandas.

## Creating single datetime objects in pandas

Previously, we used the Series constructor to create a Series of datetimes. It's actually possible to create single datetime objects with the `to_datetime` function and the `Timestamp` constructor.

### Creating a single datetime with the `to_datetime` function

The `to_datetime` function can create a single scalar datetime with nanosecond precision. These scalars are analogous to single integers, floats, or strings. They are not part of an array, Series, or DataFrame. The `to_datetime` function is very flexible and can take a variety of different inputs. We'll explore most of these options, beginning with a string with the format `'YYYY-MM-DD'`.

In [None]:
import pandas as pd
d = pd.to_datetime('2020-01-05')
d

This is a new type of object. Let's formally return its type.

In [None]:
type(d)

### Why is a Timestamp object returned?

The type that pandas uses for individual datetimes is `Timestamp`. In general, the word 'timestamp' has the same meaning as datetime. If you look at the docstring for `to_datetime` it states the following:

> Convert argument to datetime.

It would have been nice if pandas had chosen the name `Datetime` for the type so that it could match the name of the data type and function. Since it did not, there is potential for confusion. Let's create a Series of datetimes to show that the data type is `'datetime64[ns]'`.

In [None]:
s = pd.Series(['2020-01-05', '2020-01-06'], dtype='datetime64[ns]')
s

When selecting a single value from this Series, a Timestamp object is returned. In the official documentation, both of the words 'timestamp' and 'datetime' are used interchangeably to refer to the same concept - an object with year, month, day, hour, minute, second, and part of second components.

In [None]:
s.loc[0]

### More string formats

Let's see more examples of strings with different formats that can be converted to datetimes. Here, we use a hyphen to separate the components but do not place the leading zero in front of the month and day. It's important to remember that `to_datetime` is a function and not a Series or DataFrame method. It must be accessed directly from `pd`. 

In [None]:
pd.to_datetime('2016-1-5')

The hour, minute, second, and part of second components were not explicitly given, so pandas sets them to 0. Let's slowly create more datetimes by adding one more component each time. Here, we add the hour.

In [None]:
pd.to_datetime('2020-1-5 15')

The hour and minute are separated by a colon.

In [None]:
pd.to_datetime('2020-1-5 15:39')

The minute and second are also separated by a colon.

In [None]:
pd.to_datetime('2020-1-5 15:39:55')

The part of second needs to be separated from the second by a decimal. Enough precision exists to contain nanoseconds, which are nine places after the decimal. The last two decimal places are truncated below.

In [None]:
pd.to_datetime('2020-1-5 15:39:55.12345678912')

Forward slashes can be used instead of hyphens to separate the date components. The hour, minute, and second components do not require any separator.

In [None]:
pd.to_datetime('2020/01/05 153955.123456789')

The date components also don't need a separator.

In [None]:
pd.to_datetime('20200105 153955.123456789')

You can also use the month name spelled out as a string, have an ending for the day, and use AM/PM to denote part of day.

In [None]:
pd.to_datetime('January 5th, 2020 03:39:55 PM')

### ISO 8601 Format

The [International Organization for Standardization (ISO) standard 8601][1] describes a standard format for datetimes where the letter **T** is used to separate the date and time. There are several variations of the format, such as using hyphens to separate the year, month, and day components.

[1]: https://en.wikipedia.org/wiki/ISO_8601

In [None]:
pd.to_datetime('20200105T153955.123456789')

### Same results with the `pd.Timestamp` constructor

The `pd.Timestamp` constructor produces the exact same output as the `pd.to_datetime` function when passed a string. A single `Timestamp` will be produced. Here, we test the equality of one of the strings.

In [None]:
pd.to_datetime('2020/01/05 153955') == pd.Timestamp('2020/01/05 153955')

### Day first strings

Most of the strings above placed the full four-digit year first, e.g. `'2020-5-9'` for May 9th, 2020. It's also possible to provide the month, then the day, then the year in the following format.

In [None]:
pd.to_datetime('5/9/2020')

It is customary in many countries to provide the day first. Below, we set the `dayfirst` parameter to `True` so that the same string is read as day, then month, then year, which creates the date September 5, 2020. This is not possible with `pd.Timestamp` as its signature differs significantly from `pd.to_datetime`.

In [None]:
pd.to_datetime('5/9/2020', dayfirst=True)
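
To make the effect of `dayfirst` concrete, the following sketch parses the same string both ways and compares the month and day of each result. The variable names are illustrative only.

```python
import pandas as pd

# Default: the first number is read as the month
month_first = pd.to_datetime('5/9/2020')

# dayfirst=True: the first number is read as the day
day_first = pd.to_datetime('5/9/2020', dayfirst=True)

print(month_first.month, month_first.day)  # month 5, day 9
print(day_first.month, day_first.day)      # month 9, day 5
```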

### Custom datetime string specification

Occasionally, you might have a string that pandas does not know how to parse. Take the following uncommon string, which will produce an error when passed to `pd.to_datetime`.

In [None]:
pd.to_datetime('The 5th of January, 2020 at 5:45 pm')

You may use the specific format codes that each refer to a specific component of a datetime within the string. Pass this format as a string to the `format` parameter.

In [None]:
pd.to_datetime('The 5th of January, 2020 at 5:45 pm', 
               format='The %dth of %B, %Y at %I:%M %p')
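
If you would rather not have an unparseable string stop your program, one option is to catch the error or to ask pandas to return a missing value instead. The sketch below assumes the failure surfaces as a `ValueError` (or a subclass of it), and uses the `errors='coerce'` option of `to_datetime`, which returns `NaT` (not a time) on failure.

```python
import pandas as pd

text = 'The 5th of January, 2020 at 5:45 pm'

# Option 1: catch the parsing error explicitly
try:
    pd.to_datetime(text)
    parsed_without_format = True
except ValueError:
    parsed_without_format = False

# Option 2: return NaT instead of raising
fallback = pd.to_datetime(text, errors='coerce')

# Supplying the format lets the same string parse successfully
parsed = pd.to_datetime(text, format='The %dth of %B, %Y at %I:%M %p')
print(parsed_without_format, fallback, parsed)
```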

In order to use the `format` parameter, you must be aware of the format codes, also known as directives. A partial list of format codes is given in the table below. See the [official Python documentation][1] for full details.

<table>
    <thead>
        <tr><td>Format code</td> <td>Definition</td> <td>Examples</td></tr>
    </thead>
    <tbody>
        <tr> <td>%d</td> <td>zero-padded day of month</td> <td>01, 02, ..., 30, 31</td></tr>
        <tr> <td>%b/%B</td> <td>abbreviated/full month name</td> <td>Jan/January, Feb/February</td></tr>
        <tr> <td>%m</td> <td>zero-padded month number</td> <td>01, 02</td></tr>
        <tr> <td>%y/%Y</td> <td>two-digit/four-digit year</td> <td>05/2005, 10/2010</td></tr>
        <tr> <td>%H</td> <td>zero-padded 24 hour clock</td> <td>00, 01, 23</td></tr>
        <tr> <td>%I</td> <td>zero-padded 12 hour clock</td> <td>01, 02, 12</td></tr>
        <tr> <td>%M</td> <td>zero-padded minute</td> <td>00, 01, 59</td></tr>
        <tr> <td>%S</td> <td>zero-padded second</td> <td>00, 01, 59</td></tr>
        <tr> <td>%p</td> <td>AM or PM</td> <td>am, pm, AM, PM</td></tr>
    </tbody>
    </table>
    
[1]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
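
As a further illustration of the format codes in the table, the sketch below parses a made-up string that uses an abbreviated month name, a two-digit year, and a 24-hour clock. It then uses the same codes in the other direction with the `strftime` method to turn the Timestamp back into text. The month name and AM/PM text depend on your locale; an English locale is assumed here.

```python
import pandas as pd

# %d day, %b abbreviated month, %y two-digit year, %H 24-hour clock
ts = pd.to_datetime('05-Jan-20 17:45:10', format='%d-%b-%y %H:%M:%S')

# The same codes can format a Timestamp as a string
text = ts.strftime('%B %d, %Y at %I:%M %p')
print(ts)
print(text)
```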

### Epoch

The term epoch refers to the origin of a particular era. Like many other programming languages, Python uses January 1, 1970 (also known as the Unix epoch) as its epoch for keeping track of datetime. In pandas, integers are used to represent the number of nanoseconds that have elapsed since the epoch.
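
You can inspect this underlying integer with the `value` attribute of a Timestamp. The short sketch below shows that the epoch itself is stored as 0 and that one second later is stored as one billion nanoseconds.

```python
import pandas as pd

epoch = pd.Timestamp('1970-01-01')
one_second_later = pd.Timestamp('1970-01-01 00:00:01')

# Number of nanoseconds elapsed since the epoch
print(epoch.value)             # 0
print(one_second_later.value)  # 1000000000
```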

### Converting numbers to Timestamps

The `to_datetime` function also accepts numbers and converts them to Timestamps. By default, it uses nanoseconds as the units for the passed number. The following creates a datetime 100 nanoseconds after January 1, 1970.

In [None]:
pd.to_datetime(100)

### Specify unit

The default unit is nanoseconds, but you can specify a different one with the `unit` parameter. Use the characters 'd' (days), 'h' (hours), 'm' (minutes), 's' (seconds), 'ms' (milliseconds), 'us' (microseconds), and 'ns' (nanoseconds).  Here, we create a datetime 100 seconds after the epoch.

In [None]:
pd.to_datetime(100, unit='s')

Here, a datetime 20,000 days after the epoch is created.

In [None]:
pd.to_datetime(20_000, unit='d')

Again, the `pd.Timestamp` constructor works the same. A timestamp 5 million minutes after the epoch is created.

In [None]:
pd.Timestamp(5_000_000, unit='m')
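
One way to check your understanding of the `unit` parameter is to express the same moment in several units. In this sketch, one day, 24 hours, and 86,400 seconds after the epoch all produce the same Timestamp. The uppercase `'D'` is used for days here, which is the spelling listed in the pandas documentation.

```python
import pandas as pd

from_days = pd.to_datetime(1, unit='D')
from_hours = pd.to_datetime(24, unit='h')
from_seconds = pd.to_datetime(86_400, unit='s')

# All three are January 2, 1970 at midnight
print(from_days == from_hours == from_seconds)
```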

## Timestamp attributes and methods

Timestamp objects have attributes and methods similar to those of the `dt` Series accessor. Let's create a Timestamp and retrieve some of these attributes.

In [None]:
ts = pd.to_datetime('2020/10/05 153955.123456789')
ts

In [None]:
ts.year

In [None]:
ts.month

In [None]:
ts.second

In [None]:
ts.microsecond

In [None]:
ts.month_name()

In [None]:
ts.day_of_week

In [None]:
ts.day_name()

In [None]:
ts.day_of_year

In [None]:
ts.daysinmonth

In [None]:
ts.is_month_end

The offset aliases are used for the `round`, `ceil`, and `floor` methods. Here, we round to the nearest hour and day.

In [None]:
ts.round('H')

In [None]:
ts.round('D')

The `floor` and `ceil` methods work identically to their Series counterparts.

In [None]:
ts.floor('H')

In [None]:
ts.ceil('H')
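
The difference between the three methods is easiest to see side by side. The sketch below applies each one at minute resolution using the `'min'` alias. Note that recent pandas releases appear to prefer lowercase aliases such as `'h'` and `'min'` and may emit a deprecation warning for `'H'`, so check the behavior of your installed version.

```python
import pandas as pd

ts = pd.Timestamp('2020-10-05 15:39:55.123456789')

floored = ts.floor('min')  # always moves back to the start of the minute
ceiled = ts.ceil('min')    # always moves forward to the next minute
rounded = ts.round('min')  # moves to whichever minute is nearest

print(floored)
print(ceiled)
print(rounded)
```

Since the seconds component is 55, rounding moves forward to 15:40, the same result as `ceil`, while `floor` moves back to 15:39.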

### Datetimes in DataFrames

It's more common to encounter datetimes in a DataFrame. Let's read in the City of Houston employee dataset converting the `hire_date` column to a datetime.

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp.dtypes

### Each individual value in the datetime columns is a Timestamp

If we extract the `hire_date` column as a Series and print out the first few rows, you will see that the data type (at the bottom of the output) is still written as `datetime64[ns]`.

In [None]:
hire_date = emp['hire_date']
hire_date.head()

If we select the first value in the Series, we get a Timestamp.

In [None]:
hire_date.loc[0]

## Creating single timedelta objects in pandas

A timedelta is a specific amount of time such as 20 seconds, or 13 days 5 minutes and 10 seconds. Use the `to_timedelta` function or the `pd.Timedelta` constructor to create a Timedelta object. They work analogously to the `to_datetime` function and `pd.Timestamp` constructors. Thankfully, there is no name confusion as there is with datetime/timestamp as the function, constructor, and type all use the timedelta name. 

A wide variety of strings are able to be converted to Timedeltas, some of which will be showcased below. We begin by creating a timedelta of 5 hours and 45 minutes.

In [None]:
pd.to_timedelta('5:45:00')

Use the string `'days'` to set the days, the largest possible component for timedeltas.

In [None]:
pd.to_timedelta('5 days 03:12:45.123')

The `pd.Timedelta` constructor works with the exact same inputs.

In [None]:
pd.Timedelta('5 days 03:12:45.123')

### Converting numbers to Timedeltas

As with `to_datetime`, numbers passed to `to_timedelta` (or `pd.Timedelta`) will be by default treated as the number of nanoseconds. Use the `unit` parameter to change the time unit. We start by converting 123,000 nanoseconds to a timedelta.

In [None]:
pd.to_timedelta(123_000)

Here, we create a timedelta of exactly 500 days.

In [None]:
pd.to_timedelta(500, unit='d')

Over 700 hours converted to a timedelta.

In [None]:
pd.to_timedelta(705.87, unit='h')

Since a year is not a standard amount of time, you'll get an error if you use its unit abbreviation, 'y'. A month is also not a standard unit, so you won't be able to use it either.

In [None]:
pd.to_timedelta(23, unit='y')

### No name confusion with Timedelta

Pandas Timedelta is built upon numpy's timedelta64 data type which is superior to the standard library's datetime module's timedelta. Fortunately, the pandas developers used the name timedelta for the data type which is the same as numpy's. There is no name confusion here, unlike there is with datetime/timestamp.

## Timedelta attributes and methods

There are many attributes and methods available to Timedelta objects. Let's see some below:

In [None]:
td = pd.to_timedelta(705.87, unit='h')
td

In [None]:
td.days

In [None]:
td.seconds

In [None]:
td.components

Get the total number of seconds.

In [None]:
td.total_seconds()
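
The `seconds` attribute and the `total_seconds` method are easy to confuse. The sketch below builds a Timedelta from whole-number components (using the keyword arguments of the `pd.Timedelta` constructor) so that the arithmetic can be checked by hand: `seconds` holds only the part of the timedelta left over after removing whole days, while `total_seconds` covers the entire amount of time.

```python
import pandas as pd

td = pd.Timedelta(days=29, hours=9, minutes=52, seconds=12)

# Seconds remaining after whole days are removed: 9*3600 + 52*60 + 12
print(td.seconds)          # 35532

# The entire timedelta in seconds: 29*86400 + 35532
print(td.total_seconds())  # 2541132.0

# Individual components are available by name
print(td.components.hours, td.components.minutes)
```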

## Creating timedeltas by subtracting datetimes

It is possible to create timedeltas by subtracting two datetimes.

In [None]:
dt1 = pd.to_datetime('2012-12-21 5:30')
dt2 = pd.to_datetime('2016-1-1 12:45:12')
dt2 - dt1

### Negative Timedeltas

A negative timedelta is possible just like any negative number is.

In [None]:
dt1 - dt2
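
The printed form of a negative timedelta can be surprising, because pandas displays it as a negative number of days plus a positive remainder. This small sketch uses a timedelta of negative one hour to show how the attributes reflect that representation.

```python
import pandas as pd

td = pd.Timedelta(hours=-1)

# Stored as -1 days plus 23 hours
print(td.days)             # -1
print(td.seconds)          # 82800 (23 hours)
print(td.total_seconds())  # -3600.0

# abs returns the positive amount of time
print(abs(td))
```

When you need the signed amount as a single number, `total_seconds` is usually the more convenient choice.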

### Math with Timedeltas

You can do many different math operations with two timedeltas together. Two timedeltas are subtracted below.

In [None]:
td1 = pd.to_timedelta('05:23:10')
td2 = pd.to_timedelta('00:02:20')
td1 - td2

Multiplication by other integers and floats is possible.

In [None]:
td1 * 6.3

Dividing two timedeltas will remove the units and return a number.

In [None]:
td1 / td2
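
Division by a timedelta is a handy way to convert to a unit of your choice. In this sketch, dividing by a one-hour timedelta expresses `td1` as a number of hours. Floor division and the modulo operator are also shown, giving the number of whole hours and the leftover time.

```python
import pandas as pd

td1 = pd.to_timedelta('05:23:10')
one_hour = pd.Timedelta(hours=1)

hours = td1 / one_hour          # a float number of hours
whole_hours = td1 // one_hour   # whole hours only
remainder = td1 % one_hour      # leftover time as a Timedelta

print(hours, whole_hours, remainder)
```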

### Creating Timedeltas in a DataFrame by subtracting two Datetime columns

The bikes dataset has two datetime columns, `starttime` and `stoptime`.

In [None]:
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(2)

Let's find the amount of time that elapsed between the start and stop times.

In [None]:
time_elapsed = bikes['stoptime'] - bikes['starttime']
time_elapsed.head()

Since both start and stop time are datetime columns, subtracting them resulted in a timedelta column. The maximum unit of time for timedelta is days.
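
A timedelta column can be converted to plain numbers with the `dt` accessor. The sketch below uses a small, made-up DataFrame (not the bikes dataset) with the same two column names and converts the elapsed time to minutes.

```python
import pandas as pd

# Hypothetical data for illustration only
trips = pd.DataFrame({
    'starttime': pd.to_datetime(['2020-01-05 08:00:00', '2020-01-05 09:15:00']),
    'stoptime': pd.to_datetime(['2020-01-05 08:12:30', '2020-01-06 09:45:00']),
})

elapsed = trips['stoptime'] - trips['starttime']

# total_seconds returns floats; divide by 60 to get minutes
minutes = elapsed.dt.total_seconds() / 60
print(elapsed.dtype)
print(minutes.tolist())
```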

## Creating Period Objects in Pandas

A pandas Period is a span of time that has a start and end time. The span of time can be any length, from a single nanosecond to many years. The start and end time are datetimes. The `Period` constructor accepts many of the same strings that were used to create datetimes. Let's create a period for the entire month of December, 2010.

In [None]:
p = pd.Period('2010-12')
p

Every Period has a `start_time` and `end_time` that are datetimes, and are accessible as attributes.

In [None]:
p.start_time

In [None]:
p.end_time

Below we create a time period for the entire hour of 3 p.m. on December 25, 2010. In the output, the letter to the right of the date is the "frequency", which uses the same strings as the offset aliases.

In [None]:
p = pd.Period('2010-12-25 15')
p

We verify the start and end datetimes.

In [None]:
p.start_time, p.end_time

It's possible to create an entire quarter of the year as a period. Here, we create the third quarter of 2010 (July 1, 2010 to September 30, 2010).

In [None]:
p = pd.Period('2010Q3')
p
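
In the examples above, pandas inferred the frequency from the string. The `Period` constructor also has a `freq` parameter for stating it explicitly. The sketch below sets the frequency for a month and a day, checks the boundaries of the third quarter, and shows that adding an integer to a Period moves it forward by that many spans.

```python
import pandas as pd

month = pd.Period('2010-12', freq='M')
day = pd.Period('2010-12-25', freq='D')
quarter = pd.Period('2010Q3')

# Adding 1 moves to the next period of the same frequency
next_month = month + 1

print(day.freqstr)
print(quarter.start_time, quarter.end_time)
print(next_month)
```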

## Creating multiple datetimes and timedeltas

The `pd.to_datetime` and `pd.to_timedelta` functions allow you to convert multiple values into datetimes or timedeltas. However, the constructors `pd.Timestamp` and `pd.Timedelta` do not and only create scalar values. Below, we convert two strings to timestamps. Notice that a `DatetimeIndex` object is returned. We will see more of this object in the upcoming chapters.

In [None]:
pd.to_datetime(['2021-1-1', '2021-2-1'])
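
The same idea applies to timedeltas. In this sketch, a list of strings passed to `pd.to_timedelta` comes back as a `TimedeltaIndex`, while a Series passed to `pd.to_datetime` comes back as a Series. The input values are made up for illustration.

```python
import pandas as pd

# A list of strings becomes a TimedeltaIndex
deltas = pd.to_timedelta(['1 days 06:00:00', '00:30:00'])

# A Series of strings becomes a Series of datetimes
dates = pd.to_datetime(pd.Series(['2021-01-01', '2021-02-01']))

print(type(deltas).__name__)
print(type(dates).__name__, dates.dtype)
```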

## Exercises

### Exercise 1

<span style="color:green; font-size:16px">What day of the week was Jan 15, 1997?</span>

### Exercise 2

<span style="color:green; font-size:16px">Was 1924 a leap year?</span>

### Exercise 3

<span style="color:green; font-size:16px">What year will it be 1 million hours after the UNIX epoch?</span>

### Exercise 4

<span style="color:green; font-size:16px">Create the datetime July 20, 1969 at 2:56 a.m. and 15 seconds.</span>

### Exercise 5

<span style="color:green; font-size:16px">Neil Armstrong stepped on the moon at the time in the last Exercise. How many days have passed since that happened? Use the string 'today' when creating your datetime.</span>

### Exercise 6

<span style="color:green; font-size:16px">Create the Timedelta 84 hours and 17 minutes with both `pd.Timedelta` and `pd.to_timedelta` and verify that they are equal.</span>

### Exercise 7

<span style="color:green; font-size:16px">Which is larger? 5,206 days or 123,000 hours?</span>

### Exercise 8

<span style="color:green; font-size:16px">Take a look at the `pd.Timestamp` docstring. Each component (year, month, day, etc...) is available as a parameter in the constructor. Use the parameters to create a time stamp that has a non-zero value for each component.</span>

### Exercise 9

<span style="color:green; font-size:16px">Convert the given string to a datetime.</span>

In [None]:
s = 'month=10 year=2021 day=19 hour=6 minute=23'

### Exercise 10

<span style="color:green; font-size:16px">How many seconds elapsed from Feb 23, 2018 at 5:45 pm until Dec 14, 2020 at 7:32 am?</span>

### Exercise 11

<span style="color:green; font-size:16px">What day of the year is October 11 on a leap year?</span>

### Exercise 12

<span style="color:green; font-size:16px">What was the date and time 198 hours and 33 minutes past December 3, 2020 at 5:15 pm?</span>

### Exercise 13

<span style="color:green; font-size:16px">It takes painter A 3 days 14 hours and 38 minutes to paint a house. Painter B takes 9 hours and 56 minutes to paint the same house. How many houses of the same size can painter B paint in the time it takes painter A to paint one?</span>

### Exercise 14

<span style="color:green; font-size:16px">The following string represents June 3rd, 2020. Convert it to the correct datetime.</span>

In [None]:
s = '3/6/2020'

### Exercise 15

<span style="color:green; font-size:16px">Create a Period object for the entire minute of 2:32 pm on October 11, 2020.</span>

### Exercise 16

<span style="color:green; font-size:16px">The City of Houston employee data was retrieved on June 1, 2019. Can you calculate the exact amount of years of experience and assign as a new column named `experience`?</span>

In [None]:
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])