# 4. Datetime Series Methods


# Methods for Series with Datetime data types
This notebook focuses on methods for Series that contain datetime data. Just as pandas provides the **`str`** accessor for string-only methods, it provides the **`dt`** accessor for datetime-only methods.

Let's read in the bikes dataset, which has two datetime columns: **`starttime`** and **`stoptime`**.

In [None]:
import pandas as pd

bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head()

## Pandas datetime columns use nanosecond precision
Historically, pandas forced all datetime columns to have nanosecond precision. It relies on NumPy's datetime64 data type as the foundation. NumPy allows other precisions, such as microsecond or millisecond, but pandas versions before 2.0 convert any other NumPy datetime to nanoseconds. Version 2.0 and later can also store second, millisecond, and microsecond precision.
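
To make sub-second precision concrete, here is a minimal sketch using a made-up timestamp (not from the bikes data) with nine fractional digits. The `precise` Series is an assumption for illustration only; the `microsecond` and `nanosecond` attributes of the `dt` accessor expose the parts of the fractional second.

```python
import pandas as pd

# A made-up timestamp with nine digits after the decimal point
precise = pd.Series(pd.to_datetime(['2016-01-01 12:34:56.123456789']))

# The fractional second splits into microseconds plus the leftover nanoseconds
micro = precise.dt.microsecond.iloc[0]
nano = precise.dt.nanosecond.iloc[0]
print(precise.dtype, micro, nano)
```

The printed dtype may differ between pandas versions, but the last three fractional digits survive, which would not be possible with a coarser precision.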

Let's take a look at the data types of each column with the **`dtypes`** attribute to verify that we do have two datetime columns.

In [None]:
bikes.dtypes

# The `dt` accessor
The primary focus of this notebook will be the methods that follow the **`dt`** accessor. [Visit the API][1] to view all the possible datetime attributes and methods that are available.

## Use `read_html` to scrape the pandas API page and output the `dt` attributes and methods as a DataFrame

The `read_html` function attempts to turn every HTML table found at the given URL into a pandas DataFrame. It returns a list of DataFrames. Its optional second parameter, `match`, is a string or regular expression; only tables containing matching text are returned.

The pandas API page places all of the object attributes and methods within HTML tables, which makes it a great page to use with `read_html`. Below, the function searches each table for the phrase `Series.dt.`. Four DataFrames are returned in a list. The first two contain the attributes and methods for the `dt` accessor.

[1]: http://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties

In [None]:
dfs = pd.read_html('http://pandas.pydata.org/pandas-docs/stable/api.html', 'Series[.]dt[.]')

dt_attr = dfs[0]
dt_attr.columns = ['Attributes', 'Description']

dt_methods = dfs[1]
dt_methods.columns = ['Methods', 'Description']

dt_attr

In [None]:
dt_methods

## You can also do this by embedding an iframe directly in the notebook
An iframe is a web page embedded inside another web page.

In [None]:
from IPython.display import IFrame
IFrame('http://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties', 900, 400)

### Only available for Series
The **`dt`** accessor (like **`str`**) is only available on Series objects, not DataFrames. You must select a single Series first in order to use it. Let's select the **`starttime`** column as a Series.

In [None]:
start = bikes['starttime']
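
The Series-only restriction can be checked on a tiny stand-in DataFrame. This is a sketch under assumptions: `demo_df` and its two timestamps are made up for illustration and are not the bikes data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the bikes DataFrame
demo_df = pd.DataFrame({
    'starttime': pd.to_datetime(['2013-06-28 19:01:00', '2013-06-28 22:53:00'])
})

# Works: a single column is a Series, so the dt accessor is available
hours = demo_df['starttime'].dt.hour.tolist()

# Fails: the DataFrame itself has no dt accessor
try:
    demo_df.dt
    frame_has_dt = True
except AttributeError:
    frame_has_dt = False

print(hours, frame_has_dt)
```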

### Datetime attributes and methods are simpler than strings
Almost all the attributes and methods available for datetimes are simple and straightforward. Let's take a look at some of them. We will output the head of the Series so that we can visually verify the results of the attributes and methods.

In [None]:
start.head()

There are many attributes that return a particular part of the datetime such as **`year, month, day, hour, minute, second`**, etc...

In [None]:
start.dt.year.head()

In [None]:
start.dt.month.head()

In [None]:
start.dt.minute.head()

In [None]:
# Monday is 0, Sunday is 6
start.dt.dayofweek.head()

In [None]:
# ISO week of year (`dt.week` was deprecated in pandas 1.1 and removed in 2.0)
start.dt.isocalendar().week.head()
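
These attributes can also be checked against dates whose answers are easy to verify by hand. The sketch below uses three made-up timestamps; the `demo` Series is an assumption for illustration, not part of the bikes data.

```python
import pandas as pd

demo = pd.Series(pd.to_datetime(['2024-01-01 08:15:30',
                                 '2024-01-06 17:45:00',
                                 '2024-02-29 23:59:59']))

parts = pd.DataFrame({
    'month': demo.dt.month,
    'minute': demo.dt.minute,
    'dayofweek': demo.dt.dayofweek,      # Monday is 0, Sunday is 6
    'day_name': demo.dt.day_name(),      # a method, so it needs parentheses
    'week': demo.dt.isocalendar().week,  # ISO week number (pandas 1.1+)
})
print(parts)
```

January 1, 2024 fell on a Monday, so the first row should show `dayofweek` 0 and ISO week 1.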

## Datetime methods
Only a few methods exist, the most useful being **`ceil`**, **`round`**, **`floor`**, **`strftime`**, and **`to_period`**. To use them, you need to be familiar with the [offset aliases][1], which are short strings, usually one character, that represent a unit of time.

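* **`D`** - day
* **`h`** - hour (older pandas versions also accept **`H`**, deprecated since 2.2)
* **`min`** - minute (older pandas versions also accept **`T`**, deprecated since 2.2)
* **`s`** - second (older pandas versions also accept **`S`**, deprecated since 2.2)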

[1]: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

### Scrape the offset aliases and output them in the notebook

In [None]:
dfs = pd.read_html('http://pandas.pydata.org/pandas-docs/stable/timeseries.html',
                   match='business day frequency',
                   attrs={'class': 'colwidths-given docutils'})
offset_aliases = dfs[0]
offset_aliases

### Use offset aliases with datetime methods

In [None]:
start.head()

## `ceil` rounds up to nearest unit

Round up to nearest hour:

In [None]:
start.dt.ceil('h').head()

Round up to nearest day:

In [None]:
start.dt.ceil('D').head()

**`floor`** rounds down:

In [None]:
start.dt.floor('min').head()

**`round`** rounds normally to nearest whole unit.

In [None]:
start.dt.round('h').head()
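
To compare the three rounding methods side by side, here is a small sketch on two made-up timestamps (the `demo` Series is hypothetical, not the bikes data). It uses the lowercase `'h'` alias, which should be accepted by both older and newer pandas versions.

```python
import pandas as pd

demo = pd.Series(pd.to_datetime(['2024-01-01 08:15:30',
                                 '2024-01-01 08:45:10']))

up = demo.dt.ceil('h')          # both move up to 09:00:00
down = demo.dt.floor('min')     # seconds are dropped
nearest = demo.dt.round('h')    # 08:15:30 goes down, 08:45:10 goes up

print(pd.DataFrame({'original': demo, 'ceil_h': up,
                    'floor_min': down, 'round_h': nearest}))
```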

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">What percentage of bike rides happen in January?</span>

### Problem 2
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the weekend?</span>

### Problem 3
<span  style="color:green; font-size:16px">What percentage of bike rides happen on the last day of the month?</span>

### Problem 4
<span  style="color:green; font-size:16px">We would expect that the value of the minutes recorded for each starting ride is approximately random. Can you show some data that confirms or rejects this?</span>

### Problem 5
<span  style="color:green; font-size:16px">Assign the length of the ride to `ride_length`. Then find the percentage of rides that lasted longer than 30 minutes.</span>

# Explore the `dt` accessor

# Extra
Some extra notes on the Period and Timedelta objects

## Format time as a string with `strftime`
The name **`strftime`** stands for **str**ing **f**ormat **time**. The method turns each datetime into a string. Consult [Python's documentation][1] for the format codes that control how the string is formatted.

[1]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [None]:
start.dt.strftime('%A, %B %d, %Y at %X').head()
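
Codes such as `%A`, `%B`, and `%X` produce locale-dependent text, so their output can vary between machines. As a hedged alternative, the sketch below uses only numeric codes on two made-up timestamps (the `demo` Series is an assumption for illustration).

```python
import pandas as pd

demo = pd.Series(pd.to_datetime(['2024-01-06 17:45:00',
                                 '2024-02-29 23:59:59']))

# %Y = 4-digit year, %m = month, %d = day, %H = 24-hour clock hour, %M = minute
formatted = demo.dt.strftime('%Y-%m-%d %H:%M')
print(formatted.tolist())
```

The result is a Series of plain strings, so the `dt` accessor no longer applies to it.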

## Convert to a Period object
Period objects are special data types unique to pandas and simply represent an entire period of time such as the entire month of June, 2012 or the entire year 1998, or the entire minute of June 11, 2011 12:34 p.m.

This contrasts with datetimes which represent a particular moment in time with nanosecond precision. Datetimes are always specific all the way down to a nanosecond.

### Use offset aliases to convert to a period
To convert to a period use the same [offset aliases](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) from above.

Let's do some conversions: First to a month.

In [None]:
start.dt.to_period('M').head()

Convert to a time span of an hour:

In [None]:
start.dt.to_period('h').head()
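
A short sketch on made-up timestamps shows how a period stands for a whole span rather than a single moment. The `demo` Series is hypothetical; `start_time` is the period attribute that returns the first instant of each span.

```python
import pandas as pd

demo = pd.Series(pd.to_datetime(['2024-01-06 17:45:00',
                                 '2024-02-29 23:59:59']))

monthly = demo.dt.to_period('M')
labels = monthly.astype(str).tolist()    # one label per month-long period
first_instant = monthly.dt.start_time    # first moment of each period

print(labels)
print(first_instant.tolist())
```

Both the day and the time of the original datetimes are discarded; only the month each one falls in remains.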

# Timedeltas
Timedeltas are a separate data type that represents an amount of time, such as 5 minutes and 34 seconds. The largest unit of a timedelta is days. Timedelta Series can also use the **`dt`** accessor.

### Creating a Timedelta
To create a timedelta, subtract one datetime Series from another. Here, we select the stop time as a Series and subtract the **`start`** Series from it.

In [None]:
stop = bikes['stoptime']
ride_length = stop - start
ride_length.head()
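
The subtraction can be verified on values small enough to check by hand. In this sketch, `demo_start` and `demo_stop` are made-up Series (not the bikes data); the second pair crosses midnight to show that the difference is still computed correctly.

```python
import pandas as pd

demo_start = pd.Series(pd.to_datetime(['2024-01-01 08:00:00',
                                       '2024-01-01 23:30:00']))
demo_stop = pd.Series(pd.to_datetime(['2024-01-01 08:05:30',
                                      '2024-01-02 01:00:00']))

demo_length = demo_stop - demo_start
seconds = demo_length.dt.total_seconds().tolist()   # whole duration in seconds
days = demo_length.dt.days.tolist()                 # whole-day part only

print(demo_length.dtype, seconds, days)
```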

There are far fewer attributes and methods for timedeltas, but they work the same way: