# Datetime Series Methods

In this chapter, we focus on methods that work for Series containing datetime data. Just like pandas has the `str` accessor to give us access to string-only methods, it also has the `dt` accessor to give us access to datetime-only methods. Let's read in the bikes dataset which has two datetime columns, `starttime` and `stoptime`.

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(3)

Unnamed: 0,gender,starttime,stoptime,tripduration,from_station_name,start_capacity,to_station_name,end_capacity,temperature,wind_speed,events
0,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,11.0,Michigan Ave & Oak St,15.0,73.9,12.7,mostlycloudy
1,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,31.0,Wells St & Walton St,19.0,69.1,6.9,partlycloudy
2,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,15.0,Dearborn St & Monroe St,23.0,73.0,16.1,mostlycloudy
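
To make the effect of `parse_dates` concrete, here is a minimal sketch that reads the same tiny in-memory CSV twice. The two-row CSV text and the variable names `raw` and `parsed` are made up for illustration and are not part of the bikes file.

```python
import io

import pandas as pd

# A tiny made-up CSV standing in for the bikes file
csv_text = (
    "starttime,stoptime,tripduration\n"
    "2013-06-28 19:01:00,2013-06-28 19:17:00,993\n"
    "2013-06-28 22:53:00,2013-06-28 23:03:00,623\n"
)

# Without parse_dates, the datetime columns are read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the named columns are converted to datetimes
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["starttime", "stoptime"])

print(raw.dtypes)
print(parsed.dtypes)
```

Only the parsed version gives access to the `dt` accessor discussed in this chapter.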


## The `dt` accessor

This chapter focuses on the attributes and methods that are available with the `dt` accessor. [Visit the API][1] to view all of them.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#datetime-properties

### Only available for Series

The `dt` and `str` accessors are only available to Series and not DataFrames. You will have to select a single Series first in order to use them. Let's begin by selecting the `starttime` column as a Series.

In [2]:
start = bikes['starttime']
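
The following sketch checks where the `dt` accessor is available. The two-row DataFrame `df` is made up for illustration; `hasattr` is used only as a quick way to test for the accessor without raising an error.

```python
import pandas as pd

# Made-up two-row DataFrame for illustration
df = pd.DataFrame({
    "starttime": pd.to_datetime(["2013-06-28 19:01:00", "2013-06-30 14:43:00"]),
    "gender": ["Male", "Female"],
})

# The DataFrame itself has no dt accessor
print(hasattr(df, "dt"))

# A datetime Series has it
print(hasattr(df["starttime"], "dt"))

# A non-datetime Series raises AttributeError on .dt, so hasattr reports False
print(hasattr(df["gender"], "dt"))
```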

### Datetime attributes and methods are simpler than strings

Almost all the attributes and methods available to datetime Series are simple and straightforward. Let's begin by outputting the head of the Series so that we can visually verify the results of the attributes and methods.

In [3]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

## Datetime Attributes

Unlike the `str` accessor, which exposes mostly methods, many members of the `dt` accessor are attributes. Attributes are not called with parentheses; simply accessing one returns a new Series.

### Retrieving a part of the datetime

Many attributes return one particular part of the datetime, such as `year`, `month`, `day`, `hour`, `minute`, and `second`, as integers. Let's retrieve these components of the datetime as their own Series.

In [4]:
start.dt.year.head()

0    2013
1    2013
2    2013
3    2013
4    2013
Name: starttime, dtype: int64

In [None]:
start.dt.month.head(3)

In [None]:
start.dt.day.head(3)

In [None]:
start.dt.hour.head(3)

In [None]:
start.dt.minute.head(3)

In [None]:
start.dt.second.head(3)
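
As a quick check of what these attributes return, the sketch below rebuilds the three timestamps shown in the head of `start` and collects several components side by side. The names `sample` and `parts` are introduced here for illustration only.

```python
import pandas as pd

# The three timestamps shown in the head of `start`
sample = pd.Series(pd.to_datetime([
    "2013-06-28 19:01:00",
    "2013-06-28 22:53:00",
    "2013-06-30 14:43:00",
]))

# Collect several dt attributes side by side, one column per component
parts = pd.DataFrame({
    "month": sample.dt.month,
    "day": sample.dt.day,
    "hour": sample.dt.hour,
    "minute": sample.dt.minute,
    "second": sample.dt.second,
})
print(parts)
```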

We can also return the day of week as integers, where 0 corresponds to Monday and 6 to Sunday.

In [22]:
start.dt.dayofweek.head(3)

0    4
1    4
2    6
Name: starttime, dtype: int64
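
If the integer coding is hard to remember, the `day_name` method returns the weekday as text. The sketch below, using three made-up consecutive dates, places both side by side; note that `dayofweek` is an attribute while `day_name` is a method and must be called.

```python
import pandas as pd

# Three consecutive dates: a Friday, a Saturday, and a Sunday
sample = pd.Series(pd.to_datetime(["2013-06-28", "2013-06-29", "2013-06-30"]))

numbers = sample.dt.dayofweek  # attribute: no parentheses
names = sample.dt.day_name()   # method: called with parentheses

print(pd.DataFrame({"dayofweek": numbers, "day_name": names}))
```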

### Start or End?

There are several attributes that return boolean Series based on whether the datetime is the start or end of the month, quarter, or year.

In [None]:
start.dt.is_month_start.head()

In [None]:
start.dt.is_quarter_end.head()

In [None]:
start.dt.is_year_start.head()
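
Because most dates are neither a start nor an end, these attributes are easier to understand on hand-picked values. The sketch below uses four dates chosen for illustration (they are not taken from the bikes data) so that each flag is `True` at least once.

```python
import pandas as pd

# Hand-picked dates: first day of a year, an ordinary day,
# last day of a quarter, and last day of a year
sample = pd.Series(pd.to_datetime([
    "2013-01-01",
    "2013-06-28",
    "2013-06-30",
    "2013-12-31",
]))

flags = pd.DataFrame({
    "date": sample,
    "is_month_start": sample.dt.is_month_start,
    "is_month_end": sample.dt.is_month_end,
    "is_quarter_end": sample.dt.is_quarter_end,
    "is_year_start": sample.dt.is_year_start,
})
print(flags)
```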

## Datetime methods

There are only a few methods that are available to the `dt` accessor, with the most useful being `ceil`, `round`, `floor`, `strftime`, and `to_period`. To use these methods, you need to be familiar with the [offset aliases][2], which are short strings, usually one to three characters in length, that represent a unit of time. Below are a few of the offset aliases. As of pandas 2.2, the uppercase aliases `H`, `T`, and `S` are deprecated in favor of `h`, `min`, and `s`.

* `h` - hour (`H` is the deprecated spelling)
* `min` - minute (`T` is the deprecated spelling)
* `s` - second (`S` is the deprecated spelling)
* `D` - day
* `W` - week

[2]: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
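
To see what an alias stands for, one option is the `to_offset` function from `pandas.tseries.frequencies`, which converts an alias string into an offset object. This is only a sketch for exploration; the `dt` methods in this chapter accept the alias strings directly, so calling `to_offset` yourself is not required.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# to_offset turns an alias string into the offset object it stands for
print(to_offset("h"))      # one hour
print(to_offset("min"))    # one minute
print(to_offset("15min"))  # a number in front multiplies the unit

# The alias and the explicit offset class describe the same amount of time
print(to_offset("15min") == pd.offsets.Minute(15))
```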

### Use offset aliases with datetime methods

Let's output our datetime Series again, and then call some of these methods that require offset aliases.

In [6]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `ceil` rounds up to the nearest unit

Use the `ceil` method to round up to the nearest hour by using the offset alias `h`. The uppercase `H` also works in older pandas versions but is deprecated as of pandas 2.2.

In [7]:
start.dt.ceil('h').head(3)

0   2013-06-28 20:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]

Round up to the nearest day.

In [8]:
start.dt.ceil('D').head(3)

0   2013-06-29
1   2013-06-29
2   2013-07-01
Name: starttime, dtype: datetime64[ns]

### `floor` rounds down to the nearest unit

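Use the `floor` method to round down to the nearest minute. Because the timestamps shown here have no seconds component, the result is identical to the original values.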

In [9]:
start.dt.floor('min').head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `round` rounds to nearest whole unit

The `round` method rounds to the nearest whole unit, moving up or down to whichever is closer. Here, we round to the nearest hour.

In [10]:
start.dt.round('h').head(3)

0   2013-06-28 19:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]
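
The three rounding methods are easiest to compare side by side. The sketch below rebuilds the three timestamps from the head of `start` (the names `sample` and `compare` are introduced for illustration) and applies `floor`, `round`, and `ceil` with the hour alias.

```python
import pandas as pd

# The three timestamps shown in the head of `start`
sample = pd.Series(pd.to_datetime([
    "2013-06-28 19:01:00",
    "2013-06-28 22:53:00",
    "2013-06-30 14:43:00",
]))

# One column per rounding method, all using the hour alias
compare = pd.DataFrame({
    "original": sample,
    "floor": sample.dt.floor("h"),
    "round": sample.dt.round("h"),
    "ceil": sample.dt.ceil("h"),
})
print(compare)
```

For every row, `floor` is less than or equal to `round`, which in turn is less than or equal to `ceil`.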

## Format time as a string with `strftime`

The `strftime` method stands for **string format time**. It converts each datetime value into a string object. You use **string directives**, each a percent sign followed by a letter, to convert a part of a datetime to a string. For instance, the string directive `%A` produces the full weekday name. Consult [Python's documentation][3] to view all of the string directives.

Below is an example using multiple string directives to form a complex string from a datetime. You can write any other text between the directives.

By default, pandas truncates displayed values to a maximum column width of 50 characters. The `set_option` function is used to increase this width so that the entire new string value is viewable in the output.

[3]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

In [11]:
pd.set_option('display.max_colwidth', 100)
start.dt.strftime('On %A, %B %d, %Y at %X something great happened').head(3)

0    On Friday, June 28, 2013 at 19:01:00 something great happened
1    On Friday, June 28, 2013 at 22:53:00 something great happened
2    On Sunday, June 30, 2013 at 14:43:00 something great happened
Name: starttime, dtype: object
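
A few shorter format strings may help show how individual directives behave. The sketch below uses two of the timestamps from `start`; the variable names are introduced for illustration, and only numeric directives are used so the result does not depend on locale settings.

```python
import pandas as pd

sample = pd.Series(pd.to_datetime(["2013-06-28 19:01:00", "2013-06-30 14:43:00"]))

iso_date = sample.dt.strftime("%Y-%m-%d")   # four-digit year, month, day
clock = sample.dt.strftime("%H:%M")         # 24-hour clock, hours and minutes
day_of_year = sample.dt.strftime("day %j")  # literal text plus day of the year

print(pd.DataFrame({"iso_date": iso_date, "clock": clock, "day_of_year": day_of_year}))
```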

## Convert to period

A period is a special data type unique to pandas (it doesn't exist in numpy) that represents an entire span of time, such as the entire month of June 2012, the entire year 1998, or the entire minute of June 11, 2011 12:34 p.m. This contrasts with datetimes, which represent a single moment in time. By default, pandas stores datetimes with nanosecond precision.
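
The difference between a moment and a span can be sketched with scalar objects. Here `pd.Timestamp` holds a single moment and `pd.Period` holds the whole month that contains it; the names `moment` and `june` are introduced for illustration.

```python
import pandas as pd

# A timestamp is a single moment
moment = pd.Timestamp("2013-06-28 19:01:00")

# A period with monthly frequency covers all of June 2013
june = pd.Period("2013-06", freq="M")

print(june.start_time)  # first moment of the month
print(june.end_time)    # last moment of the month

# The moment falls inside the period
print(june.start_time <= moment <= june.end_time)
```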

### Use offset aliases to convert to a period

To convert a datetime column to a period column, pass an offset alias that names the length of the period. Let's convert the start datetime column to a period column representing an entire month by using the alias `M`.

In [12]:
per = start.dt.to_period('M').head()
per

0    2013-06
1    2013-06
2    2013-06
3    2013-07
4    2013-07
Name: starttime, dtype: period[M]

Let's verify that the data type of this Series is indeed a period.

In [13]:
per.dtype

period[M]

Let's see another example with a different offset alias converting the datetime to a time period of an hour.

In [14]:
start.dt.to_period('h').head(3)

0    2013-06-28 19:00
1    2013-06-28 22:00
2    2013-06-30 14:00
Name: starttime, dtype: period[H]

### Period Series also have a `dt` accessor

A Series with data type of period has its own special attributes and methods accessible with the `dt` accessor. They overlap substantially with the datetime `dt` attributes and methods. Currently the [official documentation only shows the period properties][4]. You can discover all of the attributes and methods and how to use them by placing a dot after `dt` and pressing tab. Below, we get the start and end of the period. Note that pandas returns these values as datetimes and not periods.

[4]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#period-properties

In [15]:
per.dt.start_time

0   2013-06-01
1   2013-06-01
2   2013-06-01
3   2013-07-01
4   2013-07-01
Name: starttime, dtype: datetime64[ns]

In [None]:
per.dt.end_time
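
Here is a small sketch of both attributes on a period Series. The two timestamps are made up for illustration, and the local name `sample_per` is used so that it does not overwrite the `per` Series from above.

```python
import pandas as pd

# Made-up timestamps in two different months
sample = pd.Series(pd.to_datetime(["2013-06-28 19:01:00", "2013-07-01 09:15:00"]))
sample_per = sample.dt.to_period("M")

first = sample_per.dt.start_time  # first moment of each month
last = sample_per.dt.end_time     # last moment of each month
print(pd.DataFrame({"period": sample_per, "start_time": first, "end_time": last}))

# Both results are datetimes again, not periods
print(first.dtype, last.dtype)
```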

## Timedeltas

Timedeltas are a separate data type that represents an amount of time, such as 5 minutes and 34 seconds. The largest unit of a timedelta is days, and by default they have nanosecond precision. Timedeltas are also available in numpy. Timedelta Series have special attributes and methods accessible with the `dt` accessor, as you can [find in the documentation][5].

### Creating a timedelta

One way to create a timedelta Series is to subtract one datetime Series from another. Here, we select `stoptime` as a Series and subtract the `start` Series from it.

[5]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#timedelta-properties

In [16]:
stop = bikes['stoptime']
ride_length = stop - start
ride_length.head(3)

0   0 days 00:16:00
1   0 days 00:10:00
2   0 days 00:18:00
dtype: timedelta64[ns]

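Again, a good way to discover and learn about the attributes and methods is by pressing tab after placing a dot after `dt`. Let's begin by retrieving the number of seconds in each timedelta with the `seconds` attribute. Note that `seconds` returns only the seconds component left over after whole days are removed; it equals the full duration for the rides shown here because each is shorter than one day. The `total_seconds` method returns the full duration regardless of length.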

In [17]:
ride_length.dt.seconds.head(3)

0     960
1     600
2    1080
dtype: int64
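
The distinction between the `seconds` attribute and the `total_seconds` method only shows up for durations of a day or more. The sketch below uses two made-up durations, one on each side of that boundary, to illustrate it.

```python
import pandas as pd

# Two made-up durations: one under a day, one over a day
durations = pd.Series([
    pd.Timedelta(minutes=16),
    pd.Timedelta(days=1, minutes=10),
])

days = durations.dt.days              # whole days in each duration
seconds = durations.dt.seconds        # seconds left over after whole days
total = durations.dt.total_seconds()  # full duration expressed in seconds

print(pd.DataFrame({"days": days, "seconds": seconds, "total_seconds": total}))
```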

There are a few timedelta methods that take offset aliases. Numbers may be placed in front of the offset aliases to designate a more specific amount of time. Below, we round to the nearest 10 minutes.

In [18]:
ride_length.dt.round('10min').head(3)

0   0 days 00:20:00
1   0 days 00:10:00
2   0 days 00:20:00
dtype: timedelta64[ns]

## Exercises

Use the `start` Series for the following exercises.

In [19]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of bike rides happen in January?</span>

In [20]:
mon = start.dt.month
mon.head()

0    6
1    6
2    6
3    7
4    7
Name: starttime, dtype: int64

In [21]:
filt = mon == 1

filt.sum() / len(mon)

0.027191598953862126

In [47]:
start.dt.month_name().value_counts(normalize=True).round(3)

August       0.137
July         0.132
September    0.130
June         0.121
October      0.111
May          0.086
November     0.071
April        0.064
December     0.045
March        0.044
February     0.030
January      0.027
Name: starttime, dtype: float64

### Exercise 2

<span style="color:green; font-size:16px">What percentage of bike rides happen on the weekend?</span>

In [31]:
dow = start.dt.dayofweek
filt = dow.isin([5,6])

len(dow[filt])/len(dow)

0.19692946555131866

In [48]:
start.dt.weekday.isin([5,6]).mean()

0.19692946555131866

In [49]:
(start.dt.weekday > 4).mean()

0.19692946555131866

### Exercise 3

<span  style="color:green; font-size:16px">What percentage of bike rides happen on the last day of the month?</span>

In [32]:
end = start.dt.is_month_end
sum(end)/ len(start)

0.031563816406795904

In [50]:
start.dt.is_month_end.mean()

0.031563816406795904

### Exercise 4

<span  style="color:green; font-size:16px">We would expect that the value of the minutes recorded for each starting ride is approximately random. Can you show some data that confirms or rejects this?</span>

In [51]:
start.dt.minute.value_counts(normalize=True)

12    0.017968
6     0.017928
8     0.017868
18    0.017808
43    0.017629
21    0.017549
10    0.017529
48    0.017509
44    0.017449
15    0.017409
53    0.017349
17    0.017329
37    0.017309
13    0.017289
19    0.017269
33    0.017229
42    0.017229
39    0.017189
24    0.017189
22    0.017110
29    0.017070
34    0.017070
45    0.016950
5     0.016890
11    0.016870
14    0.016870
36    0.016870
49    0.016850
47    0.016830
30    0.016810
16    0.016710
32    0.016670
38    0.016630
1     0.016630
40    0.016531
7     0.016491
46    0.016471
2     0.016471
4     0.016391
23    0.016331
54    0.016311
57    0.016291
3     0.016251
28    0.016091
59    0.016071
35    0.016071
56    0.016031
0     0.015932
58    0.015912
31    0.015872
50    0.015872
55    0.015852
27    0.015812
9     0.015812
41    0.015712
20    0.015612
25    0.015512
52    0.015253
51    0.015213
26    0.014973
Name: starttime, dtype: float64
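
If the minutes were uniformly distributed, each of the 60 values would account for 1/60, or about 0.0167, of all rides. The shares above range from roughly 0.0150 to 0.0180, which is close to that figure. One way to quantify the gap is sketched below on synthetic data: the Series `minutes` is randomly generated as a stand-in for `start.dt.minute`, and its size of 50,000 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for start.dt.minute: 50,000 minutes drawn uniformly
rng = np.random.default_rng(0)
minutes = pd.Series(rng.integers(0, 60, size=50_000))

# Observed share of each minute value and the share expected under uniformity
shares = minutes.value_counts(normalize=True).sort_index()
expected = 1 / 60

# Largest gap between an observed share and the uniform share
max_gap = (shares - expected).abs().max()
print(f"expected share per minute: {expected:.4f}")
print(f"largest deviation: {max_gap:.4f}")
```

Replacing `minutes` with `start.dt.minute` would apply the same comparison to the real data.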

### Exercise 5

<span style="color:green; font-size:16px">Assign the length of the ride to `ride_length`. Then find the percentage of rides that lasted longer than 30 minutes.</span>

In [36]:
ride_length = bikes['stoptime'] - bikes['starttime']
ride_length.head(2)

0   0 days 00:16:00
1   0 days 00:10:00
dtype: timedelta64[ns]

In [40]:
length_sec = ride_length.dt.seconds
length_sec.head(3)

0     960
1     600
2    1080
dtype: int64

In [44]:
filt = length_sec > (60*30)
len(length_sec[filt])/ len(ride_length)

0.019625067380063487
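
An alternative that avoids converting to seconds is to compare the timedeltas directly against a `pd.Timedelta`. The sketch below uses four made-up ride lengths (the name `rides` is hypothetical) and also shows why the `seconds` attribute can undercount: it drops whole days, so a ride longer than a day may be missed.

```python
import pandas as pd

# Made-up ride lengths, including one longer than a day
rides = pd.Series([
    pd.Timedelta(minutes=16),
    pd.Timedelta(minutes=45),
    pd.Timedelta(days=1, minutes=5),
    pd.Timedelta(minutes=10),
])

# Compare timedeltas directly; the mean of the booleans is the share of True
share_direct = (rides > pd.Timedelta(minutes=30)).mean()

# The seconds attribute drops whole days, so the day-long ride is missed
share_seconds = (rides.dt.seconds > 60 * 30).mean()

print(share_direct, share_seconds)
```

Whether the two approaches differ on the bikes data depends on whether it contains any rides of a day or longer.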