# Datetime Series Methods

In this chapter, we focus on methods that work for Series containing datetime data. Just as pandas has the `str` accessor to give us access to string-only methods, it also has the `dt` accessor to give us access to datetime-only methods. Let's read in the bikes dataset, which has two datetime columns, `starttime` and `stoptime`.

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head(3)

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993 | Lake Shore Dr & Monroe St | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |
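The `parse_dates` argument converts the listed columns while the file is read. If a datetime column has already been loaded as plain strings, one option is to convert it afterward with `pd.to_datetime`. In the minimal sketch below, a small hand-built DataFrame, whose values are copied from the first rows above, stands in for the CSV file.

```python
import pandas as pd

# Stand-in for the bikes data, with the datetime columns still stored as strings
raw = pd.DataFrame({
    "starttime": ["2013-06-28 19:01:00", "2013-06-28 22:53:00", "2013-06-30 14:43:00"],
    "stoptime": ["2013-06-28 19:17:00", "2013-06-28 23:03:00", "2013-06-30 15:01:00"],
})
print(raw.dtypes)  # both columns hold strings at this point

# Convert each column to datetimes after the fact
for col in ["starttime", "stoptime"]:
    raw[col] = pd.to_datetime(raw[col])
print(raw.dtypes)  # both columns now have a datetime64 dtype
```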


## The `dt` accessor

This chapter focuses on the attributes and methods that are available with the `dt` accessor. [Visit the API reference][1] to see all of them.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#datetime-properties

### Only available for Series

The `dt` and `str` accessors are available only on Series, not on DataFrames, so you must select a single column as a Series before using them. Let's begin by selecting the `starttime` column as a Series.
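A small sketch of what happens otherwise: asking a DataFrame for `dt` raises an `AttributeError`, while a selected column works. The two-row DataFrame is made up from values in the sample output above. If you need the same operation on several columns, `apply` passes each column in as a Series, which is one way (not the only one) to do it.

```python
import pandas as pd

df = pd.DataFrame({
    "starttime": pd.to_datetime(["2013-06-28 19:01:00", "2013-06-30 14:43:00"]),
    "stoptime": pd.to_datetime(["2013-06-28 19:17:00", "2013-06-30 15:01:00"]),
})

# A DataFrame has no `dt` accessor, so this raises AttributeError
try:
    df.dt
except AttributeError as err:
    print("AttributeError:", err)

# Select a single column (a Series) first, then use `dt`
start_hours = df["starttime"].dt.hour
print(start_hours.tolist())  # [19, 14]

# To repeat the operation for every column, `apply` passes each column in as a Series
all_hours = df.apply(lambda col: col.dt.hour)
print(all_hours)
```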

In [2]:
start = bikes['starttime']

### Datetime attributes and methods are simpler than string methods

Almost all the attributes and methods available to datetime Series are simple and straightforward. Let's begin by outputting the head of the Series so that we can visually verify the results of the attributes and methods.

In [3]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

## Datetime Attributes

Unlike with the `str` accessor, many of the objects available through the `dt` accessor are attributes rather than methods. Attributes are not called with parentheses; you simply access them to return a new Series.

### Retrieving a part of the datetime

Many attributes return a particular part of the datetime, such as `year`, `month`, `day`, `hour`, `minute`, and `second`, as integers. Let's retrieve each of these components as its own Series.

In [4]:
start.dt.year.head(3)

0    2013
1    2013
2    2013
Name: starttime, dtype: int32

In [5]:
start.dt.month.head(3)

0    6
1    6
2    6
Name: starttime, dtype: int32

In [6]:
start.dt.day.head(3)

0    28
1    28
2    30
Name: starttime, dtype: int32

In [7]:
start.dt.hour.head(3)

0    19
1    22
2    14
Name: starttime, dtype: int32

In [8]:
start.dt.minute.head(3)

0     1
1    53
2    43
Name: starttime, dtype: int32

In [9]:
start.dt.second.head(3)

0    0
1    0
2    0
Name: starttime, dtype: int32

We can also return the day of week as integers, where 0 corresponds to Monday and 6 to Sunday.

In [10]:
start.dt.dayofweek.head(3)

0    4
1    4
2    6
Name: starttime, dtype: int32
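Because these are attributes, they are written without parentheses. A few related members are methods instead; `day_name`, for example, returns the weekday as a string and must be called. As a sketch on the first three start times copied from above, all of the components can be collected into one DataFrame:

```python
import pandas as pd

start = pd.Series(
    pd.to_datetime(["2013-06-28 19:01:00", "2013-06-28 22:53:00", "2013-06-30 14:43:00"]),
    name="starttime",
)

# Attributes are accessed without parentheses; day_name is a method, so it is called
parts = pd.DataFrame({
    "year": start.dt.year,
    "month": start.dt.month,
    "day": start.dt.day,
    "hour": start.dt.hour,
    "minute": start.dt.minute,
    "dayofweek": start.dt.dayofweek,  # Monday=0 ... Sunday=6
    "day_name": start.dt.day_name(),
})
print(parts)
```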

### Start or End?

There are several attributes that return boolean Series based on whether the datetime is the start or end of the month, quarter, or year.

In [14]:
start.head()

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
3   2013-07-01 10:05:00
4   2013-07-01 11:16:00
Name: starttime, dtype: datetime64[ns]

In [11]:
start.dt.is_month_start.head()

0    False
1    False
2    False
3     True
4     True
Name: starttime, dtype: bool

In [12]:
start.dt.is_quarter_end.head()

0    False
1    False
2     True
3    False
4    False
Name: starttime, dtype: bool

In [13]:
start.dt.is_year_start.head()

0    False
1    False
2    False
3    False
4    False
Name: starttime, dtype: bool
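Only the calendar date matters for these checks; the time of day is ignored, which is why 2013-06-30 at 14:43 counts as a quarter end above. The sketch below puts several of the flags side by side for three hand-picked timestamps (the last one is made up, since it is not among the rows shown).

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2013-06-30 14:43:00",  # last day of June and of Q2
    "2013-07-01 10:05:00",  # first day of July and of Q3
    "2013-12-31 23:59:00",  # last day of the year
]))

flags = pd.DataFrame({
    "is_month_start": dates.dt.is_month_start,
    "is_month_end": dates.dt.is_month_end,
    "is_quarter_start": dates.dt.is_quarter_start,
    "is_quarter_end": dates.dt.is_quarter_end,
    "is_year_end": dates.dt.is_year_end,
})
print(flags)
```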

## Datetime methods

Only a few methods are available through the `dt` accessor; the most useful are `ceil`, `round`, `floor`, `strftime`, and `to_period`. All of these except `strftime` take an [offset alias][1], a short string, usually one to three characters long, that represents a unit of time. Below are a few of the offset aliases.

* `h` - hour
* `min` - minute
* `s` - second
* `D` - day
* `W` - week

Code written for older versions of pandas often uses the uppercase aliases `H`, `T`, and `S`; pandas 2.2 deprecated these in favor of `h`, `min`, and `s`.

[1]: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
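A number in front of an alias makes a multiple of that unit, such as `15min`. As a quick sketch, the code below floors a single made-up `Timestamp` (not a value from the dataset) with several aliases; scalar timestamps have the same `floor`, `ceil`, and `round` methods that the `dt` accessor provides for Series.

```python
import pandas as pd

ts = pd.Timestamp("2013-06-28 19:01:37")  # a made-up time, not a row from the dataset

# Each string is an offset alias; a leading number such as 15 makes a multiple of the unit
aliases = ["s", "min", "15min", "h", "D"]
floored = {alias: ts.floor(alias) for alias in aliases}
for alias, value in floored.items():
    print(f"{alias:>5} -> {value}")
```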

### Use offset aliases with datetime methods

Let's output our datetime Series again, and then call some of these methods that require offset aliases.

In [15]:
start.head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `ceil` rounds up to the nearest unit

Use the `ceil` method to round up to the nearest hour with the offset alias `h`.

In [18]:
start.dt.ceil('h').head(3)

0   2013-06-28 20:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]

Round up to the nearest day.

In [19]:
start.dt.ceil('D').head(3)

0   2013-06-29
1   2013-06-29
2   2013-07-01
Name: starttime, dtype: datetime64[ns]
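One detail worth checking: `ceil` leaves a value that already sits exactly on a boundary unchanged, while anything even one second past the boundary moves up. The timestamps in this sketch are made up to show that edge case.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    "2013-06-28 19:00:00",  # already on the hour
    "2013-06-28 19:00:01",  # one second past the hour
    "2013-06-30 00:00:00",  # exactly midnight
]))

by_hour = times.dt.ceil("h")
by_day = times.dt.ceil("D")
print(pd.DataFrame({"time": times, "ceil_h": by_hour, "ceil_D": by_day}))
```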

### `floor` rounds down to the nearest unit

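Use the `floor` method to round down to the nearest minute. The start times shown here already fall on whole minutes (their seconds are 0), so flooring to the minute leaves them unchanged.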

In [22]:
start.head()

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
3   2013-07-01 10:05:00
4   2013-07-01 11:16:00
Name: starttime, dtype: datetime64[ns]

In [20]:
start.dt.floor('min').head(3)

0   2013-06-28 19:01:00
1   2013-06-28 22:53:00
2   2013-06-30 14:43:00
Name: starttime, dtype: datetime64[ns]

### `round` rounds to nearest whole unit

The `round` method moves each value to the nearest unit, up or down, whichever is closer. Here, we round to the nearest hour.

In [24]:
start.dt.round('h').head(3)

0   2013-06-28 19:00:00
1   2013-06-28 23:00:00
2   2013-06-30 15:00:00
Name: starttime, dtype: datetime64[ns]
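To compare the three methods directly, this sketch applies `floor`, `round`, and `ceil` with the same `15min` alias to the first three start times from above. None of these values sits exactly halfway between two 15-minute boundaries, so `round` simply picks the closer one.

```python
import pandas as pd

start = pd.Series(pd.to_datetime([
    "2013-06-28 19:01:00", "2013-06-28 22:53:00", "2013-06-30 14:43:00",
]))

# The same 15-minute alias handled three ways
compare = pd.DataFrame({
    "original": start,
    "floor": start.dt.floor("15min"),
    "round": start.dt.round("15min"),
    "ceil": start.dt.ceil("15min"),
})
print(compare)
```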

## Format time as a string with `strftime`

The name `strftime` stands for **string format time**. The method converts each datetime value into a string. You build that string with **string directives**, codes that each stand for one part of a datetime. For instance, the string directive `%A` becomes the full weekday name, such as `Friday`. Consult [Python's documentation][1] to view all of the string directives.

Below is an example that combines several string directives to form a longer string from each datetime. Any text outside the directives is copied into the result unchanged.

By default, pandas truncates any displayed value longer than 50 characters, as set by the `display.max_colwidth` option. The `set_option` function raises this limit to 100 so that each new string is shown in full. The new setting stays in effect for the rest of the session.

[1]: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
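Before building a full sentence, it can help to see a few individual directives. In this sketch, `%Y`, `%m`, `%d`, `%H`, and `%M` produce numbers, while `%I`/`%p` give a 12-hour clock and `%a` an abbreviated weekday. The last two depend on the current locale, so the English names assume a default English or C locale. The result is a Series of strings, so datetime attributes no longer apply to it.

```python
import pandas as pd

start = pd.Series(pd.to_datetime(["2013-06-28 19:01:00", "2013-06-30 14:43:00"]))

formatted = pd.DataFrame({
    "date": start.dt.strftime("%Y-%m-%d"),       # 4-digit year, zero-padded month and day
    "clock_24h": start.dt.strftime("%H:%M"),     # hour 00-23 and minute
    "clock_12h": start.dt.strftime("%I:%M %p"),  # hour 01-12 with AM/PM (locale-dependent)
    "weekday": start.dt.strftime("%a"),          # abbreviated weekday name (locale-dependent)
})
print(formatted)
```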

In [25]:
start

0       2013-06-28 19:01:00
1       2013-06-28 22:53:00
2       2013-06-30 14:43:00
3       2013-07-01 10:05:00
4       2013-07-01 11:16:00
                ...        
50084   2017-12-30 13:07:00
50085   2017-12-30 13:34:00
50086   2017-12-30 13:34:00
50087   2017-12-31 09:30:00
50088   2017-12-31 15:22:00
Name: starttime, Length: 50089, dtype: datetime64[ns]

In [26]:
pd.set_option('display.max_colwidth', 100)
start.dt.strftime('On %A, %B %d, %Y at %X something great happened').head(3)

0    On Friday, June 28, 2013 at 19:01:00 something great happened
1    On Friday, June 28, 2013 at 22:53:00 something great happened
2    On Sunday, June 30, 2013 at 14:43:00 something great happened
Name: starttime, dtype: object
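Because `set_option` changes the display for the rest of the session, an alternative is to widen the column only while showing one result with the `pd.option_context` context manager; after a `set_option` call, `pd.reset_option('display.max_colwidth')` restores the default. A minimal sketch on one start time copied from above:

```python
import pandas as pd

start = pd.Series(pd.to_datetime(["2013-06-28 19:01:00"]))
sentences = start.dt.strftime("On %A, %B %d, %Y at %X something great happened")

default_width = pd.get_option("display.max_colwidth")
print(default_width)

# The wider setting only applies inside the with block
with pd.option_context("display.max_colwidth", 100):
    print(sentences)
    inside_width = pd.get_option("display.max_colwidth")

# Outside the block the original setting is back
print(pd.get_option("display.max_colwidth") == default_width)
```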

## Convert to period

A period is a special data type unique to pandas (it does not exist in NumPy) that represents an entire span of time, such as the month of June 2012, the year 1998, or the minute of 12:34 p.m. on June 11, 2011. This contrasts with a datetime, which represents a single moment in time. In this dataset, as the `datetime64[ns]` dtype shows, those moments are stored with nanosecond precision.

### Use offset aliases to convert to a period

To convert a datetime column to a period column, pass an offset alias to the `to_period` method. Let's convert the `starttime` column to a period column where each value represents an entire month, using the alias `M`.

In [27]:
start

0       2013-06-28 19:01:00
1       2013-06-28 22:53:00
2       2013-06-30 14:43:00
3       2013-07-01 10:05:00
4       2013-07-01 11:16:00
                ...        
50084   2017-12-30 13:07:00
50085   2017-12-30 13:34:00
50086   2017-12-30 13:34:00
50087   2017-12-31 09:30:00
50088   2017-12-31 15:22:00
Name: starttime, Length: 50089, dtype: datetime64[ns]

In [28]:
per = start.dt.to_period('M').head()
per

0    2013-06
1    2013-06
2    2013-06
3    2013-07
4    2013-07
Name: starttime, dtype: period[M]

Let's verify that the data type of this Series is indeed a period.

In [29]:
per.dtype


period[M]

Here is another example that uses the `h` alias to convert each datetime into a one-hour period.

In [31]:
start.dt.to_period('h').head(3)

0    2013-06-28 19:00
1    2013-06-28 22:00
2    2013-06-30 14:00
Name: starttime, dtype: period[h]
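Periods make grouping by calendar unit straightforward, because every ride in the same month maps to the same period value. This sketch counts rides per month on five start times (the last one made up) and also shows that adding an integer to a period moves it by whole units of its frequency.

```python
import pandas as pd

start = pd.Series(pd.to_datetime([
    "2013-06-28 19:01:00", "2013-06-30 14:43:00",
    "2013-07-01 10:05:00", "2013-07-01 11:16:00", "2013-07-01 11:20:00",
]))

months = start.dt.to_period("M")
# Rides in the same calendar month share one Period, so counting them is one call
rides_per_month = months.value_counts().sort_index()
print(rides_per_month)

# Adding 1 to a monthly Period gives the following month
next_month = months.iloc[0] + 1
print(next_month)
```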

### Period Series also have a `dt` accessor

A Series with the period data type has its own attributes and methods, also available through the `dt` accessor. They overlap substantially with the datetime attributes and methods. Currently, the [official documentation only shows the period properties][1]; to discover the rest, type a dot after `dt` and press Tab. Below, we get the start and end of each period. Note that pandas returns these values as datetimes, not periods.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#period-properties

In [34]:
per.dt.start_time

0   2013-06-01
1   2013-06-01
2   2013-06-01
3   2013-07-01
4   2013-07-01
Name: starttime, dtype: datetime64[ns]

In [35]:
per.dt.end_time

0   2013-06-30 23:59:59.999999999
1   2013-06-30 23:59:59.999999999
2   2013-06-30 23:59:59.999999999
3   2013-07-31 23:59:59.999999999
4   2013-07-31 23:59:59.999999999
Name: starttime, dtype: datetime64[ns]
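The `end_time` values sit one nanosecond before the next period begins, so each period covers everything from `start_time` through `end_time`. This sketch checks that on two monthly periods and also uses `days_in_month`, one of the period attributes that overlap with the datetime ones.

```python
import pandas as pd

per = pd.Series(pd.to_datetime(["2013-06-28 19:01:00", "2013-07-01 10:05:00"])).dt.to_period("M")

start_time = per.dt.start_time
end_time = per.dt.end_time
# The next period begins exactly one nanosecond after end_time
next_start = end_time + pd.Timedelta(nanoseconds=1)
print(pd.DataFrame({"start_time": start_time, "end_time": end_time, "next_start": next_start}))

# Number of days in each monthly period
print(per.dt.days_in_month.tolist())
```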

## Timedeltas

A timedelta is a separate data type that represents an amount of time, such as 5 minutes and 34 seconds. When displayed, its largest unit is days (there are no months or years), and the timedeltas in this chapter are stored with nanosecond precision (`timedelta64[ns]`). NumPy also has a timedelta type. Timedelta Series have their own attributes and methods available through the `dt` accessor, which you can [find in the documentation][1].

### Creating a timedelta

One way to create a timedelta Series is to subtract two datetime Series from each other. Here, we select `stoptime` as a Series and subtract the `start` Series from it.

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#timedelta-properties

In [37]:
start

0       2013-06-28 19:01:00
1       2013-06-28 22:53:00
2       2013-06-30 14:43:00
3       2013-07-01 10:05:00
4       2013-07-01 11:16:00
                ...        
50084   2017-12-30 13:07:00
50085   2017-12-30 13:34:00
50086   2017-12-30 13:34:00
50087   2017-12-31 09:30:00
50088   2017-12-31 15:22:00
Name: starttime, Length: 50089, dtype: datetime64[ns]

In [38]:
stop = bikes['stoptime']

stop

0       2013-06-28 19:17:00
1       2013-06-28 23:03:00
2       2013-06-30 15:01:00
3       2013-07-01 10:16:00
4       2013-07-01 11:18:00
                ...        
50084   2017-12-30 13:34:00
50085   2017-12-30 13:44:00
50086   2017-12-30 13:48:00
50087   2017-12-31 09:33:00
50088   2017-12-31 15:26:00
Name: stoptime, Length: 50089, dtype: datetime64[ns]

In [39]:
ride_length = stop - start
ride_length.head(3)

0   0 days 00:16:00
1   0 days 00:10:00
2   0 days 00:18:00
dtype: timedelta64[ns]
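Subtraction is not the only way to get timedeltas. As a sketch, `pd.Timedelta` accepts keyword components, and `pd.to_timedelta` turns plain numbers plus a unit into a timedelta Series. The three numbers below are the `tripduration` values from the first rows of the bikes data, treated here as seconds.

```python
import pandas as pd

# From keyword components
td = pd.Timedelta(minutes=5, seconds=34)
print(td)  # 0 days 00:05:34

# From numbers plus a unit, producing a timedelta Series
durations = pd.Series(pd.to_timedelta([993, 623, 1040], unit="s"))
print(durations)
```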

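Again, a good way to discover the available attributes and methods is to type a dot after `dt` and press Tab. Let's begin with the `seconds` attribute. It returns only the seconds portion of each timedelta, excluding whole days, so it ranges from 0 to 86,399. The rides shown here last well under a day, so for them it equals the full ride length in seconds; the `total_seconds` method also counts the days.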

In [40]:
ride_length.dt.seconds.head(3)

0     960
1     600
2    1080
dtype: int32
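The difference between `seconds` and `total_seconds` only appears once a timedelta reaches a full day. This sketch uses one made-up duration longer than a day, and also shows `components`, which splits each timedelta into days, hours, minutes, and smaller units.

```python
import pandas as pd

lengths = pd.Series(pd.to_timedelta(["0 days 00:16:00", "1 days 00:00:30"]))

# `seconds` is only the seconds component (0-86399); whole days are excluded
print(lengths.dt.seconds.tolist())          # [960, 30]
# `total_seconds()` counts the entire duration, including days
print(lengths.dt.total_seconds().tolist())  # [960.0, 86430.0]
# `components` breaks each timedelta into days, hours, minutes, ...
print(lengths.dt.components)
```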

A few timedelta methods, such as `round`, `floor`, and `ceil`, accept offset aliases. A number placed in front of an alias specifies a multiple of that unit. Below, we round to the nearest 10 minutes.

In [41]:
ride_length.dt.round('10min').head(3)

0   0 days 00:20:00
1   0 days 00:10:00
2   0 days 00:20:00
dtype: timedelta64[ns]
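Two more conversions that may be handy, shown on ride lengths matching the first three above plus one made-up 34-minute ride: dividing by a one-minute `Timedelta` yields float minutes, and `floor` with a multiple-of-unit alias buckets each duration.

```python
import pandas as pd

ride_length = pd.Series(pd.to_timedelta(["00:16:00", "00:10:00", "00:18:00", "00:34:00"]))

# Dividing by a one-minute Timedelta converts each duration to float minutes
minutes = ride_length / pd.Timedelta(minutes=1)
print(minutes.tolist())  # [16.0, 10.0, 18.0, 34.0]

# floor also accepts offset aliases with multiples
buckets = ride_length.dt.floor("10min")
print(buckets.tolist())
```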

## Exercises

Use the `start` Series for the following exercises (Exercise 5 also uses `stop`).

### Exercise 1
<span  style="color:green; font-size:16px">What percentage of bike rides happen in January?</span>

In [55]:
(start.dt.month == 1).mean()

np.float64(0.027191598953862126)

### Exercise 2

<span style="color:green; font-size:16px">What percentage of bike rides happen on the weekend?</span>

Because `weekday` numbers the days from Monday (0) through Sunday (6), Saturday and Sunday are 5 and 6.

In [59]:
start.dt.weekday.isin([5, 6]).mean()
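To double-check which numbers belong to the weekend, compare `weekday` with `day_name` on a few known dates; June 29 and 30, 2013 were a Saturday and a Sunday. An equivalent test is `weekday >= 5`.

```python
import pandas as pd

days = pd.Series(pd.to_datetime(["2013-06-28", "2013-06-29", "2013-06-30", "2013-07-01"]))
check = pd.DataFrame({"weekday": days.dt.weekday, "name": days.dt.day_name()})
print(check)

# Saturday is 5 and Sunday is 6, so the weekend test is isin([5, 6])
is_weekend = days.dt.weekday.isin([5, 6])
print(is_weekend.mean())  # 0.5
```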

### Exercise 3

<span  style="color:green; font-size:16px">What percentage of bike rides happen on the last day of the month?</span>

In [57]:
start.dt.is_month_end.mean()

np.float64(0.031563816406795904)

### Exercise 4

<span  style="color:green; font-size:16px">We would expect that the value of the minutes recorded for each starting ride is approximately random. Can you show some data that confirms or rejects this?</span>

In [62]:
start.dt.minute.value_counts()

starttime
12    900
6     898
8     895
18    892
43    883
21    879
10    878
48    877
44    874
15    872
53    869
17    868
37    867
13    866
19    865
42    863
33    863
24    861
39    861
22    857
29    855
34    855
45    849
5     846
11    845
14    845
36    845
49    844
47    843
30    842
16    837
32    835
38    833
1     833
40    828
7     826
46    825
2     825
4     821
23    818
54    817
57    816
3     814
28    806
59    805
35    805
56    803
0     798
58    797
31    795
50    795
55    794
9     792
27    792
41    787
20    782
25    777
52    764
51    762
26    750
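Name: count, dtype: int64

The counts are fairly even: every minute appears between 750 and 900 times, and with 50,089 rides, a perfectly uniform spread would give about 835 rides per minute. This supports the idea that the starting minute is approximately random.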
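To put a single number on how even the minute counts are, one option is a chi-square goodness-of-fit statistic computed by hand. In this sketch, `minute_uniformity` is a hypothetical helper name, and the random start times are stand-in data, not the bikes dataset; `scipy.stats.chisquare` would additionally return a p-value.

```python
import numpy as np
import pandas as pd

def minute_uniformity(start: pd.Series) -> pd.Series:
    """Summarize how evenly the minute values of a datetime Series are spread (hypothetical helper)."""
    counts = start.dt.minute.value_counts().reindex(range(60), fill_value=0)
    expected = len(start) / 60  # count per minute if every minute were equally likely
    chi_square = ((counts - expected) ** 2 / expected).sum()
    return pd.Series({
        "expected_per_minute": expected,
        "min_count": counts.min(),
        "max_count": counts.max(),
        # compare against a chi-square distribution with 59 degrees of freedom (60 values minus 1)
        "chi_square": chi_square,
    })

# Stand-in data: random start times within one day, NOT the bikes dataset
rng = np.random.default_rng(0)
fake_start = pd.Series(
    pd.Timestamp("2013-07-01") + pd.to_timedelta(rng.integers(0, 86_400, size=6_000), unit="s")
)
print(minute_uniformity(fake_start))
```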

### Exercise 5

<span style="color:green; font-size:16px">Assign the length of the ride to `ride_length`. Then find the percentage of rides that lasted longer than 30 minutes.</span>

In [73]:
ride_length = stop-start

((ride_length.dt.total_seconds() / 60) > 30).mean()

np.float64(0.019625067380063487)