In this video we will take a closer look at how to work with date and time series data in pandas.

We will see how to:

*   Convert strings to datetime types for advanced datetime series operations
*   Select and filter datetime series data
*   Explore the datetime properties of series data

In [2]:
import pandas as pd
dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'], 
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'], 
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})

This dataset contains 4 columns and 6 rows, one for each of 6 fictional people. One of the columns, DOB, holds each person's date of birth.

It is essential to check that the data in the DOB column has the right datatype.

In [3]:
dataset.dtypes

DOB      object
Sex      object
State    object
Name     object
dtype: object

As the output shows, the DOB column was stored with the object (string) datatype when the DataFrame was created. To convert it to the datetime datatype, we pass the DOB column to the pd.to_datetime() function and assign the result back to the column.

In [4]:
dataset.DOB = pd.to_datetime(dataset.DOB)

In [5]:
dataset.dtypes

DOB      datetime64[ns]
Sex              object
State            object
Name             object
dtype: object
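The conversion above lets pandas infer the date layout. When the layout of the strings is known, it can be stated explicitly with format=, and errors='coerce' makes values that do not parse become NaT (missing) instead of raising an error. A minimal sketch, assuming pandas 2.x; the 'not a date' entry is a hypothetical bad value, not part of the original data:

```python
import pandas as pd

# Hypothetical raw strings; the last entry is deliberately malformed.
raw = pd.Series(['1976-06-01', '1980-09-23', 'not a date'])

# format= states the expected layout instead of letting pandas infer it;
# errors='coerce' turns anything that does not match into NaT.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(parsed.dtype)
print(parsed.isna().tolist())  # [False, False, True]
```

Coercing keeps the column usable as datetime data, and the NaT rows can then be inspected or dropped explicitly.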

Before moving on to selecting and filtering the datetime series, we set the DOB column as the index of the DataFrame so that rows can be selected by date.

In [6]:
dataset.set_index('DOB', inplace=True)

After this, the DOB column is ready to be explored. To take a look at the dataset, we can simply evaluate dataset on its own in a cell.

In [7]:
dataset

           Sex State   Name
DOB
1976-06-01   F    CA   Jane
1980-09-23   M    NY   John
1984-03-30   F    OH  Cathy
1991-12-31   M    OR     Jo
1994-10-02   M    TX    Sam
1973-11-11   F    CA    Tai
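Note that set_index() keeps the rows in their original order, so 1973-11-11 ends up last and the DatetimeIndex is not sorted. Slicing by a range of dates is only well defined on a sorted index; in pandas 2.x, slicing an unsorted DatetimeIndex with a bound that is not itself in the index raises a KeyError. A minimal sketch of checking the order and sorting by date, assuming pandas 2.x:

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])
dataset = dataset.set_index('DOB')

# The original row order leaves 1973-11-11 at the end, so the index is unsorted.
was_sorted = dataset.index.is_monotonic_increasing
print(was_sorted)  # False

# sort_index() returns a new DataFrame ordered by the DatetimeIndex.
dataset = dataset.sort_index()
print(dataset.index.is_monotonic_increasing)  # True
print(dataset['Name'].tolist())  # ['Tai', 'Jane', 'John', 'Cathy', 'Jo', 'Sam']
```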


Before we start filtering, note that there are four common ways to filter the rows by the dates in the DOB index:

*   Records for a single year: display the records for one year.
*   Records for and after a particular year: display all records from that year onward.
*   Records up until a particular year: display all records up to and including that year.
*   Records in a range of years: display the records for a given range of years.

In [8]:
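# Records for a single year:
# .loc selects rows by label; in pandas 2.x, dataset['1980'] would look for a column named '1980'
dataset.loc['1980']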

           Sex State  Name
DOB
1980-09-23   M    NY  John
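Partial string indexing is not limited to whole years: a year-month string selects every row that falls in that month. A minimal sketch on the same six-row dataset; the index is sorted first so that the same pattern also carries over to slices:

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])
dataset = dataset.set_index('DOB').sort_index()

# '1980-09' covers every date from 1980-09-01 through 1980-09-30.
september_1980 = dataset.loc['1980-09']
print(september_1980['Name'].tolist())  # ['John']

# Same idea for December 1991.
print(dataset.loc['1991-12']['Name'].tolist())  # ['Jo']
```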


In [14]:
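# Records for and after a particular year:
# sort first: in pandas 2.x, slicing an unsorted DatetimeIndex with bounds not in the index raises a KeyError
dataset.sort_index().loc['1980':]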

           Sex State   Name
DOB
1980-09-23   M    NY   John
1984-03-30   F    OH  Cathy
1991-12-31   M    OR     Jo
1994-10-02   M    TX    Sam
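The list of filtering options also includes records up until a particular year, which is the mirror image of the previous cell: leave the start of the slice open instead of the end. A minimal sketch, assuming the same data with the index sorted by date:

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])
# Sort so that the open-ended slice is well defined on the DatetimeIndex.
dataset = dataset.set_index('DOB').sort_index()

# The end bound '1984' is inclusive and covers the whole year,
# so 1984-03-30 is kept.
up_to_1984 = dataset.loc[:'1984']
print(up_to_1984['Name'].tolist())  # ['Tai', 'Jane', 'John', 'Cathy']
```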


In [20]:
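# Records that exist in a range of years (both ends inclusive):
# sort first: in pandas 2.x, slicing an unsorted DatetimeIndex with bounds not in the index raises a KeyError
dataset.sort_index().loc['1980':'1984']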

           Sex State   Name
DOB
1980-09-23   M    NY   John
1984-03-30   F    OH  Cathy
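Label slicing depends on the row order of the index. An alternative that works on an unsorted index is a boolean mask built from the year component of the DatetimeIndex. A minimal sketch; note that this swaps label slicing for plain comparisons, so it is a different technique from the cell above:

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])
dataset = dataset.set_index('DOB')  # left unsorted on purpose

# Compare the year of each date instead of slicing; a boolean mask
# does not care about row order, and rows keep their original order.
years = dataset.index.year
in_range = dataset[(years >= 1980) & (years <= 1984)]
print(in_range['Name'].tolist())  # ['John', 'Cathy']
```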


We can also use datetime properties to pull useful information out of the DOB values. On a column, these properties are reached through the .dt accessor, which belongs to Series objects, so here we first move DOB out of the index and back into a regular column by resetting the index.

In [21]:
dataset.reset_index(inplace=True)
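Resetting the index is one way to reach the datetime properties, but a DatetimeIndex also exposes them directly, without .dt. A minimal sketch comparing the two routes on the same data (the name indexed is hypothetical, used only to keep both versions side by side):

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])
indexed = dataset.set_index('DOB')

# On a DatetimeIndex the property is available directly.
print(indexed.index.dayofyear.tolist())  # [153, 267, 90, 365, 275, 315]

# On a regular datetime column it sits behind the .dt accessor.
print(dataset['DOB'].dt.dayofyear.tolist())  # [153, 267, 90, 365, 275, 315]
```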

Each datetime property returns a new Series with one value per row. For example, the dayofyear property gives the day of the year for each date, from 1 for January 1 up to 366 for December 31 of a leap year.

In [22]:
dataset.DOB.dt.dayofyear

0    153
1    267
2     90
3    365
4    275
5    315
Name: DOB, dtype: int64
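The dayofyear values depend on leap years: 1976, 1980 and 1984 are leap years, so a date after February in those years gets a number one higher than the same calendar date in a non-leap year (June 1 is day 153 in 1976 but day 152 in 1991). The .dt accessor has further properties, such as year, month and is_leap_year, that make this visible. A minimal sketch on the DOB values alone:

```python
import pandas as pd

dob = pd.to_datetime(pd.Series(['1976-06-01', '1980-09-23', '1984-03-30',
                                 '1991-12-31', '1994-10-02', '1973-11-11'], name='DOB'))

print(dob.dt.year.tolist())          # [1976, 1980, 1984, 1991, 1994, 1973]
print(dob.dt.month.tolist())         # [6, 9, 3, 12, 10, 11]
print(dob.dt.is_leap_year.tolist())  # [True, True, True, False, False, False]
```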

Similarly, the weekday property gives the day of the week as a number, where Monday is 0 and Sunday is 6.

In [24]:
dataset.DOB.dt.weekday

0    1
1    1
2    4
3    1
4    6
5    6
Name: DOB, dtype: int64
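The numeric weekday codes can be turned into names with the dt.day_name() method, and the codes themselves are handy for filtering, for instance keeping only people born on a weekend (codes 5 and 6). A minimal sketch; the Weekday column is a hypothetical addition, not part of the original dataset:

```python
import pandas as pd

dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-02', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset['DOB'] = pd.to_datetime(dataset['DOB'])

# Hypothetical helper column: day names in English (the default locale).
dataset['Weekday'] = dataset['DOB'].dt.day_name()
print(dataset[['Name', 'Weekday']])

# Saturday is 5 and Sunday is 6, so >= 5 keeps weekend births.
weekend = dataset[dataset['DOB'].dt.weekday >= 5]
print(weekend['Name'].tolist())  # ['Sam', 'Tai']
```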