One of the things you will have to do as a data scientist is work with time series data. Pandas has a few functions that help with this.

In [1]:
import pandas as pd

In [2]:
time_df = pd.DataFrame(data = [['1/21/2001', 10], 
                               ['1/22/2001', 12], 
                               ['1/23/2001', 8], 
                               ['1/24/2001', 17],
                               ['1/25/2001', 9],
                               ['1/26/2001', 11]], 
                       columns = ['Date', 'Measurement'])

In [3]:
time_df

|   | Date | Measurement |
|---|------|-------------|
| 0 | 1/21/2001 | 10 |
| 1 | 1/22/2001 | 12 |
| 2 | 1/23/2001 | 8 |
| 3 | 1/24/2001 | 17 |
| 4 | 1/25/2001 | 9 |
| 5 | 1/26/2001 | 11 |


In [4]:
time_df.dtypes

Date           object
Measurement     int64
dtype: object

In [5]:
time_df['Date'] = pd.to_datetime(time_df['Date'])

In [6]:
time_df.dtypes

Date           datetime64[ns]
Measurement             int64
dtype: object

In [7]:
time_df

|   | Date | Measurement |
|---|------|-------------|
| 0 | 2001-01-21 | 10 |
| 1 | 2001-01-22 | 12 |
| 2 | 2001-01-23 | 8 |
| 3 | 2001-01-24 | 17 |
| 4 | 2001-01-25 | 9 |
| 5 | 2001-01-26 | 11 |
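
Real data is often messier than the example above. As a minimal sketch, assuming a hypothetical column with one unparseable entry (the `raw_dates` values are made up for illustration), the `errors='coerce'` option of `pd.to_datetime` turns bad entries into `NaT` instead of raising an error:

```python
import pandas as pd

# Hypothetical raw data: the last entry is not a date at all.
raw_dates = pd.Series(['1/21/2001', '1/22/2001', 'not recorded'])

# errors='coerce' replaces entries that cannot be parsed with NaT
# instead of raising an exception.
parsed_dates = pd.to_datetime(raw_dates, errors='coerce')

# Count how many entries failed to parse.
n_failed = parsed_dates.isna().sum()

print(parsed_dates)
print(n_failed)
```

Counting the `NaT` values afterwards is a quick way to see how much of a column failed to convert.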


#### Nonstandard timeblocks

Let's look at a non-standard timestamp format and how to convert it.

In [8]:
time_df2 = pd.DataFrame(data = [['2001-01-21:1200', 10], 
                                ['2001-01-22:1400', 12], 
                                ['2001-01-23:1800', 8], 
                                ['2001-01-24:0900', 17],
                                ['2001-01-25:1100', 9],
                                ['2001-01-26:1500', 11]], 
                        columns = ['Date', 'Measurement'])

In [9]:
time_df2['Date'] = pd.to_datetime(time_df2['Date'])

ParserError: Unknown string format: 2001-01-21:1200

Pandas can't figure out how to parse the timestamp. I can add a `format` parameter that tells pandas how the string is laid out. The format codes are listed on the Python datetime library page:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

For example, a timestamp like `2001-01-01:1500` matches the format string `%Y-%m-%d:%H%M`.

In [10]:
time_df2['Date'] = pd.to_datetime(time_df2['Date'], format='%Y-%m-%d:%H%M')

In [11]:
time_df2

|   | Date | Measurement |
|---|------|-------------|
| 0 | 2001-01-21 12:00:00 | 10 |
| 1 | 2001-01-22 14:00:00 | 12 |
| 2 | 2001-01-23 18:00:00 | 8 |
| 3 | 2001-01-24 09:00:00 | 17 |
| 4 | 2001-01-25 11:00:00 | 9 |
| 5 | 2001-01-26 15:00:00 | 11 |
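
When working out a format string, it can help to test it on a single value before converting a whole column. The sketch below is one way to do that, using the standard library's `datetime.strptime` alongside `pd.to_datetime`. The variable names (`fmt`, `sample`) are just for illustration:

```python
from datetime import datetime

import pandas as pd

# Format codes: %Y = 4-digit year, %m = month, %d = day,
# %H = hour (24-hour clock), %M = minute.
fmt = '%Y-%m-%d:%H%M'
sample = '2001-01-01:1500'

# Both the standard library and pandas accept the same format codes.
py_value = datetime.strptime(sample, fmt)
pd_value = pd.to_datetime(sample, format=fmt)

print(py_value)
print(pd_value)
```

If the format string does not match the sample, both calls raise an error, so a mistake shows up right away.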


#### Getting different time/days from the timestamps

I want to separate out the day, the hour and the day of the week into different columns.

In [12]:
time_df2['Day'] = pd.DatetimeIndex(time_df2['Date']).day

In [13]:
time_df2['Hour'] = pd.DatetimeIndex(time_df2['Date']).hour

In [14]:
time_df2['Weekday'] = pd.DatetimeIndex(time_df2['Date']).weekday

In [15]:
time_df2

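|   | Date | Measurement | Day | Hour | Weekday |
|---|------|-------------|-----|------|---------|
| 0 | 2001-01-21 12:00:00 | 10 | 21 | 12 | 6 |
| 1 | 2001-01-22 14:00:00 | 12 | 22 | 14 | 0 |
| 2 | 2001-01-23 18:00:00 | 8 | 23 | 18 | 1 |
| 3 | 2001-01-24 09:00:00 | 17 | 24 | 9 | 2 |
| 4 | 2001-01-25 11:00:00 | 9 | 25 | 11 | 3 |
| 5 | 2001-01-26 15:00:00 | 11 | 26 | 15 | 4 |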
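
The weekday numbers run from 0 (Monday) to 6 (Sunday). As an alternative sketch, a datetime column also exposes the same fields through the `.dt` accessor, which avoids building a `DatetimeIndex` each time. The small `dates` series here is a stand-in for `time_df2['Date']`, and the `Weekday_name` column is an extra added for illustration:

```python
import pandas as pd

# Stand-in for the first two rows of time_df2['Date'].
dates = pd.Series(pd.to_datetime(['2001-01-21 12:00', '2001-01-22 14:00']))

parts = pd.DataFrame({
    'Date': dates,
    'Day': dates.dt.day,                  # day of the month
    'Hour': dates.dt.hour,                # hour on a 24-hour clock
    'Weekday': dates.dt.weekday,          # 0 = Monday ... 6 = Sunday
    'Weekday_name': dates.dt.day_name(),  # e.g. 'Sunday'
})

print(parts)
```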


#### Time differences

Now I want to make a column with the time difference between each measurement and the one before it.

In [16]:
time_df2['Time_delta'] = (time_df2['Date']-time_df2['Date'].shift())

In [17]:
time_df2

|   | Date | Measurement | Day | Hour | Weekday | Time_delta |
|---|------|-------------|-----|------|---------|------------|
| 0 | 2001-01-21 12:00:00 | 10 | 21 | 12 | 6 | NaT |
| 1 | 2001-01-22 14:00:00 | 12 | 22 | 14 | 0 | 1 days 02:00:00 |
| 2 | 2001-01-23 18:00:00 | 8 | 23 | 18 | 1 | 1 days 04:00:00 |
| 3 | 2001-01-24 09:00:00 | 17 | 24 | 9 | 2 | 0 days 15:00:00 |
| 4 | 2001-01-25 11:00:00 | 9 | 25 | 11 | 3 | 1 days 02:00:00 |
| 5 | 2001-01-26 15:00:00 | 11 | 26 | 15 | 4 | 1 days 04:00:00 |


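Note: `NaT` ("not a time") is the missing-value marker for times, like `NaN` for numbers. The first row has no previous measurement, so its difference is `NaT`.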
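
A timedelta column is often easier to work with as a plain number. As a rough sketch, `Series.diff()` gives the same result as subtracting the shifted column, and `.dt.total_seconds()` converts the differences to seconds, which I then divide down to hours. The `dates` series is a stand-in for the first three rows of `time_df2['Date']`:

```python
import pandas as pd

# Stand-in for the first three rows of time_df2['Date'].
dates = pd.Series(pd.to_datetime(['2001-01-21 12:00',
                                  '2001-01-22 14:00',
                                  '2001-01-23 18:00']))

# diff() subtracts the previous row from each row,
# the same as dates - dates.shift().
delta = dates.diff()

# Convert each timedelta to a number of hours; NaT becomes NaN.
hours = delta.dt.total_seconds() / 3600

print(delta)
print(hours)
```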

There is a whole lot more you can do, but this gives you the basics. The pandas time series user guide is the place to look for more info:

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html