Readme:


To explore this functionality further, see 'Python for Data Analysis, 3E' by Wes McKinney, Chapter 11: 'Time Series'.</br>
Link: https://wesmckinney.com/book/time-series

In [1]:
import numpy as np
import pandas as pd
from datetime import datetime

<p>
pandas is generally oriented toward working with arrays of dates, whether used as an axis index or as a column in a DataFrame.</br>
The pandas.to_datetime function parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed quickly.</br>
Run the code below and examine the data type it returns.</br>
</p>


In [3]:
datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
dt = pd.to_datetime(datestrs) 
dt # DatetimeIndex

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)
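
<p>
When the strings are not in ISO 8601 form, passing an explicit format can remove ambiguity. The sketch below is only an illustration: the day/month/year strings in 'raw' are made up for this example and do not come from the exercise data.</br>
</p>

```python
import pandas as pd

# Hypothetical input in day/month/year order (an assumption for this sketch).
raw = ["06/07/2011", "06/08/2011"]

# An explicit format tells pandas which field is the day and which is the month.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed)
```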

<p>
Scalar values from a DatetimeIndex are pandas Timestamp objects.</br>
Now take the second item from the DatetimeIndex and check what data type it is.</br></br>

Note: A pandas.Timestamp can be substituted in most places where you would use a datetime object. The reverse is not true, however, because pandas.Timestamp can store nanosecond precision data, while datetime stores only up to microseconds.</br>
Additionally, pandas.Timestamp understands how to do time zone conversions and other kinds of manipulation; in pandas versions before 2.0 it could also store frequency information (if any).</br>
</p>


In [4]:
type(dt[1]) # pd Timestamp

pandas._libs.tslibs.timestamps.Timestamp
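
<p>
The precision difference described in the note can be checked directly. This is a minimal sketch; the timestamp string with a single nanosecond is an assumed value chosen only to make the effect visible.</br>
</p>

```python
import pandas as pd
from datetime import datetime

# Assumed example value: midnight plus one nanosecond.
ts = pd.Timestamp("2011-08-06 00:00:00.000000001")
print(ts.nanosecond)  # nanosecond component stored by the Timestamp

# Timestamp is a datetime subclass, so it can be used where a datetime is expected.
print(isinstance(ts, datetime))

# Converting to a plain datetime drops the nanosecond part;
# warn=False suppresses the warning pandas would otherwise emit about that loss.
py_dt = ts.to_pydatetime(warn=False)
print(py_dt)
```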

<p>
Now run the code below to see how the None value is parsed. What does 'NaT' mean? </br>
Then call isna() on 'idx' and analyze the output.</br>
</p>


In [5]:
idx = pd.to_datetime(datestrs + [None])
print(idx)
print(idx.isna())

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
[False False  True]
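
<p>
NaT ("Not a Time") is the pandas missing-value marker for datetime data. A short sketch of how it might be detected and dropped, reusing the same strings as above:</br>
</p>

```python
import pandas as pd

idx = pd.to_datetime(["2011-07-06 12:00:00", "2011-08-06 00:00:00", None])

# The None entry was converted to the NaT singleton.
print(idx[2] is pd.NaT)

# dropna() returns a new index containing only the valid timestamps.
clean = idx.dropna()
print(clean)
```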


<p>
Create a Series of length 1000 populated with random numbers, with a daily date index starting at 2000-01-01. </br>
</p>


In [6]:
s = pd.Series(np.random.standard_normal(1000), index=pd.date_range('2000-01-01', periods=1000))
s

2000-01-01    2.164814
2000-01-02    0.373463
2000-01-03    0.203024
2000-01-04    0.612835
2000-01-05   -0.416032
                ...   
2002-09-22   -0.078434
2002-09-23   -0.952530
2002-09-24   -0.392066
2002-09-25   -0.188376
2002-09-26    0.586746
Freq: D, Length: 1000, dtype: float64

<p>
Select the data whose index falls in the year 2002. </br>
</p>


In [7]:
s['2002']  # an explicit slice also works: s['2002-01-01':'2002-12-31']

2002-01-01    1.419023
2002-01-02    0.677372
2002-01-03   -0.702965
2002-01-04    0.162919
2002-01-05    0.603247
                ...   
2002-09-22   -0.078434
2002-09-23   -0.952530
2002-09-24   -0.392066
2002-09-25   -0.188376
2002-09-26    0.586746
Freq: D, Length: 269, dtype: float64
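
<p>
The same kind of partial string selection also works at month resolution. The sketch below rebuilds a comparable Series with a seeded random generator (the seed is an assumption made only for reproducibility) and uses .loc for the selection.</br>
</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, assumed for reproducibility
s = pd.Series(rng.standard_normal(1000),
              index=pd.date_range("2000-01-01", periods=1000))

# A year-month string selects every day in that month.
may_2001 = s.loc["2001-05"]
print(len(may_2001))

# Partial strings also work as slice bounds; both ends are inclusive.
first_half = s.loc["2001-01":"2001-06"]
print(len(first_half))
```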

<p>
Remove all data after 2001-01-01 and display the result. Note that truncate returns a new Series and leaves 's' unchanged.</br>
</p>


In [8]:
s.truncate(after='2001-01-01')

2000-01-01    2.164814
2000-01-02    0.373463
2000-01-03    0.203024
2000-01-04    0.612835
2000-01-05   -0.416032
                ...   
2000-12-28   -1.003193
2000-12-29    0.464313
2000-12-30    1.457524
2000-12-31   -1.831429
2001-01-01    1.004287
Freq: D, Length: 367, dtype: float64
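
<p>
truncate also accepts a 'before' argument, so both ends can be trimmed in one call. A minimal sketch on a comparable Series (the seeded generator is an assumption for reproducibility):</br>
</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, assumed for reproducibility
s = pd.Series(rng.standard_normal(1000),
              index=pd.date_range("2000-01-01", periods=1000))

# Keep only January 2001; both bounds are inclusive.
window = s.truncate(before="2001-01-01", after="2001-01-31")
print(window.index[0], window.index[-1])

# The original Series is not modified.
print(len(s))
```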

<p>
Create a DatetimeIndex of length 100 starting from 2000-01-01 with weekly frequency anchored on Tuesdays. Since 2000-01-01 is a Saturday, the first generated date is 2000-01-04.</br>
</p>


In [9]:
dates = pd.date_range('2000-01-01', periods=100, freq='W-TUE')
dates

DatetimeIndex(['2000-01-04', '2000-01-11', '2000-01-18', '2000-01-25',
               '2000-02-01', '2000-02-08', '2000-02-15', '2000-02-22',
               '2000-02-29', '2000-03-07', '2000-03-14', '2000-03-21',
               '2000-03-28', '2000-04-04', '2000-04-11', '2000-04-18',
               '2000-04-25', '2000-05-02', '2000-05-09', '2000-05-16',
               '2000-05-23', '2000-05-30', '2000-06-06', '2000-06-13',
               '2000-06-20', '2000-06-27', '2000-07-04', '2000-07-11',
               '2000-07-18', '2000-07-25', '2000-08-01', '2000-08-08',
               '2000-08-15', '2000-08-22', '2000-08-29', '2000-09-05',
               '2000-09-12', '2000-09-19', '2000-09-26', '2000-10-03',
               '2000-10-10', '2000-10-17', '2000-10-24', '2000-10-31',
               '2000-11-07', '2000-11-14', '2000-11-21', '2000-11-28',
               '2000-12-05', '2000-12-12', '2000-12-19', '2000-12-26',
               '2001-01-02', '2001-01-09', '2001-01-16', '2001-01-23',
      

<p>
Generating Date Ranges. </br>
By default, pandas.date_range generates daily timestamps. </br>
Create a DatetimeIndex in range from 2000-01-01 to 2000-12-01 with frequency 'business end of month'. </br>
</p>


In [None]:
dt_index = pd.date_range("2000-01-01", "2000-12-01", freq="BM")  # 'BM' is deprecated in pandas 2.2 and later; use 'BME' there
dt_index

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')
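
<p>
Because the string alias for this frequency differs between pandas versions, one way to sidestep the alias is to pass an offset object instead. This is a sketch of that alternative, not the approach used in the exercise.</br>
</p>

```python
import pandas as pd

# BMonthEnd is the offset class behind the business-month-end alias.
bme = pd.offsets.BMonthEnd()

dt_index = pd.date_range("2000-01-01", "2000-12-01", freq=bme)
print(dt_index)
```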

<p>
pandas.date_range by default preserves the time (if any) of the start or end timestamp.</br>
Run the code below and look at the start and end values. </br>
</p>


In [12]:
pd.date_range("2012-05-02 12:56:31", periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

<p>
Sometimes you will have start or end dates with time information but want to generate a set of timestamps normalized to midnight as a convention. </br>
To do this, there is a normalize option. Run the code below and analyze the output. </br>
</p>


In [13]:
pd.date_range("2012-05-02 12:56:31", periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')
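
<p>
If the index already exists, the same effect can be obtained after the fact with the normalize() method. A minimal sketch:</br>
</p>

```python
import pandas as pd

stamped = pd.date_range("2012-05-02 12:56:31", periods=5)

# normalize() sets the time of every timestamp to midnight.
midnight = stamped.normalize()
print(midnight)
```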

<p>
Frequencies and Date Offsets. </br>
1. Create a DatetimeIndex in range from 2000-01-01 to 2000-01-03 23:59 with frequency '6 hours'. </br>
2. Then change the frequency to '2 hours and 30 minutes'. </br>

</p>


In [None]:
print(pd.date_range("2000-01-01", "2000-01-03 23:59", freq="6h"))  # lowercase 'h'; uppercase 'H' is deprecated in pandas 2.2 and later

print(pd.date_range("2000-01-01", "2000-01-03 23:59", freq="2h30min"))
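
<p>
Frequency strings such as '2h30min' correspond to date offset objects, which can also be built and combined explicitly. The sketch below assumes the Hour and Minute classes from pandas.tseries.offsets and should produce the same range as the string form.</br>
</p>

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Fixed-length offsets can be added together: 2 hours + 30 minutes.
step = Hour(2) + Minute(30)

idx = pd.date_range("2000-01-01", "2000-01-03 23:59", freq=step)
print(idx[:3])
print(len(idx))
```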

<p>
Now create a DatetimeIndex in range from 2012-01-01 to 2012-09-01 containing the fourth Wednesday of each month. </br>
</p>


In [14]:
pd.date_range("2012-01-01", "2012-09-01", freq="WOM-4WED")

DatetimeIndex(['2012-01-25', '2012-02-22', '2012-03-28', '2012-04-25',
               '2012-05-23', '2012-06-27', '2012-07-25', '2012-08-22'],
              dtype='datetime64[ns]', freq='WOM-4WED')
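
<p>
A quick sanity check on the "week of month" result: every date should be a Wednesday, and a fourth Wednesday always falls between the 22nd and the 28th. A sketch of that check:</br>
</p>

```python
import pandas as pd

wom = pd.date_range("2012-01-01", "2012-09-01", freq="WOM-4WED")

# All generated dates should share the same weekday name.
print(set(wom.day_name()))

# Day-of-month values for a fourth Wednesday lie in the range 22..28.
print(wom.day.min(), wom.day.max())
```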

<p>
Periods and Period Arithmetic. </br>
1. Create a pandas.Period that represents the full time span from January 1, 2011, to December 31, 2011, inclusive.</br>
2. Then add 5 to it and analyze the result.</br>
</p>


In [18]:
p = pd.Period("2011", freq="A-DEC")  # 'A-DEC' is deprecated in pandas 2.2 and later; use 'Y-DEC' there
print(p) # In this case, the Period object represents the full time span from January 1, 2011, to December 31, 2011, inclusive.
print(p + 5)

2011
2016
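
<p>
The span covered by a Period can be inspected through its start_time and end_time attributes, and two periods of the same frequency can be subtracted. The sketch below uses the plain 'Y' alias, which is an assumption chosen because it should be accepted for periods by both older and newer pandas versions.</br>
</p>

```python
import pandas as pd

p = pd.Period("2011", freq="Y")  # annual period ending in December

# Boundaries of the time span the period represents.
print(p.start_time)
print(p.end_time)

# Subtracting periods of the same frequency yields an offset;
# its .n attribute is the number of periods between them.
later = pd.Period("2014", freq="Y")
print((later - p).n)
```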


<p>
Compare a period range with a date range. </br>
Run the code below and analyze the result.
</p>


In [None]:
periods = pd.period_range("2000-01-01", "2000-06-30", freq="M")  # for a PeriodIndex, 'M' means monthly in all versions
print(periods)  # each element stands for a whole month

dt = pd.date_range("2000-01-01", "2000-06-30", freq="M")  # for a DatetimeIndex, 'M' is deprecated in pandas 2.2 and later; use 'ME' there
print(dt)  # each element is a single point in time: the last day of the month

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30'],
              dtype='datetime64[ns]', freq='M')
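
<p>
The two index types can be converted into each other. This sketch uses to_timestamp and to_period; only the 'M' period alias is used, so it should not depend on the version-specific month-end alias discussed above.</br>
</p>

```python
import pandas as pd

periods = pd.period_range("2000-01", "2000-06", freq="M")

# Period -> timestamp at the start of each month.
month_starts = periods.to_timestamp(how="start")

# Period -> timestamp at the end of each month, with the time set to midnight.
month_ends = periods.to_timestamp(how="end").normalize()

# Timestamp -> period: map each date back to the month that contains it.
back = month_starts.to_period("M")

print(month_starts)
print(month_ends)
print(back)
```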


<p>
Run the code below and analyze the output. </br>
Think about how you could use this functionality in real life. </br>

</p>


In [22]:
values = ["2001Q3", "2002Q2", "2003Q1"]
index = pd.PeriodIndex(values, freq="Q-DEC") 
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]')
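
<p>
One practical use of quarterly periods is looking up the calendar dates a quarter covers, for example when matching reports to daily data. A minimal sketch using asfreq on the same index:</br>
</p>

```python
import pandas as pd

index = pd.PeriodIndex(["2001Q3", "2002Q2", "2003Q1"], freq="Q-DEC")

# Year and quarter number of each period.
print(list(index.year), list(index.quarter))

# Convert each quarter to the first and last day it covers.
starts = index.asfreq("D", how="start")
ends = index.asfreq("D", how="end")
print(starts)
print(ends)
```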
