<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Partial Autocorrelations demo

---

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Read-in-the-data" data-toc-modified-id="Read-in-the-data-1">Read in the data</a></span></li><li><span><a href="#Get-the-differenced-timeseries" data-toc-modified-id="Get-the-differenced-timeseries-2">Get the differenced timeseries</a></span></li><li><span><a href="#Prepared-lagged-time-series-as-predictors" data-toc-modified-id="Prepared-lagged-time-series-as-predictors-3">Prepared lagged time series as predictors</a></span></li><li><span><a href="#Partial-autocorrelations-with-statsmodels" data-toc-modified-id="Partial-autocorrelations-with-statsmodels-4">Partial autocorrelations with statsmodels</a></span></li><li><span><a href="#Fit-a-linear-regression-model-on-the-k-preceding-lags" data-toc-modified-id="Fit-a-linear-regression-model-on-the-k-preceding-lags-5">Fit a linear regression model on the k preceding lags</a></span></li></ul></div>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
np.set_printoptions(precision=4)
sns.set(font_scale=1.5)
plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Read in the data

In [2]:
data = pd.read_csv(
    '../../../../resource-datasets/unemployment_timeseries/seasonally-adjusted-quarterly-us.csv')
data.columns = ['year_quarter', 'unemployment_rate']
data['unemployment_rate'] = data['unemployment_rate'].map(
    lambda x: float(str(x).replace('%', '')))
data.dropna(inplace=True)
data['date'] = pd.to_datetime(data.year_quarter).dt.to_period('Q')
data.set_index('date', inplace=True)
data.head()

| date | year_quarter | unemployment_rate |
|---|---|---|
| 1948Q1 | 1948Q1 | 3.733 |
| 1948Q2 | 1948Q2 | 3.667 |
| 1948Q3 | 1948Q3 | 3.767 |
| 1948Q4 | 1948Q4 | 3.833 |
| 1949Q1 | 1949Q1 | 4.667 |


## Get the differenced timeseries

In [3]:
# First differences of the unemployment rate; the first value is NaN, so skip it
data_diff = data['unemployment_rate'].diff().iloc[1:].to_frame(name='rate_0')

## Prepared lagged time series as predictors

In [4]:
for i in range(1, 11):
    data_diff['rate_{}'.format(i)] = data_diff['rate_0'].shift(i)

data_diff.head(11)

| date | rate_0 | rate_1 | rate_2 | rate_3 | rate_4 | rate_5 | rate_6 | rate_7 | rate_8 | rate_9 | rate_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1948Q2 | -0.066 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1948Q3 | 0.1 | -0.066 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1948Q4 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949Q1 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949Q2 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949Q3 | 0.833 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN | NaN | NaN |
| 1949Q4 | 0.267 | 0.833 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN | NaN |
| 1950Q1 | -0.567 | 0.267 | 0.833 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN | NaN |
| 1950Q2 | -0.833 | -0.567 | 0.267 | 0.833 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN | NaN |
| 1950Q3 | -0.934 | -0.833 | -0.567 | 0.267 | 0.833 | 1.2 | 0.834 | 0.066 | 0.1 | -0.066 | NaN |


## Partial autocorrelations with statsmodels

Note that we choose `method='ols'` so that the results can be reproduced with an ordinary least-squares regression.

In [5]:
from statsmodels.tsa.stattools import pacf

In [6]:
pacf(data_diff['rate_0'], method='ols')

array([ 1.    ,  0.6351, -0.2452, -0.1633, -0.1342,  0.0326,  0.019 ,
       -0.1204, -0.1834,  0.1661, -0.0338, -0.1538, -0.1916,  0.0938,
       -0.0784,  0.0175, -0.0164,  0.1053, -0.0234, -0.1009,  0.0534,
        0.0105, -0.0682, -0.081 , -0.0202,  0.0845,  0.023 , -0.1427,
       -0.0155,  0.0035, -0.0621, -0.0484, -0.0982,  0.1026, -0.0092,
        0.0337, -0.2229,  0.0236,  0.0046, -0.0201, -0.0753])

## Fit a linear regression model on the k preceding lags

> The partial autocorrelation at lag k is the last coefficient (the one on the lag-k term) of a linear regression of the series on its k preceding lags.
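
Written out, the regression fitted for each lag $k$ can be sketched as follows. Here $y_t$ stands for the differenced series `rate_0`; the symbols $c$, $\phi_{k,i}$ and $\varepsilon_t$ are notation introduced only for this illustration and do not appear in the code.

$$
y_t = c + \phi_{k,1}\, y_{t-1} + \phi_{k,2}\, y_{t-2} + \dots + \phi_{k,k}\, y_{t-k} + \varepsilon_t,
\qquad
\mathrm{PACF}(k) = \hat{\phi}_{k,k}
$$

The intercept $c$ corresponds to the default `fit_intercept=True` of `LinearRegression`, and $\hat{\phi}_{k,k}$ is the fitted coefficient on the longest lag.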

In [7]:
from sklearn.linear_model import LinearRegression

In [8]:
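partial_autocorrelations = []
for k in range(1, 11):
    # Skip the first k rows, where the lagged columns contain NaN
    y = data_diff['rate_0'][k:]
    # Predictors: lags 1 through k of the differenced series
    cols = ['rate_{}'.format(i) for i in range(1, k+1)]
    X = data_diff[cols][k:]
    model = LinearRegression()
    model.fit(X, y)
    print(k)
    print(model.coef_)
    # The coefficient on lag k (the last one) is the partial autocorrelation at lag k
    partial_autocorrelations.append(model.coef_[-1])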

1
[0.6351]
2
[ 0.7899 -0.2452]
3
[ 0.7504 -0.1171 -0.1633]
4
[ 0.7304 -0.1398 -0.0574 -0.1342]
5
[ 0.7082 -0.1168 -0.0614 -0.1577  0.0326]
6
[ 0.7038 -0.1156 -0.0572 -0.1568  0.0183  0.019 ]
7
[ 0.707  -0.1093 -0.074  -0.1668  0.0046  0.1073 -0.1204]
8
[ 0.6803 -0.0836 -0.0587 -0.1884 -0.0234  0.0918  0.0161 -0.1834]
9
[ 0.7078 -0.0829 -0.0721 -0.1823  0.0118  0.0989  0.0374 -0.301   0.1661]
10
[ 0.7107 -0.1001 -0.0676 -0.1765  0.0207  0.0965  0.0253 -0.2985  0.191
 -0.0338]


In [9]:
# our calculation
partial_autocorrelations

[0.6351266023088293,
 -0.245240066142401,
 -0.16333638534997372,
 -0.1341583242181495,
 0.032597651320756725,
 0.018957456400937843,
 -0.1203579431477386,
 -0.18340219322328594,
 0.1661322955387021,
 -0.03382801531687649]

In [10]:
# compare to the statsmodels result
np.allclose(pacf(data_diff['rate_0'], method='ols', nlags=10)[1:],
            np.array(partial_autocorrelations))

True
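
The same idea can be sketched without pandas or scikit-learn. The helper `pacf_ols` below is a hypothetical name, not a function from statsmodels, and the AR(1) series is synthetic data generated only for illustration. For an AR(1) process one would expect the lag-1 value to be close to the autoregressive coefficient and the higher lags to be close to zero.

```python
import numpy as np


def pacf_ols(series, nlags):
    """Partial autocorrelations for lags 1..nlags via successive OLS fits.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    x = np.asarray(series, dtype=float)
    result = []
    for k in range(1, nlags + 1):
        # Response: observations from position k onward
        y = x[k:]
        # Design matrix: intercept column plus lags 1..k of the series
        lags = [x[k - i:len(x) - i] for i in range(1, k + 1)]
        X = np.column_stack([np.ones(len(y))] + lags)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        # The coefficient on the longest lag is the lag-k partial autocorrelation
        result.append(coef[-1])
    return np.array(result)


# Synthetic AR(1) series with coefficient 0.6 (assumed, illustrative data only)
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
ar1 = np.zeros_like(noise)
for t in range(1, len(noise)):
    ar1[t] = 0.6 * ar1[t - 1] + noise[t]

pacf_values = pacf_ols(ar1, nlags=5)
print(pacf_values.round(3))
```

Calling `pacf_ols(data_diff['rate_0'], nlags=10)` on the differenced unemployment series should follow the same steps as the loop above, since both fit an intercept and take the last lag coefficient.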