# ARMA Models in StatsModels - Lab 

## Introduction

In this lab, you'll use `statsmodels` to fit an ARMA model to a real-world dataset.


## Objectives

In this lab you will: 

- Decide the optimal parameters for an ARMA model by plotting ACF and PACF and interpreting them 
- Fit an ARMA model using StatsModels 

## Dataset

Run the cell below to import the dataset containing the historical winning times for the men's 400m at the Olympic Games.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)

data = pd.read_csv('winning_400m.csv')
data['year'] = pd.to_datetime(data['year'].astype(str))
data.set_index('year', inplace=True)
data.index = data.index.to_period("Y")

In [None]:
# Preview the dataset
data.head()

Plot this time series data. 

In [None]:
data.plot(figsize=(8,6), linewidth=2, fontsize=12)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Winning times (in seconds)', fontsize=12);

If you plotted the time series correctly, you should notice that it is not stationary. Difference the data to obtain a stationary time series, and drop the missing value that differencing introduces in the first row.

In [None]:
# Difference the time series
data_diff = data.diff().dropna()
data_diff
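
To make the differencing step concrete, here is a minimal sketch on a made-up series (the values and years in `toy` are hypothetical, not taken from the lab dataset). It shows that `.diff()` subtracts each value from the one after it, and that the first entry becomes `NaN` because it has no predecessor, which is why `.dropna()` is needed.

```python
import pandas as pd

# Hypothetical toy series with a steady downward trend (not the lab data)
toy = pd.Series([50.0, 49.0, 48.5, 47.0, 46.5],
                index=[1900, 1904, 1908, 1912, 1916])

# First difference: each value minus the previous one
toy_diff = toy.diff()

# The first entry has no predecessor, so it is NaN; dropna() removes it
toy_diff_clean = toy_diff.dropna()
print(toy_diff_clean)
```

The differenced series is one observation shorter than the original, which matters for a dataset as short as this one.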

Use `statsmodels` to plot the ACF and PACF of this differenced time series. 

In [None]:
# Import the ACF and PACF plotting functions
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plot the ACF
fig, ax = plt.subplots(figsize=(8,5))
plot_acf(data_diff, ax=ax, lags=8);

In [None]:
# Plot the PACF
fig, ax = plt.subplots(figsize=(8,5))
plot_pacf(data_diff, ax=ax, lags=8);

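Based on the ACF and PACF, fit an ARMA model with appropriate orders for the AR and MA terms. Because the data has already been differenced, an ARMA(p, q) model corresponds to `ARIMA` with `order=(p, 0, q)`. Feel free to try different models and compare their AIC and BIC values, as well as the p-values of the parameter estimates.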

In [None]:
# Import statsmodels; the ARIMA class is accessed through sm.tsa below
import statsmodels.api as sm

# Fit an ARMA(1,0) model
mod_arma = sm.tsa.arima.model.ARIMA(data_diff, order=(1,0,0))
res_arma = mod_arma.fit()

# Print out summary information on the fit
print(res_arma.summary())

In [None]:
# Fit an ARMA(2,1) model
mod_arma = sm.tsa.arima.model.ARIMA(data_diff, order=(2,0,1))
res_arma = mod_arma.fit()

# Print out summary information on the fit
print(res_arma.summary())

In [None]:
# Fit an ARMA(2,2) model
mod_arma = sm.tsa.arima.model.ARIMA(data_diff, order=(2,0,2))
res_arma = mod_arma.fit()

# Print out summary information on the fit
print(res_arma.summary())

## What is your final model? Why did you pick this model?

In [None]:

"""
ARMA(1,0), ARMA(2,1) and ARMA(2,2) all seem to fit reasonably well, with significant parameters.
The preferred model may differ depending on whether you use AIC or BIC as the selection criterion.
When the criteria disagree, you'd generally go for the model with fewer parameters,
so ARMA(1,0) seems fine. Note that this is a relatively short time series,
which makes model selection more difficult.
"""

## Summary 

Well done. In addition to manipulating and visualizing time series data, you now know how to create a stationary time series and fit ARMA models. 