## Lab: Modeling Time Series Data

#### Scenario

As a junior data scientist at AirGo Analytics, your role is to build a forecasting model for airline passenger trends. In the previous lab, you explored the dataset, detected trends and seasonality, and applied transformations to prepare the data for modeling. Now, your goal is to develop an ARIMA (AutoRegressive Integrated Moving Average) model to predict airline passenger traffic for the next 12 months.

The forecasting process involves selecting ARIMA parameters using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots, fitting the model, and evaluating its accuracy using error metrics such as MAE and RMSE. This reflects a common industry challenge in which businesses rely on time series forecasting to inform decisions about resource allocation, pricing strategies, and operational planning.



### Step 0: Load and Prepare the Dataset
* Import necessary libraries.
* Load 'AirPassengers.csv' and inspect the data.

In [None]:
# CodeGrade step0

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load dataset
data = pd.read_csv('AirPassengers.csv')

# Look at the data
data.head()

### Step 1: Convert the Data to a Time Series Format
* Convert the `Month` column to datetime format.
* Rename the value column to `Passengers`.
* Set `Month` as the index and define its frequency as 'MS' (month start).
 * To make sure this runs correctly, set the frequency both in `pd.date_range` and in `data.index.freq`.
* Verify the result by checking the first value with `.iloc`, rounded to two decimal places.
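
The steps above can be sketched on a small stand-in frame. This is a minimal illustration, not the graded solution: the frame `raw`, its column name `value`, and the variable names `ts` and `first_value` are assumptions made for the example, and the real CSV may use different column names.

```python
import pandas as pd

# Toy stand-in for the CSV; the column names are assumptions.
raw = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03", "1949-04"],
    "value": [112, 118, 132, 129],
})

# Parse the month strings into timestamps, index by them, and rename the value column.
raw["Month"] = pd.to_datetime(raw["Month"])
ts = raw.set_index("Month").rename(columns={"value": "Passengers"})

# Rebuild the index with an explicit month-start frequency,
# then set the frequency attribute as well.
ts.index = pd.date_range(start=ts.index[0], periods=len(ts), freq="MS", name="Month")
ts.index.freq = "MS"

# Check the first observation, rounded to two decimal places.
first_value = round(float(ts["Passengers"].iloc[0]), 2)
print(ts.index.freqstr, first_value)
```

Setting the frequency explicitly matters later: statsmodels uses it to label forecast steps with the correct future dates.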

In [None]:
# CodeGrade step1

# Convert Month to datetime and set it as the index
data['Month'] = None



### Step 2: Transform and Visualize the Data
* Apply log transformation to stabilize variance.
* Perform first-order differencing to remove trends.
* Display the dataset shape after transformation.
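
As a rough sketch of the transformation, assuming a short toy series in place of the lab data: the log compresses the growing seasonal swings, and `.diff()` subtracts each value from the one that follows it. The column name `Log_Diff` and the variable `stationary` are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

# Toy monthly series standing in for the prepared lab data.
idx = pd.date_range("1949-01-01", periods=6, freq="MS")
ts = pd.DataFrame({"Passengers": [112, 118, 132, 129, 121, 135]}, index=idx)

# Log transform to stabilize the variance.
ts["Log_Passengers"] = np.log(ts["Passengers"])

# First-order differencing: value at t minus value at t-1.
# The first row has no predecessor, so it becomes NaN.
ts["Log_Diff"] = ts["Log_Passengers"].diff()

# Dropping the NaN row shortens the data by one observation.
stationary = ts.dropna()
print(ts.shape, stationary.shape)
```

Note that differencing always costs one row, which is why the shape after transformation differs from the original.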

In [None]:
# CodeGrade step2

# Log Transformation
data['Log_Passengers'] = None

# Differencing (first order)


Plot the ACF and PACF of the differenced series to determine the ARIMA parameters.

In [None]:
# ACF and PACF plots


### Step 3: Fit an ARIMA Model
* Define and train an ARIMA(2,1,2) model based on ACF/PACF insights.
* Compute the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for model evaluation.
 * Return these values as 'aic, bic', each rounded to one decimal place.

In [None]:
# CodeGrade step3

# Define and fit ARIMA model
model = None
model_fit = None

# Print model summary and extract AIC and BIC
aic = None
bic = None



Plot model diagnostics to analyze the residuals and model fit.

In [None]:
# Model Diagnostics


### Step 4: Forecast Future Passenger Counts
* Forecast the next 12 months and convert the forecast back to the original scale.
* Create the future time index for the forecast period.
* Return the shape of 'future_dates'.

In [None]:
# CodeGrade step4

# Forecast next 12 months
forecast = None

# Convert back from log scale
forecast_exp = None

# Create future index
future_dates = None



Plot predictions vs actual data.

In [None]:
# Plot predictions


### Step 5: Evaluate the Model's Performance
* Compute Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
* Round both values to four decimal places.
* Return 'mae, rmse'.
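
The two metrics can be illustrated with hand-checkable numbers. The values below are invented for the example. Excluding the first observation is an assumption that reflects a common practice with differenced models, where the first in-sample prediction has no earlier value to build on.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Invented actuals and in-sample predictions on a shared monthly index.
idx = pd.date_range("1949-01-01", periods=5, freq="MS")
actual = pd.Series([90.0, 100.0, 120.0, 140.0, 160.0], index=idx)
predictions = pd.Series([0.0, 110.0, 115.0, 150.0, 155.0], index=idx)

# Skip the first observation, then align predictions by index label
# so both series cover exactly the same dates.
true_values = actual.iloc[1:]
predictions = predictions.loc[true_values.index]

# Errors are 10, 5, 10, 5: MAE is their mean; RMSE is the root of the mean square.
mae = round(mean_absolute_error(true_values, predictions), 4)
rmse = round(np.sqrt(mean_squared_error(true_values, predictions)), 4)
print(mae, rmse)  # 7.5 7.9057
```

Aligning by index label rather than by position guards against silently comparing values from different months.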

In [None]:
# CodeGrade step5

# Get in-sample predictions
predictions = None  # Predictions over the original time series range
# Rather than dropping the first value, align the predictions with true_values
true_values = None
# Hint: predictions = predictions[true_values.index] aligns on the true_values index
predictions = None

# Calculate error metrics
mae = None
rmse = None

mae, rmse