<style>
    @import url('https://fonts.googleapis.com/css2?family=Montserrat:wght@400;500;600&display=swap');

    .fadeIn {
        animation-name: fadeIn;
        animation-timing-function: ease-in;
        animation-duration: 1.5s;
    }

    @keyframes fadeIn {
        0% {opacity:0;}
        100% {opacity:1;}
    }
</style>

<div style="background-color: #f9fcfe; padding: 60px; border-radius: 30px; box-shadow: 0 10px 40px rgba(0,0,0,0.12); font-family: 'Montserrat', sans-serif; line-height: 1.8; border: 5px solid #2c3e50;">

<h3 class="fadeIn" style="text-align: center; color: #2c3e50; font-size: 2.2em; border-bottom: 4px solid #2980b9; padding-bottom: 15px; margin-bottom: 30px; font-weight: 600;">📈 Statistical Methods (Time Series)</h3>

<section style="background-color: #e6efff; padding: 30px; border-radius: 25px; margin-bottom: 25px; box-shadow: 0 5px 20px rgba(0,0,0,0.08);">
    <h4 class="fadeIn" style="color: #c0392b; font-size: 1.7em; margin-top: 0; border-bottom: 3px solid #c0392b; padding-bottom: 12px; font-weight: 500;">AR (Auto-Regressive Model)</h4>
    <p>The AR model assumes that a time series is dependent on its own past values.</p>
    <p>The model is represented as:</p>
    <p>\[ X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t \]</p>
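    <p>Here, \( c \) is a constant, \( \phi_1, \dots, \phi_p \) are the auto-regressive coefficients, \( p \) is the number of lags (the order of the model), and \( \varepsilon_t \) is a white noise error term.</p>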
</section>

<section style="background-color: #e6efff; padding: 30px; border-radius: 25px; margin-bottom: 25px; box-shadow: 0 5px 20px rgba(0,0,0,0.08);">
    <h4 class="fadeIn" style="color: #e67e22; font-size: 1.7em; margin-top: 0; border-bottom: 3px solid #d35400; padding-bottom: 12px; font-weight: 500;">MA (Moving Average Model)</h4>
    <p>The MA model assumes that a time series is dependent on past white noise terms (i.e., random shocks).</p>
    <p>The model is represented as:</p>
    <p>\[ X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \]</p>
    <p>Here, \( \mu \) is the mean of the series, \( \theta_1, \dots, \theta_q \) are the moving average coefficients, and \( q \) is the number of lagged error terms (the order of the model).</p>
</section>

<section style="background-color: #e6efff; padding: 30px; border-radius: 25px; margin-bottom: 25px; box-shadow: 0 5px 20px rgba(0,0,0,0.08);">
    <h4 class="fadeIn" style="color: #8e44ad; font-size: 1.7em; margin-top: 0; border-bottom: 3px solid #8e44ad; padding-bottom: 12px; font-weight: 500;">ARMA (Auto-Regressive Moving Average Model)</h4>
    <p>It is a combination of the AR and MA models.</p>
    <p>The model is represented as:</p>
    <p>\[ X_t = c + \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \]</p>
</section>

<section style="background-color: #e6efff; padding: 30px; border-radius: 25px; margin-bottom: 25px; box-shadow: 0 5px 20px rgba(0,0,0,0.08);">
    <h4 class="fadeIn" style="color: #27ae60; font-size: 1.7em; margin-top: 0; border-bottom: 3px solid #27ae60; padding-bottom: 12px; font-weight: 500;">ARIMA (Auto-Regressive Integrated Moving Average Model)</h4>
    <p>It is a generalization of the ARMA model. The difference is that ARIMA adds an integration (differencing) step, applied \( d \) times, so that a non-stationary series becomes stationary before the AR and MA terms are fitted.</p>
    <p>It is denoted as ARIMA(p, d, q) where:</p>
    <ul>
        <li>\( p \) = number of lags for the AR model</li>
        <li>\( d \) = number of differencing steps (degree of integration)</li>
        <li>\( q \) = number of lags for the MA model</li>
    </ul>
</section>

<section style="background-color: #e6efff; padding: 30px; border-radius: 25px; box-shadow: 0 5px 20px rgba(0,0,0,0.08);">
    <h4 class="fadeIn" style="color: #2980b9; font-size: 1.7em; margin-top: 0; border-bottom: 3px solid #2980b9; padding-bottom: 12px; font-weight: 500;">SARIMA (Seasonal Auto-Regressive Integrated Moving Average Model)</h4>
    <p>It is an adapted version of the ARIMA model for data that exhibits seasonality.</p>
    <p>It is denoted as SARIMA(p, d, q)(P, D, Q)[s] where:</p>
    <ul>
        <li>\( p, d, q \) = parameters for ARIMA</li>
        <li>\( P, D, Q \) = parameters for the seasonal components</li>
        <li>\( s \) = seasonal period (e.g., 12 for monthly data)</li>
    </ul>
</section>

</div>
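
As a hedged illustration of the AR and MA definitions above, the sketch below simulates an AR(1) and an MA(1) process with NumPy and inspects their sample autocorrelations. The coefficient values (0.7 and 0.5), the seed, and the helper names `simulate_ar1`, `simulate_ma1`, and `sample_acf` are assumptions chosen for illustration; none of them come from the CO2 analysis.

```python
import numpy as np

def simulate_ar1(phi, n, rng, c=0.0):
    """Simulate X_t = c + phi * X_{t-1} + eps_t with standard normal shocks."""
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = c + phi * x[t - 1] + eps[t]
    return x

def simulate_ma1(theta, n, rng, mu=0.0):
    """Simulate X_t = mu + eps_t + theta * eps_{t-1}."""
    eps = rng.standard_normal(n + 1)
    return mu + eps[1:] + theta * eps[:-1]

def sample_acf(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

rng = np.random.default_rng(42)
ar = simulate_ar1(phi=0.7, n=5000, rng=rng)
ma = simulate_ma1(theta=0.5, n=5000, rng=rng)

print("AR(1) lag-1 ACF:", round(sample_acf(ar, 1), 3))
print("MA(1) lag-1 ACF:", round(sample_acf(ma, 1), 3))
print("MA(1) lag-2 ACF:", round(sample_acf(ma, 2), 3))
```

In theory, the lag-1 autocorrelation of an AR(1) process equals \( \phi_1 \), while an MA(1) process has lag-1 autocorrelation \( \theta_1 / (1 + \theta_1^2) \) and zero autocorrelation beyond lag 1. The simulated values should land close to those numbers.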

<div style="background-color: #f4f8fa; padding: 30px; margin-bottom: 30px; border-radius: 10px; box-shadow: 0px 4px 15px rgba(0, 0, 0, 0.1);">
    <table style="width: 100%; border-collapse: collapse; font-size: 18px; font-family: 'Arial', sans-serif;">
        <thead>
            <tr style="background-color: #2c3e50; color: #ecf0f1; text-align: left;">
                <th style="padding: 15px 20px;">Feature 🔍</th>
                <th style="padding: 15px 20px;">Models 📈</th>
            </tr>
        </thead>
        <tbody>
            <tr style="background-color: #e7eef6;">
                <td style="padding: 15px 20px; border-bottom: 2px solid #d5dce4;">Stationary</td>
                <td style="padding: 15px 20px; border-bottom: 2px solid #d5dce4;">SES, AR, MA, ARMA</td>
            </tr>
            <tr style="background-color: #f2f6fc;">
                <td style="padding: 15px 20px; border-bottom: 2px solid #d5dce4;">Trend</td>
                <td style="padding: 15px 20px; border-bottom: 2px solid #d5dce4;">DES, ARIMA, SARIMA</td>
            </tr>
            <tr style="background-color: #e7eef6;">
                <td style="padding: 15px 20px;">Trend + Seasonality</td>
                <td style="padding: 15px 20px;">TES, SARIMA</td>
            </tr>
        </tbody>
    </table>
</div>
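
To make the link between trend, seasonality, and differencing concrete, here is a minimal sketch on a noise-free synthetic series. The level (315.0), the slope (0.1), and the sine-shaped seasonal pattern are assumptions for illustration only. It shows what the ARIMA parameter \( d = 1 \) and the SARIMA parameter \( D = 1 \) with \( s = 12 \) do to the data.

```python
import numpy as np

t = np.arange(48)                       # four years of monthly time steps
trend = 0.1 * t                         # linear trend
seasonal = np.sin(2 * np.pi * t / 12)   # pattern repeating every 12 steps
series = 315.0 + trend + seasonal

# d = 1: the first difference removes the linear trend but keeps the seasonal swing.
first_diff = np.diff(series, n=1)

# D = 1 with s = 12: subtract the value observed 12 steps earlier.
seasonal_diff = series[12:] - series[:-12]

print(first_diff[:3])
print(seasonal_diff[:3])
```

In this idealized case the seasonal difference is constant (12 times the slope), because both the repeating pattern and the level cancel out. Real data such as the CO2 series also contain noise, so the differenced series only becomes approximately stationary.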

<style>
    @import url('https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap');

    .content-container {
        font-family: 'Roboto', sans-serif;
        background-color: #f4f4f9;
        padding: 60px;
        border-radius: 30px;
        box-shadow: 0 0 40px rgba(0,0,0,0.12);
        transition: transform 0.3s ease, box-shadow 0.3s ease;
    }

    .content-container:hover {
        transform: translateY(-5px);
        box-shadow: 0 10px 40px rgba(0,0,0,0.15);
    }

    h2 {
        text-align: center;
        color: #34495e;
        font-size: 2.5em;
        border-bottom: 4px solid #3498db;
        padding-bottom: 20px;
        margin-bottom: 35px;
    }

    span.highlight {
        background-color: #3498db;
        color: #fff;
        padding: 7px 20px;
        border-radius: 7px;
        box-shadow: 0 0 10px rgba(0,0,0,0.1);
    }

    h3 {
        color: #e67e22;
        font-size: 1.9em;
        margin-top: 25px;
        text-align:center;
    }

    p {
        font-size: 1.4em;
        color: #555;
        margin-top: 35px;
        text-align: justify;
        line-height: 1.7;
    }

    b {
        color: #2c3e50;
    }
</style>

<div class="content-container">
    <h2>🌍 <span class="highlight">Mauna Loa Observatory</span> in Hawaii, USA: Continuous Atmospheric CO2 from Air Samples</h2>
    <h3>📅 Recording Period: March 1958 - December 2001</h3>
    <p>This <b>valuable dataset</b> from the Mauna Loa Observatory records how the atmospheric CO2 concentration measured in Hawaii has changed over time. In this study, we analyze the series for the specified period using <b>Statistical Methods</b>. Our goal is to forecast future CO2 concentrations as accurately as possible with these methods.</p>
</div>


<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>1 | Import Required Libraries</div></b>

In [None]:
import itertools
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA
# from statsmodels.tsa.holtwinters import ExponentialSmoothing
# from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
import statsmodels.tsa.api as smt
from statsmodels.tsa.statespace.sarimax import SARIMAX
warnings.filterwarnings('ignore')

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>2 | Adjusting Row Column Settings</div></b>

In [None]:
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>3 | Loading Data Set</div></b>

In [None]:
data = sm.datasets.co2.load_pandas()

In [None]:
y = data.data

In [None]:
y = y['co2'].resample('MS').mean()

In [None]:
y.head(20)
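
The resampling step above converts weekly readings into monthly averages. A small sketch on made-up data shows the mechanics; the ten values and their dates are assumptions for illustration, not rows from the CO2 dataset.

```python
import pandas as pd

# Ten hypothetical weekly readings, dated on Saturdays starting 1958-03-29.
idx = pd.date_range("1958-03-29", periods=10, freq="W-SAT")
weekly = pd.Series(range(10), index=idx, dtype="float64")

# 'MS' groups observations by calendar month and labels each group
# with the first day of that month.
monthly = weekly.resample("MS").mean()
print(monthly)
```

Here March contains one reading, April four, and May five, so the monthly means are 0.0, 2.5, and 7.0. A month with no readings at all would appear as NaN, which is why missing values are checked afterwards.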

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>4 | Exploratory Data Analysis</div></b>

In [None]:
y.isnull().sum()

In [None]:
# Fill missing observations in the series with the next valid observation (backfill).

y = y.bfill()

In [None]:
y.isnull().sum()
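
Backfilling copies the next valid observation into each gap. The toy series below (values are made up) sketches that behavior and one caveat worth checking: a gap at the very end has no later value to copy from.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Each gap takes the next valid observation.
filled = s.bfill()
print(filled.tolist())

# A trailing gap has no later value to copy, so it stays NaN.
print(int(filled.isnull().sum()))
```

This is why it is worth re-running `isnull().sum()` after the fill, as done above, rather than assuming every gap was closed.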

In [None]:
# We visualised the series

y.plot(figsize=(15, 6))
plt.show(block=True)

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>5 | Holdout</div></b>

<style>
    @import url('https://fonts.googleapis.com/css2?family=Poppins:wght@300;400;600&display=swap');

    .insight-container {
        font-family: 'Poppins', sans-serif;
        background-color: #e8eaf6;  /* Light blue background */
        padding: 50px;
        border-radius: 30px;
        box-shadow: 0 10px 25px rgba(0,0,0,0.08);
        transition: transform 0.2s ease, box-shadow 0.2s ease;
    }

    .insight-container:hover {
        transform: translateY(-5px);
        box-shadow: 0 15px 30px rgba(0,0,0,0.12);
    }

    h3 {
        color: #5c6bc0;  /* Blue */
        font-size: 2em;
        margin-bottom: 25px;
        border-bottom: 3px solid #7986cb;  /* Light blue */
        display: inline-block;
        padding-bottom: 10px;
    }

    p {
        font-size: 1.3em;
        color: #3f51b5;  /* Dark blue */
        text-align: justify;
        line-height: 1.7;
        border-left: 4px solid #5c6bc0;  /* Blue */
        padding-left: 20px;
    }
</style>

<div class="insight-container">
    <h3>Time Series Analysis Insight</h3>
    <p>In time series problems, model training, testing, and hyperparameter optimization are sometimes all carried out on the same dataset. However, this approach does not give a reliable estimate of the model's ability to generalize. As a best practice, the data is divided into training and test sets: the model is trained on the training set and its performance is evaluated on the test set. Hyperparameter optimization is typically performed on the training set or on a separate set called validation...</p>
</div>
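
One common way to respect time order when splitting is an expanding-window scheme, where each fold trains on everything before its validation block. The sketch below is an illustration under assumptions: the helper `expanding_window_splits` and the fold sizes are hypothetical and are not used elsewhere in this notebook, which relies on a single holdout.

```python
def expanding_window_splits(n_obs, n_folds, horizon):
    """Yield (train_indices, valid_indices) pairs that respect time order.

    Each fold trains on all observations before its validation block,
    so no future observation leaks into training.
    """
    first_valid_start = n_obs - n_folds * horizon
    if first_valid_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        start = first_valid_start + fold * horizon
        yield range(0, start), range(start, start + horizon)

for train_idx, valid_idx in expanding_window_splits(n_obs=60, n_folds=3, horizon=12):
    print(f"train 0..{train_idx[-1]}, validate {valid_idx[0]}..{valid_idx[-1]}")
```

With 60 observations, 3 folds, and a 12-step horizon, the training window grows from 24 to 36 to 48 observations while the validation block always follows it directly.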

In [None]:
y

In [None]:
train = y[:'1997-12-01']

In [None]:
len(train)

In [None]:
test = y['1998-01-01':]

In [None]:
len(test)

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>6 | ARIMA(p, d, q): (Autoregressive Integrated Moving Average)</div></b>

In [None]:
arima_model = ARIMA(train, order=(1, 1, 1)).fit()

In [None]:
arima_model.summary()

In [None]:
# 48-step-ahead forecast covering the test period; returned as a pandas Series
y_pred = arima_model.forecast(steps=48)

In [None]:
y_pred

In [None]:
y_pred = pd.Series(y_pred, index=test.index)

In [None]:
y_pred

In [None]:
def plot_co2(train, test, y_pred, title):
    mae = mean_absolute_error(test, y_pred)
    train["1985":].plot(legend=True, label="TRAIN", title=f"{title}, MAE: {round(mae,2)}")
    test.plot(legend=True, label="TEST", figsize=(6, 4))
    y_pred.plot(legend=True, label="PREDICTION")
    plt.show(block=True)

In [None]:
plot_co2(train, test, y_pred, "ARIMA")
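
The MAE shown in the plot title is the average absolute gap between actual and predicted values. As a minimal sketch of that arithmetic, the function below recomputes it by hand; the name `mean_absolute_error_manual` and the four readings are made up for illustration.

```python
def mean_absolute_error_manual(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [368.0, 369.5, 370.0, 371.0]        # made-up CO2 readings
predicted = [367.0, 369.0, 371.5, 371.0]     # made-up forecasts
print(mean_absolute_error_manual(actual, predicted))
```

The absolute errors are 1.0, 0.5, 1.5, and 0.0, so the MAE is 0.75 in the same units as the series. Because MAE is expressed in the units of the data, it is easy to compare across models fitted to the same series.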

# Hyperparameter Optimization (Determining Model Degrees)

# Determining Model Degree Based on AIC & BIC Statistics
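
Both criteria reward fit (log-likelihood) and penalize the number of estimated parameters, and lower values are preferred. The sketch below writes out the standard formulas; the log-likelihoods, parameter counts, and sample size are hypothetical numbers, not results from the CO2 models.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: the larger model gains little likelihood for two extra parameters.
small = {"llf": -250.0, "k": 3}
large = {"llf": -249.5, "k": 5}
n_obs = 480

for name, fit in (("small", small), ("large", large)):
    print(name, aic(fit["llf"], fit["k"]), round(bic(fit["llf"], fit["k"], n_obs), 2))
```

In this made-up comparison the smaller model has the lower AIC (506.0 versus 509.0) and the lower BIC, so both criteria would prefer it. BIC penalizes extra parameters more heavily than AIC whenever the sample has more than about 7 observations.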

In [None]:
p = d = q = range(0, 4)

In [None]:
pdq = list(itertools.product(p, d, q))

In [None]:
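def arima_optimizer_aic(train, orders):
    """Fit an ARIMA model for each (p, d, q) order and return the order with the lowest AIC."""
    best_aic, best_params = float("inf"), None
    for order in orders:
        try:
            # order must be passed by keyword: the second positional argument of ARIMA is exog
            arima_model_result = ARIMA(train, order=order).fit()
            aic = arima_model_result.aic
            if aic < best_aic:
                best_aic, best_params = aic, order
            print('ARIMA%s AIC=%.2f' % (order, aic))
        except Exception:
            # skip orders for which estimation fails
            continue
    print('Best ARIMA%s AIC=%.2f' % (best_params, best_aic))
    return best_params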

In [None]:
best_params_aic = arima_optimizer_aic(train, pdq)

# Final Model

In [None]:
arima_model = ARIMA(train, order=best_params_aic).fit()

In [None]:
# 48-step-ahead forecast covering the test period; returned as a pandas Series
y_pred = arima_model.forecast(steps=48)

In [None]:
y_pred

In [None]:
y_pred = pd.Series(y_pred, index=test.index)

In [None]:
y_pred

In [None]:
plot_co2(train, test, y_pred, "ARIMA")

<a id='Adjusting-Row-Column-Settings'></a>
# <b><div style='padding:15px;background-color:#004080;color:white;border-radius:15px;font-size:100%;text-align: center'>7 | SARIMA(p, d, q)(P, D, Q)[s]: (Seasonal Autoregressive Integrated Moving-Average)</div></b>

In [None]:
model = SARIMAX(train, order=(1, 0, 1), seasonal_order=(0, 0, 0, 12))

In [None]:
sarima_model = model.fit(disp=0)

In [None]:
y_pred_test = sarima_model.get_forecast(steps=48)

In [None]:
y_pred_test

In [None]:
y_pred = y_pred_test.predicted_mean

In [None]:
y_pred

In [None]:
y_pred = pd.Series(y_pred, index=test.index)

In [None]:
y_pred

In [None]:
plot_co2(train, test, y_pred, "SARIMA")

# Hyperparameter Optimization (Determining Model Degrees)

In [None]:
p = d = q = range(0, 2)

In [None]:
pdq = list(itertools.product(p, d, q))

In [None]:
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
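
Before running a grid search it helps to know how many models will be fitted. The sketch below rebuilds the two grids from the cells above and counts the combinations; the variable `n_fits` is introduced here only for illustration.

```python
import itertools

p = d = q = range(0, 2)

# Non-seasonal (p, d, q) candidates.
pdq = list(itertools.product(p, d, q))

# Seasonal (P, D, Q, s) candidates with a fixed period of 12.
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]

# Every non-seasonal order is paired with every seasonal order.
n_fits = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_fits)
```

With values 0 and 1 for each parameter there are 8 non-seasonal and 8 seasonal candidates, so the nested loop attempts 64 model fits. Widening each range by one value would raise that to 27 x 27 = 729, which is why seasonal grids are usually kept small.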

In [None]:
def sarima_optimizer_aic(train, pdq, seasonal_pdq):
    """Fit a SARIMA model for each order pair and return the pair with the lowest AIC."""
    best_aic, best_order, best_seasonal_order = float("inf"), None, None
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                sarimax_model = SARIMAX(train, order=param, seasonal_order=param_seasonal)
                results = sarimax_model.fit(disp=0)
                aic = results.aic
                if aic < best_aic:
                    best_aic, best_order, best_seasonal_order = aic, param, param_seasonal
                # param_seasonal already contains the seasonal period (12)
                print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, aic))
            except Exception:
                # skip parameter combinations for which estimation fails
                continue
    print('Best SARIMA{}x{} - AIC:{}'.format(best_order, best_seasonal_order, best_aic))
    return best_order, best_seasonal_order

In [None]:
best_order, best_seasonal_order = sarima_optimizer_aic(train, pdq, seasonal_pdq)

# Final Model

In [None]:
model = SARIMAX(train, order=best_order, seasonal_order=best_seasonal_order)

In [None]:
sarima_final_model = model.fit(disp=0)

In [None]:
y_pred_test = sarima_final_model.get_forecast(steps=48)

In [None]:
y_pred_test

In [None]:
y_pred = y_pred_test.predicted_mean

In [None]:
y_pred

In [None]:
y_pred = pd.Series(y_pred, index=test.index)

In [None]:
y_pred

In [None]:
plot_co2(train, test, y_pred, "SARIMA")

# SARIMA Optimization Based on MAE

In [None]:
p = d = q = range(0, 2)

In [None]:
pdq = list(itertools.product(p, d, q))

In [None]:
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

In [None]:
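def sarima_optimizer_mae(train, pdq, seasonal_pdq):
    """Return the (order, seasonal_order) pair with the lowest 48-step MAE.

    Note: the MAE is computed against the global `test` series, so the chosen
    parameters are tuned on the same data that is later used to report performance.
    """
    best_mae, best_order, best_seasonal_order = float("inf"), None, None
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                model = SARIMAX(train, order=param, seasonal_order=param_seasonal)
                sarima_model = model.fit(disp=0)
                y_pred_test = sarima_model.get_forecast(steps=48)
                y_pred = y_pred_test.predicted_mean
                mae = mean_absolute_error(test, y_pred)
                if mae < best_mae:
                    best_mae, best_order, best_seasonal_order = mae, param, param_seasonal
                # param_seasonal already contains the seasonal period (12)
                print('SARIMA{}x{} - MAE:{}'.format(param, param_seasonal, mae))
            except Exception:
                # skip parameter combinations for which estimation fails
                continue
    print('Best SARIMA{}x{} - MAE:{}'.format(best_order, best_seasonal_order, best_mae))
    return best_order, best_seasonal_order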

In [None]:
best_order, best_seasonal_order = sarima_optimizer_mae(train, pdq, seasonal_pdq)
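
The search above scores every candidate against the test set. In line with the holdout discussion earlier, an alternative is to pick the candidate on a separate validation slice and touch the test slice only once. The sketch below illustrates that pattern with a toy forecaster; `select_by_validation_mae`, `drift_forecast`, and the data are hypothetical stand-ins. In the SARIMA setting, the forecasting function would wrap the `SARIMAX` fit and `get_forecast` calls.

```python
def mae(actual, predicted):
    """Mean absolute error of paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def select_by_validation_mae(candidates, forecast_fn, fit_data, valid_data):
    """Return (best_candidate, best_mae), scored on valid_data only."""
    best_candidate, best_mae = None, float("inf")
    for candidate in candidates:
        predicted = forecast_fn(fit_data, candidate, len(valid_data))
        score = mae(valid_data, predicted)
        if score < best_mae:
            best_candidate, best_mae = candidate, score
    return best_candidate, best_mae

def drift_forecast(history, slope, steps):
    """Toy forecaster: extend the last value by a fixed slope per step."""
    return [history[-1] + slope * (i + 1) for i in range(steps)]

series = [float(v) for v in range(0, 40, 2)]   # 0, 2, ..., 38: the true slope is 2
fit_data, valid_data, test_data = series[:12], series[12:16], series[16:]

best_slope, valid_mae = select_by_validation_mae(
    [1.0, 2.0, 3.0], drift_forecast, fit_data, valid_data
)

# The test block is scored once, after selection, with the chosen candidate.
test_pred = drift_forecast(series[:16], best_slope, len(test_data))
print(best_slope, valid_mae, mae(test_data, test_pred))
```

Because the test slice plays no part in choosing the candidate, the final MAE is a fairer estimate of how the selected model would behave on unseen data.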

In [None]:
model = SARIMAX(train, order=best_order, seasonal_order=best_seasonal_order)

In [None]:
sarima_final_model = model.fit(disp=0)

In [None]:
y_pred_test = sarima_final_model.get_forecast(steps=48)

In [None]:
y_pred_test

In [None]:
y_pred = y_pred_test.predicted_mean

In [None]:
y_pred

In [None]:
y_pred = pd.Series(y_pred, index=test.index)

In [None]:
y_pred

In [None]:
plot_co2(train, test, y_pred, "SARIMA")

# Final Model (The Final Model was Built with All Data)

In [None]:
model = SARIMAX(y, order=best_order, seasonal_order=best_seasonal_order)

In [None]:
sarima_final_model = model.fit(disp=0)

In [None]:
# Forecast the next 6 months beyond the end of the observed series
future_predict = sarima_final_model.get_forecast(steps=6)

In [None]:
future_predict = future_predict.predicted_mean

In [None]:
future_predict