## 2 Some simple forecasting methods
***
Some forecasting methods are extremely simple and surprisingly effective. We will use four simple forecasting methods as benchmarks throughout this book. To illustrate them, we will use quarterly Australian clay brick production between 1970 and 2004.

In [None]:
# Import the libraries that we are going to use for the analysis:
import pandas as pd
import polars as pl

from sktime.utils.plotting import plot_series

import matplotlib.pyplot as plt

In [None]:
# Create a dataframe from a csv file:
df = pl.read_csv(
    "../Assets/aus-production.csv", separator=";", infer_schema_length=1000
)
df = df.filter(pl.col("Quarter").is_between(pl.lit("1970 Q1"), pl.lit("2004 Q4")))

bricks = (
    df.select(pl.col("Bricks").cast(pl.Int16).alias("y"))
    .to_pandas()
    .set_index(pd.date_range(start="1970-01-01", end="2004-12-31", freq="QE"))
)


## Mean method

Here, the forecasts of all future values are equal to the average (or “mean”) of the historical data. If we let the historical data be denoted by $y_{1},\dots,y_{T}$, then we can write the forecasts as

\begin{gather*} 
\hat{y}_{T+h|T}=\bar{y}=(y_{1}+\dots+y_{T})/T
\end{gather*}

The notation $\hat{y}_{T+h|T}$ is a short-hand for the estimate of $y_{T+h}$ based on the data $y_{1},\dots,y_{T}$.
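As a quick check of the formula, the mean forecast can be computed by hand. This is only a minimal sketch: the toy series `y` and the horizon `h` below are made-up values for illustration, not the brick data.

```python
import numpy as np

# Hypothetical toy series (not the brick production data).
y = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
h = 3  # number of future periods to forecast

# Every forecast equals the historical mean (y_1 + ... + y_T) / T.
mean_forecast = np.full(h, y.mean())
print(mean_forecast)  # [12. 12. 12.]
```

The forecast is flat: it does not depend on the horizon, only on the average of the history.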





In [None]:
# Mean method: sktime's NaiveForecaster with strategy="mean":
from sktime.forecasting.naive import NaiveForecaster

model = NaiveForecaster(strategy="mean", sp=None)
model = model.fit(y=bricks)
y_hat = model.predict(fh=range(1, 21))

In [None]:
fig, ax = plot_series(bricks, y_hat)

<p style="text-align: center;">
Figure 3: Mean (or average) forecasts applied to clay brick production in Australia.
</p>

## Naïve method

For naïve forecasts, we simply set all forecasts to be the value of the last observation. That is,

\begin{gather*} 
\hat{y}_{T+h|T}=y_{T}
\end{gather*}

This method works remarkably well for many economic and financial time series.

In [None]:
# Naïve method: NaiveForecaster with strategy="last" repeats the last observation:
from sktime.forecasting.naive import NaiveForecaster

model = NaiveForecaster(strategy="last", sp=1)
model = model.fit(y=bricks)
y_hat = model.predict(fh=range(1, 21))
fig, ax = plot_series(bricks, y_hat)

<p style="text-align: center;">
Figure 4: Naïve forecasts applied to clay brick production in Australia.
</p>

Because a naïve forecast is optimal when data follow a random walk, these are also called random walk forecasts.
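The random walk claim can be illustrated with a small simulation. The sketch below is an assumption-laden toy experiment, not part of the original analysis: it simulates Gaussian random walks (the seed, number of simulations, series length and horizon are arbitrary choices) and compares the mean squared error of the naïve and mean forecasts at horizon $h$.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed for reproducibility
n_sims, T, h = 2000, 100, 5          # illustrative settings

# Each row is one simulated random walk y_t = y_{t-1} + e_t, e_t ~ N(0, 1).
walks = rng.standard_normal((n_sims, T + h)).cumsum(axis=1)
history = walks[:, :T]               # y_1, ..., y_T
actual = walks[:, T + h - 1]         # y_{T+h}

naive_forecast = history[:, -1]      # y_T
mean_forecast = history.mean(axis=1) # average of y_1, ..., y_T

mse_naive = np.mean((actual - naive_forecast) ** 2)
mse_mean = np.mean((actual - mean_forecast) ** 2)
print(f"naive MSE: {mse_naive:.2f}, mean MSE: {mse_mean:.2f}")
```

For a random walk with unit-variance steps, the naïve forecast error at horizon $h$ has variance $h$, so the naïve MSE should be close to 5 here, well below that of the mean forecast.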

## Seasonal naïve method

A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season (e.g., the same month of the previous year). Formally, the forecast for time  $T+h$ is written as

\begin{gather*} 
\hat{y}_{T+h|T}=y_{T+h-m(k+1)}
\end{gather*}

where $m$ is the seasonal period, and $k$ is the integer part of $(h-1)/m$ (i.e., the number of complete years in the forecast period prior to time $T+h$). This looks more complicated than it really is. For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.
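The index arithmetic in the formula can be made concrete with a few lines of code. This is a minimal sketch: `seasonal_naive` is a hypothetical helper written for illustration (it is not an sktime function), and the series is a made-up quarterly example.

```python
import numpy as np

def seasonal_naive(y, m, h):
    """Return the forecast for y_{T+h}, i.e. y_{T+h-m(k+1)} with k = floor((h-1)/m)."""
    T = len(y)
    k = (h - 1) // m
    # Subtract 1 to convert the 1-based time index to a 0-based array index.
    return y[T + h - m * (k + 1) - 1]

# Hypothetical quarterly series covering two full years (Q1..Q4, Q1..Q4).
y = np.array([1, 2, 3, 4, 5, 6, 7, 8])

forecasts = [int(seasonal_naive(y, m=4, h=h)) for h in range(1, 7)]
print(forecasts)  # [5, 6, 7, 8, 5, 6]
```

The last observed year (5, 6, 7, 8) is simply repeated: horizons 1 to 4 reuse it once, and horizons 5 and 6 wrap around to its Q1 and Q2 values again.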

In [None]:
# Seasonal naïve method: strategy="last" with sp=4 repeats the last value from the same quarter:
from sktime.forecasting.naive import NaiveForecaster

model = NaiveForecaster(strategy="last", sp=4)
model = model.fit(y=bricks)
y_hat = model.predict(fh=range(1, 21))
fig, ax = plot_series(bricks, y_hat)

<p style="text-align: center;">
Figure 5: Seasonal naïve forecasts applied to clay brick production in Australia.
</p>

## Drift method

A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. Thus the forecast for time $T+h$ is given by

\begin{gather*} 
\hat{y}_{T+h|T}=y_{T}+\frac{h}{T-1}\sum_{t=2}^{T}(y_{t}-y_{t-1})=y_{T}+h\frac{y_{T}-y_{1}}{T-1}
\end{gather*}

This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
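The two forms of the drift formula can be checked numerically. The following is a small sketch on a made-up series (the values of `y` and the horizons are illustrative assumptions): the average of the one-step changes telescopes to $(y_{T}-y_{1})/(T-1)$, so both expressions give the same forecasts.

```python
import numpy as np

# Hypothetical toy series (not the brick production data).
y = np.array([10.0, 12.0, 11.0, 15.0, 18.0])
T = len(y)
h = np.arange(1, 4)  # forecast horizons 1, 2, 3

# Left-hand form: last value plus h times the average historical change.
drift_from_diffs = y[-1] + h * np.diff(y).sum() / (T - 1)

# Right-hand form: extrapolate the line through the first and last points.
drift_from_endpoints = y[-1] + h * (y[-1] - y[0]) / (T - 1)

print(drift_from_diffs)      # [20. 22. 24.]
print(drift_from_endpoints)  # [20. 22. 24.]
```

Only the first and last observations matter for the slope; the values in between cancel out in the sum of differences.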

In [None]:
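# Drift method: NaiveForecaster with strategy="drift".
from sktime.forecasting.naive import NaiveForecaster

# window_length=None (the default) uses the whole history, so the drift is
# the slope of the line between the first and last observations.
model = NaiveForecaster(strategy="drift", window_length=None)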
model = model.fit(y=bricks)
y_hat = model.predict(fh=range(1, 21))
fig, ax = plot_series(bricks, y_hat)

<p style="text-align: center;">
Figure 6: Drift forecasts applied to clay brick production in Australia.
</p>