## 1. A tidy forecasting workflow
***

The process of producing forecasts for time series data can be broken down into a few steps.

![workflow-1](https://raw.githubusercontent.com/Nixtla/fpp3/main/Assets/workflow-1.png)


To illustrate the process, we will fit an ETS model to national GDP data stored in `global_economy`.

## Data preparation (tidy)

The first step in forecasting is to prepare data in the correct format. This process may involve loading in data, identifying missing values, filtering the time series, and other pre-processing tasks. *Multiple polars functions can be useful for this stage.*

Many models have different data requirements; some require the series to be in time order, others require no missing values. Checking your data is an essential step to understanding its features and should always be done before models are estimated.
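The checks described above can be sketched in plain Python before any model is fitted. This is a minimal illustration on made-up values, not on `global_economy`; the helper name `check_series` is hypothetical, and in practice the same checks would be run on the polars dataframe.

```python
# Toy annual series as (year, value) pairs; None marks a missing value.
# These numbers are invented for illustration only.
records = [(2001, 10.5), (2000, 10.0), (2003, None), (2002, 11.2)]


def check_series(rows):
    """Return (is_time_ordered, n_missing, has_duplicate_times) for (time, value) pairs."""
    times = [t for t, _ in rows]
    # Strictly increasing time stamps mean the series is in time order.
    is_ordered = all(a < b for a, b in zip(times, times[1:]))
    n_missing = sum(1 for _, v in rows if v is None)
    has_duplicates = len(set(times)) != len(times)
    return is_ordered, n_missing, has_duplicates


print(check_series(records))  # (False, 1, False)

# Sorting by time fixes the ordering; the missing value still needs handling.
cleaned = sorted(records, key=lambda row: row[0])
print(check_series(cleaned))  # (True, 1, False)
```

Sorting resolves the ordering problem, but the missing observation remains and has to be dealt with (dropped, imputed, or handled by a model that tolerates gaps) before estimation.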

We will model GDP per capita over time; so first, we must compute the relevant variable.

In [None]:
# Import the libraries that we are going to use for the analysis:
import polars as pl
from utilsforecast.plotting import plot_series

In [None]:
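# Create a dataframe from a semicolon-separated csv file and add a constant
# "unique_id" column, which StatsForecast uses to identify the series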
df = pl.read_csv("Assets/global_economy_data.csv", separator=";").with_columns(
    pl.lit(1).alias("unique_id")
)

df.head()

## Plot the data (visualise)

As we have seen in Chapter 2, visualisation is an essential step in understanding the data. Looking at your data allows you to identify common patterns, and subsequently specify an appropriate model.

The data for one country in our example are plotted in Figure 1:

In [None]:
plot_series(df, time_col="Year", target_col="Sweden_gdp_per_cap", engine="plotly")

<p style="text-align: center;">
Figure 1: GDP per capita data for Sweden from 1960 to 2017.
</p>

## Define a model (specify)

There are many different time series models that can be used for forecasting, and much of this book is dedicated to describing various models. Specifying an appropriate model for the data is essential for producing appropriate forecasts.

*For example, it's possible to use an `ETS` model to study the time series of GDP per capita.*

*The exponential smoothing (ETS) algorithm is especially suited for data with seasonality and trend. ETS computes its prediction as a weighted average over all observations in the input time series. In contrast to moving average methods, which use constant weights, ETS weights decrease exponentially as observations get older, so the full history contributes while recent observations are prioritized.*
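To make the exponentially decreasing weights concrete, here is a minimal sketch of simple exponential smoothing (the `ANN` case, with no trend or seasonality) in plain Python. The function names, the toy data, the smoothing parameter `alpha = 0.5` and the choice of the first observation as the initial level are all assumptions made for illustration; this is not how `statsforecast` estimates the model.

```python
def ses_weights(alpha, n):
    """Weights SES puts on the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


def ses_forecast(y, alpha, level0):
    """One-step-ahead forecast from the recursion level = alpha*y_t + (1-alpha)*level."""
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level


y = [10.0, 12.0, 11.0, 13.0]  # made-up observations, oldest first
alpha = 0.5
weights = ses_weights(alpha, len(y))
print(weights)  # [0.5, 0.25, 0.125, 0.0625]

# The recursion is the same as a weighted average of past observations
# plus the initial level, whose influence decays like (1 - alpha) ** n.
weighted = sum(w * obs for w, obs in zip(weights, reversed(y)))
weighted += (1 - alpha) ** len(y) * y[0]
print(ses_forecast(y, alpha, level0=y[0]), weighted)  # 12.0 12.0
```

Each weight is half of the previous one here because `alpha = 0.5`; a smaller `alpha` spreads the weight further back into the history.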

In [None]:
# Import the model that we are going to use:
from statsforecast.models import AutoETS
from statsforecast import StatsForecast

*The AutoETS model automatically selects the best ETS (Error, Trend, Seasonality) model using an information criterion. The default is the corrected Akaike Information Criterion (AICc), and each candidate model is estimated by maximum likelihood. The state-space equations are determined by their components, each of which can be `M` (multiplicative), `A` (additive), `Z` (optimized) or `N` (omitted). The `model` string parameter defines the ETS equations: E in `[M A Z]`, T in `[N A M Z]`, and S in `[N A M Z]`.*

*For example, when `model="ANN"` (additive error, no trend, and no seasonality), AutoETS fits only simple exponential smoothing.*

*If a component is set to `Z`, it acts as a placeholder that asks AutoETS to choose the best option for that component.*
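The three-letter model string can be read position by position. The helper below is a hypothetical illustration written for this section (it is not part of `statsforecast`); it only spells out what each letter means and rejects letters outside the sets listed above.

```python
# Hypothetical helper, not a statsforecast function.
COMPONENT_NAMES = {
    "A": "additive",
    "M": "multiplicative",
    "N": "none",
    "Z": "selected automatically",
}
# Allowed letters per position: error, trend, seasonality.
ALLOWED = {"error": "AMZ", "trend": "NAMZ", "seasonality": "NAMZ"}


def describe_ets_model(model):
    """Translate a string such as 'ANN' into a component description."""
    if len(model) != 3:
        raise ValueError("model string must have exactly three letters")
    description = {}
    for (component, allowed), letter in zip(ALLOWED.items(), model):
        if letter not in allowed:
            raise ValueError(f"{letter!r} is not valid for {component}")
        description[component] = COMPONENT_NAMES[letter]
    return description


print(describe_ets_model("ANN"))
# {'error': 'additive', 'trend': 'none', 'seasonality': 'none'}
print(describe_ets_model("ZMZ"))
# {'error': 'selected automatically', 'trend': 'multiplicative', 'seasonality': 'selected automatically'}
```

Note that the error component cannot be `N`, so a string such as `"NAN"` is rejected by this sketch.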

## Train the model (estimate)

Once an appropriate model is specified, we next train the model on some data. To estimate the model in our example, we use:

In [None]:
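model = StatsForecast(
    # "ZMZ": multiplicative trend; error and seasonality chosen automatically
    models=[AutoETS(model="ZMZ")],
    # The time column holds integer years, so the frequency is an integer step
    freq=1,
)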

In [None]:
model.fit(df, id_col="unique_id", time_col="Year", target_col="Sweden_gdp_per_cap")

This fits an ETS model to the GDP per capita data for Sweden.

## Check model performance (evaluate)

Once a model has been fitted, it is important to check how well it has performed on the data. There are several diagnostic tools available to check model behaviour, and also accuracy measures that allow one model to be compared against another. Sections 8 and 9 go into further details.
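As a rough illustration of how accuracy measures let one method be compared against another, the sketch below holds out the last three points of a made-up series and scores two simple benchmark forecasts with MAE and RMSE. The data, the benchmarks (naive and drift) and the function names are assumptions chosen for the example; they are not the evaluation used later in the book.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up series with a steady upward trend; the last 3 points are held out.
series = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0]
train, test = series[:-3], series[-3:]

# Naive benchmark: repeat the last training value.
naive = [train[-1]] * len(test)

# Drift benchmark: extend the average change seen in the training data.
step = (train[-1] - train[0]) / (len(train) - 1)
drift = [train[-1] + step * (i + 1) for i in range(len(test))]

print("naive:", mae(test, naive), rmse(test, naive))
print("drift:", mae(test, drift), rmse(test, drift))
```

On this trending toy series the drift benchmark has lower error than the naive one on both measures, which is the kind of comparison these metrics are meant to support.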

## Produce forecasts (forecast)

With an appropriate model specified, estimated and checked, it is time to produce the forecasts using `.predict()`. The easiest way to use this method is to specify the number of future observations to forecast through the horizon `h`. For example, forecasts for the next 10 observations can be generated using `h=10`. The horizon is counted in periods of the series, so with annual data `h=2` forecasts two years into the future.

In other situations, it may be more convenient to provide a dataset of future time periods to forecast. This is commonly required when your model uses additional information from the data, such as exogenous regressors. Additional data required by the model can be included in the dataset of observations to forecast.
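A dataset of future time periods is simply the set of time stamps that follow the last observation, paired with the series identifier. The sketch below builds one in plain Python, assuming an integer time index with a step of 1 (as with `freq=1` above) and a last observed year of 2017 (as in Figure 1). The helper `future_periods` is hypothetical, and any exogenous columns would have to be added to these rows separately.

```python
def future_periods(last_time, h, freq=1):
    """Time stamps of the next h periods after last_time, spaced freq apart."""
    if h < 1:
        raise ValueError("h must be a positive integer")
    return [last_time + freq * step for step in range(1, h + 1)]


# The plotted data end in 2017, so a three-step horizon covers 2018 to 2020.
future_years = future_periods(2017, h=3)
print(future_years)  # [2018, 2019, 2020]

# One row per future period, keyed by the same identifier as the training data.
future_rows = [{"unique_id": 1, "Year": year} for year in future_years]
print(future_rows[0])  # {'unique_id': 1, 'Year': 2018}
```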

In [None]:
# Make predictions with ETS model:
y_hat = model.predict(h=3)
y_hat

The forecasts can be plotted along with the historical data using `plot_series` as follows:

In [None]:
plot_series(
    df,
    forecasts_df=y_hat,
    time_col="Year",
    target_col="Sweden_gdp_per_cap",
    engine="plotly",
)

<p style="text-align: center;">
Figure 2: Forecasts of GDP per capita for Sweden using an ETS model.
</p>