# Modern Time Series Forecasting with Python
# Packages & Datasets

Marco Zanotti

```
import sys
sys.path.insert(0, '../../src/Python/utils')
from utils import load_data
from pytimetk import plot_timeseries
```

# Packages

The course relies on a Python toolkit organized by role.

For data wrangling we use pandas and polars. pytimetk streamlines
time-aware feature engineering, preprocessing, and visualization, using
Plotly for interactive charts.

For modeling we rely on scikit-learn for pipelines and preprocessing,
and the Nixtla ecosystem for forecasting:

-   statsforecast for classical/statistical models and efficient
    forecasting primitives  
-   mlforecast for machine learning models (linear, tree-based) with
    exogenous features  
-   neuralforecast for deep learning models  
-   utilsforecast and coreforecast for backtesting utilities,
    evaluation, and feature generators

For hosted models, Nixtla’s TimeGPT is accessed via the nixtla Python
client (API key required). Agents are explored via the timecopilot
package.

All required packages are managed with conda using the provided
environment file:  

```
conda env create -f src/env-setup/conda_env_setup.yml
conda activate modern_tsf
```
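Once the environment is active, a quick check can confirm that the packages mentioned above are importable. The snippet below is a minimal sketch, not part of the course material: the helper `missing_packages` is hypothetical, and the import names are assumed to match the package names listed in this section (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed from the packages named in this section.
# scikit-learn is imported as "sklearn"; the others are assumed to
# use their package name as the import name.
REQUIRED = [
    "pandas", "polars", "pytimetk", "plotly", "sklearn",
    "statsforecast", "mlforecast", "neuralforecast",
    "utilsforecast", "coreforecast", "nixtla", "timecopilot",
]


def missing_packages(names):
    """Return the names that cannot be located in the active environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy dependencies such as the deep learning stack.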

# Datasets

## Email Subscribers

A company decided to change how it sells its products, moving from a
purely physical store to a more modern, digital solution. It therefore
opened an online web store that integrates an e-commerce platform, where
its “virtual” customers can buy all the merchandise.  
To monitor this new business solution, it adopted a few well-known data
analytics tools.

Google Analytics has been set up on the web store pages to collect data
on page views, sessions, and organic searches. This could help the
company understand whether its website is gaining popularity.

Moreover, MailChimp is used to track all the customers that buy a
product and subscribe to the web store.

Finally, marketing events such as discount programs and new product
launches are promoted through several social network channels.

All these data are stored in the company database and can be used to
analyze the factors that affect the web store sales.

```
load_data("../../data/email/", "email_prep").plot_timeseries('ds', 'y', smooth=False)
```

## M4 Competition Hourly

The M4 Competition is a well-known time series forecasting competition
organized by Spyros Makridakis. The competition provides a large dataset
of time series from various domains, including finance, economics, and
demographics. The goal of the competition is to develop accurate
forecasting models for these time series.

https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/

We will use a sample of the M4 Hourly dataset, which consists of hourly
time series data. The dataset contains multiple time series, each
identified by a unique ID.

```
(
    load_data("../../data/m4/", "m4_prep_sample")
    .groupby('unique_id')
    .plot_timeseries('ds', 'y', facet_ncol=2, smooth=False)
)
```