# Time Series Forecasting with "Shallow Learning"

Before we get to ML techniques commonly called "deep learning" (e.g. **recurrent neural networks**), let's see what we can do with simpler ML methods. These techniques are the topic of our course [📓 Machine Learning with Python](../index/mlp2-machine-learning-python.ipynb), and the library `scikit-learn` provides most of the code needed. We assume that you are familiar with the concepts.

## Preamble

In [None]:
import pandas

In [None]:
import matplotlib.pyplot as plt

In [None]:
import data_science_learning_paths
import forecast_lab

In [None]:
data_science_learning_paths.setup_plot_style()

## Example: Forecasting Taxi Trips

In [None]:
taxi_trips = data_science_learning_paths.datasets.read_chicago_taxi_trips_daily()

In [None]:
taxi_trips.head()

## Transform to Supervised Learning Problem

It is possible to cast a time series forecasting problem in the traditional form of supervised machine learning: a set of labelled observations, or more specifically a matrix $X$ of feature values and a vector $y$ of labels.
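To make the idea concrete, here is a minimal sketch of such a transformation on a toy series. The helper name `sliding_window_frame` and the `lag_k` column names are hypothetical and chosen for illustration; the actual `transform_to_labelled_points` in `forecast_lab` may be implemented differently.

```python
import pandas as pd


def sliding_window_frame(ts: pd.Series, window_size: int):
    """Hypothetical helper: build (X, y) from the preceding `window_size` values."""
    # Column lag_k holds the value observed k steps before the label.
    lags = {f"lag_{k}": ts.shift(k) for k in range(window_size, 0, -1)}
    X = pd.DataFrame(lags)
    # The first `window_size` rows have an incomplete history and are dropped.
    X = X.iloc[window_size:]
    y = ts.iloc[window_size:]
    return X, y


toy = pd.Series([10, 20, 30, 40, 50, 60])
X_toy, y_toy = sliding_window_frame(toy, window_size=3)
# Show features and label side by side.
print(X_toy.assign(label=y_toy))
```

Each row of `X_toy` contains three consecutive values, and the corresponding entry of `y_toy` is the value that follows them.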

In [None]:
from forecast_lab import transform_to_labelled_points

In [None]:
transform_to_labelled_points??

In [None]:
X_train, y_train = transform_to_labelled_points(taxi_trips["Trips"][:1000], 10)

In [None]:
X_test, y_test = transform_to_labelled_points(taxi_trips["Trips"][1000:2000], 10)

We now have the data in the familiar format for supervised learning: a feature matrix $X$ and a label vector $y$. The only difference is that the features are the preceding $w$ values of the time series, where $w$ is the window size (here $w = 10$).

In [None]:
X_train.head()

In [None]:
y_train.head()

We can now fit ML models, e.g. from **scikit-learn**:

In [None]:
import sklearn

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

In [None]:
model = GradientBoostingRegressor().fit(X_train, y_train)

Let's use the model for predictions on the test feature matrix and compare with the ground truth:

In [None]:
pandas.DataFrame(
    {
        "prediction": model.predict(X_test),
        "actual": y_test,
    }
).plot(ylim=(0, 150))

Not bad: a point in the series can be predicted from the preceding ones with some accuracy, and we see similar seasonal patterns. With time, the forecast deteriorates somewhat, which may be due to [**concept drift**](https://en.m.wikipedia.org/wiki/Concept_drift).

However, this is not yet a proper time series forecast as we have defined it. The model only forecasts one step ahead, and it is provided with the actual values from the time segment we want to predict. In the following, we discuss how to generate a forecast for an arbitrary number of time steps with recursive forecasting.

## Using the Model for Recursive Forecasting

In order to do recursive forecasting, the model needs to be supplied with a rolling window of its own predictions. We have prepared some code for you in the `ScikitLearnWrapper` class: 
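The rolling-window idea can be sketched in a few lines. This is an illustration under stated assumptions, not the code of `ScikitLearnWrapper`: the function `recursive_forecast` is hypothetical, and a `LinearRegression` on a toy trend stands in for the real model so that the result is easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def recursive_forecast(model, history, window_size, horizon):
    """Hypothetical helper: forecast `horizon` steps by feeding predictions back in."""
    window = list(history[-window_size:])
    forecast = []
    for _ in range(horizon):
        # Predict the next value from the current window ...
        next_value = float(model.predict(np.array([window]))[0])
        forecast.append(next_value)
        # ... then slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [next_value]
    return forecast


# Toy series with a linear trend (0, 1, ..., 29).
series = np.arange(30, dtype=float)
w = 3
X = np.array([series[i : i + w] for i in range(len(series) - w)])
y = series[w:]
model = LinearRegression().fit(X, y)

print(recursive_forecast(model, series, window_size=w, horizon=5))
```

On this toy trend the forecast should simply continue the series (30, 31, ...) up to floating-point error. After `window_size` steps the window consists entirely of the model's own predictions, which is why errors can compound on real data.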

In [None]:
forecast_lab.ScikitLearnWrapper.forecast??

Now we can evaluate a true forecast for a given horizon. Have a look at the diagnostic plots:
- training, test and forecast segments
- residuals: test versus forecast
- pull plot: distribution of errors

In [None]:
forecast_lab.ForecastEvaluation(
        ts=taxi_trips["Trips"],
        forecasting=forecast_lab.ScikitLearnWrapper(
            GradientBoostingRegressor,
            sliding_window_size=20,
        ),
        test_window_size=60,
        train_window_size=365,
        metrics={
            "MAPE": data_science_learning_paths.mlp.mean_absolute_percentage_error,
            "RMSE": data_science_learning_paths.mlp.root_mean_squared_error
        }
).evaluate(
    k=3, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()
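For reference, the two metrics are conventionally defined as $\mathrm{MAPE} = \frac{1}{n}\sum_t \left|\frac{y_t - \hat{y}_t}{y_t}\right|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_t (y_t - \hat{y}_t)^2}$. The following is a minimal NumPy sketch of these textbook definitions. The function names are hypothetical, and the implementations in `data_science_learning_paths.mlp` may differ in details, for example in whether MAPE is scaled to percent.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10 %)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Note: undefined if `actual` contains zeros.
    return float(np.mean(np.abs((actual - predicted) / actual)))


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is relative and therefore comparable across series of different scale, while RMSE penalizes large individual errors more strongly.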

## Incorporating External Variables

A nice property of this approach to forecasting is that it is straightforward to add external variables or even multiple other time series: Just add them as additional feature columns to the feature columns that come from the time series itself.
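As an illustration, external columns can be joined to the lag features on the shared time index. This is a sketch on toy data with hypothetical names (`lag_k`, the toy `Value` column), not the implementation of `ScikitLearnWrapper.fit`.

```python
import pandas as pd

# Toy daily series and a hypothetical external variable on the same index.
index = pd.date_range("2013-01-01", periods=6, freq="D")
trips = pd.Series([10, 20, 30, 40, 50, 60], index=index, name="Trips")
temperature = pd.DataFrame({"Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

window_size = 2
# Lag features from the series itself.
lags = pd.DataFrame(
    {f"lag_{k}": trips.shift(k) for k in range(window_size, 0, -1)}
)
# External columns are joined on the shared index; rows without a full
# history are dropped.
X = lags.join(temperature).iloc[window_size:]
y = trips.iloc[window_size:]
print(X)
```

In this sketch the external value of the day being predicted is used as a feature, so a forecast also needs external values for the forecast period.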

In [None]:
forecast_lab.ScikitLearnWrapper.fit??

### Example

Do the seasons influence the demand for taxi rides in any way? Hard to say, but we are going to try it out: as an external variable, we use the average monthly temperature. Let's look at one year of data:

In [None]:
usa_temp = data_science_learning_paths.datasets.read_usa_temperature()

In [None]:
usa_temp.loc["2013"].head()

In [None]:
plt.plot(taxi_trips.loc["2013", "Trips"])

The external variables must be passed in the form of a `pandas.DataFrame` with an index matching the time series.

Unfortunately we have only monthly data for the temperature, but we can resample it to a daily frequency to make it compatible:

In [None]:
plt.plot(usa_temp.loc["2013", "Value"].resample("D").ffill())
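The effect of upsampling with forward fill can be checked on a toy series. The values below are made up, and the sketch assumes that the monthly data is indexed by the first day of each month.

```python
import pandas as pd

# Toy monthly values, indexed by the first day of each month.
monthly = pd.Series(
    [1.5, 3.0, 7.5],
    index=pd.to_datetime(["2013-01-01", "2013-02-01", "2013-03-01"]),
)
# Upsample to daily frequency and carry each monthly value forward.
daily = monthly.resample("D").ffill()
print(daily.loc["2013-01-30":"2013-02-02"])
```

Every day inherits the value of its month. Note that the upsampled index ends at the last monthly timestamp, which is presumably why the cells below slice the temperature data one month beyond the period they need.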

Let's see if the fitting step works:

In [None]:
temperature = pandas.DataFrame(
    usa_temp.loc["2013-01":"2014-01", "Value"].resample("D").ffill(),
)[:-1]

In [None]:
forecast_lab.ScikitLearnWrapper(
    estimator_class=GradientBoostingRegressor,
    sliding_window_size=60,
).fit(
    ts=taxi_trips.loc["2013", "Trips"],
    ext_vars=temperature
)

Now this can be used in the `ForecastEvaluation`. We need to make sure that the external data covers the index of both the training and the test time series.

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips.loc["2013", "Trips"],
    ts_test=taxi_trips.loc["2014-01", "Trips"],
    ext_vars=pandas.DataFrame(
        usa_temp.loc["2013-01":"2014-02", "Value"].resample("D").ffill(),
    ),
    forecasting=forecast_lab.ScikitLearnWrapper(
        GradientBoostingRegressor,
        sliding_window_size=120,
    ),
    metrics={
        "MAPE": data_science_learning_paths.mlp.mean_absolute_percentage_error,
        "RMSE": data_science_learning_paths.mlp.root_mean_squared_error
    }
).evaluate(
    k=3, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()

For comparison, here is the same evaluation without the external variable:

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips.loc["2013", "Trips"],
    ts_test=taxi_trips.loc["2014-01", "Trips"],
    forecasting=forecast_lab.ScikitLearnWrapper(
        GradientBoostingRegressor,
        sliding_window_size=120,
    ),
    metrics={
        "MAPE": data_science_learning_paths.mlp.mean_absolute_percentage_error,
        "RMSE": data_science_learning_paths.mlp.root_mean_squared_error
    }
).evaluate(
    k=3, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()

## Summary

**Pros**

+ apply any supervised learning regressor to forecasting
+ reuse well-known ML methods and tools
+ easily add external features or other time series as additional columns in the feature matrix

**Cons**

- order of data points matters only within the sliding window, which may limit the learning of long-range patterns
- a recursive forecast may deteriorate quickly as the forecast horizon grows, because errors accumulate

## References

- [Time Series for scikit-learn People](https://www.ethanrosenthal.com/2018/01/28/time-series-for-scikit-learn-people-part1/)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_
