# Forecast reconciliation
*What if our forecasts should follow linear constraints?*

---


Imagine you work at a retail company and are responsible for forecasting total sales over the coming weeks. The company has stores across the country, and a forecast is needed for every city and state.

However, when the national and regional sales teams meet, they find that your forecasts are inconsistent: "How can the state-level sales add up to more than the national figure?" Which forecasts should they base their decisions on?

Forecast reconciliation methods are simple and effective solutions to that problem. When you have a hierarchy of time series that must respect some linear constraints (for example, state sales must add up to national sales), you can forecast each level independently and then _reconcile_ the forecasts.

Let's suppose our company is based in Europe and has two branches, one in France and another in Italy, and that we were asked for a seven-day-ahead forecast.


## The intuition

Consider this synthetic dataset containing 1000 days of sales for the company:


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [2]:
import plotly.io as pio
import plotly.express as px

# Renderer for Google Colab; change it to match your environment (e.g. 'notebook')
pio.renderers.default = 'colab'


# A synthetic dataset of 1000 daily points for 2 countries (France and Italy);
# the index is a pd.DatetimeIndex starting at 2020-01-01.
# Each series is a multiplicative random walk with 1% daily noise
index = pd.date_range('2020-01-01', freq='D', periods=1000)
data = pd.DataFrame((1 + 1e-2*np.random.randn(1000, 2)).cumprod(axis=0)*1e6, index=index, columns=['France', 'Italy'])
# Europe is, by construction, the sum of the two countries
data["Europe"] = data["France"] + data["Italy"]


# Plot the three series with plotly
fig = px.line(data, x=data.index, y=data.columns, title="Daily sales by region")
fig.show()


Total sales in Europe are the sum of French and Italian sales, so the three series must satisfy the linear constraint

$$y_{europe}(t) = y_{france}(t) + y_{italy}(t)$$

Note that this constraint defines a two-dimensional plane in three-dimensional space: to be considered coherent, our observations and forecasts must lie on that plane.
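
To make the plane picture concrete, here is a minimal sketch of a coherence check. It assumes the coordinates are ordered as (France, Italy, Europe); the helper name `coherence_gap` and the example numbers are illustrative and not part of the original notebook.

```python
import numpy as np

# Normal vector of the plane France + Italy - Europe = 0,
# with coordinates ordered as (France, Italy, Europe).
normal = np.array([1.0, 1.0, -1.0])


def coherence_gap(points):
    """Return France + Italy - Europe for each row; zero means the point is on the plane."""
    return np.asarray(points, dtype=float) @ normal


# Illustrative values only (not taken from the dataset)
coherent = np.array([[600.0, 400.0, 1000.0]])
incoherent = np.array([[600.0, 400.0, 950.0]])

print(coherence_gap(coherent))    # gap of 0: the point lies on the plane
print(coherence_gap(incoherent))  # gap of 50: the point lies off the plane
```

Because `Europe` is built as the sum of the two countries, every row of the dataset has a gap of zero (up to floating-point error).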


In [3]:
import plotly.graph_objs as go

# Each point is one day of sales, plotted as (Italy, France, Europe)
trace = go.Scatter3d(
    x=data['Italy'],
    z=data['Europe'],
    y=data['France'],
    mode='markers',
    marker=dict(
        size=3,
        color=np.arange(len(data)),  # color each point by its time step
        colorscale='Viridis',
        opacity=0.8
    )
)

plot_data = [trace]
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=plot_data, layout=layout)
fig.show()


Applying well-known forecasting methods such as ARIMA and ETS to each series independently won't necessarily produce forecasts that satisfy that constraint. The figure below illustrates how the forecasts of an ARIMA model can drift away from the plane.

In [4]:
from sktime.forecasting.arima import ARIMA

# Fit an ARIMA(3,1,3) to the three columns (France, Italy, Europe);
# no coherence constraint is imposed on the forecasts
model = ARIMA(order=(3, 1, 3)).fit(data)
# Forecast seven days ahead
preds = model.predict(fh=list(range(1, 8)))


Maximum Likelihood optimization failed to converge. Check mle_retvals



In [5]:
import numpy as np
import plotly.graph_objects as go

# 3D scatter plot of the ARIMA forecasts
trace1 = go.Scatter3d(
    x=preds['Italy'],
    z=preds['Europe'],
    y=preds['France'],
    mode='markers',
    marker=dict(
        size=5,
        color="green",
        opacity=0.8
    )
)

# Grid of Italy (x-axis) and France (y-axis) values spanning the forecasts
italy = np.linspace(preds['Italy'].min(), preds['Italy'].max(), num=10)
france = np.linspace(preds['France'].min(), preds['France'].max(), num=10)
xGrid, yGrid = np.meshgrid(italy, france)
zGrid = xGrid + yGrid  # coherent Europe value: Italy + France

# Surface of the coherence plane Europe = France + Italy
trace2 = go.Surface(x=xGrid, y=yGrid, z=zGrid, opacity=0.6, colorscale=[ [0, 'blue'], [1, 'blue'] ], showscale=False)

# Package the traces and plot
plot_data = [trace1, trace2]
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    ),
    scene=dict(
        xaxis=dict(title='Italy'),
        zaxis=dict(title='Europe'),
        yaxis=dict(title='France'),
    )
)
fig = go.Figure(data=plot_data, layout=layout)
fig.show()
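
The distance from the plane can also be measured numerically. The sketch below is self-contained and therefore does not use the ARIMA forecasts: as a stand-in, it forecasts each series with a hypothetical helper, `growth_forecast`, that extrapolates the mean daily growth rate. The seed, the helper and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Synthetic data built like the dataset above: two multiplicative random walks and their sum
index = pd.date_range("2020-01-01", freq="D", periods=1000)
sales = pd.DataFrame(
    (1 + 1e-2 * rng.standard_normal((1000, 2))).cumprod(axis=0) * 1e6,
    index=index,
    columns=["France", "Italy"],
)
sales["Europe"] = sales["France"] + sales["Italy"]


def growth_forecast(series, horizon):
    """Stand-in forecaster (not ARIMA): extrapolate the mean daily growth rate."""
    growth = series.pct_change().dropna().mean()
    steps = np.arange(1, horizon + 1)
    return series.iloc[-1] * (1 + growth) ** steps


horizon = 7
# Each series is forecast independently, with its own estimated growth rate
base_forecasts = pd.DataFrame(
    {name: growth_forecast(sales[name], horizon) for name in sales.columns}
)

# Coherence gap per step: zero only if the forecasts lie on the plane
gap = base_forecasts["Europe"] - (base_forecasts["France"] + base_forecasts["Italy"])
print(gap)
```

The same gap can be computed for the ARIMA forecasts in the notebook with `preds["Europe"] - (preds["France"] + preds["Italy"])`.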


It seems that we can't rely on those methods alone: forecasting each level independently is not enough. How can we guarantee forecasts that are both accurate and coherent?

## Forecasting coherently

---

The literature provides two main approaches to coherent forecasting: the single-level approach and reconciliation methods [@schafer2005shrinkage].
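
As a rough illustration of the two approaches, the sketch below uses bottom-up aggregation as an example of a single-level approach and an ordinary least squares (OLS) projection as an example of a reconciliation method. These are common textbook choices and not necessarily the ones used later in this tutorial. The series order (Europe, France, Italy), the function names and the example numbers are assumptions.

```python
import numpy as np

# Summing matrix S maps the bottom-level series (France, Italy) to all series,
# ordered as (Europe, France, Italy).
S = np.array([
    [1.0, 1.0],  # Europe = France + Italy
    [1.0, 0.0],  # France
    [0.0, 1.0],  # Italy
])


def bottom_up(base):
    """Single-level sketch: keep the bottom-level forecasts and sum them upwards."""
    bottom = base[..., 1:]
    return bottom @ S.T


def ols_reconcile(base):
    """Reconciliation sketch: orthogonally project base forecasts onto the coherent plane."""
    projection = S @ np.linalg.solve(S.T @ S, S.T)
    return base @ projection.T


# Illustrative incoherent base forecasts: 600 + 400 != 1050
base = np.array([[1050.0, 600.0, 400.0]])

print(bottom_up(base))      # Europe is replaced by France + Italy
print(ols_reconcile(base))  # all three forecasts are adjusted to meet on the plane
```

Bottom-up discards the Europe forecast entirely, while the projection spreads the 50-unit discrepancy across all three series, so each level contributes to the final result.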
