# Reconciliation #
This notebook gives a minimal example of how to reconcile a set of existing forecasts.

This example is based on chapter 11 of:

Hyndman, R.J., & Athanasopoulos, G. (2021) *Forecasting: principles and practice, 
3rd edition*, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 05/07/2022.

The data comes from the online version of the book by Hyndman and Athanasopoulos; the URL appears in the data-loading cell below.

The code for this example can be found at:

* Jupyter Notebook: https://github.com/elephaint/hierts/tree/main/docs/notebooks
* Python script: https://github.com/elephaint/hierts/tree/main/examples

### Import packages & read data

In [1]:
#%% Read packages
import pandas as pd
import numpy as np
from hierts.reconciliation import (calc_summing_matrix, apply_reconciliation_methods,
                                   calc_level_method_rmse)

In [2]:
#%% Read data
df = pd.read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv")

In [3]:
# Let's look at the data
df.head(10)

| | Date | State | Gender | Legal | Indigenous | Count |
|---|---|---|---|---|---|---|
| 0 | 2005-03-01 | ACT | Female | Remanded | ATSI | 0 |
| 1 | 2005-03-01 | ACT | Female | Remanded | Non-ATSI | 2 |
| 2 | 2005-03-01 | ACT | Female | Sentenced | ATSI | 0 |
| 3 | 2005-03-01 | ACT | Female | Sentenced | Non-ATSI | 5 |
| 4 | 2005-03-01 | ACT | Male | Remanded | ATSI | 7 |
| 5 | 2005-03-01 | ACT | Male | Remanded | Non-ATSI | 58 |
| 6 | 2005-03-01 | ACT | Male | Sentenced | ATSI | 5 |
| 7 | 2005-03-01 | ACT | Male | Sentenced | Non-ATSI | 101 |
| 8 | 2005-03-01 | NSW | Female | Remanded | ATSI | 51 |
| 9 | 2005-03-01 | NSW | Female | Remanded | Non-ATSI | 131 |


As you can see, we have a column indicating the time index ('Date'), four columns that provide information on the hierarchy/aggregation of the data ('State', 'Gender', 'Legal', 'Indigenous'), and a target column ('Count').

### Set aggregations and calculate summing matrix
For this dataset, the hierarchy is defined by the columns ['State', 'Gender', 'Legal', 'Indigenous']. We build our aggregations from these columns.

In [4]:
aggregation_cols = ['State', 'Gender', 'Legal', 'Indigenous']

Next, we define the aggregations that we are interested in. In this case, we are interested in the following aggregations:

In [5]:
aggregations = [['State'],
                ['State', 'Gender'],
                ['State', 'Legal'],
                ['State', 'Indigenous'],
                ['Gender', 'Legal']]

Don't include the top (total) level or the bottom level: these are added automatically. Now we can create a summing matrix, which shows how each of the bottom-level series maps to our chosen aggregations:

In [6]:
df_S = calc_summing_matrix(df, aggregation_cols, aggregations)

Let's have a look at df_S:

In [7]:
df_S

Aggregation,Value,ACT-Female-Remanded-ATSI,ACT-Female-Remanded-Non-ATSI,ACT-Female-Sentenced-ATSI,ACT-Female-Sentenced-Non-ATSI,ACT-Male-Remanded-ATSI,ACT-Male-Remanded-Non-ATSI,ACT-Male-Sentenced-ATSI,ACT-Male-Sentenced-Non-ATSI,NSW-Female-Remanded-ATSI,NSW-Female-Remanded-Non-ATSI,...,VIC-Male-Sentenced-ATSI,VIC-Male-Sentenced-Non-ATSI,WA-Female-Remanded-ATSI,WA-Female-Remanded-Non-ATSI,WA-Female-Sentenced-ATSI,WA-Female-Sentenced-Non-ATSI,WA-Male-Remanded-ATSI,WA-Male-Remanded-Non-ATSI,WA-Male-Sentenced-ATSI,WA-Male-Sentenced-Non-ATSI
Total,Total,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
State,ACT,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
State,NSW,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
State,NT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
State,QLD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Bottom level,WA-Female-Sentenced-Non-ATSI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Bottom level,WA-Male-Remanded-ATSI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Bottom level,WA-Male-Remanded-Non-ATSI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
Bottom level,WA-Male-Sentenced-ATSI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


As you can see, df_S is a mapping between the bottom-level series (columns of df_S) and the aggregations specified. Some observations:

* 'Total' contains the total across all bottom-level series. Hence, this is a row consisting of ones - the total is the sum of all the bottom-level series.
* 'Bottom level' contains the bottom-level series. This block is an identity matrix, as each bottom-level series in the rows maps only to itself in the columns.
* Everything between 'Total' and 'Bottom level' shows which bottom-level series are summed to form each aggregation: a 1.0 marks a series that belongs to the aggregate, a 0.0 marks one that does not.
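
To make the three blocks described above concrete, the following sketch builds a summing matrix by hand for a hypothetical toy hierarchy of two states and two genders. It uses plain pandas and NumPy rather than `calc_summing_matrix`, so the row labels and layout are assumptions and will not match df_S exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy hierarchy: 2 states x 2 genders = 4 bottom-level series.
bottom = pd.DataFrame({
    "State": ["ACT", "ACT", "NSW", "NSW"],
    "Gender": ["Female", "Male", "Female", "Male"],
})
bottom_names = (bottom["State"] + "-" + bottom["Gender"]).tolist()

# 'Total' block: a single row of ones, because every series counts towards the total.
total = pd.DataFrame(np.ones((1, 4)), index=["Total"], columns=bottom_names)

# 'State' block: one indicator row per state.
state = pd.get_dummies(bottom["State"]).T.astype(float)
state.columns = bottom_names

# 'Bottom level' block: an identity matrix.
identity = pd.DataFrame(np.eye(4), index=bottom_names, columns=bottom_names)

S = pd.concat([total, state, identity])
print(S)
```

Each row of `S` answers the question "which bottom-level series do I sum to obtain this series?".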

### Create random forecasts for each time series
Normally, we would now fit a forecasting model for each time series (row) in the summing matrix df_S. In this minimal example, we'll just use random forecasts.

In [8]:
# Set target, time_index and split of train and test.
target = 'Count'
time_index = 'Date'
end_train = '2015-12-31'
start_test = '2016-01-01'

In [9]:
# Create random forecasts: draw each forecast from a Poisson distribution whose mean is the actual value
rng = np.random.default_rng(seed=0)
df[f'{target}_predicted'] = rng.poisson(lam=df[target])

In [10]:
# Add bottom_timeseries identifier and create actuals & forecasts dataframe for all aggregations
df['bottom_timeseries'] = df[aggregation_cols].agg('-'.join, axis=1)
actuals_bottom_timeseries = df.set_index(['bottom_timeseries', time_index])[target]\
                              .unstack(1)\
                              .loc[df_S.columns]
forecasts_bottom_timeseries = df.set_index(['bottom_timeseries', time_index])[f'{target}_predicted']\
                                .unstack(1)\
                                .loc[df_S.columns]
actuals = df_S @ actuals_bottom_timeseries
forecasts = df_S @ forecasts_bottom_timeseries
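
The pivot-and-multiply step above can be illustrated on a small scale. The sketch below uses hypothetical data (two bottom-level series, two dates) and a hand-written summing matrix; only the pandas operations mirror the cell above.

```python
import pandas as pd

# Hypothetical long-format data: two bottom-level series observed on two dates.
long_df = pd.DataFrame({
    "bottom_timeseries": ["B", "A", "B", "A"],
    "Date": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "Count": [10, 1, 20, 2],
})

# Toy summing matrix: a total row plus one identity row per bottom-level series.
S = pd.DataFrame([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
                 index=["Total", "A", "B"], columns=["A", "B"])

# Pivot to one row per series and one column per date,
# then order the rows like the columns of S.
wide = (long_df.set_index(["bottom_timeseries", "Date"])["Count"]
               .unstack(1)
               .loc[S.columns])

# Each row of the result is the sum of the bottom-level rows selected by S.
all_levels = S @ wide
print(all_levels)
```

The result has one row per series at every level of the hierarchy and one column per date, which is the layout the reconciliation step works on.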

### Reconciliation
We can now reconcile the forecasts. First, we compute the residuals (forecast errors), because some reconciliation methods use them. If you don't have residuals, or they are expensive to obtain, you are limited to the 'ols' and 'wls_struct' reconciliation methods.

In [11]:
# Calculate residuals. We only need the in-sample (i.e. 'on the training set') residuals.
residuals = (forecasts - actuals)
residuals_train = residuals.loc[:, :end_train]

In [12]:
# Create forecast test set and apply a set of reconciliation methods.
forecasts_test = forecasts.loc[:, start_test:]
forecasts_reconciled = apply_reconciliation_methods(forecasts_test, df_S,
                                                    residuals_train,
                                                    methods=['ols', 'wls_var', 'mint_shrink'])

Method ols, reconciliation time: 0.0025s
Method wls_var, reconciliation time: 0.0006s
Method mint_shrink, reconciliation time: 0.1283s
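
For intuition about what these methods do, here is a minimal NumPy sketch of the textbook formula from chapter 11 of Hyndman & Athanasopoulos, S (S' W⁻¹ S)⁻¹ S' W⁻¹ ŷ, with a diagonal W. The helper `reconcile`, the toy hierarchy, and the residual values are all hypothetical, and hierts' own implementation may differ. 'mint_shrink' needs a shrunk full covariance matrix and is not covered here.

```python
import numpy as np

def reconcile(S, y_hat, weights=None):
    """Reconcile base forecasts y_hat (n_series x n_periods) with summing matrix S.

    weights holds the diagonal of W; None means W = I, i.e. OLS.
    """
    n_series = S.shape[0]
    w_inv = np.ones(n_series) if weights is None else 1.0 / np.asarray(weights, dtype=float)
    SW = S.T * w_inv                  # S' W^-1
    G = np.linalg.solve(SW @ S, SW)   # (S' W^-1 S)^-1 S' W^-1
    return S @ G @ y_hat

# Toy hierarchy: Total = A + B.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Incoherent base forecasts for one period: Total = 10, but A + B = 7.
y_hat = np.array([[10.0], [3.0], [4.0]])

# Hypothetical in-sample residuals (n_series x n_train_periods).
residuals = np.array([[2.0, -2.0, 1.0, -1.0],
                      [0.5, -0.5, 0.5, -0.5],
                      [1.0, -1.0, 0.0, 0.0]])

ols = reconcile(S, y_hat)                                      # W = I
wls_struct = reconcile(S, y_hat, weights=S.sum(axis=1))        # W = diag(S 1)
wls_var = reconcile(S, y_hat, weights=residuals.var(axis=1))   # W = diag(residual variances)

for name, result in [("ols", ols), ("wls_struct", wls_struct), ("wls_var", wls_var)]:
    print(name, result.ravel())
```

With OLS, the gap of 3 between the total and the sum of its parts is split evenly: the reconciled values are Total = 9, A = 4, B = 5. Only 'wls_var' needs the residuals, which is why 'ols' and 'wls_struct' remain available without them.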


Finally, we can compute the root mean-squared error (RMSE) for each method.

In [13]:
# Calculate the error for all levels and methods. The base forecasts ('base'), which here were
# built bottom-up from the bottom-level forecasts, are the reference for the relative RMSE.
rmse, rel_rmse = calc_level_method_rmse(forecasts_reconciled, actuals, base='base')
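
As a rough illustration of a per-level RMSE and an RMSE relative to a base method, consider the sketch below. The data, the helper `level_rmse`, and the exact averaging are assumptions; how `calc_level_method_rmse` computes its output internally is not shown in this example.

```python
import numpy as np
import pandas as pd

# Hypothetical actuals and forecasts for three series over two periods.
index = pd.MultiIndex.from_tuples(
    [("Total", "Total"), ("Bottom level", "A"), ("Bottom level", "B")],
    names=["Aggregation", "Value"])
actuals = pd.DataFrame([[10.0, 12.0], [4.0, 5.0], [6.0, 7.0]], index=index)
forecasts = {
    "base": pd.DataFrame([[13.0, 12.0], [4.0, 8.0], [6.0, 7.0]], index=index),
    "ols": pd.DataFrame([[11.0, 12.0], [5.0, 6.0], [6.0, 6.0]], index=index),
}

def level_rmse(forecast, actual):
    """Root of the mean squared error over all series and periods within each level."""
    sq_err = (forecast - actual) ** 2
    return np.sqrt(sq_err.mean(axis=1).groupby(level="Aggregation").mean())

# One column per method, one row per aggregation level.
rmse = pd.DataFrame({name: level_rmse(f, actuals) for name, f in forecasts.items()})

# Relative RMSE: divide every method by the base method, so 'base' is 1.0 everywhere.
rel_rmse = rmse.div(rmse["base"], axis=0)
print(rmse)
print(rel_rmse)
```

A relative RMSE below 1.0 means the method beats the base forecasts at that level; above 1.0 means it does worse.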

Let's look at the errors:

In [14]:
rmse

| Aggregation | base | ols | wls_var | mint_shrink |
|---|---|---|---|---|
| Total | 200.393987 | 200.395254 | 200.394177 | 200.393987 |
| Gender-Legal | 95.810686 | 95.810447 | 95.810493 | 95.810686 |
| State | 70.35557 | 70.355797 | 70.35543 | 70.35557 |
| State-Gender | 47.778624 | 47.778619 | 47.778519 | 47.778624 |
| State-Indigenous | 47.778624 | 47.778619 | 47.778519 | 47.778624 |
| State-Legal | 47.778624 | 47.778619 | 47.778519 | 47.778624 |
| Bottom level | 24.704899 | 24.704901 | 24.704931 | 24.704899 |
| All series | 46.053122 | 46.053171 | 46.053069 | 46.053122 |


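In this case, the reconciliation methods barely change the errors. That is expected: the forecasts for all levels were built by aggregating the bottom-level forecasts with df_S, so they were already coherent before reconciliation. This example only illustrates a minimal working setup for producing reconciled forecasts from a given set of existing forecasts.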
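
The near-identical errors can be checked on a toy case. The sketch below assumes a hypothetical two-series hierarchy and the textbook OLS projection (not necessarily hierts' implementation), and shows that forecasts built as S @ bottom-level forecasts, as in this example, pass through reconciliation unchanged.

```python
import numpy as np

# Toy hierarchy: Total = A + B.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Random bottom-level forecasts (n_bottom x n_periods), aggregated with S.
rng = np.random.default_rng(0)
bottom_forecasts = rng.poisson(lam=50.0, size=(2, 4)).astype(float)
coherent = S @ bottom_forecasts

# OLS projection matrix S (S'S)^-1 S'.
P = S @ np.linalg.solve(S.T @ S, S.T)
reconciled = P @ coherent

# Forecasts that are already coherent are left as they are, up to rounding.
print(np.allclose(reconciled, coherent))
```

Reconciliation only changes forecasts that violate the aggregation constraints, for example when each level is forecast with its own model.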