# Time Series Reconciliation: Bottom-Up Approach

Time series reconciliation is a process that ensures forecasts or observations at different aggregation levels respect a coherent hierarchical or grouped structure. It is particularly useful when you have individual forecasts for several entities (for example, stores, regions, or products) that must be consolidated to match totals at the global level.

## Bottom-Up Approach

In the **bottom-up** approach, forecasts are first produced at the most detailed levels and then aggregated to obtain estimates at the higher levels. For example, for your 5 store clusters, the individual forecasts are simply summed to obtain a global estimate:

$$\hat{y}_{global} = \sum_{i=1}^{n} \hat{y}_{cluster_i}$$
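
As a minimal sketch of the sum above, assume five cluster forecasts over three days. The numbers are invented for illustration and do not come from the notebook's data.

```python
import numpy as np

# Hypothetical forecasts: one row per cluster, one column per day.
cluster_forecasts = np.array([
    [10.0, 12.0, 11.0],
    [20.0, 18.0, 22.0],
    [5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0],
    [15.0, 14.0, 13.0],
])

# Bottom-up aggregation: sum over the cluster axis for each day.
global_forecast = cluster_forecasts.sum(axis=0)
print(global_forecast)  # [58. 59. 63.]
```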

## Key Steps

1. **Prepare the data**

2. **Produce the forecasts**:

   -> Fit a forecasting model to each cluster (SARIMA, Prophet Online).

3. **Aggregate the forecasts**

4. **Validate the results**
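
The four steps can be sketched end to end as follows. This is only an illustration under stated assumptions. The data is synthetic, and `seasonal_naive_forecast` is a hypothetical stand-in forecaster, not SARIMA or Prophet. `bottom_up` and `mean_absolute_error` are likewise illustrative helpers rather than functions from the notebook.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(train: pd.Series, horizon: int, season: int = 7) -> np.ndarray:
    """Stand-in forecaster (NOT SARIMA): repeat the last observed season."""
    last_season = train.to_numpy()[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def bottom_up(train_by_cluster: dict, horizon: int):
    # Step 2: one forecast per cluster.
    forecasts = pd.DataFrame({
        name: seasonal_naive_forecast(series, horizon)
        for name, series in train_by_cluster.items()
    })
    # Step 3: aggregate by summing the cluster forecasts for each date.
    total = forecasts.sum(axis=1)
    return forecasts, total

def mean_absolute_error(actual, predicted) -> float:
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

# Step 1: prepare (synthetic) data, 4 weeks of daily sales for 5 clusters.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=28, freq="D")
weekly = np.array([1.0, 1.1, 1.3, 0.9, 0.8, 1.2, 1.4])
series_by_cluster = {
    f"Cluster {i}": pd.Series(
        100 * (i + 1) * np.tile(weekly, 4) + rng.normal(0, 1, 28), index=dates
    )
    for i in range(5)
}
train = {name: s.iloc[:21] for name, s in series_by_cluster.items()}
test_total = sum(s.iloc[21:].to_numpy() for s in series_by_cluster.values())

forecasts, total = bottom_up(train, horizon=7)

# Step 4: validate the aggregated forecast against the held-out total.
print(mean_absolute_error(test_total, total))
```

Swapping the stand-in for a real model only changes step 2; the aggregation and validation steps stay the same.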



## -> Limitations
- **Potential bias**: Forecast errors at the lower level propagate and accumulate at the global level.
- **Lack of flexibility**: This approach does not automatically correct inconsistencies between levels.

Reconciliation aims to address these problems.


### Reconciliation of SARIMA Forecasts by Cluster

In this section, we reconcile the forecasts of the SARIMA models selected by the `auto_arima` function for each cluster. The reconciliation is proportional: the gap between the global forecast and the sum of the cluster forecasts is distributed across the clusters in proportion to each cluster's forecast.

The goal is to assess whether this approach improves the forecasts for each individual cluster.

In [114]:
import numpy as np
import pandas as pd
import plotly.express as px
from functions import *
import plotly.graph_objects as go

### 📥 Data Import and Preprocessing

In [115]:
Cluster_0 = pd.read_csv('Cluster_0.csv')
Cluster_1 = pd.read_csv('Cluster_1.csv')
Cluster_2 = pd.read_csv('Cluster_2.csv')
Cluster_3 = pd.read_csv('Cluster_3.csv')
Cluster_4 = pd.read_csv('Cluster_4.csv')
test_clustered = pd.read_csv("all_clusters_time_series_test.csv")
All_prevision = pd.read_csv('All_prevision.csv')
All_prevision.head()

Unnamed: 0,date,sales,Test_Forecast,test_Forecast_prophet_online,test_Forecast_prophet_online_global,Test_Forecast_Sarimax_global,Test_Forecast_prophet_online
0,2016-01-01,16433.39,923070.383856,56233.784306,690101.175402,883290.7,692107.112131
1,2016-01-02,1066677.0,906475.369087,70606.134717,869241.978477,622992.8,871434.184993
2,2016-01-03,1226736.0,845104.39878,80512.550636,918947.551836,1115351.0,921342.024227
3,2016-01-04,955956.9,951790.485532,55072.849906,670285.418443,905668.4,672867.408305
4,2016-01-05,835320.4,884962.934882,55272.762496,637981.921649,930421.6,640742.555912


In [116]:
# Adjustment: use, for each cluster, the forecasts of the SARIMAX model selected by auto_arima.
Cluster_0['Test_Forecast'] = Cluster_0['Test Forecast']
Cluster_1['Test_Forecast'] = Cluster_1['Test Forecast']
Cluster_2['Test_Forecast'] = Cluster_2['Test Forecast']
Cluster_3['Test_Forecast'] = Cluster_3['Test Forecast']
Cluster_4['Test_Forecast'] = Cluster_4['Test Forecast']

### 📊 Variable Definitions

In [117]:
y_global = np.array(All_prevision['Test_Forecast_Sarimax_global'])

y_clusters = np.array([Cluster_0['Test_Forecast'],Cluster_1['Test_Forecast'],Cluster_2['Test_Forecast'],Cluster_3['Test_Forecast'],Cluster_4['Test_Forecast']])

y_clusters_sum = np.sum(y_clusters, axis=0)

$ \Delta = y_{\text{global}} - y_{\text{clusters\_sum}} $  

In [118]:
delta = y_global - y_clusters_sum

$ y_{\text{clusters\_adjusted}} = y_{\text{clusters}} + \left( \Delta \cdot \frac{y_{\text{clusters}}}{y_{\text{clusters\_sum}}} \right) $

In [119]:
y_clusters_adjusted = y_clusters + (delta * (y_clusters / y_clusters_sum))
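
To make the adjustment concrete, here is a small worked example with invented numbers (two clusters, three time steps). It checks two properties of the formula: the adjusted cluster forecasts sum to the global forecast, and the adjustment is algebraically the same as rescaling each cluster by the ratio `y_global / y_clusters_sum`.

```python
import numpy as np

# Hypothetical values: 2 clusters (rows) over 3 time steps (columns).
y_clusters = np.array([[30.0, 40.0, 10.0],
                       [70.0, 60.0, 30.0]])
y_global = np.array([110.0, 90.0, 40.0])

y_clusters_sum = y_clusters.sum(axis=0)   # [100, 100, 40]
delta = y_global - y_clusters_sum         # [10, -10, 0]

# Same formula as in the notebook: each cluster receives a share of delta
# proportional to its own forecast (delta broadcasts across the rows).
y_clusters_adjusted = y_clusters + delta * (y_clusters / y_clusters_sum)

# Equivalent form: multiply every cluster by the same ratio per time step.
y_rescaled = y_clusters * (y_global / y_clusters_sum)

print(np.allclose(y_clusters_adjusted, y_rescaled))
print(np.allclose(y_clusters_adjusted.sum(axis=0), y_global))
```

Because the formula divides by `y_clusters_sum`, a time step where the cluster forecasts sum to zero would produce `nan` or `inf`; such dates would need separate handling.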

In [120]:
# Sum the adjusted cluster forecasts at each time step.
All_prevision['Test_Forecast_reconciliation'] = y_clusters_adjusted.sum(axis=0)

In [121]:
fig = px.line(All_prevision, x='date', y=['sales', 'Test_Forecast','Test_Forecast_Sarimax_global','Test_Forecast_reconciliation'],title="SARIMA test predictions")
fig.show()

After reconciliation, we have $\text{Global forecast} = \sum \text{Cluster forecasts}$.

In [122]:
test_clustered.loc[test_clustered['cluster'] == 'Cluster 0', 'sales']

0           0.000000
1       86627.277016
2      110534.310890
3       74601.247117
4       68187.265000
           ...      
603         0.000000
604         0.000000
605         0.000000
606         0.000000
607         0.000000
Name: sales, Length: 608, dtype: float64

In [123]:
df = pd.DataFrame(y_clusters_adjusted).T
df['date'] = All_prevision['date']
df.columns = ['after reconciliation 0','after reconciliation 1','after reconciliation 2','after reconciliation 3','after reconciliation 4','date']
df['before reconciliation 0'] = Cluster_0['Test_Forecast']
df['before reconciliation 1'] = Cluster_1['Test_Forecast']
df['before reconciliation 2'] = Cluster_2['Test_Forecast']
df['before reconciliation 3'] = Cluster_3['Test_Forecast']
df['before reconciliation 4'] = Cluster_4['Test_Forecast']
df

Unnamed: 0,after reconciliation 0,after reconciliation 1,after reconciliation 2,after reconciliation 3,after reconciliation 4,date,before reconciliation 0,before reconciliation 1,before reconciliation 2,before reconciliation 3,before reconciliation 4
0,88635.965084,188061.795644,166527.909988,238452.879175,201612.129664,2016-01-01,79084.115496,167795.327247,148582.039518,212756.018611,179885.410315
1,51556.439432,139543.336352,130762.266501,169755.862906,131374.879364,2016-01-02,69999.225206,189460.822642,177538.585714,230480.983724,178370.342644
2,94626.316702,254961.901478,242762.546019,314265.392584,208734.617462,2016-01-03,87532.289577,235847.698284,224562.914601,290705.274246,193086.021069
3,79591.126212,216922.198985,201929.694480,251363.476212,155861.925824,2016-01-04,98110.070532,267394.787023,248913.886647,309849.722627,192127.254176
4,77882.394919,215809.707956,191100.371199,263110.606586,182518.540275,2016-01-05,77741.496533,215419.282885,190754.648217,262634.608642,182188.342831
...,...,...,...,...,...,...,...,...,...,...,...
603,155444.756585,301014.664816,179998.070843,402593.816925,138069.862674,2017-08-27,189985.820727,367902.523074,219995.077157,492053.372586,168750.086875
604,191867.222005,372933.165785,229111.767248,501903.268112,169577.567742,2017-08-28,192338.622697,373849.429389,229674.674473,503136.399782,169994.204736
605,212243.901953,412946.706781,256707.889949,555754.325816,185869.230593,2017-08-29,194193.867844,377828.137652,234876.468047,508490.849950,170062.199525
606,267778.014984,518091.932751,311543.808678,695322.453099,237687.195759,2017-08-30,190903.550394,369356.645647,222104.937119,495707.330466,169451.288061


In [127]:
# Plot every series except 'date', which serves as the x-axis.
fig = px.line(df, x='date', y=df.columns.drop('date'), title='Reconciled time series grouped by cluster')
fig.show()


In [125]:
fig = px.line(
    test_clustered,
    x='date',
    y='sales',
    color='cluster',
    title='Time series grouped by cluster',
    labels={'date': 'Date', 'sales': 'Sales', 'cluster': 'Cluster'}
)

fig.show()





Let us plot all of these curves on the same figure so that they can be compared.

In [128]:
fig = go.Figure()

# Skip 'date' so that it is used only as the x-axis, not plotted as a series.
for col in df.columns.drop('date'):
    fig.add_trace(go.Scatter(
        x=df['date'],
        y=df[col],
        mode='lines',
        name=col
    ))

for cluster in test_clustered['cluster'].unique():
    cluster_data = test_clustered[test_clustered['cluster'] == cluster]
    fig.add_trace(go.Scatter(
        x=cluster_data['date'],
        y=cluster_data['sales'],
        mode='lines',
        name=cluster
    ))

fig.update_layout(
    title='Time series grouped by cluster and reconciled',
    xaxis_title='Date',
    yaxis_title='Sales',
    legend_title='Series'
)
fig.show()

### Analysis of the Results after Reconciliation

Selecting a combination of **"after reconciliation $i$"**, **"before reconciliation $i$"**, and **"Cluster $i$"** in the plot legend shows that reconciliation yields a forecast for the $i$-th cluster that is closer to the actual values (Cluster $i$).
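
The visual comparison could be complemented by an error metric. The sketch below uses mean absolute error as one possible choice. The helper `mae_before_after` is hypothetical and the toy numbers are invented, so the printed scores say nothing about the notebook's actual results.

```python
import numpy as np
import pandas as pd

def mae_before_after(actual, before, after) -> dict:
    """Mean absolute error of a cluster's forecast before and after reconciliation."""
    actual = np.asarray(actual, dtype=float)
    return {
        "mae_before": float(np.mean(np.abs(actual - np.asarray(before, dtype=float)))),
        "mae_after": float(np.mean(np.abs(actual - np.asarray(after, dtype=float)))),
    }

# Toy data standing in for one cluster's actual sales and its two forecasts.
actual = pd.Series([100.0, 120.0, 90.0])
before = pd.Series([110.0, 100.0, 95.0])
after = pd.Series([105.0, 115.0, 92.0])

scores = mae_before_after(actual, before, after)
print(scores)
```

In the notebook, the inputs would be the `sales` values of `test_clustered` for `'Cluster i'` and the `'before reconciliation i'` and `'after reconciliation i'` columns of `df`, assuming the rows are aligned on the same dates.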