## Objective: Final Prediction Computation

This notebook generates the final model predictions and formats them for submission on Codabench. 

The evaluation dataset comprises data from 39 stations included in the training set and 13 stations exclusive to the evaluation set.

<img src="../images/notebook-4.png" alt="Experiment Diagram" style="width:75%; text-align:center;" />


### 1. Imports
Start by importing the necessary libraries, configuring environment paths, and loading custom utility functions.


In [1]:
import os
import sys
import zipfile

import joblib
import pandas as pd

# Add the directory three levels up to the import path so that `src` can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', '..', '..')))

from src.utils.analysis import create_predict_function, create_quantile_function
from src.utils.model import load_models_auto



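Define the constants:
* *ALPHA* is the level passed to `create_quantile_function` when computing the prediction intervals.
* *NUMBER_OF_WEEK* is the number of prediction weeks (weeks 0 to 3).
* *USE_AUTO_SCAN* switches between auto-loading the models and loading manually specified model files.
* *FINAL_MODEL* selects the model family to load: `mapie`, `qrf`, `lgbm`, `ebm_ensemble` or `deep_ensemble`.
* *MODEL_DIR* is the directory from which the trained models are loaded.
* *EVAL_DIR* is the directory used to store inference / evaluation data. It must be the same as the one defined in *01 Training > 01 - Modelisation*.
* *DATASET_SPEC* identifies the dataset variant to load, i.e. the file `dataset_{DATASET_SPEC}.csv` in *EVAL_DIR*.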

In [2]:
ALPHA = 0.1
NUMBER_OF_WEEK = 4
USE_AUTO_SCAN = True  # True: auto-load the latest models; False: manually load specific model files
FINAL_MODEL = "lgbm" 
MODEL_DIR = "../../../models/"
EVAL_DIR = "../../../data/evaluation/"
DATASET_SPEC = "soil_pca"

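The 0.05 and 0.95 quantile models loaded for `lgbm` later in this notebook are consistent with reading `ALPHA = 0.1` as the miscoverage level of a central 90% prediction interval. A minimal sketch of that relationship, assuming this interpretation holds (the helper name `interval_quantiles` is hypothetical and not part of `src.utils`):

```python
def interval_quantiles(alpha: float) -> tuple[float, float]:
    """Return the (lower, upper) quantile levels of a central 1 - alpha interval."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return alpha / 2, 1 - alpha / 2


# With alpha = 0.1 the interval spans the 5th to the 95th percentile.
lower_q, upper_q = interval_quantiles(0.1)
print(lower_q, upper_q)
```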

### 2. Data and model loading

Loading of the inference dataset.

In [3]:
# load the dataset
inference_data = pd.read_csv(f"{EVAL_DIR}dataset_{DATASET_SPEC}.csv")
inference_data = inference_data.set_index("ObsDate")


Loading of the final models.

In [4]:
# Load models based on conditions
final_models = []
if FINAL_MODEL == "mapie":
    if USE_AUTO_SCAN:
        final_models = load_models_auto("mapie_quantile", f"{MODEL_DIR}final/")
    else:
        final_models.append(joblib.load(f"{MODEL_DIR}final/mapie_quantile_2025-01-17_15-15-04_week0.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/mapie_quantile_2025-01-17_15-15-11_week1.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/mapie_quantile_2025-01-17_15-15-17_week2.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/mapie_quantile_2025-01-17_15-15-17_week3.pkl"))
elif FINAL_MODEL == "qrf":

    if USE_AUTO_SCAN:
        final_models = load_models_auto("qrf_quantile", f"{MODEL_DIR}final/")
    else:
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-04_week0.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-11_week1.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-17_week2.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-17_week3.pkl"))
elif FINAL_MODEL == "lgbm":
    if USE_AUTO_SCAN:
        # One model per quantile (0.05, 0.5, 0.95) and per week
        models_low = load_models_auto("lgbm_quantile_q0.05", f"{MODEL_DIR}final/")
        models_med = load_models_auto("lgbm_quantile_q0.5", f"{MODEL_DIR}final/")
        models_high = load_models_auto("lgbm_quantile_q0.95", f"{MODEL_DIR}final/")
        # Group the models by week as [low, median, high]
        final_models = [
            [models_low[i], models_med[i], models_high[i]]
            for i in range(NUMBER_OF_WEEK)
        ]
    else:
        # FIXME: these paths point to QRF files (copied from the "qrf" branch).
        # Replace them with the LightGBM quantile files and group them by week
        # as [low, median, high], as in the auto-scan branch above.
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-04_week0.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-11_week1.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-17_week2.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/qrf_quantile_2025-01-17_15-15-17_week3.pkl"))
elif FINAL_MODEL == "ebm_ensemble":
    print("Loading EBM Ensemble")
    if USE_AUTO_SCAN:
        final_models = load_models_auto("ebm_ensemble", f"{MODEL_DIR}final/")
    else:
        final_models.append(joblib.load(f"{MODEL_DIR}final/ebm_ensemble_2025-01-17_15-15-04_week0.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/ebm_ensemble_2025-01-17_15-15-11_week1.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/ebm_ensemble_2025-01-17_15-15-17_week2.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/ebm_ensemble_2025-01-17_15-15-17_week3.pkl"))
elif FINAL_MODEL == "deep_ensemble":
    if USE_AUTO_SCAN:
        final_models = load_models_auto("deep_ensemble", f"{MODEL_DIR}final/")
    else:
        final_models.append(joblib.load(f"{MODEL_DIR}final/deep_ensemble_2025-01-17_15-15-04_week0.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/deep_ensemble_2025-01-17_15-15-11_week1.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/deep_ensemble_2025-01-17_15-15-17_week2.pkl"))
        final_models.append(joblib.load(f"{MODEL_DIR}final/deep_ensemble_2025-01-17_15-15-17_week3.pkl"))
else:
    # Fail early instead of continuing with an empty model list
    raise ValueError(f"Unknown FINAL_MODEL: {FINAL_MODEL!r}")

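`load_models_auto` lives in `src.utils.model` and its implementation is not shown in this notebook. As an illustration only, the sketch below shows one way such a scan could pick the most recent file per prediction week, assuming file names follow the `<prefix>_<YYYY-MM-DD_HH-MM-SS>_week<i>.pkl` pattern visible in the manual branches. The function `latest_model_paths` is hypothetical and returns paths rather than loaded models.

```python
import re
from pathlib import Path


def latest_model_paths(prefix: str, directory: str, n_weeks: int = 4) -> list[Path]:
    """Return, for each week, the path of the most recently timestamped model file."""
    # Timestamps such as 2025-01-17_15-15-04 sort chronologically as plain strings.
    pattern = re.compile(
        re.escape(prefix) + r"_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_week(\d+)\.pkl"
    )
    latest: dict[int, tuple[str, Path]] = {}
    for path in Path(directory).glob("*.pkl"):
        match = pattern.fullmatch(path.name)
        if match is None:
            continue  # different prefix, e.g. "q0.5" does not match "q0.05" files
        timestamp, week = match.group(1), int(match.group(2))
        if week not in latest or timestamp > latest[week][0]:
            latest[week] = (timestamp, path)
    missing = [week for week in range(n_weeks) if week not in latest]
    if missing:
        raise FileNotFoundError(f"No '{prefix}' model found for weeks {missing}")
    return [latest[week][1] for week in range(n_weeks)]
```

Each returned path could then be passed to `joblib.load`, as the manual branches do.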

In [5]:
final_models

[[LGBMRegressor(alpha=0.05, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.5, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.95, max_depth=5, objective='quantile')],
 [LGBMRegressor(alpha=0.05, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.5, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.95, max_depth=5, objective='quantile')],
 [LGBMRegressor(alpha=0.05, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.5, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.95, max_depth=5, objective='quantile')],
 [LGBMRegressor(alpha=0.05, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.5, max_depth=5, objective='quantile'),
  LGBMRegressor(alpha=0.95, max_depth=5, objective='quantile')]]

### 3. Predictions computation

The evaluation data include a spatio-temporal split and a temporal-only split.

<img src="../images/eval.png" alt="Experiment Diagram" style="width:50%;" />

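To make the two splits concrete: a station that also appears in the training set can be read as belonging to the temporal-only split, while a station absent from training falls in the spatio-temporal split. The sketch below labels stations under that assumption; the station codes are made up and `label_split` is a hypothetical helper, not part of the project code.

```python
def label_split(eval_stations, train_stations):
    """Map each evaluation station to the split it belongs to."""
    seen = set(train_stations)
    return {
        station: "temporal" if station in seen else "spatio-temporal"
        for station in eval_stations
    }


# Made-up station codes, for illustration only.
train_stations = ["A001", "A002", "A003"]
eval_stations = ["A001", "A003", "B100"]

print(label_split(eval_stations, train_stations))
```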

In [6]:
predictions = inference_data[['station_code']].copy()
y_pred_test_quantile = {}
y_pred_test = {}
X_test = inference_data.drop(columns=['station_code'])

if FINAL_MODEL == "qrf":
    # Reorder the columns to match the feature order stored in the model
    X_test = X_test[final_models[0].feature_names_in_]

for i in range(NUMBER_OF_WEEK):
    print(FINAL_MODEL)
    # Build the point-prediction and interval functions for week i
    predict_adjusted = create_predict_function(final_models, i, FINAL_MODEL)
    quantile_adjusted = create_quantile_function(final_models, i, FINAL_MODEL, ALPHA)

    y_pred_test[i] = predict_adjusted(X_test)
    y_pred_test_quantile[i] = quantile_adjusted(X_test)

for i in range(NUMBER_OF_WEEK):
    predictions[f"week_{i}_pred"] = y_pred_test[i]
    # Column 1 holds the upper bound, column 0 the lower bound
    predictions[f"week_{i}_sup"] = y_pred_test_quantile[i][:, 1]
    predictions[f"week_{i}_inf"] = y_pred_test_quantile[i][:, 0]


lgbm
lgbm
lgbm
lgbm

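`create_predict_function` and `create_quantile_function` come from `src.utils.analysis`, and their internals are not shown here. For the `lgbm` case, where each week holds a `[low, median, high]` triple of quantile regressors, one plausible reading is that the median model gives the point prediction and the outer two give the interval bounds. The sketch below illustrates that idea with stand-in models; `ConstantModel`, `predict_point`, and `predict_interval` are hypothetical names.

```python
import numpy as np


class ConstantModel:
    """Stand-in for a fitted quantile regressor: predicts one value per row."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


def predict_point(week_models, X):
    """Point prediction from the median (second) model of the triple."""
    return week_models[1].predict(X)


def predict_interval(week_models, X):
    """Stack lower and upper quantile predictions into shape (n_samples, 2)."""
    lower = week_models[0].predict(X)
    upper = week_models[2].predict(X)
    return np.column_stack([lower, upper])


week_models = [ConstantModel(1.0), ConstantModel(2.0), ConstantModel(3.0)]
X = np.zeros((5, 2))

interval = predict_interval(week_models, X)
print(predict_point(week_models, X).shape, interval.shape)
```

With this layout, column 0 holds the lower bound and column 1 the upper bound, which matches how `week_{i}_inf` and `week_{i}_sup` are filled in the cell above.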

### 4. Saving of the predictions

Saving of the predictions as a CSV file.

> The file must be named `predictions.csv`

In [7]:
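# Save the predictions to a CSV file in the folder of the selected model
predictions["ObsDate"] = X_test.index
os.makedirs(f"{EVAL_DIR}{FINAL_MODEL}", exist_ok=True)
predictions.to_csv(f"{EVAL_DIR}{FINAL_MODEL}/predictions.csv", index=False)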

Compression of the submission file.

> The file needs to be compressed for Codabench.

In [8]:
# Create a ZIP file containing predictions.csv at the archive root
with zipfile.ZipFile(f"{EVAL_DIR}{FINAL_MODEL}/predictions.zip", 'w', zipfile.ZIP_DEFLATED) as zipf:
    zipf.write(f"{EVAL_DIR}{FINAL_MODEL}/predictions.csv", "predictions.csv")


You are now ready to submit: go to Codabench and upload the generated zip file in My Submissions > Phase 1.


You don't have to use this notebook to submit, but the file must include the following columns:
* station_code: Identification code of the station.
* ObsDate: Date of the prediction.
* For every prediction week i from 0 to 3:
    * week_i_pred: predicted value.
    * week_i_inf: lower bound of the prediction interval.
    * week_i_sup: upper bound of the prediction interval.

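If the file is produced by other means, a quick check of the header can catch missing columns before uploading. The sketch below builds the expected column names from the list above and compares them with a CSV header; `expected_columns` and `missing_columns` are hypothetical helpers, and only the presence of columns is checked, not their content.

```python
import csv
import io


def expected_columns(n_weeks: int = 4) -> list[str]:
    """Column names required in predictions.csv for n_weeks prediction weeks."""
    columns = ["station_code", "ObsDate"]
    for i in range(n_weeks):
        columns += [f"week_{i}_pred", f"week_{i}_inf", f"week_{i}_sup"]
    return columns


def missing_columns(csv_file, n_weeks: int = 4) -> list[str]:
    """Return the required columns absent from the header of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    return [name for name in expected_columns(n_weeks) if name not in header]


# In-memory example with week_3_sup left out on purpose.
header = [c for c in expected_columns() if c != "week_3_sup"]
sample = io.StringIO(",".join(header) + "\n")
print(missing_columns(sample))  # ['week_3_sup']
```

For a file on disk, pass `open("predictions.csv", newline="")` instead of the in-memory buffer.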

Save the dataset as a CSV file named predictions.csv. 
> The file must be named predictions.csv, but the .zip file can have any name.


Compress the CSV file into a .zip archive. 
> You cannot submit an uncompressed file. Ensure that the software you use does not create a subfolder inside the archive.

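One way to confirm that an archive has no subfolder is to list its entries and check that `predictions.csv` sits at the root. A minimal sketch using the standard `zipfile` module; the helper name `has_root_predictions` is hypothetical, and the archives are built in memory only to keep the example self-contained.

```python
import io
import zipfile


def has_root_predictions(archive) -> bool:
    """True if the archive contains predictions.csv at its root, not in a subfolder."""
    with zipfile.ZipFile(archive) as zf:
        return "predictions.csv" in zf.namelist()


# Build two small archives in memory: one flat, one with a subfolder.
flat, nested = io.BytesIO(), io.BytesIO()
with zipfile.ZipFile(flat, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("predictions.csv", "station_code,ObsDate\n")
with zipfile.ZipFile(nested, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("submission/predictions.csv", "station_code,ObsDate\n")

print(has_root_predictions(flat), has_root_predictions(nested))  # True False
```

The same function accepts a file path, so it can be pointed at the `.zip` file you are about to upload.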

Submit your file in [Codabench](https://www.codabench.org/competitions/4335): 
> My Submissions > Phase 1 (keep all the tasks selected):

<img src="../images/submissions.png" alt="Experiment Diagram" style="width:75%; text-align:center;" />
