# |PIX Forecasting - Modeling| IP45D V2.0 - Training the V2 Artefact model for out-of-sample inference

**Objective**: Train the pool of models on all available data so they can be used for out-of-sample inference.

- Train the V2 model using both market and Suzano features.
- Log each trained model to the MLflow experiment and register it in the model registry.

## 1.0 Imports

### 1.1 Setting working directory

In [1]:
import sys
import os

# basename handles both "\\" and "/" path separators
if os.path.basename(os.getcwd()) != "ip_forecasting":
    # `__file__` is not defined inside a notebook, so start from the
    # current working directory (the folder the notebook runs in)
    notebook_dir = os.getcwd()

    # Move up one level to the project root
    project_root = os.path.abspath(os.path.join(notebook_dir, ".."))

    # Change working directory
    os.chdir(project_root)
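
The cell above assumes the notebook sits exactly one level below the project root. A minimal sketch of a more tolerant variant walks up the directory tree until it finds a folder with the expected name. The helper name `find_project_root` is hypothetical and not part of the project.

```python
import os
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, root_name: str = "ip_forecasting") -> Optional[Path]:
    """Return `start` or its closest ancestor named `root_name`, else None."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if candidate.name == root_name:
            return candidate
    return None


# Hypothetical usage: only change directory when a matching ancestor exists.
project_root = find_project_root(Path.cwd())
if project_root is not None:
    os.chdir(project_root)
```

If no ancestor matches, the working directory is left untouched rather than silently moved one level up.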

In [2]:
import pandas as pd
import numpy as np
import pandas_gbq
import locale

import src.utils.useful_functions as uf
from src.models.train import *
from src.models.evaluate import *

from src.visualization.data_viz import *
from scripts.run_train_and_predict import *
from src.data.data_loader import load_and_preprocess_model_dataset

%load_ext autoreload
%autoreload 2

pd.set_option('display.max_columns', None)





In [3]:
type(get_models()['Ridge']).__name__

'Ridge'

### 1.2 Parameters setting

In [4]:
TARGET_COL = model_config["target_col"]
PREDICTED_COL = model_config["predicted_col"]
FORECAST_HORIZON = model_config["forecast_horizon"]
N_SPLITS = model_config["n_windows"]
MODEL_NAME = model_config["model_name"]
USE_TUNED_PARMS = model_config["use_tuned_params"]
EXPERIMENT_PATH = model_config["mlflow_experiment_path_production"]
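
Each lookup above raises a bare `KeyError` if `model_config` is missing an entry. A small check run beforehand can report every missing key at once. This is only a sketch: the key names come from the cell above, while `missing_config_keys` and the placeholder values in `example_config` are assumptions made for illustration.

```python
from typing import Any, List, Mapping

# Keys read from `model_config` in the parameters cell.
REQUIRED_KEYS = (
    "target_col",
    "predicted_col",
    "forecast_horizon",
    "n_windows",
    "model_name",
    "use_tuned_params",
    "mlflow_experiment_path_production",
)


def missing_config_keys(config: Mapping[str, Any]) -> List[str]:
    """Return the required keys absent from `config`, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical config with placeholder values, for illustration only.
example_config = {
    "target_col": "target",
    "predicted_col": "forecast",
    "forecast_horizon": 4,
    "n_windows": 3,
    "model_name": "example_model",
    "use_tuned_params": False,
}

print(missing_config_keys(example_config))  # ['mlflow_experiment_path_production']
```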

## 2.0 Data Loading

In [5]:
feature_df = load_and_preprocess_model_dataset("featurized_df")
feature_df = feature_df.set_index("date")

## 4.0 Modeling: Multiple models

### 4.1 Training on the full dataset and registering the models

In [5]:
trained_models = training_pipeline(
    model_df=feature_df,
    models_list=models_list,
    experiment_path=EXPERIMENT_PATH,
    model_name=MODEL_NAME,
    load_best_params=False,
)

2025-01-27 11:42:25,333 - scripts.run_train_and_predict - INFO - Starting the training pipeline...
2025-01-27 11:42:25,383 - scripts.run_train_and_predict - INFO - Most recent date on training data: 2024-11-29 00:00:00
2025-01-27 11:42:25,805 - scripts.run_train_and_predict - INFO - Training model [RandomForestRegressor] of flavor [mix]...
Function 'train' executed in 0.83 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 193.01it/s]


2025-01-27 11:42:35,956 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:42:35,956 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/6ba3afde7a5a42aabd31cb597e8c9581/get_models_RandomForestRegressor_mix_01_2025


Registered model 'fcst_RandomForestRegressor_mix' already exists. Creating a new version of this model...
Created version '10' of model 'fcst_RandomForestRegressor_mix'.


2025-01-27 11:42:36,462 - scripts.run_train_and_predict - INFO - Training model [XGBRegressor] of flavor [mix]...
[0]	validation_0-rmse:75.86817	validation_1-rmse:76.21745
[1]	validation_0-rmse:53.93809	validation_1-rmse:55.57516
[2]	validation_0-rmse:38.48476	validation_1-rmse:41.37324
[3]	validation_0-rmse:27.63600	validation_1-rmse:32.05943
[4]	validation_0-rmse:20.01516	validation_1-rmse:25.83018
[5]	validation_0-rmse:14.65725	validation_1-rmse:22.47275
[6]	validation_0-rmse:10.93058	validation_1-rmse:20.48332
[7]	validation_0-rmse:8.32154	validation_1-rmse:19.41821
[8]	validation_0-rmse:6.51430	validation_1-rmse:18.70125
[9]	validation_0-rmse:5.26194	validation_1-rmse:18.05239
[10]	validation_0-rmse:4.41267	validation_1-rmse:17.82256
[11]	validation_0-rmse:3.79670	validation_1-rmse:17.64857
[12]	validation_0-rmse:3.37282	validation_1-rmse:17.54501
[13]	validation_0-rmse:3.08870	validation_1-rmse:17.50244
[14]	validation_0-rmse:2.90694	validation_1-rmse:17.47780
[15]	validation_0-r

Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 199.99it/s]


2025-01-27 11:42:44,999 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:42:44,999 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/2ea393c0cdd4473e897b80905be0b2d2/get_models_XGBRegressor_mix_01_2025


Registered model 'fcst_XGBRegressor_mix' already exists. Creating a new version of this model...
Created version '11' of model 'fcst_XGBRegressor_mix'.
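
The XGBoost evaluation lines above report two RMSE series per boosting round. A minimal sketch for tracking the gap between them follows, assuming `validation_0` is the training split and `validation_1` a held-out split, which the log itself does not confirm. `parse_eval_log` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re

# A few evaluation lines copied from the log above (whitespace-separated).
LOG = """\
[0]  validation_0-rmse:75.86817  validation_1-rmse:76.21745
[5]  validation_0-rmse:14.65725  validation_1-rmse:22.47275
[10]  validation_0-rmse:4.41267  validation_1-rmse:17.82256
[14]  validation_0-rmse:2.90694  validation_1-rmse:17.47780
"""

EVAL_LINE = re.compile(
    r"\[(\d+)\]\s+validation_0-rmse:([\d.]+)\s+validation_1-rmse:([\d.]+)"
)


def parse_eval_log(text):
    """Return (iteration, rmse_0, rmse_1) for each complete evaluation line."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in EVAL_LINE.finditer(text)
    ]


rows = parse_eval_log(LOG)
for iteration, rmse_0, rmse_1 in rows:
    # A widening gap suggests the model keeps fitting the first split
    # while the second split stops improving.
    print(f"iter {iteration:>2}: gap = {rmse_1 - rmse_0:.5f}")
```

Truncated lines, such as the final one in the output above, do not match the pattern and are skipped.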


2025-01-27 11:42:46,068 - scripts.run_train_and_predict - INFO - Training model [LGBMRegressor] of flavor [mix]...
Function 'train' executed in 0.43 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 251.47it/s]


2025-01-27 11:42:54,338 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:42:54,338 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/bbc8b83fbd164a989996f8a0def1a399/get_models_LGBMRegressor_mix_01_2025


Registered model 'fcst_LGBMRegressor_mix' already exists. Creating a new version of this model...
Created version '11' of model 'fcst_LGBMRegressor_mix'.


2025-01-27 11:42:54,884 - scripts.run_train_and_predict - INFO - Training model [CatBoostRegressor] of flavor [mix]...
Learning rate set to 0.039032
0:	learn: 103.6015012	test: 103.6015012	test1: 101.5443889	best: 101.5443889 (0)	total: 156ms	remaining: 2m 36s
999:	learn: 2.7271588	test: 2.7271588	test1: 16.5292452	best: 16.5181977 (671)	total: 3.1s	remaining: 0us

bestTest = 16.51819767
bestIteration = 671

Function 'train' executed in 3.45 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 353.61it/s] 


2025-01-27 11:43:05,386 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:43:05,386 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/4dfbb1a07fc040afad5cad5f7d1814b2/get_models_CatBoostRegressor_mix_01_2025


Registered model 'fcst_CatBoostRegressor_mix' already exists. Creating a new version of this model...
Created version '11' of model 'fcst_CatBoostRegressor_mix'.


2025-01-27 11:43:05,801 - scripts.run_train_and_predict - INFO - Training model [auto_arima] of flavor [mix]...
Function 'train' executed in 3.12 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 315.61it/s]


2025-01-27 11:43:16,524 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:43:16,525 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/20f398c5686b48afae50376ea6930e33/get_models_auto_arima_mix_01_2025


Registered model 'fcst_auto_arima_mix' already exists. Creating a new version of this model...
Created version '8' of model 'fcst_auto_arima_mix'.


2025-01-27 11:43:16,900 - scripts.run_train_and_predict - INFO - Training model [ExtraTreeRegressor] of flavor [mix]...
Function 'train' executed in 0.01 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 506.70it/s]


2025-01-27 11:43:22,755 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:43:22,765 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/84cf5a48647749e78d7432544dc73137/get_models_ExtraTreeRegressor_mix_01_2025


Registered model 'fcst_ExtraTreeRegressor_mix' already exists. Creating a new version of this model...
Created version '10' of model 'fcst_ExtraTreeRegressor_mix'.


2025-01-27 11:43:23,321 - scripts.run_train_and_predict - INFO - Training model [Ridge] of flavor [mix]...
Function 'train' executed in 0.01 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 395.32it/s] 

2025-01-27 11:43:30,160 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:43:30,160 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/063f4c4bed8b4c2b8a7c7119bd263e3f/get_models_Ridge_mix_01_2025


Registered model 'fcst_Ridge_mix' already exists. Creating a new version of this model...
Created version '2' of model 'fcst_Ridge_mix'.


2025-01-27 11:43:30,409 - scripts.run_train_and_predict - INFO - Training model [SVR] of flavor [mix]...
Function 'train' executed in 0.04 seconds.


Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 306.99it/s] 


2025-01-27 11:43:36,453 - scripts.run_train_and_predict - INFO - Registering the model to MLflow...
2025-01-27 11:43:36,455 - scripts.run_train_and_predict - INFO - Model registration URI: runs:/98499425fb21460ba458a19253824e93/get_models_SVR_mix_01_2025
2025-01-27 11:43:36,597 - scripts.run_train_and_predict - INFO - Training Pipeline completed successfully!
Function 'training_pipeline' executed in 71.26 seconds.


Registered model 'fcst_SVR_mix' already exists. Creating a new version of this model...
Created version '2' of model 'fcst_SVR_mix'.
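
The registration messages in the output above can be collected into a name-to-version map, which is handy when pinning exact versions for later inference. The sketch below parses the messages copied from this run and builds registry URIs of the form `models:/<name>/<version>`. The helpers `registered_versions` and `model_uri` are hypothetical names, not part of the project.

```python
import re

# Registration messages copied from the output above.
LOG = """\
Created version '10' of model 'fcst_RandomForestRegressor_mix'.
Created version '11' of model 'fcst_XGBRegressor_mix'.
Created version '11' of model 'fcst_LGBMRegressor_mix'.
Created version '11' of model 'fcst_CatBoostRegressor_mix'.
Created version '8' of model 'fcst_auto_arima_mix'.
Created version '10' of model 'fcst_ExtraTreeRegressor_mix'.
Created version '2' of model 'fcst_Ridge_mix'.
Created version '2' of model 'fcst_SVR_mix'.
"""

CREATED = re.compile(r"Created version '(\d+)' of model '([^']+)'")


def registered_versions(text):
    """Map each model name to the highest version reported in `text`."""
    versions = {}
    for version, name in CREATED.findall(text):
        versions[name] = max(int(version), versions.get(name, 0))
    return versions


def model_uri(name, version):
    """Build an MLflow registry URI of the form models:/<name>/<version>."""
    return f"models:/{name}/{version}"


versions = registered_versions(LOG)
for name, version in versions.items():
    print(model_uri(name, version))
```

Each URI could then be passed to `mlflow.pyfunc.load_model` for out-of-sample inference, assuming the models were logged in a flavor that supports the pyfunc interface and the same tracking server is configured.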
