# |PIX Forecasting - Cross-validation| IP45D - Cross-validation and Feature Selection

**Objective**: Run cross-validation on historical data to benchmark the proposed models and identify the most informative features.

- In this version, we add and/or remove variables.
- We test new lags of the target and the external variables.
- We test moving averages of the target and the external variables.
- We test Bollinger Bands.
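The candidate features listed above (lags, moving averages and Bollinger Bands) can be sketched with pandas as follows. This is a minimal illustration, not the project's feature pipeline. The helper `add_candidate_features`, the lag set, the window lengths and the Bollinger defaults (20 periods, 2 standard deviations, the common textbook setting) are all assumptions. Rolling statistics are computed on the series shifted by one row so that each feature only uses values observed before the row it describes.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def add_candidate_features(
    df: pd.DataFrame,
    cols: Sequence[str],
    lags: Sequence[int] = (1, 2, 4),
    windows: Sequence[int] = (4, 8),
    bb_window: int = 20,
    bb_k: float = 2.0,
) -> pd.DataFrame:
    """Return a copy of df with lag, moving-average and Bollinger Band columns (hypothetical helper)."""
    out = df.copy()
    for col in cols:
        # Lags: the value observed `lag` rows earlier
        for lag in lags:
            out[f"{col}_lag_{lag}"] = out[col].shift(lag)
        # Shift by one row so rolling statistics only use past observations
        past = out[col].shift(1)
        # Trailing simple moving averages over `w` rows
        for w in windows:
            out[f"{col}_ma_{w}"] = past.rolling(w).mean()
        # Bollinger Bands: rolling mean +/- bb_k rolling (sample) standard deviations
        mid = past.rolling(bb_window).mean()
        std = past.rolling(bb_window).std()
        out[f"{col}_bb_mid"] = mid
        out[f"{col}_bb_upper"] = mid + bb_k * std
        out[f"{col}_bb_lower"] = mid - bb_k * std
    return out


# Toy weekly series (0, 1, 2, ...) purely for illustration
weekly = pd.DataFrame(
    {"index_price": np.arange(30, dtype=float)},
    index=pd.date_range("2024-01-05", periods=30, freq="W-FRI"),
)
feats = add_candidate_features(weekly, ["index_price"], bb_window=4)
print(feats.iloc[4])
```

When the forecast horizon is longer than one step, the shift applied to target-derived features may need to match the horizon to avoid leakage.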

## 1.0 Imports

### 1.1 Setting working directory

In [11]:
import sys
import os

# Move to the project root ("ip_forecasting") if the notebook was started from a subfolder.
# os.path.basename works on any OS, unlike splitting the path on "\\" (Windows only).
if os.path.basename(os.getcwd()) != "ip_forecasting":
    # Notebooks do not define __file__, so the current working
    # directory is taken as the notebook's directory.
    notebook_dir = os.getcwd()

    # Move up one level to the project root
    project_root = os.path.abspath(os.path.join(notebook_dir, ".."))

    # Change working directory
    os.chdir(project_root)
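Moving up exactly one level assumes the notebook sits directly inside the project root. A slightly more robust alternative is to walk up the parent directories until one named `ip_forecasting` is found. The helper `find_project_root` below is a hypothetical sketch using only `pathlib`; the temporary directory tree exists purely for demonstration.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "ip_forecasting") -> Path:
    """Walk up from `start` until a directory named `marker` is found (hypothetical helper)."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if candidate.name == marker:
            return candidate
    raise FileNotFoundError(f"No directory named {marker!r} at or above {start}")


# Demonstration on a throwaway tree: <tmp>/ip_forecasting/notebooks
with tempfile.TemporaryDirectory() as tmp:
    notebooks = Path(tmp) / "ip_forecasting" / "notebooks"
    notebooks.mkdir(parents=True)
    found = find_project_root(notebooks)
    expected = (Path(tmp) / "ip_forecasting").resolve()
```

In the notebook, the equivalent call would be `os.chdir(find_project_root(Path.cwd()))`, which works regardless of how deeply the notebook is nested.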

In [12]:
import pandas as pd
import numpy as np
import pandas_gbq
import locale

import warnings

import src.utils.useful_functions as uf
from src.models.train import *
from src.models.evaluate import *

from src.visualization.data_viz import *
from scripts.run_cross_validation import *
from src.data.data_loader import load_and_preprocess_model_dataset

%load_ext autoreload
%autoreload 2

pd.set_option('display.max_columns', None)
warnings.filterwarnings("ignore")



### 1.2 Parameter settings

In [13]:
# model_config is not defined in this notebook; it is expected to come from one of the wildcard imports above.
TARGET_COL = model_config["target_col"]
PREDICTED_COL = model_config["predicted_col"]
FORECAST_HORIZON = model_config["forecast_horizon"]
MODEL_NAME = model_config["model_name"]
USE_TUNED_PARMS = model_config["use_tuned_params"]
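Because `model_config` arrives through a wildcard import, a missing key only surfaces as a `KeyError` deep inside the run. The sketch below fails fast on every key this notebook reads. It assumes `model_config` behaves like a plain `dict`. `REQUIRED_CONFIG_KEYS` and `check_model_config` are hypothetical, and the values in `example_config` are placeholders, not the project's settings.

```python
from typing import Any, Mapping

# Keys read by this notebook (hypothetical constant)
REQUIRED_CONFIG_KEYS = (
    "target_col",
    "predicted_col",
    "forecast_horizon",
    "model_name",
    "use_tuned_params",
    "tuning_holdout_date",
    "cross_validation_step_size",
)


def check_model_config(config: Mapping[str, Any]) -> None:
    """Raise KeyError listing every required key missing from config."""
    missing = [key for key in REQUIRED_CONFIG_KEYS if key not in config]
    if missing:
        raise KeyError(f"model_config is missing keys: {missing}")


# Placeholder values for illustration only
example_config = {
    "target_col": "target",
    "predicted_col": "prediction",
    "forecast_horizon": 4,
    "model_name": "some_model",
    "use_tuned_params": True,
    "tuning_holdout_date": "2024-01-01",
    "cross_validation_step_size": 1,
}
check_model_config(example_config)  # passes silently when all keys are present
```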

## 2.0 Data Loading

In [17]:
feature_df = load_and_preprocess_model_dataset("featurized_df")
feature_df = feature_df.set_index("date")
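Walk-forward validation relies on a sorted, duplicate-free date index, and the sample output below shows weekly dates falling on Fridays. The sketch that follows parses, sorts and validates the index before modeling, and reports the frequency pandas infers. `index_by_date` is a hypothetical helper. The four sample rows reuse dates and `index_price` values from the table below, deliberately shuffled.

```python
from typing import Optional, Tuple

import pandas as pd


def index_by_date(df: pd.DataFrame, date_col: str = "date") -> Tuple[pd.DataFrame, Optional[str]]:
    """Parse, sort and validate the date index; return the frame and its inferred frequency."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col).sort_index()
    if not out.index.is_unique:
        raise ValueError(f"Duplicate dates in column {date_col!r}")
    # pd.infer_freq needs at least three timestamps and returns None for irregular spacing
    freq = pd.infer_freq(out.index) if len(out.index) >= 3 else None
    return out, freq


sample = pd.DataFrame(
    {
        "date": ["2024-09-27", "2024-09-13", "2024-10-04", "2024-09-20"],  # unsorted on purpose
        "index_price": [561.5, 572.66, 561.05, 571.84],
    }
)
indexed, freq = index_by_date(sample)
print(freq)
```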

## 4.0 Modeling: Multiple Tree-based models

### 4.1 Backtesting with an expanding window

In [18]:
feature_df.tail(12)

| date | inventories | imports | europulp | second_market_price | final_product_price | index_price |
|---|---|---|---|---|---|---|
| 2024-09-13 | 2074.61775 | 1335467000.0 | 1516.22325 | 575.96 | 6306.666667 | 572.66 |
| 2024-09-20 | 2011.2895 | 1399020000.0 | 1534.5655 | 576.27 | 6316.666667 | 571.84 |
| 2024-09-27 | 1947.96125 | 1462574000.0 | 1552.90775 | 579.91 | 6306.666667 | 561.5 |
| 2024-10-04 | 1884.633 | 1526127000.0 | 1571.25 | 580.84 | 6300.0 | 561.05 |
| 2024-10-11 | 1890.7235 | 1526127000.0 | 1553.76125 | 572.93 | 6300.0 | 560.94 |
| 2024-10-18 | 1896.814 | 1526127000.0 | 1536.2725 | 564.79 | 6283.333333 | 561.75 |
| 2024-10-25 | 1902.9045 | 1526127000.0 | 1518.78375 | 556.41 | 6266.666667 | 559.28 |
| 2024-11-01 | 1908.995 | 1526127000.0 | 1501.295 | 552.6 | 6233.333333 | 555.99 |
| 2024-11-08 | 1918.8868 | 1526127000.0 | 1501.295 | 543.56 | 6150.0 | 554.88 |
| 2024-11-15 | 1928.7786 | 1526127000.0 | 1501.295 | 536.15 | 6150.0 | 554.8 |
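`walk_forward_validation_ml` takes a `test_start_date` and a `step_size`, and its internals are not shown here. The generic sketch below is therefore not its implementation; it only illustrates how an expanding-window backtest splits the rows. Training always starts at the first row and grows by `step_size` rows per fold, and each test window covers the next `horizon` rows. The function name and the row-based interpretation of `step_size` and `horizon` are assumptions.

```python
from typing import Iterator, Tuple


def expanding_window_splits(
    n_obs: int, first_test_pos: int, step_size: int, horizon: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) positional ranges for an expanding-window backtest."""
    if not 0 < first_test_pos < n_obs:
        raise ValueError("first_test_pos must leave at least one training and one test row")
    if step_size < 1 or horizon < 1:
        raise ValueError("step_size and horizon must be positive")
    start = first_test_pos
    while start < n_obs:
        # Training window always starts at row 0; test window is truncated at the end of the data
        yield range(0, start), range(start, min(start + horizon, n_obs))
        start += step_size


# Ten weekly rows, first test row at position 6, refit every 2 rows, 2-step horizon
folds = list(expanding_window_splits(n_obs=10, first_test_pos=6, step_size=2, horizon=2))
for train, test in folds:
    print(f"train rows 0-{train[-1]}, test rows {test[0]}-{test[-1]}")
```

With a `DatetimeIndex`, the first test position could be obtained with `feature_df.index.searchsorted(pd.Timestamp(test_start_date))`. scikit-learn's `TimeSeriesSplit` (with the default `max_train_size=None`) also produces expanding training windows, but it chooses split points by fold count rather than by a start date.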


In [19]:
# `desc` and `lags_exog_dict` are not defined in this notebook; they are expected
# to come from the wildcard imports above.
validation_report_df, _ = walk_forward_validation_ml(
    model_df        = feature_df,
    test_start_date = model_config["tuning_holdout_date"],
    step_size       = model_config["cross_validation_step_size"],
    run_name        = f"{desc}",
    table_name      = f"{desc}",
    write_to_table  = True,
    run_description = """
                    Testing hyperparameter tuning with Hyperopt
                    instead of RandomizedSearchCV or GridSearchCV.

    {}""".format(
        "\n".join(lags_exog_dict.keys())
    ),
)

2025-01-29 15:18:45,038 - scripts.run_cross_validation - INFO - Iteration [1] out of [2] end training date: 2024-09-27 00:00:00...
2025-01-29 15:19:02,306 - scripts.run_cross_validation - INFO - Performing cross validation for [RandomForestRegressor]...
Function 'train' executed in 0.93 seconds.
2025-01-29 15:19:12,361 - scripts.run_cross_validation - INFO - Performing cross validation for [XGBRegressor]...
[0]	validation_0-rmse:76.12761	validation_1-rmse:73.26144
[10]	validation_0-rmse:4.28601	validation_1-rmse:19.31985
[20]	validation_0-rmse:2.14170	validation_1-rmse:18.79831
[30]	validation_0-rmse:1.69375	validation_1-rmse:18.73866
[40]	validation_0-rmse:1.38165	validation_1-rmse:18.89270
[50]	validation_0-rmse:1.16225	validation_1-rmse:18.88328
[60]	validation_0-rmse:0.94025	validation_1-rmse:18.85129
[70]	validation_0-rmse:0.84100	validation_1-rmse:18.83329
[80]	validation_0-rmse:0.75845	validation_1-rmse:18.81546
[90]	validation_0-rmse:0.68595	validation_1-rmse:18.80834
[99]	vali