## Goals: Hyperparameter Optimisation of *QRF* model

This notebook proposes different methods of hyperparameter optimisation based on cross-validation:
* Random Search
* Genetic algorithm [Not yet included]

# 1. Data Import and Setup

Imports necessary libraries, sets up environment paths.

In [53]:
# Standard library imports
import os
import sys
import json
from functools import partial

# Third-party imports
import pandas as pd
from quantile_forest import RandomForestQuantileRegressor
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# Append project root to sys.path for local imports
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..', '..', '..', '..')))

# Local application imports
from src.utils.model import get_station_stats, custom_log_likelihood
from src.utils.SpatioTemporalSplit import SpatioTemporalSplit
from src.utils.custom_models import SnowIndexComputeTransformer


Defines constants :
* INPUT_DIR must be the same as the one defined in *00 Preprocessing/Feature Engineering*.
* MODEL_DIR is the directory where the exploration models will be saved.

In [11]:
INPUT_DIR = "../../../../data/input/"
MODEL_DIR = "../../../../models/exploration/"

SEED = 42
ALPHA = 0.1
WEEK_TO_PREDICT = 4  # forecast horizon in weeks; selects the water_flow_week{N} target column

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10

DATASET_SPEC = "_".join(DATASET_TRANSFORMS)

if "pca" in DATASET_TRANSFORMS:
    DATASET_SPEC += f"_pct_{PCA_THRESHOLD}"

if "clust_index" in DATASET_TRANSFORMS:
    DATASET_SPEC += f"_geocl_{N_CLUSTER}"

if "clust_hydro" in DATASET_TRANSFORMS:
    DATASET_SPEC += f"_hydcl_{N_CLUSTER}"

# Override the generated name: this run loads a manually named dataset file.
DATASET_SPEC = "dataset_custom_6"

# columns to drop : target at different horizon, station_code, and features removed from Feature Selection
TO_DROP = ["water_flow_week1", "station_code", "water_flow_week2", "water_flow_week3", "water_flow_week4"]
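
The naming rules in the cell above can be wrapped in a small helper so the dataset name is derived in one place. This is only a sketch: `build_dataset_spec` is a hypothetical name, not part of the project, and it simply reproduces the suffix rules shown above.

```python
def build_dataset_spec(transforms, pca_threshold, n_cluster):
    """Join transform tags and append the suffixes used in the cell above."""
    spec = "_".join(transforms)
    if "pca" in transforms:
        spec += f"_pct_{pca_threshold}"
    if "clust_index" in transforms:
        spec += f"_geocl_{n_cluster}"
    if "clust_hydro" in transforms:
        spec += f"_hydcl_{n_cluster}"
    return spec


example_spec = build_dataset_spec(["rm_gnv_st", "pca", "clust_index"], 0.98, 10)
print(example_spec)
```

Keeping the rule in a function makes it easier to guarantee that the preprocessing notebook and this one agree on the file name.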

# 2. Data Loading
Load the baseline dataset and extract the target column for the chosen forecast horizon.

In [12]:
# load the dataset
ds_train = pd.read_csv(f"{INPUT_DIR}{DATASET_SPEC}.csv")
train_data = ds_train.copy()
train_data.reset_index(inplace=True)
train_data = train_data.loc[:, ~train_data.columns.duplicated()]
ds_train = ds_train.set_index("ObsDate")
y_train = train_data[f"water_flow_week{WEEK_TO_PREDICT}"]
cv_data = train_data.copy()


# 3. Model preparation

Compute station statistics (useful for scaling).

In [13]:
station_stats = get_station_stats(
    y_train.to_numpy(),
    train_data["station_code"].to_numpy()
)
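
`get_station_stats` lives in `src.utils.model` and its implementation is not shown in this notebook. As an illustration only, per-station statistics of this kind could be computed as below. The function name `station_stats_sketch` and the choice of statistics (mean, standard deviation, min, max) are assumptions, not the project's actual code.

```python
import numpy as np
import pandas as pd


def station_stats_sketch(y, station_codes):
    """Return one row of summary statistics per station (hypothetical layout)."""
    frame = pd.DataFrame({"station_code": station_codes, "water_flow": y})
    return frame.groupby("station_code")["water_flow"].agg(["mean", "std", "min", "max"])


y_demo = np.array([1.0, 3.0, 10.0, 14.0])
codes_demo = np.array(["A", "A", "B", "B"])
stats_demo = station_stats_sketch(y_demo, codes_demo)
print(stats_demo)
```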

Create a custom Pipeline to keep track of the station code

In [15]:
from sklearn.feature_selection import SelectKBest, f_regression


cols_to_drop = TO_DROP.copy()
cols_to_drop += ["ObsDate"]
predictor_cols = [col for col in cv_data.columns if col not in cols_to_drop]
preprocessor = ColumnTransformer(transformers=[
    ('select', 'passthrough', predictor_cols)
], remainder='drop')

snowIndexer = SnowIndexComputeTransformer()

qrf_week1 = RandomForestQuantileRegressor(n_estimators=10, max_depth=10, min_samples_leaf=10)
# qrf_week1 = GradientBoostingRegressor()

pipeline = Pipeline(steps=[
    # ('snowindexer', SnowIndexComputeTransformer(temp_col_name="tempartures_pca_1", rain_col_name="precipitations_pca_1",)),
    ('preprocessor', preprocessor),
    ("selector", SelectKBest(score_func=f_regression)),
    ('model', qrf_week1)
])
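
The `preprocessor` step is what allows the full DataFrame, including `station_code` and `ObsDate`, to be passed through cross-validation while the model only receives the predictor columns. A toy illustration with made-up values:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer

toy = pd.DataFrame({
    "station_code": ["A", "B"],
    "ObsDate": ["2020-01-05", "2020-01-12"],
    "altitude": [350.0, 1200.0],
    "water_flow_lag_1w": [4.2, 7.9],
})
toy_predictors = ["altitude", "water_flow_lag_1w"]

# Listed columns pass through unchanged; every other column is dropped.
toy_preprocessor = ColumnTransformer(
    transformers=[("select", "passthrough", toy_predictors)],
    remainder="drop",
)
model_input = toy_preprocessor.fit_transform(toy)
print(model_input.shape)
```

The splitter and the scorer can still read `station_code` and `ObsDate` from the original frame, because the columns are only removed inside the pipeline.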

Initialisation of the log likelihood scorer

In [16]:
def inverted_log_likelihood(estimator, X, y_true, cv_data, station_stats, alpha=0.1):
    return -custom_log_likelihood(estimator, X, y_true, cv_data, station_stats, alpha=alpha)

In [17]:
scorer = partial(inverted_log_likelihood,
                 cv_data=cv_data,
                 station_stats=station_stats,
                 alpha=ALPHA)
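
scikit-learn accepts any callable with the signature `scorer(estimator, X, y)` as `scoring` and treats larger values as better, which is why the extra arguments are bound beforehand with `partial`. Below is a self-contained sketch of the same pattern, with a hypothetical `scaled_neg_mae` metric standing in for `custom_log_likelihood`:

```python
from functools import partial

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def scaled_neg_mae(estimator, X, y_true, scale=1.0):
    """Toy scorer: negative mean absolute error divided by `scale`."""
    y_pred = estimator.predict(X)
    return -float(np.mean(np.abs(y_true - y_pred))) / scale


# Bind the extra keyword so the callable matches scorer(estimator, X, y).
toy_scorer = partial(scaled_neg_mae, scale=2.0)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=3, scoring=toy_scorer)
print(scores)
```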

Initialisation of the SpatioTemporal Splitter

In [18]:
cv = SpatioTemporalSplit(
    n_splits=10,
    date_col='ObsDate',
    station_col='station_code',
    temporal_frac=0.68,
    spatial_frac=0.68,
    random_state=SEED
)
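
`SpatioTemporalSplit` is a project-local class whose code is not reproduced in this notebook. The sketch below shows one way such a splitter could satisfy scikit-learn's cross-validator interface (`split` and `get_n_splits`): each fold trains on the early dates of a random subset of stations and validates on the later dates of the remaining stations. The class name `ToySpatioTemporalSplit` and this exact splitting rule are assumptions for illustration.

```python
import numpy as np
import pandas as pd


class ToySpatioTemporalSplit:
    """Hypothetical splitter: train on early dates of sampled stations,
    validate on later dates of the held-out stations."""

    def __init__(self, n_splits=3, date_col="ObsDate", station_col="station_code",
                 temporal_frac=0.68, spatial_frac=0.68, random_state=None):
        self.n_splits = n_splits
        self.date_col = date_col
        self.station_col = station_col
        self.temporal_frac = temporal_frac
        self.spatial_frac = spatial_frac
        self.random_state = random_state

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        rng = np.random.default_rng(self.random_state)
        dates = pd.to_datetime(X[self.date_col])
        cutoff = dates.quantile(self.temporal_frac)
        early = (dates <= cutoff).to_numpy()
        stations = X[self.station_col].unique()
        # Always leave at least one station out for validation.
        n_train = min(len(stations) - 1,
                      max(1, int(round(self.spatial_frac * len(stations)))))
        for _ in range(self.n_splits):
            train_stations = rng.choice(stations, size=n_train, replace=False)
            in_train = X[self.station_col].isin(train_stations).to_numpy()
            train_idx = np.flatnonzero(in_train & early)
            test_idx = np.flatnonzero(~in_train & ~early)
            yield train_idx, test_idx


demo = pd.DataFrame({
    "ObsDate": list(pd.date_range("2020-01-05", periods=10, freq="7D")) * 4,
    "station_code": np.repeat(["A", "B", "C", "D"], 10),
})
folds = list(ToySpatioTemporalSplit(n_splits=3, random_state=42).split(demo))
print([(len(train), len(test)) for train, test in folds])
```

Holding out both stations and dates means a fold's validation rows are unseen in space and in time, which is stricter than a plain K-fold.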


# 4. Hyperparameter tuning

Define the hyperparameter distributions for the random search. Note that the ranges presented here were chosen to keep the search fast; explore wider parameter ranges for a real tuning run.
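
One way to widen the search without enumerating every value is to pass distributions rather than lists: `RandomizedSearchCV` samples from any object exposing an `rvs` method, such as those in `scipy.stats`. The ranges below are illustrative assumptions, not tuned recommendations, and `ParameterSampler` is used only to preview what would be drawn.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

wide_param_distributions = {
    "model__n_estimators": randint(5, 201),      # integers in [5, 200]
    "model__max_depth": randint(2, 101),         # integers in [2, 100]
    "model__min_samples_leaf": randint(1, 106),  # integers in [1, 105]
    "model__max_features": uniform(0.2, 0.8),    # floats in [0.2, 1.0]
    "model__bootstrap": [True, False],           # lists are sampled uniformly
}

preview = list(ParameterSampler(wide_param_distributions, n_iter=5, random_state=42))
for candidate in preview:
    print(candidate)
```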

#### a. Random Search

In [34]:
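# Assign the result back instead of using a chained in-place fillna, which newer pandas versions warn about
cv_data["water_flow_evolve_slope"] = cv_data["water_flow_evolve_slope"].fillna(cv_data["water_flow_evolve_slope"].mean())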

Unnamed: 0,index,ObsDate,precipitations_lag_1w_pca_1,precipitations_lag_1w_pca_2,precipitations_pca_1,precipitations_pca_2,tempartures_lag_1w_pca_1,tempartures_pca_1,tempartures_pca_2,soil_moisture_pca_1,...,water_flow_week4,north_hemisphere,snow_index,month_sin,month_cos,season_sin,season_cos,region_cluster,hydro_cluster,water_flow_evolve_slope


In [35]:
# param_distributions = {
#     'model__n_estimators': [2, 10, 20, 45, 60, 85, 100],
#     'model__max_depth': [2, 7, 13, 20, 30, 50],
#     'model__min_samples_leaf': [1, 4, 9, 15, 20, 30],
#     'model__min_samples_split': [2, 5, 10, 20],
#     'model__max_features': ['sqrt', 'log2', 0.3, 0.7, None],
#     'model__bootstrap': [True, False],
#     'snowindexer__altitude_weight': [0.9, 1, 1.1, 1.2],
#     'snowindexer__temp_weight': [0.9, 1, 1.1, 1.2],
#     'snowindexer__precip_weight': [0.1, 0.15, 0.2, 0.25],
# }

# 'model__n_estimators': 35, 'model__min_samples_split': 9, 'model__min_samples_leaf': 25, 'model__max_features': None, 'model__max_depth': 50, 'model__bootstrap': True
# param_distributions = {
#     'model__n_estimators': [20, 25, 30, 35, 40, 45, 50],
#     'model__max_depth': [35, 40, 45, 50, 55, 60, 65],
#     'model__min_samples_leaf': [20, 23, 25, 28, 31],
#     'model__min_samples_split': [6, 7, 8, 9, 10, 11, 12],
#     'model__max_features': [None],
#     'model__bootstrap': [True],
# }

import numpy as np


param_distributions = {
    # k cannot exceed the number of predictor columns that reach the selector
    "selector__k": np.arange(3, len(predictor_cols) + 1, 3),
    'model__n_estimators': [5, 15, 35, 55, 85, 100, 130, 150, 190],
    'model__max_depth': [2, 7, 13, 20, 30, 50, 70, 100],
    'model__min_samples_leaf': [9, 15, 25, 35, 50, 60, 80, 105],
    'model__min_samples_split': [9, 15, 25, 35, 50, 60, 80, 105],
    'model__max_features': ['sqrt', 'log2', 0.3, 0.7, None],
    'model__bootstrap': [True, False]
}

# Set up RandomizedSearchCV.
random_search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions=param_distributions,
    n_iter=60,            # Number of parameter settings sampled
    scoring=scorer,       # Use our custom scorer
    cv=cv,                # Our custom spatio-temporal splitter
    random_state=SEED,
    n_jobs=-1,             # Use all available cores
    verbose=3,
    error_score='raise'
)

random_search.fit(cv_data, y_train)

Fitting 10 folds for each of 60 candidates, totalling 600 fits




### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ("selector", SelectKBest(score_func=f_regression)),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
SELECTED_MOBILE_AVERAGE = [
    # "water_flow_ma_4w_lag_1w",
    # "water_flow_ma_3w_lag_1w",
    # "water_flow_ma_2w_lag_1w",
    "water_flow_ma_4w_lag_1w_gauss",
    # "water_flow_ma_3w_lag_1w_gauss",
    # "water_flow_ma_2w_lag_1w_gauss"
]

```


In [36]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'selector__k': np.int64(27), 'model__n_estimators': 85, 'model__min_samples_split': 80, 'model__min_samples_leaf': 15, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -1.9577376316109347
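
Beyond the single best candidate, `cv_results_` holds the mean and spread of the score for every sampled configuration, which helps judge whether the winner is clearly ahead or within fold-to-fold noise. The sketch below runs a toy search on synthetic data; the estimator and grid are placeholders, not the QRF pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(80, 4))
y_toy = X_toy[:, 0] * 2.0 + rng.normal(scale=0.1, size=80)

toy_search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [5, 10, 20], "max_depth": [2, 4, 8]},
    n_iter=4,
    cv=3,
    random_state=0,
)
toy_search.fit(X_toy, y_toy)

# One row per sampled candidate, best-ranked first.
results = pd.DataFrame(toy_search.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(top.head())
```

Comparing `std_test_score` with the gap between the first few rows gives a rough sense of how much of the ranking is noise.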


In [64]:
best_selector = random_search.best_estimator_.named_steps["selector"]

selected_mask = best_selector.get_support()
adjusted_data = cv_data.drop(columns=cols_to_drop)
selected_features = adjusted_data.columns[selected_mask]

print("Selected features:")
print(selected_features.tolist())
with open(f"{DATASET_SPEC}_selected_features_week{WEEK_TO_PREDICT}.json", 'w') as json_file:
    json.dump(selected_features.tolist(), json_file, indent=4)


Selected features:
['index', 'precipitations_lag_1w_pca_2', 'precipitations_pca_1', 'precipitations_pca_2', 'tempartures_lag_1w_pca_1', 'tempartures_pca_1', 'tempartures_pca_2', 'soil_moisture_pca_1', 'soil_moisture_pca_2', 'soil_moisture_pca_3', 'evaporation_lag_1w_pca_1', 'evaporation_pca_1', 'soil_composition_pca_1', 'soil_composition_pca_4', 'soil_composition_pca_6', 'soil_composition_pca_7', 'soil_composition_pca_9', 'latitude', 'longitude', 'catchment', 'altitude', 'water_flow_lag_1w', 'water_flow_lag_2w', 'water_flow_ma_4w_lag_1w_gauss', 'north_hemisphere', 'snow_index', 'month_cos']
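
The saved JSON file can later be used to restrict a dataset to the retained columns. A minimal sketch, using a temporary file and a made-up feature list rather than the real output:

```python
import json
import os
import tempfile

features_demo = ["latitude", "altitude", "water_flow_lag_1w"]
row_demo = {"latitude": 46.2, "altitude": 375.0, "water_flow_lag_1w": 4.1, "month_sin": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "selected_features_demo.json")
    # Write the feature list the same way the cell above does.
    with open(path, "w") as json_file:
        json.dump(features_demo, json_file, indent=4)
    # Read it back, as a downstream script would.
    with open(path) as json_file:
        loaded_features = json.load(json_file)

# Keep only the columns that were selected.
filtered_row = {name: row_demo[name] for name in loaded_features}
print(filtered_row)
```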


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 35, 'model__min_samples_split': 9, 'model__min_samples_leaf': 25, 'model__max_features': None, 'model__max_depth': 50, 'model__bootstrap': True}
Best Score: -1.8825202404996986


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 100, 'model__min_samples_split': 50, 'model__min_samples_leaf': 25, 'model__max_features': None, 'model__max_depth': 70, 'model__bootstrap': True}
Best Score: -1.9117332654881167


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 35, 'model__min_samples_split': 11, 'model__min_samples_leaf': 23, 'model__max_features': None, 'model__max_depth': 40, 'model__bootstrap': True}
Best Score: -1.8653183950465695


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "rm_wl", # remove custom generated water_flow_lag 3w & 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "slct_ma", # keep only specific mobile average 2w or/and 3w or/and 4w ---> Need USE_CUSTOM_PREPROCESS = True
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 35, 'model__min_samples_split': 9, 'model__min_samples_leaf': 25, 'model__max_features': None, 'model__max_depth': 50, 'model__bootstrap': True}
Best Score: -1.8821667848494648


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    # "rm_wl", # remove custom generated water_flow_lag 3w & 4w
    # "rm_ma", # remove custom generated mobile average 2w & 3w & 4w
    "lag_slope" # add an indicator that is calculated between water_flow_lag 1w and 2w 
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 10
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 160, 'model__min_samples_split': 66, 'model__min_samples_leaf': 42, 'model__max_features': None, 'model__max_depth': 25, 'model__bootstrap': True}
Best Score: -1.9426425655553738


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
]

PCA_THRESHOLD = 0.98
N_CLUSTER = 5
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 150, 'model__min_samples_split': 30, 'model__min_samples_leaf': 18, 'model__max_features': None, 'model__max_depth': 23, 'model__bootstrap': True}
Best Score: -1.9362015137618989


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('snowindexer', SnowIndexComputeTransformer(temp_col_name="tempartures_pca_1", rain_col_name="precipitations_pca_1",)),
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'snowindexer__temp_weight': 1.2, 'snowindexer__precip_weight': 0.15, 'snowindexer__altitude_weight': 0.9, 'model__n_estimators': 100, 'model__min_samples_split': 5, 'model__min_samples_leaf': 30, 'model__max_features': None, 'model__max_depth': 50, 'model__bootstrap': False}
Best Score: -2.1058580480136575


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    "oh_enc_date",
    # "cyc_enc_date",
    # "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 100, 'model__min_samples_split': 10, 'model__min_samples_leaf': 40, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -1.9542643097106727


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    "oh_enc_date",
    # "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 45, 'model__min_samples_split': 40, 'model__min_samples_leaf': 20, 'model__max_features': 0.7, 'model__max_depth': 30, 'model__bootstrap': False}
Best Score: -1.9515434055944696


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 130, 'model__min_samples_split': 30, 'model__min_samples_leaf': 20, 'model__max_features': None, 'model__max_depth': 30, 'model__bootstrap': True}
Best Score: -1.935871280814628


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 135, 'model__min_samples_split': 27, 'model__min_samples_leaf': 27, 'model__max_features': None, 'model__max_depth': 30, 'model__bootstrap': True}
Best Score: -1.9339345588124381


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])


DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "clust_hydro",
    # "scl_feat",
    "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 130, 'model__min_samples_split': 30, 'model__min_samples_leaf': 20, 'model__max_features': None, 'model__max_depth': 30, 'model__bootstrap': True}
Best Score: -1.9395931333152727


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 23, 'model__min_samples_split': 15, 'model__min_samples_leaf': 21, 'model__max_features': 0.7, 'model__max_depth': 23, 'model__bootstrap': True}
Best Score: -1.9359472748030164


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 20, 'model__max_features': 0.7, 'model__max_depth': 20, 'model__bootstrap': True}
Best Score: -1.95292479637325


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 50, 'model__min_samples_split': 18, 'model__min_samples_leaf': 16, 'model__max_features': None, 'model__max_depth': 16, 'model__bootstrap': True}
Best Score: -1.9292546040045893


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('snowindexer', SnowIndexComputeTransformer(temp_col_name="tempartures_pca_1", rain_col_name="precipitations_pca_1",)),
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    # "oh_enc_date",
    "cyc_enc_date",
    "clust_index",
    "scl_feat",
    # "scl_feat_wl", # Scale all except waterflow lag
    "scl_catch",
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -1.9860207069874143


### params for config : 

```python
pipeline = Pipeline(steps=[
    ('snowindexer', SnowIndexComputeTransformer(temp_col_name="tempartures_pca_1", rain_col_name="precipitations_pca_1",)),
    ('preprocessor', preprocessor),
    ('model', qrf_week1)
])

DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    "scl_catch"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'snowindexer__temp_weight': 1, 'snowindexer__precip_weight': 0.2, 'snowindexer__altitude_weight': 1, 'model__n_estimators': 52, 'model__min_samples_split': 19, 'model__min_samples_leaf': 13, 'model__max_features': None, 'model__max_depth': 25, 'model__bootstrap': True}
Best Score: -2.0340065806334726


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    "scl_catch"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 55, 'model__min_samples_split': 20, 'model__min_samples_leaf': 11, 'model__max_features': None, 'model__max_depth': 23, 'model__bootstrap': True}
Best Score: -1.9197539988339372


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    "scl_catch"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -1.9379500224596236


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    "scl_catch"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 59, 'model__min_samples_split': 18, 'model__min_samples_leaf': 14, 'model__max_features': None, 'model__max_depth': 25, 'model__bootstrap': True}
Best Score: -1.923900362267943


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    "scl_catch"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -1.9528121007193104


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 57, 'model__min_samples_split': 20, 'model__min_samples_leaf': 14, 'model__max_features': None, 'model__max_depth': 22, 'model__bootstrap': True}
Best Score: -2.067613593686774


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 55, 'model__min_samples_split': 22, 'model__min_samples_leaf': 16, 'model__max_features': None, 'model__max_depth': 25, 'model__bootstrap': True}
Best Score: -1.9144132951754593


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 56, 'model__min_samples_split': 19, 'model__min_samples_leaf': 13, 'model__max_features': None, 'model__max_depth': 23, 'model__bootstrap': True}
Best Score: -1.9230515071418086


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 57, 'model__min_samples_split': 20, 'model__min_samples_leaf': 11, 'model__max_features': None, 'model__max_depth': 17, 'model__bootstrap': True}
Best Score: -1.9198867119123673


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -2.1232930047642933


### params for config : 
```python
DATASET_TRANSFORMS = [
    "rm_gnv_st",
    "pca",
    # "snow_index",
    "oh_enc_date",
    "scl_wtr_flows",
    # "rm_st_id"
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -2.1232930047642933


### params for config : 
```python
DATASET_TRANSFORMS = [
    "remove_geneve_station",
    "full_pca",
    # "snow_index",
    "one_hot_encode_month_season",
    "scale_train_waterflows",
]
```


In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -2.1232930047642933


### params for qrf pca_and_sin_season_encode

In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -2.1232930047642933


### params for qrf full_pca

In [None]:
# print("Best Parameters:", random_search.best_params_)
# print("Best Score:", random_search.best_score_)

Best Parameters: {'model__n_estimators': 60, 'model__min_samples_split': 20, 'model__min_samples_leaf': 9, 'model__max_features': None, 'model__max_depth': 13, 'model__bootstrap': True}
Best Score: -2.1232930047642933


#### b. Genetic algorithm

Coming soon.
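
As a placeholder for this planned section, the sketch below shows the general shape of a genetic search over a discrete hyperparameter grid: evaluate a population, keep the fittest individuals, and refill the population by crossover and mutation. Everything here is an assumption for illustration. In particular, `toy_fitness` stands in for the cross-validated score, and the grid values are copied from part of the random search grid above.

```python
import random

GRID = {
    "n_estimators": [5, 15, 35, 55, 85, 100, 130, 150, 190],
    "max_depth": [2, 7, 13, 20, 30, 50, 70, 100],
    "min_samples_leaf": [9, 15, 25, 35, 50, 60, 80, 105],
}


def toy_fitness(params):
    """Stand-in for a cross-validated score (higher is better)."""
    return -(
        abs(params["n_estimators"] - 85)
        + abs(params["max_depth"] - 13)
        + abs(params["min_samples_leaf"] - 15)
    )


def random_individual(rng):
    return {name: rng.choice(values) for name, values in GRID.items()}


def crossover(parent_a, parent_b, rng):
    # Each gene is inherited from one of the two parents at random.
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in GRID}


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, resample a gene from the grid.
    return {
        name: rng.choice(GRID[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def genetic_search(n_generations=15, population_size=12, n_elite=4, seed=42):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(n_generations):
        population.sort(key=toy_fitness, reverse=True)
        elite = population[:n_elite]  # elitism: the best individuals survive unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(population_size - n_elite)
        ]
        population = elite + children
    return max(population, key=toy_fitness)


best = genetic_search()
print(best, toy_fitness(best))
```

In a real run, `toy_fitness` would be replaced by a call that sets the pipeline parameters and returns the mean cross-validated score, which makes each generation as expensive as `population_size` random-search candidates.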