# Randomized Grid Search for Hyperparameter Optimization of the Random Forest Regressor
Having identified the random forest (RF) regressor as the best-performing model for our dataset, we now run a randomized search over its hyperparameters. We use the `RandomizedSearchCV` class from scikit-learn (`sklearn`), which samples a fixed number of candidate configurations from the specified distributions and scores each one with cross-validation. The scoring metric is the mean squared error (MSE). scikit-learn exposes it as the scorer `neg_mean_squared_error`, because its search utilities always maximize the score.
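To make the mechanics concrete, here is a minimal sketch of a randomized search scored with MSE. It runs on synthetic data from `make_regression`, not on the notebook's dataset, and the names `X_demo`, `y_demo`, `demo_search` and the small search space are illustrative assumptions chosen to keep the run short.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Deliberately small search space so the sketch finishes quickly
demo_search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={
        'n_estimators': randint(10, 50),   # integers from 10 to 49
        'max_depth': [None, 5, 10],        # lists are sampled uniformly
    },
    n_iter=5,                              # number of sampled configurations
    cv=3,                                  # 3-fold cross-validation
    scoring='neg_mean_squared_error',      # higher (closer to 0) is better
    random_state=0,
)
demo_search.fit(X_demo, y_demo)

# The scorer returns the negated MSE, so flip the sign to report the MSE
demo_best_mse = -demo_search.best_score_
print('Best parameters:', demo_search.best_params_)
print('Best cross-validated MSE:', demo_best_mse)
```

Because the scorer is negated, `best_score_` is never positive; the configuration with the lowest MSE is the one with the highest score.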

## Imports

In [None]:
# Progress bars for loops
from tqdm import tqdm

# Data management
import pandas as pd

# Test and train split and mean squared error metric
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Randomized search for hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Random forest regressor
from sklearn.ensemble import RandomForestRegressor

## Data Loading

In [None]:
df = pd.read_csv('../../data/no_outliers.csv', sep=';', index_col=1)
df = df.rename(columns={'Unnamed: 0': 'Timestamp'})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df['Month'] = df['Timestamp'].dt.month
df['Day'] = df['Timestamp'].dt.day
df['Hour'] = df['Timestamp'].dt.hour + df['Timestamp'].dt.minute / 60

df = df.drop(columns=df.columns[:9])
df = df.drop(columns=df.columns[1:10])
df
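The time features built above can be checked on a couple of hand-picked timestamps. The following sketch uses a hypothetical two-row frame `demo_times` (not the notebook's data) to show that `Hour` is a fractional hour, so 06:30 becomes 6.5.

```python
import pandas as pd

# Hypothetical timestamps used only to illustrate the feature construction
demo_times = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2023-01-15 06:30:00', '2023-07-04 18:45:00'])
})

# Same derivations as in the data loading cell
demo_times['Month'] = demo_times['Timestamp'].dt.month
demo_times['Day'] = demo_times['Timestamp'].dt.day
demo_times['Hour'] = demo_times['Timestamp'].dt.hour + demo_times['Timestamp'].dt.minute / 60

print(demo_times)
```

Seconds are ignored by this encoding, which is adequate as long as the measurements are sampled at minute resolution or coarser.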

In [None]:
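# Keep only the first column as the target; it is expected to be 'Power_Total',
# the column that is dropped from the features in the next cell
target = df.drop(columns=df.columns[1:])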
target

In [None]:
features = df.drop(columns=['Power_Total'])
features

## Splitting into Train, Validation and Test Sets

In [None]:
features_train, features_temp, target_train, target_temp = train_test_split(features, target, test_size=0.25, random_state=42)
features_val, features_test, target_val, target_test = train_test_split(features_temp, target_temp, test_size=0.5, random_state=42)

print('Training features shape:', features_train.shape)
print('Validation features shape:', features_val.shape)
print('Testing features shape:', features_test.shape)
print('Training target shape:', target_train.shape)
print('Validation target shape:', target_val.shape)
print('Testing target shape:', target_test.shape)
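The two chained calls yield a 75% / 12.5% / 12.5% split: the first call holds out 25% of the rows, and the second divides that hold-out evenly into validation and test sets. A small sketch on synthetic arrays (the names `X_demo` and `y_demo` are illustrative, not from the notebook) confirms the proportions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1000 synthetic rows with 2 features each (assumption: stand-in data)
X_demo = np.arange(2000).reshape(1000, 2)
y_demo = np.arange(1000)

# First split: 75% train, 25% held out
X_train, X_temp, y_train, y_temp = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

# Second split: divide the held-out 25% evenly into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42)

for name, part in [('train', X_train), ('validation', X_val), ('test', X_test)]:
    print(f'{name}: {len(part)} rows ({len(part) / len(X_demo):.1%})')
```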

## Randomized Grid Search

In [None]:
# Ravel the target arrays
target_train = target_train.values.ravel()
target_val = target_val.values.ravel()
target_test = target_test.values.ravel()

# Define the hyperparameter search space (randint samples from [low, high))
param_dist = {
    'n_estimators': randint(100, 2000),         # Number of trees in the forest
    'max_features': [1.0, 'sqrt', 'log2'],      # Features considered at each split; 1.0 uses all features (replaces 'auto', removed in scikit-learn 1.3)
    'max_depth': [None] + list(randint(10, 200).rvs(50, random_state=42)),  # Unlimited depth, or one of 50 pre-sampled depths (seeded for reproducibility)
    'min_samples_split': randint(2, 50),        # Minimum number of samples required to split a node
    'min_samples_leaf': randint(1, 50),         # Minimum number of samples required at each leaf node
    'bootstrap': [True, False]                  # Train each tree on a bootstrap sample (True) or on the full training set (False)
}

# Number of hyperparameter configurations to sample
n_iter_search = 500

# Number of cross-validation folds; 500 candidates x 10 folds = 5,000 model fits,
# so reduce either value if computational resources are limited
cv_folds = 10

# Instantiate a random forest regressor (seeded so that results are reproducible)
rf_regressor = RandomForestRegressor(random_state=42)

# Create the randomized search, scored by (negated) mean squared error
random_search = RandomizedSearchCV(estimator=rf_regressor,
                                   param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   cv=cv_folds,
                                   scoring='neg_mean_squared_error',
                                   verbose=2,
                                   random_state=42,
                                   n_jobs=-1)

random_search.fit(features_train, target_train)

best_params = random_search.best_params_
best_estimator = random_search.best_estimator_

print('Best hyperparameters:', best_params)
print('Best estimator:', best_estimator)
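A natural follow-up is to measure the tuned model on the held-out validation and test sets with `mean_squared_error`. The sketch below shows one way to do this, under stated assumptions: it uses synthetic data from `make_regression` and a deliberately small search so that it runs in seconds, and the names `demo_search`, `val_mse` and `test_mse` are illustrative rather than part of the notebook.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data, split 75% / 12.5% / 12.5% as in the notebook
X_demo, y_demo = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42)

# Small search space (assumption: chosen only to keep the example fast)
demo_search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={
        'n_estimators': randint(10, 50),
        'min_samples_leaf': randint(1, 10),
    },
    n_iter=5,
    cv=3,
    scoring='neg_mean_squared_error',
    random_state=0,
)
demo_search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on the whole
# training set using the best configuration found
tuned_model = demo_search.best_estimator_
val_mse = mean_squared_error(y_val, tuned_model.predict(X_val))
test_mse = mean_squared_error(y_test, tuned_model.predict(X_test))

print('Validation MSE:', val_mse)
print('Test MSE:', test_mse)
```

Keeping the test set out of every tuning decision is what makes the test MSE an unbiased estimate; the validation set can be used to compare the tuned model against the untuned baseline.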