<a href="https://colab.research.google.com/github/ihsanenginbal/3ECEES_Exercises/blob/main/Hyperparameter_Selection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Dr. İhsan Engin Bal
04.08.2022
Hyperparameter selection for a RandomForestRegressor model

Hyperparameters are settings that control how the model is built and trained. Unlike the model's internal parameters, they are not learned from the data. There is no straightforward way to choose them, so a trial-and-error approach is needed. Here a grid search is used to find the best combination of hyperparameters among predefined candidate values. Note, however, that this simple code can only find the best hyperparameters within the user-defined ranges of values. Those ranges depend on your particular problem, and choosing them requires trying, observing and adapting.
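As a rough illustration of the trial-and-error idea, the sketch below tries every combination from a small user-defined grid by hand and keeps the one with the best mean cross-validated R². It uses synthetic data from make_regression and a made-up two-parameter grid. Neither is the notebook's data or its ranges; both are assumptions chosen only to keep the example small. GridSearchCV, used later in this notebook, automates the same loop.

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data set (assumption: 300 samples, 5 features)
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# A small, user-defined grid (hypothetical values): the search can only pick from these
grid = {"max_depth": [3, 10], "min_samples_leaf": [1, 8]}

results = []
for max_depth, min_samples_leaf in product(grid["max_depth"], grid["min_samples_leaf"]):
    model = RandomForestRegressor(
        n_estimators=50,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        random_state=0,
    )
    # Mean R^2 over 3 cross-validation folds for this combination
    score = cross_val_score(model, X_demo, y_demo, cv=3).mean()
    results.append(((max_depth, min_samples_leaf), score))

# Keep the combination with the highest mean score
best_combo, best_score = max(results, key=lambda item: item[1])
print(f"Tried {len(results)} combinations; best (max_depth, min_samples_leaf) = {best_combo}, mean R^2 = {best_score:.3f}")
```

Because the winner is picked only from the listed values, a better setting outside the grid (for example max_depth = 6) would never be found. This is why the ranges have to be adapted to the problem.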

We first import the packages we need.
We use pandas and numpy for data handling, and scikit-learn (sklearn) for the random forest model and the grid search.

In [1]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor

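Import os for handling files. This cell lists the files under /kaggle/input, which is Kaggle's input directory. On Colab that directory does not exist, so the loop prints nothing.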

In [2]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
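The file-listing cell above only looks in /kaggle/input, while the data in this notebook is read from Google Drive. The following is a minimal sketch of a helper that searches several candidate directories for the CSV file and returns the first match. find_data_file is a hypothetical name, and the directory list is an assumption based on the two paths that appear in this notebook.

```python
from pathlib import Path


def find_data_file(filename, search_dirs):
    """Return the first existing path to `filename` under any of `search_dirs`, or None."""
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # e.g. /kaggle/input does not exist on Colab
        # rglob searches the directory tree recursively
        for candidate in root.rglob(filename):
            return candidate
    return None


# Hypothetical search order: Kaggle input first, then the Colab Drive folder
csv_path = find_data_file(
    "Test_Table_Long.csv",
    ["/kaggle/input", "/content/gdrive/MyDrive/Colab Notebooks"],
)
print(csv_path)  # None if the file is in neither location
```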

The cell below is needed to access the data files: you have to give this notebook permission to access your Google Drive.

In [3]:
from google.colab import drive
drive.mount("/content/gdrive")

Mounted at /content/gdrive


Read the CSV file containing the data (it can also be found in the GitHub repository). The target variable is the frame displacement, stored in the column ADE_frame_disp_m.

In [5]:
data = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/Test_Table_Long.csv')

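# Show the first rows and the number of missing values per column
# (inside a cell, only the last expression is displayed automatically, so print explicitly)
print(data.head())
print(data.isnull().sum())

# Get the target data: the frame displacement
y = data['ADE_frame_disp_m']

# Features: all remaining columns, kept as a pandas DataFrame
X = data.drop(['ADE_frame_disp_m'], axis = 1)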

print(f'X : {X.shape}')

X : (362850, 30)
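The target/feature split and the missing-value check can be tried on a tiny table first. The sketch below uses a hypothetical four-row DataFrame. Only the target column name ADE_frame_disp_m is taken from the notebook; the feature columns and all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table: only the target column name comes from the notebook
data_demo = pd.DataFrame({
    "feature_a": [0.1, 0.2, np.nan, 0.4],
    "feature_b": [1, 2, 3, 4],
    "ADE_frame_disp_m": [0.010, 0.020, 0.015, 0.030],
})

# Count missing values per column (older scikit-learn random forests reject NaN inputs)
missing = data_demo.isnull().sum()
print(missing)

# Target: the frame displacement column; features: every other column
y_demo = data_demo["ADE_frame_disp_m"]
X_demo = data_demo.drop(columns=["ADE_frame_disp_m"])
print(f"X : {X_demo.shape}, y : {y_demo.shape}")
```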


Import the train/test splitting function from sklearn and hold out 20% of the data for testing.
This cell also defines the candidate values for each hyperparameter.

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=101)

print(f'X_train : {X_train.shape}')
print(f'y_train : {y_train.shape}')
print(f'X_test : {X_test.shape}')
print(f'y_test : {y_test.shape}')

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 300, num = 3)]
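# Number of features to consider at every split
# ('auto' means all features for regressors; it was removed in scikit-learn 1.3, where 1.0 is the equivalent)
max_features = ['auto', 'sqrt']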
# Maximum number of levels in tree
max_depth = [20,30]
# Minimum number of samples required to split a node
min_samples_split = [5, 15]
# Minimum number of samples required at each leaf node
min_samples_leaf = [8, 12]
# Method of selecting samples for training each tree
bootstrap = [True, False]

X_train : (290280, 30)
y_train : (290280,)
X_test : (72570, 30)
y_test : (72570,)
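The printed shapes follow from how train_test_split sizes the split. With a fractional test_size, it puts ceil(test_size * n) rows in the test set and the remaining rows in the training set. The check below confirms that arithmetic. To keep it cheap, it splits an index array of the same length instead of the real 30-column table; this substitution is an assumption of the sketch.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 362850  # row count of the table, from the printed X shape
test_size = 0.20

# Expected sizes: test = ceil(test_size * n), train = the remainder
expected_test = math.ceil(test_size * n_rows)
expected_train = n_rows - expected_test

# Split a cheap index array instead of the real 30-column table
idx_train, idx_test = train_test_split(np.arange(n_rows), test_size=test_size, random_state=101)
print(len(idx_train), len(idx_test), expected_train, expected_test)  # 290280 72570 290280 72570
```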


Combine these value ranges into a parameter grid for the grid search.

In [7]:
# Create the param grid
param_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
print(param_grid)

{'n_estimators': [100, 200, 300], 'max_features': ['auto', 'sqrt'], 'max_depth': [20, 30], 'min_samples_split': [5, 15], 'min_samples_leaf': [8, 12], 'bootstrap': [True, False]}
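Before launching the search, it helps to know its cost. The number of candidates is the product of the number of values given for each hyperparameter, and each candidate is fitted once per cross-validation fold. The sketch below uses scikit-learn's ParameterGrid to confirm the count for the grid above. ParameterGrid only enumerates the combinations and does not check whether a value such as 'auto' is accepted by the model.

```python
from math import prod

from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
param_grid = {'n_estimators': [100, 200, 300],
              'max_features': ['auto', 'sqrt'],
              'max_depth': [20, 30],
              'min_samples_split': [5, 15],
              'min_samples_leaf': [8, 12],
              'bootstrap': [True, False]}

# Grid size is the product of the number of values for each hyperparameter
n_candidates = prod(len(values) for values in param_grid.values())
n_folds = 3  # cv = 3 in the GridSearchCV call
print(n_candidates, len(ParameterGrid(param_grid)), n_candidates * n_folds)  # 96 96 288
```

This matches the "96 candidates, totalling 288 fits" line printed by GridSearchCV. With refit enabled (the default), one more fit of the best combination on the full training set follows.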


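Define the model to be tuned and set up the grid search. GridSearchCV scores every combination with 3-fold cross-validation (cv = 3), using the regressor's default R² score, and runs 4 jobs in parallel (n_jobs = 4).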

In [8]:
rf_Model = RandomForestRegressor()

from sklearn.model_selection import GridSearchCV
rf_Grid = GridSearchCV(estimator = rf_Model, param_grid = param_grid, cv = 3, verbose=2, n_jobs = 4)
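The full search on about 290,000 rows takes a long time, so it can be useful to try the same setup on a small problem first. This sketch runs GridSearchCV on synthetic data with a reduced grid; both the data and the grid are assumptions. It uses max_features = 1.0 instead of 'auto' so that it also runs on scikit-learn versions where 'auto' has been removed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic problem so the search finishes in seconds (assumption, not the notebook's data)
X_demo, y_demo = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)

# Reduced, hypothetical grid; 1.0 means "use all features"
demo_grid = {
    "n_estimators": [50, 100],
    "max_features": [1.0, "sqrt"],
    "min_samples_leaf": [1, 8],
}

demo_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=demo_grid,
    cv=3,  # 3-fold cross-validation, as in the notebook
)
demo_search.fit(X_demo, y_demo)

# With scoring=None, each candidate is scored by the estimator's own score method (R^2 for regressors)
print(demo_search.best_params_)
print(f"best mean CV R^2: {demo_search.best_score_:.3f}")
```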

Finally, run the grid search to find the best combination of parameters.

In [None]:
rf_Grid.fit(X_train, y_train)

rf_Grid.best_params_
# Hyperparameter values used:
# bootstrap = True
# max_depth = 20
# max_features = 'auto'
# min_samples_leaf = 8
# min_samples_split = 10
# n_estimators = 200


Fitting 3 folds for each of 96 candidates, totalling 288 fits
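The notebook holds out 20% of the data but stops at best_params_. One way to use that test split is to take best_estimator_ and score it once on the test set. GridSearchCV refits best_estimator_ on the full training set by default, so no extra training step is needed. The sketch below shows this on synthetic data with a small grid; both are assumptions. In the notebook, the fitted rf_Grid would play the role of search, and X_test and y_test would replace X_te and y_te.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption); separate names avoid overwriting the notebook's variables
X_syn, y_syn = make_regression(n_samples=500, n_features=6, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.20, random_state=101)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_depth": [5, 20], "min_samples_leaf": [1, 8]},  # hypothetical grid
    cv=3,
)
search.fit(X_tr, y_tr)

# refit=True (the default) has already retrained the best combination on all of X_tr
best_model = search.best_estimator_

# Score once on the 20% test split that the search never saw
y_pred = best_model.predict(X_te)
print(search.best_params_)
print(f"test R^2: {r2_score(y_te, y_pred):.3f}, test MAE: {mean_absolute_error(y_te, y_pred):.3f}")
```

Keeping the test split out of the search means this final score is not biased by the choice of hyperparameters.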
