<a href="https://colab.research.google.com/github/sandipanpaul21/Tree-Based-Models-in-Python/blob/master/10_Grid_Search_in_RF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Hyperparameter Tuning the Random Forest in Python**

  - The best way to think about hyperparameters is as the settings of an algorithm that can be adjusted to optimize performance, just as we might *turn the knobs of an AM radio to get a clear signal* (or your parents might have!). 
  - There are two kinds of values to keep apart:
    1. **Model parameters** are learned during training, such as the slope and intercept in a linear regression.
    2. **Hyperparameters** must be set by the programmer before training.
      
      i. In the case of a random forest, hyperparameters include the number of decision trees in the forest and the number of features considered by each tree when splitting a node. 

      ii. The parameters of a random forest are the variables and thresholds used to split each node, which are learned during training. 

  - Hyperparameter tuning relies more on experimental results than theory, and thus the best method to determine the optimal settings is to try many different combinations and evaluate the performance of each model.
  - However, evaluating each model only on the training set can lead to one of the most fundamental problems in machine learning: **overfitting.**
  - If we optimize the model only for the training data, it will score very well on the training set but will not generalize to new data, such as a test set. A model that performs well on the training set but poorly on the test set is said to be overfit: it knows the training set very well but cannot be applied to new problems. 
  - It’s like a student who has memorized the simple problems in the textbook but has no idea how to apply concepts in the messy real world.
  - An overfit model may look impressive on the training set, but will be useless in a real application. Therefore, the standard procedure for hyperparameter optimization accounts for overfitting through cross validation.
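
The gap between training and test performance described above can be seen in a minimal sketch. The synthetic data from make_regression and the single fully grown decision tree are assumptions chosen only for illustration; they are not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, noisy regression data (illustrative assumption, not the notebook's dataset)
X, y = make_regression(n_samples=300, n_features=10, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A fully grown tree (no depth limit) is free to memorize the training set
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train, y_train)

# score() returns R^2; compare the value on seen and unseen data
train_r2 = tree.score(X_train, y_train)
test_r2 = tree.score(X_test, y_test)
print('Train R^2: {:0.3f}'.format(train_r2))
print('Test  R^2: {:0.3f}'.format(test_r2))
```

A large difference between the two scores is the symptom of overfitting that cross validation is meant to guard against.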

### **Cross Validation**

- When we approach a machine learning problem, we make sure to split our data into a training and a testing set. 
- In K-Fold CV, we further split our training set into K subsets, called folds. We then fit the model K times, each time training on K-1 of the folds and evaluating on the remaining fold (called the validation data). 
- As an example, 

  i. Consider fitting a model with K = 5. In the first iteration, we train on the first four folds and evaluate on the fifth.
  
  ii. In the second iteration, we train on the first, second, third, and fifth folds and evaluate on the fourth. We repeat this procedure three more times, each time evaluating on a different fold. 
  
  iii. At the very end of training, we average the performance across the folds to obtain the final validation metrics for the model.

![K Cross Validation](https://miro.medium.com/max/1400/0*KH3dnbGNcmyV_ODL.png)
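
The K = 5 procedure can be sketched with scikit-learn's KFold on ten toy sample indices (the toy array is an assumption for illustration). Note that KFold without shuffling holds out the first fold first, whereas the walk-through above starts with the last fold; the order does not affect the averaged result.

```python
import numpy as np
from sklearn.model_selection import KFold

samples = np.arange(10)   # ten toy training samples, indexed 0..9
kf = KFold(n_splits=5)    # K = 5, no shuffling

validation_folds = []
for i, (train_idx, val_idx) in enumerate(kf.split(samples), start=1):
    # Each iteration trains on four folds and validates on the remaining one
    validation_folds.append(val_idx.tolist())
    print('Iteration {}: train on {}, validate on {}'.format(
        i, train_idx.tolist(), val_idx.tolist()))
```

Every sample appears in exactly one validation fold, so each data point is used for validation once and for training K-1 times.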



- For hyperparameter tuning, we perform many iterations of the entire K-Fold CV process, each time using different model settings.
- We then compare all of the models, select the best one, train it on the full training set, and then evaluate on the testing set.
- Each time we want to assess a different set of hyperparameters, we have to split our training data into K folds and train and evaluate K times. 
- If we have 10 sets of hyperparameters and are using 5-Fold CV, that represents 50 training loops.
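
A rough sketch of this loop, assuming synthetic data and three hypothetical candidate settings (fewer than the ten in the example, to keep it fast), shows how the number of fits grows as settings times folds:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Small synthetic dataset (illustrative assumption)
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Hypothetical candidate hyperparameter settings
candidate_settings = [{'n_estimators': 5}, {'n_estimators': 10}, {'n_estimators': 20}]
n_folds = 5

mean_scores = []
total_fits = 0
for params in candidate_settings:
    model = RandomForestRegressor(random_state=0, **params)
    # One full K-Fold CV run per setting: n_folds separate fits
    scores = cross_val_score(model, X, y, cv=n_folds,
                             scoring='neg_mean_absolute_error')
    mean_scores.append(scores.mean())
    total_fits += len(scores)

# Higher (less negative) score is better for neg_mean_absolute_error
best_index = max(range(len(mean_scores)), key=lambda i: mean_scores[i])
best_setting = candidate_settings[best_index]
print('Total model fits:', total_fits)
print('Best setting by mean validation score:', best_setting)
```

With 3 settings and 5 folds this performs 15 fits; 10 settings would give the 50 training loops mentioned above.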

In [None]:
# Libraries
from sklearn import datasets
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

In [None]:
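# Load the Boston housing dataset (the target, median home value, is stored below as 'Type')
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version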

boston = datasets.load_boston()
boston_data = pd.DataFrame(boston.data)
boston_data.columns = boston.feature_names
boston_data['Type']=boston.target
boston_data.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,Type
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33,36.2


In [None]:
print('Feature Columns are ')
features = boston_data.iloc[:,:13]
features.head()

Feature Columns are 


Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33


In [None]:
print("Target Variable is")
labels = boston_data['Type']
labels.head()

Target Variable is


0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: Type, dtype: float64

In [None]:
# Convert to numpy arrays
import numpy as np

features = np.array(features)
labels = np.array(labels)

# Training and Testing Sets
from sklearn.model_selection import train_test_split

train_features, test_features, train_labels, test_labels = train_test_split(features, labels, 
                                                                            test_size = 0.25, 
                                                                            random_state = 42)
print('Training Features Shape:', train_features.shape)
print('Training Labels Shape:', train_labels.shape)
print('Testing Features Shape:', test_features.shape)
print('Testing Labels Shape:', test_labels.shape)

Training Features Shape: (379, 13)
Training Labels Shape: (379,)
Testing Features Shape: (127, 13)
Testing Labels Shape: (127,)


In [None]:
# Examine the Default Random Forest to Determine Parameters

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(random_state = 42)

from pprint import pprint

# Look at parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())

Parameters currently in use:

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}


- **n_estimators =** number of trees in the forest
- **max_features =** max number of features considered for splitting a node
- **max_depth =** max number of levels in each decision tree
- **min_samples_split =** min number of data points placed in a node before the node is split
- **min_samples_leaf =** min number of data points allowed in a leaf node
- **bootstrap =** method for sampling data points (with or without replacement)
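
As a small illustration of how these settings shape the fitted forest, the sketch below sets a few of them explicitly and inspects the result. The synthetic data and the particular values are assumptions chosen only to make the effect visible.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative assumption)
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=20,       # number of trees
    max_depth=3,           # at most 3 levels of splits per tree
    min_samples_leaf=4,    # each leaf must keep at least 4 samples
    max_features='sqrt',   # features considered at each split
    bootstrap=True,        # sample training rows with replacement
    random_state=0,
)
forest.fit(X, y)

# The fitted trees are available in estimators_
n_trees = len(forest.estimators_)
deepest = max(tree.get_depth() for tree in forest.estimators_)
print('Number of trees:', n_trees)
print('Deepest tree depth:', deepest)
```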

In [None]:
print("Random Search with Cross Validation")
from sklearn.model_selection import RandomizedSearchCV

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

pprint(random_grid)

Random Search with Cross Validation
{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}


On each iteration, the algorithm will choose a different combination of the hyperparameter values. Altogether, there are 2 * 12 * 2 * 3 * 3 * 10 = 4320 settings! However, the benefit of a random search is that we do not try every combination; instead, we select combinations at random to sample a wide range of values.
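
The count can be checked, and the sampling imitated, with scikit-learn's ParameterGrid and ParameterSampler. This is only a sketch that rebuilds the same grid; RandomizedSearchCV does its own sampling internally, and no model is fitted here, so the values are never validated against a particular scikit-learn version.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same grid as in the cell above, rebuilt so this block is self-contained
random_grid = {
    'n_estimators': [int(x) for x in np.linspace(200, 2000, num=10)],
    'max_features': ['auto', 'sqrt'],
    'max_depth': [int(x) for x in np.linspace(10, 110, num=11)] + [None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True, False],
}

# Exhaustive enumeration: every possible combination
n_combinations = len(ParameterGrid(random_grid))

# Random search looks at only a sample of them (100 here, as with n_iter = 100)
sampled = list(ParameterSampler(random_grid, n_iter=100, random_state=42))

print('Total combinations:', n_combinations)
print('Sampled candidates:', len(sampled))
print('First sampled candidate:', sampled[0])
```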

In [None]:
# Use the random grid to search for best hyperparameters

# First create the base model to tune
rf = RandomForestRegressor(random_state = 42)

# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                              n_iter = 100, scoring='neg_mean_absolute_error', 
                              cv = 3, verbose=2, random_state=42, n_jobs=-1,
                              return_train_score=True)

# Fit the random search model
rf_random.fit(train_features, train_labels)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


RandomizedSearchCV(cv=3, estimator=RandomForestRegressor(random_state=42),
                   n_iter=100, n_jobs=-1,
                   param_distributions={'bootstrap': [True, False],
                                        'max_depth': [10, 20, 30, 40, 50, 60,
                                                      70, 80, 90, 100, 110,
                                                      None],
                                        'max_features': ['auto', 'sqrt'],
                                        'min_samples_leaf': [1, 2, 4],
                                        'min_samples_split': [2, 5, 10],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
                   random_state=42, return_train_score=True,
                   scoring='neg_mean_absolute_error', verbose=2)

In [None]:
print("Best Parameter")
rf_random.best_params_

Best Parameter


{'bootstrap': False,
 'max_depth': None,
 'max_features': 'sqrt',
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'n_estimators': 400}

- The most important arguments in RandomizedSearchCV are n_iter, which controls the number of different combinations to try, and cv, which is the number of folds to use for cross validation (we use 100 and 3 respectively). 
- More iterations cover more of the search space, and more CV folds give a more reliable performance estimate and so reduce the chance of overfitting to one particular split, but raising either increases the run time. Machine learning is a field of trade-offs, and performance versus time is one of the most fundamental.
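
To make the cost of these two arguments concrete, here is a scaled-down sketch on synthetic data. The small grid, n_iter = 4 and cv = 3 are assumptions chosen so it runs quickly. The number of cross-validation fits is n_iter times cv, plus one final refit of the best candidate on all the training data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (illustrative assumption)
X, y = make_regression(n_samples=150, n_features=6, noise=10.0, random_state=0)

# Hypothetical, deliberately small search space
small_grid = {
    'n_estimators': [10, 20, 30],
    'max_depth': [3, 5, None],
    'min_samples_leaf': [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=small_grid,
    n_iter=4,                           # number of sampled combinations
    cv=3,                               # folds per combination
    scoring='neg_mean_absolute_error',
    random_state=0,
)
search.fit(X, y)

n_candidates = len(search.cv_results_['params'])
n_cv_fits = n_candidates * search.n_splits_
print('Candidates tried:', n_candidates)
print('Cross-validation fits:', n_cv_fits)
# The score is a negated error, so flip the sign to report the MAE
print('Best mean validation MAE: {:0.3f}'.format(-search.best_score_))
```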

- Evaluate Random Search
  
  To determine if random search yielded a better model, we compare the base model with the best random search model.



In [None]:
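def evaluate(model, test_features, test_labels):
  # Report the mean absolute error and an "accuracy" defined as 100 - MAPE
  predictions = model.predict(test_features)
  errors = abs(predictions - test_labels)
  # Mean absolute percentage error (assumes no label is zero)
  mape = 100 * np.mean(errors / test_labels)
  accuracy = 100 - mape
  print('Model Performance')
  # The 'degrees' label is left over from another dataset; here the error is in the target's units ($1000s)
  print('Average Error: {:0.4f} degrees.'.format(np.mean(errors)))
  print('Accuracy = {:0.2f}%.'.format(accuracy))
  return accuracy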

# Baseline for comparison: a small forest of 10 trees with otherwise default settings
base_model = RandomForestRegressor(n_estimators = 10, random_state = 42)
base_model.fit(train_features, train_labels)
base_accuracy = evaluate(base_model, test_features, test_labels)

print("\n")
best_random = rf_random.best_estimator_
random_accuracy = evaluate(best_random, test_features, test_labels)

print("\n")
print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy - base_accuracy) / base_accuracy))

Model Performance
Average Error: 2.2394 degrees.
Accuracy = 88.33%.


Model Performance
Average Error: 1.9170 degrees.
Accuracy = 90.41%.


Improvement of 2.35%.
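
The "accuracy" reported by evaluate is 100 minus the mean absolute percentage error, and the improvement is relative to the base accuracy rather than a difference in percentage points. A tiny worked example makes both explicit. The two toy predictions are made up for illustration, and the helper name mape_accuracy is hypothetical; it only mirrors the arithmetic in evaluate.

```python
import numpy as np

def mape_accuracy(predictions, labels):
    # Same arithmetic as evaluate(): accuracy = 100 - MAPE
    errors = np.abs(predictions - labels)
    mape = 100 * np.mean(errors / labels)
    return 100 - mape

# Toy values: absolute errors of 2 and 5 on labels 20 and 25
toy_labels = np.array([20.0, 25.0])
toy_predictions = np.array([22.0, 20.0])
# Ratios are 0.10 and 0.20, so MAPE = 15 and accuracy = 85
toy_accuracy = mape_accuracy(toy_predictions, toy_labels)
print('Toy accuracy: {:0.2f}%'.format(toy_accuracy))

# Relative improvement from the rounded accuracies printed above
reported_improvement = 100 * (90.41 - 88.33) / 88.33
print('Relative improvement: {:0.2f}%'.format(reported_improvement))
```

In absolute terms, the gain from 88.33% to 90.41% is 2.08 percentage points.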


### **Grid Search with Cross Validation**

- Random search allowed us to narrow down the range for each hyperparameter. 
- Now that we know where to concentrate our search, we can explicitly specify every combination of settings to try. 
- We do this with GridSearchCV, a method that, instead of sampling randomly from a distribution, evaluates all combinations we define. 
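- To use grid search, we define another grid, ideally centred on the best values found by random search. Note that the grid below is not centred on the best parameters reported above (for example, it fixes bootstrap to True and tries min_samples_leaf values of 3 to 5, while random search selected False and 1):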

In [None]:
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search 
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
# Create a base model
rf = RandomForestRegressor()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator = rf, param_grid = param_grid, 
                          cv = 3, n_jobs = -1, verbose = 2)
grid_search

GridSearchCV(cv=3, estimator=RandomForestRegressor(), n_jobs=-1,
             param_grid={'bootstrap': [True], 'max_depth': [80, 90, 100, 110],
                         'max_features': [2, 3], 'min_samples_leaf': [3, 4, 5],
                         'min_samples_split': [8, 10, 12],
                         'n_estimators': [100, 200, 300, 1000]},
             verbose=2)

This will try out 1 * 4 * 2 * 3 * 3 * 4 = 288 combinations of settings. We can fit the model, display the best hyperparameters, and evaluate performance:

In [None]:
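# Fit the grid search to the data
grid_search.fit(train_features, train_labels)
# Note: a bare expression is displayed only when it is the last line of a cell,
# so wrap this in print() to actually see the best hyperparameters
grid_search.best_params_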

print("\n")
best_grid = grid_search.best_estimator_
grid_accuracy = evaluate(best_grid, test_features, test_labels)

print("\n")
print('Improvement of {:0.2f}%.'.format( 100 * (grid_accuracy - base_accuracy) / base_accuracy))

Fitting 3 folds for each of 288 candidates, totalling 864 fits


Model Performance
Average Error: 2.1252 degrees.
Accuracy = 89.03%.


Improvement of 0.79%.
