# Analysis of time-to-resolve Get It Done San Diego requests
## Notebook 4b: Machine Learning (Part 2)
### Tuning model hyperparameters
The final model chosen is the Random Forest Classifier (for multi-class data). This notebook documents hyperparameter tuning for this model. The first steps repeat those in Machine Learning (Part 1) so that the pre-processing flow is identical.

NOTE: In Part 1 I upsampled the data so that there were equal numbers of records in all 'closed_bin' categories. Using that upsampled dataset reliably causes an out-of-memory error on my laptop when attempting the grid search in this part. Instead, I resample every category to a fixed size of 20,000 records, which is close to the size of one of the smaller categories. The larger categories are therefore down-sampled and the smaller ones up-sampled.

### 1 - Import libraries, import data and do some final cleaning

In [4]:
# import libraries
import pandas as pd
import numpy as np

import os
notebook_path = os.path.abspath("Notebook4a-Machine_Learning_Part1.ipynb")

from sklearn.utils import resample
from sklearn.metrics import confusion_matrix, classification_report

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

#### *Discarding some requests* 
We will only use 'closed' requests because none of the other types (i.e. 'referred' or 'cancelled') have updated datetime information in the database. Almost all request types, from all sources (whether initiated by the public or city officials) are included in this model. The one exception is for 'Graffiti Removal' requests entered by City workers, because those requests are marked 'closed' as soon as they are entered into the database. My interpretation is that in this one case, City workers use the Get It Done database to log work that they've already completed, rather than to create records for work that needs to be done in the future.

In [5]:
filename_csv = os.path.join(os.path.dirname(notebook_path), "gid_with_demo.csv")
gid_with_demo = pd.read_csv(filename_csv, index_col=0)
gid_with_demo.set_index('service_request_id', inplace=True)
predictor_df = gid_with_demo.copy()

# Only use closed requests
predictor_df = predictor_df[predictor_df.status=='Closed']

# Remove Crew/Self Generated Graffiti Removal requests
predictor_df = predictor_df.drop(predictor_df[(predictor_df.case_origin == 'Crew/Self Generated') & 
                             (predictor_df.service_name == 'Graffiti Removal')].index)

# Get rid of service_name categories with low number of requests
predictor_df = predictor_df[predictor_df.groupby('service_name')['service_name'].transform('count').ge(100)]

# Drop all unneeded columns
predictor_df = predictor_df.drop(columns=['requested_datetime','updated_datetime','status',
                                          'lat','long','datetime_till_closed','resolved',
                                          'zipcode','housing_value','median_age'],axis=1)

# Get rid of any NaN values that may have slipped through
predictor_df = predictor_df.dropna(axis=0, how='any', subset=['service_name','case_record_type',
                                          'district','case_origin','days_till_closed','load_by_service',
                                          'load_by_service_zip','household_income','pop_density'])
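
To make the low-count filter above concrete, here is a minimal sketch on a made-up toy dataframe. The `toy` data and the threshold of 2 are illustrative assumptions (the notebook itself uses 100). `transform('count')` returns, for every row, the size of that row's `service_name` group, so comparing it with `.ge(...)` gives a row-level mask.

```python
import pandas as pd

# Toy data (hypothetical): three service types with 3, 2 and 1 requests
toy = pd.DataFrame({'service_name': ['Pothole'] * 3 + ['Graffiti Removal'] * 2 + ['Other']})

# For each row, the number of rows that share its service_name
group_sizes = toy.groupby('service_name')['service_name'].transform('count')

# Keep only rows whose service type has at least 2 requests
kept = toy[group_sizes.ge(2)]
print(kept['service_name'].value_counts())
```

Because the mask is computed per row, the filter needs no merge or lookup table; rows from rare service types are simply dropped.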

### 2 - Preparation for model training

#### *Add a column ('closed_bin') to use as labels for the classifier model*
These are the labels that will be used to train the model, and are based on the total number of days it takes to close a request. It's easier to add it to the dataframe here before normalization and splitting into test and training sets. The values for these particular labels were settled on after a bit of trial and error. Originally I'd used quintile values of the days_till_closed distribution, then tried quartile values. Given that those split points are arbitrary, and that the useful information to the user is more likely along the lines of "Will this request be resolved quickly or not?" I decided to use split points as follows:
- 0 = within a business week (5 days or less)
- 1 = between 1 week and 1 month (6 to 30 days)
- 2 = between 1 and 2 months (31 to 60 days)
- 3 = longer than 2 months (61 days or more)

In [6]:
predictor_df['closed_bin'] = ''
predictor_df.loc[(predictor_df.days_till_closed < 6), 'closed_bin'] = 0
predictor_df.loc[((predictor_df.days_till_closed >= 6) & 
                  (predictor_df.days_till_closed < 31)), 'closed_bin'] = 1
predictor_df.loc[((predictor_df.days_till_closed >= 31) & 
                  (predictor_df.days_till_closed < 61)), 'closed_bin'] = 2
predictor_df.loc[(predictor_df.days_till_closed >= 61), 'closed_bin'] = 3
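
As a compact alternative to the four `.loc` assignments above, the same split points can be expressed with `pd.cut`. This is a sketch on made-up values, not the code the notebook ran; the `days` series and the `edges` name are assumptions for illustration. With `right=False` each bin is closed on the left, which matches the `>=` / `<` comparisons used above.

```python
import numpy as np
import pandas as pd

# Hypothetical days_till_closed values straddling each split point
days = pd.Series([0, 5, 5.5, 6, 30, 31, 60, 61, 400], name='days_till_closed')

# Bin edges: [-inf, 6), [6, 31), [31, 61), [61, inf)
edges = [-np.inf, 6, 31, 61, np.inf]
closed_bin = pd.cut(days, bins=edges, right=False, labels=[0, 1, 2, 3]).astype(int)

print(pd.DataFrame({'days_till_closed': days, 'closed_bin': closed_bin}))
```

One side benefit of this form is that the label column ends up with an integer dtype from the start, rather than beginning life as a column of empty strings.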

#### *Get dummy variables for categorical values*

In [7]:
predictor_df = pd.get_dummies(predictor_df, prefix=['service_name','case_record_type','district','case_origin'], 
                              columns=['service_name','case_record_type','district','case_origin'])
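
For readers unfamiliar with `pd.get_dummies`, the following toy example (made-up data, with a single categorical column) shows what the call above does: each listed column is replaced by one indicator column per category, named `<prefix>_<category>`, while the other columns pass through unchanged.

```python
import pandas as pd

# Hypothetical two-column frame: one categorical, one numeric
toy = pd.DataFrame({'case_origin': ['Web', 'Mobile', 'Web'],
                    'pop_density': [1.0, 2.0, 3.0]})

encoded = pd.get_dummies(toy, prefix=['case_origin'], columns=['case_origin'])
print(encoded.columns.tolist())
```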

#### *Separate the training and test sets*
This is done *before* resampling. Only the training set is resampled, so there is no 'bleed' through to the test set; in other words, a duplicated record can never appear in both the training and test sets. I'm doing the train-test split manually below instead of using the scikit-learn function, and *then* will separate into X and y sets.

In [8]:
import random

def my_train_test_split(dataset, split):
    # Shuffle the row positions of the dataset passed in (not the global dataframe)
    indices = np.arange(len(dataset))
    random.Random(7).shuffle(indices)    # Random(7) fixes the seed for reproducibility
    split_idx = int(len(indices) * split)
    # The first split_idx + 1 shuffled rows form the training set; the rest form the test set
    train = dataset.iloc[indices[:split_idx+1]]
    test = dataset.iloc[indices[split_idx+1:]]
    return train, test

train, test = my_train_test_split(predictor_df, 0.67)
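
The reason for splitting first can be checked on toy data. The sketch below uses scikit-learn's `train_test_split` as a stand-in for the manual function above (an assumption made for brevity, not the notebook's method), upsamples only the training rows, and confirms that no test row ends up in the training set. The toy dataframe and all sizes are made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical imbalanced data: 8 rows of class 0, 4 rows of class 1
toy = pd.DataFrame({'x': range(12), 'closed_bin': [0] * 8 + [1] * 4})

# Split first (stratified so both classes appear on each side)
train_toy, test_toy = train_test_split(toy, train_size=0.67, random_state=7,
                                       stratify=toy['closed_bin'])

# Upsample the minority class using training rows only
minority = train_toy[train_toy.closed_bin == 1]
majority = train_toy[train_toy.closed_bin == 0]
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=123)
train_resampled_toy = pd.concat([majority, minority_up])

# Duplicates exist inside the training set, but none of them come from the test set
leaked = set(train_resampled_toy.index) & set(test_toy.index)
print('rows shared between train and test:', len(leaked))
```

Had the resampling been done before the split, copies of one original record could land on both sides and inflate the test scores.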

#### *Resample the training data*

In [9]:
# Split the training set by class
train_bin0 = train[train.closed_bin==0]
train_bin1 = train[train.closed_bin==1]
train_bin2 = train[train.closed_bin==2]
train_bin3 = train[train.closed_bin==3]

# Sizing every class to match bin 0, as in the line below, causes an out-of-memory error later on.
#n_per_class = len(train[train.closed_bin==0])

# Instead, every class is resampled to exactly 20,000 requests (80,000 in total):
# bins 0 and 1 are down-sampled, bins 2 and 3 are up-sampled
n_per_class = 20000

train_bin0_resampled = resample(train_bin0,
                                  replace=True,            # sample with replacement
                                  n_samples=n_per_class,   # fixed size for every class
                                  random_state=123)        # for reproducibility

train_bin1_resampled = resample(train_bin1, replace=True, n_samples=n_per_class, random_state=123)
train_bin2_resampled = resample(train_bin2, replace=True, n_samples=n_per_class, random_state=123)
train_bin3_resampled = resample(train_bin3, replace=True, n_samples=n_per_class, random_state=123)

# Combine the resampled classes into one training set
train_resampled = pd.concat([train_bin0_resampled, train_bin1_resampled, train_bin2_resampled, 
                                 train_bin3_resampled])

print(train.closed_bin.value_counts())
print(train_resampled.closed_bin.value_counts())

1    46819
0    41465
3    19215
2    10724
Name: closed_bin, dtype: int64
3    20000
2    20000
1    20000
0    20000
Name: closed_bin, dtype: int64
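
The four near-identical `resample` calls above could be folded into one helper. A minimal sketch, assuming a hypothetical helper name `resample_to_fixed_size` and made-up toy data; it was not part of the original run.

```python
import pandas as pd
from sklearn.utils import resample

def resample_to_fixed_size(df, label_col, n_per_class, seed=123):
    """Resample every class in label_col (with replacement) to n_per_class rows."""
    parts = [
        resample(group, replace=True, n_samples=n_per_class, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts)

# Hypothetical imbalanced data: class sizes 6, 3 and 1
toy = pd.DataFrame({'feature': range(10), 'closed_bin': [0] * 6 + [1] * 3 + [2] * 1})
balanced = resample_to_fixed_size(toy, 'closed_bin', n_per_class=4)
print(balanced['closed_bin'].value_counts().sort_index())
```

Note that sampling with replacement means even a down-sampled class can contain repeated rows, which is also true of the cell above.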


### 3 - First hyperparameter estimation
In this section, I use RandomizedSearchCV to search over a fairly coarse grid of hyperparameters for the RandomForestClassifier model. I'll repeat this process in a more focused way in the next section to fine-tune the hyperparameters found in this search.

#### *Create X and y datasets*

In [10]:
X_train = train_resampled.copy()
X_train = X_train.drop(columns=['days_till_closed', 'closed_bin'], axis=1).values
y_train = train_resampled['closed_bin'].values

X_test = test.copy()
X_test = X_test.drop(columns=['days_till_closed', 'closed_bin'], axis=1).values
y_test = test['closed_bin'].values

#### *Create a very coarse grid of values to search over*

In [11]:
from sklearn.model_selection import RandomizedSearchCV


# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
max_features = ['sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

# Use the random grid to search for best hyperparameters
# First create the base model to tune
rf = RandomForestClassifier()
# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, 
                               n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
rf_random.fit(X_train, y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 13.1min


[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 78.4min


[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed: 149.9min finished


RandomizedSearchCV(cv=3, error_score='raise',
          estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
          fit_params=None, iid=True, n_iter=100, n_jobs=-1,
          param_distributions={'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4], 'bootstrap': [True]},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score='warn', scoring=None, verbose=2)
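
For a sense of how much of the search space the 100 sampled candidates cover, the size of the full grid can be counted directly. This sketch only does arithmetic on the parameter lists printed in the output above; the variable names `n_combinations` and `coverage` are mine.

```python
import math

# Values copied from the param_distributions shown in the search output above
random_grid = {
    'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
    'max_features': ['sqrt'],
    'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'bootstrap': [True],
}

# Total combinations = product of the number of options per hyperparameter
n_combinations = math.prod(len(values) for values in random_grid.values())
n_iter = 100
coverage = n_iter / n_combinations
print(f'{n_combinations} combinations; {n_iter} sampled ({coverage:.1%})')
```

So the randomized search evaluates a bit under a tenth of the grid, which is why a narrower follow-up search around the winner is a reasonable next step.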

#### *What are the best hyperparameters from this first search?*

In [12]:
# best hyperparameters (1st try)
rf_random.best_params_

{'n_estimators': 1600,
 'min_samples_split': 2,
 'min_samples_leaf': 1,
 'max_features': 'sqrt',
 'max_depth': 40,
 'bootstrap': True}
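
`best_params_` reports only the winning candidate. To see how close the runners-up were, the fitted search object also exposes `cv_results_`, which converts cleanly to a dataframe. The sketch below runs a deliberately tiny search on synthetic data so that it finishes in seconds; the data, grid and `toy_search` name are assumptions, not the notebook's search.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic 4-class problem (stand-in for the real training data)
X_toy, y_toy = make_classification(n_samples=200, n_features=8, n_informative=4,
                                   n_classes=4, random_state=0)

toy_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={'n_estimators': [10, 20], 'max_depth': [3, 5, None]},
    n_iter=4, cv=3, random_state=42)
toy_search.fit(X_toy, y_toy)

# One row per sampled candidate, best first
columns = ['param_n_estimators', 'param_max_depth',
           'mean_test_score', 'std_test_score', 'rank_test_score']
results = pd.DataFrame(toy_search.cv_results_)[columns].sort_values('rank_test_score')
print(results)
```

If the top few rows differ by less than their `std_test_score`, the exact winner is largely down to noise, and a cheaper configuration among them may be just as good.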

#### *Do these new hyperparameters improve performance?*

In [13]:
# compare basic model with random search model
base_model = RandomForestClassifier(random_state=42)
base_model.fit(X_train, y_train)
y_pred = base_model.predict(X_test)
print('Basic model performance')
print(pd.crosstab(y_test, y_pred, rownames=['Actual Bin'], colnames=['Predicted Bin'], normalize='index'))
print('Accuracy:', accuracy_score(y_test, y_pred))
print('f1 scores:', f1_score(y_test, y_pred, average=None))

best_random = rf_random.best_estimator_
best_random.fit(X_train, y_train)

y_pred = best_random.predict(X_test)
print('\nRandom search model performance')
print(pd.crosstab(y_test, y_pred, rownames=['Actual Bin'], colnames=['Predicted Bin'], normalize='index'))
print('Accuracy:', accuracy_score(y_test, y_pred))
print('f1 scores:', f1_score(y_test, y_pred, average=None))

Basic model performance
Predicted Bin         0         1         2         3
Actual Bin                                           
0              0.630172  0.215515  0.072101  0.082211
1              0.224834  0.488096  0.152808  0.134262
2              0.138407  0.279799  0.373251  0.208543
3              0.107401  0.191833  0.173224  0.527541
Accuracy: 0.533368825994367
f1 scores: [0.63858071 0.53664319 0.28593884 0.48966096]

Random search model performance
Predicted Bin         0         1         2         3
Actual Bin                                           
0              0.615081  0.217636  0.074271  0.093012
1              0.197360  0.492539  0.158501  0.151600
2              0.116396  0.262824  0.389666  0.231114
3              0.091344  0.176202  0.161740  0.570715
Accuracy: 0.5383664216528131
f1 scores: [0.64275407 0.54271118 0.29522329 0.50133109]
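
To put a number on the difference between the two models, the sketch below takes the accuracy and per-class F1 values printed above and computes the gains. It is plain arithmetic on the reported figures; the macro F1 (unweighted mean of the per-class scores) is an extra summary that the notebook did not print.

```python
# Numbers copied from the printed output above
base = {'accuracy': 0.533368825994367,
        'f1': [0.63858071, 0.53664319, 0.28593884, 0.48966096]}
tuned = {'accuracy': 0.5383664216528131,
         'f1': [0.64275407, 0.54271118, 0.29522329, 0.50133109]}

def macro_f1(per_class_f1):
    # Unweighted mean of the per-class F1 scores
    return sum(per_class_f1) / len(per_class_f1)

accuracy_gain = tuned['accuracy'] - base['accuracy']
macro_f1_gain = macro_f1(tuned['f1']) - macro_f1(base['f1'])
per_class_gain = [t - b for t, b in zip(tuned['f1'], base['f1'])]

print(f'accuracy gain: {accuracy_gain:.4f}')
print(f'macro F1 gain: {macro_f1_gain:.4f}')
print('per-class F1 gain:', [round(g, 4) for g in per_class_gain])
```

Every class improves, but only by roughly 0.004 to 0.012 in F1, and accuracy rises by about half a percentage point.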


### 4 - Second hyperparameter tuning
NOTE: Takes 9 hrs to run

Granted, the first tuning barely improved performance (accuracy rose only from 0.533 to 0.538), but I might as well go all the way. In this section, I do a more careful grid search around the best parameters returned from the section above. These were:
- 'n_estimators': 1600
- 'min_samples_split': 2
- 'min_samples_leaf': 1
- 'max_features': 'sqrt'
- 'max_depth': 40
- 'bootstrap': True
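
Before launching a search this long, it helps to count the fits it implies. A minimal sketch, using the grid values from the cell that follows and scikit-learn's `ParameterGrid`; the names `n_candidates` and `n_fits` are mine.

```python
from sklearn.model_selection import ParameterGrid

# Same values as the param_grid defined in the next cell
param_grid = {
    'bootstrap': [True],
    'max_depth': [30, 35, 40, 45, 50],
    'max_features': [5, 10, 15],
    'min_samples_leaf': [1, 2],
    'min_samples_split': [2, 3, 4],
    'n_estimators': [1500, 1600, 1700],
}

# GridSearchCV evaluates every combination once per cross-validation fold
n_candidates = len(ParameterGrid(param_grid))
n_folds = 3
n_fits = n_candidates * n_folds
print(f'{n_candidates} candidates x {n_folds} folds = {n_fits} fits')
```

Unlike the randomized search, the grid search is exhaustive, so every extra value added to any list multiplies the total number of fits.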

In [25]:
from sklearn.model_selection import GridSearchCV

# Create the parameter grid based on the results of previous random search 
param_grid = {
    'bootstrap': [True],
    'max_depth': [30, 35, 40, 45, 50],
    'max_features': [5, 10, 15],
    'min_samples_leaf': [1, 2],
    'min_samples_split': [2, 3, 4],
    'n_estimators': [1500, 1600, 1700]
}

# Create a base model
rf = RandomForestClassifier()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator = rf, param_grid = param_grid, 
                          cv = 3, n_jobs = -1, verbose = 2)

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

Fitting 3 folds for each of 270 candidates, totalling 810 fits
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed: 18.7min


[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed: 106.8min


[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 4.7min
[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=35, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1600, total= 6.3min
[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 4.8min
[CV] bootstrap=True, max_depth=40, max_features=5,

[Parallel(n_jobs=-1)]: Done 349 tasks      | elapsed: 259.4min


[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1700, total= 4.9min
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1700, total= 5.0min
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1700, total= 4.9min
[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1700 
[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1700 
[CV] bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=40, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1500, total= 3.6min
[CV] bootstrap=True, max_depth=40, max_features=5, m

[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1700, total= 6.3min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1700, total= 6.3min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1700, total= 6.3min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1500, total= 5.3min
[CV] bootstrap=True, max_depth=40, max_featu

[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 4.7min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 4.7min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1500, total= 4.5min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1700, total= 5.1min
[CV] bootstrap=True, max_depth=40, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1700 
[CV]  bootstrap=True, max_depth=40, max_feat

[CV] bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1500 
[CV]  bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1600, total= 6.4min
[CV] bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1500 
[CV]  bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1600, total= 6.4min
[CV] bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1600, total= 6.4min
[CV] bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1700, total= 6.8min
[CV]  bootstrap=True, max_depth=40, max_feat

[CV]  bootstrap=True, max_depth=40, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 6.3min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=3, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 4.5min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=3, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1600, total= 4.8min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=3, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=2, n_estimators=1600, total= 4.8min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=1, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=5,

[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=3, n_estimators=1500, total= 3.4min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=3, n_estimators=1500, total= 3.4min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=3, n_estimators=1500, total= 3.3min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 3.6min
[CV] bootstrap=True, max_depth=45, max_features=5, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=5, 

[CV] bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 6.5min
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 6.6min
[CV] bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV] bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=4, n_estimators=1500, total= 5.7min
[CV] bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=2, min_samples_split=2, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=1, min_samples_split=4, n_estimators=1500, total= 5.5min
[CV] bootstrap=True, max_depth=45, max_featu

[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 5.2min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=1, min_samples_split=2, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 5.2min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=1, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=10, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 5.1min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=1, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 6.2min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=1, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_feat

[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1600, total= 5.8min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 6.2min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 6.2min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 6.2min
[CV] bootstrap=True, max_depth=45, max_featu

[Parallel(n_jobs=-1)]: Done 632 tasks      | elapsed: 476.8min


[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1500, total= 5.4min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 5.8min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1500 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 5.9min
[CV] bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=45, max_features=15, min_samples_leaf=2, min_samples_split=3, n_estimators=1600, total= 5.9min
[CV] bootstrap=True, max_depth=45, max_featu

[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 5.2min
[CV] bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1500, total= 4.5min
[CV] bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1500 
[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1500, total= 4.4min
[CV] bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=2, min_samples_split=2, n_estimators=1500 
[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1600, total= 4.7min
[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=1, min_samples_split=4, n_estimators=1600, total= 4.7min
[CV] bootstrap=True, max_depth=50, ma

[CV]  bootstrap=True, max_depth=50, max_features=5, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 3.8min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 5.5min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 5.6min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1500 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=2, n_estimators=1500, total= 5.7min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=1, min_samples_split=3, n_estimators=1500 
[CV]  bootstrap=True, max_depth=50, max_featu

[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1600 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 5.2min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 5.2min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1500, total= 4.7min
[CV] bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=3, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_features=10, min_samples_leaf=2, min_samples_split=2, n_estimators=1700, total= 5.3min
[CV] bootstrap=True, max_depth=50, max_featu

[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=3, n_estimators=1600, total= 6.9min
[CV] bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 7.2min
[CV] bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1600 
[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 7.3min
[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=3, n_estimators=1700, total= 7.2min
[CV] bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV] bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=1, min_samples_split=4, n_estimators=1700 
[CV]  bootstrap=True, max_depth=50, max_feat

[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 4.7min
[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 4.6min
[CV]  bootstrap=True, max_depth=50, max_features=15, min_samples_leaf=2, min_samples_split=4, n_estimators=1700, total= 4.5min


[Parallel(n_jobs=-1)]: Done 810 out of 810 | elapsed: 610.7min finished


GridSearchCV(cv=3, error_score='raise',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid={'bootstrap': [True], 'max_depth': [30, 35, 40, 45, 50], 'max_features': [5, 10, 15], 'min_samples_leaf': [1, 2], 'min_samples_split': [2, 3, 4], 'n_estimators': [1500, 1600, 1700]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=2)
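
The 810 tasks reported in the log follow from the grid itself: the number of hyperparameter combinations multiplied by the number of cross-validation folds. The sketch below reproduces that arithmetic from the `param_grid` and `cv=3` values shown in the output above. The names `cv_folds`, `n_candidates` and `n_fits` are mine and do not appear elsewhere in the notebook.

```python
from itertools import product

# Grid and fold count copied from the GridSearchCV output above
param_grid = {
    'bootstrap': [True],
    'max_depth': [30, 35, 40, 45, 50],
    'max_features': [5, 10, 15],
    'min_samples_leaf': [1, 2],
    'min_samples_split': [2, 3, 4],
    'n_estimators': [1500, 1600, 1700],
}
cv_folds = 3

# Every combination of values is one candidate model
n_candidates = len(list(product(*param_grid.values())))
# Each candidate is fit once per fold
n_fits = n_candidates * cv_folds

print(n_candidates, 'candidates x', cv_folds, 'folds =', n_fits, 'fits')
```

Because the count is a product, dropping one of the three `n_estimators` values would cut the number of fits by a third.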

#### *Best hyperparameters from this search*

In [26]:
# Best hyperparameters found by the final fine-tuning grid search
grid_search.best_params_

{'bootstrap': True,
 'max_depth': 45,
 'max_features': 15,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'n_estimators': 1600}
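
One quick sanity check on a grid search result is whether any chosen value sits at the edge of its search range, since that can mean the optimum lies outside the grid. The sketch below is a heuristic of my own rather than part of the original analysis. `edge_hits` is a hypothetical helper, and the two dictionaries are copied from the output above.

```python
# Search grid and best parameters copied from the output above
param_grid = {
    'bootstrap': [True],
    'max_depth': [30, 35, 40, 45, 50],
    'max_features': [5, 10, 15],
    'min_samples_leaf': [1, 2],
    'min_samples_split': [2, 3, 4],
    'n_estimators': [1500, 1600, 1700],
}
best_params = {
    'bootstrap': True,
    'max_depth': 45,
    'max_features': 15,
    'min_samples_leaf': 1,
    'min_samples_split': 2,
    'n_estimators': 1600,
}


def edge_hits(param_grid, best_params):
    """Return {parameter: 'lower' | 'upper'} for best values on a grid boundary."""
    hits = {}
    for name, values in param_grid.items():
        if len(values) < 2:
            continue  # a single-value "grid" has no interior to compare against
        ordered = sorted(values)
        if best_params[name] == ordered[0]:
            hits[name] = 'lower'
        elif best_params[name] == ordered[-1]:
            hits[name] = 'upper'
    return hits


print(edge_hits(param_grid, best_params))
```

Here `min_samples_leaf=1` and `min_samples_split=2` are already the smallest values the estimator accepts as integers, so those two lower-edge hits are not a concern. `max_features=15` is the largest value searched, so a wider range for that parameter might be worth trying.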

#### *How does this latest version perform?*

In [27]:
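# GridSearchCV was run with refit=True, so best_estimator_ has already been fit on
# the data passed to grid_search.fit; the explicit fit below retrains it with the
# same hyperparameters.
best_grid = grid_search.best_estimator_
best_grid.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = best_grid.predict(X_test)
print('\nGrid search model performance')
# Row-normalized confusion matrix: each row sums to 1, so the diagonal is per-class recall
print(pd.crosstab(y_test, y_pred, rownames=['Actual Bin'], colnames=['Predicted Bin'], normalize='index'))
print('Accuracy:', accuracy_score(y_test, y_pred))
# average=None returns one F1 score per class instead of a single averaged value
print('f1 scores:', f1_score(y_test, y_pred, average=None))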


Grid search model performance
Predicted Bin         0         1         2         3
Actual Bin                                           
0              0.618040  0.218622  0.073236  0.090102
1              0.198913  0.495903  0.157078  0.148107
2              0.114904  0.263757  0.392091  0.229248
3              0.092088  0.180668  0.159188  0.568056
Accuracy: 0.5405303290513155
f1 scores: [0.64426908 0.54454179 0.29843118 0.50310793]
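
Because the crosstab is normalized by row, its diagonal is the recall for each bin, and the unweighted mean of that diagonal is the macro-averaged recall (often called balanced accuracy). The short calculation below uses only the numbers printed above. The variable names are mine.

```python
# Diagonal of the row-normalized crosstab printed above (per-class recall)
recall_per_class = [0.618040, 0.495903, 0.392091, 0.568056]
reported_accuracy = 0.5405303290513155

# Unweighted mean of the per-class recalls
macro_recall = sum(recall_per_class) / len(recall_per_class)

print('Macro-averaged recall:', round(macro_recall, 4))
print('Reported accuracy:    ', round(reported_accuracy, 4))
```

Accuracy is the mean of these recalls weighted by how many test records fall in each bin, so the gap between the two numbers indicates that the four bins are not equally represented in the test set. In scikit-learn 0.20 or later, `balanced_accuracy_score(y_test, y_pred)` should give the same macro-averaged figure directly from the predictions.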
