### Grid vs. Random Search Cross-Validation

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv', header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


In [4]:
len(df)

208

In [5]:
X, y = df.iloc[:, :-1], df.iloc[:, -1]
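
The last column holds string class labels (`R` in the rows shown above; the sonar data also contains `M`). scikit-learn classifiers accept string labels directly, but newer xgboost releases expect integer class labels, so encoding `y` up front may be helpful. A minimal sketch, assuming scikit-learn's `LabelEncoder`; the small `labels` series is a hypothetical stand-in for `y`, and `encoder` / `y_encoded` are illustrative names:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the sonar target column (the last column of df).
labels = pd.Series(["R", "R", "M", "R", "M"])

encoder = LabelEncoder()
# Classes are sorted alphabetically, so "M" maps to 0 and "R" maps to 1.
y_encoded = encoder.fit_transform(labels)

print(list(encoder.classes_))
print(y_encoded.tolist())
```

`encoder.inverse_transform` maps predictions back to the original `R` / `M` labels if they are needed for reporting.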

In [6]:
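# Classification problem

# Random Forest
#   Parameters to evaluate
#     1. n_estimators
#     2. max_depth
#     3. n_iter (a RandomizedSearchCV setting: the number of sampled candidates,
#        not a RandomForestClassifier parameter)

# Use 1. GridSearchCV 2. RandomizedSearchCV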

In [7]:
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

In [8]:
#help(RandomForestClassifier)

In [9]:
clf = RandomForestClassifier()

parameters = {
    'rf': {
        'n_estimators': [10, 20, 30, 40, 50],
        'max_depth': [5, 10, 15],
        'min_samples_split': [40, 30, 20]
    },
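    'xgb': {
        'max_depth': [5, 10, 15],
        # NOTE: 'max_leaf_nodes' is a scikit-learn tree parameter. XGBoost does not
        # recognize it (see the "might not be used" warning in the output below);
        # its own counterpart is named 'max_leaves'.
        'max_leaf_nodes': [50, 40, 30, 20]
    }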
}
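
Because a grid search tries every combination, the cost of the grids above can be worked out before fitting anything. The sketch below counts the candidates with scikit-learn's `ParameterGrid` and multiplies by the `cv=5` folds used in the grid search cell; the grids are repeated so the snippet is self-contained, and `cv_folds` and `fit_counts` are illustrative names rather than part of the notebook:

```python
from sklearn.model_selection import ParameterGrid

# Same grids as in the notebook, repeated so this snippet runs on its own.
parameters = {
    'rf': {
        'n_estimators': [10, 20, 30, 40, 50],
        'max_depth': [5, 10, 15],
        'min_samples_split': [40, 30, 20],
    },
    'xgb': {
        'max_depth': [5, 10, 15],
        'max_leaf_nodes': [50, 40, 30, 20],
    },
}
cv_folds = 5  # assumed to match cv=5 in the grid search

fit_counts = {}
for name, grid in parameters.items():
    n_candidates = len(ParameterGrid(grid))  # product of the list lengths
    fit_counts[name] = (n_candidates, n_candidates * cv_folds)
    print(f"{name}: {n_candidates} candidates, {n_candidates * cv_folds} fits")
```

For these grids that is 5 × 3 × 3 = 45 candidates (225 fits) for the random forest and 3 × 4 = 12 candidates (60 fits) for XGBoost, not counting the final refit on the full data.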

In [10]:
X.shape, y.shape

((208, 60), (208,))

In [11]:
# pip install xgboost==1.5.0

In [12]:
classifiers = {
    'rf': RandomForestClassifier,
    'xgb': xgb.XGBClassifier
}

In [14]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [16]:
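# Run an exhaustive grid search for each classifier and report the best result.
for clf_name, params in parameters.items():
    clf = classifiers[clf_name]()  # fresh estimator with default settings
    search = GridSearchCV(
        clf,
        params,
        scoring='accuracy',
        cv=5,        # 5-fold cross-validation
        n_jobs=-1)   # use all available cores
    search.fit(X.values, y.values)

    # best_params_ is the winning hyperparameter combination (not the fitted estimator)
    print(f"Best Estimator: {search.best_params_}")
    # best_score_ is the mean cross-validated accuracy of that combination
    print(f"Best Score: {search.best_score_}")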

Best Estimator: {'max_depth': 15, 'min_samples_split': 40, 'n_estimators': 30}
Best Score: 0.7315911730545877




Parameters: { "max_leaf_nodes" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


Best Estimator: {'max_depth': 5, 'max_leaf_nodes': 50}
Best Score: 0.721835075493612
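
For comparison with the grid search above, the randomized counterpart can be sketched as follows. Instead of trying every combination, `RandomizedSearchCV` draws `n_iter` candidates from lists or distributions. This is a hedged illustration, not the notebook's own cell: it runs on a synthetic stand-in from `make_classification` with the same shape as the sonar data, and `X_demo`, `y_demo`, `param_distributions`, `random_search`, and the choice `n_iter=10` are all assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in with the same shape as the sonar data (208 rows, 60 features).
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=0)

# Lists are sampled uniformly; scipy distributions are sampled via their rvs method.
param_distributions = {
    "n_estimators": randint(10, 51),       # integers from 10 to 50 inclusive
    "max_depth": [5, 10, 15],
    "min_samples_split": randint(20, 41),  # integers from 20 to 40 inclusive
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,           # number of sampled candidates (arbitrary choice here)
    scoring="accuracy",
    cv=5,
    random_state=0,      # makes the sampled candidates reproducible
)
random_search.fit(X_demo, y_demo)

print(f"Best Params: {random_search.best_params_}")
print(f"Best Score: {random_search.best_score_}")
```

With `n_iter=10` and `cv=5` this trains 50 models plus one refit, compared with 225 for the full random forest grid, at the price of possibly missing the best grid point.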



