> This notebook shows how to use several hyperparameter optimization techniques and libraries to tune almost any kind of model, or simply to optimize an arbitrary function.

Contents:
1. GridSearchCV - score (0.8805)
2. RandomizedSearchCV - score (0.8825)
3. Grid/Random Search with Pipelines - score (0.4599)
4. Bayesian optimization using Gaussian Processes - score (0.9050)
5. Hyperopt - score (0.9075)
6. Optuna - score (0.9085), best

In [1]:
import pandas as pd
import numpy as np
from sklearn import ensemble 
from sklearn import metrics
from sklearn import model_selection


In [2]:
df = pd.read_csv("../input/mobile-price-classification/train.csv")
# features are all columns without price_range
# note that there is no id column in this dataset
# here we have training features
X = df.drop("price_range", axis=1).values
# and the targets
y = df.price_range.values
X.shape, y.shape

((2000, 20), (2000,))

In [3]:
"""
RandomForestClassifier( 
    n_estimators=100, 
    criterion='gini', 
    max_depth=None, 
    min_samples_split=2,
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0,
    max_features='auto',
    max_leaf_nodes=None,
    min_impurity_decrease=0.0, 
    min_impurity_split=None, 
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False, 
    class_weight=None,
    ccp_alpha=0.0, 
    max_samples=None,
)

There are nineteen parameters, and the number of
combinations over all the values they can take
is effectively infinite.
Normally, we don't have the resources or the
time to try them all. Thus, we specify a
grid of parameters. A search over this
grid to find the best combination of parameters
is known as grid search.
We can say that n_estimators can be
100, 200, 250, 300, 400, 500;
max_depth can be 1, 2, 5, 7, 11, 15 and
criterion can be gini or entropy.
This may not look like many values,
but every combination needs a full model fit,
so the search takes a long time
if the dataset is large.
"""

# 1. Grid search
Grid search evaluates every combination in a predefined parameter grid and keeps the combination with the best cross-validated score, i.e. the one that gives the most accurate predictions.

In [4]:
# define the model here
# i am using random forest with n_jobs=-1
# n_jobs=-1 => use all cores
classifier = ensemble.RandomForestClassifier(n_jobs=-1)
# define a grid of parameters
# this can be a dictionary or a list of 
# dictionaries
param_grid = {
    "n_estimators" : [100, 200, 300, 400],
    "max_depth" : [1, 4, 6, 9],
    "criterion" : ["gini", "entropy"],
}

# initialize grid search
# estimator is the model that we have defined
# param_grid is the grid of parameters
# we use accuracy as our metric. you can define your own
# higher value of verbose implies a lot of details are printed
# cv=5 means 5 fold cv; for a classifier, an integer cv uses StratifiedKFold

model = model_selection.GridSearchCV(
    estimator=classifier,
    param_grid=param_grid,
    scoring="accuracy",
    verbose=10,
    n_jobs=1,
    cv=5
)

# fit the model and extract best score
model.fit(X, y)


Fitting 5 folds for each of 32 candidates, totalling 160 fits
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.542, total=   2.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.3s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.615, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    2.6s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.580, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    3.0s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.580, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    3.3s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.610, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    3.6s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.593, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    4.0s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.580, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    4.5s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.608, total=   0.5s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    4.9s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.593, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    5.4s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.598, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=300 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=300, score=0.552, total=   0.6s
[CV] criterion=gini, max_depth=1, n_estimators=300 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=300, score=0.547, total=   0.7s
[CV] criterion=gini, max_depth=1, n_estimators=300 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=300, score=0.627, total=   0.7s
[CV] criterion=gini, max_depth=1, n_estimators=300 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=300, score=0.588, total=   0.7s
[CV] criterion=gini, max_depth=1, n_estimators=300 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=300, score=0.575, total=   0.6s
[CV] criterion=gini, max_depth=1, n_estimators=400 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=400, score=0.608, total=   0.7s
[CV] criterion

[Parallel(n_jobs=1)]: Done 160 out of 160 | elapsed:  1.8min finished


GridSearchCV(cv=5, estimator=RandomForestClassifier(n_jobs=-1), n_jobs=1,
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': [1, 4, 6, 9],
                         'n_estimators': [100, 200, 300, 400]},
             scoring='accuracy', verbose=10)

In [5]:
print(model.best_score_)
print(model.best_estimator_.get_params())

0.881
{'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'entropy', 'max_depth': 9, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 200, 'n_jobs': -1, 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}


The best five-fold accuracy was about 0.88 (0.881 in the run shown above), and `best_estimator_` gives the parameters that produced it.

The search picks the best combination for us, but only among the values listed in the grid, so these are the best values in the grid rather than the best possible ones.
Try other parameter values and see whether the accuracy score goes up.
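To see how every candidate scored, and not only the winner, the fitted search object can be inspected through `cv_results_`. The sketch below is only an illustration: it uses synthetic data from `make_classification` and a deliberately tiny grid (both are assumptions, not the mobile price data or the grid used above) so that it runs in a few seconds.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the mobile price dataset).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [2, 4]},
    scoring="accuracy",
    cv=3,
)
search.fit(X_demo, y_demo)

# One row per candidate, sorted so the best combination comes first.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

Looking at the whole table shows whether the best candidate sits at the edge of the grid, which usually suggests widening the grid in that direction.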

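- Note that with k-fold cross-validation every candidate is fitted k times, so the total number of fits is the number of combinations times k (here 32 combinations times 5 folds = 160 fits). That cost grows quickly as the grid grows, which is why grid search is not very popular for large search spaces.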
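The cost of a grid search can be estimated before running it. A small sketch using scikit-learn's `ParameterGrid` on the same grid as above (the variable names `n_folds`, `n_candidates` and `n_fits` are mine):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": [100, 200, 300, 400],
    "max_depth": [1, 4, 6, 9],
    "criterion": ["gini", "entropy"],
}
n_folds = 5

# ParameterGrid enumerates every combination of the listed values.
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * n_folds
print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
# prints: 32 candidates x 5 folds = 160 fits
```

This matches the "Fitting 5 folds for each of 32 candidates, totalling 160 fits" line in the log above. With the default `refit=True`, the search also fits the best candidate once more on the full data.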

# 2. Randomized Search CV
RandomizedSearchCV is useful when there are many parameters to try and training takes a long time. This example again uses a random forest classifier, so I assume you already know how that algorithm works.

In random search, we randomly select a combination of parameters and calculate the cross-validation score. The time consumed here is less than grid search because we do not evaluate over all different combinations of parameters. We choose how many times we want to evaluate our models, and that’s what decides how much time the search takes.
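The sampling step can be looked at on its own with scikit-learn's `ParameterSampler`. This is a sketch rather than what `RandomizedSearchCV` prints: the ranges mirror the ones used in this section, while `n_iter=5` and `random_state=0` are values I picked for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_distributions = {
    "n_estimators": list(range(100, 1500, 100)),
    "max_depth": list(range(1, 20)),
    "criterion": ["gini", "entropy"],
}

# Size of the full grid that an exhaustive search would have to cover.
total = len(ParameterGrid(param_distributions))

# Draw only a handful of combinations instead of evaluating all of them.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))

print(f"full grid: {total} combinations, sampled: {len(sampled)}")
for candidate in sampled:
    print(candidate)
```

The number of sampled candidates, not the size of the grid, decides how long the search takes.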

In [6]:
# define the model here
# i am using random forest with n_jobs=-1
# n_jobs=-1 => use all cores
classifier = ensemble.RandomForestClassifier(n_jobs=-1)

# define a grid of parameters
# this can be a dictionary or a list of dictionaries
param_grid = {
    "n_estimators" : np.arange(100, 1500, 100),
    "max_depth" : np.arange(1, 20),
    "criterion" : ["gini", "entropy"],
}

# initialize random search
# estimator is the model that we have defined
# param_distributions is the grid/distribution of parameters
# we use accuracy as our metric. you can define your own
# higher value of verbose implies a lot of details are printed
# cv=5 means 5 fold cv; for a classifier, an integer cv uses StratifiedKFold
# n_iter is the number of iterations we want
# if param_distributions has all the values as list,
# random search will be done by sampling without replacement
# if any of the parameters come from a distribution,
# random search uses sampling with replacement
model = model_selection.RandomizedSearchCV(
    estimator=classifier,
    param_distributions=param_grid,
    scoring="accuracy",
    n_iter=20,
    verbose=10,
    n_jobs=1,
    cv=5
)
model.fit(X, y)
print(f"Best score: {model.best_score_}")

print("Best parameters set:")
best_parameters = model.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print(f"\t{param_name}: {best_parameters[param_name]}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
[CV] n_estimators=400, max_depth=15, criterion=gini ..................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  n_estimators=400, max_depth=15, criterion=gini, score=0.882, total=   1.2s
[CV] n_estimators=400, max_depth=15, criterion=gini ..................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.2s remaining:    0.0s


[CV]  n_estimators=400, max_depth=15, criterion=gini, score=0.877, total=   1.1s
[CV] n_estimators=400, max_depth=15, criterion=gini ..................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    2.3s remaining:    0.0s


[CV]  n_estimators=400, max_depth=15, criterion=gini, score=0.897, total=   1.0s
[CV] n_estimators=400, max_depth=15, criterion=gini ..................


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    3.3s remaining:    0.0s


[CV]  n_estimators=400, max_depth=15, criterion=gini, score=0.880, total=   1.1s
[CV] n_estimators=400, max_depth=15, criterion=gini ..................


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    4.5s remaining:    0.0s


[CV]  n_estimators=400, max_depth=15, criterion=gini, score=0.870, total=   1.1s
[CV] n_estimators=1100, max_depth=17, criterion=gini .................


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    5.6s remaining:    0.0s


[CV]  n_estimators=1100, max_depth=17, criterion=gini, score=0.880, total=   2.6s
[CV] n_estimators=1100, max_depth=17, criterion=gini .................


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    8.2s remaining:    0.0s


[CV]  n_estimators=1100, max_depth=17, criterion=gini, score=0.882, total=   2.8s
[CV] n_estimators=1100, max_depth=17, criterion=gini .................


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   11.0s remaining:    0.0s


[CV]  n_estimators=1100, max_depth=17, criterion=gini, score=0.892, total=   2.9s
[CV] n_estimators=1100, max_depth=17, criterion=gini .................


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   13.9s remaining:    0.0s


[CV]  n_estimators=1100, max_depth=17, criterion=gini, score=0.877, total=   3.0s
[CV] n_estimators=1100, max_depth=17, criterion=gini .................


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   16.9s remaining:    0.0s


[CV]  n_estimators=1100, max_depth=17, criterion=gini, score=0.870, total=   2.6s
[CV] n_estimators=1100, max_depth=9, criterion=gini ..................
[CV]  n_estimators=1100, max_depth=9, criterion=gini, score=0.877, total=   2.4s
[CV] n_estimators=1100, max_depth=9, criterion=gini ..................
[CV]  n_estimators=1100, max_depth=9, criterion=gini, score=0.877, total=   2.4s
[CV] n_estimators=1100, max_depth=9, criterion=gini ..................
[CV]  n_estimators=1100, max_depth=9, criterion=gini, score=0.887, total=   2.4s
[CV] n_estimators=1100, max_depth=9, criterion=gini ..................
[CV]  n_estimators=1100, max_depth=9, criterion=gini, score=0.865, total=   2.4s
[CV] n_estimators=1100, max_depth=9, criterion=gini ..................
[CV]  n_estimators=1100, max_depth=9, criterion=gini, score=0.873, total=   2.4s
[CV] n_estimators=100, max_depth=6, criterion=entropy ................
[CV]  n_estimators=100, max_depth=6, criterion=entropy, score=0.823, total=   0.4s
[CV]

[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:  3.1min finished


Best score: 0.8859999999999999
Best parameters set:
	criterion: entropy
	max_depth: 18
	n_estimators: 1400


With a wider range of parameters and only 20 sampled candidates, random search scored 0.886 in this run, slightly better than the grid search result.

# 3. Grid/Random Search with Pipelines

- The same searches also work on a scikit-learn pipeline.

Sometimes you might want to tune a whole pipeline rather than a single model. For example, say we are dealing with a multiclass classification problem where the training data consists of two text columns and you need to build a model to predict the class; the preprocessing steps then have parameters of their own. Here the idea is shown on the same mobile price data, with scaling and PCA placed before the random forest.

In [7]:
from sklearn import decomposition
from sklearn import preprocessing
from sklearn import pipeline

In [8]:
# Here I am creating the pipeline steps
scl = preprocessing.StandardScaler()
pca = decomposition.PCA()
rf = ensemble.RandomForestClassifier(n_jobs=-1)

classifier = pipeline.Pipeline(
    [
        ("scaling", scl),
        ("pca", pca),
        ("rf", rf),
    ]

)

param_grid = {
    # parameters of a pipeline step are addressed as
    # <step name>__<parameter>, e.g. pca__, scaling__, rf__
    "pca__n_components" : np.arange(5, 10),
    "rf__n_estimators" : np.arange(100, 1500, 100),
    "rf__max_depth" : np.arange(1, 20),
    "rf__criterion" : ["gini", "entropy"],
}

model = model_selection.RandomizedSearchCV(
    estimator=classifier,
    param_distributions=param_grid,
    scoring="accuracy",
    n_iter=10,
    verbose=10,
    n_jobs=1,
    cv=5
)
model.fit(X, y)
print(model.best_score_)
print(model.best_estimator_.get_params())

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV] rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8, score=0.403, total=   3.1s
[CV] rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    3.2s remaining:    0.0s


[CV]  rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8, score=0.420, total=   2.9s
[CV] rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    6.1s remaining:    0.0s


[CV]  rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8, score=0.365, total=   3.0s
[CV] rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    9.1s remaining:    0.0s


[CV]  rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8, score=0.417, total=   2.9s
[CV] rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   12.1s remaining:    0.0s


[CV]  rf__n_estimators=1100, rf__max_depth=19, rf__criterion=gini, pca__n_components=8, score=0.405, total=   3.3s
[CV] rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   15.4s remaining:    0.0s


[CV]  rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8, score=0.403, total=   1.5s
[CV] rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   16.9s remaining:    0.0s


[CV]  rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8, score=0.415, total=   1.5s
[CV] rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   18.4s remaining:    0.0s


[CV]  rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8, score=0.383, total=   1.5s
[CV] rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   19.9s remaining:    0.0s


[CV]  rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8, score=0.412, total=   1.5s
[CV] rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8 


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   21.4s remaining:    0.0s


[CV]  rf__n_estimators=500, rf__max_depth=17, rf__criterion=gini, pca__n_components=8, score=0.407, total=   1.5s
[CV] rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9 
[CV]  rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9, score=0.420, total=   2.0s
[CV] rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9 
[CV]  rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9, score=0.472, total=   1.9s
[CV] rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9 
[CV]  rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9, score=0.495, total=   2.0s
[CV] rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9 
[CV]  rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9, score=0.517, total=   1.9s
[CV] rf__n_estimators=800, rf__max_depth=6, rf__criterion=gini, pca__n_components=9 
[CV]  rf_

[Parallel(n_jobs=1)]: Done  50 out of  50 | elapsed:  2.4min finished


0.46749999999999997
{'memory': None, 'steps': [('scaling', StandardScaler()), ('pca', PCA(n_components=9)), ('rf', RandomForestClassifier(max_depth=6, n_estimators=800, n_jobs=-1))], 'verbose': False, 'scaling': StandardScaler(), 'pca': PCA(n_components=9), 'rf': RandomForestClassifier(max_depth=6, n_estimators=800, n_jobs=-1), 'scaling__copy': True, 'scaling__with_mean': True, 'scaling__with_std': True, 'pca__copy': True, 'pca__iterated_power': 'auto', 'pca__n_components': 9, 'pca__random_state': None, 'pca__svd_solver': 'auto', 'pca__tol': 0.0, 'pca__whiten': False, 'rf__bootstrap': True, 'rf__ccp_alpha': 0.0, 'rf__class_weight': None, 'rf__criterion': 'gini', 'rf__max_depth': 6, 'rf__max_features': 'auto', 'rf__max_leaf_nodes': None, 'rf__max_samples': None, 'rf__min_impurity_decrease': 0.0, 'rf__min_impurity_split': None, 'rf__min_samples_leaf': 1, 'rf__min_samples_split': 2, 'rf__min_weight_fraction_leaf': 0.0, 'rf__n_estimators': 800, 'rf__n_jobs': -1, 'rf__oob_score': False, 'rf

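The score drops sharply once the pipeline is used (0.4675 in this run). The random search itself is unlikely to be the cause; a plausible explanation is that keeping only 5 to 9 principal components out of 20 features throws away information the random forest was using.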
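The `<step name>__<parameter>` keys used in the pipeline's `param_grid` can be tried out directly on a small pipeline. A minimal sketch, assuming the same step names (`scaling`, `pca`, `rf`) as in the cell above; the values 9 and 6 are taken from the best estimator printed in the output.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(
    [
        ("scaling", StandardScaler()),
        ("pca", PCA()),
        ("rf", RandomForestClassifier()),
    ]
)

# "<step name>__<parameter>" routes the value to the matching step.
pipe.set_params(pca__n_components=9, rf__max_depth=6)
print(pipe.named_steps["pca"].n_components)  # 9
print(pipe.named_steps["rf"].max_depth)      # 6

# Every key containing "__" is a nested parameter a search could tune.
tunable = [name for name in pipe.get_params() if "__" in name]
print(len(tunable), "nested parameters, for example:", tunable[:3])
```

A search object does the same thing internally: for each candidate it calls `set_params` with these keys before fitting the pipeline.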

# 4. Bayesian optimization using Gaussian Processes.

See the `gp_minimize` documentation: https://scikit-optimize.github.io/stable/modules/generated/skopt.gp_minimize.html

In [9]:
from functools import partial
from skopt import space
from skopt import gp_minimize
# Function to minimize.
# gp_minimize calls it with a single list of parameter values
# (in the order of param_space) and expects the objective value back.
def optimize(params, param_names, x, y):
    # pair each value with its name so it can be passed as keyword arguments
    params = dict(zip(param_names, params))
    model = ensemble.RandomForestClassifier(**params)
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]
        
        xvalid = x[valid_idx]
        yvalid = y[valid_idx]
        
        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)
        
    # gp_minimize minimizes, so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

param_space = [
    # order matters: it must match the order of param_names
    space.Integer(3, 15, name="max_depth"),
    space.Integer(100, 600, name="n_estimators"),
    space.Categorical(["gini", "entropy"], name="criterion"),
    space.Real(0.01, 1, prior="uniform", name="max_features")
]
param_names = [
    "max_depth",
    "n_estimators",
    "criterion",
    "max_features"
]
optimization_function = partial(
    optimize, 
    param_names=param_names,
    x=X,
    y=y
)

result = gp_minimize(
    optimization_function,
    dimensions=param_space,
    n_calls=15,
    n_random_starts=10,
    verbose=10,
)
print(dict(zip(param_names, result.x)))


Iteration No: 1 started. Evaluating function at random point.
Iteration No: 1 ended. Evaluation done at random point.
Time taken: 5.9935
Function value obtained: -0.7680
Current minimum: -0.7680
Iteration No: 2 started. Evaluating function at random point.
Iteration No: 2 ended. Evaluation done at random point.
Time taken: 25.6126
Function value obtained: -0.9070
Current minimum: -0.9070
Iteration No: 3 started. Evaluating function at random point.
Iteration No: 3 ended. Evaluation done at random point.
Time taken: 6.6283
Function value obtained: -0.8965
Current minimum: -0.9070
Iteration No: 4 started. Evaluating function at random point.
Iteration No: 4 ended. Evaluation done at random point.
Time taken: 5.6276
Function value obtained: -0.8735
Current minimum: -0.9070
Iteration No: 5 started. Evaluating function at random point.
Iteration No: 5 ended. Evaluation done at random point.
Time taken: 18.4096
Function value obtained: -0.8865
Current minimum: -0.9070
Iteration No: 6 started

With `gp_minimize` the score improved: the log above already reaches -0.9070, an accuracy of about 0.907, by the second iteration.
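The objective passed to the optimizer is simply "negative mean cross-validated accuracy". The manual fold loop could probably be written more compactly with `cross_val_score`; the sketch below shows that variant on synthetic data. The function name `negative_cv_accuracy`, the `make_classification` data and the fixed `random_state=0` are my assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def negative_cv_accuracy(params, x, y):
    """Return minus the mean 5-fold accuracy, so that lower is better."""
    model = RandomForestClassifier(random_state=0, **params)
    kf = StratifiedKFold(n_splits=5)
    scores = cross_val_score(model, x, y, scoring="accuracy", cv=kf)
    return -1.0 * np.mean(scores)


# Synthetic four-class stand-in for the mobile price data (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=4, random_state=0
)
loss = negative_cv_accuracy({"n_estimators": 50, "max_depth": 5}, X_demo, y_demo)
print(loss)
```

Because the optimizers in this notebook minimize, a perfect classifier would give -1.0 and a useless one a value close to 0.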

# 5. Hyperopt: Distributed Asynchronous Hyper-parameter Optimization


In [10]:
from hyperopt import hp, fmin, tpe, Trials
from hyperopt.pyll.base import scope
from functools import partial
# Function to minimize.
# hyperopt calls it with a dictionary of sampled parameters
# and expects the objective value back.
def optimize(params, x, y):
    model = ensemble.RandomForestClassifier(**params)
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]
        
        xvalid = x[valid_idx]
        yvalid = y[valid_idx]
        
        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)
        
    # fmin minimizes, so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

param_space = {
    # each key is the name of a RandomForestClassifier argument
    "max_depth" : scope.int(hp.quniform("max_depth", 3, 15, 1)),
    "n_estimators" : scope.int(hp.quniform("n_estimators",100, 600, 1)),
    "criterion" : hp.choice("criterion", ["gini", "entropy"]),
    "max_features" : hp.uniform("max_features", 0.01, 1)
}

optimization_function = partial(
    optimize, 
    x=X,
    y=y
)

trials = Trials()

result = fmin(
    fn=optimization_function,
    space=param_space,
    algo=tpe.suggest,
    max_evals=15,
    trials=trials,
)
print(result)

100%|██████████| 15/15 [02:56<00:00, 11.74s/trial, best loss: -0.908]
{'criterion': 1, 'max_depth': 9.0, 'max_features': 0.48578161454504437, 'n_estimators': 385.0}


- With hyperopt the score improved a little further (best loss -0.908, an accuracy of about 0.908).
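Note that the dictionary printed by `fmin` is not directly usable as model arguments: `hp.choice` reports the index of the chosen option (`'criterion': 1`) and `hp.quniform` reports floats (`9.0`, `385.0`). The plain-Python sketch below decodes the result printed above by hand; hyperopt also provides a `space_eval` helper for this purpose, which is not used in this notebook.

```python
# Result printed by fmin in the run above.
result = {
    "criterion": 1,
    "max_depth": 9.0,
    "max_features": 0.48578161454504437,
    "n_estimators": 385.0,
}

# Same option order as in hp.choice("criterion", ["gini", "entropy"]).
criterion_choices = ["gini", "entropy"]

best_params = {
    "criterion": criterion_choices[result["criterion"]],
    "max_depth": int(result["max_depth"]),
    "n_estimators": int(result["n_estimators"]),
    "max_features": result["max_features"],
}
print(best_params)
```

So the best trial in that run used the entropy criterion with a depth of 9 and 385 trees.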

# 6. Optuna
See https://optuna.org/

In [11]:
import optuna
from functools import partial

# Function to minimize.
# Optuna calls it with a trial object, which is used to sample
# each parameter, and expects the objective value back.
def optimize(trial, x, y):
    criterion = trial.suggest_categorical("criterion", ["gini", "entropy"])
    n_estimators = trial.suggest_int("n_estimators", 100, 1500)
    max_depth = trial.suggest_int("max_depth", 3, 15)
    # suggest_float replaces the deprecated suggest_uniform
    max_features = trial.suggest_float("max_features", 0.01, 1.0)
    
    model = ensemble.RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        max_features=max_features,
        criterion=criterion
    )
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]
        
        xvalid = x[valid_idx]
        yvalid = y[valid_idx]
        
        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)
        
    # the study direction is "minimize", so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

optimization_function = partial(optimize, x=X, y=y)

study = optuna.create_study(direction="minimize")

study.optimize(optimization_function, n_trials=15)

[I 2021-09-01 09:59:01,684] A new study created in memory with name: no-name-a3ea1951-5e66-4884-9549-a0d43a2252e5
[I 2021-09-01 09:59:22,089] Trial 0 finished with value: -0.8019999999999999 and parameters: {'criterion': 'entropy', 'n_estimators': 1392, 'max_depth': 3, 'max_features': 0.2574186771130284}. Best is trial 0 with value: -0.8019999999999999.
[I 2021-09-01 09:59:26,360] Trial 1 finished with value: -0.8714999999999999 and parameters: {'criterion': 'gini', 'n_estimators': 186, 'max_depth': 5, 'max_features': 0.6416579712277222}. Best is trial 1 with value: -0.8714999999999999.
[I 2021-09-01 09:59:33,915] Trial 2 finished with value: -0.884 and parameters: {'criterion': 'gini', 'n_estimators': 294, 'max_depth': 6, 'max_features': 0.6355328535218027}. Best is trial 2 with value: -0.884.
[I 2021-09-01 09:59:38,210] Trial 3 finished with value: -0.8560000000000001 and parameters: {'criterion': 'gini', 'n_estimators': 29

Again the accuracy improved a little, so in this comparison Optuna scored highest (0.9085 in the contents list at the top; the 0.9050 noted here is presumably from a different run).
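The scores in this notebook move a little from run to run, most likely because no random seed is fixed anywhere (the printed parameters show `random_state: None`). The sketch below shows one way the comparison could be made repeatable; the synthetic data, the helper name `cv_accuracy` and the seed value 42 are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic four-class stand-in for the mobile price data (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=4, random_state=0
)


def cv_accuracy(seed):
    """Mean 5-fold accuracy with both the forest and the folds seeded."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X_demo, y_demo, scoring="accuracy", cv=kf)
    return scores.mean()


first = cv_accuracy(seed=42)
second = cv_accuracy(seed=42)
print(first, second)  # the two values are identical
```

With seeds fixed like this, differences between the search methods reflect the methods themselves rather than the randomness of the forest.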

I like to tune my hyperparameters manually first, then choose a range of values, and only then hand the search over to an optimization algorithm.