> I show you how you can use different hyperparameter optimization techniques and libraries to tune hyperparameters of almost any kind of model or just to optimize any function! 

Contents:
1. GridSearchCV - score(0.8805)
2. RandomizedSearchCV - score(0.8825)
3. Grid/Random Search with Pipelines - score(0.4599)
4. Bayesian optimization using Gaussian Processes - score(0.9050)
5. Hyperopt - score(0.9075)
6. Optuna - score(0.9085) (best)

In [None]:
import pandas as pd
import numpy as np
from sklearn import ensemble 
from sklearn import metrics
from sklearn import model_selection


In [None]:
df = pd.read_csv("../input/mobile-price-classification/train.csv")
# features are all columns without price_range
# note that there is no id column in this dataset
# here we have training features
X = df.drop("price_range", axis=1).values
# and the targets
y = df.price_range.values
X.shape, y.shape

In [None]:
"""
RandomForestClassifier( 
    n_estimators=100, 
    criterion='gini', 
    max_depth=None, 
    min_samples_split=2,
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0,
    max_features='auto',
    max_leaf_nodes=None,
    min_impurity_decrease=0.0, 
    min_impurity_split=None, 
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False, 
    class_weight=None,
    ccp_alpha=0.0, 
    max_samples=None,
)

There are nineteen parameters, and the
number of combinations of all the values
they can assume is effectively infinite.
Normally, we don’t have the resources or
time to try them all. Thus, we specify a
grid of parameters. A search over this
grid to find the best combination of parameters
is known as grid search.
We can say that n_estimators can be
100, 200, 250, 300, 400, 500;
max_depth can be 1, 2, 5, 7, 11, 15 and
criterion can be gini or entropy.
These may not look like a lot of parameters,
but the computation takes a long time
if the dataset is large.
"""

# 1. Grid search
Grid search tries every combination in a grid of hyperparameter values and keeps the one that gives the most ‘accurate’ predictions under cross-validation.

In [None]:
# define the model here
# i am using random forest with n_jobs=-1
# n_jobs=-1 => use all cores
classifier = ensemble.RandomForestClassifier(n_jobs=-1)
# define a grid of parameters
# this can be a dictionary or a list of 
# dictionaries
param_grid = {
    "n_estimators" : [100, 200, 300, 400],
    "max_depth" : [1, 4, 6, 9],
    "criterion" : ["gini", "entropy"],
}

# initialize grid search
# estimator is the model that we have defined
# param_grid is the grid of parameters
# we use accuracy as our metric. you can define your own
# higher value of verbose implies a lot of details are printed
# cv=5 means 5-fold cv; for a classifier an integer cv is stratified (StratifiedKFold)

model = model_selection.GridSearchCV(
    estimator=classifier,
    param_grid=param_grid,
    scoring="accuracy",
    verbose=10,
    n_jobs=1,
    cv=5
)

# fit the model and extract best score
model.fit(X, y)


In [None]:
print(model.best_score_)
print(model.best_estimator_.get_params())

We see that our best five-fold accuracy score was 0.8805, and we have the best parameters from our grid search.

Nice: the search itself picks the best values, but only among the values listed in the grid. </br>
You can try different parameter values and see whether the accuracy score goes up.

- Note that with k-fold cross-validation every combination is fitted k times, which means even more time to find the best parameters. Grid search is therefore not very popular.
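
To make the cost concrete, here is a small sketch that counts the work for the grid used above. It uses only the standard library (`itertools.product`) as a stand-in for what the search enumerates; it is not how scikit-learn implements it internally.

```python
from itertools import product

# same grid as in the GridSearchCV cell above
param_grid = {
    "n_estimators": [100, 200, 300, 400],
    "max_depth": [1, 4, 6, 9],
    "criterion": ["gini", "entropy"],
}
n_folds = 5

# every combination of one value per parameter
combinations = list(product(*param_grid.values()))
n_combinations = len(combinations)

# each combination is fitted once per fold
# (GridSearchCV also refits the best one on all data at the end)
n_cv_fits = n_combinations * n_folds

print(n_combinations)  # 32
print(n_cv_fits)       # 160
```

Adding one more parameter with four values would multiply both numbers by four, which is why the grid has to stay small.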

# 2. Randomized Search CV
RandomizedSearchCV is very useful when we have many parameters to try and the training time is very long. For this example, I use a random-forest classifier again, and I assume you already know how this kind of algorithm works.

In random search, we randomly select a combination of parameters and calculate the cross-validation score. The time consumed here is less than grid search because we do not evaluate over all different combinations of parameters. We choose how many times we want to evaluate our models, and that’s what decides how much time the search takes.

In [None]:
# define the model here
# i am using random forest with n_jobs=-1
# n_jobs=-1 => use all cores
classifier = ensemble.RandomForestClassifier(n_jobs=-1)

# define a grid of parameters
# this can be a dictionary or a list of dictionaries
param_grid = {
    "n_estimators" : np.arange(100, 1500, 100),
    "max_depth" : np.arange(1, 20),
    "criterion" : ["gini", "entropy"],
}

# initialize random search
# estimator is the model that we have defined
# param_distributions is the grid/distribution of parameters
# we use accuracy as our metric. you can define your own
# higher value of verbose implies a lot of details are printed
# cv=5 means 5-fold cv; for a classifier an integer cv is stratified (StratifiedKFold)
# n_iter is the number of iterations we want
# if param_distributions has all the values as list,
# random search will be done by sampling without replacement
# if any of the parameters come from a distribution,
# random search uses sampling with replacement
model = model_selection.RandomizedSearchCV(
    estimator=classifier,
    param_distributions=param_grid,
    scoring="accuracy",
    n_iter=20,
    verbose=10,
    n_jobs=1,
    cv=5
)
model.fit(X, y)
print(f"Best score: {model.best_score_}")

print("Best parameters set:")
best_parameters = model.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print(f"\t{param_name}: {best_parameters[param_name]}")

We have changed the grid of parameters for random search, and it seems like we even improved the results a little bit.
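
To see how much work random search skips here, the sketch below counts the combinations in the grid above and draws 20 of them without replacement. It uses the standard library only and is an illustration of the idea, not scikit-learn's own sampler; the seed is an arbitrary choice for reproducibility.

```python
import random
from itertools import product

# same values as the random-search grid above
param_grid = {
    "n_estimators": list(range(100, 1500, 100)),  # 14 values
    "max_depth": list(range(1, 20)),              # 19 values
    "criterion": ["gini", "entropy"],             # 2 values
}
n_iter = 20

all_combinations = list(product(*param_grid.values()))

rng = random.Random(0)  # arbitrary seed, only for reproducibility
sampled = rng.sample(all_combinations, n_iter)  # sampling without replacement

print(len(all_combinations))           # 532
print(len(set(sampled)))               # 20
print(n_iter / len(all_combinations))  # fraction of the grid evaluated
```

So the random search evaluates fewer than 4% of the combinations a full grid search would need.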

# 3. Grid/Random Search with Pipelines

Sometimes, you might want to use a pipeline. For example, let’s say that we are dealing with a multiclass classification problem where the training data consists of two text columns, and you are required to build a model to predict the class. The preprocessing steps and the model then have to be tuned together. The cell below applies the same idea to our numeric dataset: scaling, PCA and a random forest in one pipeline.

In [None]:
from sklearn import decomposition
from sklearn import preprocessing
from sklearn import pipeline

In [None]:
# build the pipeline steps: scale, reduce dimensions, then classify
scl = preprocessing.StandardScaler()
pca = decomposition.PCA()
rf = ensemble.RandomForestClassifier(n_jobs=-1)

classifier = pipeline.Pipeline(
    [
        ("scaling", scl),
        ("pca", pca),
        ("rf", rf),
    ]

)

param_grid = {
    # keys are <step name>__<parameter name>,
    # e.g. pca__..., scaling__..., rf__...
    "pca__n_components" : np.arange(5, 10),
    "rf__n_estimators" : np.arange(100, 1500, 100),
    "rf__max_depth" : np.arange(1, 20),
    "rf__criterion" : ["gini", "entropy"],
}

model = model_selection.RandomizedSearchCV(
    estimator=classifier,
    param_distributions=param_grid,
    scoring="accuracy",
    n_iter=10,
    verbose=10,
    n_jobs=1,
    cv=5
)
model.fit(X, y)
print(model.best_score_)
print(model.best_estimator_.get_params())

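The score is not good after using RandomizedSearchCV with this pipeline. A likely reason is that keeping only 5 to 9 principal components of the standardized features throws away information the random forest needs, so preprocessing steps deserve the same care as the model.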
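
The `step__parameter` keys in `param_grid` above follow scikit-learn's naming for nested parameters. Here is a minimal sketch of how such a key reaches the right step; the two-step pipeline and the values 200 and 5 are only illustrative.

```python
from sklearn import ensemble, pipeline, preprocessing

clf = pipeline.Pipeline(
    [
        ("scaling", preprocessing.StandardScaler()),
        ("rf", ensemble.RandomForestClassifier()),
    ]
)

# "rf__n_estimators" means: parameter n_estimators of the step named "rf"
clf.set_params(rf__n_estimators=200, rf__max_depth=5)

rf_step = clf.named_steps["rf"]
print(rf_step.n_estimators, rf_step.max_depth)

# list the tunable keys that belong to the "rf" step
rf_keys = sorted(k for k in clf.get_params() if k.startswith("rf__"))
print(rf_keys)
```

The search classes call `set_params` with exactly these keys for every candidate, so any key printed by `get_params()` can go into the grid.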

# 4. Bayesian optimization using Gaussian Processes.

</br>
You can read more here: https://scikit-optimize.github.io/stable/modules/generated/skopt.gp_minimize.html

In [None]:
from functools import partial
from skopt import space
from skopt import gp_minimize
# Function to minimize.
# gp_minimize passes the parameter values as a single list;
# param_names, x and y are bound with functools.partial below.
def optimize(params, param_names, x, y):
    # pair each value with its name to get keyword arguments
    params = dict(zip(param_names, params))
    model = ensemble.RandomForestClassifier(**params)
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]
        
        xvalid = x[valid_idx]
        yvalid = y[valid_idx]
        
        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)
        
    # gp_minimize minimizes, so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

param_space = [
    # order matters: it must match param_names below
    space.Integer(3, 15, name="max_depth"),
    space.Integer(100, 600, name="n_estimators"),
    space.Categorical(["gini", "entropy"], name="criterion"),
    space.Real(0.01, 1, prior="uniform", name="max_features")
]
param_names = [
    "max_depth",
    "n_estimators",
    "criterion",
    "max_features"
]
optimization_function = partial(
    optimize, 
    param_names=param_names,
    x=X,
    y=y
)

result = gp_minimize(
    optimization_function,
    dimensions=param_space,
    n_calls=15,
    n_random_starts=10,
    verbose=10,
)
print(dict(zip(param_names, result.x)))


Wow, with gp_minimize we improved our score. Note that the objective returns the negative accuracy, so the best accuracy is `-result.fun`.
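
The sign flip in the objective is easy to get wrong, so here is a toy sketch of the convention. The `score` function and the candidate list are made up for illustration, and a plain `min` over candidates stands in for the optimizer.

```python
def score(x):
    # toy "accuracy": highest value (1.0) at x = 3
    return 1.0 - (x - 3.0) ** 2


def objective(x):
    # a minimizer looks for the smallest value, so negate the score
    return -1.0 * score(x)


candidates = [i / 10 for i in range(0, 61)]  # 0.0, 0.1, ..., 6.0

best_x = min(candidates, key=objective)
best_score = -objective(best_x)  # flip the sign back to read the score

print(best_x, best_score)  # 3.0 1.0
```

The same applies to every optimizer in the following sections: they all minimize, so the value they report has to be negated to read it as an accuracy.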

# 5. Hyperopt: Distributed Asynchronous Hyper-parameter Optimization


In [None]:
from hyperopt import hp, fmin, tpe, Trials
from hyperopt.pyll.base import scope
from functools import partial

# Function to minimize.
# hyperopt passes the sampled parameters as a dictionary;
# the function returns the objective value.
def optimize(params, x, y):
    model = ensemble.RandomForestClassifier(**params)
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]
        
        xvalid = x[valid_idx]
        yvalid = y[valid_idx]
        
        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)
        
    # fmin minimizes, so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

param_space = {
    # keys must match the RandomForestClassifier argument names
    "max_depth" : scope.int(hp.quniform("max_depth", 3, 15, 1)),
    "n_estimators" : scope.int(hp.quniform("n_estimators",100, 600, 1)),
    "criterion" : hp.choice("criterion", ["gini", "entropy"]),
    "max_features" : hp.uniform("max_features", 0.01, 1)
}

optimization_function = partial(
    optimize, 
    x=X,
    y=y
)

trials = Trials()

result = fmin(
    fn=optimization_function,
    space=param_space,
    algo=tpe.suggest,
    max_evals=15,
    trials=trials,
)
print(result)

- Wow, with hyperopt we improved our score even further.
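
One thing to watch when reading the printed result: `fmin` reports an `hp.choice` parameter as the index of the chosen option, and `hp.quniform` values as floats. The sketch below decodes such a dictionary by hand. The `raw_result` values are hypothetical and made up for illustration, not output from the run above. hyperopt also provides `space_eval(param_space, result)` for the same purpose.

```python
# options in the same order as passed to hp.choice
criteria = ["gini", "entropy"]

# hypothetical fmin output, made up for illustration only
raw_result = {
    "criterion": 1,
    "max_depth": 7.0,
    "max_features": 0.5,
    "n_estimators": 300.0,
}

decoded = {
    # index -> actual option
    "criterion": criteria[raw_result["criterion"]],
    # quniform values come back as floats
    "max_depth": int(raw_result["max_depth"]),
    "n_estimators": int(raw_result["n_estimators"]),
    "max_features": raw_result["max_features"],
}

print(decoded)
```

The decoded dictionary can then be passed straight to `RandomForestClassifier(**decoded)`.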

# 6. Optuna
</br>
Take a look: https://optuna.org/

In [None]:
import optuna
from functools import partial

# Function to minimize.
# Optuna passes a trial object; we ask it to suggest a value
# for each hyperparameter and return the objective value.
def optimize(trial, x, y):
    criterion = trial.suggest_categorical("criterion", ["gini", "entropy"])
    n_estimators = trial.suggest_int("n_estimators", 100, 1500)
    max_depth = trial.suggest_int("max_depth", 3, 15)
    # suggest_float replaces the deprecated suggest_uniform
    max_features = trial.suggest_float("max_features", 0.01, 1.0)

    model = ensemble.RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        max_features=max_features,
        criterion=criterion
    )
    kf = model_selection.StratifiedKFold(n_splits=5)
    accuracies = []
    for idx in kf.split(X=x, y=y):
        train_idx, valid_idx = idx[0], idx[1]
        xtrain = x[train_idx]
        ytrain = y[train_idx]

        xvalid = x[valid_idx]
        yvalid = y[valid_idx]

        model.fit(xtrain, ytrain)
        preds = model.predict(xvalid)
        fold_acc = metrics.accuracy_score(yvalid, preds)
        accuracies.append(fold_acc)

    # the study minimizes, so return the negative mean accuracy
    return -1.0 * np.mean(accuracies)

optimization_function = partial(optimize, x=X, y=y)

study = optuna.create_study(direction="minimize")

study.optimize(optimization_function, n_trials=15)

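Wow, again we improved our accuracy.
So, for this dataset, Optuna scored the highest. The gaps between the last three methods are small, and no random seed is fixed, so the ranking can change from run to run.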
0.9050

#################################################</br>
I like to tune my hyperparameters manually first, then choose a range of values, and then
hand that range to an optimization algorithm.</br>
##################################################