Parameters that control how a model is trained on (fitted to) data, rather than being learned from that data, are called <b>Hyper-parameters</b>.

<b>Hyper-parameter Optimization</b> is the process of choosing the most suitable set of hyper-parameters so that the model delivers the best results possible.

Here we take an example where we have 3 parameters to tune in order to get the highest accuracy possible.

# MODEL, training_data, validation_data and targets are placeholders
best_accuracy = 0
best_parameters = {"a": 0, "b": 0, "c": 0}
for a in range(1, 11):
    for b in range(1, 11):
        for c in range(1, 11):
            # initialize model with current parameters
            model = MODEL(a, b, c)
            # fit the model
            model.fit(training_data)
            # make predictions
            preds = model.predict(validation_data)
            # calculate accuracy
            accuracy = metrics.accuracy_score(targets, preds)
            # save params if current accuracy
            # is greater than best accuracy
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_parameters["a"] = a
                best_parameters["b"] = b
                best_parameters["c"] = c
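
The three nested loops above can be generalised to any number of parameters with `itertools.product`. The following is only a minimal sketch, not the code used in this notebook: `evaluate` is a hypothetical stand-in for "fit a model and score it on validation data", and its toy score is constructed so that the best combination is known in advance.

```python
from itertools import product


def evaluate(a, b, c):
    """Hypothetical stand-in for fitting a model and returning its accuracy."""
    # Toy score that peaks at a=3, b=7, c=5; real code would train a model here.
    return 1.0 - ((a - 3) ** 2 + (b - 7) ** 2 + (c - 5) ** 2) / 200.0


def exhaustive_search(search_space):
    """Try every combination in search_space and keep the best-scoring one."""
    names = list(search_space)
    best_score = float("-inf")
    best_params = None
    n_trials = 0
    # product(...) yields the same combinations as one nested loop per parameter
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        n_trials += 1
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score, n_trials


search_space = {"a": range(1, 11), "b": range(1, 11), "c": range(1, 11)}
best_params, best_score, n_trials = exhaustive_search(search_space)
print(best_params, best_score, n_trials)
```

With 10 values per parameter the search evaluates 10 x 10 x 10 = 1000 combinations, which is why the cost grows so quickly as parameters are added.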
            

Example: the hyper-parameters of <b>RandomForestClassifier</b>, shown with their default values

`RandomForestClassifier(
    n_estimators=100,
    criterion='gini',
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features='auto',
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    min_impurity_split=None,
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False,
    class_weight=None,
    ccp_alpha=0.0,
    max_samples=None,
)`

Tuning every parameter by hand is not practical, so we use <b>Grid Search</b>: we supply a list of candidate values for each hyper-parameter, the model is trained on every combination of those values, and the combination that gives the best accuracy is kept.

In [2]:
import numpy as np
import pandas as pd
from sklearn import ensemble
from sklearn import metrics
from sklearn import model_selection
df = pd.read_csv("input/mobile_train.csv")
df.head()

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,...,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,...,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,...,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,...,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,...,1208,1212,1411,8,2,15,1,1,0,1


Code for implementing <b>Grid Search</b>.
I am using a small number of combinations in order to reduce the execution time.
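
To see why the grid is kept small, it helps to count the model fits a grid search performs. The sketch below copies the grid and the `cv=5` setting from the next cell; `count_fits` and the extra `min_samples_split` values are hypothetical, added only for illustration.

```python
from math import prod


def count_fits(param_grid, cv_folds):
    """Return (number of candidate combinations, number of cross-validation fits)."""
    n_candidates = prod(len(values) for values in param_grid.values())
    return n_candidates, n_candidates * cv_folds


# Same grid and fold count as the grid search cell
param_grid = {
    "n_estimators": [100, 200, 250],
    "max_depth": [1, 2, 5],
    "criterion": ["gini", "entropy"],
}
n_candidates, n_fits = count_fits(param_grid, cv_folds=5)
print(f"{n_candidates} candidates x 5 folds = {n_fits} fits")

# Hypothetical extension: one more parameter with three values triples the work
bigger_grid = dict(param_grid, min_samples_split=[2, 5, 10])
print(count_fits(bigger_grid, cv_folds=5))
```

The first count matches the "18 candidates, totalling 90 fits" line that scikit-learn prints in the verbose log.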

In [5]:
if __name__ == "__main__":
    df = pd.read_csv("input/mobile_train.csv")
    X = df.drop("price_range",axis=1).values
    y = df.price_range.values
    
    classifier = ensemble.RandomForestClassifier(n_jobs=-1)
    
    param_grid = {
        "n_estimators":[100,200,250],
        "max_depth":[1,2,5],
        "criterion":['gini','entropy']
    }
    
    model = model_selection.GridSearchCV(
        estimator = classifier,
        param_grid = param_grid,
        scoring = "accuracy",
        verbose=10,
        n_jobs=1,
        cv=5
    )
    
    model.fit(X,y)
    print(f"Best Score: {model.best_score_}")
    print("Best prrmeter set:")
    best_parameters = model.best_estimator_.get_params()
    for param_name in sorted(param_grid.keys()):
        print(f"\t{param_name}:{best_parameters[param_name]}")
    

Fitting 5 folds for each of 18 candidates, totalling 90 fits
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.580, total=   3.4s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    3.3s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.575, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    3.6s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.630, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    3.9s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.557, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=100 ...................


[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    4.2s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=100, score=0.532, total=   0.3s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    4.5s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.598, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    4.9s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.600, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    5.3s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.593, total=   0.5s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    5.8s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.555, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=200 ...................


[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    6.2s remaining:    0.0s


[CV]  criterion=gini, max_depth=1, n_estimators=200, score=0.595, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=250 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=250, score=0.593, total=   0.5s
[CV] criterion=gini, max_depth=1, n_estimators=250 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=250, score=0.595, total=   0.5s
[CV] criterion=gini, max_depth=1, n_estimators=250 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=250, score=0.618, total=   0.4s
[CV] criterion=gini, max_depth=1, n_estimators=250 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=250, score=0.613, total=   0.5s
[CV] criterion=gini, max_depth=1, n_estimators=250 ...................
[CV]  criterion=gini, max_depth=1, n_estimators=250, score=0.595, total=   0.4s
[CV] criterion=gini, max_depth=2, n_estimators=100 ...................
[CV]  criterion=gini, max_depth=2, n_estimators=100, score=0.750, total=   0.3s
[CV] criterion

[CV]  criterion=entropy, max_depth=2, n_estimators=100, score=0.655, total=   0.3s
[CV] criterion=entropy, max_depth=2, n_estimators=100 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=100, score=0.637, total=   0.3s
[CV] criterion=entropy, max_depth=2, n_estimators=200 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=200, score=0.677, total=   0.4s
[CV] criterion=entropy, max_depth=2, n_estimators=200 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=200, score=0.657, total=   0.5s
[CV] criterion=entropy, max_depth=2, n_estimators=200 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=200, score=0.695, total=   0.5s
[CV] criterion=entropy, max_depth=2, n_estimators=200 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=200, score=0.767, total=   0.5s
[CV] criterion=entropy, max_depth=2, n_estimators=200 ................
[CV]  criterion=entropy, max_depth=2, n_estimators=200, score=0.682, total= 

[Parallel(n_jobs=1)]: Done  90 out of  90 | elapsed:   43.1s finished


Best Score: 0.8425
Best prrmeter set:
	criterion:entropy
	max_depth:5
	n_estimators:200
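
Besides the best score, a fitted `GridSearchCV` object also exposes `best_params_` and `cv_results_`, which can be used to compare all candidates rather than only the winner. The sketch below is an illustration under assumptions: it uses synthetic data from `make_classification` instead of the mobile price dataset, and a deliberately tiny grid so that it runs in a few seconds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): not the mobile price dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Tiny hypothetical grid, chosen only to keep the example fast
param_grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

# best_params_ contains only the tuned parameters, so no filtering is needed
for name, value in sorted(search.best_params_.items()):
    print(f"{name}: {value}")

# cv_results_ stores the mean validation score of every candidate
ranked = sorted(
    zip(search.cv_results_["mean_test_score"], search.cv_results_["params"]),
    key=lambda pair: pair[0],
    reverse=True,
)
for mean_score, params in ranked:
    print(f"{mean_score:.3f}  {params}")
```

Looking at the full ranking shows how close the runner-up candidates are, which helps when deciding whether a cheaper setting is good enough.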
