# Hyperparameter Optimization

With great models comes the great problem of optimizing hyperparameters to get the best-scoring model. So, **what is hyperparameter optimization?**
Suppose there is a simple pipeline for your machine learning project: there is a dataset, you directly apply a model, and then you have results. The parameters that the model has here are known as hyperparameters, i.e. the parameters that control the training/fitting process of the model.
So, **how would you find the best parameters?**
One method would be to evaluate all the combinations and see which one improves the metric.

Let's look at the random forest model from scikit-learn.

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Default signature as of scikit-learn 0.22-0.24; later releases removed
# min_impurity_split and the "auto" option for max_features.
RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.,
    max_features="auto",
    max_leaf_nodes=None,
    min_impurity_decrease=0.,
    min_impurity_split=None,
    bootstrap=True,
    oob_score=False,
    n_jobs=None,
    random_state=None,
    verbose=0,
    warm_start=False,
    class_weight=None,
    ccp_alpha=0.0,
    max_samples=None,
)

There are nineteen parameters, and since several of them can take any integer or real value, the number of possible combinations is effectively infinite. Normally, we don't have the resources and time to try them all. Thus, we specify a *grid* of parameters. A search over this grid to find the best combination of parameters is known as **grid search**. We can say that *"n_estimators"* can be 100, 250, 300, 500; *"max_depth"* can be 1, 2, 5, 7, 11, 15; and *"criterion"* can be "gini" or "entropy". This may not look like a lot of parameters, but the computation would take a long time if the dataset is large.
Grid search is not very popular. Let's look at how it is done with an example of **predicting mobile phone price range** given the specifications.
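
Before turning to the data, it may help to see how quickly even this small grid grows. The following is a minimal sketch that enumerates the three value lists given above with `itertools.product`; the fold count of 5 is an assumption for illustration.

```python
from itertools import product

# The grid described above: every listed value for each of the three parameters.
param_grid = {
    "n_estimators": [100, 250, 300, 500],
    "max_depth": [1, 2, 5, 7, 11, 15],
    "criterion": ["gini", "entropy"],
}

# Cartesian product of the value lists: one dict per candidate model.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumed number of cross-validation folds
print(len(combinations))            # number of parameter combinations
print(len(combinations) * n_folds)  # number of model fits needed
```

With 4 × 6 × 2 = 48 combinations and five folds, grid search has to fit 240 random forests, which is why the cost rises quickly with larger grids or datasets.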

In [3]:
import pandas as pd 

df = pd.read_csv('dataset/train.csv')
df.head()

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,pc,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,2,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,6,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,6,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,9,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,14,1208,1212,1411,8,2,15,1,1,0,1


In [4]:
df.shape

(2000, 21)

We have 20 features, 2000 samples, and a price range with 4 categories from 0 to 3. We can use stratified k-fold cross-validation with accuracy as the evaluation metric.
Let's create a Python file, **rf_grid_search.py**.
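
The contents of that file are not reproduced here, so the following is only a sketch of what such a script might look like, assuming scikit-learn's `GridSearchCV` and the grid described earlier. The function name `run_grid_search` and the constant `PARAM_GRID` are hypothetical names chosen for illustration.

```python
# rf_grid_search.py (sketch, not the original file)
from sklearn import ensemble, model_selection


def run_grid_search(X, y, param_grid, n_folds=5):
    """Evaluate every combination in param_grid with cross-validation."""
    classifier = ensemble.RandomForestClassifier(n_jobs=-1)
    search = model_selection.GridSearchCV(
        estimator=classifier,
        param_grid=param_grid,
        scoring="accuracy",
        cv=n_folds,  # an integer cv uses stratified folds for classifiers
        n_jobs=1,
        verbose=0,
    )
    search.fit(X, y)
    return search


# The grid from the text above (hypothetical constant name).
PARAM_GRID = {
    "n_estimators": [100, 250, 300, 500],
    "max_depth": [1, 2, 5, 7, 11, 15],
    "criterion": ["gini", "entropy"],
}
```

With the data frame loaded earlier, the features could be taken as `X = df.drop("price_range", axis=1).values` and the target as `y = df.price_range.values`. After `search = run_grid_search(X, y, PARAM_GRID)`, the best cross-validated accuracy is available as `search.best_score_` and the best parameters as `search.best_params_`.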

In the end, we see that our best five-fold accuracy score was 0.889, and we have the best parameters from our grid search.

The next best thing that we can use is random search. In **random search**, we randomly select a combination of parameters and calculate the cross-validation score. The time consumed here is less than for grid search because we do not evaluate all the different combinations of parameters. We choose how many times we want to evaluate our models, and that decides how much time the search takes. We use **RandomizedSearchCV**. Let's create a Python file, **rf_random_search.py**.
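
As with the grid search script, the file itself is not shown, so this is a hedged sketch assuming scikit-learn's `RandomizedSearchCV`. The function name, the `PARAM_DISTRIBUTIONS` ranges, and the default of 20 iterations are assumptions made for illustration, not values from the original.

```python
# rf_random_search.py (sketch, not the original file)
from sklearn import ensemble, model_selection


def run_random_search(X, y, param_distributions, n_iter=20, n_folds=5, seed=42):
    """Evaluate n_iter randomly sampled parameter combinations."""
    classifier = ensemble.RandomForestClassifier(n_jobs=-1)
    search = model_selection.RandomizedSearchCV(
        estimator=classifier,
        param_distributions=param_distributions,
        n_iter=n_iter,      # number of sampled combinations; controls run time
        scoring="accuracy",
        cv=n_folds,
        random_state=seed,  # makes the sampled combinations reproducible
        n_jobs=1,
        verbose=0,
    )
    search.fit(X, y)
    return search


# Hypothetical search space; values in a list are sampled uniformly.
PARAM_DISTRIBUTIONS = {
    "n_estimators": list(range(100, 1500, 100)),
    "max_depth": list(range(1, 31)),
    "criterion": ["gini", "entropy"],
}
```

The search space here is much larger than the earlier grid, yet the cost stays fixed at `n_iter` times the number of folds, because only the sampled combinations are fitted.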

Random search is faster than grid search if the number of iterations is smaller than the number of combinations in the grid. Using these two, we can find the optimal parameters for all kinds of models as long as they have fit and predict methods, which is the scikit-learn standard.

Sometimes, we might want to use a pipeline. For example, let's say that we are dealing with a multiclass classification problem. In this problem, the training data consists of two text columns, and we are required to build a model to predict the class. Let's assume that the pipeline you choose is to first apply tf-idf in a semi-supervised manner and then use SVD with an SVM classifier. Now, the problem is that we have to select the number of SVD components and also tune the parameters of the SVM. How do we do this? Let's build a Python file, **pipeline_search.py**.
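
The original file is not included, so the following is a sketch under stated assumptions: the two text columns are joined into one string per row, tf-idf is fitted on training and test text together, and accuracy is used as the scoring metric. The helper names `combine_text`, `fit_tfidf`, and `build_search` are hypothetical.

```python
# pipeline_search.py (sketch, not the original file)
from sklearn import decomposition, model_selection, pipeline, preprocessing, svm
from sklearn.feature_extraction.text import TfidfVectorizer


def combine_text(col1, col2):
    """Join the two text columns into one string per row."""
    return [f"{a} {b}" for a, b in zip(col1, col2)]


def fit_tfidf(train_text, test_text):
    """Fit tf-idf on train and test text together; labels are not used."""
    tfv = TfidfVectorizer()
    tfv.fit(list(train_text) + list(test_text))
    return tfv


def build_search(n_components_grid, c_grid, n_folds=5):
    """Grid search over SVD size and SVM regularization in one pipeline."""
    clf = pipeline.Pipeline(
        [
            ("svd", decomposition.TruncatedSVD()),
            ("scl", preprocessing.StandardScaler()),
            ("svm", svm.SVC()),
        ]
    )
    # "<step name>__<parameter>" addresses a parameter of a pipeline step.
    param_grid = {
        "svd__n_components": n_components_grid,
        "svm__C": c_grid,
    }
    return model_selection.GridSearchCV(
        estimator=clf,
        param_grid=param_grid,
        scoring="accuracy",  # assumed metric
        cv=n_folds,
        refit=True,
    )
```

Because the SVD and the SVM sit in the same pipeline, each cross-validation fold refits both steps, so the number of components and the SVM parameters are tuned jointly rather than one after the other.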

The pipeline shown in the Python file **pipeline_search.py** has SVD (singular value decomposition), standard scaling, and an SVM (support vector machine) model. Please note that we won't be able to run this code, since we don't have the data.

## Advanced Hyperparameter Optimization

When we move on to advanced hyperparameter optimization techniques, we can look at the **minimization of functions** using different kinds of minimization algorithms, such as:
- Downhill simplex algorithm, also known as Nelder-Mead optimization
- Bayesian technique with Gaussian process for finding optimal parameters
- Genetic algorithms

Let's see how the Gaussian process can be used for hyperparameter optimization. These kinds of algorithms need a function they can optimize. Most of the time, it's about the minimization of this function, like when we **minimize loss**.

So, let's say we want to find the parameters that give the best accuracy; obviously, the higher the accuracy, the better. We want to maximize accuracy rather than minimize it, but we can minimize accuracy multiplied by -1. This way, we minimize the negative of the accuracy, which is the same as maximizing the accuracy. **Bayesian optimization with a Gaussian process** can be done with the **gp_minimize** function from the scikit-optimize (skopt) library.
Let's take a look at how we can tune the parameters of our random forest model using this function.

We will be creating a Python file, **rf_gp_minimize.py**.
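
The file itself is not shown, so what follows is a sketch under assumptions: the objective returns the negative mean stratified k-fold accuracy, and the four tuned parameters and their ranges are hypothetical choices made for illustration. The scikit-optimize import sits inside `tune` so that the objective can be used without that library installed.

```python
# rf_gp_minimize.py (sketch, not the original file)
from functools import partial

import numpy as np
from sklearn import ensemble, model_selection

# Hypothetical choice of parameters to tune, in the order of the search space.
PARAM_NAMES = ["max_depth", "n_estimators", "criterion", "max_features"]


def negative_accuracy(params, param_names, X, y, n_folds=5):
    """Return minus the mean stratified k-fold accuracy for one parameter set."""
    model = ensemble.RandomForestClassifier(**dict(zip(param_names, params)))
    kf = model_selection.StratifiedKFold(n_splits=n_folds)
    scores = model_selection.cross_val_score(model, X, y, scoring="accuracy", cv=kf)
    return -1.0 * float(np.mean(scores))


def tune(X, y, n_calls=15):
    """Minimize negative accuracy with Gaussian-process Bayesian optimization."""
    from skopt import gp_minimize, space  # requires scikit-optimize

    # Assumed ranges; adjust them to the problem at hand.
    param_space = [
        space.Integer(3, 15, name="max_depth"),
        space.Integer(100, 1500, name="n_estimators"),
        space.Categorical(["gini", "entropy"], name="criterion"),
        space.Real(0.01, 1, prior="uniform", name="max_features"),
    ]
    # gp_minimize calls the objective with a single list of parameter values.
    objective = partial(negative_accuracy, param_names=PARAM_NAMES, X=X, y=y)
    result = gp_minimize(
        objective,
        dimensions=param_space,
        n_calls=n_calls,
        n_initial_points=10,
    )
    # result.x holds the best parameters, result.fun the lowest objective value.
    return dict(zip(PARAM_NAMES, result.x)), -result.fun
```

Calling `best_params, best_accuracy = tune(X, y)` would return the best parameter set found and its accuracy, with the sign flipped back so that the reported value is the accuracy itself rather than its negative.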