# Hyperparameter Tuning

### All Techniques Of Hyper Parameter Optimization

- GridSearchCV
- RandomizedSearchCV
- Bayesian Optimization: Automate Hyperparameter Tuning (Hyperopt)
- Sequential Model-Based Optimization (tuning a scikit-learn estimator with skopt)
- Optuna: Automate Hyperparameter Tuning
- Genetic Algorithms (TPOT Classifier)

#### References

- https://github.com/fmfn/BayesianOptimization
- https://github.com/hyperopt/hyperopt
- https://www.jeremyjordan.me/hyperparameter-tuning/
- https://optuna.org/
- https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d (by Pier Paolo Ippolito)
- https://scikit-optimize.github.io/stable/auto_examples/hyperparameter-optimization.html

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
import pandas as pd
df=pd.read_csv('diabetes.csv')
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [6]:
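# In the Glucose, SkinThickness and Insulin columns, zeros are treated as missing and replaced with the column median.
import numpy as np
df['Glucose']=np.where(df['Glucose']==0,df['Glucose'].median(),df['Glucose'])
df['SkinThickness']=np.where(df['SkinThickness']==0,df['SkinThickness'].median(),df['SkinThickness'])
# Caution: the key below ends with a tab character ('Insulin\t'), so this line adds a new
# column instead of overwriting 'Insulin'. The recorded outputs were produced this way;
# use df['Insulin'] on the left-hand side to impute in place.
df['Insulin\t']=np.where(df['Insulin']==0,df['Insulin'].median(),df['Insulin'])
df.head()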
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome,Insulin\t
0,6,148.0,72,35.0,0,33.6,0.627,50,1,30.5
1,1,85.0,66,29.0,0,26.6,0.351,31,0,30.5
2,8,183.0,64,23.0,0,23.3,0.672,32,1,30.5
3,1,89.0,66,23.0,94,28.1,0.167,21,0,94.0
4,0,137.0,40,35.0,168,43.1,2.288,33,1,168.0


In [7]:
#### Independent And Dependent features
X=df.drop('Outcome',axis=1)
y=df['Outcome']

In [8]:

pd.DataFrame(X,columns=df.columns[:-1])

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148.0,72,35.0,0,33.6,0.627,50,
1,1,85.0,66,29.0,0,26.6,0.351,31,
2,8,183.0,64,23.0,0,23.3,0.672,32,
3,1,89.0,66,23.0,94,28.1,0.167,21,
4,0,137.0,40,35.0,168,43.1,2.288,33,
...,...,...,...,...,...,...,...,...,...
763,10,101.0,76,48.0,180,32.9,0.171,63,
764,2,122.0,70,27.0,0,36.8,0.340,27,
765,5,121.0,72,23.0,112,26.2,0.245,30,
766,1,126.0,60,23.0,0,30.1,0.349,47,


In [9]:

#### Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=0)

We use a random forest classifier here, even though many other algorithms are available, because it exposes many hyperparameters to experiment with. Calling `RandomForestClassifier` without arguments uses the default hyperparameters, but nothing guarantees that the defaults give the best accuracy on a particular dataset. Hyperparameter tuning addresses this: the best values differ from one dataset to another, so they have to be searched for.

In [10]:

from sklearn.ensemble import RandomForestClassifier
rf_classifier=RandomForestClassifier(n_estimators=10).fit(X_train,y_train)
prediction=rf_classifier.predict(X_test)

In [11]:
y.value_counts()

0    500
1    268
Name: Outcome, dtype: int64

In [12]:
from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,prediction))
print(accuracy_score(y_test,prediction))
print(classification_report(y_test,prediction))

[[94 13]
 [17 30]]
0.8051948051948052
              precision    recall  f1-score   support

           0       0.85      0.88      0.86       107
           1       0.70      0.64      0.67        47

    accuracy                           0.81       154
   macro avg       0.77      0.76      0.76       154
weighted avg       0.80      0.81      0.80       154



In [13]:
### Manual Hyperparameter Tuning
# Here the values are picked by hand: change them, refit, and compare the results.
# This trial-and-error approach is ad hoc; the techniques below search the space systematically.
model=RandomForestClassifier(n_estimators=300,criterion='entropy',
                             max_features='sqrt',min_samples_leaf=10,random_state=100).fit(X_train,y_train)
predictions=model.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(accuracy_score(y_test,predictions))
print(classification_report(y_test,predictions))

[[98  9]
 [17 30]]
0.8311688311688312
              precision    recall  f1-score   support

           0       0.85      0.92      0.88       107
           1       0.77      0.64      0.70        47

    accuracy                           0.83       154
   macro avg       0.81      0.78      0.79       154
weighted avg       0.83      0.83      0.83       154



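## Which cross-validated search technique should we use first?
### 1. Grid Search
### 2. Random Search

Randomized search evaluates a fixed number of randomly sampled hyperparameter combinations instead of trying every combination, so it is usually much faster than grid search on a large search space. It is therefore a good first step: use randomized search to find a promising region, then refine around it with grid search.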
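To make the cost difference concrete, the sketch below counts the candidates in a grid shaped like the random forest grid used in this notebook and draws 100 of them at random. It uses only the standard library and trains no models; the `max_features` options and the seed are illustrative assumptions.

```python
import itertools
import random

# A grid with the same shape as the random forest grid in this notebook.
# The max_features options are an assumption chosen for illustration.
grid = {
    "n_estimators": list(range(200, 2001, 200)),   # 10 values
    "max_features": ["sqrt", "log2", None],        # 3 values
    "max_depth": list(range(10, 1001, 110)),       # 10 values
    "min_samples_split": [2, 5, 10, 14],           # 4 values
    "min_samples_leaf": [1, 2, 4, 6, 8],           # 5 values
    "criterion": ["entropy", "gini"],              # 2 values
}

keys = list(grid)

# Grid search evaluates every combination of values.
all_candidates = list(itertools.product(*(grid[k] for k in keys)))
n_grid = len(all_candidates)

# Random search evaluates only a fixed budget of combinations.
n_iter = 100
rng = random.Random(100)
random_candidates = [
    dict(zip(keys, combo)) for combo in rng.sample(all_candidates, n_iter)
]

print(f"grid search candidates:   {n_grid}")
print(f"random search candidates: {len(random_candidates)}")
```

With 3-fold cross-validation, an exhaustive search over this grid would need 36,000 model fits, against 300 for a random search with a budget of 100 candidates.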

### Randomized Search CV

In [11]:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
# ('auto' was removed in scikit-learn 1.3; drop it from the list on newer versions)
max_features = ['auto', 'sqrt','log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 1000,10)]
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10,14]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4,6,8]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
              'criterion':['entropy','gini']}
print(random_grid)

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [2, 5, 10, 14], 'min_samples_leaf': [1, 2, 4, 6, 8], 'criterion': ['entropy', 'gini']}


In [12]:
rf=RandomForestClassifier()
rf_randomcv=RandomizedSearchCV(estimator=rf,param_distributions=random_grid,n_iter=100,cv=3,verbose=2,
                               random_state=100,n_jobs=-1)
### fit the randomized model
rf_randomcv.fit(X_train,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:   32.4s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:  3.6min finished


RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(), n_iter=100,
                   n_jobs=-1,
                   param_distributions={'criterion': ['entropy', 'gini'],
                                        'max_depth': [10, 120, 230, 340, 450,
                                                      560, 670, 780, 890,
                                                      1000],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2'],
                                        'min_samples_leaf': [1, 2, 4, 6, 8],
                                        'min_samples_split': [2, 5, 10, 14],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
                   random_state=100, verbose=2)

In [13]:
rf_randomcv.best_params_

{'n_estimators': 600,
 'min_samples_split': 2,
 'min_samples_leaf': 1,
 'max_features': 'log2',
 'max_depth': 780,
 'criterion': 'gini'}

In [15]:
best_random_grid=rf_randomcv.best_estimator_

In [16]:
from sklearn.metrics import accuracy_score
y_pred=best_random_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[92 15]
 [14 33]]
Accuracy Score 0.8116883116883117
Classification report:               precision    recall  f1-score   support

           0       0.87      0.86      0.86       107
           1       0.69      0.70      0.69        47

    accuracy                           0.81       154
   macro avg       0.78      0.78      0.78       154
weighted avg       0.81      0.81      0.81       154



In [17]:
rf_randomcv.best_params_

{'n_estimators': 600,
 'min_samples_split': 2,
 'min_samples_leaf': 1,
 'max_features': 'log2',
 'max_depth': 780,
 'criterion': 'gini'}

### GridSearch CV

For grid search, we build a small grid around the hyperparameter values that randomized search found to be best.
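One way to build such a grid is a small helper that steps outward from each best value and discards values that fall below a valid minimum. This is a sketch only: `neighbourhood` and the `best` dictionary are hypothetical names introduced for illustration, not part of scikit-learn.

```python
def neighbourhood(center, step, width=2, minimum=1):
    """Return center plus/minus up to `width` steps, dropping values below `minimum`."""
    values = [center + step * k for k in range(-width, width + 1)]
    return sorted(v for v in values if v >= minimum)


# Hypothetical best values, standing in for the result of a randomized search.
best = {"n_estimators": 600, "min_samples_split": 2, "min_samples_leaf": 1}

param_grid = {
    "n_estimators": neighbourhood(best["n_estimators"], step=100),
    # As an integer, min_samples_split must be at least 2.
    "min_samples_split": neighbourhood(best["min_samples_split"], step=1, minimum=2),
    "min_samples_leaf": neighbourhood(best["min_samples_leaf"], step=2),
}

print(param_grid)
```

Clamping at the minimum keeps invalid candidates such as `min_samples_split=0` out of the grid, so no cross-validation fits are wasted on them.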

In [18]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'criterion': [rf_randomcv.best_params_['criterion']],
    'max_depth': [rf_randomcv.best_params_['max_depth']],
    'max_features': [rf_randomcv.best_params_['max_features']],
    'min_samples_leaf': [rf_randomcv.best_params_['min_samples_leaf'], 
                         rf_randomcv.best_params_['min_samples_leaf']+2, 
                         rf_randomcv.best_params_['min_samples_leaf'] + 4],
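    # Caution: when the best min_samples_split is 2, the first two entries evaluate to 0 and 1,
    # which are not valid integer values for this parameter (integers must be at least 2).
    'min_samples_split': [rf_randomcv.best_params_['min_samples_split'] - 2,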
                          rf_randomcv.best_params_['min_samples_split'] - 1,
                          rf_randomcv.best_params_['min_samples_split'], 
                          rf_randomcv.best_params_['min_samples_split'] +1,
                          rf_randomcv.best_params_['min_samples_split'] + 2],
    'n_estimators': [rf_randomcv.best_params_['n_estimators'] - 200, rf_randomcv.best_params_['n_estimators'] - 100, 
                     rf_randomcv.best_params_['n_estimators'], 
                     rf_randomcv.best_params_['n_estimators'] + 100, rf_randomcv.best_params_['n_estimators'] + 200]
}

print(param_grid)

{'criterion': ['gini'], 'max_depth': [780], 'max_features': ['log2'], 'min_samples_leaf': [1, 3, 5], 'min_samples_split': [0, 1, 2, 3, 4], 'n_estimators': [400, 500, 600, 700, 800]}


In [19]:
#### Fit the grid_search to the data
rf=RandomForestClassifier()
grid_search=GridSearchCV(estimator=rf,param_grid=param_grid,cv=10,n_jobs=-1,verbose=2)
grid_search.fit(X_train,y_train)

Fitting 10 folds for each of 75 candidates, totalling 750 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    2.2s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:   26.9s
[Parallel(n_jobs=-1)]: Done 349 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 632 tasks      | elapsed:  2.4min
[Parallel(n_jobs=-1)]: Done 750 out of 750 | elapsed:  3.2min finished


GridSearchCV(cv=10, estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid={'criterion': ['gini'], 'max_depth': [780],
                         'max_features': ['log2'],
                         'min_samples_leaf': [1, 3, 5],
                         'min_samples_split': [0, 1, 2, 3, 4],
                         'n_estimators': [400, 500, 600, 700, 800]},
             verbose=2)

In [20]:
grid_search.best_estimator_

RandomForestClassifier(max_depth=780, max_features='log2', min_samples_leaf=5,
                       min_samples_split=4, n_estimators=600)

In [21]:
best_grid=grid_search.best_estimator_

In [23]:

y_pred=best_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[97 10]
 [17 30]]
Accuracy Score 0.8246753246753247
Classification report:               precision    recall  f1-score   support

           0       0.85      0.91      0.88       107
           1       0.75      0.64      0.69        47

    accuracy                           0.82       154
   macro avg       0.80      0.77      0.78       154
weighted avg       0.82      0.82      0.82       154



## Automated Hyperparameter Tuning
Automated Hyperparameter Tuning can be done by using techniques such as

- Bayesian Optimization
- Gradient Descent
- Evolutionary Algorithms


#### Bayesian Optimization
Bayesian optimization builds a probabilistic model of the objective function and uses it to choose which input to evaluate next. The aim is to find the input value that gives the lowest possible output value. Because each new trial is informed by the previous ones, it often performs better than random, grid, and manual search, giving better performance in the testing phase in less optimization time.

#### HyperOpt

HyperOpt is an open-source Python library for Bayesian optimization developed by James Bergstra.
It is designed for large-scale optimization for models with hundreds of parameters and allows the optimization procedure to be scaled across multiple cores and multiple machines.

Hyperopt's job is to find the best value of a scalar-valued, possibly-stochastic function over a set of possible arguments to that function. Whereas many optimization packages will assume that these inputs are drawn from a vector space, Hyperopt is different in that it encourages you to describe your search space in more detail. By providing more information about where your function is defined, and where you think the best values are, you allow algorithms in hyperopt to search more efficiently.

The library has been used to optimize machine learning pipelines, including data preparation, model selection, and model hyperparameters.



The way to use hyperopt is to describe:

- Objective function: defines the loss function to minimize.
- Domain space: defines the range of input values to test (in Bayesian optimization this space defines a probability distribution for each hyperparameter).
- Optimization algorithm: defines the search algorithm used to select the input values for each new iteration.

##### Algorithms

Currently three algorithms are implemented in hyperopt:

- Random Search
- Tree of Parzen Estimators (TPE)
- Adaptive TPE

Choosing the search algorithm is as simple as passing `algo=hyperopt.tpe.suggest` instead of `algo=hyperopt.rand.suggest`. The search algorithms are actually callable objects whose constructors accept configuration arguments, but that's about all there is to say about the mechanics of choosing a search algorithm.

In [26]:
# fmin minimizes the objective function over the search space.
# Trials records every evaluation; passing a Trials object to fmin lets us inspect all of the return values calculated during the experiment.
from hyperopt import hp,fmin,tpe,STATUS_OK,Trials



- hp.choice picks one of the options in the given list.
- hp.uniform draws a floating-point value uniformly between the lower and upper bound.
- hp.quniform draws a value uniformly between the bounds and rounds it to a multiple of the step q.
- In our case, hp.quniform('max_depth', 10, 1200, 10) yields multiples of 10 between 10 and 1200 (returned as floats).
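The behaviour of the three expressions can be mimicked with the standard library. The functions below are illustrative stand-ins written for this notebook, not Hyperopt's implementation; the quantization follows the `round(uniform(low, high) / q) * q` rule.

```python
import random

rng = random.Random(0)


def choice(options):
    """Pick one option from a list (the idea behind hp.choice)."""
    return rng.choice(options)


def uniform(low, high):
    """Draw a float uniformly between low and high (the idea behind hp.uniform)."""
    return rng.uniform(low, high)


def quniform(low, high, q):
    """Draw uniformly, then round to a multiple of q (the idea behind hp.quniform)."""
    return float(round(rng.uniform(low, high) / q) * q)


samples = {
    "criterion": choice(["entropy", "gini"]),
    "min_samples_split": uniform(0, 1),
    "max_depth": [quniform(10, 1200, 10) for _ in range(5)],
}

print(samples)
```

Every `max_depth` sample is a float that is a multiple of 10, which is why it has to be cast to `int` before being passed to an estimator that expects an integer.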

In [27]:
# Use hp.choice to pick one option from a list.
# Use hp.quniform for values on a regular step (it returns floats, so cast to int where an integer is required).
# Use hp.uniform for floating-point values.
# space plays the same role as the parameter grid used for grid search and random search.
# Here it defines the random forest hyperparameters to search over.
space = {'criterion': hp.choice('criterion', ['entropy', 'gini']),
        'max_depth': hp.quniform('max_depth', 10, 1200, 10),
        'max_features': hp.choice('max_features', ['auto', 'sqrt','log2', None]),
        'min_samples_leaf': hp.uniform('min_samples_leaf', 0, 0.5),
        'min_samples_split' : hp.uniform ('min_samples_split', 0, 1),
        'n_estimators' : hp.choice('n_estimators', [10, 50, 300, 750, 1200,1300,1500])
    }

In [28]:
space

{'criterion': <hyperopt.pyll.base.Apply at 0x1eb17e96f40>,
 'max_depth': <hyperopt.pyll.base.Apply at 0x1eb17ea40a0>,
 'max_features': <hyperopt.pyll.base.Apply at 0x1eb17ea41c0>,
 'min_samples_leaf': <hyperopt.pyll.base.Apply at 0x1eb17ea43a0>,
 'min_samples_split': <hyperopt.pyll.base.Apply at 0x1eb17ea44c0>,
 'n_estimators': <hyperopt.pyll.base.Apply at 0x1eb17ea45b0>}

- We have defined the domain space; next we define the objective function.
- The objective function returns the loss to minimize.
- The search algorithm (TPE) is selected when calling fmin.

When the objective function returns a dictionary, the fmin function looks for some special key-value pairs in the return value, which it passes along to the optimization algorithm. There are two mandatory key-value pairs:

- status - one of the keys from hyperopt.STATUS_STRINGS, such as 'ok' for successful completion, and 'fail' in cases where the function turned out to be undefined.
- loss - the float-valued function value that you are trying to minimize, if the status is 'ok' then this has to be present.
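The contract between the objective function and the optimizer can be sketched without Hyperopt. In the toy loop below, `toy_minimize` is a hypothetical random-search stand-in for `fmin`; the status strings `'ok'` and `'fail'` match the values described above, and the objective is deliberately undefined for non-positive inputs to show the `'fail'` path.

```python
import math
import random

STATUS_OK = "ok"
STATUS_FAIL = "fail"


def objective(params):
    """Toy objective: (log(x) - 1)^2, undefined for x <= 0."""
    x = params["x"]
    if x <= 0:
        # No loss can be computed, so only the status is reported.
        return {"status": STATUS_FAIL}
    return {"loss": (math.log(x) - 1) ** 2, "status": STATUS_OK}


def toy_minimize(fn, sample, max_evals, seed=0):
    """Random-search stand-in for fmin: keeps the successful trial with the lowest loss."""
    rng = random.Random(seed)
    history = []
    best = None
    for _ in range(max_evals):
        params = sample(rng)
        result = fn(params)
        history.append((params, result))
        if result["status"] != STATUS_OK:
            continue  # failed trials carry no loss and are skipped
        if best is None or result["loss"] < best[1]["loss"]:
            best = (params, result)
    return best, history


best, history = toy_minimize(
    objective, lambda rng: {"x": rng.uniform(-5, 10)}, max_evals=50
)
print(best)
```

The `history` list plays the role that a `Trials` object plays in Hyperopt: it keeps every return value so the run can be inspected afterwards.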

In [29]:
from sklearn.model_selection import cross_val_score

def objective(space):
    model = RandomForestClassifier(criterion = space['criterion'],
                                 # hp.quniform returns a float; max_depth must be an integer
                                 max_depth = int(space['max_depth']),
                                 max_features = space['max_features'],
                                 min_samples_leaf = space['min_samples_leaf'],
                                 min_samples_split = space['min_samples_split'],
                                 n_estimators = space['n_estimators'],
                                 )

    # Mean accuracy over 5 cross-validation folds
    accuracy = cross_val_score(model, X_train, y_train, cv = 5).mean()

    # We aim to maximize accuracy, therefore we return it as a negative value
    return {'loss': -accuracy, 'status': STATUS_OK }

- A Trials() object is created first so that we can later inspect what happened while fmin() was running.

In [None]:
from sklearn.model_selection import cross_val_score
trials = Trials()
best = fmin(fn= objective,
            space= space,
            algo= tpe.suggest,
            max_evals = 80,
            trials= trials)
best

 72%|██████████████████████████████████             | 58/80 [12:03<07:48, 21.28s/trial, best loss: -0.7670798347327735]

In [None]:
# For hp.choice parameters (criterion, max_features, n_estimators), best holds the index of the chosen option, not the value itself.
# The dictionaries below map each index back to its value.
crit = {0: 'entropy', 1: 'gini'}
feat = {0: 'auto', 1: 'sqrt', 2: 'log2', 3: None}
est = {0: 10, 1: 50, 2: 300, 3: 750, 4: 1200,5:1300,6:1500}


print(crit[best['criterion']])
print(feat[best['max_features']])
print(est[best['n_estimators']])
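The index-to-value mapping can also be written once as a small helper instead of one dictionary per parameter. This is a sketch: `decode` and `example_best` are hypothetical names, and the values in `example_best` are made up for illustration rather than taken from a real run. Hyperopt also provides `space_eval(space, best)`, which should perform the same translation directly from the search space.

```python
# The option lists, in the same order as they were passed to hp.choice.
choices = {
    "criterion": ["entropy", "gini"],
    "max_features": ["auto", "sqrt", "log2", None],
    "n_estimators": [10, 50, 300, 750, 1200, 1300, 1500],
}


def decode(best, choices):
    """Replace hp.choice indices with the option they point to; leave other values as-is."""
    return {
        name: choices[name][int(value)] if name in choices else value
        for name, value in best.items()
    }


# Hypothetical result in the shape returned by fmin (illustrative values only).
example_best = {
    "criterion": 1,
    "max_depth": 780.0,
    "max_features": 2,
    "min_samples_leaf": 0.05,
    "min_samples_split": 0.1,
    "n_estimators": 3,
}

decoded = decode(example_best, choices)
print(decoded)
```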

In [None]:
best['min_samples_leaf']

### Train and test the model with the best hyperparameter values

In [None]:
# hp.quniform returns a float, so max_depth is cast to int
trainedforest = RandomForestClassifier(criterion = crit[best['criterion']], max_depth = int(best['max_depth']), 
                                       max_features = feat[best['max_features']], 
                                       min_samples_leaf = best['min_samples_leaf'], 
                                       min_samples_split = best['min_samples_split'], 
                                       n_estimators = est[best['n_estimators']]).fit(X_train,y_train)
predictionforest = trainedforest.predict(X_test)
print(confusion_matrix(y_test,predictionforest))
print(accuracy_score(y_test,predictionforest))
print(classification_report(y_test,predictionforest))
acc5 = accuracy_score(y_test,predictionforest)

### Genetic Algorithms
- https://towardsdatascience.com/hyperparameter-tuning-in-xgboost-using-genetic-algorithm-17bd2e581b17
- https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d

Genetic algorithms try to apply natural selection mechanisms to machine learning contexts.

Imagine we create a population of N machine learning models with some predefined hyperparameters. We calculate the accuracy of each model and keep only half of them (the ones that perform best). We then generate offspring with hyperparameters similar to those of the best models, so that we again have a population of N models. At this point we calculate the accuracy of each model again and repeat the cycle for a defined number of generations. In this way, only the best models survive at the end of the process.
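The cycle described above can be sketched in a few lines of plain Python. To keep the example fast and self-contained, `fitness` is a made-up stand-in for a model's accuracy (an assumption, not a real evaluation), and offspring are produced by mutation only; this is a toy illustration, not how TPOT is implemented.

```python
import random

rng = random.Random(42)


def random_individual():
    """One candidate set of hyperparameters."""
    return {
        "n_estimators": rng.randrange(200, 2001, 200),
        "max_depth": rng.randrange(10, 1001, 110),
    }


def fitness(ind):
    """Toy stand-in for accuracy; it peaks at n_estimators=1000, max_depth=450."""
    return 1.0 - abs(ind["n_estimators"] - 1000) / 2000 - abs(ind["max_depth"] - 450) / 1000


def mutate(parent):
    """Create an offspring by nudging one hyperparameter of the parent."""
    child = dict(parent)
    key = rng.choice(list(child))
    if key == "n_estimators":
        child[key] = min(2000, max(200, child[key] + rng.choice([-200, 200])))
    else:
        child[key] = min(1000, max(10, child[key] + rng.choice([-110, 110])))
    return child


def evolve(population_size=10, generations=5):
    population = [random_individual() for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Keep the better half of the population.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Refill the population with offspring similar to the survivors.
        offspring = [
            mutate(rng.choice(survivors))
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + offspring
        best_per_generation.append(max(fitness(ind) for ind in population))
    return population, best_per_generation


population, best_per_generation = evolve()
print(best_per_generation)
```

Because the survivors are carried over unchanged, the best score in the population can never decrease from one generation to the next.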

In [14]:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt','log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 1000,10)]
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10,14]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4,6,8]
# Create the random grid
param = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
              'criterion':['entropy','gini']}
print(param)

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [2, 5, 10, 14], 'min_samples_leaf': [1, 2, 4, 6, 8], 'criterion': ['entropy', 'gini']}


TPOT is meant to be an assistant that gives you ideas on how to solve a particular machine learning problem by exploring pipeline configurations that you might have never considered, then leaves the fine-tuning to more constrained parameter tuning techniques such as grid search.

TPOT has what its developers call a genetic search algorithm to find the best parameters and model ensembles. It could also be thought of as a natural selection or evolutionary algorithm. TPOT tries a pipeline, evaluates its performance, and randomly changes parts of the pipeline in search of better performing algorithms.
- https://epistasislab.github.io/tpot/

In [15]:
from tpot import TPOTClassifier


tpot_classifier = TPOTClassifier(generations= 5, population_size= 24, offspring_size= 12,
                                 verbosity= 2, early_stop= 12,
                                 config_dict={'sklearn.ensemble.RandomForestClassifier': param}, 
                                 cv = 10, scoring = 'accuracy')
tpot_classifier.fit(X_train,y_train)


Generation 1 - Current best internal CV score: 0.7621628767847699

Exception ignored in: <function WeakSet.__init__.<locals>._remove at 0x0000022D80767EE0>
Traceback (most recent call last):
  File "C:\Users\shkatta\Anaconda3\New folder\lib\_weakrefset.py", line 38, in _remove
    def _remove(item, selfref=ref(self)):
stopit.utils.TimeoutException: 



Generation 2 - Current best internal CV score: 0.7621628767847699
Generation 3 - Current best internal CV score: 0.7621893178212586
Generation 4 - Current best internal CV score: 0.7621893178212586
Generation 5 - Current best internal CV score: 0.7621893178212586
Best pipeline: RandomForestClassifier(CombineDFs(RandomForestClassifier(input_matrix, criterion=entropy, max_depth=890, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=1200), input_matrix), criterion=gini, max_depth=670, max_features=auto, min_samples_leaf=2, min_samples_split=14, n_estimators=1000)


TPOTClassifier(config_dict={'sklearn.ensemble.RandomForestClassifier': {'criterion': ['entropy',
                                                                                      'gini'],
                                                                        'max_depth': [10,
                                                                                      120,
                                                                                      230,
                                                                                      340,
                                                                                      450,
                                                                                      560,
                                                                                      670,
                                                                                      780,
                                                                                 

In [17]:
# evaluated_individuals_ lists every pipeline configuration TPOT evaluated, together with its internal CV score
tpot_classifier.evaluated_individuals_

{'RandomForestClassifier(input_matrix, RandomForestClassifier__criterion=gini, RandomForestClassifier__max_depth=10, RandomForestClassifier__max_features=log2, RandomForestClassifier__min_samples_leaf=4, RandomForestClassifier__min_samples_split=2, RandomForestClassifier__n_estimators=1000)': {'generation': 0,
  'mutation_count': 0,
  'crossover_count': 0,
  'predecessor': ('ROOT',),
  'operator_count': 1,
  'internal_cv_score': 0.7507932310946589},
 'RandomForestClassifier(RandomForestClassifier(input_matrix, RandomForestClassifier__criterion=gini, RandomForestClassifier__max_depth=120, RandomForestClassifier__max_features=log2, RandomForestClassifier__min_samples_leaf=6, RandomForestClassifier__min_samples_split=10, RandomForestClassifier__n_estimators=1200), RandomForestClassifier__criterion=gini, RandomForestClassifier__max_depth=230, RandomForestClassifier__max_features=log2, RandomForestClassifier__min_samples_leaf=4, RandomForestClassifier__min_samples_split=2, RandomForestClass

In [18]:
# get_params without parentheses only echoes the bound method, and calling it returns
# the TPOTClassifier settings rather than the result of the search.
# The best pipeline found is stored in fitted_pipeline_.
tpot_classifier.fitted_pipeline_

In [16]:
accuracy = tpot_classifier.score(X_test, y_test)
print(accuracy)

0.8311688311688312


### Optimize hyperparameters of the model using Optuna
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API. Thanks to this define-by-run API, code written with Optuna is highly modular, and the user can construct the search spaces for the hyperparameters dynamically.

#### Key Features
Optuna has modern functionalities as follows:

- Lightweight, versatile, and platform agnostic architecture

   - Handle a wide variety of tasks with a simple installation that has few requirements.

- Pythonic search spaces

  - Define search spaces using familiar Python syntax including conditionals and loops.

- Efficient optimization algorithms

  - Adopt state-of-the-art algorithms for sampling hyper parameters and efficiently pruning unpromising trials.

- Easy parallelization

  - Scale studies to tens or hundreds of workers with little or no changes to the code.

- Quick visualization

  - Inspect optimization histories from a variety of plotting functions.
  
  
For the random forest classifier, the hyperparameters tuned here are n_estimators and max_depth; we try different values to see whether the model accuracy improves. The objective function accepts a trial object, which has several methods for sampling hyperparameters. We create a study to run the hyperparameter optimization and finally read the best hyperparameters.
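The define-by-run idea, where the search space is built while the objective function executes, can be illustrated without Optuna. In the sketch below, `ToyTrial` is a hypothetical stand-in that only records what it samples, and the returned scores are made-up values rather than real accuracies.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object; it records every value it samples."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self.rng.choice(choices)
        return self.params[name]

    def suggest_int(self, name, low, high, step=1):
        self.params[name] = self.rng.randrange(low, high + 1, step)
        return self.params[name]


def objective(trial):
    # The search space is built while the function runs: which parameters
    # exist depends on the branch that is taken.
    classifier = trial.suggest_categorical("classifier", ["RandomForest", "SVC"])
    if classifier == "RandomForest":
        n_estimators = trial.suggest_int("n_estimators", 200, 2000, step=10)
        max_depth = trial.suggest_int("max_depth", 10, 100)
        # Toy score standing in for cross-validated accuracy.
        return 0.75 - abs(n_estimators - 800) / 10000 - abs(max_depth - 50) / 1000
    svc_c_exponent = trial.suggest_int("svc_c_exponent", -10, 10)
    return 0.64 - abs(svc_c_exponent) / 1000


rng = random.Random(0)
trials = []
for _ in range(20):
    trial = ToyTrial(rng)
    trials.append((objective(trial), trial.params))

best_value, best_params = max(trials, key=lambda t: t[0])
print(best_value, best_params)
```

Each trial records only the parameters of the branch it took, so a random forest trial has no SVC parameter and vice versa.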

In [19]:
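import optuna
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm
def objective(trial):

    classifier = trial.suggest_categorical('classifier', ['RandomForest', 'SVC'])
    
    if classifier == 'RandomForest':
        # step is passed by keyword so the call also works where it is keyword-only
        n_estimators = trial.suggest_int('n_estimators', 200, 2000, step=10)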
        max_depth = int(trial.suggest_float('max_depth', 10, 100, log=True))

        clf = sklearn.ensemble.RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth)
    else:
        c = trial.suggest_float('svc_c', 1e-10, 1e10, log=True)
        
        clf = sklearn.svm.SVC(C=c, gamma='auto')

    return sklearn.model_selection.cross_val_score(
        clf,X_train,y_train, n_jobs=-1, cv=3).mean()

In [20]:
# optuna.create_study creates a study; direction='maximize' tells Optuna to look for the parameters that give the highest accuracy
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

[I 2020-09-18 14:22:03,279] A new study created in memory with name: no-name-e0176125-d4eb-43c3-83c1-95c18d0bb9d8
[I 2020-09-18 14:22:21,148] Trial 0 finished with value: 0.7426669854933844 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1890, 'max_depth': 18.548999524499475}. Best is trial 0 with value: 0.7426669854933844.
[I 2020-09-18 14:22:26,757] Trial 1 finished with value: 0.640068547744301 and parameters: {'classifier': 'SVC', 'svc_c': 5.066906483334027}. Best is trial 0 with value: 0.7426669854933844.
[I 2020-09-18 14:22:39,244] Trial 2 finished with value: 0.7475370636059302 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1620, 'max_depth': 13.34615561152007}. Best is trial 2 with value: 0.7475370636059302.
[I 2020-09-18 14:22:39,324] Trial 3 finished with value: 0.640068547744301 and parameters: {'classifier': 'SVC', 'svc_c': 0.00021834353382894025}. Best is trial 2 with value: 0.7475370636059302.
[I 2020-09-18 14:22:39,397] Trial 4 finished wit

[I 2020-09-18 14:24:28,967] Trial 36 finished with value: 0.7491790212019768 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1280, 'max_depth': 78.15639528327945}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:24:29,055] Trial 37 finished with value: 0.640068547744301 and parameters: {'classifier': 'SVC', 'svc_c': 1.2736472430789219e-10}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:24:32,335] Trial 38 finished with value: 0.7426510441575004 and parameters: {'classifier': 'RandomForest', 'n_estimators': 780, 'max_depth': 62.45688693873906}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:24:32,443] Trial 39 finished with value: 0.640068547744301 and parameters: {'classifier': 'SVC', 'svc_c': 2281.999017498122}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:24:34,592] Trial 40 finished with value: 0.7491790212019768 and parameters: {'classifier': 'RandomForest', 'n_estimators': 580, 'max_depth'

[I 2020-09-18 14:27:17,762] Trial 72 finished with value: 0.7459269886816515 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1890, 'max_depth': 23.55850100748148}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:27:24,734] Trial 73 finished with value: 0.7459190180137095 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1750, 'max_depth': 10.045651447044499}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:27:31,063] Trial 74 finished with value: 0.7491790212019768 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1560, 'max_depth': 11.808391775664196}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:27:37,063] Trial 75 finished with value: 0.7556751155746851 and parameters: {'classifier': 'RandomForest', 'n_estimators': 1510, 'max_depth': 14.386020884472575}. Best is trial 25 with value: 0.7556910569105691.
[I 2020-09-18 14:27:40,648] Trial 76 finished with value: 0.7540411286465806 and para

Accuracy: 0.7556910569105691
Best hyperparameters: {'classifier': 'RandomForest', 'n_estimators': 800, 'max_depth': 65.4612702165372}


In [21]:
trial

FrozenTrial(number=25, value=0.7556910569105691, datetime_start=datetime.datetime(2020, 9, 18, 14, 23, 52, 278052), datetime_complete=datetime.datetime(2020, 9, 18, 14, 23, 55, 384042), params={'classifier': 'RandomForest', 'n_estimators': 800, 'max_depth': 65.4612702165372}, distributions={'classifier': CategoricalDistribution(choices=('RandomForest', 'SVC')), 'n_estimators': IntUniformDistribution(high=2000, low=200, step=10), 'max_depth': LogUniformDistribution(high=100, low=10)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=25, state=TrialState.COMPLETE)

In [22]:
study.best_params

{'classifier': 'RandomForest',
 'n_estimators': 800,
 'max_depth': 65.4612702165372}

In [23]:
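# Note: these values are set by hand and differ from study.best_params above
# (n_estimators=800, max_depth of about 65); substitute those to refit the best trial.
rf=RandomForestClassifier(n_estimators=330,max_depth=30)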
rf.fit(X_train,y_train)

RandomForestClassifier(max_depth=30, n_estimators=330)

In [24]:
y_pred=rf.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[94 13]
 [16 31]]
0.8116883116883117
              precision    recall  f1-score   support

           0       0.85      0.88      0.87       107
           1       0.70      0.66      0.68        47

    accuracy                           0.81       154
   macro avg       0.78      0.77      0.77       154
weighted avg       0.81      0.81      0.81       154



- https://towardsdatascience.com/optuna-vs-hyperopt-which-hyperparameter-optimization-library-should-you-choose-ed8564618151
- https://neptune.ai/blog/optuna-vs-hyperopt?utm_campaign=News&utm_medium=Community&utm_source=DataCamp.com#5