<h3 style="color:purple">Hyperparameter Tuning</h3>

Hyperparameter tuning is the process of finding the set of hyperparameters that maximizes a machine learning model's performance on a specific task. Hyperparameters are settings chosen before training rather than learned from the data, such as the learning rate, regularization strength, or number of hidden layers. They can significantly affect model performance, and finding a good combination is often challenging.
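To make the distinction concrete, here is a minimal sketch that uses a decision tree on the digits dataset. The choice of `max_depth=4` and `random_state=0` is only an illustrative assumption; the point is that `max_depth` is set by hand before training, while the tree structure is learned from the data.

```python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Hyperparameter: fixed before training (value chosen only for illustration)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
print("hyperparameter max_depth:", tree.get_params()['max_depth'])

# Learned structure: only exists after fitting on the data
tree.fit(X, y)
print("learned depth:", tree.get_depth())
print("learned node count:", tree.tree_.node_count)
```

The fitted tree can never be deeper than the `max_depth` it was given, which is how a hyperparameter constrains what the model is able to learn.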

In [1]:
import pandas as pd
import numpy as np
import sklearn

<h5 style="color:purple">Digits Dataset from scikit-learn</h5>

In [2]:
from sklearn.datasets import load_digits
digits=load_digits()
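Before tuning anything, it can help to check what the dataset contains. The following is a small sketch, not part of the original notebook, that inspects the array shapes and the number of classes.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image flattened into 64 features
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# The targets are the digit labels
n_classes = len(set(digits.target.tolist()))
print(n_classes)            # 10
```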

<h3 style="color:purple">Finding Best Model for Handwritten Digit Classification</h3>

In [3]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

In [5]:
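from sklearn.model_selection import KFold
# 10-fold splitter; note that the search cell further down passes cv=5 and does not use this object
km = KFold(n_splits=10)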

**Building a dictionary of models and their hyperparameter search spaces**

In [6]:
model_params = {
    'svm': {
        'model': svm.SVC(),
        'params' : {
              'kernel': ['linear', 'poly', 'rbf'],
              'C': [1, 10, 20],
              'gamma': [0.1, 1, 10],
              'degree': [2, 3, 4]
        }  
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params' : {
              'n_estimators': [10, 50, 100],
              'max_depth': [None, 5, 10],
              'max_features': ['sqrt', 'log2'],
              'min_samples_split': [2, 5, 10],
              'min_samples_leaf': [1, 2, 4]}
    },
    'logistic_regression' : {
        'model': LogisticRegression(),  # multi_class='auto' is the default, so it is omitted
        'params': {
              'penalty': ['l1', 'l2'],
              'C': [0.001, 0.01, 0.1, 1, 10],
              'solver': ['liblinear', 'saga'],
              'max_iter': [100, 500, 1000]}
    },
    'GaussianNB':{
        'model': GaussianNB(),
        'params':{
        }
    },
    'MultinomialNB':{
        'model': MultinomialNB(),
        'params':{
            'alpha': [0.5, 1, 2]
        }
    },
    'DecisionTree':{
        'model': DecisionTreeClassifier(),
        'params':{
            'criterion': ['gini', 'entropy'],
            'max_depth': [2, 4, 6],
            'min_samples_split': [2, 5, 10],
            'min_samples_leaf': [1, 2, 4]
        }
    }
}

<h3 style="color:purple">GridSearchCV</h3>

Grid search defines a set of candidate values for each hyperparameter and evaluates the model on every combination of those values. With cross-validation, each combination is trained and scored once per fold. The number of combinations is the product of the number of candidate values per hyperparameter, so grid search becomes computationally expensive as hyperparameters or candidate values are added.
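To see how quickly a grid grows, the sketch below counts the combinations in the SVM search space from the dictionary above using scikit-learn's `ParameterGrid`. The fold count of 5 is an assumption chosen to match the `cv=5` used later in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Same SVM search space as in model_params
svm_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'C': [1, 10, 20],
    'gamma': [0.1, 1, 10],
    'degree': [2, 3, 4],
}

n_combinations = len(ParameterGrid(svm_grid))  # 3 * 3 * 3 * 3
n_folds = 5                                    # assumed number of CV folds
n_fits = n_combinations * n_folds

print(n_combinations)  # 81
print(n_fits)          # 405
```

An exhaustive search over this one model would therefore train 405 SVMs, before any of the other models are considered.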

<h3 style="color:purple">RandomizedSearchCV</h3>

Randomized search instead samples a fixed number of hyperparameter combinations at random from the defined search space and trains and evaluates a model for each sample. Because the number of sampled combinations (`n_iter` in scikit-learn's `RandomizedSearchCV`) is set in advance, it costs less than grid search when the space is large, at the price of possibly missing the best combination.
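The sampling step can be sketched on its own with scikit-learn's `ParameterSampler`, here applied to the random forest search space from the dictionary above. The `random_state=0` seed is an assumption added only to make the draw repeatable.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same random forest search space as in model_params
rf_space = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'max_features': ['sqrt', 'log2'],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

total = len(ParameterGrid(rf_space))  # size of the full grid: 162

# Draw 5 distinct combinations instead of evaluating all of them
samples = list(ParameterSampler(rf_space, n_iter=5, random_state=0))

print(f"evaluating {len(samples)} of {total} combinations")
for params in samples:
    print(params)
```

When every entry in the search space is a list, as here, the combinations are drawn without replacement, so no candidate is evaluated twice.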

In [7]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

*Here we use randomized search: the search spaces above cover many parameters for each model, so an exhaustive grid search would have to evaluate a large number of combinations.*

In [9]:
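scores = []

for model_name, mp in model_params.items():
    # Sample up to 5 parameter combinations per model and score each with 5-fold cross-validation.
    # When a search space holds fewer than 5 combinations (GaussianNB, MultinomialNB),
    # scikit-learn warns and evaluates all of them.
    clf = RandomizedSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False, n_iter=5)
    clf.fit(digits.data, digits.target)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,    # mean cross-validated accuracy of the best candidate
        'best_params': clf.best_params_
    })

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])
df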

| | model | best_score | best_params |
|---|---|---|---|
| 0 | svm | 0.947697 | {'kernel': 'linear', 'gamma': 1, 'degree': 2, ... |
| 1 | random_forest | 0.923784 | {'n_estimators': 100, 'min_samples_split': 5, ... |
| 2 | logistic_regression | 0.92823 | {'solver': 'liblinear', 'penalty': 'l2', 'max_... |
| 3 | GaussianNB | 0.806928 | {} |
| 4 | MultinomialNB | 0.871464 | {'alpha': 2} |
| 5 | DecisionTree | 0.719591 | {'min_samples_split': 2, 'min_samples_leaf': 1... |
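For a small search space, an exhaustive search is cheap enough to run directly. As a sketch of how `GridSearchCV` would be used, the example below searches the three `alpha` values of the MultinomialNB space with 5-fold cross-validation. The variable names are illustrative and the block is not part of the original run, so its scores are not shown here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)

# Only 3 candidates, so every one of them is evaluated
nb_grid = {'alpha': [0.5, 1, 2]}
grid_search = GridSearchCV(MultinomialNB(), nb_grid, cv=5)
grid_search.fit(X, y)

n_candidates = len(grid_search.cv_results_['params'])
print("candidates evaluated:", n_candidates)
print("best params:", grid_search.best_params_)
print("best mean CV accuracy:", grid_search.best_score_)
```

`GridSearchCV` exposes the same `best_score_` and `best_params_` attributes as `RandomizedSearchCV`, so it could be swapped into the loop above for models whose search space is small.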
