# **Fine Tuning Models**

To get the best performance out of a model, we need to fine-tune it with different values of its hyperparameters. This can be a daunting task; fortunately, *Scikit-Learn* provides tools that help us do it. The general idea is to try out multiple values *(either from a given set of values or from a range of values)*, compare the scores for all of them, and then choose the one with the best score.
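To make the "try several values, compare scores, keep the best" idea concrete, here is a minimal hand-written sketch. It assumes the small `load_digits` dataset bundled with *Scikit-Learn* as a fast stand-in for *MNIST*, and the candidate values of `C` are purely illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Small bundled digits dataset, used here only as a fast stand-in for MNIST.
X_demo, y_demo = load_digits(return_X_y=True)

candidate_C = [0.01, 0.1, 1.0]  # illustrative values, not a recommendation
scores = {}
for C in candidate_C:
    model = make_pipeline(StandardScaler(), LinearSVC(C=C, max_iter=5000))
    # Mean accuracy over 3 cross-validation folds for this candidate.
    scores[C] = cross_val_score(model, X_demo, y_demo, cv=3).mean()

# Keep the candidate with the highest mean score.
best_C = max(scores, key=scores.get)
print(scores, 'best C:', best_C)
```

The search utilities discussed in this notebook automate this kind of loop over several hyperparameters at once.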

I will discuss two main tools for fine-tuning your model:
- `GridSearchCV`
- `RandomizedSearchCV`

---

## **Preparing Data**

## Fetch the data

We use `fetch_openml` to get the *MNIST* dataset.

In [1]:
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml(name='mnist_784', 
                     version=1)

X = mnist['data']
y = mnist['target'].astype(np.uint8)

## Splitting the dataset into *Training* and *Test* dataset
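By default, the *MNIST* dataset is already shuffled and arranged into training and test portions: the first 60000 rows are the *Training* set and the remaining rows are the *Test* set. We also keep the first 1000 training rows as a small subset for faster experiments.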

In [10]:
X_train = X[:60000]
X_test = X[60000:]
y_train = y[:60000]
y_test = y[60000:]

X_train_subset = X_train[:1000]
y_train_subset = y_train[:1000]

## **Base line model**

We will use a *Support Vector Machine* for our practice, but you can use any model you prefer. To be more specific, we will be using `SVC` with different kernels.

## *Training on a subset of the training set and checking the performance*

In [3]:
from sklearn.svm import SVC

#### Training the model :

In [7]:
svm_clf = SVC()
svm_clf.fit(X_train_subset, y_train_subset)

SVC()

#### Measuring Performance :

Now let's look at the performance of this model using `accuracy_score`

In [11]:
from sklearn.metrics import accuracy_score

y_pred = svm_clf.predict(X_train_subset)
acc_scr_baseline = accuracy_score(y_train_subset, y_pred)
print(f'Baseline Accuracy Score: {round(acc_scr_baseline*100, 2)}%')

Baseline Accuracy Score: 98.2%


As we can see, even without any fine-tuning the default model performs pretty well, with an accuracy score of 98.2%. Keep in mind that this score is measured on the same 1000 rows the model was trained on, and that we haven't trained the model on the entire set yet. Let's do that before we proceed and see whether the performance drops and we really need to fine-tune the model. If not, we can use another model for our practice.

## *Training on the complete set and checking the performance*

#### Training the model :

In [12]:
svm_clf = SVC()
svm_clf.fit(X_train, y_train)

SVC()

#### Measuring Performance :

In [13]:
y_pred = svm_clf.predict(X_train)
acc_scr_baseline_complete = accuracy_score(y_train, y_pred)
print(f'Baseline Accuracy Score (Complete): {round(acc_scr_baseline_complete*100, 2)}%')

Baseline Accuracy Score (Complete): 98.99%


***Please Note:** This process may take pretty long since we are using the entire training set instead of the subset*

~As we can see, the accuracy score does drop. So we can fine tune the model on the subset *(in order to save time)* and then use the tweaked model to see if gained any performance boost on the entire training set.~

The accuracy score in fact increases when we provide more training data to the model. We can still fine-tune this model, but for this exercise let's look at a model where we can see a significant performance boost after fine-tuning.

## **Trying out Linear Model**

## Implementing `LinearSVC` Model and checking its performance

Importing Libraries

In [3]:
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

In [3]:
lin_svc = LinearSVC()
lin_svc.fit(X_train, y_train)

y_pred = lin_svc.predict(X_train)
acc_scr_lin_svc = accuracy_score(y_train, y_pred)
print(f'LinearSVC Accuracy Score: {round(acc_scr_lin_svc*100, 2)}%')



LinearSVC Accuracy Score: 86.81%


The baseline performance is not that bad, but it is low enough that we can expect to see an improvement when we fine-tune the model. So we will proceed with the `LinearSVC` model.

***Please Note:** The purpose of this notebook is not to find the optimal solution, but to go through the exercise of fine-tuning the model*

## Before we proceed to the fine-tuning, let's quickly create a pipeline with pre-processing of data

In [5]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

clf = Pipeline([
    ('scaler', StandardScaler()),
    ('lin_svc', LinearSVC())
])

## **Grid Search CV**
We will be using `GridSearchCV` from *Scikit-Learn* package for this

*[Link to the documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)*

## *The idea behind `GridSearchCV` is :*
- We provide a set of candidate values for each hyperparameter *(called `param_distribution` in this notebook; it is passed to `GridSearchCV` as `param_grid`)*
- The search then goes through every combination of those values and computes a cross-validated score for each combination
- It then selects the combination which has the highest score
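The enumeration step described above can be previewed without training anything. The sketch below uses `ParameterGrid` from `sklearn.model_selection` on a grid with the same shape as the one used later in this notebook (3 values of `C`, 1 penalty, 4 tolerances); the variable names `grid` and `candidates` are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same shape of grid as the one searched later in this notebook: 3 x 1 x 4 values.
grid = {
    'lin_svc__C': [1, 10, 50],
    'lin_svc__penalty': ['l2'],
    'lin_svc__tol': [1e-2, 1e-3, 1e-4, 1e-5],
}

# Every combination of the listed values becomes one candidate.
candidates = list(ParameterGrid(grid))
print(len(candidates))  # 12
print(candidates[0])    # one candidate, as a dict of parameter name -> value
```

With `cv=3`, each of these 12 candidates is fitted 3 times, which gives the 36 fits reported in the search log.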

## *Understanding the hyperparameters first*

In order to create a `param_distribution`, we need to first understand the various hyperparameters of the model

#### `LinearSVC` hyperparameters:

Quickly looking at the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html) for `LinearSVC`, we can see that it provides quite a lot of hyperparameters, but let's focus on some of them here:

- penalty: {'l1', 'l2'}, default='l2'
- tol: float, default=1e-4
- C: float, default=1.0

## *Creating the `param_distribution` for our `GridSearchCV`*

A `param_distribution` is either a dictionary or a list of dictionaries. Let's start with a single dictionary, which we then wrap in a list.

To find the keys for the dictionary, we can call `get_params().keys()` on our classifier. For a pipeline, each key has the form `<step name>__<parameter name>`, for example `lin_svc__C`.

In [6]:
clf.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'scaler', 'lin_svc', 'scaler__copy', 'scaler__with_mean', 'scaler__with_std', 'lin_svc__C', 'lin_svc__class_weight', 'lin_svc__dual', 'lin_svc__fit_intercept', 'lin_svc__intercept_scaling', 'lin_svc__loss', 'lin_svc__max_iter', 'lin_svc__multi_class', 'lin_svc__penalty', 'lin_svc__random_state', 'lin_svc__tol', 'lin_svc__verbose'])
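As a quick check of how these keys work, the sketch below sets two parameters of the `lin_svc` step through the pipeline using the double-underscore names. The pipeline is rebuilt under the illustrative name `demo_clf` so the snippet runs on its own.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Rebuilt here under a different name so the snippet is self-contained.
demo_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('lin_svc', LinearSVC())
])

# '<step name>__<parameter name>' addresses a parameter of a pipeline step.
demo_clf.set_params(lin_svc__C=10, lin_svc__tol=1e-3)

print(demo_clf.named_steps['lin_svc'].C)      # 10
print(demo_clf.get_params()['lin_svc__tol'])  # 0.001
```

The search classes use the same mechanism internally, which is why the dictionary keys must match these names exactly.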

In [7]:
param_distribution = [
    {
        'lin_svc__C': [1, 10, 50],
        'lin_svc__penalty': ['l2'],
        'lin_svc__tol': [1e-2, 1e-3, 1e-4, 1e-5]
    }
]

## **Implementing `GridSearchCV` :**

In [8]:
from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(estimator=clf,
                           param_grid=param_distribution,
                           verbose=2,
                           cv=3)

Implementing `GridSearchCV` on the subset of the training set, to reduce the time

In [9]:
X_train_subset = X_train[:1000]
y_train_subset = y_train[:1000]

grid_search.fit(X_train_subset, y_train_subset)

Fitting 3 folds for each of 12 candidates, totalling 36 fits
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.6s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.5s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.0s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   0.9s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   0.7s




[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.5s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.2s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.0s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.5s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.0s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   0.9s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   



[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.4s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.4s
[CV] END lin_svc__C=10, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.0s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.6s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.5s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.0s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.0s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=



[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.6s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.3s
[CV] END lin_svc__C=50, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.1s


GridSearchCV(cv=3,
             estimator=Pipeline(steps=[('scaler', StandardScaler()),
                                       ('lin_svc', LinearSVC())]),
             param_grid=[{'lin_svc__C': [1, 10, 50], 'lin_svc__penalty': ['l2'],
                          'lin_svc__tol': [0.01, 0.001, 0.0001, 1e-05]}],
             verbose=2)

In [10]:
grid_search.best_estimator_

Pipeline(steps=[('scaler', StandardScaler()),
                ('lin_svc', LinearSVC(C=1, tol=0.001))])
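Besides `best_estimator_`, a fitted search object also exposes `best_params_`, `best_score_` and `cv_results_`, which show how each candidate scored. The following is a small self-contained sketch; it assumes the bundled `load_digits` dataset as a stand-in for the MNIST subset and a reduced, illustrative grid, so its numbers will not match the ones in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_demo, y_demo = load_digits(return_X_y=True)  # stand-in for the MNIST subset

demo_search = GridSearchCV(
    estimator=Pipeline([('scaler', StandardScaler()),
                        ('lin_svc', LinearSVC(max_iter=5000))]),
    param_grid={'lin_svc__C': [0.1, 1, 10]},  # illustrative grid
    cv=3,
)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_)           # winning combination
print(round(demo_search.best_score_, 4))  # its mean cross-validated accuracy

# Mean validation-fold score for every candidate that was tried.
for params, score in zip(demo_search.cv_results_['params'],
                         demo_search.cv_results_['mean_test_score']):
    print(params, round(score, 4))
```

Looking at all candidates, rather than only the winner, shows whether the best score is clearly ahead or whether several settings perform about the same.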

Using the best estimator, we will train on the entire training data and see its performance

In [11]:
grid_search.best_estimator_.fit(X_train, y_train)



Pipeline(steps=[('scaler', StandardScaler()),
                ('lin_svc', LinearSVC(C=1, tol=0.001))])

In [12]:
y_pred = grid_search.best_estimator_.predict(X_train)
acc_scr_lin_svc_best = accuracy_score(y_train, y_pred)
print(f'Best LinearSVC Accuracy Score: {round(acc_scr_lin_svc_best*100, 2)}%')

Best LinearSVC Accuracy Score: 92.08%


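As we can see, the accuracy score increased from 86.81% to 92.08%. Note that the selected `C=1` is the `LinearSVC` default and only `tol` changed, so much of this gain likely comes from the `StandardScaler` step in the pipeline rather than from the hyperparameters alone. Both scores are also measured on the training data. We could search over more values or tweak other parameters as well.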
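Because the pipeline adds a `StandardScaler` that the earlier `LinearSVC` baseline did not have, and the scores above are computed on training data, it can be useful to check on held-out data how much scaling alone contributes. The sketch below is one way to do that, assuming the bundled `load_digits` dataset and a random 75/25 split as stand-ins; the names `models` and `held_out_acc` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42)

# Same default hyperparameters in both models; only the scaling step differs.
models = {
    'unscaled, default params': LinearSVC(max_iter=5000),
    'scaled, default params': make_pipeline(StandardScaler(),
                                            LinearSVC(max_iter=5000)),
}

held_out_acc = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Accuracy on rows the model never saw during training.
    held_out_acc[name] = accuracy_score(y_te, model.predict(X_te))
    print(f'{name}: {held_out_acc[name]:.4f}')
```

In this notebook the same comparison could be made with `X_test` and `y_test`, which were split off earlier but not used for scoring.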

## **Randomized Search CV**
We will be using `RandomizedSearchCV` from *Scikit-Learn* package for this

*[Link to the documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)*

Similar to `GridSearchCV`, we provide the parameter distribution and scores are calculated for combinations of its values. The difference is that `GridSearchCV` evaluates every combination we define, whereas `RandomizedSearchCV` evaluates only a fixed number of randomly sampled combinations, set by `n_iter` *(10 by default, which is why the log below reports 10 candidates)*.
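The sampling step can be previewed on its own with `ParameterSampler` from `sklearn.model_selection`. This sketch uses the same value lists as the distribution defined further down; `space`, `sampled` and the `random_state` value are illustrative choices.

```python
from sklearn.model_selection import ParameterSampler

space = {
    'lin_svc__C': list(range(1, 10)),
    'lin_svc__penalty': ['l2'],
    'lin_svc__tol': [1e-2, 1e-3, 1e-4, 1e-5],
}

# 9 x 1 x 4 = 36 possible combinations; only n_iter of them are drawn.
sampled = list(ParameterSampler(space, n_iter=10, random_state=0))
print(len(sampled))  # 10
for params in sampled[:3]:
    print(params)
```

Because every entry in `space` is a list, the combinations are drawn without replacement, so the 10 candidates are distinct.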

## *Initializing the model*

In [4]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

clf = Pipeline([
    ('scaler', StandardScaler()),
    ('lin_svc', LinearSVC())
])

## *Generating Parameter Distribution*

In [7]:
clf.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'scaler', 'lin_svc', 'scaler__copy', 'scaler__with_mean', 'scaler__with_std', 'lin_svc__C', 'lin_svc__class_weight', 'lin_svc__dual', 'lin_svc__fit_intercept', 'lin_svc__intercept_scaling', 'lin_svc__loss', 'lin_svc__max_iter', 'lin_svc__multi_class', 'lin_svc__penalty', 'lin_svc__random_state', 'lin_svc__tol', 'lin_svc__verbose'])

In [12]:
param_distribution = [
    {
        'lin_svc__C': [x for x in range(1, 10)],
        'lin_svc__penalty': ['l2'],
        'lin_svc__tol': [1e-2, 1e-3, 1e-4, 1e-5]
    }
]

In [13]:
from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(estimator=clf,
                                  param_distributions=param_distribution,
                                  cv=3,
                                  verbose=2)
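`param_distributions` also accepts continuous distributions instead of lists, which is where a randomized search differs most from a grid. The sketch below is an optional variation, not what this notebook runs: it assumes `loguniform` from `scipy.stats`, the bundled `load_digits` dataset as a stand-in, and illustrative values for the range, `n_iter` and `random_state`.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X_demo, y_demo = load_digits(return_X_y=True)  # stand-in for the MNIST subset

demo_random_search = RandomizedSearchCV(
    estimator=Pipeline([('scaler', StandardScaler()),
                        ('lin_svc', LinearSVC(max_iter=5000))]),
    # C is drawn from a log-uniform distribution between 1e-3 and 10.
    param_distributions={'lin_svc__C': loguniform(1e-3, 1e1)},
    n_iter=5,         # number of sampled candidates
    cv=3,
    random_state=0,   # makes the sampled candidates reproducible
)
demo_random_search.fit(X_demo, y_demo)

print(demo_random_search.best_params_)
print(round(demo_random_search.best_score_, 4))
```

Setting `random_state` is worth considering in the notebook's search as well, since without it each run may sample different candidates.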

## *Fitting the Random Search model to the subset*

In [14]:
random_search.fit(X_train_subset, y_train_subset)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.1s
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   1.0s
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.0001; total time=   0.8s




[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.6s
[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.3s
[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.1s
[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.6s
[CV] END lin_svc__C=9, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=9, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=9, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s




[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.6s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.2s
[CV] END lin_svc__C=1, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.1s
[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=4, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.5s
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.7s
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.6s
[CV] END lin_svc__C=6, lin_svc__penalty=l2, lin_svc__tol=0.001; total time=   0.5s
[CV] END lin_svc__C=5, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=5, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.4s
[CV] END lin_svc__C=5, lin_svc__penalty=l2, lin_svc__tol=0.01; total time=   0.3s




[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.5s
[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.3s
[CV] END lin_svc__C=2, lin_svc__penalty=l2, lin_svc__tol=1e-05; total time=   1.1s




RandomizedSearchCV(cv=3,
                   estimator=Pipeline(steps=[('scaler', StandardScaler()),
                                             ('lin_svc', LinearSVC())]),
                   param_distributions=[{'lin_svc__C': [1, 2, 3, 4, 5, 6, 7, 8,
                                                        9],
                                         'lin_svc__penalty': ['l2'],
                                         'lin_svc__tol': [0.01, 0.001, 0.0001,
                                                          1e-05]}],
                   verbose=2)

In [15]:
random_search.best_estimator_

Pipeline(steps=[('scaler', StandardScaler()),
                ('lin_svc', LinearSVC(C=1, tol=1e-05))])

Fitting the best estimator to the entire training set

## *Training on the entire set and checking performance*

In [16]:
random_search.best_estimator_.fit(X_train, y_train)



Pipeline(steps=[('scaler', StandardScaler()),
                ('lin_svc', LinearSVC(C=1, tol=1e-05))])

In [17]:
y_pred = random_search.best_estimator_.predict(X_train)
rnd_src_acc_src = accuracy_score(y_train, y_pred)
print(f'Rnd Src Accuracy Score: {round(rnd_src_acc_src*100, 2)}%')

Rnd Src Accuracy Score: 92.11%


As we can see, the *Accuracy Score* increased to 92.11% after fine-tuning, compared with 86.81% for the untuned `LinearSVC`.

---

## **Conclusion**

Fine-tuning a model is an important step in your Machine Learning process. Before we fine-tune, we should first narrow the candidates down to a few models using the performance metrics, and then fine-tune those in order to gain a performance boost.