<a href="https://colab.research.google.com/github/khojwar/Machine-Learning-and-Deep-Learning/blob/main/Machine%20Learning%20with%20tensorflow/HyperParameter_SVM_(one_vs_all_classification).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

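A support vector machine (SVM) is a classification tool. A single SVM separates two classes, so for a problem with many classes we can use a one-vs-all (also called one-vs-rest) strategy: for each class we train a classifier that separates that class from all the others. Instead of directly deciding "this is an apple, this is an orange", each classifier answers "this is an apple" or "this is not an apple", and so on for every class. In scikit-learn, `LinearSVC` uses this strategy by default, while `SVC` trains one-vs-one classifiers internally.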

### About the Dataset

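The MNIST dataset contains 70,000 images of handwritten digits from 0 to 9, each 28x28 pixels flattened into 784 features.

Using the one-vs-all strategy we **first** learn what is a 1 and what is not a 1, what is a 2 and what is not a 2, and so on for every digit, and **then** use those classifiers to guess the digits we provide as a test.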
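The two steps above can be sketched on a small scale. This is only an illustration, not the notebook's own code: it assumes scikit-learn's bundled `load_digits` dataset (8x8 digit images) as a quick stand-in for MNIST, and the names `binary_clfs`, `scores`, and `ovr_pred` are made up for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: the small bundled digits dataset stands in for MNIST here
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

classes = np.unique(y)

# Step 1: one binary classifier per digit ("is it this digit or not?")
binary_clfs = []
for c in classes:
    clf = LinearSVC(random_state=42, max_iter=5000)
    clf.fit(X, (y == c).astype(int))
    binary_clfs.append(clf)

# Step 2: for each sample, pick the digit whose classifier is most confident
scores = np.column_stack([clf.decision_function(X) for clf in binary_clfs])
ovr_pred = classes[scores.argmax(axis=1)]
ovr_accuracy = (ovr_pred == y).mean()
print(ovr_accuracy)
```

In practice you rarely write this loop by hand, since `LinearSVC` applies the same scheme internally when it is given more than two classes.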

In [None]:
import numpy as np

from sklearn.datasets import fetch_openml     # provides easy access to datasets from the OpenML repository.
mnist = fetch_openml('mnist_784', version=1, cache=True)

X = mnist["data"]
y = mnist["target"].astype(np.uint8)



In [None]:
X_train = X[:60000]
y_train = y[:60000]
X_test = X[60000:]
y_test = y[60000:]

In [None]:
from sklearn.svm import LinearSVC

In [None]:
lin_clf = LinearSVC(random_state=42)    # Create an instance of LinearSVC with a specified random_state for reproducibility (ensuring that the results are consistent across runs).
lin_clf.fit(X_train, y_train)   # Train the LinearSVC model using the training data



In [None]:
from sklearn.metrics import accuracy_score

y_pred = lin_clf.predict(X_train)
accuracy_score(y_train, y_pred)

0.8348666666666666

The accuracy score comes out to 83.49%, which is pretty bad (and this is measured on the training data itself). Let's scale the training dataset to see whether it improves:

**Standardization (or z-score normalization)** is a common preprocessing technique used to *transform numerical features into a standard scale, where the mean is 0 and the standard deviation is 1*. It is particularly important when working with machine learning models that rely on distance-based calculations or optimization algorithms.
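To make the definition concrete, here is a small sketch of standardization done by hand with NumPy. The arrays `train` and `held_out` are made-up data, not MNIST. The point it tries to show is that the mean and standard deviation are computed once from the training rows and then reused for any other rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pixel-like data: 100 training rows and 20 held-out rows, 3 features
train = rng.uniform(0, 255, size=(100, 3))
held_out = rng.uniform(0, 255, size=(20, 3))

# Per-feature statistics, computed from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population std (ddof=0), which is what StandardScaler uses

train_std = (train - mean) / std
held_out_std = (held_out - mean) / std   # reuse the training statistics

print(train_std.mean(axis=0), train_std.std(axis=0))
```

After the transform, each training feature has mean 0 and standard deviation 1 (up to floating-point error). The held-out rows will be close to that but not exactly there, which is expected.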

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float32))   # Convert the data to float32, then fit the scaler on X_train and standardize it
X_test_scaled = scaler.transform(X_test.astype(np.float32))   # Reuse the training mean and std; the scaler must not be refit on the test set

lin_clf = LinearSVC(random_state=42)
lin_clf.fit(X_train_scaled, y_train)

y_pred = lin_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)



0.9214

The accuracy score comes out to 92.14%, which is better than before but still not great.

#### Can we do more?

YES

We can use `kernels`

In SVM:
Kernels are a way to make the algorithm more flexible: a kernel computes similarities as if polynomial (or other nonlinear) features had been added to the dataset, without actually creating those features.

In CNN (where the word means something different):
**Kernel size** is the filter size: the dimensions of the window that slides over the input. Choosing this hyperparameter has a large impact on an image classification task.
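A tiny worked example may help here. The sketch below uses a degree-2 polynomial kernel on 2-D points and checks that it gives the same number as building the six polynomial features explicitly and taking their dot product. The helper names `poly_kernel` and `explicit_features` are made up for this illustration.

```python
import numpy as np

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly in the original 2-D space."""
    return (np.dot(x, z) + 1.0) ** 2

def explicit_features(x):
    """Explicit degree-2 feature map for a 2-D point (6 features)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = poly_kernel(x, z)                                      # works on 2 features
explicit_value = np.dot(explicit_features(x), explicit_features(z))   # works on 6 features

print(kernel_value, explicit_value)
```

Both values agree, but the kernel never had to build the larger feature vectors. That saving is what makes kernels practical when the implied feature space is very large.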

For example,
* `small kernel sizes` are able to extract a much larger amount of information containing highly local features from the input.
* Conversely, a `large kernel size` extracts less information, which leads to a faster reduction in layer dimensions, often leading to worse performance. Large kernels are better suited to extract features that are larger.
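The "faster reduction in layer dimensions" can be checked with the usual output-size formula for a convolution. This is a minimal sketch assuming a square input, a square kernel, and no dilation; `conv_output_size` is a hypothetical helper, not a library function.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 28x28 MNIST-sized input with no padding and stride 1
small_kernel_out = conv_output_size(28, 3)   # 3x3 kernel
large_kernel_out = conv_output_size(28, 7)   # 7x7 kernel

print(small_kernel_out, large_kernel_out)
```

With these settings a 3x3 kernel leaves a 26x26 output, while a 7x7 kernel leaves 22x22, so the larger kernel shrinks the layer more per step.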

In the context of `SVC(gamma='scale')`, gamma is one of the hyperparameters of the SVC model. The gamma parameter *controls the influence of individual training samples* on the decision boundary. A **small gamma** value will result in a more extended decision boundary, while a **large gamma** value will make the decision boundary more tightly fit around the data points.

Setting `gamma='scale'` means that the *gamma parameter is calculated automatically from the training data*. Specifically, `gamma='scale'` is equivalent to `gamma = 1 / (n_features * X.var())`, where `X` is the input feature matrix of the training data and `X.var()` is the variance over all of its entries. ***This adapts gamma to the number of features and to the overall scale of the data.*** It does not rescale features individually, so standardizing the inputs first is still worthwhile.
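The formula and the effect of gamma can be sketched with NumPy alone. The data `X_demo` is random and made up for the example, and `rbf_kernel` here is a hand-written helper following the standard RBF definition `exp(-gamma * ||a - b||^2)`, not a call into scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))   # hypothetical data: 200 samples, 5 features

n_features = X_demo.shape[1]
gamma_scale = 1.0 / (n_features * X_demo.var())   # variance over all entries of X

def rbf_kernel(a, b, gamma):
    """RBF similarity between two points: 1 when identical, near 0 when far apart."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

a, b = X_demo[0], X_demo[1]
sim_small_gamma = rbf_kernel(a, b, gamma=0.01)   # influence reaches far
sim_large_gamma = rbf_kernel(a, b, gamma=10.0)   # influence drops off quickly

print(gamma_scale, sim_small_gamma, sim_large_gamma)
```

The same pair of points looks similar under a small gamma and almost unrelated under a large one, which is why a large gamma makes the boundary hug individual training samples.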

In [None]:
from sklearn.svm import SVC

svm_clf = SVC(gamma='scale')    # Create the SVC model with gamma='scale' . The "gamma" parameter controls the influence of individual training samples on the decision boundary.
svm_clf.fit(X_train_scaled[:10000], y_train[:10000])    # We use an SVC with an RBF kernel

y_pred = svm_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)


0.9455333333333333

The accuracy score comes out to 94.55%, which is much better.

Notice that we trained on only 1/6th of the actual dataset. That is because training a kernel SVM is computationally expensive and there are several hyperparameters to tune. Since this approach works for us, let's do hyperparameter tuning.

### What is hyperparameter tuning ?

Hyperparameters are settings of the algorithm that are chosen before training rather than learned from the data. Increasing or decreasing them changes how well the model classifies or regresses the dataset. For example:

`lin_clf = LinearSVC(random_state=42)`

Here `random_state=42` is an argument that fixes the random seed at 42, so the algorithm makes the same random choices on every run and gives the same `accuracy scores` for the same data. It controls reproducibility rather than model quality, so it is not something we tune.

Similarly, each hyperparameter is a property with its own function.
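What fixing a seed buys you can be seen with NumPy's random generator on its own. This is just an illustration of seeding in general, with made-up variable names, and not a look inside `LinearSVC`.

```python
import numpy as np

# Two generators seeded with 42 produce exactly the same "random" numbers
first = np.random.RandomState(42).rand(3)
second = np.random.RandomState(42).rand(3)

# A different seed gives a different sequence
other = np.random.RandomState(0).rand(3)

print(np.array_equal(first, second), np.array_equal(first, other))
```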

There is a technique called cross-validation, where we split the training data into several smaller subsets (folds), train on all but one of them, and evaluate on the one held out, repeating this so that each fold is used for evaluation once. By repeating this exercise for different hyperparameter values, you can find the values that work best.

Scoring a model across these folds is what scikit-learn's `cross_val_score` does, and trying randomly sampled hyperparameter values is called `randomized search`.


`RandomizedSearchCV` is the scikit-learn class that combines the two for hyperparameter tuning or optimization.
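Here is a minimal sketch of cross-validation on its own, before any search is involved. It assumes the small bundled `load_digits` dataset rather than MNIST so that it finishes in a few seconds; `X_small`, `y_small`, and `fold_scores` are names chosen for the example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: the small bundled digits dataset stands in for MNIST here
X_small, y_small = load_digits(return_X_y=True)

# Train and evaluate the same model three times, each time holding out a different fold
fold_scores = cross_val_score(SVC(gamma="scale"), X_small, y_small, cv=3)
mean_score = fold_scores.mean()

print(fold_scores, mean_score)
```

Each entry of `fold_scores` is the accuracy on one held-out fold. Averaging them gives a steadier estimate than a single train/test split would.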

When training machine learning models, there are certain hyperparameters (like C, gamma, kernel, etc. for SVM) that cannot be learned from the data and need to be set before training the model. Tuning these hyperparameters can significantly impact the model's performance.

Let me demonstrate this using code:

`cv=3` means 3-fold cross-validation.

`verbose=0` gives no output, `verbose=1` gives limited output, and `verbose=2` gives more detailed output during the search process.
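Before running the search, it can help to look at what the two SciPy distributions used for `gamma` and `C` actually produce. The sketch below only draws samples and checks their ranges. The sample size and seed are arbitrary choices for the illustration.

```python
from scipy.stats import reciprocal, uniform

gamma_dist = reciprocal(0.001, 0.1)   # log-uniform between 0.001 and 0.1
C_dist = uniform(1, 10)               # uniform(loc, scale): covers [1, 11], not [1, 10]

gamma_samples = gamma_dist.rvs(size=1000, random_state=42)
C_samples = C_dist.rvs(size=1000, random_state=42)

print(gamma_samples.min(), gamma_samples.max())
print(C_samples.min(), C_samples.max())
```

The log-uniform distribution spends as many draws between 0.001 and 0.01 as between 0.01 and 0.1, which suits a parameter like gamma whose useful values span orders of magnitude.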

In [None]:
from sklearn.model_selection import RandomizedSearchCV      # used for hyperparameter tuning or optimization
from scipy.stats import reciprocal, uniform

# Define the hyperparameter distributions to sample from. The "keys" of the dictionary are the names of the hyperparameters, and the "values" are the corresponding probability distributions from which to sample.
param_distributions = {
    "gamma": reciprocal(0.001, 0.1),    # log-uniform between 0.001 and 0.1
    "C": uniform(1, 10)    # uniform(loc, scale): uniform between 1 and 11
    }

# On each iteration, one value per hyperparameter is drawn at random from these distributions

rnd_rearch_cv = RandomizedSearchCV(svm_clf, param_distributions, n_iter=10, verbose=2, cv=3)    # Create the RandomizedSearchCV object
rnd_rearch_cv.fit(X_train_scaled[:10000], y_train[:10000])    # Fit the RandomizedSearchCV on the training data



Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END ....C=8.670950225937865, gamma=0.014210408187382057; total time=  49.9s
[CV] END ....C=8.670950225937865, gamma=0.014210408187382057; total time=  50.1s
[CV] END ....C=8.670950225937865, gamma=0.014210408187382057; total time=  50.7s
[CV] END ......C=7.108003192271348, gamma=0.0815767966228506; total time=  54.2s
[CV] END ......C=7.108003192271348, gamma=0.0815767966228506; total time=  57.9s
[CV] END ......C=7.108003192271348, gamma=0.0815767966228506; total time=  54.8s
[CV] END .....C=5.107080238084402, gamma=0.09789490862822041; total time=  54.4s
[CV] END .....C=5.107080238084402, gamma=0.09789490862822041; total time=  54.0s
[CV] END .....C=5.107080238084402, gamma=0.09789490862822041; total time=  55.0s
[CV] END ....C=9.175079172117236, gamma=0.021927613133928374; total time=  51.4s
[CV] END ....C=9.175079172117236, gamma=0.021927613133928374; total time=  52.1s
[CV] END ....C=9.175079172117236, gamma=0.021927

In [None]:
# Get the best hyperparameters and the corresponding model
rnd_rearch_cv.best_estimator_
rnd_rearch_cv.best_score_

0.9391997988041156

In [None]:
rnd_rearch_cv.best_estimator_.fit(X_train_scaled, y_train)

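`rmse` (*Root Mean Squared Error*) ***measures how closely the model's predictions*** match the actual target values, first on the training data and then on the test data. A lower RMSE value indicates a better fit. Keep in mind that RMSE treats the digit labels as plain numbers, so for a classification task accuracy is the more meaningful metric and RMSE is only a rough additional check.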
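Because RMSE works on the numeric value of the labels, two sets of predictions with the same accuracy can have very different RMSE values. A small sketch with made-up labels (`pred_a` and `pred_b` are hypothetical predictions, and the helpers are written by hand):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1], dtype=float)
pred_a = np.array([1, 1, 1, 2], dtype=float)   # one mistake: 1 predicted as 2
pred_b = np.array([1, 1, 1, 9], dtype=float)   # one mistake: 1 predicted as 9

def accuracy(y, p):
    """Fraction of predictions that match the labels exactly."""
    return float(np.mean(y == p))

def rmse(y, p):
    """Root mean squared error, treating the labels as numbers."""
    return float(np.sqrt(np.mean((y - p) ** 2)))

acc_a, acc_b = accuracy(y_true, pred_a), accuracy(y_true, pred_b)
rmse_a, rmse_b = rmse(y_true, pred_a), rmse(y_true, pred_b)

print(acc_a, acc_b)     # same accuracy
print(rmse_a, rmse_b)   # very different RMSE
```

Both predictions get exactly one digit wrong, yet the second is penalized eight times more simply because 9 is numerically further from 1 than 2 is.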

In [None]:
from sklearn.metrics import mean_squared_error

y_pred = rnd_rearch_cv.best_estimator_.predict(X_train_scaled)    # obtained the best estimator from RandomizedSearchCV as rnd_rearch_cv.best_estimator_ and make predictions on the scaled training data
mse = mean_squared_error(y_train, y_pred)   # Calculate the Mean Squared Error (MSE) between the actual and predicted values.
np.sqrt(mse)    # Calculate the Root Mean Squared Error (RMSE)

0.20424658299875342

In [None]:
y_pred = rnd_rearch_cv.best_estimator_.predict(X_test_scaled)   # obtained the best estimator from RandomizedSearchCV as rnd_rearch_cv.best_estimator_ and make predictions on the scaled testing data
mse = mean_squared_error(y_test, y_pred)    # Calculate the Mean Squared Error (MSE) between the actual and predicted values.
np.sqrt(mse)    # Calculate the Root Mean Squared Error (RMSE)

0.6835934464285041

In [None]:
y_pred = rnd_rearch_cv.best_estimator_.predict(X_test_scaled)
accuracy_score(y_test, y_pred)

0.9719

My accuracy score came out to 97.19%, which is not excellent but is good enough, and the model does not appear to be overfitting badly.

Also, note that we increased the accuracy score from about 83.5% (the first, unscaled linear model) to about 97.2%, which is the real victory here.

We first scaled the inputs and then tuned the hyperparameters. Note that training on 60,000 data points isn't easy and might take a lot of time, so be patient.