<a href="https://colab.research.google.com/github/showrin20/Machine-Learning-Learning-Path/blob/main/SVMExercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## SVM Problem

This exercise explores optimization of an SVM applied to a classification problem, the MINST digits.

In [17]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

In [18]:
%%html
<style type='text/css'>.CodeMirror{
font-size: 18px;
</style>

In [19]:
from sklearn.svm import LinearSVC
from sklearn.svm import SVC

### Fetch the data
The following code gets all of the data.  Set the train portion to the first 20000 and the test portion to the next 10000.

In [20]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, cache=True)

X = mnist["data"]
y = mnist["target"].astype(np.uint8)

X_train = X[:20000]
X_test = X[20000:]
y_train = y[:20000]
y_test = y[20000:]

  warn(


### A first attempt

Use LinearSVC (Linear Support Vector Classification.) to fit the data.  This is similar to SVC(kernel="linear").  It tends to work better with larger data sets and uses a different solver library.

In [21]:
lin_clf = LinearSVC(random_state=1)
lin_clf.fit(X_train, y_train)



Measure the accuracy of a prediction on your training data.  Your not done tweaking the model, so don't yet evaluate your testing data.

In [22]:
from sklearn.metrics import accuracy_score

y_pred = lin_clf.predict(X_train)
accuracy_score(y_train, y_pred)


0.9125

### Scale your data using StandardScaler.

Be sure to scale both the train and test data.  Then rerun and evaluate the same linear model as above.  Keep random_state the same!

In [23]:
from sklearn.preprocessing import StandardScaler
#TODO
sc = StandardScaler()
X_train_scaled=sc.fit_transform(X_train)
X_test_scaled=sc.fit_transform(X_test)

In [24]:
lin_clf = LinearSVC(random_state=1)
lin_clf.fit(X_train_scaled, y_train)



In [25]:
y_pred = lin_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)

0.945

### Did scaling matter much?

### Try fitting a non-linear SVM

Using only the first 1000 items in your scaled training data, try a generic SVM (Scikit's SVC).  You can use all the defaults here.  Check your accuracy using *all* of the scaled training data.

In [26]:
svm_clf = SVC(gamma="scale")
svm_clf.fit(X_train_scaled[:1000], y_train[:1000])

In [27]:
y_pred = svm_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred)

0.8673

### Results?

How did well did you do?

You probably want to do better.  The right approach is to search in the parameter space of
* C, the regularizer.  With larger values it favors harder classification.
* gamma, a kernel coefficient for the kernel type chosen
* the kernel: rbf, polynomial, linear, sigmoid

To set up a grid search examine the GridSearchCV documentation.   You will want to start with a grid with widely spaced values (3-4 of them).  When you find an optimal value, refine the grid and search again.

Continue training on only 1000 data points, just to speed things up.

In [42]:
from sklearn.model_selection import GridSearchCV

param_grid = [{ 'C': [1, 10, 100],
               'gamma':[.001,.002],
                'kernel': ["rbf", "polynomial", "linear", "sigmoid"]}]
grid_cv = GridSearchCV(estimator=svm_clf,param_grid=param_grid,verbose=2)
grid_cv.fit(X_train_scaled[:1000], y_train[:1000])

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.3s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.3s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.3s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.3s
[CV] END .......................C=1, gamma=0.001, kernel=rbf; total time=   0.3s
[CV] END ................C=1, gamma=0.001, kernel=polynomial; total time=   0.0s
[CV] END ................C=1, gamma=0.001, kernel=polynomial; total time=   0.0s
[CV] END ................C=1, gamma=0.001, kernel=polynomial; total time=   0.0s
[CV] END ................C=1, gamma=0.001, kernel=polynomial; total time=   0.0s
[CV] END ................C=1, gamma=0.001, kernel=polynomial; total time=   0.0s
[CV] END ....................C=1, gamma=0.001, kernel=linear; total time=   0.1s
[CV] END ....................C=1, gamma=0.001, 

30 fits failed out of a total of 120.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
30 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/svm/_base.py", line 180, in fit
    self._validate_params()
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 600, in _validate_params
    validate_parameter_constraints(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/_param_validation.py", line 97, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.u

### Results of grid search

You can examine the best values, and best score.

In [43]:
grid_cv.best_estimator_

In [44]:
grid_cv.best_score_

0.868

### Final results

Using the best estimator from your series of grid searches, train on all the scaled data.

Now test your final model on the testing data.

In [45]:
grid_cv.best_estimator_.fit(X_train_scaled, y_train)

In [46]:
y_pred = grid_cv.best_estimator_.predict(X_train_scaled)
accuracy_score(y_train, y_pred)

0.9987