# Support Vector Machines with Grid Search for Hyperparameter Tuning

I'll be implementing SVM across a range of use-cases as outlined below.

Goals:
1. **Binary Classification** : Training a SVM Linear Classifier, `LinearSVC` on a linerly seperable dataset *(e.g. `IRIS Dataset , class =2`)*. Implementing `SVC` and `SGDClassifier` on the same dataset
2. **Muticlass Classification via OvO** :Training an SVM classifier on `MNIST` database *(note that SVM is strictly a binary classifier, cannot do multi-class, off the shelf)*
3. **Regression** : Training an SVM Rgressor on the `California Housing Prices` data-set


***

# 1. Binary Classification on `IRIS` Dataset

### Getting the Data
**Iris Setosa** and **Iris Versicolor** classes are linearly separable

In [1]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

setosa_or_versicolor = (y == 0) | (y == 1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

### Training Models

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier

m = y.shape[0]
C = 1

std_scaler = StandardScaler()

X_scaled = std_scaler.fit_transform(X)

linear_svc_model=LinearSVC(C = C, loss = "hinge", random_state = 42)

svc_model = SVC(kernel = "linear", C=C)

svc_sgd_model = SGDClassifier(random_state = 42, loss = "hinge", alpha = 1/(m*C),learning_rate = "constant", eta0 = 0.001, max_iter = 1000,tol = 1e-3 )

linear_svc_model.fit(X_scaled,y)
svc_model.fit(X_scaled,y)
svc_sgd_model.fit(X_scaled,y)


SGDClassifier(alpha=0.01, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.001, fit_intercept=True,
       l1_ratio=0.15, learning_rate='constant', loss='hinge',
       max_iter=1000, n_iter=None, n_iter_no_change=5, n_jobs=None,
       penalty='l2', power_t=0.5, random_state=42, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)

In [3]:
Y_linear_svc = linear_svc_model.predict(X_scaled)
Y_svc = svc_model.predict(X_scaled)
Y_svc_sgd = svc_sgd_model.predict(X_scaled)

In [4]:
def model_performance_scores(train,predicted):
    precision_kpi = precision_score(train,predicted)
    recall_kpi = recall_score(train, predicted)
    f1_kpi = f1_score(train,predicted)
    
    print("model performance scores: ","\n")
    print("Precision = ", precision_kpi)
    print("Recall = ", recall_kpi)
    print("F1 = ", f1_kpi)
    
    print("\n")



In [7]:
# A quick look at the predictions
print(Y_linear_svc[48:53])
print(Y_svc[48:53])
print(Y_svc_sgd[48:53])

from sklearn.metrics import accuracy_score

#evalating classification accuracy
print("Linear SVC",accuracy_score(y,Y_linear_svc))
print("SVC",accuracy_score(y,Y_svc))
print("Linear SVC",accuracy_score(y,Y_svc_sgd))

[0 0 1 1 1]
[0 0 1 1 1]
[0 0 1 1 1]
Linear SVC 1.0
SVC 1.0
Linear SVC 1.0


The classification accuracy of 1 is not surpring given that the classes are linearly seperable.

***

# 2. Multi-class classification via OvO
I'll train the models using a variety of approaches. I'll tune the hyperparameters using `Randomized Search` as performing `Grid Search` will be computationally expensive.

### Getting Data


In [8]:
from sklearn.datasets import fetch_openml
import numpy as np
mnist = fetch_openml('mnist_784', version=1, cache=True)

X = mnist["data"]
y = mnist["target"].astype(np.uint8)

X_train = X[:60000]
y_train = y[:60000]
X_test = X[60000:]
y_test = y[60000:]

std_scaler = StandardScaler()

In [9]:
X_train_s = std_scaler.fit_transform(X_train)
X_test_s = std_scaler.fit_transform(X_test)

### Training Models

Starting off with a Linear SVC model first which does OvA (so, no need to do anything)

In [10]:
m = X_train.shape[0]
C = 1

linear_SVC_model = LinearSVC(random_state=42,C=C)
linear_SVC_model.fit(X_train_s[0:5000],y_train[0:5000])

Y_train_linear_SVC = linear_SVC_model.predict(X_train_s)
Y_test_linear_SVC = linear_SVC_model.predict(X_test_s)



In [11]:

print("Training Set Accuracy")
print( accuracy_score(y_train,Y_train_linear_SVC) )
print("\n")

print("Test Set Accuracy")
print(accuracy_score(y_test,Y_test_linear_SVC))

Training Set Accuracy
0.8312


Test Set Accuracy
0.8203


Next up, we will try `SVC Classifier` with `rbf` kernel which is the default kernel


In [12]:
SVC_model = SVC(gamma="scale")
SVC_model.fit(X_train_s[0:5000],y_train[0:5000])

Y_train_SVC = SVC_model.predict(X_train_s)
Y_test_SVC = SVC_model.predict(X_test_s)

In [13]:
print("Training Set Accuracy")
print( accuracy_score(y_train,Y_train_SVC) )
print("\n")

print("Test Set Accuracy")
print(accuracy_score(y_test,Y_test_SVC))

Training Set Accuracy
0.9269666666666667


Test Set Accuracy
0.9236


**IMPORTANT** : This is very cool, the fact that an implementation of SVC on a very small subset *m =5000* can give ~ 93% accuracy on both training and testing sets is amazing

The **next step** is to use `Randomised Search` for hypperparameter tuning

In [24]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import reciprocal,uniform

#step one: mention which parameters have to be tweaked
param_distributions = {"gamma": reciprocal(0.001,0.1), "C": uniform(1,10)}

#step two: mention the number of passes, governed by n_iter and cv
rand_search_cv = RandomizedSearchCV(SVC_model,param_distributions, n_iter=10, cv = 3, verbose = 2)

#step three; fit this model
rand_search_cv.fit(X_train_s[0:5000],y_train[0:5000])

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] C=6.03865046648656, gamma=0.01719603761760927 ...................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV] .... C=6.03865046648656, gamma=0.01719603761760927, total=  23.9s
[CV] C=6.03865046648656, gamma=0.01719603761760927 ...................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   33.0s remaining:    0.0s


[CV] .... C=6.03865046648656, gamma=0.01719603761760927, total=  24.1s
[CV] C=6.03865046648656, gamma=0.01719603761760927 ...................
[CV] .... C=6.03865046648656, gamma=0.01719603761760927, total=  24.1s
[CV] C=2.9361149420283263, gamma=0.03202909720476346 .................
[CV] .. C=2.9361149420283263, gamma=0.03202909720476346, total=  24.3s
[CV] C=2.9361149420283263, gamma=0.03202909720476346 .................
[CV] .. C=2.9361149420283263, gamma=0.03202909720476346, total=  24.3s
[CV] C=2.9361149420283263, gamma=0.03202909720476346 .................
[CV] .. C=2.9361149420283263, gamma=0.03202909720476346, total=  24.5s
[CV] C=9.382863952151029, gamma=0.013420976951142602 .................
[CV] .. C=9.382863952151029, gamma=0.013420976951142602, total=  23.4s
[CV] C=9.382863952151029, gamma=0.013420976951142602 .................
[CV] .. C=9.382863952151029, gamma=0.013420976951142602, total=  23.7s
[CV] C=9.382863952151029, gamma=0.013420976951142602 .................
[CV] .

[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed: 13.3min finished


RandomizedSearchCV(cv=3, error_score='raise-deprecating',
          estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
          fit_params=None, iid='warn', n_iter=10, n_jobs=None,
          param_distributions={'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x1a2126e4e0>, 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x1a2126e240>},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

Using `rand_search_cv.best_estimator_` to get the best possible model

In [25]:
rand_search_cv.best_estimator_

SVC(C=5.267048372949148, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.002117027551408727,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

**Generating predictions and evaluating accuracy** corresponding to the "best" estimator for both training and testing sets

In [26]:
Y_train_best_est = rand_search_cv.best_estimator_.predict(X_train_s)
Y_test_best_est = rand_search_cv.best_estimator_.predict(X_test_s)

In [27]:
print("Training Set Accuracy")
print( accuracy_score(y_train,Y_train_best_est) )
print("\n")

print("Test Set Accuracy")
print(accuracy_score(y_test,Y_test_best_est))

Training Set Accuracy
0.9310333333333334


Test Set Accuracy
0.9269


**Summary**
Randomised Search gives a better accuracy than the *vanilla* version.

***

# 3. Training an SVM Rgressor on the `California Housing Prices` data-set

Goal:
1. Create a good model employing `LinearSVR` and `SVR`
2. Finetune the `SVR` model using `Randomised Search`
3. (optional) Implement `Grid Search`

### Getting the Data, Splitting it into Training and Test Sets

In [29]:
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X = housing["data"]
y = housing["target"]

X.shape

(20640, 8)

In [34]:
print(housing.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block
        - HouseAge      median house age in block
        - AveRooms      average number of rooms
        - AveBedrms     average number of bedrooms
        - Population    block population
        - AveOccup      average house occupancy
        - Latitude      house block latitude
        - Longitude     house block longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
http://lib.stat.cmu.edu/datasets/

The target variable is the median house value for California districts.

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bur

In [35]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [36]:
#It is important to scale the data for applying SVM

X_train_s = std_scaler.fit_transform(X_train)
X_test_s = std_scaler.fit_transform(X_test)

### Fitting Model

In [46]:
from sklearn.svm import LinearSVR
from sklearn.svm import SVR



linear_svr_model = LinearSVR(random_state = 42, epsilon = 1)

linear_svr_model.fit(X_train_s, y_train)

Y_train_linear_svr = linear_svr_model.predict(X_train_s)
Y_test_linear_svr = linear_svr_model.predict(X_test_s)


from sklearn.metrics import mean_squared_error
mse_train_linear_svr = mean_squared_error(y_train, Y_train_linear_svr)
mse_test_linear_svr = mean_squared_error(y_test, Y_test_linear_svr)



In [47]:
print("Training Set MSE")
print( mse_train_linear_svr )
print("\n")

print("Test Set MSE")
print(mse_test_linear_svr)

Training Set MSE
0.5595633542831504


Test Set MSE
0.5800105597796045


In [48]:
svr_model = SVR()

svr_model.fit(X_train_s, y_train)

Y_train_svr = svr_model.predict(X_train_s)
Y_test_svr = svr_model.predict(X_test_s)


from sklearn.metrics import mean_squared_error
mse_train_svr = mean_squared_error(y_train, Y_train_svr)
mse_test_svr = mean_squared_error(y_test, Y_test_svr)

In [49]:
print("Training Set MSE")
print( mse_train_svr )
print("\n")

print("Test Set MSE")
print(mse_test_svr)

Training Set MSE
0.33612151092674214


Test Set MSE
0.9246475842378048


### Attempting Randomized Search

In [50]:
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import reciprocal, uniform

param_distributions = {"gamma": reciprocal(0.001, 0.1), "C": uniform(1, 10)}
rnd_search_cv = RandomizedSearchCV(SVR(), param_distributions, n_iter=10, verbose=2, cv=3, random_state=42)
rnd_search_cv.fit(X_train_s, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   6.0s
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.9s remaining:    0.0s


[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   5.8s
[CV] C=4.745401188473625, gamma=0.07969454818643928 ..................
[CV] ... C=4.745401188473625, gamma=0.07969454818643928, total=   6.2s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.4s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.2s
[CV] C=8.31993941811405, gamma=0.015751320499779724 ..................
[CV] ... C=8.31993941811405, gamma=0.015751320499779724, total=   5.3s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .. C=2.560186404424365, gamma=0.002051110418843397, total=   4.8s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .. C=2.560186404424365, gamma=0.002051110418843397, total=   5.0s
[CV] C=2.560186404424365, gamma=0.002051110418843397 .................
[CV] .

[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:  3.5min finished


RandomizedSearchCV(cv=3, error_score='raise-deprecating',
          estimator=SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False),
          fit_params=None, iid='warn', n_iter=10, n_jobs=None,
          param_distributions={'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x1a211ecef0>, 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x1a211ec940>},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

In [51]:
rnd_search_cv.best_estimator_

SVR(C=4.745401188473625, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma=0.07969454818643928, kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False)

In [54]:
Y_train_rnd_srch = rnd_search_cv.best_estimator_.predict(X_train_s)
Y_test_rnd_srch = rnd_search_cv.best_estimator_.predict(X_test_s)

In [55]:
mse_train_rnd_srch = mean_squared_error(y_train, Y_train_rnd_srch)
mse_test_rnd_srch = mean_squared_error(y_test, Y_test_rnd_srch)

In [56]:
print("Training Set MSE")
print( mse_train_rnd_srch )
print("\n")

print("Test Set MSE")
print(mse_test_rnd_srch)

Training Set MSE
0.3280453999995988


Test Set MSE
1.224712529941543
