# Support Vector Machines In-class Lab

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Get the Data

In [2]:
social = pd.read_csv('Social_Network_Ads.csv')

In [3]:
social.head(3)

    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26            43000          0


User ID is an identifier, not a predictor, so it should be dropped. Gender should be recoded from Male/Female to 1/0. Purchased is the target variable. The predictors will likely need to be scaled, because EstimatedSalary is on a much larger scale than Age or Gender.

In [4]:
social.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
User ID            400 non-null int64
Gender             400 non-null object
Age                400 non-null int64
EstimatedSalary    400 non-null int64
Purchased          400 non-null int64
dtypes: int64(4), object(1)
memory usage: 15.7+ KB


No missing values to handle!
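A quick way to confirm this directly, rather than reading it off the non-null counts, is to count nulls per column. The sketch below uses a hypothetical three-row sample in the same shape as the data (the frame name `sample` is an assumption, not part of the lab):

```python
import pandas as pd

# Hypothetical three-row sample with the same columns as the ads data
sample = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female'],
    'Age': [19, 35, 26],
    'EstimatedSalary': [19000, 20000, 43000],
    'Purchased': [0, 0, 0],
})

# Count missing values per column; all zeros means nothing to impute or drop
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

On the real data the same call would be social.isnull().sum().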

Drop User ID:

In [5]:
social.drop(['User ID'], axis=1, inplace=True)

In [6]:
## encode Gender as 1 (Male) / 0 (Female); assigning the mapped column back avoids inplace edits on a column selection
social['Gender'] = social['Gender'].map({'Male': 1, 'Female': 0})

In [7]:
## validating according to cell above
social['Gender'].head()

0    1
1    1
2    0
3    0
4    1
Name: Gender, dtype: int64

In [8]:
social.rename(index=str, columns={"Gender": "Male"}, inplace=True)
social.head()

   Male  Age  EstimatedSalary  Purchased
0     1   19            19000          0
1     1   35            20000          0
2     0   26            43000          0
3     0   27            57000          0
4     1   19            76000          0


In [9]:
social.info()

<class 'pandas.core.frame.DataFrame'>
Index: 400 entries, 0 to 399
Data columns (total 4 columns):
Male               400 non-null int64
Age                400 non-null int64
EstimatedSalary    400 non-null int64
Purchased          400 non-null int64
dtypes: int64(4)
memory usage: 15.6+ KB


In [10]:
X = social.drop(['Purchased'], axis=1)
X.head()

   Male  Age  EstimatedSalary
0     1   19            19000
1     1   35            20000
2     0   26            43000
3     0   27            57000
4     1   19            76000


In [11]:
y = social['Purchased']

## Train Test Split

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

## Train the Support Vector Classifier

In [14]:
from sklearn.svm import SVC

In [15]:
model = SVC()

In [16]:
model.fit(X_train,y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

## Predictions and Evaluations

In [17]:
predictions = model.predict(X_test)

In [18]:
from sklearn.metrics import classification_report,confusion_matrix

In [19]:
print(confusion_matrix(y_test,predictions))

[[67  1]
 [25  7]]


In [20]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

          0       0.73      0.99      0.84        68
          1       0.88      0.22      0.35        32

avg / total       0.78      0.74      0.68       100
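
The class 1 row of the report can be reproduced by hand from the confusion matrix, which helps when reading it. This is a small worked check that assumes scikit-learn's layout (rows are actual classes, columns are predicted classes):

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class
cm = np.array([[67, 1],
               [25, 7]])

tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)   # of predicted purchasers, how many really purchased
recall_1 = tp / (tp + fn)      # of actual purchasers, how many the model found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision_1:.3f} recall={recall_1:.3f} f1={f1_1:.3f} accuracy={accuracy:.3f}")
```

Only 7 of the 32 actual purchasers are caught, which is where the 0.22 recall comes from.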



Recall on purchasers (class 1) is poor at 0.22, so try scaling the predictors.
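
To see why scaling might matter here: the RBF kernel depends on the squared Euclidean distance between rows, and on raw data that distance is dominated by EstimatedSalary. The sketch below uses the five rows from social.head() and standardizes them with their own mean and standard deviation, which is a toy stand-in for what StandardScaler does on the full training set:

```python
import numpy as np

# Five rows from social.head(): columns are Male, Age, EstimatedSalary
rows = np.array([
    [1, 19, 19000],
    [1, 35, 20000],
    [0, 26, 43000],
    [0, 27, 57000],
    [1, 19, 76000],
], dtype=float)

def salary_share(a, b):
    """Fraction of the squared Euclidean distance contributed by EstimatedSalary."""
    sq = (a - b) ** 2
    return sq[2] / sq.sum()

# Rows 0 and 1 differ by 16 years of age but only 1000 in salary
raw_share = salary_share(rows[0], rows[1])

# Standardize each column using only these five rows (toy statistics)
scaled = (rows - rows.mean(axis=0)) / rows.std(axis=0)
scaled_share = salary_share(scaled[0], scaled[1])

print(f"salary share of distance, raw: {raw_share:.4f}, scaled: {scaled_share:.4f}")
```

Before scaling, nearly all of the distance between the first two rows comes from salary even though their ages differ far more in relative terms; after scaling, age carries most of it.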

## Train the Support Vector Classifier on Scaled Predictors

In [21]:
## with feature scaling
from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [22]:
model2 = SVC()
model2.fit(X_train_scaled,y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
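
An alternative to carrying separate X_train_scaled and X_test_scaled arrays is to bundle the scaler and the classifier in a pipeline, so the same fitted object handles both steps. This is a minimal sketch on synthetic data from make_classification (an assumption, not the ads data); all names ending in _demo or starting with Xd_/yd_ are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 3 features, binary target
X_demo, y_demo = make_classification(n_samples=400, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)
X_demo[:, 2] *= 10000  # stretch one column to mimic a salary-like scale

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

# The scaler is fit on the training rows only, then reused at predict time
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(Xd_train, yd_train)

demo_accuracy = scaled_svc.score(Xd_test, yd_test)
print(demo_accuracy)
```

With a pipeline there is only one object to call predict on, so it is harder to evaluate the wrong model or pass it unscaled data by mistake.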

## Predictions and Evaluations on Scaled Predictors

In [23]:
predictions2 = model2.predict(X_test_scaled)

In [24]:
from sklearn.metrics import classification_report,confusion_matrix

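In [25]:
## evaluate the scaled model's predictions (predictions2), not the unscaled ones
print(confusion_matrix(y_test,predictions2))

[[67  1]
 [25  7]]


In [26]:
print(classification_report(y_test,predictions2))

             precision    recall  f1-score   support

          0       0.73      0.99      0.84        68
          1       0.88      0.22      0.35        32

avg / total       0.78      0.74      0.68       100

Note: the output recorded above matches the unscaled results exactly because these two cells were originally run on predictions rather than predictions2. Re-run them to see the scaled model's metrics.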


## Gridsearch- RBF

In [27]:
param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001], 'kernel': ['rbf']} 

In [28]:
from sklearn.model_selection import GridSearchCV

In [29]:
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)

In [30]:
grid.fit(X_train,y_train)

Fitting 3 folds for each of 25 candidates, totalling 75 fits
[CV] kernel=rbf, gamma=1, C=0.1 ......................................
[CV] ........... kernel=rbf, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=1, C=0.1 ......................................
[CV] ........... kernel=rbf, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=1, C=0.1 ......................................
[CV] ........... kernel=rbf, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=0.1 ....................................
[CV] ......... kernel=rbf, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=0.1 ....................................
[CV] ......... kernel=rbf, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=0.1 ....................................
[CV] ......... kernel=rbf, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=0.01, C=0.1 ...................................
[CV] ........ ke

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV] ............. kernel=rbf, gamma=1, C=1, score=0.63, total=   0.0s
[CV] kernel=rbf, gamma=1, C=1 ........................................
[CV] ............. kernel=rbf, gamma=1, C=1, score=0.65, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=1 ......................................
[CV] ........... kernel=rbf, gamma=0.1, C=1, score=0.66, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=1 ......................................
[CV] ........... kernel=rbf, gamma=0.1, C=1, score=0.65, total=   0.0s
[CV] kernel=rbf, gamma=0.1, C=1 ......................................
[CV] ........... kernel=rbf, gamma=0.1, C=1, score=0.71, total=   0.0s
[CV] kernel=rbf, gamma=0.01, C=1 .....................................
[CV] .......... kernel=rbf, gamma=0.01, C=1, score=0.66, total=   0.0s
[CV] kernel=rbf, gamma=0.01, C=1 .....................................
[CV] ........... kernel=rbf, gamma=0.01, C=1, score=0.7, total=   0.0s
[CV] kernel=rbf, gamma=0.01, C=1 .....................................
[CV] .

[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:    0.9s finished


GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'kernel': ['rbf'], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'C': [0.1, 1, 10, 100, 1000]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=3)

You can inspect the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best_estimator_ attribute:

In [31]:
grid.best_params_

{'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}

In [32]:
grid.best_estimator_

SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
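
Beyond the single best combination, the fitted search also keeps every candidate's cross-validation scores in cv_results_, which is easier to read than the verbose log. The sketch below runs a smaller grid on synthetic data (an assumption; demo_grid and X_demo are illustrative names) and ranks the candidates:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the ads data
X_demo, y_demo = make_classification(n_samples=200, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)

demo_grid = GridSearchCV(
    SVC(),
    {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1], 'kernel': ['rbf']},
    cv=3,
)
demo_grid.fit(X_demo, y_demo)

# One row per parameter combination, sorted so the best candidate comes first
results = pd.DataFrame(demo_grid.cv_results_)
top = results.sort_values('rank_test_score')[
    ['param_C', 'param_gamma', 'mean_test_score', 'std_test_score']
].head()
print(top)
```

Looking at the top few rows shows whether the winner is clearly ahead or only marginally better than its neighbours.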

In [33]:
grid_predictions = grid.predict(X_test)

In [34]:
print(confusion_matrix(y_test,grid_predictions))

[[65  3]
 [15 17]]


In [35]:
print(classification_report(y_test,grid_predictions))

             precision    recall  f1-score   support

          0       0.81      0.96      0.88        68
          1       0.85      0.53      0.65        32

avg / total       0.82      0.82      0.81       100



These results are much better: recall on purchasers rose from 0.22 to 0.53, and overall accuracy from 0.74 to 0.82. I will also try the sigmoid and poly kernel types, since they also take gamma.
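
Several kernels can also be compared in a single search by passing param_grid as a list of dicts, one per kernel. This is a sketch on synthetic data (an assumption); kernel_search and multi_kernel_grid are illustrative names:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the ads data
X_demo, y_demo = make_classification(n_samples=200, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)

# One dict per kernel; each dict can carry its own parameter ranges
multi_kernel_grid = [
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.01, 0.1]},
    {'kernel': ['sigmoid'], 'C': [0.1, 1, 10], 'gamma': [0.01, 0.1]},
]

kernel_search = GridSearchCV(SVC(), multi_kernel_grid, cv=3)
kernel_search.fit(X_demo, y_demo)

# best_params_ includes the winning kernel alongside C and gamma
print(kernel_search.best_params_)
```

Because there is only one fitted search object, the kernels are compared on the same folds and there is a single predict call to evaluate.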

## Gridsearch- Poly

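Note: I tried this, but it was very slow, so the cells below are commented out. These are probably not the right parameters to try for a poly kernel, especially on unscaled predictors.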

In [36]:
##param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001], 'kernel': ['poly']} 

In [37]:
##from sklearn.model_selection import GridSearchCV

In [38]:
##grid2 = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)

In [39]:
##grid2.fit(X_train,y_train)

In [40]:
##grid2.best_params_

In [41]:
##grid2.best_estimator_

In [42]:
##grid2_predictions = grid2.predict(X_test)

In [43]:
##print(confusion_matrix(y_test,grid2_predictions))

In [44]:
##print(classification_report(y_test,grid2_predictions))
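
One thing that may make the poly search tractable is scaling the predictors inside the search and using a smaller grid, since the polynomial kernel raises inner products of the features to a power and very large raw values can slow the solver. This is a hedged sketch on synthetic data (an assumption), not a result for the ads data; the svc__ prefixes are the step names that make_pipeline generates:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the ads data
X_demo, y_demo = make_classification(n_samples=200, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)

# Scaling happens inside each CV fold, before the poly kernel sees the data
poly_pipe = make_pipeline(StandardScaler(), SVC(kernel='poly'))

# A deliberately small grid; parameters are addressed as <step name>__<parameter>
poly_grid = {
    'svc__C': [0.1, 1],
    'svc__gamma': [0.01, 0.1],
    'svc__degree': [2, 3],
}

poly_search = GridSearchCV(poly_pipe, poly_grid, cv=3)
poly_search.fit(X_demo, y_demo)
print(poly_search.best_params_)
```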

## Gridsearch- Sigmoid

In [45]:
param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001], 'kernel': ['sigmoid']} 

In [46]:
from sklearn.model_selection import GridSearchCV

In [47]:
grid3 = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)

In [48]:
grid3.fit(X_train,y_train)

Fitting 3 folds for each of 25 candidates, totalling 75 fits
[CV] kernel=sigmoid, gamma=1, C=0.1 ..................................
[CV] ....... kernel=sigmoid, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=1, C=0.1 ..................................
[CV] ....... kernel=sigmoid, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=1, C=0.1 ..................................
[CV] ....... kernel=sigmoid, gamma=1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=0.1 ................................
[CV] ..... kernel=sigmoid, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=0.1 ................................
[CV] ..... kernel=sigmoid, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=0.1 ................................
[CV] ..... kernel=sigmoid, gamma=0.1, C=0.1, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.01, C=0.1 ...............................
[CV] .... kernel

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s



[CV] kernel=sigmoid, gamma=1, C=10 ...................................
[CV] ........ kernel=sigmoid, gamma=1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=1, C=10 ...................................
[CV] ........ kernel=sigmoid, gamma=1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=1, C=10 ...................................
[CV] ........ kernel=sigmoid, gamma=1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=10 .................................
[CV] ...... kernel=sigmoid, gamma=0.1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=10 .................................
[CV] ...... kernel=sigmoid, gamma=0.1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.1, C=10 .................................
[CV] ...... kernel=sigmoid, gamma=0.1, C=10, score=0.63, total=   0.0s
[CV] kernel=sigmoid, gamma=0.01, C=10 ................................
[CV] ..... kernel=sigmoid, gamma=0.01, C=10, score=0.63, total=   0.0s
[CV] 

[Parallel(n_jobs=1)]: Done  75 out of  75 | elapsed:    0.3s finished


GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'kernel': ['sigmoid'], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'C': [0.1, 1, 10, 100, 1000]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=3)

In [49]:
grid3.best_params_

{'C': 0.1, 'gamma': 1, 'kernel': 'sigmoid'}

In [50]:
grid3.best_estimator_

SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='sigmoid',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

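In [51]:
## predict with grid3 (the sigmoid search), not grid (the RBF search)
grid3_predictions = grid3.predict(X_test)

In [52]:
print(confusion_matrix(y_test,grid3_predictions))

[[65  3]
 [15 17]]


In [53]:
print(classification_report(y_test,grid3_predictions))

             precision    recall  f1-score   support

          0       0.81      0.96      0.88        68
          1       0.85      0.53      0.65        32

avg / total       0.82      0.82      0.81       100



Note: the output above is identical to the RBF results because these cells were originally run with grid.predict(X_test) rather than grid3.predict(X_test). Re-run them to see how the sigmoid model actually performs.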

## Results

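SVM rbf with C: 1 and gamma: 0.01 produced the best precision and recall outcomes here. The sigmoid search selected C: 0.1 and gamma: 1, but the test metrics reported for it were computed from the RBF grid's predictions, so the sigmoid kernel still needs to be evaluated on its own before the two can be compared.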
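
When several searches are compared, looping over them and predicting inside the loop keeps each set of metrics tied to the model that produced it. The sketch below uses synthetic data (an assumption); searches, summary and the Xd_/yd_ names are illustrative and not part of the lab:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the ads data
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

searches = {
    'rbf': GridSearchCV(SVC(), {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.01, 0.1]}, cv=3),
    'sigmoid': GridSearchCV(SVC(), {'kernel': ['sigmoid'], 'C': [1, 10], 'gamma': [0.01, 0.1]}, cv=3),
}

summary = {}
for name, search in searches.items():
    search.fit(Xd_train, yd_train)
    preds = search.predict(Xd_test)  # predictions come from this search only
    summary[name] = {
        'precision_1': precision_score(yd_test, preds, zero_division=0),
        'recall_1': recall_score(yd_test, preds, zero_division=0),
    }

for name, scores in summary.items():
    print(name, scores)
```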