# SVM Demonstration

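In this tutorial we demonstrate how to use the `SVC` (support vector classifier) class from `scikit-learn`'s `sklearn.svm` module to fit classification models to a dataset, using the same data as the logistic regression example.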

NOTE: We do not split the data into training and test sets in this example. The focus here is on the fitting process and on how each model performs on the training data. This is not how you would normally evaluate a model; you can add a train/test split as we did in the previous examples.
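
For reference, a minimal sketch of what adding a split could look like. It uses a small synthetic dataset rather than the tutorial's CSV, and the names `X_demo`, `y_demo`, the 25% test size, and the random seed are illustrative assumptions, not choices made in the previous examples.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the tutorial data (hypothetical values, not the real CSV).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 1))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Hold out 25% of the rows; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1, stratify=y_demo
)

# Fit on the training part only, then score on the held-out part.
model = SVC(kernel="linear").fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Scoring on `X_test` rather than `X_train` gives an estimate of how the model behaves on data it has not seen.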

## 1. Setup

Import modules

In [48]:
import pandas as pd
from sklearn.svm import SVC
from matplotlib import pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

np.random.seed(1)

## 2. Load data

Load data (it's already cleaned and preprocessed)

In [50]:
# Uncomment the following snippet to debug problems with finding the .csv file path.
# It prints the current working directory, which the relative path below is resolved against.
#import os
#print(os.getcwd())

In [51]:
df = pd.read_csv('./data/logistic_example01.csv') # let's use the same data as we did in the logistic regression example
df.head(3)

|   | kgs_smoked | cancer |
|---|------------|--------|
| 0 | -0.65956   | 0      |
| 1 | 5.78149    | 0      |
| 2 | -8.247713  | 0      |


In [52]:
X = df[['kgs_smoked']]
y = df[['cancer']]

## 3. Model the data

First, let's create a dataframe to load the model performance metrics into.

In [53]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### 3.1 Fit an SVM classification model using a linear kernel

In [54]:
svm_lin_model = SVC(kernel="linear")
_ = svm_lin_model.fit(X, np.ravel(y))

In [55]:
model_preds = svm_lin_model.predict(X)
c_matrix = confusion_matrix(y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"linear svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
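
The manual formulas in the cell above can be cross-checked against the scoring functions in `sklearn.metrics`. The sketch below uses a small hand-made pair of label vectors (`y_true` and `y_pred` are hypothetical, not the tutorial's data) and compares both routes. Note that `confusion_matrix` puts true labels on the rows and predicted labels on the columns, which is why `[0][1]` is the false-positive count.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels, chosen only to make the arithmetic easy to follow.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

manual = {
    "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    "Precision": tp / (tp + fp),
    "Recall": tp / (tp + fn),
    "F1": 2 * tp / (2 * tp + fp + fn),
}

library = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    "F1": f1_score(y_true, y_pred),
}

for name in manual:
    print(name, manual[name], library[name])
```

Here both routes give 0.75 for all four metrics (TP = 3, TN = 3, FP = 1, FN = 1).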

### 3.2 Fit an SVM classification model using an RBF kernel

In [56]:
svm_rbf_model = SVC(kernel="rbf", C=10, gamma='scale')
_ = svm_rbf_model.fit(X, np.ravel(y))
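
A short sketch of what `gamma='scale'` means. According to the scikit-learn documentation, it sets gamma to `1 / (n_features * X.var())`, where the variance is taken over the whole input array. The code below uses synthetic data (`X_demo`, `y_demo` are assumptions, not the tutorial's CSV) and checks that passing this value explicitly gives the same decision function.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic one-feature data (hypothetical stand-in for the tutorial's CSV).
rng = np.random.default_rng(1)
X_demo = rng.normal(scale=5.0, size=(60, 1))
y_demo = (X_demo[:, 0] > 0).astype(int)

# gamma='scale' is documented as 1 / (n_features * X.var()).
# X.var() here is NumPy's population variance over all entries of the array.
gamma_manual = 1.0 / (X_demo.shape[1] * X_demo.var())

svm_scale = SVC(kernel="rbf", C=10, gamma="scale").fit(X_demo, y_demo)
svm_manual = SVC(kernel="rbf", C=10, gamma=gamma_manual).fit(X_demo, y_demo)

# Both models should produce the same decision values on the same inputs.
same_decisions = np.allclose(
    svm_scale.decision_function(X_demo),
    svm_manual.decision_function(X_demo),
)
print(gamma_manual, same_decisions)
```

If `X` is a pandas DataFrame, as in this tutorial, compute the variance on `X.to_numpy()` when reproducing the value by hand, because `DataFrame.var()` works per column and uses the sample variance by default.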

In [57]:
model_preds = svm_rbf_model.predict(X)
c_matrix = confusion_matrix(y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"rbf svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

### 3.3 Fit an SVM classification model using a polynomial kernel

In [58]:
svm_poly_model = SVC(kernel="poly", degree=3, coef0=1, C=10)
_ = svm_poly_model.fit(X, np.ravel(y))
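
To make the `degree` and `coef0` arguments concrete: the polynomial kernel is documented as `(gamma * <x, x'> + coef0) ** degree`. The sketch below evaluates it by hand on three hypothetical points and compares the result with `sklearn.metrics.pairwise.polynomial_kernel`. The value `gamma = 0.5` is an arbitrary choice for illustration, not the value `SVC` would derive for the tutorial data.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Three hypothetical one-feature samples.
X_demo = np.array([[-1.0], [0.5], [2.0]])

degree = 3    # same as in the tutorial's SVC call
coef0 = 1.0   # same as in the tutorial's SVC call
gamma = 0.5   # arbitrary illustrative value

# Kernel matrix by hand: (gamma * <x, x'> + coef0) ** degree for every pair.
K_manual = (gamma * (X_demo @ X_demo.T) + coef0) ** degree

# Same matrix from scikit-learn's pairwise helper.
K_sklearn = polynomial_kernel(X_demo, degree=degree, gamma=gamma, coef0=coef0)

print(K_manual)
print(np.allclose(K_manual, K_sklearn))
```

With `coef0=0` the kernel contains only degree-3 terms; a positive `coef0` also mixes in lower-order terms.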

In [59]:
model_preds = svm_poly_model.predict(X)
c_matrix = confusion_matrix(y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"poly svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

## 4. Summary

In [60]:
performance

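|   | model      | Accuracy | Precision | Recall   | F1       |
|---|------------|----------|-----------|----------|----------|
| 0 | linear svm | 0.76     | 0.833333  | 0.714286 | 0.769231 |
| 0 | rbf svm    | 0.76     | 1.0       | 0.571429 | 0.727273 |
| 0 | poly svm   | 0.76     | 1.0       | 0.571429 | 0.727273 |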
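
The three metric cells in section 3 repeat the same steps, so one possible refactoring is a small helper plus a loop over the models. This is only a sketch: `score_model`, `X_demo`, and `y_demo` are hypothetical names, the data is synthetic, and the scores it prints are not the tutorial's results.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.svm import SVC


def score_model(name, model, X, y):
    """Return one row of training-set metrics for an already fitted model."""
    preds = model.predict(X)
    return {
        "model": name,
        "Accuracy": accuracy_score(y, preds),
        "Precision": precision_score(y, preds),
        "Recall": recall_score(y, preds),
        "F1": f1_score(y, preds),
    }


# Synthetic one-feature data (hypothetical stand-in for the tutorial's CSV).
rng = np.random.default_rng(1)
X_demo = rng.normal(scale=5.0, size=(80, 1))
y_demo = (X_demo[:, 0] + rng.normal(scale=3.0, size=80) > 0).astype(int)

# Same kernels and hyperparameters as in section 3.
models = {
    "linear svm": SVC(kernel="linear"),
    "rbf svm": SVC(kernel="rbf", C=10, gamma="scale"),
    "poly svm": SVC(kernel="poly", degree=3, coef0=1, C=10),
}

# Fit each model and collect one metrics row per model.
rows = [
    score_model(name, model.fit(X_demo, y_demo), X_demo, y_demo)
    for name, model in models.items()
]
performance_demo = pd.DataFrame(rows)
print(performance_demo)
```

Building the DataFrame from a list of rows also gives each model its own index value, instead of the repeated `0` index that results from concatenating single-row frames.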


In [3]:
import sklearn
print(dir(sklearn))
help(sklearn)


['__SKLEARN_SETUP__', '__all__', '__builtins__', '__cached__', '__check_build', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_config', '_distributor_init', 'base', 'clone', 'config_context', 'exceptions', 'externals', 'get_config', 'logger', 'logging', 'os', 'random', 'set_config', 'setup_module', 'show_versions', 'sys', 'utils']
Help on package sklearn:

NAME
    sklearn

DESCRIPTION
    Machine learning module for Python
    
    sklearn is a Python module integrating classical machine
    learning algorithms in the tightly-knit world of scientific Python
    packages (numpy, scipy, matplotlib).
    
    It aims to provide simple and efficient solutions to learning problems
    that are accessible to everybody and reusable in various contexts:
    machine-learning as a versatile tool for science and engineering.
    
    See http://scikit-learn.org for complete documentation.

PACKAGE CONTENTS
    __check_build (package)
    

In [10]:
from sklearn.svm import __all__

In [11]:
__all__

['LinearSVC',
 'LinearSVR',
 'NuSVC',
 'NuSVR',
 'OneClassSVM',
 'SVC',
 'SVR',
 'l1_min_c']