# SVM Demonstration

In this tutorial we demonstrate how to use the `SVC` class in `scikit-learn` to fit support vector machine (SVM) classifiers on a dataset.

NOTE: We split the data into a training set and a test set (70/30), fit each model on the training set, and report its performance on the test set, as we did in the previous examples.

## 1. Setup

Import modules

In [115]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
import pickle

np.random.seed(100)

## 2. Load data

Load data (it's already cleaned and preprocessed)

In [116]:
df = pd.read_csv('RidingMowers.csv')
df.head(3)

|   | Income | Lot_Size | Ownership |
|---|--------|----------|-----------|
| 0 | 60.0   | 18.4     | Owner     |
| 1 | 85.5   | 16.8     | Owner     |
| 2 | 64.8   | 21.6     | Owner     |


In [117]:
# split the data into training and test sets (70% / 30%)
train_df, test_df = train_test_split(df, test_size=0.3)

# to reduce repetition in later code, create variables to represent the columns
# that are our predictors and target
target = 'Ownership'
predictors = list(df.columns)
predictors.remove(target)

In [118]:
train_X = train_df[predictors]
train_y = train_df[target]  # train_y is a Series
test_X = test_df[predictors]  # needed later for svm_*_model.predict(test_X)
test_y = test_df[target]  # test_y is a Series
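With a dataset this small, a purely random split can leave the test set with an uneven class mix. The sketch below shows one possible variation on the `train_test_split` call above, using its `stratify` and `random_state` parameters. The 24-row frame `toy_df` and its values are made up for illustration (including the class name `Nonowner`); it is not the contents of `RidingMowers.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical balanced stand-in data: 12 rows per class (values are made up)
toy_df = pd.DataFrame({
    "Income": [60.0 + i for i in range(24)],
    "Lot_Size": [18.0 + 0.2 * i for i in range(24)],
    "Ownership": ["Owner"] * 12 + ["Nonowner"] * 12,
})

# stratify keeps the class proportions the same in both subsets;
# random_state makes the split reproducible without relying on np.random.seed
toy_train, toy_test = train_test_split(
    toy_df,
    test_size=0.25,
    random_state=100,
    stratify=toy_df["Ownership"],
)

# Class counts in the held-out rows
print(toy_test["Ownership"].value_counts())
```

Whether stratification is appropriate depends on the data; for a balanced two-class target it simply guarantees that both subsets stay balanced.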

## 3. Model the data

First, let's create a dataframe to load the model performance metrics into.

In [119]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### 3.1 Fit an SVM classification model using a linear kernel

In [157]:
svm_lin_model = SVC(kernel="linear",probability=True)
_ = svm_lin_model.fit(train_X, np.ravel(train_y))

In [158]:
model_preds = svm_lin_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"linear svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
performance

|   | model      | Accuracy | Precision | Recall | F1   |
|---|------------|----------|-----------|--------|------|
| 0 | linear svm | 0.75     | 0.6       | 1.0    | 0.75 |
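The metrics above are computed by hand from the confusion matrix. As a cross-check, the same quantities can be obtained from the functions in `sklearn.metrics`. The following is a minimal sketch on made-up labels (`y_true`, `y_pred`, and the class name `Nonowner` are assumptions for illustration); it also makes the label ordering explicit, since `confusion_matrix` sorts labels, which is what puts the positive class at index 1 in the manual code.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels giving TP=3, FN=0, FP=2, TN=3 with "Owner" as the positive class
y_true = ["Owner"] * 3 + ["Nonowner"] * 5
y_pred = ["Owner"] * 3 + ["Owner"] * 2 + ["Nonowner"] * 3

# Passing labels fixes the row/column order: index 0 = Nonowner, index 1 = Owner
cm = confusion_matrix(y_true, y_pred, labels=["Nonowner", "Owner"])
tn, fp, fn, tp = cm.ravel()

# Manual formulas, as used in the notebook cells
manual = {
    "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    "Precision": tp / (tp + fp),
    "Recall": tp / (tp + fn),
    "F1": 2 * tp / (2 * tp + fp + fn),
}

# Library equivalents; pos_label names the class treated as positive
library = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred, pos_label="Owner"),
    "Recall": recall_score(y_true, y_pred, pos_label="Owner"),
    "F1": f1_score(y_true, y_pred, pos_label="Owner"),
}

for name in manual:
    print(name, manual[name], library[name])
```

Using the library functions avoids index mistakes when reading `TP`, `FP`, and `FN` out of the matrix.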


### 3.2 Fit an SVM classification model using an RBF kernel

In [134]:
svm_rbf_model = SVC(kernel="rbf", C=10, gamma='scale',probability=True)
_ = svm_rbf_model.fit(train_X, np.ravel(train_y))

In [135]:
model_preds = svm_rbf_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"rbf svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
performance

|   | model      | Accuracy | Precision | Recall | F1       |
|---|------------|----------|-----------|--------|----------|
| 0 | linear svm | 0.75     | 0.6       | 1.0    | 0.75     |
| 0 | rbf svm    | 0.875    | 0.75      | 1.0    | 0.857143 |
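The RBF model above is fit with `gamma='scale'`. According to the scikit-learn documentation, this sets gamma to `1 / (n_features * X.var())`, and the RBF kernel itself is `exp(-gamma * ||x - y||^2)`. The sketch below computes both by hand for the three rows shown by `df.head(3)` and compares the result with `sklearn.metrics.pairwise.rbf_kernel`. It is only an illustration of the formula, not a reproduction of what the fitted model stores internally.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Income and Lot_Size for the three rows displayed by df.head(3)
X = np.array([[60.0, 18.4], [85.5, 16.8], [64.8, 21.6]])

# gamma='scale' rule: 1 / (n_features * variance of all values in X)
gamma_scale = 1.0 / (X.shape[1] * X.var())

# Pairwise squared Euclidean distances via broadcasting
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# RBF kernel computed by hand and with the library helper
K_manual = np.exp(-gamma_scale * sq_dists)
K_sklearn = rbf_kernel(X, X, gamma=gamma_scale)

print(gamma_scale)
print(np.allclose(K_manual, K_sklearn))
```

Because the variance is taken over the raw feature values, columns on very different scales (such as income versus lot size) influence gamma unevenly.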


### 3.3 Fit an SVM classification model using a polynomial kernel

In [124]:
svm_poly_model = SVC(kernel="poly", degree=3, coef0=1, C=10)
_ = svm_poly_model.fit(train_X, np.ravel(train_y))

In [125]:
model_preds = svm_poly_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"poly svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

## 4. Summary

In [126]:
performance

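|   | model      | Accuracy | Precision | Recall   | F1       |
|---|------------|----------|-----------|----------|----------|
| 0 | linear svm | 0.75     | 0.6       | 1.0      | 0.75     |
| 0 | rbf svm    | 0.875    | 0.75      | 1.0      | 0.857143 |
| 0 | poly svm   | 0.5      | 0.333333  | 0.333333 | 0.333333 |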


Comparing the three SVM models, the SVM with the RBF kernel performs best: it has the highest accuracy, precision, and F1, and it ties the linear kernel on recall. Because the dataset is balanced with respect to the target variable, accuracy is a reasonable criterion for selecting a model: it measures how often the model correctly predicts the target variable, i.e., the percentage of correct predictions among all predictions. We therefore select the SVM with the RBF kernel as our best model.

In [159]:
# save the selected model to disk; the with-block closes the file afterwards
with open('svm_model.pkl', 'wb') as f:
    pickle.dump(svm_rbf_model, f)
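A saved model is only useful if it can be loaded back and produces the same predictions. The sketch below shows that round trip under stated assumptions: it fits a small stand-in `SVC` on made-up data (`X_toy`, `y_toy`, and the file name `toy_svm_model.pkl` are all hypothetical) rather than reading the notebook's `svm_model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Made-up, well-separated two-class data standing in for the real training set
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_toy = np.array(["Nonowner", "Nonowner", "Owner", "Owner"])
toy_model = SVC(kernel="rbf", C=10, gamma="scale").fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_svm_model.pkl")

    # Write the fitted model to disk
    with open(path, "wb") as f:
        pickle.dump(toy_model, f)

    # Read it back into a new object
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should predict exactly what the original does
same = np.array_equal(toy_model.predict(X_toy), restored.predict(X_toy))
print(same)
```

Two caveats apply to pickled models in general: only unpickle files from a source you trust, and load them with the same scikit-learn version that wrote them where possible.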