# Support Vector Machines (SVMs) in Python #
##### Killian McKee #####

### Overview ###

1. [What is a Support Vector Machine](#section1)
2. [Pros and Cons of SVMs](#section2)
3. [When to use SVMs](#section3)
4. [Key Parameters](#section4)
5. [SVM Classifier Walkthrough](#section5)
6. [Conclusion](#section6)
7. [Additional Reading](#section7)
8. [Sources](#section8)

<a id='section1'></a>

### What is a Support Vector Machine? ###

The support vector machine (SVM) is a supervised learning algorithm that is primarily used for classification, although it can also be used for regression. SVMs are currently one of the most popular machine learning algorithms because they can perform nonlinear classification on both standard and high-dimensional data. The objective of a support vector machine is to find the optimal hyperplane in an N-dimensional space (where N is the number of features). The optimal hyperplane is the one with the largest margin, that is, the greatest distance to the nearest training points of each class. Those nearest points are the support vectors.

<img src='svm_example.png'>
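
To make the idea of a maximum-margin hyperplane concrete, here is a minimal sketch on made-up toy data (the four points and the very large `C` value are assumptions chosen for illustration, not part of the original walkthrough). For a linear SVM with hyperplane `w · x + b = 0`, the margin width is `2 / ||w||`, which scikit-learn lets us read off the fitted model.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# Distance between the two margin boundaries.
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
```

On this toy data the closest points of the two classes are (1, 1) and (3, 3), so the margin width should come out close to the distance between them, roughly 2.83.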

<a id='section2'></a>

### Pros and Cons of SVMs ### 

#### Pros #### 
1. **Gives optimal solution**: Unlike some algorithms that can get caught in locally optimal solutions, SVMs solve a convex optimization problem, so training always reaches the global optimum.
2. **Overfitting Resistance**: The SVM's regularization parameter helps prevent overfitting.
3. **Accurate**: SVMs are accurate and tunable thanks to their regularization and kernel parameters.
4. **Resistant to Class Imbalances**: SVMs can continue to perform well on classification tasks where one class has many more examples than the others.

#### Cons ####
1. **Slow Training Time**: SVMs can take a while to train on larger datasets.
2. **Noise Sensitive**: SVMs can be adversely affected by noise in datasets, especially when classes overlap.

<a id='section3'></a>

### When to use SVMs? ###

SVMs are a great option for most classification problems and should also be considered for regression. Because of their popularity, it is easier to list some of the cases where SVMs are not appropriate.

1. They can be somewhat cumbersome for large multiclass classification problems, since a multiclass SVM is built from several binary models (for example, one per class in a one-vs-rest scheme).
2. On perceptual tasks (speech, vision, etc.), SVMs are usually worse than deep neural networks.
3. Gradient boosted trees tend to perform better than SVMs on structured data.
4. It can be difficult to interpret the output of SVMs.
5. They can take a long time to train on larger datasets.
6. Choosing a good [kernel function](https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78) can be difficult.
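
The multiclass point can be illustrated with a short sketch. It assumes scikit-learn's bundled copy of the iris data and wraps `SVC` in `OneVsRestClassifier` purely for demonstration; this wrapper is not used in the walkthrough in this tutorial. The wrapper trains one binary SVM per class, so the number of models grows with the number of classes.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Bundled iris data: 150 samples, 3 classes.
X, y = load_iris(return_X_y=True)

# One-vs-rest: one binary SVM is trained per class.
ovr = OneVsRestClassifier(SVC(kernel="linear"))
ovr.fit(X, y)

n_classes = len(ovr.classes_)
n_models = len(ovr.estimators_)
print("classes:", n_classes, "binary models:", n_models)
```

Note that `SVC` on its own handles multiclass data with a one-vs-one scheme internally, which trains one binary model per pair of classes rather than one per class.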

<a id='section4'></a>

### Key SVM Parameters ### 

There are three key parameters for SVMs: kernel, cost (lambda), and gamma.

1. Kernel: the type of SVM we want to create (linear, polynomial, sigmoid, or radial). We choose it based on the underlying shape of our data, which can be tough to know without testing. For example, if we want to separate data that is not linearly separable, we wouldn't use a linear kernel.
2. Lambda: the degree of importance given to misclassifications by the SVM (scikit-learn exposes this cost as the `C` parameter). Higher values demand a closer fit to the training data at the cost of generalizing to new data.
3. Gamma: a parameter of the Gaussian (radial) kernel, which implicitly maps the data into a high-dimensional space (see below). Gamma controls the shape of the peaks in that space: a small gamma gives a smoother decision boundary (higher bias, lower variance), while a large gamma gives a more flexible one (lower bias, higher variance). Grid search can be used to find good lambda and gamma values.

<img src='hdd_example.jpg'>
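
As a rough sketch of the grid search mentioned above, the example below tunes the cost and gamma of an RBF-kernel SVM with scikit-learn's `GridSearchCV`. The candidate values in `param_grid`, the 5-fold cross-validation, and the use of the bundled iris data are all assumptions made for illustration; sensible ranges depend on the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values to try (illustrative, not tuned recommendations).
param_grid = {
    "C": [0.1, 1, 10, 100],          # cost of misclassification
    "gamma": [0.001, 0.01, 0.1, 1],  # width of the Gaussian kernel
}

# Every (C, gamma) pair is scored with 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on all of the data with the best parameter pair.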
                           

<a id='section5'></a>

### SVM Classifier Walkthrough: Simple and Kernel ###

Let's walk through how to build an SVM classifier on some sample data. More specifically, we will build kernel SVMs with polynomial, Gaussian, and sigmoid kernels. We will use the iris dataset to classify species of iris based on their characteristics.

In [20]:
# building a simple svm 

In [21]:
import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC  
from sklearn.metrics import classification_report, confusion_matrix 


In [22]:
# importing the data 

#url for the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)  
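
If the UCI URL is unreachable, a similar dataframe can be built from the copy of the iris data that ships with scikit-learn. This is an alternative sketch, not the loading path used in the rest of this walkthrough. One difference to be aware of: the bundled class labels are `setosa`, `versicolor`, and `virginica`, without the `Iris-` prefix that appears in the UCI file.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Same column names as in the walkthrough.
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

iris = load_iris()

# Feature columns come from iris.data; the class column maps the
# integer targets (0, 1, 2) to their species names.
irisdata = pd.DataFrame(iris.data, columns=colnames[:-1])
irisdata['Class'] = iris.target_names[iris.target]

print(irisdata.head())
```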

In [23]:
# preprocessing the data 

#drop class
X = irisdata.drop('Class', axis=1)  
y = irisdata['Class']  

In [24]:
#create a train test split 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)  
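
The split above is random, so the results change from run to run and the test set can end up with uneven class counts. A hedged variant is sketched below: `random_state` fixes the shuffle and `stratify` keeps the class proportions equal in both sets. The seed value 42 and the use of the bundled iris data are arbitrary choices for illustration, and this variant was not used to produce the outputs shown in this tutorial.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state makes the split repeatable; stratify=y preserves
# the 50/50/50 class balance in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Number of test samples per class.
test_counts = np.bincount(y_test)
print("test samples per class:", test_counts)
```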

In [25]:
# here we implement a polynomial kernel svm using scikit learn
#the degree parameter is the degree of the polynomial 

svclassifier = SVC(kernel='poly', degree=8)  
svclassifier.fit(X_train, y_train)  

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=8, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [26]:
# making predictions on new test data

y_pred = svclassifier.predict(X_test)  

In [27]:
# evaluating our algorithm 

print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))  

[[12  0  0]
 [ 0 13  1]
 [ 0  0  4]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        12
Iris-versicolor       1.00      0.93      0.96        14
 Iris-virginica       0.80      1.00      0.89         4

    avg / total       0.97      0.97      0.97        30
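
To see where the numbers in the report come from, the per-class metrics can be recomputed by hand from the confusion matrix printed above. In scikit-learn's convention, rows are true classes and columns are predicted classes. The matrix below is copied from that output, so a different random split would give different values.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[12, 0, 0],
               [0, 13, 1],
               [0, 0, 4]])

# Accuracy: correctly classified samples over all samples.
accuracy = np.trace(cm) / cm.sum()

# Precision: correct predictions for a class over everything predicted as it.
precision = np.diag(cm) / cm.sum(axis=0)

# Recall: correct predictions for a class over all true members of it.
recall = np.diag(cm) / cm.sum(axis=1)

print("accuracy:", round(accuracy, 2))
print("precision:", precision.round(2))
print("recall:", recall.round(2))
```

The single versicolor flower predicted as virginica explains both the 0.93 recall for versicolor (13 of 14) and the 0.80 precision for virginica (4 of 5).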



In [28]:
# now let's build an SVM with a Gaussian kernel

svclassifier = SVC(kernel='rbf')  
svclassifier.fit(X_train, y_train)  

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [29]:
# getting predictions with our svm model 

y_pred = svclassifier.predict(X_test)  

In [30]:
# evaluating the accuracy of the model 

print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred)) 

[[12  0  0]
 [ 0 13  1]
 [ 0  0  4]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        12
Iris-versicolor       1.00      0.93      0.96        14
 Iris-virginica       0.80      1.00      0.89         4

    avg / total       0.97      0.97      0.97        30



In [31]:
# let's fit one last SVM model with a sigmoid kernel

svclassifier = SVC(kernel='sigmoid')  
svclassifier.fit(X_train, y_train)  

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='sigmoid',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [32]:
# generating new predictions 

y_pred = svclassifier.predict(X_test)  

In [33]:
# model accuracy evaluation 
# here we can see the model is not very accurate when we assume the data can be fit with a sigmoid SVM

print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred)) 

[[ 0  0 12]
 [ 0  0 14]
 [ 0  0  4]]
                 precision    recall  f1-score   support

    Iris-setosa       0.00      0.00      0.00        12
Iris-versicolor       0.00      0.00      0.00        14
 Iris-virginica       0.13      1.00      0.24         4

    avg / total       0.02      0.13      0.03        30
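
One possible contributor to this poor result is that the features were not scaled. The sigmoid kernel passes a scaled inner product through `tanh`, which can saturate on raw measurements. This explanation is an assumption and was not verified in the original walkthrough. The sketch below compares the sigmoid SVM with and without `StandardScaler` using 5-fold cross-validation on the bundled iris data; scaling may or may not help on a given dataset, so the scores are printed rather than assumed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sigmoid SVM on the raw features (gamma='auto' as in the output above).
raw_model = SVC(kernel="sigmoid", gamma="auto")

# Same model, but each feature is standardized to zero mean and unit
# variance first. The pipeline refits the scaler inside every fold.
scaled_model = make_pipeline(StandardScaler(), SVC(kernel="sigmoid", gamma="auto"))

raw_scores = cross_val_score(raw_model, X, y, cv=5)
scaled_scores = cross_val_score(scaled_model, X, y, cv=5)

print("raw features, mean accuracy:", raw_scores.mean())
print("scaled features, mean accuracy:", scaled_scores.mean())
```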





<a id='section6'></a>

### Conclusion ### 

In this tutorial we introduced support vector machines, which can be used in both classification and regression settings, and looked at when they are and are not a good fit. Next, we examined the key parameters of SVMs (kernel, lambda, and gamma) and then fit three different SVMs to the iris dataset to classify different flower species.

<a id='section7'></a>

### Additional Reading ### 
1. [The math behind SVMs](https://www.svm-tutorial.com/2014/11/svm-understanding-math-part-1/)
2. [SVMs in depth](https://med.nyu.edu/chibi/sites/default/files/chibi/Final.pdf)

<a id='section8'></a>

### Sources ###

1. https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine
2. https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
3. https://www.google.com/search?rlz=1C1CHBD_enUS811US811&biw=1396&bih=641&tbm=isch&sa=1&ei=2tlIXIroIJKJjwTE4IGwCg&q=3d+data&oq=3d+data&gs_l=img.3..0i67l2j0l8.20519.21706..21839...1.0..0.74.332.5......1....1..gws-wiz-img.......0i7i30j0i8i7i30.6AmvZzYaikE#imgdii=E5kmTvFhCiHPlM:&imgrc=DhVTosm32bZMtM: 
4. https://www.google.com/search?rlz=1C1CHBD_enUS811US811&biw=1396&bih=641&tbm=isch&sa=1&ei=sLlIXL-ENei-jwSwib_IBg&q=support+vector+machine+optimal+hyperplane&oq=support+vector+machine+optimal+hyperplane&gs_l=img.3...13058.16730..16892...0.0..0.180.1768.15j4......1....1..gws-wiz-img.......0j0i8i30j0i24.-zmN7H2K3RU#imgrc=QhS3ivfEb21sNM:
