

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/akshayrb22/playing-with-data/blob/master/supervised_learning/support_vector_machine/svm.ipynb)


# SVM Concept (Credits not mine)

## Support Vector Machine Classification


## What will we do?

We will build a Support Vector Machine that uses gradient descent to find the optimal hyperplane, the one that maximizes the margin between two classes of toy data.

![alt text](http://opticalengineering.spiedigitallibrary.org/data/journals/optice/24850/oe_52_2_027003_f005.png "Logo Title Text 1")


## What are some use cases for SVMs?

- Classification, regression (time series prediction, etc.), outlier detection, clustering


## How does an SVM compare to other ML algorithms?

![alt text](https://image.slidesharecdn.com/mscpresentation-140722065852-phpapp01/95/msc-presentation-bioinformatics-7-638.jpg?cb=1406012610 "Logo Title Text 1")

- As a rule of thumb, SVMs are great for relatively small data sets with few outliers.
- Other algorithms (random forests, deep neural networks, etc.) typically need more data but tend to produce very robust models.
- The decision of which classifier to use depends on your dataset and the general complexity of the problem.
- "Premature optimization is the root of all evil (or at least most of it) in programming." - Donald Knuth, CS Professor (Turing award speech 1974)  


## What is a Support Vector Machine?

It's a supervised machine learning algorithm that can be used for both classification and regression problems, but it's usually used for classification. Given 2 or more labeled classes of data, it acts as a discriminative classifier, formally defined by an optimal hyperplane that separates the classes. New examples are then mapped into that same space and categorized based on which side of the gap they fall.

## What are Support Vectors?

![alt text](https://www.dtreg.com/uploaded/pageimg/SvmMargin2.jpg "Logo Title Text 1")
 
Support vectors are the data points nearest to the hyperplane: the points of a data set that, if removed, would alter the position of the dividing hyperplane. Because of this, they can be considered the critical elements of a data set; they are what help us build our SVM.
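
To make this concrete, here is a minimal sketch that fits scikit-learn's `SVC` with a linear kernel on six hand-made 2-D points and asks which of them ended up as support vectors. The toy points and the names `toy_X`, `toy_y`, and `toy_model` are assumptions made up for this illustration; they are not part of the flower data used later.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated classes in 2-D.
toy_X = np.array([
    [-1.0, -1.0], [-2.0, -2.0], [-3.0, -1.0],   # class -1
    [1.0, 1.0], [2.0, 2.0], [3.0, 1.0],         # class +1
])
toy_y = np.array([-1, -1, -1, 1, 1, 1])

toy_model = SVC(kernel='linear')
toy_model.fit(toy_X, toy_y)

# Indices and coordinates of the training points that define the margin.
print(toy_model.support_)
print(toy_model.support_vectors_)
```

Only the two points closest to the boundary, (-1, -1) and (1, 1), should be reported here. Removing any of the other four points and refitting would leave the hyperplane where it is.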

## What's a hyperplane?

![alt text](http://slideplayer.com/slide/1579281/5/images/32/Hyperplanes+as+decision+surfaces.jpg "Logo Title Text 1")

Geometry tells us that a hyperplane is a subspace of one dimension less than its ambient space. For instance, a hyperplane of an n-dimensional space is a flat subset with dimension n − 1. By its nature, it separates the space into two half spaces.
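
As a small illustration of this definition, the sketch below describes a hyperplane in 2-D (that is, a line) as the set of points where w·x + b = 0 and reports which half-space each point falls in. The function name `side_of_hyperplane` and the values of `w`, `b`, and `points` are assumptions chosen for the example.

```python
import numpy as np

def side_of_hyperplane(w, b, points):
    """Return +1 or -1 for the half-space each point lies in, 0 if it lies on the hyperplane."""
    return np.sign(points @ w + b).astype(int)

w = np.array([1.0, -1.0])   # normal vector of the hyperplane (assumed)
b = 0.0                     # offset from the origin (assumed)
points = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 3.0]])

sides = side_of_hyperplane(w, b, points)
print(sides)
```

The first point lies on the positive side, the second on the negative side, and the third sits exactly on the hyperplane.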

## Let's define our loss function (what to minimize) and our objective function (what to optimize)

#### Loss function

We'll use the Hinge loss. This is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).

![alt text](http://i.imgur.com/OzCwzyN.png "Logo Title Text 1")


Here c is the loss function, x is the sample, y is the true label, and f(x) is the predicted label.

![alt text](http://i.imgur.com/FZ7JcG3.png "Logo Title Text 1")
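
The hinge loss is short enough to write out directly. The sketch below assumes the usual convention that labels are encoded as -1 or +1 and that f(x) is the raw score of the classifier, so the loss is max(0, 1 - y·f(x)). The function name `hinge_loss` and the sample values are made up for illustration.

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * f(x)) for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

y_true = np.array([1, 1, -1, -1])
scores = np.array([2.0, 0.5, -3.0, 0.25])   # hypothetical values of f(x)

losses = hinge_loss(y_true, scores)
print(losses)
```

Samples that are on the correct side with a margin of at least 1 cost nothing; the second sample is on the correct side but too close to the boundary, and the fourth is on the wrong side, so both are penalized.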

 
#### Objective Function

![alt text](http://i.imgur.com/I5NNu44.png "Logo Title Text 1")

As you can see, the objective of an SVM consists of two terms. The first term is a regularizer, the heart of the SVM; the second term is the loss. The regularizer balances margin maximization against the loss. We want to find the decision surface that is maximally far away from any data point.

How do we minimize our loss/optimize for our objective (i.e., learn)?

We have to differentiate our objective function to get the gradients! Gradient descent ftw. As we have two terms, we will differentiate them separately using the sum rule in differentiation.


![alt text](http://i.imgur.com/6uK3BnH.png "Logo Title Text 1")

This means that if we have a misclassified sample, we update the weight vector w using the gradients of both terms; if the sample is classified correctly, we update w using only the gradient of the regularizer.
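
For readers who cannot see the images, the derivation can be written out as follows. This is a sketch that assumes the objective has the common form of a squared-norm regularizer weighted by λ plus the sum of hinge losses, with labels in {-1, +1}; the symbol J is introduced here only for convenience.

```latex
\begin{align*}
J(w) &= \lambda \lVert w \rVert^2
        + \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \langle x_i, w \rangle\bigr) \\[4pt]
\frac{\partial}{\partial w}\, \lambda \lVert w \rVert^2 &= 2 \lambda w \\[4pt]
\frac{\partial}{\partial w}\, \max\bigl(0,\; 1 - y_i \langle x_i, w \rangle\bigr) &=
\begin{cases}
  -\,y_i x_i & \text{if } y_i \langle x_i, w \rangle < 1 \\
  0          & \text{otherwise}
\end{cases}
\end{align*}
```

Under that assumption, a sample with y_i⟨x_i, w⟩ < 1 contributes both gradients, while any other sample contributes only 2λw.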



Misclassification condition 

![alt text](http://i.imgur.com/g9QLAyn.png "Logo Title Text 1")

Update rule for our weights (misclassified)

![alt text](http://i.imgur.com/rkdPpTZ.png "Logo Title Text 1")

The update includes the learning rate η and the regularizer λ. The learning rate is the length of the steps the algorithm takes down the gradient on the error curve.
- Learning rate too high? The algorithm might overshoot the optimal point.
- Learning rate too low? It could take too long to converge, or never converge.

The regularizer controls the trade-off between achieving a low training error and a low testing error, that is, the ability of your classifier to generalize to unseen data. As the regularization parameter we choose 1/epochs, so this parameter decreases as the number of epochs increases.
- Regularizer too high? The penalty on w dominates and the model underfits (large training error).
- Regularizer too low? The model is free to overfit (large testing error).

Update rule for our weights (correctly classified)

![alt text](http://i.imgur.com/xTKbvZ6.png "Logo Title Text 1")
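
Putting the two update rules together, a training loop might look like the sketch below. It assumes labels in {-1, +1}, the condition y·⟨x, w⟩ < 1 as the test for a margin violation, a constant bias feature appended to every sample, and λ = 1/epoch as described above. The function name `train_linear_svm`, the default values, and the toy data are all made up for this illustration.

```python
import numpy as np

def train_linear_svm(X, y, epochs=200, eta=0.1):
    """Per-sample gradient descent on lambda * ||w||^2 + hinge loss.

    X holds one sample per row (with a constant bias feature appended),
    y holds labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for epoch in range(1, epochs + 1):
        lam = 1.0 / epoch  # regularizer shrinks as training goes on
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(x_i, w) < 1:
                # Margin violated: step along the gradients of both terms.
                w = w + eta * (y_i * x_i - 2 * lam * w)
            else:
                # Margin satisfied: only the regularizer contributes.
                w = w + eta * (-2 * lam * w)
    return w

# Hypothetical toy data; the last column is the constant bias feature.
toy_X = np.array([
    [-2.0, -1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -2.0, 1.0],   # class -1
    [2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [2.0, 2.0, 1.0],         # class +1
])
toy_y = np.array([-1, -1, -1, 1, 1, 1])

toy_w = train_linear_svm(toy_X, toy_y)
toy_pred = np.sign(toy_X @ toy_w).astype(int)
print(toy_w)
print(toy_pred)
```

On this linearly separable toy set the learned weights put every training point on the correct side of the hyperplane. A step size of 0.1 keeps the shrink factor 1 - 2ηλ positive even in the first epoch, when λ = 1.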


# Code

### Imports

In [52]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

from matplotlib import pyplot as plt
%matplotlib inline

### Download Required Flower Data

In [72]:
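data_dir = '.'  # directory the archive is downloaded to
!gdown --id 1v5MT6hfE2rzdubarO3ZVLu0VkBr6Kud0 --output "{data_dir}/data.zip"
f = data_dir + "/data.zip"
data_p = "/content/data/"  # extraction target inside Colab's /content directory
!mkdir "$data_p"

!unzip "$f" -d "$data_p"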

Downloading...
From: https://drive.google.com/uc?id=1v5MT6hfE2rzdubarO3ZVLu0VkBr6Kud0
To: /content/data.zip
  0% 0.00/1.34k [00:00<?, ?B/s]100% 1.34k/1.34k [00:00<00:00, 2.26MB/s]
mkdir: cannot create directory ‘/content/data/’: File exists
Archive:  ./data.zip
  inflating: /content/data/TestSet1.csv  
  inflating: /content/data/TrainingSet.csv  


### Load Train and Test Data

In [74]:
train_df = pd.read_csv('./data/TrainingSet.csv')
train_df.plant.unique()

array(['Arctica', 'Harlequin', 'Carolinian'], dtype=object)

In [75]:
test_df  = pd.read_csv('./data/TestSet1.csv') 
test_X = test_df.drop(['plant'], axis=1)
test_X.head()

Unnamed: 0,leaf.length,leaf.width,flower.length,flower.width
0,4.4,2.9,1.4,0.2
1,4.6,3.1,1.5,0.2
2,4.6,3.4,1.4,0.3
3,4.7,3.2,1.3,0.2
4,4.9,3.0,1.4,0.2


### Separate Labels and Input Data

In [76]:
# Map plant names to integer labels

train_df['plant'] = train_df['plant'].map({
    'Arctica': 1,
    'Harlequin': 2,
    'Carolinian': 3
})  # Label values: 1 for Arctica, 2 for Harlequin, 3 for Carolinian


In [77]:
train_df[0:2]

Unnamed: 0,leaf.length,leaf.width,flower.length,flower.width,plant
0,5.4,3.7,1.5,0.2,1
1,4.8,3.4,1.6,0.2,1


In [78]:
# Extract all labels as a list

train_Y = train_df['plant'].tolist()
train_Y[35:45]

[1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

In [79]:
train_X = train_df.drop(['plant'], axis=1)
train_X[0:5]

Unnamed: 0,leaf.length,leaf.width,flower.length,flower.width
0,5.4,3.7,1.5,0.2
1,4.8,3.4,1.6,0.2
2,4.8,3.0,1.4,0.1
3,4.3,3.0,1.1,0.1
4,5.8,4.0,1.2,0.2


### Make a Model

In [80]:
model = SVC(kernel='linear')

In [81]:
model.fit(train_X, train_Y)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [82]:
predictions = model.predict(test_X)
print(predictions)

[1 1 1 1 1 1 2 2 1 1 1 2 1 2 2 3 2 3 3 2 2 3 2 3 2 2 3 3 3 3]


### Create a csv file using test data and predictions

In [83]:
# Insert predicted labels into the DataFrame

test_X.insert(4, "plant", predictions)
test_X.head()

Unnamed: 0,leaf.length,leaf.width,flower.length,flower.width,plant
0,4.4,2.9,1.4,0.2,1
1,4.6,3.1,1.5,0.2,1
2,4.6,3.4,1.4,0.3,1
3,4.7,3.2,1.3,0.2,1
4,4.9,3.0,1.4,0.2,1


In [85]:
# Write the test features plus predicted labels as csv (this overwrites the original TestSet1.csv)
test_X.to_csv('./data/TestSet1.csv')

In [None]:
# percentage = model.score(X_test, y_test)

In [None]:
# from sklearn.metrics import confusion_matrix
# res = confusion_matrix(y_test, predictions)
# print("Confusion Matrix")
# print(res)
# print(f"Test Set: {len(X_test)}")
# print(f"Accuracy = {percentage*100} %")
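
The commented-out evaluation above refers to `X_test` and `y_test`, which are never defined in this notebook. One way to get them is to hold out part of the labeled data with `train_test_split`, which is imported at the top but not used. The sketch below shows that pattern. Since the flower CSVs are not available outside Colab, it uses scikit-learn's bundled iris data as a stand-in, which is an assumption; the names `iris_X`, `iris_y`, and `eval_model` are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): the bundled iris set, not the notebook's CSVs.
iris_X, iris_y = load_iris(return_X_y=True)

# Hold out 20% of the labeled rows so that true labels exist for scoring.
X_train, X_test, y_train, y_test = train_test_split(
    iris_X, iris_y, test_size=0.2, random_state=0, stratify=iris_y
)

eval_model = SVC(kernel='linear')
eval_model.fit(X_train, y_train)

# Mean accuracy on the held-out rows, plus a per-class breakdown.
accuracy = eval_model.score(X_test, y_test)
cm = confusion_matrix(y_test, eval_model.predict(X_test))

print("Confusion Matrix")
print(cm)
print(f"Test Set: {len(X_test)}")
print(f"Accuracy = {accuracy * 100:.1f} %")
```

With the notebook's own data, the same pattern would apply to `train_X` and `train_Y` in place of the iris arrays.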