# SUPPORT VECTOR MACHINES

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line that divides the plane into two parts, with each class lying on one side.

Suppose you are given a plot of two labeled classes on a graph, as shown in the image, together with a line that separates them. Separating lines like this one are what a Support Vector Machine finds.

![SVM GRAPH PRESENTATION](http://bit.ly/30M7quR)
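
To make the idea of a separating line concrete, here is a minimal sketch that fits a linear SVM to six hand-made 2D points and reads back the line it found. The points are made up for illustration and are not the data in the image; `coef_`, `intercept_` and `support_vectors_` are attributes that sklearn exposes on a fitted linear `SVC`.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, clearly separated clusters (illustrative data, not from the image)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel keeps the decision boundary a straight line
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The separating line is w[0]*x + w[1]*y + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)

# The support vectors are the training points closest to the line
print("support vectors:")
print(clf.support_vectors_)

# New points are classified by which side of the line they fall on
print(clf.predict([[2.0, 2.0], [6.0, 5.0]]))
```

Only the support vectors determine where the line sits; moving any other point slightly would leave the line unchanged.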

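Now consider what happens if we have data like that shown in the image below. Clearly, there is no line that can separate the two classes in the x-y plane. So what do we do? We apply a transformation and add one more dimension, which we call the z-axis. Let's assume the value of each point on the z-axis is z = x² + y². This is the squared distance of the point from the origin, so points near the origin get small z values and points far from it get large ones. A flat plane at a suitable height z can then separate the two classes.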

![SVM CIRCLE GRAPH](http://bit.ly/34aA6Qq)
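
The transformation described above can be sketched as follows. This is an illustration under assumptions: the data comes from sklearn's `make_circles` helper (with arbitrary `factor`, `noise` and `random_state` values), not from the image, and the extra feature is computed by hand as z = x² + y².

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic concentric circles: one class inside, the other outside
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the third dimension: squared distance from the origin
z = X[:, 0] ** 2 + X[:, 1] ** 2
X3 = np.column_stack([X, z])

# A linear SVM on the original (x, y) plane versus the lifted (x, y, z) space
linear_2d = SVC(kernel="linear").fit(X, y)
linear_3d = SVC(kernel="linear").fit(X3, y)

print("linear SVM on (x, y):   ", linear_2d.score(X, y))
print("linear SVM on (x, y, z):", linear_3d.score(X3, y))
```

In practice you rarely build z by hand. A non-linear kernel such as `SVC(kernel="rbf")` achieves a comparable effect implicitly.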

### Libraries
<hr>

Sklearn - It provides simple and efficient tools for data mining and analysis <br>
Matplotlib - Used to plot data graphs <br>
Pandas - Easy Dataframe manipulation



In [1]:
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

### Load Iris
<hr>
It is a dataset provided by sklearn that contains sepal and petal lengths and widths for 150 iris flowers, along with the species of each flower. It is small and clean, which makes it convenient for trying out an SVM. Now let's load the dataset into a variable and take a look at it.

In [2]:
iris = load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
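
Printing the whole object is long, so it can help to look at the individual fields used in the rest of this walkthrough instead. A minimal sketch, using only attributes that `load_iris()` returns:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers, 4 measurements each
print(iris.data.shape)

# Column names for the 4 measurements
print(iris.feature_names)

# Species names; the position in this array is the numeric label
print(iris.target_names)

# Numeric labels for the first few flowers
print(iris.target[:5])
```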
  

### Dataframe and Target
<hr>
Let's create the dataframe variable, where the data used to predict is contained in iris.data and the column names are available in iris.feature_names.<br>
Then we create a column that will hold our target values, i.e. the flower species. A machine doesn't recognise flower names, so the three species are encoded as numbers (0, 1, 2), which iris.target already provides and which can be used directly to train the model.<br>
This gives us a dataset that is ready for training.

In [3]:
df = pd.DataFrame(
    data=iris.data,
    columns=iris.feature_names
)
df["flower"] = iris.target
df.loc[45:55, :]

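| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | flower |
|---|---|---|---|---|---|
| 45 | 4.8 | 3.0 | 1.4 | 0.3 | 0 |
| 46 | 5.1 | 3.8 | 1.6 | 0.2 | 0 |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 | 0 |
| 48 | 5.3 | 3.7 | 1.5 | 0.2 | 0 |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | 0 |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | 1 |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | 1 |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | 1 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | 1 |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | 1 |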
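
To see which species each number stands for, the numeric labels can be mapped back to names. A small sketch: the `flower_name` column is a hypothetical addition for readability only, and the model would still be trained on the numeric `flower` column.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["flower"] = iris.target

# Hypothetical helper column: look up each numeric label in target_names
df["flower_name"] = iris.target_names[df["flower"].to_numpy()]

# How many flowers carry each label
print(df["flower"].value_counts().sort_index())

# One example row per species
print(df.loc[[0, 50, 100], ["flower", "flower_name"]])
```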


### Train Test Split <hr>
So now we will split the features (x) and the target (y) into training and testing data sets, so that the Support Vector Machine can be trained on one part and evaluated on data it has not seen.

#### Parameters<hr>
test_size: the fraction of the data to put into the test set (0.2 means 20%); the rest goes into the train set.<br> 
random_state: a seed for the random shuffling, so that the same split can be reproduced later

#### NOTE <br>
We need the first four columns of the dataframe as features to predict the last column. Hence df.iloc[:, 0:4] is used to select those four columns by position, for all rows.

In [4]:
Xtrain,Xtest,Ytrain,Ytest = train_test_split(df.iloc[:,0:4],df["flower"],test_size=0.2,random_state=9)
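
The effect of the two parameters can be checked directly. The sketch below rebuilds the dataframe so it runs on its own, then prints the sizes of the resulting sets and confirms that repeating the call with the same `random_state` returns the same rows.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["flower"] = iris.target

X = df.iloc[:, 0:4]  # positional slice: columns 0 to 3 (end index is exclusive)
y = df["flower"]

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=9)

# With 150 rows and test_size=0.2, 30 rows go to the test set and 120 to the train set
print(Xtrain.shape, Xtest.shape)

# Repeating the split with the same random_state selects the same rows
Xtrain2, Xtest2, Ytrain2, Ytest2 = train_test_split(X, y, test_size=0.2, random_state=9)
print(Xtest.index.equals(Xtest2.index))
```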

### SVC <hr>
Now we create an svc variable to hold the model and then fit it on the training dataset.

In [5]:
svc=SVC(probability=True)
svc.fit(Xtrain,Ytrain)



SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
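
Passing `probability=True` is what makes `predict_proba` available on the fitted model. A minimal sketch of what it returns, rebuilt here so it runs on its own. The `random_state=0` argument on `SVC` is an addition for repeatable probability estimates and is not in the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=9
)

# random_state is an assumption added here so the probability calibration is repeatable
svc = SVC(probability=True, random_state=0)
svc.fit(Xtrain, Ytrain)

# One row per sample, one column per class, in the order given by svc.classes_
proba = svc.predict_proba(Xtest[:3])
print(svc.classes_)
print(proba.round(3))

# The hard class predictions for the same samples
print(svc.predict(Xtest[:3]))
```

The probabilities come from an extra calibration step that sklearn runs internally, which makes fitting slower. If you only need class predictions, `SVC()` without `probability=True` is enough.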

### Accuracy check
We import a standard accuracy metric from the sklearn library and apply it to the model's predictions on the testing dataset.


In [6]:
print(int(accuracy_score(Ytest, svc.predict(Xtest)) * 100), "%")

100 %


### We got 100% accuracy because Iris is a small, clean dataset and the test set here has only 30 samples. Usually we cannot get accuracy this high. :)
<hr>
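
A score from a single 30-sample test set can be optimistic. One common way to get a steadier estimate is cross-validation, sketched below with sklearn's `cross_val_score`. The choice of 5 folds and of default `SVC()` settings is an assumption made for illustration, not something the steps above used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()

# Train and evaluate on 5 different train/test partitions of the full dataset
scores = cross_val_score(SVC(), iris.data, iris.target, cv=5)

print(scores)
print("mean accuracy:", scores.mean())
```

Each entry in `scores` is the accuracy on one held-out fold, so the spread between them gives a feel for how much the result depends on the particular split.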

### That is it for the basics of Support Vector Machine! 
### Please star and follow if you learnt anything useful! :)