# SUPPORT VECTOR MACHINE

## CONTENT
<br>

1. [What is Support Vector Machine (SVM)](#1)
1. [SVM Parameters](#2)
1. [Import Libraries and Read Data](#3)
1. [Visualize Data](#4)
1. [Create and Evaluate Model](#5)

## <a id=1></a>What is Support Vector Machine (SVM)

The Support Vector Machine (SVM) algorithm tries to draw a hyperplane between two classes to separate them. Many hyperplanes can separate the classes, but SVM looks for the one with the maximum margin, that is, the maximum distance between the hyperplane and the nearest data points from each class. These nearest points are called support vectors. The image below illustrates this.
<br>
<img src="https://lh6.googleusercontent.com/r0dB9ntNr6FWOOLf6GqVUF72K4iBV_oR7IgAl3RO61WpDnIpgkwNhmjxjtMwNIN-23MMlJAnTFe0a2ZqXxMNF0WursGwV5bHaqRMmiCyEyH21k4e6Tj5DFBr2ck4DMgS-FkNz5fl" width=400 />
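
To make the margin idea concrete, here is a minimal sketch on a tiny made-up dataset (the points are my own assumption, not the voice data). For a linear SVM the margin width equals `2 / ||w||`, where `w` is the normal vector of the hyperplane. I use a very large `C` to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable points (made up for illustration).
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]                         # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print("Support vectors:\n", clf.support_vectors_)
print("Margin width: {:.3f}".format(margin_width))
```

Only the points printed as support vectors determine the hyperplane; moving any other point (without crossing the margin) would not change it.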

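SVM also uses a technique called the **kernel trick** to transform the data. If the data points live in a low-dimensional space where no separating hyperplane can be drawn, the kernel implicitly maps them into a higher-dimensional space (in effect, it adds new dimensions to the data) where such a hyperplane can exist.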
<br>
<img src="https://qph.fs.quoracdn.net/main-qimg-8a4a30421342fedb9bdda38fbd2529a8" />
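
The idea of adding a dimension can be sketched with a tiny 1-D example (made-up numbers, not the voice data). One class sits in the middle and the other on both sides, so no single threshold separates them. Adding `x ** 2` as an explicit second feature makes them linearly separable, and an RBF kernel reaches the same result without building the extra feature by hand.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D toy data: class 1 in the middle, class 0 on both sides.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y_toy = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

X_1d = x.reshape(-1, 1)
X_2d = np.column_stack([x, x ** 2])   # explicit extra dimension: x -> (x, x^2)

# Linear SVM on the original 1-D data: cannot separate the classes.
acc_1d = SVC(kernel="linear", C=1000).fit(X_1d, y_toy).score(X_1d, y_toy)
# Linear SVM after adding the x^2 feature: a straight line now separates them.
acc_2d = SVC(kernel="linear", C=1000).fit(X_2d, y_toy).score(X_2d, y_toy)
# RBF kernel on the 1-D data: the mapping happens implicitly.
acc_rbf = SVC(kernel="rbf", C=1000).fit(X_1d, y_toy).score(X_1d, y_toy)

print("Linear, 1-D:", acc_1d)
print("Linear, with x^2:", acc_2d)
print("RBF, 1-D:", acc_rbf)
```
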
Next, I'll explain some SVM parameters, and then we'll use an SVM to classify voices according to their features.

## <a id=2></a>SVM - Parameters

**C Parameter**
<br>
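The C parameter controls the trade-off between a wide margin and classifying the training points correctly.
- Small C: large margin, more misclassified training points are tolerated.
- Large C: small margin, fewer misclassified training points are tolerated; it has the potential to overfit.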
<br>
If you ask which is better to use, the answer is 'it depends on your data'. It is best to try different C values and keep the one that gives the best score.
<br>
<img src="https://www.learnopencv.com/wp-content/uploads/2018/07/svm-parameter-c-example.png" />
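
A small sketch of the effect of C, using two overlapping synthetic blobs (the data and the C values are my own assumptions, chosen only for illustration). For a linear kernel the margin width is `2 / ||w||`, so we can print it for each C.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (not the voice data).
X_blob, y_blob = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                            cluster_std=2.0, random_state=0)

margins = {}
for C in (0.001, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X_blob, y_blob)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width for this C
    print("C={}: margin width={:.3f}, support vectors={}".format(
        C, margins[C], clf.n_support_.sum()))
```

The smallest C should give the widest margin and the most support vectors, because more points are allowed to fall inside the margin.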

**Kernel**
<br>
You can choose the kernel type used by the SVM. It can be ‘linear’, ‘rbf’, ‘poly’, ‘sigmoid’ or ‘precomputed’.
And yes, the answer is still the same: 'it depends on your data'.
<br>
<img src="http://dataaspirant.com/wp-content/uploads/2017/01/Iris_Petal_Svm.png" />

**Gamma Parameter**
<br>
Gamma is the kernel coefficient. It is only used if you choose 'rbf', 'poly' or 'sigmoid' as the kernel.

There is also a 'degree' parameter. It sets the degree of the polynomial for the 'poly' kernel and is 3 by default.
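
Here is a rough sketch of how gamma changes an RBF model, on a noisy synthetic dataset (`make_moons`, my own choice for illustration). A large gamma makes each training point influence only its close neighbourhood, so the model can fit the training set very closely; comparing the training and test scores shows whether that helps or hurts.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class synthetic data (not the voice data).
X_moon, y_moon = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_moon, y_moon,
                                          test_size=0.3, random_state=0)

train_acc, test_acc = {}, {}
for gamma in (0.01, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    train_acc[gamma] = clf.score(X_tr, y_tr)
    test_acc[gamma] = clf.score(X_te, y_te)
    print("gamma={}: train={:.2f}, test={:.2f}".format(
        gamma, train_acc[gamma], test_acc[gamma]))
```

If the training score is much higher than the test score for the largest gamma, that is a sign of overfitting.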

## <a id=3></a> Import Libraries and Read Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os
print(os.listdir("../input"))


In [None]:
# Read Data
df = pd.read_csv("../input/voice.csv")

In [None]:
# First 5 Rows of Data
df.head()

In [None]:
df.columns

In [None]:
df.info()

## <a id=4></a>Visualize Data

In [None]:
sns.pairplot(df, hue='label', vars=['skew', 'kurt',
       'sp.ent', 'sfm', 'mode','meanfun',
       'meandom','dfrange'])
plt.show()

In [None]:
sns.countplot(x='label', data=df)
plt.show()

In [None]:
sns.scatterplot(x = 'skew', y = 'kurt', hue = 'label', data = df)
plt.show()

In [None]:
plt.figure(figsize=(20,10))
# 'label' is not numeric, so drop it before computing correlations
sns.heatmap(df.drop(columns=['label']).corr(), annot=True, linewidth=.5, fmt='.2f', linecolor = 'grey')
plt.show()

## <a id=5></a>Create and Evaluate Model

In [None]:
X = df.drop(['label'],axis=1)
y = df.label

We'll use 70% of our data to train our model and we'll test it with 30% of the data.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

In [None]:
# Import SVM
from sklearn.svm import SVC

svm = SVC()
svm.fit(X_train, y_train)

In [None]:
y_pred = svm.predict(X_test)

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
cm = confusion_matrix(y_test, y_pred)

In [None]:
sns.heatmap(cm, annot=True, cmap="Paired_r", linewidth=2, linecolor='w', fmt='.0f')
plt.xlabel('Predicted Value')
plt.ylabel('True Value')
plt.show()

In [None]:
print("Test Accuracy: {:.2f}%".format(svm.score(X_test, y_test)*100))
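
The accuracy and the confusion matrix are directly related: the diagonal of the matrix holds the correct predictions, so accuracy is the sum of the diagonal divided by the total. A quick check with made-up labels (0 and 1 here are placeholders, not the real voice labels):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for illustration.
y_true_demo = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred_demo = np.array([1, 0, 0, 0, 1, 1, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes.
acc_from_cm = np.trace(cm_demo) / cm_demo.sum()

print(cm_demo)
print("Accuracy from confusion matrix:", acc_from_cm)
print("accuracy_score:", accuracy_score(y_true_demo, y_pred_demo))
```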

Our accuracy is poor, and as you can see in the confusion matrix above, many of our predictions are wrong. So let's try to improve our model. First we'll normalize our data, and after that we'll tune some parameters.

In [None]:
# Min-max normalization: scale every feature (column) to the [0, 1] range
X = (X - X.min()) / (X.max() - X.min())

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
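
The normalization cell above uses the minimum and maximum of the whole dataset, including rows that later end up in the test set. A possible alternative, sketched here on synthetic data (a stand-in I made up, not the voice data), is to put `MinMaxScaler` and `SVC` in a scikit-learn pipeline so the scaling statistics come only from the training split.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data with one feature on a much larger scale.
X_demo, y_demo = make_classification(n_samples=500, n_features=20,
                                     random_state=42)
X_demo[:, 0] *= 1000.0

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=42)

# The scaler is fitted on X_tr only; X_te is transformed with the same min/max.
model = make_pipeline(MinMaxScaler(), SVC())
model.fit(X_tr, y_tr)

test_score = model.score(X_te, y_te)
print("Test Accuracy: {:.2f}%".format(test_score * 100))
```

With this approach a test value can fall slightly outside [0, 1], which is expected, because the test rows were not used to compute the range.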

Let's fit our model.

In [None]:
svm.fit(X_train, y_train)

In [None]:
y_pred = svm.predict(X_test)

In [None]:
cm = confusion_matrix(y_test,y_pred)

In [None]:
sns.heatmap(cm, annot=True, fmt='.0f', cmap='brg_r')
plt.xlabel('Predicted Value')
plt.ylabel('True Value')
plt.show()

In [None]:
print("Test Accuracy: {:.2f}%".format(svm.score(X_test, y_test)*100))

Wow! Our score increased to 97.27%, and all we did was normalize the data! This shows the importance of normalization. Now let's try to find the best parameters for our model.

In [None]:
param_grid = {'C':[0.1, 1, 10, 100], 'gamma':[1, 0.1, 0.01, 0.001], 'kernel' : ['rbf', 'poly', 'sigmoid', 'linear']}

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=4)

In [None]:
grid.fit(X_train, y_train)

In [None]:
print("Best Parameters: ",grid.best_params_)
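
Besides `best_params_`, a fitted `GridSearchCV` object also exposes `best_score_` (the best mean cross-validation score) and `cv_results_` (the scores of every combination). A minimal sketch on synthetic data with a smaller grid (both are my own assumptions, picked to keep the example fast):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (not the voice data).
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=0)

# A deliberately small grid: 3 x 2 x 1 = 6 combinations.
small_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), small_grid, cv=5)
search.fit(X_demo, y_demo)

# One row per parameter combination, best rank first.
results = pd.DataFrame(search.cv_results_)
cols = ["param_C", "param_gamma", "mean_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))
print("Best CV score: {:.4f}".format(search.best_score_))
```

Note that `best_score_` is measured on the cross-validation folds of the training data, so it can differ from the accuracy on the held-out test set.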

In [None]:
grid_pred = grid.predict(X_test)

In [None]:
cmNew = confusion_matrix(y_test, grid_pred)

In [None]:
sns.heatmap(cmNew, annot=True, fmt='.0f', cmap='gray_r')
plt.xlabel('Predicted Value')
plt.ylabel('True Value')
plt.show()

In [None]:
print("Test Accuracy: {:.2f}%".format(grid.score(X_test, y_test)*100))

In [None]:
print(classification_report(y_test, grid_pred))

Our test score increased a little again, and we reached **97.79%** accuracy.

**Thank you. If you liked it, please upvote. I will be happy to hear your comments and feedback.**