## SVM

SVM stands for Support Vector Machine. It is a supervised machine learning method that can be used for both regression and classification, but it is best known as a classification algorithm, particularly for binary classification. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.<br>

The algorithm plots each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that separates the two classes.
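As a minimal sketch of the idea above, a hyperplane can be written as `w.x + b = 0`, and a point is assigned to a class according to the side it falls on. The values of `w` and `b` below are hypothetical and chosen by hand for illustration; they are not learned from any dataset.

```python
import numpy as np

# Hypothetical hyperplane for a 2-feature problem (hand-picked, not learned).
w = np.array([1.0, -1.0])   # normal vector of the hyperplane
b = 0.0                     # offset

def classify(points, w, b):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    scores = points @ w + b
    return np.where(scores >= 0, 1, -1)

points = np.array([[3.0, 1.0],   # score  2 -> class +1
                   [1.0, 4.0],   # score -3 -> class -1
                   [2.0, 2.0]])  # score  0 -> on the hyperplane, treated as +1 here
labels = classify(points, w, b)
print(labels)
```

Training an SVM amounts to choosing `w` and `b` from the data rather than by hand.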

<b>Hyper-Plane</b>

<img src="data/Hyper-plane.JPG">



To understand the intuition behind a hyperplane, consider an example with two classes, Black and Blue, plotted on the XY-plane according to the features of the dataset (two in our case). The line separating these two regions is the <b>Hyper-Plane</b>. The question now is how to find the best hyperplane.

<img src="data/hyper-2.JPG">

The picture above shows three hyperplanes separating the two classes. We choose the one that separates the classes best, which is the hyperplane with the maximum margin from the data points.<br>

<b>Margin:</b> The distance between the hyperplane and the nearest data point from either class is known as the margin. The goal is to choose the hyperplane with the greatest possible margin to any point in the training set, which gives new data a greater chance of being classified correctly.<br>
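The margin of a given hyperplane can be computed directly: the distance from a point `x` to the hyperplane `w.x + b = 0` is `|w.x + b| / ||w||`, and the margin is the smallest such distance. The sketch below compares two hand-picked candidate hyperplanes on made-up points; both the points and the candidates are assumptions for illustration only.

```python
import numpy as np

def margin(points, w, b):
    """Smallest perpendicular distance from any point to the hyperplane w.x + b = 0."""
    distances = np.abs(points @ w + b) / np.linalg.norm(w)
    return distances.min()

# Made-up 2-D points: the first two belong to one class, the last two to the other.
points = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 1.0], [6.0, 2.0]])

# Two hypothetical hyperplanes that both separate the classes.
margin_a = margin(points, np.array([1.0, -1.0]), 0.0)   # the line x = y
margin_b = margin(points, np.array([1.0, 0.0]), -3.0)   # the line x = 3

print("margin A:", margin_a)
print("margin B:", margin_b)
```

Hyperplane A leaves more room around the nearest point, so a maximum-margin classifier would prefer it over B.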

<b>Note:</b> If a data point is an outlier that lies among the points of the other class, SVM can ignore it and still find the hyperplane with the maximum margin. This tolerance comes from the soft-margin formulation (controlled by the C parameter described below), which allows some points to be misclassified, and it makes SVM fairly robust to outliers.


How do we solve a problem like the one in the first picture below?


<img src="data/Kernel.JPG">

We can't separate the two classes in the XY-plane. So SVM applies a transformation that adds one more axis, the Z-axis, with z = x^2 + y^2. In the new space the classes can be separated by a plane. The function that provides this kind of transformation is called a <b>Kernel</b>.<br>

A <b>Kernel</b> is a mathematical function that takes data as input and transforms it into the required form. In other words, it takes a low-dimensional input space and maps it to a higher-dimensional space, converting a non-separable problem into a separable one. Different SVM algorithms use different types of kernel functions, for example linear, polynomial, radial basis function (RBF), and sigmoid. With the help of a kernel, SVM can solve non-linear problems.
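The z = x^2 + y^2 transformation can be sketched on synthetic data. The two rings below (radius 1 and radius 3) are generated purely for illustration and cannot be split by a straight line in the XY-plane, but after adding the z coordinate a flat plane separates them. In practice, SVM libraries typically rely on the kernel trick and never compute the extra coordinates explicitly, so this is only a picture of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=50)

# Synthetic data: an inner ring (radius 1) surrounded by an outer ring (radius 3).
inner = np.column_stack([np.cos(angles), np.sin(angles)])
outer = 3 * np.column_stack([np.cos(angles), np.sin(angles)])

def lift(points):
    """Append z = x^2 + y^2 as a third coordinate."""
    z = (points ** 2).sum(axis=1)
    return np.column_stack([points, z])

inner_3d = lift(inner)
outer_3d = lift(outer)

# z is about 1 on the inner ring and about 9 on the outer ring,
# so the plane z = 5 sits between the two classes.
separable = inner_3d[:, 2].max() < 5 < outer_3d[:, 2].min()
print("separable by the plane z = 5:", separable)
```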

### Tuning Parameters: Kernel, Regularization, Gamma, Margin

<b>Regularization or C parameter</b> tells the SVM optimization how much you want to avoid misclassifying each training example. If C is large, the optimizer will choose a smaller-margin hyperplane if that hyperplane does a better job of classifying all the training points correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.


<img src="data/reg.JPG">
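The trade-off controlled by C can be illustrated with the standard soft-margin objective, `0.5 * ||w||^2 + C * (sum of hinge losses)`. The sketch below does not run a solver. It scores two hand-picked candidate hyperplanes on made-up points and reports which one has the lower objective for a small and a large C. All data and candidates here are assumptions for illustration.

```python
import numpy as np

def soft_margin_objective(X, y, w, b, C):
    """Soft-margin SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Made-up points; (0.5, 0) and (-1.5, 0) sit close to the class boundary.
X = np.array([[0.5, 0.0], [4.0, 0.0], [5.0, 0.0],
              [-1.5, 0.0], [-4.0, 0.0], [-5.0, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Hypothetical candidates (hand-picked, not solver output).
candidates = {
    "wide": (np.array([0.25, 0.0]), 0.0),    # margin 1/||w|| = 4, two points inside it
    "narrow": (np.array([1.0, 0.0]), 0.5),   # margin 1/||w|| = 1, no point inside it
}

preferred = {}
for C in (0.1, 10.0):
    costs = {name: soft_margin_objective(X, y, w, b, C)
             for name, (w, b) in candidates.items()}
    preferred[C] = min(costs, key=costs.get)
    print(f"C={C}: costs={costs} -> prefers {preferred[C]}")
```

With the small C the margin violations are cheap, so the wide-margin candidate wins. With the large C they dominate the objective and the narrow-margin candidate wins.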

<b>Gamma</b> The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. In other words, with low gamma, points far away from the plausible separation line are considered when calculating it, whereas with high gamma only the points close to the plausible line are considered.
<img src="data/gama.JPG">
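For the RBF kernel, the reach of a training example can be seen from the kernel formula `K(x, x') = exp(-gamma * ||x - x'||^2)`. The sketch below evaluates it for a nearby and a distant point at several gamma values. The points are made up for illustration.

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

training_point = np.array([0.0, 0.0])
near = np.array([0.5, 0.0])   # distance 0.5 from the training point
far = np.array([3.0, 0.0])    # distance 3.0 from the training point

for gamma in (0.01, 1.0, 10.0):
    k_near = rbf_kernel(training_point, near, gamma)
    k_far = rbf_kernel(training_point, far, gamma)
    print(f"gamma={gamma}: near={k_near:.4f}, far={k_far:.4g}")
```

With a low gamma the distant point still has a kernel value close to 1, so it influences the decision boundary. With a high gamma its value is effectively zero and only close points matter.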


In [37]:
import numpy as np
import pandas as pd

In [38]:
data = pd.read_csv('Advertisements.csv')

In [39]:
data.head(5)

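|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---------|--------|-----|-----------------|-----------|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |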


In [40]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   User ID          400 non-null    int64 
 1   Gender           400 non-null    object
 2   Age              400 non-null    int64 
 3   EstimatedSalary  400 non-null    int64 
 4   Purchased        400 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 15.8+ KB


In [41]:
X = data.iloc[:, [2,3]].values  # features: Age and EstimatedSalary
y = data.iloc[:, 4].values      # target: Purchased

In [42]:
# splitting the data into train and test
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25,random_state = 101)

In [43]:
# Feature scaling (standardization / z-score): fit learns the mean and std

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# fit_transform learns the mean and std of each training feature and
# applies the z-score formula (x - mean) / std to the training data
X_train = sc.fit_transform(X_train)

# transform reuses the training mean and std on the test data, so no
# information from the test set leaks into the preprocessing
X_test = sc.transform(X_test)
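The scaling step can be reproduced by hand to see what it does. The sketch below uses the Age and EstimatedSalary values of the five rows shown by `data.head(5)`, treating the first four as a stand-in training set and the fifth as a stand-in test row. This split is an assumption for illustration and is not the split used in the notebook.

```python
import numpy as np

# Age and EstimatedSalary from the rows shown by data.head(5).
X_train_demo = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)
X_test_demo = np.array([[19, 76000]], dtype=float)

# Statistics are computed on the training rows only.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std   # reuse the training mean and std

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, so Age and EstimatedSalary contribute on a comparable scale to the distances the SVM works with.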

In [44]:
from sklearn.svm import SVC # support vector classifier

classifier = SVC(kernel= 'linear',random_state=129,verbose=True)
classifier.fit(X_train,y_train)

[LibSVM]

SVC(kernel='linear', random_state=129, verbose=True)

In [45]:
y_pred = classifier.predict(X_test)

In [46]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)

In [47]:
cm

array([[57,  8],
       [10, 25]], dtype=int64)

In [48]:
#----------------------------- end of basic model-----------------------------

https://chrisalbon.com/machine_learning/support_vector_machines/svc_parameters_using_rbf_kernel/

In [49]:
from sklearn.model_selection import GridSearchCV

In [50]:
param_grid = {'C':[0.001,0.01,0.1,1,10],
             'gamma':[0.001,0.01,0.1,1,10]}

In [51]:
grid_search = GridSearchCV(SVC(kernel='rbf'),param_grid,cv = 5)

In [52]:
grid_search.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=SVC(),
             param_grid={'C': [0.001, 0.01, 0.1, 1, 10],
                         'gamma': [0.001, 0.01, 0.1, 1, 10]})

In [53]:
grid_search.best_params_

{'C': 1, 'gamma': 1}
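One possible variation, not part of the original run, is to put the scaler and the classifier into a single pipeline so that each cross-validation fold is scaled using only its own training portion. The sketch below assumes synthetic data from `make_classification` as a stand-in for the advertisement dataset, and a smaller hypothetical grid. The `svc__` prefix comes from the step name that `make_pipeline` assigns to `SVC`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: two features, binary target.
X_demo, y_demo = make_classification(n_samples=200, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=101)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25,
                                          random_state=101)

# Scaling happens inside the pipeline, so it is refit on every CV fold.
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid_demo = {'svc__C': [0.1, 1, 10], 'svc__gamma': [0.01, 0.1, 1]}

search = GridSearchCV(pipeline, param_grid_demo, cv=5)
search.fit(X_tr, y_tr)

print(search.best_params_)   # best combination found on the training folds
print(search.best_score_)    # mean cross-validated accuracy of that combination
test_accuracy = search.score(X_te, y_te)   # accuracy of the refit best model
print(test_accuracy)
```

Because `GridSearchCV` refits the best combination on the full training data by default, `search` can be used directly for prediction without constructing a new `SVC` by hand.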

In [54]:
classifier = SVC(kernel= 'rbf',random_state=129,verbose=True,C=1,gamma=1)
classifier.fit(X_train,y_train)

[LibSVM]

SVC(C=1, gamma=1, random_state=129, verbose=True)

In [55]:
y_pred = classifier.predict(X_test)

In [56]:
cm = confusion_matrix(y_test,y_pred)
cm

array([[58,  7],
       [ 1, 34]], dtype=int64)
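The two confusion matrices can be turned into summary metrics for an easier comparison. The sketch below assumes the layout returned by `confusion_matrix` for a binary problem, `[[TN, FP], [FN, TP]]`, with rows for the actual class and columns for the predicted class. The `summarize` helper is a hypothetical name introduced here.

```python
import numpy as np

def summarize(cm):
    """Accuracy, precision and recall from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    tn, fp, fn, tp = np.asarray(cm).ravel()
    return {
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

linear_scores = summarize([[57, 8], [10, 25]])   # linear-kernel result above
rbf_scores = summarize([[58, 7], [1, 34]])       # tuned RBF-kernel result above

for name, scores in (("linear", linear_scores), ("rbf", rbf_scores)):
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

On the 100 test rows, accuracy rises from 0.82 with the linear kernel to 0.92 with the tuned RBF kernel, mostly because the number of missed purchasers (false negatives) drops from 10 to 1.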