
# Support Vector Machines

## What is a Support Vector Machine?

* A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression.
* The basic idea behind SVM is to find the hyperplane (or threshold) that best divides the dataset into classes.


## Terminologies for SVM



*   ***Hyperplane*** : A hyperplane is a decision boundary that separates objects belonging to different classes.
*   ***Margin*** : The margin is the gap between the two lines that pass through the closest points of each class, one on either side of the hyperplane.
* ***Soft Margin*** : When we allow misclassifications, the distance between the threshold and the closest data points is called the soft margin.
* ***Support Vectors*** : Support vectors are the data points closest to the hyperplane. These points determine the position of the separating line and the width of the margin, so they are the most relevant points for constructing the classifier.
* ***Maximum Margin Classifier*** : The classifier that uses the hyperplane with the largest margin. (In a 1D setting, the margin is largest when the threshold lies exactly midway between the closest points of the two classes.)
* ***Soft Margin Classifier*** : When we use a soft margin to determine the location of the threshold, the classifier is called a soft margin classifier, also known as a support vector classifier. (The name comes from the fact that the observations on the edge of and within the soft margin are called support vectors.)
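
The terms above can be inspected on a fitted model. The following is a minimal sketch, assuming a made-up, linearly separable 2D dataset and scikit-learn's `SVC` with a linear kernel; the very large `C` is an assumption used to approximate a maximum margin classifier, and the margin is taken as the gap `2 / ||w||` between the two margin lines.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (made up for illustration only)
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for misclassification,
# so this approximates a maximum margin classifier.
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)  # gap between the two margin lines

print("Support vectors:\n", clf.support_vectors_)
print("Hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("Margin width: %.3f" % margin_width)
```

Only the points reported in `support_vectors_` influence the fitted hyperplane; moving any other point (without crossing the margin) would leave the classifier unchanged.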



## Common questions for SVM



1.   Is the Maximum Margin Classifier the best?<br>
No. It looks appealing, but it is heavily affected by outliers in the training data.

2.   How to make a threshold that is not sensitive to outliers?<br>
To do this, we have to allow misclassifications in the training data. This lets us build a classifier that generalizes better and does not overfit the training data. Allowing misclassifications while building the classifier is an example of the bias-variance tradeoff: we accept slightly more bias on the training data in exchange for lower variance on new data.

3. How do we know which soft margin to use, i.e. how many misclassifications and observations to allow within the soft margin to get the best classifier?<br>
We use cross validation to determine how many misclassifications and observations within the soft margin make the best classifier.

4. How does a support vector classifier deal with data of different dimensions?<br>
  *   When the data is one dimensional, the support vector classifier is a point (a point is a flat affine 0 dimensional subspace).
  *   When the data is two dimensional, the support vector classifier is a line (a line is a flat affine 1 dimensional subspace).
  *   When the data is three dimensional, the support vector classifier is a plane instead of a line (a plane is a flat affine 2 dimensional subspace).
  *   When the data has four or more dimensions, the support vector classifier is a hyperplane (a flat affine subspace with one dimension fewer than the data).
  * Technically, all of these flat affine subspaces are hyperplanes, but the term is generally reserved for data with four or more dimensions.

5. What are the different kernels in Support Vector Machine?
  *   Polynomial Kernel<br> The polynomial kernel has a parameter d, which stands for the degree of the polynomial.<br> When d = 1, the polynomial kernel computes the relationships between each pair of observations in 1-Dimension. These relationships are used to find a support vector classifier.<br> When d = 2, the polynomial kernel computes the relationships between each pair of observations in 2-Dimensions, and so on.<br> **The best value of d can be found using cross validation.**
  *   Radial Basis Function Kernel<br>The radial kernel finds support vector classifiers in infinite dimensions.<br>The radial kernel behaves like a weighted nearest neighbor model, i.e., the closest observations (the nearest neighbors) have a lot of influence on how a new observation is classified.
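
Several answers above rely on cross validation (for the soft margin and for the polynomial degree d). The following is a minimal sketch of how that search might look with scikit-learn's `GridSearchCV`. The synthetic `make_moons` data, the candidate values in `param_grid`, and the scaling step are all assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (illustrative only)
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# Scale the features, then fit an SVC; make_pipeline names the last step "svc"
pipe = make_pipeline(StandardScaler(), SVC())

# Hypothetical candidate values: C controls the soft margin,
# degree is the polynomial d, gamma is the RBF kernel coefficient.
param_grid = [
    {"svc__kernel": ["poly"], "svc__degree": [1, 2, 3], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__gamma": [0.1, 1, 10], "svc__C": [0.1, 1, 10]},
]

# 5-fold cross validation over every combination in the grid
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy: %.3f" % search.best_score_)
```

The combination with the highest mean validation accuracy is kept in `search.best_estimator_`, refitted on all of the data passed to `fit`.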








## Intuition behind Support Vector Machine?



1.   Start with data in a relatively low dimension.
2.   Move the data into a higher dimension.
3.   Find a support vector classifier that separates the data into two groups.
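
The three steps above can be sketched on a tiny example. This is an illustration under assumptions: the 1D values and labels are made up, and squaring the feature is just one possible way to move the data into a higher dimension.

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: 1D data where class 1 sits between the two groups of class 0,
# so no single threshold on x can separate the classes.
x = np.array([-4, -3, -1, 0, 1, 3, 4], dtype=float)
y = np.array([0, 0, 1, 1, 1, 0, 0])

X_1d = x.reshape(-1, 1)
acc_1d = SVC(kernel="linear", C=10).fit(X_1d, y).score(X_1d, y)

# Step 2: move the data into 2D by adding x^2 as a second feature.
X_2d = np.column_stack([x, x ** 2])

# Step 3: a linear support vector classifier can now separate the classes.
acc_2d = SVC(kernel="linear", C=10).fit(X_2d, y).score(X_2d, y)

print("Training accuracy in 1D:", acc_1d)
print("Training accuracy in 2D:", acc_2d)
```

In the 2D space the classes are split by a horizontal line on the `x ** 2` axis, which corresponds to two thresholds in the original 1D space.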

------



*   Support Vector Machines use kernel functions to systematically find support vector classifiers in higher dimensions.
*   Kernel functions only calculate the relationships between each pair of points as if they were in higher dimensions; ***they do not actually do the transformation***. This trick of calculating high-dimensional relationships without actually transforming the data to higher dimensions is called the kernel trick. It reduces the amount of computation by avoiding the math required to transform the data, and it is what makes the radial kernel, which works in infinite dimensions, possible.
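
A small numeric check can make the kernel trick concrete. The sketch below assumes the degree-2 polynomial kernel without a constant term, `K(a, b) = (a . b)^2`, whose explicit feature map for 2D inputs is `phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)`. The helper names `phi` and `poly_kernel` and the two sample points are hypothetical.

```python
import numpy as np

def phi(v):
    """Explicit mapping of a 2D point into 3D (the transformation we want to avoid)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Degree-2 polynomial kernel computed directly in the original 2D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = np.dot(phi(a), phi(b))  # transform first, then take the dot product
trick = poly_kernel(a, b)          # same relationship, no transformation

print("Dot product after explicit mapping:", explicit)
print("Kernel value in original space:   ", trick)
```

Both routes give the same number, but the kernel only ever touches the original 2D vectors. For the radial kernel the explicit mapping would have infinitely many coordinates, so the kernel route is the only practical one.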




## Hyperparameters for SVM



*   **Kernel**: <br>The kernel determines how the relationship between each pair of observations is computed, which implicitly maps the input data into the space where the classes are separated. Common choices are linear, polynomial, and radial basis function (RBF). Polynomial and RBF kernels are useful when the classes cannot be separated by a linear boundary: they find the separating hyperplane in a higher dimension, which corresponds to a curved boundary in the original space. When the class boundaries are curved or nonlinear, a more complex kernel can lead to a more accurate classifier.
*   **Regularization**: <br>In Python's scikit-learn, regularization is controlled by the C parameter. C is the penalty applied to the misclassification (error) term, so it tells the SVM optimization how much error is tolerable. This is how you control the trade-off between a wide margin and few misclassified training points. A smaller value of C tolerates more misclassification and creates a larger-margin hyperplane, while a larger value of C penalizes misclassification more heavily and creates a smaller-margin hyperplane.
*  **Gamma**: <br>A lower value of gamma fits the training dataset loosely, whereas a higher value of gamma fits the training dataset very closely, which can cause over-fitting. In other words, with a high value of gamma only nearby points influence the separation line, while with a low value of gamma far-away points are also taken into account.
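
The effect of C and gamma can be observed directly. The following is a rough sketch on synthetic data; the datasets (`make_blobs`, `make_moons`), the candidate values, and the use of `2 / ||w||` as the margin width are assumptions for illustration, and the exact numbers will differ on other data.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Effect of C: linear kernel on two overlapping synthetic blobs
X_lin, y_lin = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)
margins = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X_lin, y_lin)
    margins[C] = 2 / np.linalg.norm(clf.coef_)  # width of the margin
    print(f"C={C}: margin width={margins[C]:.3f}, "
          f"support vectors={len(clf.support_)}")

# Effect of gamma: RBF kernel on noisy, non-linear synthetic data
X_m, y_m = make_moons(n_samples=200, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=0)
train_acc = {}
for gamma in (0.01, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_tr, y_tr)
    train_acc[gamma] = clf.score(X_tr, y_tr)
    print(f"gamma={gamma}: train accuracy={train_acc[gamma]:.3f}, "
          f"test accuracy={clf.score(X_te, y_te):.3f}")
```

A large gap between training and test accuracy at high gamma is the over-fitting described above, while a wide margin at small C reflects the greater tolerance for misclassified points.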



## SVM classifier using Scikit Learn

In [1]:
from sklearn import datasets

#Load dataset
cancer = datasets.load_breast_cancer()

In [2]:
cancer.data.shape

(569, 30)

In [3]:
from sklearn.model_selection import train_test_split

# Split dataset into training set (70%) and test set (30%)
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=109)

In [4]:
#Import svm model
from sklearn import svm

#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel

#Train the model using the training sets
clf.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)

In [5]:
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model Accuracy: how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9649122807017544


In [6]:
# Model Precision: what percentage of samples predicted as positive are actually positive?
print("Precision:",metrics.precision_score(y_test, y_pred))

# Model Recall: what percentage of actual positive samples are predicted as positive?
print("Recall:",metrics.recall_score(y_test, y_pred))

Precision: 0.9811320754716981
Recall: 0.9629629629629629
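
To show where precision and recall come from, the sketch below recomputes them by hand from a confusion matrix. The label arrays `y_true` and `y_hat` are made up for illustration and are not the breast cancer predictions from the cells above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions (illustrative only)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_hat = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 1])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of everything predicted positive, how much is correct
recall = tp / (tp + fn)     # of everything actually positive, how much was found

print("TP=%d FP=%d FN=%d TN=%d" % (tp, fp, fn, tn))
print("Precision by hand:", precision, "| sklearn:", precision_score(y_true, y_hat))
print("Recall by hand:   ", recall, "| sklearn:", recall_score(y_true, y_hat))
```

The same `confusion_matrix(y_test, y_pred)` call could be applied to the classifier trained above to see which kind of error (false positive or false negative) accounts for the gap between its precision and recall.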


## SVM Regression using scikit learn