# Lab 7: Supervised Learning - Support Vector Machine (SVM)


## Support Vector Machine (SVM)

**SVM** is a supervised machine learning algorithm used for classification and regression tasks. It works by finding a hyperplane that best separates the data into different classes. The goal of SVM is to maximize the margin between the closest data points (called support vectors) and the hyperplane.
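As an illustration of these ideas, the sketch below fits a linear SVM to a small made-up 2D dataset with scikit-learn and reads back the support vectors and the margin width. The data points and the large `C` value (used here to approximate a hard margin) are assumptions for the example, not part of the lab dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2D.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # bias term
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
```

For this toy data the width should come out close to $ 5/\sqrt{2} \approx 3.54 $, the distance between the closest points of the two clusters.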

## Decision Boundary

A **decision boundary** is the line or surface that separates different classes in a classification task. It represents the point at which the model changes its prediction from one class to another. In the case of SVM, the decision boundary is the hyperplane.

## Linear Separability

**Linear separability** means that a dataset can be separated into distinct classes by a straight line (in 2D), plane (in 3D), or hyperplane (in higher dimensions). If data points of different classes can be divided without error by such a line or plane, the dataset is said to be linearly separable.

## Non-Linearly Separable

**Non-linearly separable** data cannot be separated into different classes using a straight line or hyperplane. In such cases, more complex decision boundaries are needed, often achieved by transforming the data using kernels in SVM.
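A classic case is the XOR pattern. The sketch below, which uses made-up points and an arbitrarily chosen `gamma` value, compares a linear kernel with an RBF kernel on it; it is only meant to show the effect of a kernel, not to recommend particular settings.

```python
import numpy as np
from sklearn.svm import SVC

# XOR pattern: no single straight line separates the two classes.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

# Training accuracy with a linear kernel and with an RBF kernel.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", gamma=2.0).fit(X, y).score(X, y)

print("linear kernel training accuracy:", linear_acc)
print("RBF kernel training accuracy:", rbf_acc)
```

The linear kernel cannot classify all four points correctly, while the RBF kernel can.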

## Linear SVM

**Linear SVM** refers to an SVM model that uses a linear kernel to classify linearly separable data. It finds the best straight-line (or hyperplane) decision boundary that separates the classes.

## Advantages and Disadvantages of SVM

**Advantages**:

- Effective in high-dimensional spaces.
- Works well when the number of dimensions is greater than the number of samples.
- Memory efficient since only a subset of training data (support vectors) is used to define the hyperplane.
- Performs well when there is a clear margin of separation between classes.

**Disadvantages**:

- Inefficient on large datasets: training time grows at least quadratically with the number of samples.
- Less effective on noisy data or overlapping classes.
- Choosing the right kernel can be tricky and affects the performance.
- Does not directly provide probability estimates.
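To make the last point concrete: what an SVM produces natively is a signed score, not a probability. The sketch below uses made-up data and scikit-learn's `decision_function` to show this. (scikit-learn's `SVC` can also be built with `probability=True`, which adds an internal calibration step and makes training slower; that option is not shown here.)

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data with two classes.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y = np.array(["a", "a", "a", "b", "b", "b"])

clf = SVC(kernel="linear").fit(X, y)

# Signed score per sample: its sign picks the class, and its magnitude
# grows with the distance from the hyperplane. It is not a probability.
scores = clf.decision_function(X)
predicted = clf.classes_[(scores > 0).astype(int)]

print(scores)
print(predicted)
```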


#### Equation of Linear SVM

Two-dimensional linearly separable data can be separated by a line.

The equation of the line is $ y = ax + b $, which can be rewritten as $ y - ax - b = 0 $.

Given two vectors $ w = \begin{bmatrix} -b \\ -a \\ 1 \end{bmatrix} $ and $ \mathbf{x} = \begin{bmatrix} 1 \\ x \\ y \end{bmatrix} $, we get:

$ w^{T} \mathbf{x} = -b \cdot 1 + (-a) \cdot x + 1 \cdot y = y - ax - b $

so the line is exactly the set of points where $ w^{T} \mathbf{x} = 0 $.

This equation was derived for two-dimensional data, but the same form works for any number of dimensions. It is the equation of the hyperplane.
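The vector form can be checked numerically. The following is a minimal sketch in NumPy, assuming the example line $ y = 2x + 1 $ (the values of `a` and `b` and the helper name `line_value` are chosen only for illustration):

```python
import numpy as np

# Example line y = 2x + 1, so a = 2 and b = 1 (values chosen for illustration).
a, b = 2.0, 1.0
w = np.array([-b, -a, 1.0])

def line_value(x, y):
    """Return w^T [1, x, y], which equals y - a*x - b."""
    return float(w @ np.array([1.0, x, y]))

print(line_value(3.0, 7.0))   # point on the line
print(line_value(3.0, 10.0))  # point above the line
print(line_value(3.0, 5.0))   # point below the line
```

The result is zero for a point on the line, positive for a point above it, and negative for a point below it.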

## Classifier

Once we have the hyperplane, we can use it to make predictions. We define the hypothesis function $ h $ as:

$ h(x) = \begin{cases} +1 & \text{if } w^{T}x + b \ge 0 \\ -1 & \text{if } w^{T}x + b < 0 \end{cases} $

Here $ x $ is the feature vector, $ w $ is the weight vector, and $ b $ is the bias (intercept) term of the hyperplane, kept separate from $ w $. Points on or above the hyperplane are classified as class +1, and points below it are classified as class -1.
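The hypothesis function translates directly into code. This is a small sketch, assuming an illustrative hyperplane $ y - 2x - 1 = 0 $ (so `w = [-2, 1]` and `b = -1`); these values are not taken from the lab dataset.

```python
import numpy as np

def h(x, w, b):
    """Hypothesis function: +1 if w^T x + b >= 0, otherwise -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative hyperplane y - 2x - 1 = 0, i.e. w = [-2, 1] and b = -1.
w = np.array([-2.0, 1.0])
b = -1.0

print(h(np.array([3.0, 10.0]), w, b))  # above the line
print(h(np.array([3.0, 7.0]), w, b))   # on the line
print(h(np.array([3.0, 5.0]), w, b))   # below the line
```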

The goal of the SVM learning algorithm is to find a hyperplane that separates the data accurately. Many such hyperplanes may exist, so we look for the best one: the one with the largest margin, often referred to as the optimal hyperplane.
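For reference, "best" is usually made precise as the maximum-margin hyperplane. The following is the standard hard-margin formulation from the SVM literature, not something stated in this lab; it assumes labels $ y_i \in \{-1, +1\} $ and linearly separable training points $ x_i $:

$$ \min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i \left( w^{T}x_i + b \right) \ge 1 \;\; \text{for all } i $$

The margin width is $ 2 / \lVert w \rVert $, so minimizing $ \lVert w \rVert $ maximizes the margin.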


#### Using a linear SVM, classify the apple_orange dataset and evaluate its performance on the test set


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

In [2]:
df = pd.read_csv('dataset/applorng.csv')
df.head()

| | weight of each of the fruit (grams) | Size of each of\nthe fruit (cm) | Type of Fruit |
|---|---|---|---|
| 0 | 69 | 4.39 | orange |
| 1 | 69 | 4.21 | orange |
| 2 | 65 | 4.09 | orange |
| 3 | 72 | 5.85 | apple |
| 4 | 67 | 4.7 | orange |


In [3]:
X = df.drop('Type of Fruit', axis=1)
y = df['Type of Fruit']

In [4]:
training_set, test_set = train_test_split(df, test_size = 0.2, random_state = 10)

In [5]:
X_train = training_set.iloc[:,0:2].values  # weight and size of the train sample
Y_train = training_set.iloc[:,2].values  # Actual class label of the train sample
X_test = test_set.iloc[:,0:2].values  
Y_test = test_set.iloc[:,2].values 
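An alternative to splitting the whole DataFrame and slicing it afterwards is to pass the feature matrix and the labels to `train_test_split` together. The sketch below assumes a small stand-in DataFrame built from twelve rows visible in this lab's outputs; the short column names `weight` and `size` are hypothetical replacements for the real headers.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Twelve rows copied from the outputs shown in this lab; the short
# column names "weight" and "size" are stand-ins for the real headers.
sample = pd.DataFrame({
    "weight": [69, 69, 65, 72, 67, 70, 69, 73, 74, 68, 75, 67],
    "size": [4.39, 4.21, 4.09, 5.85, 4.70, 4.22,
             4.11, 5.78, 5.22, 4.47, 5.11, 4.25],
    "Type of Fruit": ["orange", "orange", "orange", "apple", "orange", "orange",
                      "orange", "apple", "apple", "orange", "apple", "orange"],
})

X = sample.drop("Type of Fruit", axis=1)  # features
y = sample["Type of Fruit"]               # class labels

# Features and labels are shuffled and split together, so they stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10)

print(X_train.shape, X_test.shape)
```

Splitting `X` and `y` in one call keeps each row paired with its label without relying on column positions.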

In [6]:
print(X_test)

[[65.    4.09]
 [70.    4.22]
 [69.    4.11]
 [73.    5.78]
 [74.    5.22]
 [68.    4.47]
 [75.    5.11]
 [67.    4.25]]


In [7]:
print(Y_test)

['orange' 'orange' 'orange' 'apple' 'apple' 'orange' 'apple' 'orange']


In [8]:
test_set

| | weight of each of the fruit (grams) | Size of each of\nthe fruit (cm) | Type of Fruit |
|---|---|---|---|
| 2 | 65 | 4.09 | orange |
| 27 | 70 | 4.22 | orange |
| 35 | 69 | 4.11 | orange |
| 30 | 73 | 5.78 | apple |
| 14 | 74 | 5.22 | apple |
| 13 | 68 | 4.47 | orange |
| 7 | 75 | 5.11 | apple |
| 24 | 67 | 4.25 | orange |


In [9]:
# kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf'
# Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used.
svclassifier = SVC(kernel='linear', random_state=10)
svclassifier.fit(X_train, Y_train)
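After fitting, a linear `SVC` exposes the learned hyperplane through its `coef_` and `intercept_` attributes, and the support vectors through `support_vectors_`. The sketch below is self-contained, so it refits a model on twelve rows copied from the outputs shown in this lab rather than on the full training set; the names `X_sample` and `y_sample` are introduced only for this example, and the numbers it prints will differ from those of the lab's model.

```python
import numpy as np
from sklearn.svm import SVC

# Twelve (weight, size) rows copied from the outputs shown in this lab.
X_sample = np.array([
    [69, 4.39], [69, 4.21], [65, 4.09], [72, 5.85], [67, 4.70], [70, 4.22],
    [69, 4.11], [73, 5.78], [74, 5.22], [68, 4.47], [75, 5.11], [67, 4.25],
])
y_sample = np.array(["orange", "orange", "orange", "apple", "orange", "orange",
                     "orange", "apple", "apple", "orange", "apple", "orange"])

clf = SVC(kernel="linear", random_state=10).fit(X_sample, y_sample)

w = clf.coef_[0]        # one weight per feature (weight, size)
b = clf.intercept_[0]   # bias term
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("support vectors per class:", clf.n_support_)
```

Only the support vectors determine the hyperplane; the remaining training points could be removed without changing it.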

## Predicting the classes for the test set

In [10]:
Y_pred = svclassifier.predict(X_test) 
print(Y_pred)

['orange' 'orange' 'orange' 'apple' 'apple' 'orange' 'apple' 'orange']


## Attaching the predictions to the test set for comparison

Comparing the actual classes with the predictions:

In [11]:
test_set = test_set.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
test_set["Predictions"] = Y_pred
test_set

| | weight of each of the fruit (grams) | Size of each of\nthe fruit (cm) | Type of Fruit | Predictions |
|---|---|---|---|---|
| 2 | 65 | 4.09 | orange | orange |
| 27 | 70 | 4.22 | orange | orange |
| 35 | 69 | 4.11 | orange | orange |
| 30 | 73 | 5.78 | apple | apple |
| 14 | 74 | 5.22 | apple | apple |
| 13 | 68 | 4.47 | orange | orange |
| 7 | 75 | 5.11 | apple | apple |
| 24 | 67 | 4.25 | orange | orange |


## Model Evaluation

Evaluating the algorithm using the confusion matrix, precision, recall, and F1 score.

In [12]:
cm = confusion_matrix(Y_test,Y_pred)
cm

array([[3, 0],
       [0, 5]])
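In scikit-learn's confusion matrix, rows correspond to the true classes and columns to the predicted classes, in sorted label order (here `apple`, then `orange`). The sketch below rebuilds the matrix from the `Y_test` and `Y_pred` values printed earlier and derives per-class precision and recall from it; passing `labels` explicitly is an optional choice made here for readability.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Actual and predicted labels as printed earlier in this lab.
Y_test = np.array(["orange", "orange", "orange", "apple",
                   "apple", "orange", "apple", "orange"])
Y_pred = np.array(["orange", "orange", "orange", "apple",
                   "apple", "orange", "apple", "orange"])

labels = ["apple", "orange"]
cm = confusion_matrix(Y_test, Y_pred, labels=labels)

# Diagonal entries are correct predictions for each class.
precision = cm.diagonal() / cm.sum(axis=0)  # correct / predicted as that class
recall = cm.diagonal() / cm.sum(axis=1)     # correct / actually that class

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

With no off-diagonal entries, precision and recall are both 1.00 for each class, which matches the matrix above.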

In [13]:
print(classification_report(Y_test,Y_pred))

              precision    recall  f1-score   support

       apple       1.00      1.00      1.00         3
      orange       1.00      1.00      1.00         5

    accuracy                           1.00         8
   macro avg       1.00      1.00      1.00         8
weighted avg       1.00      1.00      1.00         8

