In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# <center>Supervised Machine Learning Algorithms</center>
---

### Q1: What is a Support Vector Machine? Related To: SVM
Support vector machines (SVMs) are a set of supervised learning methods used for **classification, regression** and **outlier detection**. They can handle both linear and non-linear problems by using different kernel functions.

Calculating the high-dimensional relationships between every pair of points without actually transforming the data into the higher-dimensional space is known as **`The Kernel Trick`**. It reduces the computation an SVM needs, because it avoids the math of transforming the data from low dimensions to high dimensions.
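As a minimal sketch of the idea, the snippet below compares an explicit degree-2 feature map with the matching polynomial kernel. The helper names `phi` and `poly2_kernel` and the two sample vectors are assumptions made for illustration only; they are not part of any library.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b) ** 2."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Route 1: map both points into 3-D, then take the dot product there.
explicit = float(np.dot(phi(a), phi(b)))

# Route 2: stay in the original 2-D space and apply the kernel.
implicit = poly2_kernel(a, b)

print(explicit, implicit)  # both are (up to rounding) 121
```

Both routes give the same number, but the second one never builds the higher-dimensional vectors, which is the saving the kernel trick provides.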

**Keywords:**
* SVC: **`Support Vector Classification`**
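* NuSVC: **`Nu-Support Vector Classification`** ---> Similar to SVC, but uses a parameter `nu` to control the number of support vectors: `nu` is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.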
* LinearSVC: **`Linear Support Vector Classification`**
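The three classes can be tried side by side. The following is a rough sketch that assumes scikit-learn is installed (as it is in the Kaggle image) and uses a synthetic two-cluster dataset; the cluster centers, `C`, and `nu` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, NuSVC, LinearSVC

# Synthetic data: two well-separated 2-D clusters (illustrative only)
X, y = make_blobs(
    n_samples=100, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0
)

models = {
    "SVC": SVC(kernel="rbf", C=1.0),
    "NuSVC": NuSVC(kernel="rbf", nu=0.1),
    "LinearSVC": LinearSVC(C=1.0, max_iter=10000),
}

train_accuracy = {}
for name, model in models.items():
    model.fit(X, y)
    train_accuracy[name] = model.score(X, y)  # accuracy on the training data
    print(name, train_accuracy[name])
```

`SVC` and `NuSVC` accept a `kernel` argument, while `LinearSVC` always fits a linear decision boundary.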

## Kernel Functions
Kernel functions are mathematical functions that measure how similar two input vectors are, as if the data had been transformed into a higher-dimensional space where the classes are easier to separate. Different kernel functions suit different types of problems. Some common kernel functions are:

1. **Linear kernel:** This is the simplest kernel function that computes the dot product of two vectors. It is suitable for linearly separable problems.
2. **Polynomial kernel:** This is a more general kernel function that computes a polynomial function of the dot product of two vectors. It can capture nonlinear relationships between features.
3. **Radial basis function (RBF) kernel:** This is a popular kernel function that computes an exponential function of the squared distance between two vectors. It can handle complex and nonlinear problems.
4. **Sigmoid kernel:** This is a kernel function that computes a sigmoid (hyperbolic tangent) function of the dot product of two vectors. It resembles the activation functions used in neural networks.

There are also other kernel functions, such as the **Laplacian** and **cosine similarity** kernels (the **Gaussian** kernel is another name for the RBF kernel). The choice of kernel function depends on the data and the problem domain.
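The four common kernels can be written out directly in NumPy. This is a sketch for intuition rather than a replacement for a library implementation; the function names are hypothetical, and the parameter names `gamma`, `coef0`, and `degree` follow the conventions scikit-learn uses for the same formulas.

```python
import numpy as np

def linear_kernel(x, z):
    """Dot product of the two vectors."""
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree"""
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2), based on the squared Euclidean distance."""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """tanh(gamma * <x, z> + coef0)"""
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

print("linear    :", linear_kernel(x, z))
print("polynomial:", polynomial_kernel(x, z))
print("rbf       :", rbf_kernel(x, z))
print("sigmoid   :", sigmoid_kernel(x, z))
```

Note that the RBF kernel of a point with itself is always 1, and it decays toward 0 as the two points move apart.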

## Advantages of Support Vector Machines
The **advantages** of support vector machines are:

1. Effective in high-dimensional spaces (data with many features).

2. Still effective in cases where the number of dimensions (features) is greater than the number of samples.

3. **Versatile:** different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

4. **Memory efficient:** an SVM uses only a subset of the training data (called **support vectors**) in the decision function. *Support vectors are the points closest to the hyperplane, and they alone define the position and orientation of the hyperplane that separates the classes.* The other training points do not affect the decision function, so the model only needs to store the support vectors and not all the training data.

5. SVMs tolerate outliers and can handle overlapping classes, because a soft margin allows some misclassifications.
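Point 4 can be checked on a fitted model. The sketch below assumes scikit-learn and a synthetic dataset; it only counts how many training points end up as support vectors, using the `support_vectors_` attribute of a fitted `SVC`.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only)
X, y = make_blobs(
    n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0
)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

n_samples = X.shape[0]
n_support = clf.support_vectors_.shape[0]  # rows are the stored support vectors
print(f"{n_support} of {n_samples} training points are support vectors")
```

On well-separated data like this, only a small fraction of the points is kept; heavily overlapping classes would produce many more support vectors.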


## Disadvantages of Support Vector Machines

The **disadvantages** of support vector machines include:

1. If the number of features is much greater than the number of samples, the kernel function and the regularization term must be chosen carefully to avoid over-fitting. Kernel functions map the data into a higher-dimensional space, where it can be separated more easily by a hyperplane.
    However, if the kernel function is too complex or flexible, it may fit the training data too closely and create a decision boundary that is too irregular or wiggly. This can lead to poor performance on new data that does not follow the same patterns as the training data. The regularization term is a parameter that controls the trade-off between fitting the training data well and keeping the model simple and smooth. A larger regularization term means more penalty for complex models, and a smaller regularization term means less penalty. If the regularization term is too small, the model may over-fit the training data and have high variance. If the regularization term is too large, the model may under-fit the training data and have high bias. Therefore, we need to find a balance between complexity and simplicity that keeps the error low on both training and test data.

2. SVMs do not directly provide probability estimates. In scikit-learn, these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).
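Both points can be sketched with scikit-learn, assuming a synthetic noisy dataset and an arbitrary parameter grid chosen for illustration. One caveat when mapping the text to code: scikit-learn's `C` parameter is inversely proportional to the regularization strength, so a small `C` corresponds to a large regularization term.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic, noisy two-class data (illustrative only)
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Point 1: choose the kernel flexibility (gamma) and regularization (C)
# by 5-fold cross-validation instead of guessing.
param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)

# Point 2: decision_function gives raw scores; probability=True adds an
# internal cross-validation step at fit time to calibrate probabilities.
clf = SVC(kernel="rbf", probability=True, random_state=0, **search.best_params_)
clf.fit(X_train, y_train)
scores = clf.decision_function(X_test[:3])
proba = clf.predict_proba(X_test[:3])
print(scores)
print(proba)
```

If only a ranking or a class label is needed, the raw `decision_function` scores avoid the extra fitting cost of `probability=True`.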

## Terminologies in SVM
![image.png](attachment:0b896002-9519-4400-8d5b-f372044f0893.png)

* **`Margin:`** is the shortest distance between the observations and the threshold. The margin is largest when the threshold is placed exactly midway between the closest observations of the two classes. A classifier that uses this threshold is called a **`Maximal Margin Classifier`** (it is extremely sensitive to outliers), and the plane is called the **`Maximum Margin Hyperplane`**.
* **`Soft Margin:`** is the distance between the observations and the threshold when we allow misclassifications. **Cross Validation** determines which Soft Margin is the optimal one.
* When we use a Soft Margin to determine the location of the threshold, we are using a **`Soft Margin Classifier`**, also known as a **`Support Vector Classifier`**, to classify observations. The name comes from the fact that the observations on the edge of or within the Soft Margin are called **`Support Vectors`**.
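The effect of the soft margin can be sketched with a linear `SVC` in scikit-learn, where the parameter `C` sets how strongly misclassifications are penalized. The dataset and the three `C` values below are assumptions chosen for illustration. For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic, slightly overlapping two-class data (illustrative only)
X, y = make_blobs(
    n_samples=200, centers=[[-1.5, -1.5], [1.5, 1.5]], cluster_std=1.2,
    random_state=0,
)

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(clf.coef_)  # distance between margins
    n_support = int(clf.n_support_.sum())           # support vectors, both classes
    # 5-fold cross-validated accuracy, used to compare the candidate margins
    cv_accuracy = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5).mean()
    results[C] = (margin_width, n_support, cv_accuracy)
    print(f"C={C}: margin={margin_width:.3f}, "
          f"support vectors={n_support}, cv accuracy={cv_accuracy:.3f}")
```

A small `C` allows more misclassifications, which gives a wider soft margin with more support vectors inside it; a large `C` moves toward the hard-margin behaviour. The cross-validated accuracy is what would be used to pick between them.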

![image.png](attachment:3f445131-594b-4765-9fce-a62d54a55bbc.png)
