### **SVM**
svm (support vector machine) is a binary classification algorithm whose goal is to find the best possible boundary (hyperplane) that separates two classes.

#### **hyperplane**
a hyperplane is the decision boundary
- 2D -> hyperplane = a line
- 3D -> hyperplane = a plane
- n-dim -> hyperplane = an (n-1)-dimensional flat surface; it exists mathematically but can't be visualised
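as a small illustration, a hyperplane can be written as w·x + b = 0, and the side a point falls on is given by the sign of w·x + b. the weights `w`, bias `b` and the sample points below are made-up values for this sketch, not taken from a trained model.

```python
import numpy as np

# hypothetical hyperplane in 2D: x1 + x2 - 3 = 0 (a line)
w = np.array([1.0, 1.0])
b = -3.0

points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 1.0]])

# signed score: positive on one side of the line, negative on the other
scores = points @ w + b
labels = np.where(scores >= 0, 1, -1)

print(scores)  # [-3. -1.  3.  2.]
print(labels)  # [-1 -1  1  1]
```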

#### **support vectors**
these are the data pts that are closest to the hyperplane

they:
- define the boundary
- determine where the hyperplane sits
- if they move even slightly -> the hyperplane moves

hence these are really critical (points far from the boundary can move without changing the hyperplane)
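a minimal sketch of how the support vectors can be inspected after fitting, assuming scikit-learn's `SVC` with a linear kernel. the four toy points and the value `C=1000.0` are illustrative choices, not from these notes.

```python
import numpy as np
from sklearn.svm import SVC

# toy data: two points per class along the diagonal
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

# a large C keeps the fit close to a hard margin on this separable data
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

# only the points closest to the boundary are kept as support vectors;
# on this data these should be the rows [1, 1] and [3, 3]
print(clf.support_vectors_)
print(clf.support_)  # their indices in X
```

the outer points (0, 0) and (4, 4) could be moved further away without changing the fitted boundary.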

#### **what does SVM optimize?**
svm wants the hyperplane w/ the maximum margin. the margin is the dist btw the hyperplane and the closest data pts (the support vectors)
- **why?**
    - a larger margin = more robust to future data
    - a smaller margin = sensitive to small changes in the data, prone to overfitting

A large margin helps because:
- it reduces the model’s sensitivity to tiny shifts or noise in the data.
- it lowers overfitting: the classifier isn't “hugging” specific points.
- it generalizes better to new, unseen samples, because the boundary is placed in a more stable region of the feature space.

*"SVM prefers the hyperplane with the largest possible margin because it leads to a more stable, robust, and generalizable classifier."*
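to make the margin idea concrete, here is a sketch that measures the margin of two candidate hyperplanes as the distance to the closest point, |w·x + b| / ||w||. the helper `margin`, the toy points and both candidate lines are hypothetical, chosen only for illustration.

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])

def margin(w, b, X):
    """Distance from the hyperplane w.x + b = 0 to the closest point in X."""
    w = np.asarray(w, dtype=float)
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

# two hypothetical lines that both separate {(0,0), (1,1)} from {(3,3), (4,4)}
m_centered = margin([1.0, 1.0], -4.0, X)  # passes midway between the classes
m_skewed = margin([1.0, 1.0], -2.5, X)    # sits close to the point (1, 1)

print(m_centered, m_skewed)
```

both lines classify the toy points correctly, but the centered one leaves more room on each side, which is the one a max-margin classifier would prefer.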

#### **linearly separable vs non-linear data**
sometimes data cannot be separated w/ a straight line
##### **how does SVM handle this?**
##### using kernels (kernel trick)
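- map data from low dimension -> higher dimension (the kernel does this implicitly: it computes inner products in the higher-dimensional space without building the new features)
- in higher dimension the data can become linearly separable
- svm then finds a linear hyperplane there
- the decision boundary is still non-linear in the original space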
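a small sketch of why this is called a "trick": for a degree-2 polynomial kernel, the kernel value computed in the original 2D space equals the inner product after an explicit mapping to 3D. the feature map `phi` and the two sample vectors are illustrative assumptions.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D point (illustrative choice)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = float(np.dot(phi(a), phi(b)))  # inner product in the 3D feature space
implicit = poly_kernel(a, b)              # same value, computed in the original 2D space

print(explicit, implicit)
```

the kernel never builds `phi(a)` or `phi(b)`, which matters when the mapped space is very large or infinite.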

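**example:** original features (x, y) -> add a new feature z = x² + y²; points near the origin get a small z and points far from it get a large z, so a flat threshold on z can separate an inner class from an outer one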
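a sketch of the z = x² + y² idea on made-up data: two concentric rings that no straight line can separate in (x, y). the radii, the number of points and the threshold are assumptions for illustration.

```python
import numpy as np

angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])

# made-up data: class 0 on a circle of radius 1, class 1 on a circle of radius 3
inner = ring * 1.0
outer = ring * 3.0

def add_z(points):
    """Append the new feature z = x^2 + y^2 as a third column."""
    z = (points ** 2).sum(axis=1, keepdims=True)
    return np.hstack([points, z])

inner3d = add_z(inner)
outer3d = add_z(outer)

# in z a single threshold (a flat plane in 3D) separates the two rings
threshold = 5.0  # any value between 1 (inner z) and 9 (outer z) works
print((inner3d[:, 2] < threshold).all(), (outer3d[:, 2] > threshold).all())  # True True
```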

##### **1. linear kernel**
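- decision boundary is a straight line (a flat hyperplane in higher dimensions)
- works when data is linearly separable, or close to it
- fastest, simple
- too simple when the true boundary is non-linear (still a common choice for high-dimensional, sparse data such as text)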

##### **2. polynomial kernel**
- expands features to polynomials
- can capture curved boundaries
- but computationally expensive

##### **3. rbf/gaussian kernel**
- implicitly maps points into an infinite-dimensional feature space
- great for irregular boundaries
- often a strong default and frequently the most accurate of the three, though that depends on the data
- but can overfit if gamma is too high
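a rough comparison of the three kernels, assuming scikit-learn's `make_circles` toy dataset (not part of these notes). the scores are training accuracy, shown only to illustrate which kernels can bend the boundary, not as a proper evaluation.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric rings: not separable by a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

models = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=2),
    "rbf": SVC(kernel="rbf"),
}

scores = {}
for name, clf in models.items():
    clf.fit(X, y)
    scores[name] = clf.score(X, y)  # training accuracy, for illustration only

print(scores)
```

on data like this the linear kernel is expected to do little better than guessing, while rbf should separate the rings.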

#### **hard margin vs soft margin**
##### **hard margin**
- no misclassification is allowed
- only works when the data is perfectly (linearly) separable
- very sensitive to outliers -> rarely used
##### **soft margin**
- allows some points inside the margin, or even on the wrong side of the boundary, **using slack variables *ξi***
$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \quad \frac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \xi_i
$$

$$
\text{subject to:} \quad 
y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i,
\qquad 
\xi_i \ge 0,\quad i = 1,2,\dots,n
$$
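a sketch of how the objective above can be evaluated for a given hyperplane: the smallest slack that satisfies both constraints is ξ_i = max(0, 1 - y_i(w·x_i + b)). the values of `w`, `b`, `C` and the toy points are hypothetical, and the labels must be -1 / +1 for the formula to hold.

```python
import numpy as np

# hypothetical parameters and toy data; labels are -1 / +1
w = np.array([1.0, 0.0])
b = 0.0
C = 1.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-0.5, 0.0], [-3.0, 1.0]])
y = np.array([1, 1, 1, -1])

# smallest slack satisfying y_i (w.x_i + b) >= 1 - xi_i and xi_i >= 0
functional_margin = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - functional_margin)

# soft-margin objective: 0.5 * ||w||^2 + C * sum(xi)
objective = 0.5 * np.dot(w, w) + C * xi.sum()

print(xi)         # [0.  0.5 1.5 0. ]
print(objective)  # 2.5
```

here the second point is on the correct side but inside the margin (0 < ξ < 1) and the third is misclassified (ξ > 1).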

#### **hyperparameters in SVM**
##### -> **1. *C* parameter**
controls tradeoff between:
- maximising margin
- minimising misclassification
- **HIGH C** -> smaller margin (marginal dist decreases), fewer misclassified training points (errors decrease) -> risk of overfitting
- **LOW C** -> bigger margin (marginal dist increases), more misclassification allowed (errors increase) -> generalizes better, but might underfit if C is too small


**slack variable *ξi* values**
- if *correctly classified and on or outside the margin*: $$\xi_i = 0$$
- if *correctly classified but inside the margin*: $$0 < \xi_i < 1$$
- if *misclassified*: $$\xi_i > 1$$ (a point exactly on the hyperplane has $$\xi_i = 1$$)
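the C tradeoff can be sketched by fitting a linear SVM with a low and a high C on overlapping data and comparing the margin width 2 / ||w||. the dataset (scikit-learn's `make_blobs` with hand-picked centers) and the two C values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two overlapping clusters, so some slack is unavoidable
X, y = make_blobs(n_samples=100, centers=[[0.0, 0.0], [3.0, 3.0]],
                  cluster_std=1.5, random_state=0)

widths = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2.0 / np.linalg.norm(clf.coef_)  # margin width = 2 / ||w||
    n_sv = int(clf.n_support_.sum())
    print(f"C={C}: margin width={widths[C]:.3f}, support vectors={n_sv}")
```

the low-C model should report the wider margin, usually with more points ending up as support vectors.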

##### -> **2. *Gamma* parameter (RBF; scikit-learn also uses it for the poly and sigmoid kernels)**
controls how far the influence of each training point reaches
- **HIGH *GAMMA*** -> each point only influences its close neighbourhood, so the decision boundary bends tightly around individual points -> overfitting

- **LOW *GAMMA*** -> each point influences a wide region, so the decision boundary is more generic/smoother -> can underfit
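a sketch of the gamma effect with an RBF kernel, assuming scikit-learn's `make_moons` toy data and a 50/50 train/test split. the three gamma values are arbitrary picks to show the low, middle and high ends.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

results = {}
for gamma in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    # (train accuracy, test accuracy) for this gamma
    results[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={results[gamma][0]:.2f}, test={results[gamma][1]:.2f}")
```

a very high gamma tends to give near-perfect training accuracy with a clearly lower test accuracy, which is the overfitting pattern described above.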

#### **implementation flow - code**

In [None]:
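from sklearn.svm import SVC

# x_train, y_train and x_test are assumed to come from an earlier train/test split
# SVC() defaults: kernel="rbf", C=1.0, gamma="scale"
svc = SVC()
modelSVC = svc.fit(x_train, y_train)   # fit() returns the fitted estimator itself
prediction = modelSVC.predict(x_test)  # predicted class labels for x_test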
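a fuller end-to-end sketch of the same flow, under some assumptions that are not in these notes: scikit-learn's bundled breast cancer dataset as the data, an 80/20 split, and a `StandardScaler` step (added here because SVMs are sensitive to feature scale).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# binary classification dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# scale features, then fit an RBF SVM with the default-style hyperparameters
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(x_train, y_train)

prediction = model.predict(x_test)
accuracy = accuracy_score(y_test, prediction)
print(accuracy)
```

`C` and `gamma` are left at their defaults here; in practice they are usually tuned, for example with a grid search over a validation split.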