![non-linear.png](attachment:non-linear.png)


https://www.kaggle.com/code/prashant111/svm-classifier-tutorial

## Support Vector Machines (SVM)

- Understanding Vectors
- Decision Boundary
- What are Support Vectors & Hyperplane
- What is Support Vector Machine?
- Working of SVM
- Kernels and Types of Kernel
- Hard Margin & Soft Margin
- SVM for Multi-Class Classification

[SVM](<../ml2/15 & 16 June-20241006T131223Z-001/01 & 02nd June/ML-15 SVM complete.pdf>)

https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/ 

https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/ (covers the maths)


![img](https://vitalflux.com/wp-content/uploads/2022/08/support-vector-machine-1-640x354.png)

### Support Vector Machines (SVM)

#### Understanding Vectors

In the context of machine learning, vectors are mathematical objects that have both a magnitude and a direction. In a multi-dimensional space, vectors represent data points. For example, in a 2D space, each point can be represented as a vector \((x, y)\).
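
As a small illustration of magnitude and direction, the sketch below treats one 2D data point as a NumPy vector. The values are made up for the example.

```python
import numpy as np

# A 2D data point treated as a vector (illustrative values)
v = np.array([3.0, 4.0])

magnitude = np.linalg.norm(v)   # Euclidean length: sqrt(3^2 + 4^2)
direction = v / magnitude       # unit vector pointing the same way as v

print(magnitude)   # 5.0
print(direction)   # [0.6 0.8]
```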

#### Decision Boundary

The **decision boundary** is a line (in 2D), plane (in 3D), or hyperplane (in higher dimensions) that separates different classes in the feature space. Every classifier learns some decision boundary; what distinguishes SVM is that it looks for the boundary that maximizes the margin, the distance to the closest points of each class.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### What are Support Vectors & Hyperplane

- **Support Vectors**: These are the data points that are closest to the decision boundary. They are critical because they define the position and orientation of the hyperplane. If you remove any other data points (not support vectors), the decision boundary remains unchanged.
  
- **Hyperplane**: A hyperplane is a flat affine subspace of one dimension less than its ambient space. For example:
  - In 2D, it is a line.
  - In 3D, it is a plane.
  - In n-dimensional space, it is an n-1 dimensional hyperplane.

The hyperplane is defined by the equation:
\[ w \cdot x + b = 0 \]
where \( w \) is the weight vector, \( x \) is the feature vector, and \( b \) is the bias term.
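
The sketch below ties these ideas together on a tiny, hand-made dataset (the points are hypothetical and chosen so that the classes are clearly separable). It reads \( w \) and \( b \) from a fitted linear `SVC`, classifies by the sign of \( w \cdot x + b \), and then refits using only the support vectors to check that the hyperplane stays where it was.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values for illustration)
X = np.array([[0, 0], [-1, -1], [-2, 0], [0, -2],             # class -1
              [2, 2], [3, 3], [2, 4], [4, 2]], dtype=float)   # class +1
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)

# The sign of w . x + b gives the predicted class
scores = X @ w + b
manual_pred = np.where(scores > 0, 1, -1)
print("manual == predict:", np.array_equal(manual_pred, clf.predict(X)))

# Refit on the support vectors only: the hyperplane should not move
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])
print("same hyperplane:", np.allclose(clf_sv.coef_, clf.coef_, atol=1e-2))
```

For this data the closest points of the two classes are (0, 0) and (2, 2), so the fitted hyperplane should be close to \( x_1 + x_2 = 2 \), up to the solver's numerical tolerance.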

#### What is Support Vector Machine?

A **Support Vector Machine (SVM)** is a supervised machine learning algorithm used for classification and regression tasks. It aims to find the hyperplane that best separates data points of different classes while maximizing the margin between them. 

#### Working of SVM

1. **Data Representation**: Data points are represented as vectors in a high-dimensional space.
2. **Finding the Optimal Hyperplane**: SVM searches for the hyperplane that maximizes the margin between the closest data points of each class (the support vectors).
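3. **Optimization Problem**: For linearly separable data (the hard-margin case), the optimization problem can be expressed as:
   - Maximize the margin \( \frac{2}{\|w\|} \), which is equivalent to minimizing \( \frac{1}{2}\|w\|^2 \)
   - Subject to the constraints \( y_i (w \cdot x_i + b) \geq 1 \) for every training sample
   - Where \( y_i \) is the class label (+1 or -1) and \( x_i \) is the \( i \)-th training sample.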
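
A short derivation of why the margin equals \( \frac{2}{\|w\|} \), assuming the hyperplane is scaled so that the closest point of each class satisfies \( w \cdot x + b = \pm 1 \). The symbols \( x_+ \) and \( x_- \) are introduced here only for illustration: they are points on the two margin boundaries.

```latex
\begin{aligned}
% x_+ lies on the margin boundary of class +1, x_- on that of class -1
w \cdot x_+ + b &= +1, \qquad w \cdot x_- + b = -1 \\
% subtract the second equation from the first
w \cdot (x_+ - x_-) &= 2 \\
% project x_+ - x_- onto the unit normal w / \|w\| to get the margin width
\text{margin} &= \frac{w}{\|w\|} \cdot (x_+ - x_-) = \frac{2}{\|w\|} \\
% so maximizing the margin is the same problem as
\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad &\text{subject to} \quad y_i (w \cdot x_i + b) \geq 1 \ \ \text{for all } i
\end{aligned}
```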

#### Kernels and Types of Kernel

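SVM can be extended to non-linear classification using **kernels**. A kernel function computes the inner product of two points as if they had been mapped into a higher-dimensional space, without constructing that mapping explicitly (the kernel trick). In that space a linear hyperplane can separate data that is not linearly separable in the original features. Common types of kernels include: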
![](https://i0.wp.com/spotintelligence.com/wp-content/uploads/2024/05/data-transformed-svm.jpg?resize=1080%2C608&ssl=1)
1. **Linear Kernel**: Suitable for linearly separable data.
   \[ K(x_i, x_j) = x_i \cdot x_j \]

2. **Polynomial Kernel**: Captures interactions between features.
   \[ K(x_i, x_j) = (x_i \cdot x_j + c)^d \]
   where \( c \) is a constant and \( d \) is the degree.

3. **Radial Basis Function (RBF) Kernel**: Effective in many cases, especially with non-linear data.
   \[ K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2} \]
   where \( \gamma \) is a parameter that defines the spread of the kernel.

4. **Sigmoid Kernel**: Similar to a neural network activation function.
   \[ K(x_i, x_j) = \tanh(\alpha (x_i \cdot x_j) + c) \]
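
The four formulas above can be evaluated by hand and cross-checked against the pairwise kernel helpers in `sklearn.metrics.pairwise`. This is a minimal sketch: the two vectors and the values of \( c \), \( d \), \( \gamma \), and \( \alpha \) are arbitrary choices for the demonstration. Note that scikit-learn names the scale parameter `gamma` and the offset `coef0` for the polynomial and sigmoid kernels.

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

# Two illustrative feature vectors (hypothetical values), as 1 x 2 arrays
xi = np.array([[1.0, 2.0]])
xj = np.array([[3.0, 0.5]])

# Hyperparameters chosen only for the demonstration
c, d, gamma, alpha = 1.0, 2, 0.5, 0.1

dot = (xi @ xj.T).item()            # x_i . x_j
sq_dist = np.sum((xi - xj) ** 2)    # ||x_i - x_j||^2

k_linear = dot
k_poly = (dot + c) ** d
k_rbf = np.exp(-gamma * sq_dist)
k_sigmoid = np.tanh(alpha * dot + c)

# Cross-check each hand-computed value against scikit-learn
assert np.isclose(k_linear, linear_kernel(xi, xj)[0, 0])
assert np.isclose(k_poly, polynomial_kernel(xi, xj, degree=d, gamma=1.0, coef0=c)[0, 0])
assert np.isclose(k_rbf, rbf_kernel(xi, xj, gamma=gamma)[0, 0])
assert np.isclose(k_sigmoid, sigmoid_kernel(xi, xj, gamma=alpha, coef0=c)[0, 0])

print(k_linear, k_poly, k_rbf, k_sigmoid)
```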

![image-4.png](attachment:image-4.png)

![image-3.png](attachment:image-3.png)

![image-5.png](attachment:image-5.png)


#### Hard Margin & Soft Margin

![image-6.png](attachment:image-6.png)

- **Hard Margin SVM**: Assumes that the data is perfectly linearly separable. It finds the hyperplane with the maximum margin while allowing no misclassified points. If the data is not separable, no such hyperplane exists, and even when it is separable a single noisy point or outlier can strongly shift the boundary, which leads to overfitting.

- **Soft Margin SVM**: Allows for some misclassifications in order to achieve a better generalization. It introduces a penalty for misclassifications, controlled by the parameter \( C \):
  - A larger \( C \) places more emphasis on correctly classifying all training samples.
  - A smaller \( C \) allows some misclassifications in favor of a larger margin.

C acts as a regularization parameter: the strength of the regularization is inversely proportional to C. Large values of C impose large penalties on misclassified or margin-violating points, which narrows the margin and can overfit. Smaller values of C tolerate more violations in exchange for a wider margin, which often generalizes better to new data points.

We can therefore use the parameter C to control the width of the margin and tune the bias-variance trade-off, as shown in the picture below:
![](https://www.bogotobogo.com/python/scikit-learn/images/svm/bias-variance-trade-off-svm.png)

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html 
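
The effect of C on the margin can be checked numerically. The sketch below is an illustration rather than a benchmark: it fits a linear `SVC` on the two-class Iris subset (classes 0 and 1, first two features) for three arbitrary values of C and reports the margin width \( \frac{2}{\|w\|} \) and the number of support vectors.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Binary subset of Iris (classes 0 and 1), first two features
iris = datasets.load_iris()
mask = iris.target < 2
X, y = iris.data[mask, :2], iris.target[mask]

widths = {}
for C in (0.01, 1, 100):   # arbitrary values spanning weak to strong penalties
    clf = SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width 2 / ||w||
    n_sv = int(clf.n_support_.sum())
    print(f"C={C:<5} margin width={widths[C]:.3f}  support vectors={n_sv}")
```

The margin width should shrink as C grows, matching the description above.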

#### SVM for Multi-Class Classification

SVM is inherently a binary classifier, but it can be extended to multi-class classification using strategies like:

1. **One-vs-One (OvO)**: Trains a binary classifier for every pair of classes. For \( k \) classes, this results in \( \frac{k(k-1)}{2} \) classifiers.
  
2. **One-vs-Rest (OvR)**: Trains a single binary classifier for each class, distinguishing that class from all others. For \( k \) classes, this results in \( k \) classifiers.
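
The classifier counts can be confirmed with scikit-learn's `OneVsOneClassifier` and `OneVsRestClassifier` wrappers. This sketch assumes a synthetic dataset with \( k = 5 \) classes generated by `make_blobs`; the dataset and the choice of a linear kernel are only for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

k = 5  # number of classes in the synthetic data
X, y = make_blobs(n_samples=200, centers=k, random_state=0)

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_))   # k(k-1)/2 = 10 pairwise classifiers
print(len(ovr.estimators_))   # k = 5 one-vs-rest classifiers

# SVC itself trains one-vs-one internally; with decision_function_shape="ovo"
# the decision function exposes one column per pair of classes.
svc = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(svc.decision_function(X).shape)   # (200, 10)
```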

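#### When should we use SVM?

1. When there is not too much training data
   - Training kernel SVMs is computationally heavy: fit time grows at least quadratically with the number of samples.
   - Roughly a million instances is an upper bound for training SVMs, and kernel SVMs often become slow well before that.
2. When the data has a geometric interpretation
   - Computer vision problems
3. When we need high precision

Note: SVMs need parameter tuning (for example `C`, the kernel, and `gamma`).

### Conclusion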

Support Vector Machines are powerful tools for both linear and non-linear classification tasks. Understanding the concepts of support vectors, hyperplanes, kernels, and margin types is crucial for effectively applying SVM in real-world problems.


https://www.bogotobogo.com/python/scikit-learn/scikit_machine_learning_Support_Vector_Machines_SVM.php

In [None]:
!pip install scikit-learn

In [None]:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use first two features (sepal length and sepal width)
y = iris.target

y[y < 2]  # preview the labels of classes 0 and 1 (does not modify y)

In [None]:
y

In [None]:

# Filter data to keep only TWO classes (e.g., classes 0 and 1)
mask = y < 2  # Binary classification (exclude class 2)
X = X[mask]
y = y[mask]

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
# Initialize SVM classifier (for binary classification)
svm_classifier = SVC(kernel='linear', C=1)

In [None]:
# Train the classifier
svm_classifier.fit(X_train, y_train)

In [None]:
# Make predictions on the test data
y_pred = svm_classifier.predict(X_test)

In [None]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In [None]:
# Visualize decision boundaries (for binary classes)
def plot_decision_boundary(X, y, classifier):
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # meshgrid builds a rectangular grid from an array of x values and an array of y values.
    # https://stackoverflow.com/questions/36013063/what-is-the-purpose-of-meshgrid-in-numpy
    # Predict the class for each point in the grid
    Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot the decision regions as filled contours
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

    # contourf() draws filled contours, while contour() draws only the contour lines.
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contourf.html
    # Here Z holds the predicted class at each grid point, so the filled regions
    # show which class the model assigns to each part of the x-y plane.
    # Plot training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.title('SVM Decision Boundary (Binary Classification)')
    plt.show()

# Plot decision boundary
plot_decision_boundary(X_train, y_train, svm_classifier)

In [None]:
# cancer = datasets.load_breast_cancer() another example for practice
# https://vitalflux.com/classification-model-svm-classifier-python-example/

To implement Support Vector Machines (SVM) for multi-class classification, we can use the Iris dataset, which contains three classes of iris flowers. Here’s how to set up the model using `scikit-learn` and evaluate its performance.

### SVM for Multi-Class Classification Example

In [None]:
iris = datasets.load_iris()
iris.target_names

In [None]:
y = iris.target
y

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Support Vector Classifier with an RBF kernel
svm_classifier = SVC(kernel='rbf', random_state=42, C=1)

# Train the model
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred = svm_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_report(y_test, y_pred))

# Plot the decision boundary
def plot_decision_boundary(model, X, y):
    # Create a grid to plot the decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))

    # Predict the class for each point in the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plotting
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.title("SVM Decision Boundary with Multi-Class Classification")
    plt.show()

# Call the function to plot the decision boundary
plot_decision_boundary(svm_classifier, X, y)


### Explanation of the Code

1. **Import Libraries**: Necessary libraries are imported for data manipulation, model training, and visualization.

2. **Load the Dataset**: The Iris dataset is loaded, and only the first two features are selected for easy visualization.

3. **Split the Data**: The dataset is split into training (70%) and testing (30%) sets using `train_test_split`.

4. **Create the SVM Classifier**: An SVM classifier is created with an RBF kernel (`kernel='rbf'`) and `C=1`. `SVC` handles the three Iris classes directly, so no extra multi-class wrapper is needed.

5. **Train the Model**: The model is trained on the training data.

6. **Make Predictions**: The model makes predictions on the test set.

7. **Evaluate Performance**: The accuracy and a classification report (including precision, recall, and F1 score) are printed for all classes.

8. **Plot Decision Boundary**: A function is defined to visualize the decision boundary created by the SVM model. It creates a mesh grid of points, predicts the class for each point, and plots the results along with the actual data points.

### Conclusion

This example illustrates how to implement SVM for multi-class classification using `scikit-learn`. You can experiment with different kernels, such as the linear or polynomial kernel, and adjust hyperparameters to see how they affect model performance. This approach is applicable to many datasets beyond the Iris dataset!
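
One way to run that experiment is to score each kernel with 5-fold cross-validation. This is a minimal sketch, assuming the same two Iris features and `C=1` as in the example; the kernel list and fold count are arbitrary choices, and the other hyperparameters are left at their defaults.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target   # first two features, all three classes

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Mean accuracy over 5 cross-validation folds for this kernel
    cv_scores = cross_val_score(SVC(kernel=kernel, C=1), X, y, cv=5)
    scores[kernel] = cv_scores.mean()
    print(f"{kernel:<8} mean CV accuracy = {scores[kernel]:.3f}")
```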

# Cross-Validation

In [None]:
from sklearn.model_selection import GridSearchCV
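# Note: gamma only affects the 'rbf' kernel here; the 'linear' kernel ignores it,
# so those grid combinations repeat the same linear fit.
params = {'C': [1, 2, 3, 4, 5],
          'gamma': [0.001, 0.003, 0.2, 0.4, 1],
          'kernel': ['linear', 'rbf']}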

params


In [None]:
grid = GridSearchCV(svm_classifier, param_grid=params, cv=5, verbose=3)

In [None]:
grid

In [None]:
grid.fit(X_train, y_train)

In [None]:
grid.best_params_

In [None]:
grid.best_score_

# Support Vector Regression (SVR)
Here’s how to implement Support Vector Regression (SVR) using the `scikit-learn` library in Python. The first example fits SVR to a small hand-built table of ages and salaries; the later examples use a simple synthetic dataset.


In [None]:
# https://www.analyticsvidhya.com/blog/2020/03/support-vector-regression-tutorial-for-machine-learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Creating the DataFrame
data = {
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 80000, 93000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0]
}

df = pd.DataFrame(data)

# Extracting features and target variable
X = df.iloc[:, 2:3].values  # Age as feature
y = df.iloc[:, 3].values.reshape(-1, 1)  # EstimatedSalary as target (reshaped for scaling)

# Feature Scaling
sc_X = StandardScaler()
sc_y = StandardScaler()

X_scaled = sc_X.fit_transform(X)  # Scale X (Age)
y_scaled = sc_y.fit_transform(y).flatten()  # Scale y (Salary) and flatten to 1D array

# Training SVR model
regressor = SVR(kernel='rbf')
regressor.fit(X_scaled, y_scaled)

# Predicting for a specific age (e.g., 26)
scaled_input = sc_X.transform([[26]])  # Scale input with the scaler fitted on Age
y_pred_scaled = regressor.predict(scaled_input)  # Predict using scaled input
y_pred_actual = sc_y.inverse_transform(y_pred_scaled.reshape(-1, 1))  # Convert back to original scale

print("Predicted Salary for Age 26:", y_pred_actual[0][0])


## Example 2

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

# generate synthetic data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# add some noise to every fifth target value
y[::5] += 3 * (0.5 - np.random.rand(8))

# create an SVR model with a linear kernel
svr = SVR(kernel='linear')
svr.fit(X, y)
y_pred = svr.predict(X)

# plot the predicted values against the true values
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_pred, color='cornflowerblue', label='prediction')
plt.legend()
plt.show()


#### Fitting SVR on Sine Curve Data Using RBF Kernels

Next, we fit an SVR model with an RBF (Radial Basis Function) kernel. Because the target is a sine curve, a non-linear kernel is expected to follow the data far more closely than the linear fit above.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

# generate synthetic data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# add some noise to every fifth target value
y[::5] += 3 * (0.5 - np.random.rand(8))

# create an SVR model with an RBF kernel
svr = SVR(kernel='rbf')
svr.fit(X, y)
y_pred = svr.predict(X)

# plot the predicted values against the true values
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_pred, color='cornflowerblue', label='prediction')
plt.legend()
plt.show()


https://towardsdatascience.com/machine-learning-basics-support-vector-regression-660306ac5226

https://www.scaler.com/topics/support-vector-regression/


### Explanation of the Code

1. **Import Libraries**: NumPy, Matplotlib, and `SVR` from `scikit-learn` are imported for data generation, plotting, and model training.

2. **Create Synthetic Dataset**: 40 points are sampled from a sine function on the interval [0, 5), and noise is added to every fifth target value. This simulates a regression problem.

3. **No Train/Test Split**: These examples fit and predict on the same 40 points, so the plots show how well each kernel fits the data rather than how well it generalizes. For a real evaluation, hold out a test set with `train_test_split`.

4. **Create the SVR Model**: An SVR model is instantiated with a linear kernel in Example 2 and with a radial basis function (RBF) kernel in the follow-up. You can also experiment with polynomial kernels.

5. **Train the Model**: The model is trained on the full synthetic dataset.

6. **Make Predictions**: Predictions are made for the same inputs that were used for training.

7. **Plotting the Results**: The actual data points and the predictions made by the SVR model are plotted for visualization.

### Conclusion

This example demonstrates how to implement Support Vector Regression using `scikit-learn`. You can experiment with different kernels, hyperparameters (like `C` and `gamma`), and datasets to better understand how SVR works and its applications in regression tasks!
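
To compare kernels on data the model has not seen, the sine example can be combined with a held-out test set. The sketch below is one possible setup, not the notebook's own: the sample size (200), the Gaussian noise level, the fixed seeds, and the `C` and `epsilon` values are all assumptions chosen to make the run repeatable.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic sine data with mild Gaussian noise (seed fixed for repeatability)
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((200, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# Hold out 30% of the points for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

results = {}
for kernel in ("linear", "rbf"):
    model = SVR(kernel=kernel, C=1.0, epsilon=0.1).fit(X_train, y_train)
    results[kernel] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{kernel:<6} test MSE = {results[kernel]:.4f}")
```

On this kind of data the RBF kernel should give a clearly lower test error than the linear kernel, since a straight line cannot follow the sine curve.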

### SVM Kernels

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

https://medium.com/@abhishekjainindore24/svm-kernels-and-its-type-dfc3d5f2dcd8

![image.png](attachment:image.png)

https://ankitnitjsr13.medium.com/math-behind-support-vector-machine-svm-5e7376d0ee4d

https://ankitnitjsr13.medium.com/math-behind-svm-kernel-trick-5a82aa04ab04

### Real-World Applications of SVM Kernels
SVM kernels are used in many real-world areas. In finance, simple linear kernels help with credit scoring and fraud detection because they are fast and easy to interpret. In biology, non-linear kernels such as RBF help predict protein structures and analyze gene data. For images, polynomial kernels are used to recognize objects in pictures from their details. In text tasks such as deciding whether a message is positive or negative, SVMs with different kernels are used. In healthcare, different SVM kernels help diagnose diseases and predict outcomes by finding patterns in medical data.

#### Questions

Q. Which kernel to use in SVR?

Ans. In SVR, you pick the kernel based on how complicated the data is and how the input and output variables relate. Common options are linear, polynomial, RBF, and sigmoid, but you might need to try a few to see which one works best for your regression problem.

Q. Why are kernel functions used?

Ans. Kernel functions let an SVM work in a transformed feature space where the data is easier to separate or fit. This lets the model find better boundaries between groups of data, handle complicated relationships between points, and make accurate predictions on a wider range of data.

Q. What are the most popular SVM kernels?

Ans. The most common SVM kernels are linear (good for straight-line data), polynomial (useful for curves), and radial basis function, or RBF (great for complex patterns). The sigmoid kernel is also used and behaves like a neural network activation function.