## Ans : 1

The mathematical formulation of a linear Support Vector Machine (SVM) involves finding the hyperplane that separates the data points of the two classes with the largest possible margin. For a linear SVM, this decision boundary is a hyperplane in the input feature space itself. The formulation is as follows:

Given a training dataset with labeled samples:
- Input features: X = {x₁, x₂, ..., xₙ} (each xᵢ is a feature vector)
- Corresponding labels: Y = {y₁, y₂, ..., yₙ} (each yᵢ is a binary class label, yᵢ ∈ {-1, +1})

The goal is to find a hyperplane defined by a weight vector w and a bias term b that separates the data points into two classes with the maximum margin.

The decision function for a linear SVM is:
f(x) = sign(w^T x + b)

The sign function returns +1 for one class and -1 for the other class. The SVM seeks to find the optimal values for w and b.
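The decision function above can be sketched in a few lines of NumPy. The weights, bias, and sample points below are hypothetical values picked by hand for illustration, not the result of training.

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Return +1 or -1 for each row of X using f(x) = sign(w^T x + b)."""
    scores = X @ w + b
    # A score of exactly 0 lies on the hyperplane; assign it to +1 by convention.
    return np.where(scores >= 0, 1, -1)

# Hypothetical weights and bias, chosen by hand for illustration only.
w = np.array([1.0, -1.0])
b = -0.5
X_demo = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 0.5]])

predictions = linear_svm_predict(X_demo, w, b)
print(predictions)
```

The third point has a score of exactly 0, so it sits on the hyperplane; which class such a point receives is a convention, not part of the SVM formulation.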

The optimization problem of finding the optimal hyperplane can be formulated as:
minimize: (1/2) * ||w||² + C * ∑ᵢ ξᵢ
subject to: yᵢ(w^T xᵢ + b) ≥ 1 - ξᵢ for all data points (xᵢ, yᵢ)
            ξᵢ ≥ 0 for all data points (xᵢ, yᵢ)

In the objective function, the term (1/2) * ||w||² is the regularization term: because the margin width is 2/||w||, minimizing it maximizes the margin and helps prevent overfitting. The constant C is a hyperparameter that determines the trade-off between maximizing the margin and minimizing the classification error. The ∑ξ term is the sum of the slack variables ξᵢ, which measure how far a sample falls inside the margin or onto the wrong side of the decision boundary.

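The first constraint requires every sample to lie on the correct side of the margin, up to its slack ξᵢ: a sample with ξᵢ = 0 is correctly classified and outside the margin, 0 < ξᵢ < 1 means it lies inside the margin but is still correctly classified, and ξᵢ > 1 means it is misclassified. The goal is to minimize the objective function while satisfying the constraints.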
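As a rough illustration of how the objective and the slack variables fit together, the sketch below evaluates (1/2) * ||w||² + C * ∑ξᵢ for a fixed, hand-picked w and b. It assumes each ξᵢ takes the smallest value allowed by the two constraints, which is max(0, 1 - yᵢ(w^T xᵢ + b)). The function name and the toy data are illustrative assumptions.

```python
import numpy as np

def soft_margin_objective(X, y, w, b, C):
    """Evaluate (1/2)||w||^2 + C * sum(xi) using the smallest feasible slacks."""
    margins = y * (X @ w + b)
    # Smallest xi_i that satisfies y_i(w^T x_i + b) >= 1 - xi_i and xi_i >= 0.
    slack = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * np.dot(w, w) + C * slack.sum()
    return objective, slack

# Toy data and parameters, chosen by hand for illustration only.
X_demo = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [0.5, 0.0]])
y_demo = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.0])
b = 0.0

objective, slack = soft_margin_objective(X_demo, y_demo, w, b, C=1.0)
print("slack:", slack)
print("objective:", objective)
```

In this toy case the second sample lies inside the margin (slack 0.5) and the fourth is misclassified (slack 1.5), so both contribute to the penalty term.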

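The optimization problem is a convex quadratic program, so it can be solved with a quadratic programming solver. Alternatively, the slack variables can be eliminated by writing ξᵢ = max(0, 1 - yᵢ(w^T xᵢ + b)), which gives an unconstrained hinge-loss objective that can be minimized with (sub)gradient descent. Either way, the result is the pair w and b that balances a wide margin against a low classification error.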
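A minimal sketch of the (sub)gradient descent approach is shown below. It minimizes the hinge-loss form of the objective, (1/2)||w||² + C * ∑ max(0, 1 - yᵢ(w^T xᵢ + b)), with a fixed learning rate. The function name, learning rate, epoch count, and toy data are assumptions made for illustration; this is not how scikit-learn's solvers work internally.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Batch subgradient descent on the soft-margin (hinge-loss) objective."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1  # samples with non-zero hinge loss
        # Subgradient of (1/2)||w||^2 + C * sum(max(0, 1 - y_i(w^T x_i + b))).
        grad_w = w - C * (y[violators][:, None] * X[violators]).sum(axis=0)
        grad_b = -C * y[violators].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Small linearly separable toy set, labels in {-1, +1}.
X_demo = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y_demo = np.array([1, 1, -1, -1])

w, b = train_linear_svm(X_demo, y_demo)
train_predictions = np.where(X_demo @ w + b >= 0, 1, -1)
print("w:", w, "b:", b)
print("predictions:", train_predictions)
```

A fixed step size keeps the sketch short; practical implementations usually decay the learning rate or use a dedicated solver.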

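Note: Because of the slack variables ξᵢ, this soft-margin formulation also handles data that is not perfectly linearly separable. Forcing every ξᵢ to 0 gives the hard-margin formulation, which is feasible only for linearly separable data. When no hyperplane in the input space separates the classes well, the kernel trick can be used to implicitly map the data into a higher-dimensional space where a linear SVM can find a separating hyperplane.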

## Ans : 2

The objective function of a linear Support Vector Machine (SVM) is to minimize the hinge loss while also regularizing the weights. The hinge loss represents the classification error or margin violations of the SVM, while regularization helps control the complexity of the model. The objective function is defined as follows:

minimize: (1/2) * ||w||² + C * ∑ᵢ ξᵢ

In this objective function:
- ||w||² represents the squared Euclidean norm of the weight vector w. This term is the regularization term and helps control the complexity of the model. Because the margin width is 2/||w||, minimizing ||w||² encourages a larger margin and a simpler decision boundary.
- C is a hyperparameter that determines the trade-off between maximizing the margin and minimizing the classification error. It adjusts the importance of the regularization term relative to the hinge loss term. A smaller C emphasizes a larger margin at the cost of potential misclassifications, while a larger C focuses more on minimizing the classification error.
- ∑ξ represents the sum of slack variables ξᵢ, which allow for misclassifications or samples that lie within the margin. At the optimum each slack equals the hinge loss of its sample, ξᵢ = max(0, 1 - yᵢ(w^T xᵢ + b)), so ∑ξ measures the total hinge loss of the SVM. The objective is to minimize this term to reduce margin violations and misclassifications.

The objective function combines the regularization term ((1/2) * ||w||²) and the hinge loss term (C * ∑ξ). By minimizing this objective function, the SVM seeks the weight vector w and bias b that maximize the margin while keeping the classification error low. The hyperparameter C controls the balance between these two objectives.
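The effect of C on the margin can be checked empirically. The sketch below is one possible illustration, assuming scikit-learn's `SVC` with a linear kernel and the two overlapping Iris classes as data; it compares the margin width 2/||w|| for a small and a large C. The helper name `margin_width` and the chosen C values are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Two overlapping Iris classes (versicolor vs. virginica): not linearly separable.
iris = datasets.load_iris()
mask = iris.target != 0
X_two, y_two = iris.data[mask], iris.target[mask]

def margin_width(C):
    """Fit a linear soft-margin SVM and return its margin width 2 / ||w||."""
    clf = SVC(kernel="linear", C=C).fit(X_two, y_two)
    return 2.0 / np.linalg.norm(clf.coef_)

wide = margin_width(0.01)     # small C: margin violations are cheap
narrow = margin_width(100.0)  # large C: margin violations are expensive
print(f"C=0.01 -> margin {wide:.3f}, C=100 -> margin {narrow:.3f}")
```

A smaller C should yield the wider margin here, because the optimizer can accept more margin violations in exchange for a smaller ||w||.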

The optimization problem aims to find the values of w and b that minimize the objective function while satisfying the margin constraints on the training samples. Techniques such as quadratic programming, or (sub)gradient descent on the equivalent hinge-loss form, can be used to solve it.

## Ans : 3

The kernel trick is a technique used in Support Vector Machines (SVMs) to handle non-linearly separable data by implicitly mapping the data points into a higher-dimensional feature space. It allows SVMs to find a linear decision boundary in this higher-dimensional space, effectively solving non-linear classification problems. The kernel trick avoids the need to explicitly compute the coordinates of the data points in the higher-dimensional space, which can be computationally expensive.

In SVMs, the kernel trick is applied by replacing the dot product between input feature vectors with a kernel function. The kernel function is evaluated on vectors in the original input space but returns the inner product of their images in a higher-dimensional feature space: K(x, z) = φ(x)^T φ(z), where φ is the (implicit) feature map.
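A small numerical check can make this concrete. The sketch below assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x^T z)² on 2-D inputs, for which an explicit feature map φ is known, and shows that the kernel value matches the inner product of the mapped vectors. The function names and sample vectors are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 kernel (x^T z)^2 on 2-D inputs."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """Same inner product, computed directly in the 2-D input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # map to 3-D first, then take the dot product
implicit = poly2_kernel(x, z)      # never leaves the 2-D input space
print(explicit, implicit)
```

For kernels such as the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is practical.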

The general form of the decision function using the kernel trick is:
f(x) = sign(∑ αᵢ yᵢ K(x, xᵢ) + b)

In this equation:
- αᵢ is the Lagrange multiplier associated with training sample xᵢ; it is non-zero only for the support vectors.
- yᵢ is the class label of the training sample.
- K(x, xᵢ) is the kernel function that measures the similarity or inner product between the feature vectors x and xᵢ.
- b is the bias term.
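The kernelized decision function can be sketched as follows. The support vectors, multipliers αᵢ, labels, and bias below are made-up values for illustration rather than the output of a trained model, and the RBF kernel with γ = 0.5 is an arbitrary choice.

```python
import numpy as np

def kernel_decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    total = sum(
        alpha * label * kernel(x, sv)
        for alpha, label, sv in zip(alphas, labels, support_vectors)
    )
    return 1 if total + b >= 0 else -1

def rbf(x, z, gamma=0.5):
    """RBF kernel used as the similarity measure in this sketch."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Hypothetical "trained" quantities, chosen by hand for illustration only.
support_vectors = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([1.0, 1.0])
labels = np.array([1, -1])
b = 0.0

near_first = kernel_decision(np.array([0.1, 0.1]), support_vectors, alphas, labels, b, rbf)
near_second = kernel_decision(np.array([1.9, 2.0]), support_vectors, alphas, labels, b, rbf)
print(near_first, near_second)
```

Only samples with non-zero αᵢ appear in the sum, which is why prediction cost depends on the number of support vectors rather than on the size of the training set.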

The kernel function allows the SVM to implicitly project the input data into a higher-dimensional space where a linear decision boundary can be found, even when the original data is not linearly separable.

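Commonly used kernel functions include the following, where γ > 0 is a scale parameter, r is a constant offset, and d is the polynomial degree: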
- Linear Kernel: K(x, xᵢ) = x^T xᵢ
- Polynomial Kernel: K(x, xᵢ) = (γ(x^T xᵢ) + r)^d
- Gaussian (RBF) Kernel: K(x, xᵢ) = exp(-γ||x - xᵢ||²)
- Sigmoid Kernel: K(x, xᵢ) = tanh(γ(x^T xᵢ) + r)
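The four kernels listed above translate directly into NumPy. The default parameter values in this sketch are arbitrary illustrative choices and are not meant to match any particular library's defaults.

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, r=1.0, d=3):
    return (gamma * np.dot(x, z) + r) ** d

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return np.tanh(gamma * np.dot(x, z) + r)

x = np.array([1.0, 0.0])
z = np.array([1.0, 1.0])

kernel_values = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z),
    "rbf": rbf_kernel(x, z),
    "sigmoid": sigmoid_kernel(x, z),
}
for name, value in kernel_values.items():
    print(f"{name}: {value:.4f}")
```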

By applying the kernel trick, SVMs can efficiently solve non-linear classification problems by implicitly mapping the data points into a higher-dimensional feature space. This technique allows for powerful and flexible classification while avoiding the computational cost of explicitly operating in the higher-dimensional space.

## Ans : 4

In Support Vector Machines (SVMs), support vectors play a crucial role in defining the decision boundary and determining the classification of data points. Support vectors are the training points that lie closest to the decision boundary: those on the margin and, in the soft-margin case, those inside the margin or on the wrong side of the boundary. These points alone determine the decision boundary and therefore the overall performance of the SVM.

The role of support vectors in SVMs can be explained with examples:

1. Linearly separable case:
   Suppose we have two classes, represented by red and blue points, that are linearly separable by a straight line. The support vectors in this case are the points of each class that lie on the margin, i.e. closest to the decision boundary. These support vectors define the position and orientation of the decision boundary.

   In such an example, the decision boundary is the line that lies midway between the red and the blue support vectors. The support vectors determine the width of the margin, and moving or removing any of them could change the decision boundary.

2. Non-linearly separable case:
   When dealing with non-linearly separable data, SVMs use the kernel trick to implicitly map the data into a higher-dimensional feature space where linear separation is possible. In this case, the support vectors play a crucial role in defining the decision boundary in the higher-dimensional space.

   In a typical example, the data points are not linearly separable in the original feature space (2D), but they can be separated by a hyperplane in the implicit higher-dimensional feature space, which appears as a circle in the original 2D space. The support vectors are the data points that lie on or closest to the margin in the higher-dimensional space. These support vectors define the shape and position of the decision boundary.

   It's important to note that the SVM is a sparse model, meaning that only the support vectors contribute to the decision boundary. The remaining data points that are not support vectors do not impact the decision boundary directly.

By identifying the support vectors and considering their positions relative to the decision boundary, SVMs can make predictions on new data points efficiently and accurately. The support vectors guide the construction of the decision boundary, making SVMs a powerful tool for both linearly and non-linearly separable classification problems.
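The sparsity property can be illustrated with scikit-learn's `SVC`, whose fitted `support_vectors_` attribute exposes the support vectors. The toy dataset below is made up for illustration, and the large C is an assumption chosen to approximate a hard margin. The sketch also refits the model after dropping a point that is not a support vector and compares the resulting weights.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data; only (1, 1) and (3, 3) face each other across the gap.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                   [3.0, 3.0], [4.0, 4.0], [3.0, 4.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X_demo, y_demo)
print("support vectors:\n", clf.support_vectors_)

# Drop the first point (not a support vector) and refit.
clf_reduced = SVC(kernel="linear", C=1e3).fit(X_demo[1:], y_demo[1:])
same_boundary = bool(
    np.allclose(clf.coef_, clf_reduced.coef_, atol=1e-2)
    and np.allclose(clf.intercept_, clf_reduced.intercept_, atol=1e-2)
)
print("boundary unchanged after removing a non-support vector:", same_boundary)
```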

## Ans : 5

Each of these terms is defined below:

1. Hyperplane: In machine learning and geometry, a hyperplane is a subspace of one dimension less than its ambient space. In the context of support vector machines (SVMs), a hyperplane is a decision boundary that separates different classes of data.

2. Soft Margin: In SVM, a soft margin allows for some misclassifications in order to achieve a more flexible decision boundary. It allows data points to be on the wrong side of the decision boundary, but penalizes them with a certain cost.

3. Hard Margin: In contrast to a soft margin, a hard margin SVM aims to find a decision boundary that perfectly separates the two classes without any misclassifications or points inside the margin. It is stricter, and a solution exists only when the data is linearly separable.

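4. Marginal Plane: The marginal planes are the two hyperplanes parallel to the decision boundary, one on each side, given by w^T x + b = +1 and w^T x + b = -1. In a hard margin SVM the support vectors lie exactly on these planes, and the distance between the two planes, 2/||w||, is the margin.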
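These definitions can be tied together with the quantity yᵢ(w^T xᵢ + b): a value of at least 1 means the sample is on or outside its marginal plane, a value between 0 and 1 means it is inside the margin, and a value of 0 or less means it is on the decision boundary or on the wrong side of it. The sketch below labels samples accordingly. The function name, tolerance, weights, and data are illustrative assumptions.

```python
import numpy as np

def margin_status(X, y, w, b, tol=1e-9):
    """Label each sample by where it sits relative to the marginal planes."""
    values = y * (X @ w + b)  # y_i (w^T x_i + b) for every sample
    status = []
    for value in values:
        if value > 1 + tol:
            status.append("outside margin")
        elif value >= 1 - tol:
            status.append("on marginal plane")
        elif value > 0:
            status.append("inside margin")
        else:
            # Covers points on the decision boundary and on the wrong side of it.
            status.append("misclassified")
    return status

# Hand-picked hyperplane and samples, labels in {-1, +1}.
w = np.array([1.0, 0.0])
b = 0.0
X_demo = np.array([[2.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.5, 0.0]])
y_demo = np.array([1, 1, 1, -1])

status = margin_status(X_demo, y_demo, w, b)
# A hard margin accepts this hyperplane only if no sample violates the margin.
hard_margin_feasible = all(s in ("outside margin", "on marginal plane") for s in status)
print(status)
print("satisfies hard-margin constraints:", hard_margin_feasible)
```

A soft margin SVM would accept this hyperplane and charge the last two samples a penalty through their slack variables.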

Let's illustrate these concepts with a simple example:

Suppose we have a two-dimensional dataset with two classes, represented by blue and red points. We want to find a decision boundary (hyperplane) to separate these classes.

Here's an example graph:

```
       +----------------------------------------+
       |                                        |
       |                                        |
       |               red points                |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       +----------------------------------------+
                         |
                  Decision Boundary
                         |
       +----------------------------------------+
       |                                        |
       |               blue points               |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       +----------------------------------------+
```

In the graph, the blue and red points represent the two classes. The decision boundary (hyperplane) is the line that separates the blue and red points. In the case of a soft margin SVM, some misclassifications might occur, allowing points to be on the wrong side of the decision boundary.

Here's an example of a soft margin SVM with a decision boundary:

```
       +----------------------------------------+
       |                                        |
       |               red points                |
       |                                        |
       |                   |                    |
       |                   |                    |
       |                 / | \                  |
       |                /  |  \                 |
       |               |   |   |                |
       |                \  |  /                 |
       |                 \ | /                  |
       |                   |                    |
       |                                        |
       +----------------------------------------+
                         |
                  Decision Boundary
                         |
       +----------------------------------------+
       |                                        |
       |               blue points               |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       +----------------------------------------+
```

In this case, some red points are on the blue side of the decision boundary, and some blue points are on the red side, but the SVM allows for these misclassifications.

In contrast, a hard margin SVM aims to find a decision boundary that perfectly separates the two classes without any misclassifications. Here's an example of a hard margin SVM with a decision boundary:

```
       +----------------------------------------+
       |                                        |
       |               red points                |
       |                                        |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       |                   |                    |
       +----------------------------------------+
                         |
                  Decision Boundary
                         |
       +----------------------------------------+
       |                                        |
       |               blue points               |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       |                                        |
       +----------------------------------------+
```

In this case, there are no misclassifications, and the decision boundary perfectly separates the two classes.

The marginal planes are the two hyperplanes parallel to the decision boundary that pass through the support vectors closest to it, one on each side. In the above examples, they would be the two lines parallel to the decision boundary that touch the nearest red points and the nearest blue points; the decision boundary lies midway between them.


## Ans : 6

```python
import warnings

from sklearn import datasets
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Silence all warnings; note that this also hides any ConvergenceWarning from LinearSVC
warnings.filterwarnings('ignore')

# Load the Iris dataset
data = datasets.load_iris()
X = data.data
y = data.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Train a linear SVM classifier
linear_svc = LinearSVC()
linear_svc.fit(X_train, y_train)

y_pred = linear_svc.predict(X_test)

# Check accuracy; scikit-learn metrics expect (y_true, y_pred) in that order
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```
1.0
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```



```python
# Regularize the model

from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a LinearSVC classifier with regularization (smaller C means stronger regularization)
svm = LinearSVC(C=0.5)  # Adjust the C value as desired

# Train the classifier on the training data
svm.fit(X_train, y_train)

# Make predictions on the test data
y_pred = svm.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```


```
Accuracy: 1.0
```
