In [None]:
Q1. The decision function for a linear SVM is:

f(x) = w · x + b

Where:
- "f(x)" is the decision function; its sign gives the predicted class of a data point "x".
- "w" is the weight vector, which is normal to the hyperplane and determines its orientation.
- "x" is the input feature vector.
- "b" is the bias term (also known as the intercept).

The decision boundary (hyperplane) is the set of points for which "f(x)" is equal to zero. Points with f(x) > 0 are assigned to one class (+1), while points with f(x) < 0 are assigned to the other class (-1).
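
As a minimal sketch of the decision function above, the following NumPy snippet evaluates f(x) for a few points and takes the sign as the predicted class. The values of `w` and `b`, the sample points, and the helper names `decision_function` and `predict` are made up for illustration; nothing here is learned from data.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen by hand for illustration only.
w = np.array([2.0, -1.0])
b = -1.0

def decision_function(X):
    """Return f(x) = w . x + b for each row of X."""
    return X @ w + b

def predict(X):
    """Classify as +1 where f(x) >= 0 and -1 otherwise."""
    return np.where(decision_function(X) >= 0, 1, -1)

X_demo = np.array([[2.0, 1.0],   # f = 4 - 1 - 1 = 2
                   [0.0, 1.0],   # f = 0 - 1 - 1 = -2
                   [1.0, 1.0]])  # f = 2 - 1 - 1 = 0, exactly on the hyperplane

scores = decision_function(X_demo)
labels = predict(X_demo)
print("f(x):", scores)
print("predicted class:", labels)
```

Assigning points with f(x) = 0 to the +1 class is a convention chosen for this sketch; such points lie exactly on the decision boundary.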

Q2. The objective function of a linear SVM aims to find the hyperplane that maximizes the margin between the two classes. For linearly separable data (the hard-margin case), it can be formulated as:

Minimize: (1/2) ||w||^2
Subject to: y_i (w · x_i + b) >= 1 for all training examples i

In this formulation:
- "w" represents the weight vector of the hyperplane.
- "b" is the bias term.
- "x_i" are the training examples.
- "y_i" are the class labels (+1 or -1).

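The objective is to find "w" and "b" that minimize the L2 norm of "w", which is equivalent to maximizing the margin width 2/||w||, while ensuring that all training examples are correctly classified and lie on or outside the margin. When some classification errors must be tolerated, the soft-margin variant described in Q5 relaxes these constraints.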
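
To make the constraint and the objective concrete, here is a small sketch that checks a hand-picked hyperplane against a toy dataset. The data, the candidate `w` and `b`, and all variable names are assumptions for illustration; no optimizer is run, the snippet only evaluates the quantities in the formulation.

```python
import numpy as np

# Hypothetical toy data: class -1 on the left, class +1 on the right.
X_toy = np.array([[0.0, 0.0], [-1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y_toy = np.array([-1, -1, 1, 1])

# Candidate hyperplane x1 = 1, written as w . x + b = 0.
w = np.array([1.0, 0.0])
b = -1.0

# Values of y_i (w . x_i + b); the constraint requires each to be >= 1.
functional_margins = y_toy * (X_toy @ w + b)
feasible = bool(np.all(functional_margins >= 1))

# Objective value (1/2) ||w||^2 and the margin width 2 / ||w|| it corresponds to.
objective = 0.5 * np.dot(w, w)
margin_width = 2.0 / np.linalg.norm(w)

print("y_i (w . x_i + b):", functional_margins)
print("all constraints satisfied:", feasible)
print("objective:", objective, "margin width:", margin_width)
```

Scaling `w` and `b` by 2 would describe the same hyperplane and still satisfy the constraints, but with a larger objective value, which is why the minimizer settles on the smallest feasible norm.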

Q3. The kernel trick in SVM is a technique that allows SVMs to handle nonlinearly separable data by implicitly mapping the input features into a higher-dimensional space. Instead of explicitly calculating the transformation, a kernel function K(x, z) computes the dot product between the transformed feature vectors directly from the original inputs. This allows SVMs to find nonlinear decision boundaries in the original feature space without the need to compute and store the transformed data explicitly.

Common kernel functions include the polynomial kernel, radial basis function (RBF) kernel, and sigmoid kernel, among others. These kernels make it possible to learn complex decision boundaries in the original feature space.
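
The equivalence behind the kernel trick can be checked numerically. The sketch below uses a homogeneous degree-2 polynomial kernel on 2D inputs, for which an explicit feature map is known; the function names `phi` and `poly_kernel` and the sample vectors are hypothetical choices for illustration.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product computed in the 3D feature space
implicit = poly_kernel(x, z)       # same quantity computed in the 2D input space

print("explicit:", explicit)
print("implicit:", implicit)
```

Both values agree up to floating-point rounding, yet the kernel never constructs the transformed vectors. For kernels such as RBF, the corresponding feature space is infinite-dimensional, so only the implicit route is practical.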

Q4. Support vectors in SVM are the data points that lie closest to the hyperplane, on the margin boundaries (in a soft-margin SVM, also any points inside the margin or on the wrong side of it). They alone define the margin and the decision boundary: removing any other training point leaves the solution unchanged. Here's an explanation with an example:

Example:
Consider a binary classification problem with two classes, A and B. The dataset includes various data points from both classes. When you train an SVM, it finds the hyperplane that best separates the classes. The support vectors are the data points closest to this hyperplane, from both class A and class B.

In the example, support vectors are critical because they define the margin of the classifier. Moving or removing a support vector can shift the decision boundary, whereas moving any other point has no effect as long as it stays outside the margin. The SVM's goal is to maximize this margin while minimizing classification errors.
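
The example can be sketched in code with scikit-learn. The toy coordinates below are invented for illustration (class A is labeled -1, class B is labeled +1), and the very large C is an assumption used to approximate a hard margin on separable data.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: class A (label -1) on the left, class B (label +1) on the right.
X_toy = np.array([[0.0, 0.0], [-1.0, 0.0], [-1.0, 1.0],
                  [2.0, 0.0], [3.0, 0.0], [3.0, 1.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard margin on separable data.
clf = svm.SVC(kernel='linear', C=1e6)
clf.fit(X_toy, y_toy)

support_idx = sorted(clf.support_.tolist())  # indices of the support vectors
print("Support vector indices:", support_idx)
print("Support vectors:")
print(clf.support_vectors_)

# Refit using only the support vectors; the hyperplane should be unchanged
# up to the solver's numerical tolerance.
clf_sv = svm.SVC(kernel='linear', C=1e6)
clf_sv.fit(X_toy[support_idx], y_toy[support_idx])
print("w, b (all points):", clf.coef_[0], clf.intercept_[0])
print("w, b (support vectors only):", clf_sv.coef_[0], clf_sv.intercept_[0])
```

In this toy set only the closest point from each class, (0, 0) and (2, 0), ends up as a support vector; the remaining four points could be deleted without changing the fitted boundary.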

Q5. Here's a brief illustration of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM:

- Hyperplane: The hyperplane is the decision boundary that separates data points of different classes. It is the set of points satisfying the equation "w · x + b = 0," where "w" is the weight vector, "x" is the input feature vector, and "b" is the bias term. In a 2D feature space, it is a straight line; in 3D, it is a plane; in higher dimensions, it's a hyperplane.

- Marginal Plane: There are two marginal planes, "w · x + b = +1" and "w · x + b = -1," which pass through the support vectors, the data points closest to the hyperplane on either side. Both are parallel to the hyperplane and lie at distance 1/||w|| from it, so the margin between them is 2/||w|| wide.

- Hard Margin: In a hard margin SVM, the goal is to find a hyperplane that perfectly separates the data points of different classes without any misclassification. This is suitable only for linearly separable data; when the data is not linearly separable, the optimization problem has no feasible solution, and even separable data can yield a poor boundary if it contains outliers.

- Soft Margin: In a soft margin SVM, the goal is to find a hyperplane that separates the data points with a margin that allows for some misclassification. It is suitable for cases where the data is not perfectly separable or when there is noise in the dataset. The regularization parameter C controls the trade-off: a small C tolerates more margin violations in exchange for a wider margin, while a large C penalizes violations heavily and approaches hard-margin behavior.

Graphs illustrating these concepts would typically involve visualizations of data points, hyperplanes, and margins in two or three dimensions, but a textual description is provided here for simplicity.
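
The effect of C on the soft margin can also be observed numerically. The following sketch fits a linear SVC on a small, deliberately non-separable toy dataset (the coordinates are invented for illustration) and reports the margin width, computed here as 2/||w|| from the fitted `coef_`, together with the number of support vectors.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data that is NOT linearly separable: each class has one
# point sitting among the points of the other class.
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [3.0, 1.0],   # class -1
                  [3.0, 0.0], [4.0, 1.0], [4.0, 0.0], [1.5, 0.5]])  # class +1
y_toy = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

C_values = [0.01, 1.0, 100.0]
margin_widths = []
n_support = []

for C_val in C_values:
    clf = svm.SVC(kernel='linear', C=C_val)
    clf.fit(X_toy, y_toy)
    w = clf.coef_[0]
    margin_widths.append(2.0 / np.linalg.norm(w))  # distance between the marginal planes
    n_support.append(int(clf.n_support_.sum()))    # support vectors across both classes
    print(f"C={C_val}: margin width={margin_widths[-1]:.3f}, "
          f"support vectors={n_support[-1]}")
```

A hard margin does not exist for this data, so every fit is a soft-margin solution. The smallest C should give the widest margin, and the margin narrows as C grows and violations become more expensive.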

In [None]:
# Q6.
# Step 1: Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

# Step 2: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Step 3: Split the dataset into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train a linear SVM classifier on the training set
clf = svm.SVC(kernel='linear', C=1)  # You can experiment with different values of C
clf.fit(X_train, y_train)

# Step 5: Predict the labels for the testing set
y_pred = clf.predict(X_test)

# Step 6: Compute the accuracy of the model on the testing set
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

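# Step 7: Plot the decision boundaries using two of the features
# We'll choose the first two features for visualization. The classifier above
# was trained on all four features and cannot predict on 2D grid points, so a
# separate linear SVM is trained on the first two features only.
X_2d = X[:, :2]
clf_2d = svm.SVC(kernel='linear', C=1)
clf_2d.fit(X_train[:, :2], y_train)

# Create a meshgrid to plot decision boundaries
x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# Predict the class of every grid point with the 2D classifier
Z = clf_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)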

# Plot decision boundaries
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Decision Boundaries of Linear SVM (first two features)')
plt.show()

# Step 8: Experiment with different values of C
C_values = [0.1, 1, 10]

for C_val in C_values:
    clf = svm.SVC(kernel='linear', C=C_val)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy (C={C_val}): {accuracy * 100:.2f}%")
