Q1.  What is a Support Vector Machine (SVM)

Ans1. A Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. It is especially effective in high-dimensional spaces and when the classes are separated by a clear margin.

SVM aims to find the best decision boundary (hyperplane) that maximally separates classes in the feature space.

For binary classification, SVM finds the hyperplane that:

Maximizes the margin, i.e., the distance between the hyperplane and the nearest points of each class.

Is defined only by the support vectors (the data points closest to the boundary).

Types of SVM:
Linear SVM – Works when data is linearly separable.

Non-Linear SVM – Uses kernels to separate complex data.



Q2.  What is the difference between Hard Margin and Soft Margin SVM

Ans2. Support Vector Machines (SVMs) can be categorized into Hard Margin and Soft Margin approaches based on how strictly they enforce the separation between classes.

| Aspect                 | **Hard Margin SVM**                             | **Soft Margin SVM**                                  |
| ---------------------- | ----------------------------------------------- | ---------------------------------------------------- |
| **Tolerance to Error** | Does **not** allow any misclassifications       | Allows **some** misclassifications (violations)      |
| **Assumes**            | Data is **perfectly linearly separable**        | Data **may not be** perfectly separable              |
| **Margin**             | Maximizes the margin **without** any violations | Maximizes the margin **with penalty** for violations |
| **Flexibility**        | Less flexible, sensitive to noise and outliers  | More flexible, **more robust to outliers**           |
| **Use Case**           | Rare in real-world scenarios                    | Common in practice                                   |
| **Controlled by**      | No slack variables or penalty term              | Introduces **slack variables** and **C parameter**   |
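
The role of slack variables can be illustrated with a minimal sketch. It assumes a fixed, hypothetical 1-D hyperplane (`w = 1.0`, `b = 0.0`) and toy points chosen purely for illustration; the slack of each point is computed as ξ = max(0, 1 − y(w·x + b)). A hard margin requires every slack to be zero, while a soft margin tolerates nonzero slack at a cost scaled by C.

```python
# Toy 1-D data: (feature, label) with labels in {-1, +1}. Values are illustrative.
points = [(-3.0, -1), (-2.0, -1), (2.0, 1), (3.0, 1), (0.5, 1), (-0.5, 1)]

# Hypothetical hyperplane parameters, not fitted by any solver.
w, b = 1.0, 0.0

def slack(x, y):
    """Slack xi = max(0, 1 - y * (w*x + b)); zero means the margin is respected."""
    return max(0.0, 1.0 - y * (w * x + b))

slacks = [slack(x, y) for x, y in points]

# Hard margin: feasible for this hyperplane only if no point violates the margin.
hard_margin_feasible = all(s == 0.0 for s in slacks)

# 0 < xi <= 1: inside the margin but still on the correct side.
# xi > 1: on the wrong side of the hyperplane (misclassified).
inside_margin = [x for (x, _), s in zip(points, slacks) if 0.0 < s <= 1.0]
misclassified = [x for (x, _), s in zip(points, slacks) if s > 1.0]

print("slacks:", slacks)
print("hard margin feasible:", hard_margin_feasible)
print("inside margin:", inside_margin, "misclassified:", misclassified)
```

In this toy setup the point at 0.5 falls inside the margin and the point at -0.5 is misclassified, so only a soft margin can accept this hyperplane.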


Q3.  What is the mathematical intuition behind SVM

Ans3. Mathematical Intuition Behind SVM (Support Vector Machine)


1. Hyperplane Equation
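
A hyperplane is the set of points x satisfying w·x + b = 0, where w is the weight (normal) vector and b is the bias. A point is classified by the sign of w·x + b, and for a canonically scaled hyperplane the margin width is 2/‖w‖. The sketch below illustrates these quantities with hypothetical values of `w` and `b` chosen by hand, not learned from data.

```python
import math

# Hypothetical hyperplane parameters in 2-D (chosen for illustration only).
w = [3.0, 4.0]
b = -5.0

def decision_value(x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x):
    """Return +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(x) >= 0 else -1

norm_w = math.sqrt(sum(wi * wi for wi in w))  # Euclidean norm of w
margin_width = 2.0 / norm_w                   # distance between the two margin boundaries

def distance_to_hyperplane(x):
    """Geometric distance |w.x + b| / ||w||."""
    return abs(decision_value(x)) / norm_w

print(predict([3.0, 4.0]), predict([0.0, 0.0]))
print(margin_width, distance_to_hyperplane([3.0, 4.0]))
```

Maximizing the margin 2/‖w‖ is equivalent to minimizing ‖w‖, which is why the SVM objective is usually written in terms of ½‖w‖².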



Q4.  What is the role of Lagrange Multipliers in SVM

Ans4. Lagrange multipliers play a crucial role in solving the optimization problem at the heart of Support Vector Machines (SVMs). They help convert the constrained optimization problem into a form that is easier to solve: specifically, the dual problem.

🔧 SVM Optimization Setup (Hard Margin)
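
For reference, the standard hard-margin formulation can be written as follows, using $w$ for the weight vector, $b$ for the bias, $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ for the training points, and $\alpha_i$ for the multipliers. The primal problem is:

$$
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
$$

Attaching one multiplier $\alpha_i \ge 0$ to each constraint gives the Lagrangian:

$$
L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[\, y_i\,(w^\top x_i + b) - 1 \,\right]
$$

Setting the derivatives with respect to $w$ and $b$ to zero yields $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$. Substituting these back gives the dual problem:

$$
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j \quad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
$$

The data enter the dual only through the inner products $x_i^\top x_j$, which is what makes kernel functions applicable.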


| Concept                     | Role of Lagrange Multipliers                        |
| --------------------------- | --------------------------------------------------- |
| Enforcing constraints       | Transform inequality constraints into dual form     |
| Solving optimization        | Convert primal problem into easier dual form        |
| Identifying support vectors | Only points with $\alpha_i > 0$ are support vectors |
| Working with kernels        | Dual form allows use of kernel functions            |
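
As a small illustration of the points in the table, the sketch below recovers the weight vector and bias from dual coefficients using w = Σ αᵢ yᵢ xᵢ. The toy data and the `alpha` values are assumptions: they are hand-derived for this three-point problem rather than produced by a solver, and the final line checks that every constraint yᵢ(w·xᵢ + b) ≥ 1 holds.

```python
# Toy training set: two points on the margin and one far from it.
X = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
y = [1, -1, 1]

# Dual coefficients assumed for this toy problem (hand-derived, not solver output).
# They satisfy alpha_i >= 0 and sum(alpha_i * y_i) == 0.
alpha = [0.25, 0.25, 0.0]

# Support vectors are exactly the points with alpha_i > 0.
support_idx = [i for i, a in enumerate(alpha) if a > 0]

# w = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]

# b from any support vector s: y_s * (w.x_s + b) = 1  =>  b = y_s - w.x_s
s = support_idx[0]
b = y[s] - sum(wd * xd for wd, xd in zip(w, X[s]))

def decision_value(x):
    """Signed score w.x + b for a new point."""
    return sum(wd * xd for wd, xd in zip(w, x)) + b

print("support vectors:", [X[i] for i in support_idx])
print("w =", w, "b =", b)
print("constraint values:", [yi * decision_value(xi) for xi, yi in zip(X, y)])
```

The third point has αᵢ = 0, so removing it would leave `w` and `b` unchanged; only the two support vectors shape the boundary.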


Q5.  What are Support Vectors in SVM

Ans5. Support vectors are the critical data points in a dataset that lie closest to the decision boundary (hyperplane) in a Support Vector Machine (SVM) model. They:

Define the optimal hyperplane.

Determine the margin.

Are the only points used to build the decision function.



Q6.  What is a Support Vector Classifier (SVC)

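Ans6. A Support Vector Classifier (SVC) is the classification form of the Support Vector Machine. It finds the maximum-margin hyperplane between classes, usually with a soft margin controlled by the C parameter, and can use kernels to learn non-linear boundaries.

In scikit-learn it is available as the SVC class in sklearn.svm, which is the class used in the practical questions later in this document.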




Q7.  What is a Support Vector Regressor (SVR)

Ans7. Support Vector Regressor (SVR) is the regression version of Support Vector Machine (SVM). Instead of classifying data, SVR predicts a continuous target value while trying to keep the predictions within a specified error margin (ε-tube) from the actual values.

📌 Key Idea

SVR tries to fit the best line (or curve) such that the predicted values deviate from the actual values by no more than ε (epsilon). It also aims to keep the model as flat (simple) as possible.


🔧 SVR Optimization (Soft Margin)

| SVR Component   | Purpose                                 |
| --------------- | --------------------------------------- |
| $\varepsilon$   | Tolerance margin for error              |
| $C$             | Penalty for errors beyond $\varepsilon$ |
| Support Vectors | Points on or outside the ε-tube         |
| Kernels         | Handle non-linear relationships         |
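
The ε-tube idea can be sketched with the ε-insensitive loss, max(0, |y − f(x)| − ε). The targets, predictions, and `epsilon` value below are hypothetical numbers chosen for illustration, and the function name is an assumption rather than a library API.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon-tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

epsilon = 0.5  # hypothetical tube half-width
targets = [10.0, 12.0, 15.0, 20.0]
predictions = [10.2, 11.0, 15.5, 17.0]

losses = [epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(targets, predictions)]

# Only points with a positive loss lie strictly outside the tube and are penalized via C.
outside_tube = [i for i, loss in enumerate(losses) if loss > 0]

print("losses:", losses)
print("indices outside the tube:", outside_tube)
```

Errors smaller than ε cost nothing, which is why points well inside the tube do not influence the fitted function.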


Q8.  What is the Kernel Trick in SVM

Ans8. The Kernel Trick is a mathematical technique that allows Support Vector Machines (SVM) to learn non-linear decision boundaries without explicitly transforming the input features into higher dimensions.

🔍 The Problem
SVMs are inherently linear classifiers. But what if your data is not linearly separable?

Solution: Map the original features to a higher-dimensional space where the data becomes linearly separable.

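But computing the transformation explicitly (e.g., converting 2D features to 1000D) is computationally expensive.

The kernel trick avoids this cost: a kernel function $K(x, x')$ returns the inner product of the transformed points directly from the original features, and the SVM dual problem only ever needs these inner products.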
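
A minimal sketch of the idea, assuming a degree-2 polynomial kernel (x·z + 1)² on 2-D inputs: the kernel value computed in the original space matches the dot product of an explicit 6-D feature map. The function names and sample points are illustrative choices.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def explicit_map(x):
    """Explicit 6-D feature map whose dot product equals the kernel above."""
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)

via_kernel = poly_kernel(x, z)                       # never leaves 2-D
via_mapping = dot(explicit_map(x), explicit_map(z))  # builds the 6-D vectors first

print(via_kernel, via_mapping)
```

Both routes give the same number up to floating-point rounding, but the kernel route never constructs the higher-dimensional vectors. For kernels such as RBF the implicit space is infinite-dimensional, so the explicit route is not available at all.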


Q9.  Compare Linear Kernel, Polynomial Kernel, and RBF Kernel

Ans9. Here is a comparison of the Linear, Polynomial, and RBF (Radial Basis Function) kernels, the most commonly used kernels in Support Vector Machines (SVM).

| Feature             | **Linear Kernel**       | **Polynomial Kernel**      | **RBF Kernel**                 |
| ------------------- | ----------------------- | -------------------------- | ------------------------------ |
| Formula             | $x^\top x'$             | $(x^\top x' + c)^d$        | $\exp(-\gamma \|x - x'\|^2)$   |
| Non-linearity       | ❌ No                    | ✅ Yes (degree-controlled)  | ✅✅ Highly non-linear           |
| Parameters          | None                    | $c, d$                     | $\gamma$                       |
| Speed               | ✅ Fast                  | ⚠️ Slower                  | ⚠️ Slowest (most computation)  |
| Risk of Overfitting | ❌ Low                   | ⚠️ Medium (if high degree) | ⚠️ High (if poorly tuned)      |
| Typical Use Cases   | Text data, linear tasks | Curved relationships       | Most classification/regression |


Q10.  What is the effect of the C parameter in SVM

Ans10.
In Support Vector Machines (SVM), the C parameter is a regularization parameter that controls the trade-off between maximizing the margin and minimizing classification errors on the training data:


| Value of `C` | Margin Width | Training Error Tolerance | Risk of Overfitting | Generalization         |
| ------------ | ------------ | ------------------------ | ------------------- | ---------------------- |
| **High `C`** | Narrow       | Low                      | High                | Poor (can overfit)     |
| **Low `C`**  | Wide         | High                     | Low                 | Better (if tuned well) |

🧠 Intuition
Think of C as the cost of violating the margin:

Higher C = “Don’t violate it!” → Less slack tolerated → Strict.

Lower C = “Mistakes are okay” → More flexibility in fitting.
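
The trade-off can be sketched by evaluating the soft-margin objective, ½‖w‖² + C·Σξᵢ, for two hand-picked candidate hyperplanes. The data and both candidates are hypothetical and are not solver output; the point is only to show which candidate the objective favors as C changes.

```python
# Toy 1-D data with labels in {-1, +1}; the positive point at x = 1.0 sits close to the negatives.
data = [(-3.0, -1), (-1.0, -1), (1.0, 1), (4.0, 1), (5.0, 1)]

# Two hand-picked candidate hyperplanes f(x) = w*x + b (hypothetical, not fitted).
candidates = {
    "wide margin, one violation": (0.4, -0.6),    # margin width 2/0.4 = 5
    "narrow margin, no violations": (1.0, 0.0),   # margin width 2/1.0 = 2
}

def objective(w, b, C):
    """Soft-margin objective: 0.5*w^2 + C * (sum of slacks, i.e. hinge losses)."""
    slack_total = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * slack_total

def preferred(C):
    """Name of the candidate with the lower objective value for this C."""
    return min(candidates, key=lambda name: objective(*candidates[name], C))

for C in (0.1, 10.0):
    print(f"C={C}: prefers '{preferred(C)}'")
```

With a small C the wide-margin candidate wins despite its violation; with a large C the violation becomes too expensive and the narrow-margin candidate wins.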



Q11.  What is the role of the Gamma parameter in RBF Kernel SVM

Ans11. In an SVM using the RBF (Radial Basis Function) kernel, the gamma parameter defines how far the influence of a single training example reaches — i.e., it controls the curvature of the decision boundary.

| Parameter  | Effect                                                      |
| ---------- | ----------------------------------------------------------- |
| **High γ** | High model complexity, risk of overfitting                  |
| **Low γ**  | Low model complexity, risk of underfitting                  |
| **γ + C**  | Together control the **bias-variance trade-off** in RBF-SVM |
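
To make the "reach of influence" concrete, the sketch below evaluates the RBF kernel exp(−γ‖x − x'‖²) between one point and neighbours at increasing distances for several γ values. The points and γ values are arbitrary illustrative choices.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) for equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = (0.0, 0.0)
neighbours = [(0.5, 0.0), (1.0, 0.0), (2.0, 0.0)]  # distances 0.5, 1.0, 2.0 from x

for gamma in (0.1, 1.0, 10.0):
    sims = [round(rbf_kernel(x, z, gamma), 4) for z in neighbours]
    print(f"gamma={gamma}: similarities={sims}")
```

With a high γ the similarity drops to nearly zero even for nearby points, so each training example affects only a small neighbourhood and the boundary can become very wiggly. With a low γ distant points still look similar, which gives a smoother boundary.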



Q12.  What is the Naïve Bayes classifier, and why is it called "Naïve"

Ans12. Naïve Bayes is a probabilistic classification algorithm based on applying Bayes' theorem with a strong (naïve) assumption of feature independence.

| Aspect          | Description                                             |
| --------------- | ------------------------------------------------------- |
| **Type**        | Probabilistic classifier                                |
| **Based on**    | Bayes' theorem                                          |
| **Assumption**  | Features are conditionally independent given the class  |
| **Why "Naïve"** | Simplifying assumption of independence                  |
| **Strength**    | Simple, fast, works well with high-dimensional data     |
| **Common use**  | Text classification, spam filtering, sentiment analysis |



Q13.  What is Bayes’ Theorem

Ans13. Bayes’ Theorem is a fundamental rule in probability theory that describes how to update the probability of a hypothesis based on new evidence.
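
In symbols, P(H | E) = P(E | H) · P(H) / P(E), where H is the hypothesis and E is the evidence. A short worked sketch with hypothetical numbers (a medical test with 1% prevalence, 95% sensitivity, and a 5% false-positive rate) shows how the update works; the function and parameter names are illustrative.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) * P(H) / P(E), with P(E) from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.3f}")
```

Under these assumed numbers the posterior is about 0.161: even after a positive result the disease remains unlikely, because the prior is so low.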

Q14.  Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

Ans14.

1. Gaussian Naïve Bayes
Assumption: Features are continuous and follow a Gaussian (normal) distribution.

Use case: When features are real-valued (e.g., height, weight, temperature).

Example: Predicting species from measurements like sepal length/width (Iris dataset).

2. Multinomial Naïve Bayes
Assumption: Features represent counts or frequencies (discrete non-negative values).

Use case: Commonly used in text classification where features are word counts or term frequencies.

3. Bernoulli Naïve Bayes
Assumption: Features are binary (0/1), representing presence or absence.

Use case: Text classification when you care about whether a word appears or not, regardless of frequency.



Q15. When should you use Gaussian Naïve Bayes over other variants

Ans15.
Use Gaussian Naïve Bayes when your features are continuous numerical variables that are roughly normally (Gaussian) distributed within each class.

When to prefer Gaussian Naïve Bayes:

Data type: Features are continuous real numbers (e.g., height, weight, temperature, sensor readings).

Distribution: Each feature’s values in each class approximately follow a bell-shaped curve (normal distribution).

| Scenario                                | Recommended Naïve Bayes Variant |
| --------------------------------------- | ------------------------------- |
| Continuous, approximately Gaussian data | **Gaussian Naïve Bayes**        |
| Discrete counts (e.g., word counts)     | Multinomial Naïve Bayes         |
| Binary features (presence/absence)      | Bernoulli Naïve Bayes           |
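
What the Gaussian variant computes can be sketched from scratch for a single continuous feature: estimate a mean and variance per class, then score a new value with the log prior plus the log of the normal density. The data are hypothetical, and this is a simplified illustration rather than scikit-learn's GaussianNB implementation (which, for example, adds variance smoothing).

```python
import math
import statistics

# Hypothetical training data: one continuous feature (height in cm) per class.
train = {
    "short": [150.0, 155.0, 160.0],
    "tall": [175.0, 180.0, 185.0],
}

# Per-class mean and population variance: the parameters a Gaussian model estimates.
params = {c: (statistics.mean(v), statistics.pvariance(v)) for c, v in train.items()}
total = sum(len(v) for v in train.values())
priors = {c: len(v) / total for c, v in train.items()}

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def predict(x):
    """Pick the class with the highest log prior + log likelihood."""
    scores = {c: math.log(priors[c]) + log_gaussian(x, *params[c]) for c in train}
    return max(scores, key=scores.get)

print(predict(158.0), predict(178.0))
```

With several features, the conditional independence assumption means the per-feature log densities are simply summed for each class.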


Q16.  What are the key assumptions made by Naïve Bayes

Ans16.

1. Conditional Independence
Assumption: All features are conditionally independent given the class label.

Meaning: The presence or value of one feature does not affect or provide information about another feature once the class is known.

2. Feature Distribution Assumption
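Naïve Bayes assumes a specific probability distribution for features depending on the variant: Gaussian for continuous features, multinomial for counts, and Bernoulli for binary features.

In summary: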

| Assumption                   | Description                                        |
| ---------------------------- | -------------------------------------------------- |
| **Conditional Independence** | Features independent given the class               |
| **Feature Distribution**     | Features follow assumed distribution per variant   |
| **Accurate Class Priors**    | Class probabilities represent true prior knowledge |


Q17.  What are the advantages and disadvantages of Naïve Bayes

Ans17.

Advantages

Simple and Fast

Works Well with High-Dimensional Data

Requires Less Training Data



Disadvantages


Strong Independence Assumption

Poor Performance with Correlated Features

Zero Frequency Problem






Q18.  Why is Naïve Bayes a good choice for text classification

Ans18.

1. Handles High-Dimensional Data Efficiently
Text data typically involves thousands of features (words or tokens).

Naïve Bayes scales well to this high-dimensional space without heavy computational cost.



2. Works Well with Sparse Data
Most documents contain only a small subset of all possible words (sparse feature vectors).

Naïve Bayes handles sparse inputs naturally, especially the Multinomial and Bernoulli variants.


Q19. Compare SVM and Naïve Bayes for classification tasks

Ans19.

| Aspect                            | SVM                                                                   | Naïve Bayes                                                 |
| --------------------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------- |
| **Type of Model**                 | Discriminative, margin-based classifier                               | Generative probabilistic classifier                         |
| **Assumptions**                   | Makes no strong assumptions about feature independence                | Assumes conditional independence of features                |
| **Handling of Data**              | Effective with both linear and nonlinear data (via kernels)           | Works well with high-dimensional, sparse data (e.g., text)  |
| **Training Complexity**           | Computationally intensive, especially for large datasets              | Very fast and simple to train                               |
| **Interpretability**              | Harder to interpret decision boundary                                 | Outputs probabilities, easier to interpret                  |
| **Handling Non-linearities**      | Uses kernel trick to model complex boundaries                         | Limited; non-linearity must come from feature transformations |
| **Robustness to Noise/Outliers**  | Can be sensitive to noise, but soft margin SVM handles this           | Less sensitive to noise due to probabilistic nature         |
| **Performance on Small Datasets** | Performs well, especially with well-chosen kernels                    | May struggle if feature independence assumption is violated |
| **Output**                        | Hard classification boundary (decision function)                      | Probabilistic output (class probabilities)                  |
| **Use Cases**                     | Image recognition, bioinformatics, text classification (with kernels) | Text classification, spam filtering, sentiment analysis     |


Q20.  How does Laplace Smoothing help in Naïve Bayes?

Ans20.
Laplace smoothing (also called add-one smoothing) is a technique used to handle the zero-frequency problem in Naïve Bayes.

The Zero-Frequency Problem
When a feature (e.g., a word) never appears in the training data for a given class, its probability estimate becomes zero.

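Since Naïve Bayes multiplies probabilities of all features, this zero probability causes the entire product to become zero, making it impossible to predict that class.

How Smoothing Helps
Laplace smoothing adds one to every feature count and adds the number of distinct features (e.g., the vocabulary size) to the denominator, so no estimate is ever exactly zero and unseen features only lower a class's score instead of eliminating it.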
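
The effect can be sketched with hypothetical word counts for a single class. The add-α estimate is (count + α) / (total + α · V), where V is the vocabulary size and α = 1 gives Laplace (add-one) smoothing; the counts and function names below are illustrative.

```python
# Hypothetical word counts for one class (e.g., "spam") over a 3-word vocabulary.
counts = {"free": 3, "win": 2, "meeting": 0}
total = sum(counts.values())  # total word occurrences in this class
vocab_size = len(counts)      # number of distinct words

def unsmoothed(word):
    """Maximum-likelihood estimate; zero for words never seen with this class."""
    return counts[word] / total

def laplace(word, alpha=1.0):
    """Add-alpha estimate: (count + alpha) / (total + alpha * vocab_size)."""
    return (counts[word] + alpha) / (total + alpha * vocab_size)

print("unsmoothed:", {w: unsmoothed(w) for w in counts})
print("smoothed:  ", {w: laplace(w) for w in counts})
print("smoothed estimates sum to:", sum(laplace(w) for w in counts))
```

Here the unseen word "meeting" moves from probability 0 to 1/8, while the smoothed estimates still sum to 1.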


Practical

Q21. Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy


Ans21.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize SVM classifier (default RBF kernel)
svm_clf = SVC()

# Train the model
svm_clf.fit(X_train, y_train)

# Predict on test set
y_pred = svm_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Classifier Accuracy on Iris test set: {accuracy:.2f}")


Q22.  Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies


Ans22.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize SVM classifiers with Linear and RBF kernels
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

# Train both models
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Predict on test set
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print the results
print(f"Accuracy with Linear kernel: {accuracy_linear:.2f}")
print(f"Accuracy with RBF kernel: {accuracy_rbf:.2f}")


Q23.  Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean

Ans23.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load a housing dataset (California Housing is assumed here; it is downloaded on first use)
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVR is sensitive to feature scale, so standardize the features before fitting
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=1.0, epsilon=0.1))

# Train the model (this can take a while on the full dataset)
svr.fit(X_train, y_train)

# Predict on test set
y_pred = svr.predict(X_test)

# Evaluate with Mean Squared Error (the metric name is cut off in the question; MSE is assumed)
mse = mean_squared_error(y_test, y_pred)
print(f"SVR Mean Squared Error on test set: {mse:.2f}")




Q24.  Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary

Ans24.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

# Load Iris dataset and select two features for easy visualization
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use first two features
y = iris.target

# Train SVM classifier with polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0)
svm_poly.fit(X, y)

# Create a mesh grid for plotting decision boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500),
                     np.linspace(y_min, y_max, 500))

# Predict class labels for each point in the mesh
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision boundary and training points
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Set1)
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Set1, edgecolors='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("SVM with Polynomial Kernel (degree=3)")
plt.show()


Q25.  Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and

Ans25.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# Split dataset into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the classifier
gnb.fit(X_train, y_train)

# Predict on test set
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Gaussian Naive Bayes accuracy on Breast Cancer dataset: {accuracy:.2f}")


Q26.  Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

Ans26.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the full 20 Newsgroups dataset (subset='all' combines the predefined train and test parts)
newsgroups = fetch_20newsgroups(subset='all', shuffle=True, random_state=42)

X = newsgroups.data
y = newsgroups.target

# Split dataset into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert text data to TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Initialize Multinomial Naive Bayes classifier
mnb = MultinomialNB()

# Train the classifier
mnb.fit(X_train_tfidf, y_train)

# Predict on test set
y_pred = mnb.predict(X_test_tfidf)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Multinomial Naive Bayes accuracy on 20 Newsgroups dataset: {accuracy:.2f}")

# Optional: Print classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=newsgroups.target_names))

