**Question 1:  What is a Support Vector Machine (SVM), and how does it work?**

 Answer:  A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks, but it is most widely applied to classification problems.

How it Works:

Decision Boundary (Hyperplane):

SVM tries to find the best separating boundary (called a hyperplane) that divides the data points of different classes.

In 2D, this boundary is a straight line; in 3D, it’s a plane; in higher dimensions, it’s called a hyperplane.

Maximizing the Margin:

Among all possible hyperplanes, SVM chooses the one that maximizes the margin — the distance between the hyperplane and the nearest data points of each class.

These closest data points are called support vectors (they “support” or define the boundary).

Handling Non-Linear Data:

When the data is not linearly separable, SVM uses the kernel trick to implicitly map the data into a higher-dimensional space where a linear separator may exist.

Common kernels:

Linear kernel

Polynomial kernel

Radial Basis Function (RBF) kernel
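To make the kernel choice concrete, here is a minimal sketch that fits one SVM per kernel on a toy dataset of two concentric rings. The dataset (`make_circles`), its parameters, and the variable names are assumptions chosen for illustration and are not part of the original answer.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data (an assumption for illustration): two concentric rings,
# which no straight line can separate.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit one classifier per kernel and record its test accuracy.
scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

for kernel, acc in scores.items():
    print(f"{kernel:>6}: test accuracy = {acc:.2f}")
```

On ring-shaped data like this, the RBF kernel would be expected to clearly outperform the linear kernel; how the polynomial kernel fares depends on its degree and other settings.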

Soft Margin & Regularization:

Real-world data may contain noise or overlapping classes.

SVM allows some misclassifications by introducing a soft margin controlled by a parameter C:

Large C: tries to classify everything correctly (less tolerance for errors).

Small C: allows more misclassifications for a wider margin (better generalization).
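The effect of C can be checked numerically. The sketch below trains a linear SVM with a small and a large C on two overlapping clusters and reports the margin width, which for a linear SVM is 2 / ||w||. The data, the two C values, and the variable names are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping Gaussian blobs (hypothetical data for illustration).
X, y = make_blobs(
    n_samples=200,
    centers=[[-1.0, 0.0], [1.0, 0.0]],
    cluster_std=1.0,
    random_state=0,
)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear SVM the margin width is 2 / ||w||.
    margin = 2.0 / np.linalg.norm(clf.coef_)
    n_support = int(clf.n_support_.sum())
    results[C] = {"margin": margin, "n_support": n_support}
    print(f"C={C:>6}: margin width = {margin:.2f}, support vectors = {n_support}")
```

The small-C model should show the wider margin, since slack is cheap and the optimizer prefers a small ||w||.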

**Question 2: Explain the difference between Hard Margin and Soft Margin SVM.**

Answer: In Support Vector Machines (SVM), the concepts of Hard Margin and Soft Margin refer to how strictly the algorithm separates the classes.

1. Hard Margin SVM

Definition: A hard margin SVM tries to find a hyperplane that perfectly separates the data without allowing any misclassification.

Assumption: The data must be linearly separable (no overlap, no noise).

Characteristics:

No tolerance for errors.

Maximizes the margin with strict separation.

Works well only with clean, noise-free datasets.

Very sensitive to outliers: a single outlier can shift the boundary sharply, and if the classes overlap, no separating hyperplane exists at all.

2. Soft Margin SVM

Definition: A soft margin SVM allows some data points to be misclassified in order to create a wider margin and achieve better generalization.

Controlled by Parameter C:

Large C: Focuses on minimizing misclassification (narrow margin).

Small C: Allows more misclassifications but improves margin (better generalization).

Characteristics:

Tolerant of noise and overlapping classes.

Balances margin size with classification accuracy.

More practical for real-world datasets.

| Feature           | Hard Margin SVM                | Soft Margin SVM                           |
| ----------------- | ------------------------------ | ----------------------------------------- |
| Misclassification | Not allowed                    | Allowed (with penalty)                    |
| Data Requirement  | Linearly separable, no noise   | Works with noisy/overlapping data         |
| Flexibility       | Very rigid (strict separation) | Flexible, controlled by `C`               |
| Risk              | Overfitting with noisy data    | Better generalization in real-world cases |
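The same contrast can be written as the two standard textbook optimization problems. The notation here (weight vector $w$, bias $b$, slack variables $\xi_i$, labels $y_i \in \{-1, +1\}$, and $n$ training points) is introduced for illustration and does not appear in the answer above.

Hard margin:

$$\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n$$

Soft margin:

$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

Forcing every $\xi_i$ to zero recovers the hard-margin problem. A larger `C` makes each unit of slack more expensive, which pushes the soft-margin solution toward the hard-margin one.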


**Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.**


Answer:  Kernel Trick in SVM

The Kernel Trick is a mathematical technique in Support Vector Machines (SVM) that allows the algorithm to handle non-linear data by implicitly mapping it into a higher-dimensional feature space, where the classes may become linearly separable.

Instead of explicitly computing the transformation (which may be computationally expensive), the kernel trick uses a kernel function to calculate the similarity (dot product) between two data points in the higher-dimensional space without actually performing the transformation.

How it Works (Simple Idea):

Suppose data is not separable in 2D.

By applying a kernel, we can imagine projecting it into 3D (or higher) where a linear hyperplane can separate the classes.

The kernel trick makes this projection implicit and efficient.
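A small numeric check illustrates what "implicit" means. The sketch uses a degree-2 polynomial kernel rather than RBF, because its feature map is small enough to write out explicitly. The helper names `phi` and `poly2_kernel` and the two sample points are hypothetical.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2D point (x1, x2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return float(np.dot(a, b)) ** 2


a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

# Dot product computed after explicitly mapping both points into 3D.
explicit = float(np.dot(phi(a), phi(b)))
# Same quantity computed directly from the original 2D points.
implicit = poly2_kernel(a, b)

print(explicit, implicit)
```

Both values agree, so the kernel gives the 3D dot product without ever building the 3D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is available.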

Example Kernel: Radial Basis Function (RBF) Kernel

Formula:

$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$

Explanation:

Measures similarity between two points based on their distance.

Nearby points → High similarity (close to 1).

Faraway points → Low similarity (close to 0).
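The near/far behaviour can be verified with a few lines of code. This sketch evaluates the RBF formula by hand and cross-checks it against scikit-learn's `rbf_kernel`. The value of `gamma` and the three points are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.5  # hypothetical value chosen for illustration

x = np.array([[0.0, 0.0]])
near = np.array([[0.1, 0.0]])  # squared distance 0.01 from x
far = np.array([[3.0, 4.0]])   # squared distance 25 from x


def rbf(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2), written out by hand."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))


k_near = rbf(x, near, gamma)
k_far = rbf(x, far, gamma)
print(f"near: {k_near:.4f}, far: {k_far:.6f}")

# Cross-check the hand-written formula against scikit-learn.
k_near_sklearn = float(rbf_kernel(x, near, gamma=gamma)[0, 0])
k_far_sklearn = float(rbf_kernel(x, far, gamma=gamma)[0, 0])
```

A larger `gamma` makes the similarity fall off faster with distance, so each training point influences a smaller neighbourhood.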

Use Case:

Works well for problems where the decision boundary is highly non-linear, such as:

Image classification

Handwriting recognition

Bioinformatics (e.g., classifying protein sequences)

✅ In short:

The Kernel Trick allows SVM to classify non-linear data efficiently by computing similarities in higher dimensions.

Example: RBF kernel → useful when classes are not linearly separable, as in image classification or handwriting recognition.


**Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?**

Answer: A Naïve Bayes Classifier is a probabilistic machine learning algorithm that applies Bayes’ Theorem for classification tasks. It is commonly used in text classification, spam detection, sentiment analysis, and medical diagnosis.

Bayes’ Theorem:

$$P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$$



$P(C \mid X)$: Probability of class $C$ given the features $X$ (posterior).

$P(X \mid C)$: Probability of features given the class (likelihood).

$P(C)$: Prior probability of the class.

$P(X)$: Probability of the features.

Why is it called “Naïve”?

It is called naïve because it makes a strong simplifying assumption:

All features are independent of each other, given the class label.

Example: In text classification, it assumes the occurrence of the word “money” is independent of the word “profit”, which is rarely true in reality.

Despite this “naïve” assumption, the algorithm performs surprisingly well in practice, especially with large, high-dimensional datasets.
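The independence assumption is what makes the computation so simple: the likelihood of a whole message becomes a product of per-word likelihoods. The sketch below works through Bayes' Theorem by hand for a message containing "money" and "profit". All probabilities are made-up numbers for illustration, not estimates from real data.

```python
# Hypothetical probabilities for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"money": 0.5, "profit": 0.4},
    "ham": {"money": 0.05, "profit": 0.1},
}

words = ["money", "profit"]

# Naive assumption: P(words | class) is the product of the per-word likelihoods.
joint = {}
for label, prior in priors.items():
    p = prior
    for word in words:
        p *= likelihoods[label][word]
    joint[label] = p  # P(X | C) * P(C)

evidence = sum(joint.values())  # P(X)
posterior = {label: p / evidence for label, p in joint.items()}

for label, p in posterior.items():
    print(f"P({label} | words) = {p:.4f}")
```

With these numbers the spam posterior dominates, because both words are far more likely under the spam class than under the ham class.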

Key Points:

Advantages: Fast, simple, works well with text data, effective with small training sets.

Limitations: Independence assumption may not always hold.

Applications: Email spam filtering, document categorization, medical diagnosis, sentiment analysis.


**Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?**

Answer: Naïve Bayes has several variants, each designed for different types of data. The three most common are Gaussian, Multinomial, and Bernoulli Naïve Bayes.

1. Gaussian Naïve Bayes

Assumption: The features follow a normal (Gaussian) distribution.

Formula for likelihood:

$$P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Use Case:

Suitable for continuous data (real-valued features).

Examples: medical data (blood pressure, height, weight), sensor readings.

2. Multinomial Naïve Bayes

Assumption: Features represent discrete counts or frequencies.

Intuition: The likelihood is based on how often a feature (like a word) occurs in a document.

Use Case:

Works well for text classification problems where features are word counts or term frequency.

Examples: spam filtering, document classification, topic categorization.

3. Bernoulli Naïve Bayes

Assumption: Features are binary (0 or 1) — presence or absence of a feature.

Intuition: Instead of how many times a word appears, it only matters whether it appears or not.

Use Case:

Suitable for datasets with binary/boolean features.

Examples: text classification using word presence/absence, sentiment analysis (positive vs. negative word present).

| Variant         | Data Type Assumption        | Typical Use Case                                                               |
| --------------- | --------------------------- | ------------------------------------------------------------------------------ |
| **Gaussian**    | Continuous (real values)    | Medical data, sensor data, continuous features                                 |
| **Multinomial** | Discrete counts (frequency) | Text classification with word counts (spam detection, document classification) |
| **Bernoulli**   | Binary (0/1)                | Text classification with word presence/absence, sentiment analysis             |
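As a rough sketch of how the three variants pair with their data types in scikit-learn, the example below fits each one on a matching input. The four-document corpus and its labels are invented for illustration; Iris stands in for "continuous features".

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Gaussian NB: continuous, real-valued features (Iris measurements).
X_iris, y_iris = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X_iris, y_iris)
gnb_train_acc = gnb.score(X_iris, y_iris)

# Tiny made-up corpus (hypothetical data): 1 = spam, 0 = not spam.
docs = [
    "free money free prize",
    "win money now",
    "meeting agenda attached",
    "project meeting tomorrow",
]
labels = [1, 1, 0, 0]

# Multinomial NB: word counts (a word may occur more than once).
counts = CountVectorizer().fit_transform(docs)
mnb = MultinomialNB().fit(counts, labels)

# Bernoulli NB: binary presence/absence of each word.
presence = CountVectorizer(binary=True).fit_transform(docs)
bnb = BernoulliNB().fit(presence, labels)

print("GaussianNB training accuracy on Iris:", round(gnb_train_acc, 3))
print("Largest word count:", counts.max(), "| largest presence value:", presence.max())
```

The only difference between the two text models' inputs is `binary=True`, which caps every count at 1 so that repeated words carry no extra weight.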


**Question 6: Write a Python program to: ● Load the Iris dataset ● Train an SVM Classifier with a linear kernel ● Print the model's accuracy and support vectors. (Include your Python code and output in the code box below.) **

Answer:

In [1]:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Iris dataset
iris = datasets.load_iris()
X = iris.data   # Features
y = iris.target # Labels

# 2. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Train an SVM Classifier with a linear kernel
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# 4. Predict on the test set
y_pred = model.predict(X_test)

# 5. Print the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

# 6. Print the support vectors
print("Support Vectors:\n", model.support_vectors_)


Model Accuracy: 1.0
Support Vectors:
 [[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]


**Question 7:  Write a Python program to: ● Load the Breast Cancer dataset ● Train a Gaussian Naïve Bayes model ● Print its classification report including precision, recall, and F1-score. (Include your Python code and output in the code box below.)**

 Answer:  

In [2]:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# 1. Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data   # Features
y = cancer.target # Labels

# 2. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Train a Gaussian Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# 4. Predict on the test set
y_pred = model.predict(X_test)

# 5. Print classification report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))


Classification Report:

              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171



**Question 8: Write a Python program to: ● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma. ● Print the best hyperparameters and accuracy. (Include your Python code and output in the code box below.)**

 Answer:  

In [3]:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Wine dataset
wine = datasets.load_wine()
X = wine.data   # Features
y = wine.target # Labels

# 2. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Define parameter grid for C and gamma
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']  # Using RBF kernel
}

# 4. Train SVM Classifier with GridSearchCV
grid = GridSearchCV(SVC(), param_grid, refit=True, cv=5, verbose=0)
grid.fit(X_train, y_train)

# 5. Predict on the test set
y_pred = grid.predict(X_test)

# 6. Print the best hyperparameters and accuracy
print("Best Hyperparameters:", grid.best_params_)
print("Model Accuracy:", accuracy_score(y_test, y_pred))


Best Hyperparameters: {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
Model Accuracy: 0.7777777777777778
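RBF SVMs are sensitive to feature scale, and the Wine features span very different ranges, which plausibly explains the modest accuracy above. As a hedged variation (not part of the original answer), the sketch below repeats the same grid search with a `StandardScaler` placed in a `Pipeline`. The step names `scaler` and `svc` are arbitrary labels chosen here.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling lives inside the pipeline, so each CV fold is scaled using
# statistics from its own training portion only.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, grid.predict(X_test))
print("Best Hyperparameters:", grid.best_params_)
print("Model Accuracy:", scaled_accuracy)
```

Standardizing the features would be expected to raise test accuracy noticeably on this dataset, and the selected `C` and `gamma` may differ from the unscaled run.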


**Question 9: Write a Python program to: ● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups). ● Print the model's ROC-AUC score for its predictions. (Include your Python code and output in the code box below.)**

 Answer:  

In [4]:
# Import necessary libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# 1. Load a subset of the 20 Newsgroups dataset (for speed, select few categories)
categories = ['alt.atheism', 'sci.space', 'talk.politics.misc']
newsgroups = fetch_20newsgroups(subset='all', categories=categories)

X = newsgroups.data   # Text data
y = newsgroups.target # Labels

# 2. Convert text data to TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)

# 3. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.3, random_state=42
)

# 4. Train a Multinomial Naïve Bayes Classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# 5. Predict probabilities on the test set
y_proba = model.predict_proba(X_test)

# 6. Compute ROC-AUC score (multi-class, using 'ovr')
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
roc_auc = roc_auc_score(y_test_bin, y_proba, multi_class='ovr')

# 7. Print the ROC-AUC score
print("ROC-AUC Score:", roc_auc)


ROC-AUC Score: 0.9990086115884346
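A side note on the metric call: with `multi_class='ovr'`, `roc_auc_score` also accepts the integer class labels directly, so the `label_binarize` step is optional. The sketch below checks that both routes give the same macro-averaged score. It uses synthetic numeric data from `make_classification` and a `GaussianNB` model as stand-ins (an assumption, so that it runs without downloading the newsgroups corpus).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data standing in for the text features (an assumption).
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

y_proba = GaussianNB().fit(X_train, y_train).predict_proba(X_test)

# Route 1: pass integer labels and request one-vs-rest averaging.
auc_direct = roc_auc_score(y_test, y_proba, multi_class="ovr")

# Route 2: binarize the labels first (treated as a multilabel problem).
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
auc_binarized = roc_auc_score(y_test_bin, y_proba)

print(auc_direct, auc_binarized)
```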


**Question 10: Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam. The emails may contain: ● Text with diverse vocabulary ● Potential class imbalance (far more legitimate emails than spam) ● Some incomplete or missing data Explain the approach you would take to: ● Preprocess the data (e.g. text vectorization, handling missing data) ● Choose and justify an appropriate model (SVM vs. Naïve Bayes) ● Address class imbalance ● Evaluate the performance of your solution with suitable metrics And explain the business impact of your solution. (Include your Python code and output in the code box below.)**

Answer:  Approach
1. Data Preprocessing

Handling Missing Data:

Fill missing email text with an empty string ("").

Drop rows if critical labels are missing.

Text Vectorization:

Use TF-IDF (Term Frequency–Inverse Document Frequency) to convert text into numerical features.

Helps handle diverse vocabulary and reduces the impact of very common words.

2. Model Choice (SVM vs. Naïve Bayes)

Naïve Bayes works well for text data, is fast, and handles high-dimensional sparse features (like word counts) efficiently.

SVM provides strong decision boundaries, but kernel SVMs train slowly on large text datasets; a linear SVM is the usual faster alternative.

✅ Chosen Model: Multinomial Naïve Bayes → efficient, interpretable, and widely used in spam filtering.

3. Handling Class Imbalance

Spam datasets often have far more legitimate emails than spam.

Approaches:

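Use class weights to penalize misclassification of the minority class. scikit-learn's MultinomialNB has no class-weight option, so per-sample weights or adjusted class priors play this role.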

Apply resampling techniques (SMOTE or undersampling).

Focus on precision & recall rather than just accuracy.
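One way to apply the class-weight idea to Naïve Bayes in scikit-learn is through per-sample weights, since `MultinomialNB.fit` accepts a `sample_weight` argument. The sketch below uses random word counts and a made-up 8-to-2 class split purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 8 legitimate emails (0) and 2 spam (1),
# each described by 6 made-up word counts.
X = rng.integers(0, 5, size=(10, 6))
y = np.array([0] * 8 + [1] * 2)

# "balanced" gives each sample the weight n_samples / (n_classes * class_count).
weights = compute_sample_weight(class_weight="balanced", y=y)

model = MultinomialNB()
model.fit(X, y, sample_weight=weights)

# With balanced weights, the learned class priors come out equal.
learned_priors = np.exp(model.class_log_prior_)
print("Sample weights:", weights)
print("Learned class priors:", learned_priors)
```

Each spam email here counts four times as much as a legitimate one, so the model no longer favours the majority class through its prior. Whether that trade-off is desirable depends on the relative cost of false positives and false negatives.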

4. Model Evaluation

Use Confusion Matrix, Precision, Recall, F1-score, and ROC-AUC.

Recall (Sensitivity) is important → missing spam emails (false negatives) is costly.

Precision is also important → false positives may annoy users.
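To show how these metrics relate to the confusion matrix, the sketch below computes them by hand for ten hypothetical predictions and compares the results with scikit-learn's functions. The labels are invented for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels for 10 emails: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the emails flagged as spam, how many were spam
recall = tp / (tp + fn)     # of the actual spam emails, how many were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")

# Cross-check against scikit-learn's implementations.
sk_precision = precision_score(y_true, y_pred)
sk_recall = recall_score(y_true, y_pred)
sk_f1 = f1_score(y_true, y_pred)
```

Accuracy is 0.8 here even though one third of the spam is missed, which is why precision and recall are the more informative figures on imbalanced data.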

5. Business Impact

Reduces the number of spam emails reaching users.

Saves employee time and prevents security risks (phishing, malware).

Improves trust in the company’s email communication system.

Python Code Implementation

In [5]:
# Import libraries (unused imports removed)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import pandas as pd

# Example synthetic dataset (replace with real email dataset)
data = {
    "text": [
        "Congratulations, you won a free lottery ticket",
        "Please find attached the project report",
        "Limited offer, buy now and save 50%",
        "Let's schedule the meeting for tomorrow",
        "Earn money quickly from home",
        "Your invoice is attached"
    ],
    "label": ["spam", "ham", "spam", "ham", "spam", "ham"]
}

df = pd.DataFrame(data)

# 1. Handle missing values
df['text'] = df['text'].fillna("")

# 2. Vectorize text using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['text'])
y = df['label'].map({"ham": 0, "spam": 1})  # Encode labels

# 3. Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 4. Train Naïve Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# 5. Predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:,1]

# 6. Evaluation
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=["ham", "spam"]))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("ROC-AUC Score:", roc_auc_score(y_test, y_proba))


Classification Report:

              precision    recall  f1-score   support

         ham       0.50      1.00      0.67         1
        spam       0.00      0.00      0.00         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2

Confusion Matrix:
 [[1 0]
 [1 0]]
ROC-AUC Score: 1.0


