# SVM & Naive Bayes Assignment

---
---

### Question 1: What is a Support Vector Machine (SVM), and how does it work?

<BR>

**Ans.-**
A **Support Vector Machine (SVM)** is a supervised machine learning algorithm used for both **classification** and **regression** tasks, but it is most commonly used for **classification**.  
It tries to find the **best boundary (hyperplane)** that separates different classes in the data.  

---

####  How It Works
1. **Find the Optimal Hyperplane:**  
   SVM searches for the hyperplane that **maximizes the margin** — the distance between the hyperplane and the nearest data points of each class.  
   These closest data points are called **support vectors**.  

2. **Maximizing the Margin:**  
   The larger the margin, the better the model’s ability to generalize to unseen data.  

3. **Non-linear Separation:**  
   When the data is not linearly separable, SVM uses the **kernel trick** to implicitly map the data into a higher-dimensional space where a linear separator may exist.  
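
The first two steps can be illustrated with a small sketch. It assumes scikit-learn's `SVC` and a hand-made toy dataset (the six points below are made up for illustration). A very large `C` is used to approximate a hard margin, and the margin width is recovered from the learned weight vector as `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, hand-made 2-D data (an assumption for illustration): two well-separated groups
X = np.array([[0, 0], [1, 0], [0, 1],    # class 0
              [3, 3], [4, 3], [3, 4]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin on separable data
model = SVC(kernel="linear", C=1e6)
model.fit(X, y)

w = model.coef_[0]                       # normal vector of the hyperplane w.x + b = 0
b = model.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin boundaries

print("Support vectors:\n", model.support_vectors_)
print("w =", w, "b =", b)
print("Margin width:", margin_width)
```

Only the points nearest the boundary end up in `support_vectors_`; moving any of the other points slightly would leave the hyperplane unchanged.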

---

####  Example
If we want to classify emails as *Spam* or *Not Spam*:  
- SVM plots the data points (emails) based on features like word frequency and link count.  
- It then finds the line (or plane in higher dimensions) that best separates the two categories with the maximum possible margin.  

---

####  Summary
- **Goal:** Find the best separating hyperplane between classes.  
- **Support Vectors:** Data points closest to the boundary that influence its position.  
- **Kernel Trick:** Allows SVM to handle non-linear data efficiently.  

<br>

---

<br>

### Question 2: Explain the difference between Hard Margin and Soft Margin SVM.

<br>

**Ans.-**

####  Hard Margin SVM
- In **Hard Margin SVM**, the algorithm tries to find a **perfectly separating hyperplane** that divides the data into two classes **without any misclassification**.  
- This means **all data points must lie on the correct side of the margin**.  
- It assumes that the data is **linearly separable** and has **no noise or overlap** between classes.  

**Key Characteristics:**
- Strict separation — no data point can cross the margin.  
- Can lead to **overfitting** if the data contains noise or outliers.  
- Works only for perfectly separable data.  

 *Example:*  
If we have two groups of points (say “red” and “blue”) that do not overlap at all, a hard margin SVM can draw a clean straight line between them with zero error.

---

####  Soft Margin SVM
- **Soft Margin SVM** allows **some misclassifications** in order to achieve better generalization on unseen data.  
- It introduces a **regularization parameter (C)** that controls the trade-off between a wider margin and fewer training errors.  
  - Small **C** → tolerates more margin violations and misclassifications in exchange for a wider margin.  
  - Large **C** → penalizes violations heavily and tries to classify every training point correctly, which may overfit.  

**Key Characteristics:**
- More flexible and robust with noisy or overlapping data.  
- Works well for real-world datasets where perfect separation is rare.  

 *Example:*  
In email classification, some emails might look slightly ambiguous — Soft Margin SVM allows these few errors to keep the model more general and less sensitive to noise.  
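
As a rough illustration of the role of **C**, the sketch below fits a linear `SVC` on overlapping synthetic blobs with a small and a large `C`, then compares how many support vectors each model keeps. The dataset, the blob centers, and the two `C` values are assumptions chosen for the demonstration, not recommended settings.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so perfect separation is not possible (synthetic data)
X, y = make_blobs(
    n_samples=200, centers=[[0, 0], [3, 3]], cluster_std=1.5, random_state=0
)

n_support = {}
for C in (0.01, 100):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # Points on or inside the margin (or misclassified) become support vectors
    n_support[C] = int(model.n_support_.sum())
    print(f"C={C}: {n_support[C]} support vectors, "
          f"training accuracy={model.score(X, y):.3f}")
```

A small `C` typically yields a wider margin and therefore more points inside it, while a large `C` narrows the margin around the boundary.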

---

####  Summary
| Aspect | Hard Margin SVM | Soft Margin SVM |
|:--|:--|:--|
| **Data Type** | Perfectly separable | Overlapping / noisy |
| **Misclassification Allowed** | ❌ No | ✅ Yes |
| **Generalization** | Poor when data is noisy | Usually better on real-world data |
| **Risk of Overfitting** | High | Lower |
| **Parameter Used** | None | Regularization parameter (C) |

<br>

---

<br>

### Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.

<br>

**Ans.-**  The Kernel Trick in Support Vector Machines (SVM)

####  What is the Kernel Trick?
The **Kernel Trick** is a mathematical technique in SVM that allows the algorithm to handle **non-linearly separable data** by transforming it into a **higher-dimensional space** without explicitly computing the transformation.  

In simple terms:  
Instead of drawing a straight line in the original feature space, the kernel function helps SVM find a separating boundary (hyperplane) in a **new, higher-dimensional space** where the data becomes linearly separable.  

---

####  How It Works
1. The kernel function computes the **similarity** (an inner product) between two data points as if they had already been mapped into the transformed space.  
2. This allows SVM to operate in that higher-dimensional space **without actually performing** the heavy computation of transformation.  
3. As a result, the model can handle complex data patterns efficiently.  
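
The idea in these steps can be checked numerically. The sketch below uses a degree-2 polynomial kernel rather than RBF, because its feature map is small enough to write out by hand. The helper names `phi` and `poly_kernel` are hypothetical and exist only for this illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly_kernel(x, z)        # stay in 2-D and apply the kernel directly

print("Explicit mapping:", explicit)
print("Kernel shortcut: ", implicit)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors. For kernels like RBF, where the implicit space is infinite-dimensional, this shortcut is the only practical option.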

---

####  Example: Radial Basis Function (RBF) Kernel
**Kernel Function:**
\[
K(x_i, x_j) = \exp(-\gamma ||x_i - x_j||^2)
\]

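- Here γ > 0 controls how quickly similarity decays with distance: a large γ gives each training point a narrow region of influence, a small γ a broad one.  
- The RBF kernel corresponds to an implicit mapping into an infinite-dimensional space.  
- It’s particularly effective when the class boundary is **non-linear**, for example circular.  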

**Use Case Example:**  
If we are classifying whether a point lies inside or outside a circle, a **linear kernel** would fail, but an **RBF kernel** can easily draw a circular boundary that separates the two classes.  
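
A minimal sketch of this use case, assuming scikit-learn's `make_circles` as a stand-in for the "inside vs. outside a circle" data. The sample size, noise level, and split are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Inner circle = one class, outer ring = the other (synthetic data)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same data, two kernels
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print("Linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

No straight line can separate a ring from the disc inside it, so the linear model stays well below the RBF model on this data.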

---

####  Common Types of Kernels
| Kernel Type | Description | Use Case |
|:--|:--|:--|
| **Linear** | No transformation; used for linearly separable data | Text classification |
| **Polynomial** | Adds polynomial terms of features | Image recognition |
| **RBF (Gaussian)** | Maps data to infinite dimensions | Non-linear problems |
| **Sigmoid** | Uses tanh function; similar to neural networks | Binary classification tasks |

---

 **Summary:**  
The **Kernel Trick** enables SVMs to efficiently solve complex, non-linear classification problems by using kernel functions — without explicitly transforming the data into higher dimensions.

<br>

---

<br>

### Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?

<br>

**Ans.-**

####  What is a Naïve Bayes Classifier?
A **Naïve Bayes Classifier** is a probabilistic machine learning algorithm based on **Bayes’ Theorem**.  
It is mainly used for **classification tasks**, such as spam detection, sentiment analysis, and text categorization.

It predicts the class of a data point by calculating the **posterior probability** for each class and selecting the one with the highest probability.

**Bayes’ Theorem:**
\[
P(Y|X) = \frac{P(X|Y) \times P(Y)}{P(X)}
\]

Where:  
- \( P(Y|X) \): Probability of class Y given data X (posterior)  
- \( P(X|Y) \): Probability of data X given class Y (likelihood)  
- \( P(Y) \): Prior probability of class Y  
- \( P(X) \): Probability of the data X  

---

####  Why It’s Called “Naïve”
It’s called **“naïve”** because the algorithm **assumes that all features are conditionally independent given the class**: once the class is known, the value of one feature tells you nothing about another.  
In real-world data, this assumption is rarely true, but it reduces the likelihood to a product of per-feature probabilities and often works surprisingly well.

 *Example:*  
In email spam detection, Naïve Bayes assumes that the presence of the word “offer” is independent of the word “discount,” even though they often appear together in spam emails.
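
The calculation behind this example can be sketched by hand. All probabilities below are made-up numbers for illustration, not estimates from real data. Only the two words present in the email are considered, which simplifies what a full model would do.

```python
# Made-up prior and per-word likelihoods (assumptions for illustration only)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.6, "discount": 0.5},
    "ham": {"offer": 0.1, "discount": 0.05},
}
words_in_email = ["offer", "discount"]

def unnormalized_posterior(label):
    """P(Y) times the product of P(word | Y): the naive independence assumption."""
    score = priors[label]
    for word in words_in_email:
        score *= likelihoods[label][word]
    return score

spam_score = unnormalized_posterior("spam")
ham_score = unnormalized_posterior("ham")
evidence = spam_score + ham_score          # P(X), the normalizing constant

posterior_spam = spam_score / evidence
posterior_ham = ham_score / evidence
print(f"P(spam | email) = {posterior_spam:.4f}")
print(f"P(ham  | email) = {posterior_ham:.4f}")
```

Because "offer" and "discount" are treated as independent given the class, their likelihoods are simply multiplied. If the two words are in fact strongly correlated, this double-counts the evidence and pushes the posterior towards 0 or 1.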

---

####  Advantages
- **Fast and efficient** for large datasets.  
- Works well even with limited training data.  
- Especially effective for **text classification** tasks.  

---

####  Limitation
- The independence assumption can reduce accuracy when features are strongly correlated.  

---

 **Summary:**  
The Naïve Bayes Classifier applies Bayes’ Theorem with the simplifying (naïve) assumption of feature independence — making it simple, fast, and effective for many practical classification problems.

<br>

---

<br>

### Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?

<br>

**Ans.-**  Variants of Naïve Bayes Classifier

Naïve Bayes has several variants, each designed for different types of data.  
The three most common are **Gaussian**, **Multinomial**, and **Bernoulli** Naïve Bayes.

---

####  1. Gaussian Naïve Bayes
- **Used For:** Continuous (numeric) data.  
- **Assumption:** Features follow a **normal (Gaussian) distribution**.  
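- The algorithm calculates the likelihood of a feature value using the Gaussian probability density function, where \( \mu_y \) and \( \sigma_y^2 \) are the mean and variance of that feature estimated from the training samples of class \( y \).  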

**Formula:**
\[
P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \, e^{-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}}
\]

**Example Use Case:**  
Predicting the type of flower in the **Iris dataset** based on continuous features like petal length and width.

---

####  2. Multinomial Naïve Bayes
- **Used For:** Discrete data such as **word counts or frequencies**.  
- Commonly applied to text classification problems where each feature is the number of times a word appears in a document.  
- The multinomial model is defined for integer counts; fractional values such as TF-IDF weights are not strictly covered by it, but they often work well in practice.

**Example Use Case:**  
Classifying emails as **spam or not spam** using word frequency counts.

---

####  3. Bernoulli Naïve Bayes
- **Used For:** Binary or Boolean features (0s and 1s).  
- It assumes that features are binary — meaning each feature is either **present (1)** or **absent (0)**.  
- It considers whether a particular word or attribute exists rather than how many times it appears.

**Example Use Case:**  
Sentiment analysis where each word feature indicates whether it appears in a sentence or not (1 = present, 0 = absent).

---

####  Summary Table

| Naïve Bayes Variant | Data Type | Example Use Case |
|:--|:--|:--|
| **GaussianNB** | Continuous / numeric | Iris flower classification |
| **MultinomialNB** | Discrete counts / frequencies | Text or spam classification |
| **BernoulliNB** | Binary / presence-absence | Sentiment analysis, document classification |

---

 **Summary:**  
- **GaussianNB:** for continuous numeric features.  
- **MultinomialNB:** for count-based text data.  
- **BernoulliNB:** for binary feature data.  

Choosing the variant whose assumptions match the feature type helps the model reflect the data distribution and usually improves performance on the task.
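
The three variants map onto scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The sketch below pairs each with a matching feature type. The tiny text corpus and its labels are invented for illustration, and scores on such small data should not be read as benchmarks.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Continuous features -> GaussianNB
X_iris, y_iris = load_iris(return_X_y=True)
gaussian_acc = GaussianNB().fit(X_iris, y_iris).score(X_iris, y_iris)
print("GaussianNB training accuracy on Iris:", gaussian_acc)

# Invented toy corpus (1 = spam, 0 = not spam)
docs = [
    "free offer win money", "win free prize now", "claim your free money",
    "meeting schedule for monday", "project report attached",
    "lunch on monday with the team",
]
labels = [1, 1, 1, 0, 0, 0]
new_doc = ["free prize offer"]

# Word counts -> MultinomialNB
count_vec = CountVectorizer()
mnb = MultinomialNB().fit(count_vec.fit_transform(docs), labels)
mnb_pred = mnb.predict(count_vec.transform(new_doc))[0]

# Word presence/absence (0 or 1) -> BernoulliNB
binary_vec = CountVectorizer(binary=True)
bnb = BernoulliNB().fit(binary_vec.fit_transform(docs), labels)
bnb_pred = bnb.predict(binary_vec.transform(new_doc))[0]

print("MultinomialNB prediction:", mnb_pred)
print("BernoulliNB prediction:  ", bnb_pred)
```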

<br>

---

<br>

### Question 6: Write a Python program to:
- Load the Iris dataset
- Train an SVM Classifier with a linear kernel
- Print the model's accuracy and support vectors.

### Dataset Info:
- You can use any suitable datasets like Iris, Breast Cancer, or Wine from sklearn.datasets or a CSV file you have.






In [1]:
# SVM Classifier on Iris Dataset using Linear Kernel

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# -------------------------------
# Step 1: Load the Iris dataset
# -------------------------------
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# -------------------------------
# Step 2: Split into training and testing sets
# -------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------------------------------
# Step 3: Train SVM Classifier with Linear Kernel
# -------------------------------
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# -------------------------------
# Step 4: Make predictions and calculate accuracy
# -------------------------------
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# -------------------------------
# Step 5: Print results
# -------------------------------
print("SVM Classifier Accuracy (Linear Kernel):", accuracy)
print("\nNumber of Support Vectors for Each Class:", model.n_support_)
print("\nSupport Vectors:\n", model.support_vectors_)


SVM Classifier Accuracy (Linear Kernel): 1.0

Number of Support Vectors for Each Class: [ 3 11 11]

Support Vectors:
 [[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]


---

<br>

### Question 7: Write a Python program to:
- Load the Breast Cancer dataset
- Train a Gaussian Naïve Bayes model
- Print its classification report including precision, recall, and F1-score.

In [2]:
# Gaussian Naïve Bayes on Breast Cancer Dataset

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# -------------------------------
# Step 1: Load the Breast Cancer dataset
# -------------------------------
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = cancer.target

# -------------------------------
# Step 2: Split into training and testing sets
# -------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------------------------------
# Step 3: Train Gaussian Naïve Bayes Model
# -------------------------------
model = GaussianNB()
model.fit(X_train, y_train)

# -------------------------------
# Step 4: Make predictions
# -------------------------------
y_pred = model.predict(X_test)

# -------------------------------
# Step 5: Print classification report
# -------------------------------
print("Classification Report for Gaussian Naïve Bayes:\n")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))


Classification Report for Gaussian Naïve Bayes:

              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



---

<br>

### Question 8: Write a Python program to:
- Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma.
- Print the best hyperparameters and accuracy.

In [3]:
# SVM Classifier on Wine Dataset with GridSearchCV (Tuning C and Gamma)

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# -------------------------------
# Step 1: Load the Wine dataset
# -------------------------------
wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)
y = wine.target

# -------------------------------
# Step 2: Split into training and testing sets
# -------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------------------------------
# Step 3: Define SVM model and parameter grid
# -------------------------------
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}

svm = SVC()

# -------------------------------
# Step 4: Apply GridSearchCV for hyperparameter tuning
# -------------------------------
grid_search = GridSearchCV(
    estimator=svm,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy'
)
grid_search.fit(X_train, y_train)

# -------------------------------
# Step 5: Evaluate the best model
# -------------------------------
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# -------------------------------
# Step 6: Print results
# -------------------------------
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validation Accuracy:", grid_search.best_score_)
print("Test Set Accuracy with Best Model:", accuracy)


Best Hyperparameters: {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
Best Cross-Validation Accuracy: 0.7179802955665024
Test Set Accuracy with Best Model: 0.8333333333333334
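
The cross-validation accuracy above is fairly low for the Wine dataset. One likely cause is that RBF SVMs are sensitive to feature scale, and the Wine features span very different ranges. The sketch below is a possible variation, not part of the original answer: it runs the same grid search with a `StandardScaler` inside a `Pipeline`, so scaling is re-fit within each cross-validation fold.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
test_accuracy = grid.score(X_test, y_test)

print("Best Hyperparameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)
print("Test Set Accuracy with Best Model:", test_accuracy)
```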


---

<br>


### Question 9: Write a Python program to:
- Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups).
- Print the model's ROC-AUC score for its predictions.

In [5]:
# Multinomial Naïve Bayes on 20 Newsgroups text data, evaluated with ROC-AUC

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

# -------------------------------
# Step 1: Load two newsgroup categories (binary problem)
# The category choice is an illustrative assumption;
# the data is downloaded on first use
# -------------------------------
categories = ['rec.autos', 'sci.space']
news = fetch_20newsgroups(
    subset='all',
    categories=categories,
    remove=('headers', 'footers', 'quotes'),
    random_state=42
)
X, y = news.data, news.target

# -------------------------------
# Step 2: Split into training and testing sets
# -------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# -------------------------------
# Step 3: Convert text to TF-IDF features
# (fit on the training text only to avoid leakage)
# -------------------------------
vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# -------------------------------
# Step 4: Train Multinomial Naïve Bayes Model
# -------------------------------
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# -------------------------------
# Step 5: Compute ROC-AUC from predicted probabilities
# of the positive class (label 1)
# -------------------------------
y_proba = model.predict_proba(X_test_vec)[:, 1]
roc_auc = roc_auc_score(y_test, y_proba)

# -------------------------------
# Step 6: Print results
# -------------------------------
print("ROC-AUC Score (Multinomial Naïve Bayes):", roc_auc)


---

<br>

### Question 10: Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam. The emails may contain:
- Text with diverse vocabulary
- Potential class imbalance (far more legitimate emails than spam)
- Some incomplete or missing data

Explain the approach you would take to:
- Preprocess the data (e.g. text vectorization, handling missing data)
- Choose and justify an appropriate model (SVM vs. Naïve Bayes)
- Address class imbalance
- Evaluate the performance of your solution with suitable metrics

And explain the business impact of your solution.

**Ans.-**
### 📖 Email Classification Case Study — Spam vs. Not Spam

As a data scientist for an email service company, my goal is to automatically classify emails as **Spam** or **Not Spam** using machine learning.  
This involves data preprocessing, model selection, handling class imbalance, and evaluating performance.

---

###  1. Data Preprocessing

####  Handling Missing Data
- Some emails may have missing subjects or message bodies.  
- Replace missing text fields with an empty string (`""`) or a placeholder like `"unknown"`.  
- Remove irrelevant columns (e.g., sender ID if not useful for spam detection).  

####  Text Preprocessing
1. **Tokenization:** Break text into individual words.  
2. **Lowercasing:** Convert all words to lowercase for uniformity.  
3. **Removing Stopwords:** Remove common words like “the”, “is”, “and” that don’t add meaning.  
4. **Lemmatization/Stemming:** Reduce words to their root form (e.g., “offers” → “offer”).  

#### Text Vectorization
- Convert text into numerical format using:  
  - **TF-IDF Vectorizer:** Weights each word by how often it appears in an email and down-weights words that appear in most emails, so distinctive words count more.  
  - Or **CountVectorizer:** Converts text into a frequency-based feature matrix.  
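
A minimal sketch of the missing-data and vectorization steps, assuming emails arrive as dictionaries with optional `subject` and `body` fields (a hypothetical schema) and using scikit-learn's `TfidfVectorizer`. Lemmatization is left out because it needs an extra NLP library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical raw emails; None marks a missing field
emails = [
    {"subject": "Win a FREE prize", "body": "Claim your free offer now"},
    {"subject": None, "body": "Minutes from the Monday meeting are attached"},
    {"subject": "Quarterly report", "body": None},
]

def to_text(email):
    """Replace missing fields with an empty string and join subject and body."""
    subject = email.get("subject") or ""
    body = email.get("body") or ""
    return f"{subject} {body}".strip()

texts = [to_text(e) for e in emails]

# Lowercasing, tokenization and English stop-word removal happen inside the vectorizer
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(texts)

print("Feature matrix shape:", X.shape)
print("Vocabulary:", sorted(vectorizer.vocabulary_))
```

In a real pipeline the vectorizer would be fit on the training emails only and then reused, via `transform`, on validation and production data.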

---

### 2. Model Selection

####  Comparing SVM vs. Naïve Bayes
| Model | Strengths | Limitations | When to Use |
|:--|:--|:--|:--|
| **Naïve Bayes** | Fast, simple, works well on text data | Assumes word independence | Ideal for large text datasets like emails |
| **SVM** | Handles high-dimensional data well, powerful decision boundaries | Slower on large datasets | Best for smaller, well-balanced datasets |

 **Chosen Model:**  
**Multinomial Naïve Bayes** — because:  
- Text features (word counts) fit its assumptions.  
- It’s fast and performs well even with a large number of features (vocabulary words).  

---

###  3. Handling Class Imbalance
Since spam emails are fewer than legitimate ones, the dataset is **imbalanced**.  
To address this:  
- **Resampling Techniques:**  
  - **Oversample minority class (spam)** using **SMOTE** or random oversampling.  
  - Or **undersample majority class** to balance data.  
- **Model-based Approach:**  
  - Use class weights (`class_weight='balanced'` in SVM).  
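  - For Naïve Bayes, set the class priors explicitly (for example, uniform priors) so the majority class does not dominate, or use Complement Naïve Bayes, which is designed for imbalanced text data.  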
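
The model-based options can be sketched with scikit-learn. The 90/10 label split below is an invented example, and the three estimators are only constructed here, not tuned. SMOTE is not shown because it lives in the separate `imbalanced-learn` package.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# Invented labels: 90 legitimate emails (0) and 10 spam emails (1)
y = np.array([0] * 90 + [1] * 10)

# "balanced" weights are n_samples / (n_classes * class_count)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print("Balanced class weights:", dict(zip([0, 1], weights)))

# SVM: errors on the rare class are penalized more heavily
svm = LinearSVC(class_weight="balanced")

# Naive Bayes: override the learned priors with uniform ones ...
nb_uniform_prior = MultinomialNB(class_prior=[0.5, 0.5])

# ... or use the complement variant, which is suited to imbalanced text data
nb_complement = ComplementNB()
```

With these weights, a misclassified spam email counts nine times as much as a misclassified legitimate one during SVM training.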

---

###  4. Model Evaluation

####  Metrics to Use
Accuracy alone isn’t enough for imbalanced data, so use:
- **Precision:** % of emails predicted as spam that are actually spam.  
- **Recall:** % of spam emails correctly identified.  
- **F1-Score:** Balance between precision and recall.  
- **ROC-AUC Score:** Measures model’s ability to distinguish between spam and not spam.  
- **Confusion Matrix:** Visualizes correct vs. incorrect predictions.  

 *Goal:* Achieve **high recall** without letting too many legitimate emails be marked as spam (false positives).
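
A short sketch of why accuracy alone misleads on imbalanced data. The labels and predictions are invented: 18 legitimate emails, 2 spam, and two hypothetical classifiers that both reach the same accuracy.

```python
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
)

# Invented ground truth: 18 legitimate (0) and 2 spam (1)
y_true = [0] * 18 + [1] * 2

# Classifier A never predicts spam; classifier B catches one spam, with one false alarm
always_ham = [0] * 20
model_pred = [0] * 17 + [1] + [1, 0]

for name, y_pred in [("always ham", always_ham), ("model", model_pred)]:
    print(name)
    print("  accuracy :", accuracy_score(y_true, y_pred))
    print("  precision:", precision_score(y_true, y_pred, zero_division=0))
    print("  recall   :", recall_score(y_true, y_pred))
    print("  f1       :", f1_score(y_true, y_pred, zero_division=0))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, model_pred)
print("Confusion matrix for the model:\n", cm)
```

Both classifiers score 0.9 on accuracy, yet the first catches no spam at all. Recall, precision, and the confusion matrix expose the difference.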

---

###  5. Business Impact

####  Benefits:
- **Improved Productivity:** Reduces spam clutter, saving employees’ time.  
- **Customer Trust:** Reliable spam detection improves user satisfaction.  
- **Cost Efficiency:** Automated classification reduces manual review.  

####  Key Consideration:
- Maintain balance — being too aggressive may classify legitimate emails as spam, causing business communication loss.  

---

###  **Summary**
1. Clean and vectorize text data using TF-IDF.  
2. Train a **Multinomial Naïve Bayes** model for speed and efficiency.  
3. Handle imbalance with **SMOTE** or **class weighting**.  
4. Evaluate using **precision, recall, F1-score, and ROC-AUC**.  
5. The solution delivers a **scalable, accurate, and cost-effective** spam detection system that enhances user experience and trust.


---
---