> Question 1: What is a Support Vector Machine (SVM), and how does it
> work?
>
> ANS 1 :-
>
> **Support Vector Machine (SVM)**  
> A **Support Vector Machine (SVM)** is a **supervised machine learning
> algorithm** used for **classification** and sometimes **regression**
> tasks.​  
> It is especially powerful for **binary classification problems**
> (e.g., classifying emails as *spam* or *not spam*).
>
> **How SVM Works**  
> 1.​ **Separating Data with a Hyperplane​**  
> ○​ Imagine you have data points belonging to two classes (e.g., red and
> blue points).​  
> ○​ SVM tries to find the **best boundary (hyperplane)** that separates
> these two classes.​  
> ○​ In 2D, this hyperplane is just a line; in 3D, it’s a plane; in
> higher dimensions, it’s called a hyperplane.​
>
> 2.​ **Maximizing the Margin​**  
> ○​ Instead of just drawing any line, SVM chooses the line that
> **maximizes the** **margin**, i.e., the distance between the line and
> the nearest data points from each class.​  
> ○​ These nearest data points are called **support vectors** (hence the
> name!).​
>
> 3.​ **Handling Non-Linear Data​**  
> ○ Sometimes, the data is not linearly separable (can’t be split with a
> straight line).  
> ○ In such cases, SVM uses the **Kernel Trick** to implicitly map
> the data into a higher-dimensional space where it becomes separable.
>
> ○​ Common kernels:​  
> ■ **Linear Kernel** – suits data that is (nearly) linearly separable.  
> ■ **Polynomial Kernel** – models curved (polynomial) decision boundaries.  
> ■​ **RBF (Radial Basis Function) Kernel** – good for complex non-linear
> data.​
>
> 4.​ **Soft Margin for Noisy Data​**  
> ○​ Real-world data often has noise or overlapping points.​  
> ○​ SVM allows some misclassifications using a **soft margin**
> controlled by a parameter **C**.​  
> ○ A small C → wider margin, tolerates more margin violations (often
> generalizes better, but may underfit if C is too small).  
> ○ A large C → penalizes violations heavily, narrower margin, fewer
> training errors, but may overfit.
>
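The effect of **C** on the margin can be checked numerically. The sketch below is illustrative only: the dataset is synthetic (`make_blobs` with overlapping clusters) and the two C values are arbitrary choices, not taken from the answer above. For a linear SVM the margin width equals 2 / ‖w‖, where w is the learned weight vector.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping 2-D clusters (synthetic, for illustration only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

margins = {}
for C in (0.01, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear SVM the margin width is 2 / ||w||
    margins[C] = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C={C}: margin width={margins[C]:.3f}, "
          f"support vectors={len(clf.support_vectors_)}")
```

The smaller C should report the wider margin, usually together with more support vectors, because more points are allowed to sit inside or on the margin.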
> **Intuition with Example**  
> Imagine you want to separate **cats vs dogs** based on features like
> weight and height:  
> ● SVM finds the line (with two features) or hyperplane (with more
> features) that separates them with the **maximum possible gap**.  
> ●​ The animals closest to that boundary are the **support vectors**.​  
> ●​ Even if there’s some overlap, SVM can allow small mistakes but still
> focus on a strong separation rule.​
>
> **Business Value of SVM**  
> SVM is widely used in:
>
> ● **Healthcare** → disease classification (cancer detection, diabetes
> prediction).  
> ● **Finance** → fraud detection, credit risk analysis.  
> ●​ **Text classification** → spam detection, sentiment analysis.​  
> ●​ **Image recognition** → face recognition, handwriting recognition.
>
> Question 2: Explain the difference between Hard Margin and Soft Margin
> SVM.
>
> ANS 2 :-  
> **1. Hard Margin SVM**  
> ●​ **Definition**: A hard margin SVM tries to find a hyperplane that
> **perfectly separates** the data into classes **without any
> misclassifications**.​  
> ●​ **Assumptions**: Data must be **linearly separable** (no
> overlaps).​  
> ●​ **Behavior**:​  
> ○​ No tolerance for errors or outliers.​  
> ○​ Works well only for **clean datasets** with no noise.​  
> ●​ **Limitation**:​  
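> ○ If the classes overlap at all, for example because a single outlier
> lies on the wrong side, no separating hyperplane exists and the hard
> margin problem has no solution. Even when a solution exists, one
> outlier close to the boundary can shrink the margin drastically.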
>
> **Best for:** Perfectly separable, noise-free data.
>
> **2. Soft Margin SVM**  
> ●​ **Definition**: A soft margin SVM allows **some misclassifications**
> or violations of the margin in order to achieve better
> generalization.​  
> ●​ **How it works**:​  
> ○​ Introduces a **slack variable** (ξ) that allows some points to be on
> the wrong side of the margin.​
>
> ○​ Controlled by a parameter **C** (regularization parameter).​  
> ■ High C → fewer misclassifications, stricter boundary (risk of
> overfitting).  
> ■ Low C → more tolerance for errors, wider margin
> (better generalization).
>
> ● **Advantage**:  
> ○​ Handles noisy and overlapping data.​  
> ○​ More robust for real-world datasets.​
>
> **Best for:** Noisy, real-world data where perfect separation is not
> possible.
>
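For reference, the two variants can be written as one optimization problem. This is the standard textbook formulation, given here as a sketch with the usual notation assumed: labels y_i ∈ {−1, +1}, weight vector w, bias b, and one slack variable ξ_i per training point.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\qquad \text{subject to} \qquad
y_i\,(w^{\top}x_i + b) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0 .
```

The hard margin SVM is the special case in which every ξ_i is forced to 0, so each point must satisfy y_i (wᵀx_i + b) ≥ 1. In the soft margin version, C sets the price of each unit of slack.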
> Question 3: What is the Kernel Trick in SVM? Give one example of a
> kernel and explain its use case.
>
> ANS 3 ➖
>
> **What is the Kernel Trick in SVM?**
>
> ●​ Many datasets are **not linearly separable** (you can’t split them
> with a straight line or flat hyperplane).​  
> ●​ The **Kernel Trick** is a mathematical method that lets SVM
> **project data into a**  
> **higher-dimensional space** where it *becomes linearly separable*,
> **without actually** **computing the transformation explicitly**.
>
> **How It Works**  
> 1.​ Suppose your data looks like circles inside each other (not
> linearly separable).​  
> 2.​ If you map the data to a higher dimension (e.g., add a squared
> feature), the classes may become linearly separable.​  
> 3.​ The kernel function computes this mapping indirectly, avoiding
> heavy computations.
>
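The three steps above can be reproduced on a small synthetic dataset. This is a minimal sketch: the data comes from scikit-learn's `make_circles`, and the extra feature (squared distance from the origin) is one possible explicit mapping chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: the inner ring is class 1, the outer ring is class 0
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# 1) A linear SVM on the raw 2-D points cannot separate the rings
acc_linear_raw = SVC(kernel="linear").fit(X, y).score(X, y)

# 2) Explicit mapping: append the squared distance from the origin
X_mapped = np.column_stack([X, (X ** 2).sum(axis=1)])
acc_linear_mapped = SVC(kernel="linear").fit(X_mapped, y).score(X_mapped, y)

# 3) Kernel trick: the RBF kernel works on the raw points directly
acc_rbf = SVC(kernel="rbf").fit(X, y).score(X, y)

print("Linear SVM, raw features:   ", acc_linear_raw)
print("Linear SVM, added x1²+x2²:  ", acc_linear_mapped)
print("RBF SVM, raw features:      ", acc_rbf)
```

The scores are training accuracies, which is enough to show separability here. The kernel version never builds the extra column; it only evaluates the kernel function on pairs of points.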
> **Example of a Kernel: RBF (Radial Basis Function) Kernel**  
> ● **Formula:**  
> $K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right)$  
> ●​ **Meaning:** Measures how close two points are.​
>
> ○​ If points are close, the kernel value is near **1**.​
>
> ○​ If far apart, the value approaches **0**.​
>
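The "close → 1, far → 0" behaviour is easy to verify by hand. The helper `rbf` below is a hypothetical function written for this illustration, with γ = 1 chosen arbitrarily; the result is cross-checked against scikit-learn's `rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, x_prime, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


a = [1.0, 2.0]
near = [1.1, 2.0]   # distance 0.1 from a
far = [6.0, 9.0]    # squared distance 25 + 49 = 74 from a

k_same = rbf(a, a)
k_near = rbf(a, near)
k_far = rbf(a, far)

# Cross-check against scikit-learn's implementation
k_near_sklearn = rbf_kernel([a], [near], gamma=1.0)[0, 0]

print(k_same, k_near, k_far)
```

A larger γ makes the kernel value fall off faster with distance, so each training point influences a smaller neighbourhood.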
> **Use Case of RBF Kernel**
>
> ●​ **Non-linear data classification**: For example, in image
> recognition (e.g., distinguishing between different handwritten digits
> like "3" and "8"), the data is not linearly separable.​
>
> ● The **RBF kernel** implicitly maps the data into a higher-dimensional
> feature space where SVM can find a clear separating hyperplane.
>
> ●​ This makes it one of the most commonly used kernels in real-world
> problems.​
>
> Question 4: What is a Naïve Bayes Classifier, and why is it called
> “naïve”?
>
> ANS 4 ➖
>
> **Naïve Bayes Classifier**
>
> The **Naïve Bayes Classifier** is a **supervised machine learning
> algorithm** based on **Bayes’ Theorem**.​  
> It is widely used for **classification tasks**, especially in **text
> classification** (spam detection, sentiment analysis, document
> categorization).
>
> **Why is it called “Naïve”?**
>
> It is called **naïve** because it makes a **strong assumption**:​  
> All features (predictors) are **independent of each other** given the
> class label.
>
> ●​ Example: In spam detection, features could be words like *“win”*,
> *“offer”*, *“free”*.​
>
> ○​ Naïve Bayes assumes that the presence of *“win”* is independent of
> *“offer”*, which in reality is not strictly true.​
>
> ○​ Despite this “naïve” assumption, the algorithm works **surprisingly
> well** in practice.
>
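Written out, the independence assumption turns Bayes' Theorem into a simple product. The notation is assumed for this sketch: y is the class label and x_1, …, x_n are the feature values.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} \;=\; \arg\max_{y}\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y).
```

Without the assumption, the model would need the joint probability P(x_1, …, x_n | y), which is impractical to estimate when there are thousands of words. With it, each P(x_i | y) is estimated separately from simple per-class statistics.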
> **Types of Naïve Bayes**
>
> 1.​ **Multinomial Naïve Bayes** → For discrete counts (e.g., word
> frequencies in text).​
>
> 2.​ **Gaussian Naïve Bayes** → For continuous features (assumes
> features follow a Gaussian distribution).​  
> 3.​ **Bernoulli Naïve Bayes** → For binary/boolean features (e.g., word
> present or not).​
>
> **Use Cases**  
> ●​ **Email Spam Filtering** → Predict spam vs non-spam.​  
> ●​ **Sentiment Analysis** → Classify reviews as positive or negative.​  
> ● **Medical Diagnosis** → Classify whether a patient has a disease
> based on symptoms.
>
> Question 5: Describe the Gaussian, Multinomial, and
> Bernoulli Naïve Bayes variants. When would you use each one?
>
> ANS 5 :-
>
> **1. Gaussian Naïve Bayes**  
> ● **Assumption**: Within each class, every feature follows a **Gaussian
> (Normal) distribution**.  
> ● **How it works**: For each feature, it estimates
> mean (μ) and variance (σ²) per class, and calculates the likelihood
> using the Gaussian probability density function.  
> ●​ **When to use**:​  
> ○​ When your features are **continuous values** (real numbers).​  
> ○​ Example:​  
> ■​ Predicting whether a person has a disease based on **age, blood**
> **pressure, cholesterol level** (all continuous values).​
>
> **2. Multinomial Naïve Bayes**  
> ●​ **Assumption**: Features are **discrete counts** (non-negative
> integers).​
>
> ● **How it works**: Models the per-class feature counts with a
> multinomial distribution (e.g., word frequencies).  
> ● **When to use**:  
> ○​ When your features represent **count data**.​  
> ○​ Common in **Natural Language Processing (NLP)** tasks.​  
> ○​ Example:​  
> ■​ Text classification → spam filtering, topic categorization.​  
> ■​ Features: number of times words like *“free”*, *“win”*, *“offer”*
> appear in an email.​
>
> **3. Bernoulli Naïve Bayes**  
> ● **Assumption**: Features are **binary (0 or 1)** → whether a feature
> is present or not.  
> ● **How it works**: Models features as **yes/no
> (true/false)** indicators instead of counts.  
> ● **When to use**:  
> ○​ When your features are **binary values**.​  
> ○​ Example:​  
> ■ Text classification where each feature is 1 if a word such as
> *“free”* appears in the email and 0 if it doesn’t.  
> ■​ Useful when only the **presence/absence of a word** matters, not the
> count.
>
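The following sketch pairs each variant with the kind of feature it expects. The four-message corpus and its labels are invented for illustration, and Iris stands in for "continuous features"; none of this data comes from the answer above.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Gaussian NB: continuous measurements (Iris sepal/petal sizes in cm)
X_iris, y_iris = load_iris(return_X_y=True)
gnb_train_acc = GaussianNB().fit(X_iris, y_iris).score(X_iris, y_iris)

# Toy corpus, invented for illustration: 1 = spam, 0 = not spam
texts = ["free win offer", "win free prize",
         "meeting at noon", "lunch meeting today"]
labels = [1, 1, 0, 0]
new_email = ["free offer today"]

# Multinomial NB: features are word counts
count_vec = CountVectorizer()
mnb = MultinomialNB().fit(count_vec.fit_transform(texts), labels)
mnb_pred = mnb.predict(count_vec.transform(new_email))[0]

# Bernoulli NB: features are word present (1) or absent (0)
binary_vec = CountVectorizer(binary=True)
bnb = BernoulliNB().fit(binary_vec.fit_transform(texts), labels)
bnb_pred = bnb.predict(binary_vec.transform(new_email))[0]

print("GaussianNB training accuracy on Iris:", gnb_train_acc)
print("MultinomialNB prediction:", mnb_pred)
print("BernoulliNB prediction:", bnb_pred)
```

One practical difference: Bernoulli NB also uses the *absence* of words as evidence, while Multinomial NB only looks at the words that occur.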
> Dataset Info:  
> ● You can use any suitable datasets like Iris, Breast Cancer, or Wine
> from sklearn.datasets or a CSV file you have.
>
> Question 6: Write a Python program to:  
> ● Load the Iris dataset
>
> ● Train an SVM Classifier with a linear kernel  
> ● Print the model's accuracy and support vectors.
>
> ANS 6 ➖  
> from sklearn import datasets  
> from sklearn.model_selection import train_test_split  
> from sklearn.svm import SVC  
> from sklearn.metrics import accuracy_score
>
> \# Load the Iris dataset  
> iris = datasets.load_iris()  
> X, y = iris.data, iris.target
>
> \# Split dataset into training and testing sets (80% train, 20% test)  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.2, random_state=42, stratify=y  
> )
>
> \# Train an SVM classifier with a linear kernel  
> svm_clf = SVC(kernel='linear', random_state=42)  
> svm_clf.fit(X_train, y_train)
>
> \# Make predictions  
> y_pred = svm_clf.predict(X_test)
>
> \# Calculate accuracy  
> accuracy = accuracy_score(y_test, y_pred)
>
> \# Get support vectors  
> support_vectors = svm_clf.support_vectors\_
>
> \# Output results  
> print("Model Accuracy:", accuracy)  
> print("Number of Support Vectors:", len(support_vectors))  
> print("Support Vectors (first 5 shown):\n", support_vectors\[:5\])
>
> **Output (from execution)**
>
> ● **Model Accuracy:** 1.0 (100%)
>
> ●​ **Number of Support Vectors:** 26​
>
> ●​ **Example Support Vectors (first 5):**
>
> \[\[4.5 2.3 1.3 0.3\]  
> \[4.8 3.4 1.9 0.2\]  
> \[5.1 3.3 1.7 0.5\]  
> \[6.8 2.8 4.8 1.4\]  
> \[6.0 2.9 4.5 1.5\]\]
>
> Question 7: Write a Python program to:  
> ● Load the Breast Cancer dataset  
> ● Train a Gaussian Naïve Bayes model  
> ● Print its classification report including precision, recall, and
> F1-score.
>
> ANS 7 ➖  
> from sklearn.datasets import load_breast_cancer  
> from sklearn.model_selection import train_test_split  
> from sklearn.naive_bayes import GaussianNB  
> from sklearn.metrics import classification_report
>
> \# Load the Breast Cancer dataset  
> cancer = load_breast_cancer()  
> X, y = cancer.data, cancer.target
>
> \# Split dataset into training and testing sets (80% train, 20% test)  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.2, random_state=42, stratify=y  
> )
>
> \# Train a Gaussian Naive Bayes model  
> gnb = GaussianNB()  
> gnb.fit(X_train, y_train)
>
> \# Make predictions  
> y_pred = gnb.predict(X_test)
>
> \# Generate classification report  
> report = classification_report(y_test, y_pred,
> target_names=cancer.target_names)  
> print("Classification Report:\n")  
> print(report)
>
> Output (from execution)
>
> | | precision | recall | f1-score | support |
> |---|---|---|---|---|
> | malignant | 0.93 | 0.90 | 0.92 | 42 |
> | benign | 0.95 | 0.96 | 0.95 | 72 |
> | accuracy | | | 0.94 | 114 |
> | macro avg | 0.94 | 0.93 | 0.93 | 114 |
> | weighted avg | 0.94 | 0.94 | 0.94 | 114 |
>
> Question 8: Write a Python program to:  
> ● Train an SVM Classifier on the Wine dataset using GridSearchCV to
> find the best C and gamma.
>
> ● Print the best hyperparameters and accuracy  
> ANS 8 :-  
> from sklearn.datasets import load_wine  
> from sklearn.model_selection import train_test_split, GridSearchCV  
> from sklearn.svm import SVC  
> from sklearn.metrics import accuracy_score
>
> \# Load the Wine dataset  
> wine = load_wine()  
> X, y = wine.data, wine.target
>
> \# Split dataset into training and testing sets (80% train, 20% test)  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.2, random_state=42, stratify=y  
> )
>
> \# Define parameter grid for GridSearchCV  
> param_grid = {  
> 'C': \[0.1, 1, 10, 100\],  
> 'gamma': \[0.001, 0.01, 0.1, 1\],  
> 'kernel': \['rbf'\]  
> }
>
> \# Initialize SVM classifier  
> svm = SVC()
>
> \# Grid search with cross-validation  
> grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy')  
> grid_search.fit(X_train, y_train)
>
> \# Best hyperparameters  
> best_params = grid_search.best_params\_
>
> \# Evaluate on test set  
> best_model = grid_search.best_estimator\_  
> y_pred = best_model.predict(X_test)  
> accuracy = accuracy_score(y_test, y_pred)
>
> print("Best Hyperparameters:", best_params)  
> print("Test Set Accuracy:", accuracy)
>
> **Output (from execution)**
>
> ●​ **Best Hyperparameters:**
>
> {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
>
> ●​ Test Set Accuracy:  
> 0.78 (≈ 77.8%)
>
> **Interpretation:**
>
> ●​ The best model used **C = 10** and **gamma = 0.001** with the **RBF
> kernel**.​
>
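> ● Accuracy on the test set was around **78%**. The RBF kernel depends on
> distances between points, and the Wine features sit on very different
> scales, so standardizing the features before the search would likely
> help; a larger parameter grid may help as well.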
>
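The scaling suggestion can be tried with a pipeline. This is a sketch under assumptions: it reuses the same split and grid as the answer above, adds `StandardScaler` in front of the SVM, and the step name `svc` is simply what `make_pipeline` generates from the class name.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so each CV fold is scaled
# using statistics from its own training portion only
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

scaled_test_accuracy = search.score(X_test, y_test)
print("Best hyperparameters:", search.best_params_)
print("Test accuracy with scaling:", scaled_test_accuracy)
```

Putting the scaler inside the pipeline, rather than scaling the whole training set once, keeps validation folds from leaking into the scaling statistics during the grid search.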
> Question 9: Write a Python program to:  
> ● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g.
> using sklearn.datasets.fetch_20newsgroups).
>
> ● Print the model's ROC-AUC score for its predictions.
>
> ANS 9 :-  
> from sklearn.datasets import fetch_20newsgroups  
> from sklearn.feature_extraction.text import TfidfVectorizer  
> from sklearn.model_selection import train_test_split  
> from sklearn.naive_bayes import MultinomialNB  
> from sklearn.metrics import roc_auc_score  
> from sklearn.preprocessing import label_binarize
>
> \# Load a subset of the 20 Newsgroups dataset (for speed)  
> categories = \['sci.space', 'rec.autos', 'comp.graphics'\]  
> newsgroups = fetch_20newsgroups(subset='all',  
> categories=categories,  
> remove=('headers', 'footers', 'quotes'))
>
> X, y = newsgroups.data, newsgroups.target
>
> \# Split the raw text first so the vectorizer never sees test documents  
> X_train_text, X_test_text, y_train, y_test = train_test_split(  
> X, y, test_size=0.2, random_state=42, stratify=y  
> )
>
> \# Fit TF-IDF on the training text only, then apply it to the test text  
> vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)  
> X_train = vectorizer.fit_transform(X_train_text)  
> X_test = vectorizer.transform(X_test_text)
>
> \# Train a Multinomial Naive Bayes classifier  
> nb_clf = MultinomialNB()  
> nb_clf.fit(X_train, y_train)
>
> \# Predict probabilities  
> y_proba = nb_clf.predict_proba(X_test)
>
> \# Binarize the labels (needed for multi-class ROC-AUC)  
> y_test_binarized = label_binarize(y_test, classes=\[0, 1, 2\])
>
> \# Compute ROC-AUC (One-vs-Rest, macro average)  
> roc_auc = roc_auc_score(y_test_binarized, y_proba,  
> average="macro", multi_class="ovr")
>
> print("ROC-AUC Score:", roc_auc)
>
> **What this does:**
>
> ● Loads a **real-world text dataset** (three categories of 20 Newsgroups).
>
> ●​ Uses **TF-IDF** to turn text into numeric features.​
>
> ●​ Trains a **Multinomial Naïve Bayes** (best suited for text
> classification).​
>
> ●​ Evaluates with **ROC-AUC (macro average, one-vs-rest approach)**.
>
> Question 10: Imagine you’re working as a data scientist for a company
> that handles email communications. Your task is to automatically
> classify emails as Spam or Not Spam. The emails may contain:  
> ● Text with diverse vocabulary  
> ● Potential class imbalance (far more legitimate emails than spam)  
> ● Some incomplete or missing data
>
> Explain the approach you would take to:  
> ● Preprocess the data (e.g. text vectorization, handling missing
> data)  
> ● Choose and justify an appropriate model (SVM vs. Naïve Bayes)  
> ● Address class imbalance  
> ● Evaluate the performance of your solution with suitable metrics
>
> And explain the business impact of your solution.
>
> ANS 10 :-
>
> **1. Preprocessing the Data**  
> ●​ **Handling Missing Data**:​  
> ○ If an email has a missing subject or body → replace it with ""
> (empty string).  
> ○ Drop rows with no meaningful text at all.  
> ●​ **Text Cleaning**:​  
> ○​ Lowercasing​  
> ○​ Removing punctuation, numbers, special characters​  
> ○​ Removing stopwords (e.g., “the”, “and”)​  
> ○​ Lemmatization/Stemming (reduce words like *running* → *run*).​  
> ●​ **Feature Engineering (Text Vectorization)**:​  
> ○​ Use **TF-IDF Vectorizer** (preferred over simple Bag of Words since
> it downweights common words).​  
> ○​ Limit vocabulary size (e.g., top 5,000–10,000 words).​  
> ○​ Optionally add extra features:​  
> ■​ Email length​  
> ■​ Number of links/attachments​  
> ■​ Presence of suspicious keywords (“win”, “offer”, “free”).​
>
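A minimal sketch of the missing-data and vectorization steps follows. The four emails and the column names `subject` and `body` are hypothetical, made up for this illustration; stemming or lemmatization and the extra hand-crafted features are left out to keep it short.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical emails; the column names "subject" and "body" are assumptions
emails = pd.DataFrame({
    "subject": ["WIN a FREE prize!!!", None, "Lunch on Friday?", None],
    "body": ["Click here to claim your offer", "Meeting moved to 3pm", None, None],
})

# Missing subject or body -> empty string, then merge into one text field
text = (emails["subject"].fillna("") + " " + emails["body"].fillna("")).str.strip()

# Drop rows that have no text at all (the last row here)
text = text[text != ""]

# TfidfVectorizer lowercases the text, ignores punctuation through its
# default token pattern, and can drop English stop words and cap the vocabulary
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english", max_features=5000)
X = vectorizer.fit_transform(text)

print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

In a real project the vectorizer would be fitted on the training split only and then applied to the test split with `transform`.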
> **2. Choosing the Model**  
> ●​ **Naïve Bayes**:​  
> ○ Pros: Fast, memory efficient, works well on text classification.  
> ○ Cons: Assumes independence between words (naïve assumption).
>
> ● **SVM (with Linear Kernel)**:  
> ○ Pros: Excellent for high-dimensional sparse data (like text), strong
> generalization.  
> ○ Cons: Slower to train on very large datasets.
>
> **Choice**: Start with **Multinomial Naïve Bayes** for speed and
> simplicity, then benchmark against **Linear SVM**.
>
> ●​ In practice, **Linear SVM often outperforms NB** for spam detection
> when text is long and diverse.
>
> **3. Handling Class Imbalance**  
> ●​ Since spam is often a minority class:​  
> ○​ **Resampling methods**:​  
> ■​ **Oversample spam** (SMOTE or random oversampling).​  
> ■​ **Undersample ham** (careful to not lose too much info).​  
> ○​ **Class weights**:​  
> ■​ In SVM, set class_weight="balanced".​  
> ■​ In Naïve Bayes, adjust priors.​  
> ○​ **Threshold tuning**: Adjust decision threshold to favor spam
> detection (improve recall).​
>
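Class weights and threshold tuning can both be illustrated on synthetic data. The sketch below assumes a stand-in dataset from `make_classification` with about 5% positives, and the lowered threshold of −0.5 is an arbitrary value picked for the demonstration, not a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for spam data: about 5% positives (class 1 = "spam")
X, y = make_classification(
    n_samples=4000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], flip_y=0.01, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Class weights: errors on the rare class cost more during training
plain = LinearSVC(random_state=0).fit(X_train, y_train)
weighted = LinearSVC(class_weight="balanced", random_state=0).fit(X_train, y_train)
print("Recall, no weights:      ", recall_score(y_test, plain.predict(X_test)))
print("Recall, balanced weights:", recall_score(y_test, weighted.predict(X_test)))

# Threshold tuning: predict() cuts the decision score at 0;
# a lower cut-off flags more emails as spam, so recall cannot go down
scores = plain.decision_function(X_test)
recall_default = recall_score(y_test, (scores > 0).astype(int))
recall_lowered = recall_score(y_test, (scores > -0.5).astype(int))
print("Recall at threshold 0:   ", recall_default)
print("Recall at threshold -0.5:", recall_lowered)

# The same scores give a threshold-free ROC-AUC, even without predict_proba
roc_auc = roc_auc_score(y_test, scores)
print("ROC-AUC:", roc_auc)
```

Both techniques trade precision for recall, so the threshold or weighting should be chosen on a validation set against the cost of a missed spam versus a blocked legitimate email.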
> **4. Model Evaluation**  
> Since dataset is imbalanced, **accuracy alone is misleading**.​  
> Use:  
> ●​ **Precision** → Of predicted spam, how many are truly spam?​  
> ●​ **Recall (Sensitivity)** → Of actual spam, how many did we catch?
> (important, we don’t want spam slipping into inbox).​  
> ●​ **F1-score** → Balance between precision & recall.​
>
> ● **ROC-AUC / PR-AUC** → Measure ranking quality across all thresholds;
> PR-AUC is usually the more informative of the two when spam is rare.  
> ●​ **Confusion Matrix** → To see misclassifications.​
>
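A small worked example shows why accuracy alone misleads on imbalanced data. The labels below are hand-made for illustration (7 legitimate emails, 3 spam), and `y_lazy` is a hypothetical "model" that never predicts spam.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hand-made labels for illustration: 1 = spam, 0 = not spam
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_model = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]  # finds 2 of 3 spam, 1 false alarm
y_lazy = [0] * 10                          # always predicts "not spam"

# The lazy baseline looks acceptable on accuracy but catches no spam
lazy_accuracy = accuracy_score(y_true, y_lazy)   # 7 of 10 correct
lazy_recall = recall_score(y_true, y_lazy)       # 0 of 3 spam caught

model_accuracy = accuracy_score(y_true, y_model)
precision = precision_score(y_true, y_model)     # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_model)           # TP / (TP + FN) = 2 / 3
f1 = f1_score(y_true, y_model)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_model)

print("Lazy baseline: accuracy", lazy_accuracy, "recall", lazy_recall)
print("Model: accuracy", model_accuracy, "precision", precision,
      "recall", recall, "F1", f1)
print(cm)
```

The baseline reaches 70% accuracy while missing every spam email, which is exactly the failure that recall and the confusion matrix expose.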
> **5. Business Impact**  
> ●​ **Reduced risk of phishing & scams** → Protects customers.​  
> ●​ **Increased user trust** → Better customer satisfaction.​  
> ●​ **Operational efficiency** → Less time wasted on spam.​  
> ●​ **Scalability** → Automated system saves cost compared to manual
> filtering.​
>
> **Final Summary**:  
> ●​ Preprocess with **TF-IDF + cleaning**.​  
> ● Start with **Naïve Bayes** (fast baseline), benchmark with **Linear
> SVM** (likely better).  
> ● Handle imbalance with **resampling or class
> weights**.  
> ●​ Evaluate using **precision, recall, F1, ROC-AUC**.​  
> ●​ Business value: **protects users, builds trust, saves cost**.
>
> import pandas as pd  
> from sklearn.model_selection import train_test_split  
> from sklearn.feature_extraction.text import TfidfVectorizer  
> from sklearn.naive_bayes import MultinomialNB  
> from sklearn.svm import LinearSVC  
> from sklearn.metrics import classification_report, roc_auc_score  
>
> \# Load SMS Spam dataset (very similar to email spam filtering)  
> url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"  
> df = pd.read_csv(url, sep="\t", header=None, names=\["label", "message"\])
>
> \# Handle missing values  
> df\["message"\] = df\["message"\].fillna("")
>
> \# Encode labels (ham=0, spam=1)  
> df\["label"\] = df\["label"\].map({"ham": 0, "spam": 1})
>
> X = df\["message"\]  
> y = df\["label"\]
>
> \# Train-test split  
> X_train, X_test, y_train, y_test = train_test_split(  
> X, y, test_size=0.2, random_state=42, stratify=y  
> )
>
> \# Vectorize text using TF-IDF  
> vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)  
> X_train_tfidf = vectorizer.fit_transform(X_train)  
> X_test_tfidf = vectorizer.transform(X_test)
>
> \# --- Model 1: Multinomial Naive Bayes ---  
> nb_model = MultinomialNB()  
> nb_model.fit(X_train_tfidf, y_train)  
> y_pred_nb = nb_model.predict(X_test_tfidf)  
> print("Naive Bayes Classification Report:\n",
> classification_report(y_test, y_pred_nb))
>
> \# --- Model 2: Linear SVM (with class balance) ---  
> svm_model = LinearSVC(class_weight="balanced", random_state=42)  
> svm_model.fit(X_train_tfidf, y_train)  
> y_pred_svm = svm_model.predict(X_test_tfidf)  
> print("Linear SVM Classification Report:\n",
> classification_report(y_test, y_pred_svm))
>
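> \# ROC-AUC needs a continuous score per sample. Naive Bayes provides one  
> \# via predict_proba; LinearSVC has no predict_proba, but its  
> \# decision_function output could be scored the same way.  
> y_proba_nb = nb_model.predict_proba(X_test_tfidf)\[:,1\]  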
> roc_auc_nb = roc_auc_score(y_test, y_proba_nb)  
> print("Naive Bayes ROC-AUC Score:", roc_auc_nb)
>
> Output (when run in Python)
>
> Naive Bayes Classification Report:
>
> | | precision | recall | f1-score | support |
> |---|---|---|---|---|
> | 0 | 0.97 | 0.98 | 0.97 | 965 |
> | 1 | 0.91 | 0.87 | 0.89 | 150 |
> | accuracy | | | 0.96 | 1115 |
> | macro avg | 0.94 | 0.93 | 0.93 | 1115 |
> | weighted avg | 0.96 | 0.96 | 0.96 | 1115 |
>
> Linear SVM Classification Report:
>
> | | precision | recall | f1-score | support |
> |---|---|---|---|---|
> | 0 | 0.98 | 0.99 | 0.98 | 965 |
> | 1 | 0.94 | 0.90 | 0.92 | 150 |
> | accuracy | | | 0.97 | 1115 |
> | macro avg | 0.96 | 0.95 | 0.95 | 1115 |
> | weighted avg | 0.97 | 0.97 | 0.97 | 1115 |
>
> Naive Bayes ROC-AUC Score: 0.98
>
> **Interpretation**  
> ● **Naïve Bayes**: Fast, good performance (\~96% accuracy, ROC-AUC ≈
> 0.98).  
> ● **Linear SVM**: Slightly better F1-score on spam detection
> (0.92 vs 0.89, \~97% accuracy).  
> ● Both models work well, but **SVM catches spam
> slightly better** (spam recall 0.90 vs 0.87).
>
> **Business Value**:  
> ● Filters out spam automatically → protects users from scams.  
> ● Improves trust and reduces risks.  
> ●​ Scalable to millions of emails with minimal human effort.