1.  What is a Support Vector Machine (SVM), and how does it work?
  -> A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression.
# Here's a breakdown of how SVMs work:

# 1. Finding the Optimal Hyperplane:
    *  In a multi-dimensional space, a hyperplane acts as a decision boundary.
    *  SVM aims to find the hyperplane that best separates the different classes of data points with the widest possible margin.
    *  The margin is the distance between the hyperplane and the closest data points (support vectors).
    *  Maximizing the margin helps in better generalization of the model, meaning it performs well on unseen data.
# 2. Support Vectors:
    *  These are the data points that lie on or inside the margin, closest to the hyperplane, and they determine its position.
    *  Only these critical points define the decision boundary. Removing any other training point and retraining leaves the boundary unchanged.
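One quick way to check that only the support vectors matter is to train on all points, retrain on the support vectors alone, and compare the two models. This is a minimal sketch on synthetic `make_blobs` data, which is an assumption chosen for illustration rather than part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clusters of synthetic points (illustrative data only)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

# Fit a linear SVM on all training points
full = SVC(kernel="linear", C=1.0).fit(X, y)

# Refit using only the support vectors found by the first model
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("support vectors:", len(sv_idx), "of", len(X))
print("w, b (all points):     ", full.coef_.ravel(), full.intercept_)
print("w, b (support vectors):", reduced.coef_.ravel(), reduced.intercept_)

# The two models should make (almost) identical predictions
agreement = np.mean(full.predict(X) == reduced.predict(X))
print("prediction agreement:", agreement)
```

Small differences in `w` and `b` can come from the solver's numerical tolerance, not from the discarded points.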
# 3. Handling Non-linear Data:
    *  For data that is not linearly separable (cannot be separated by a straight line or hyperplane),
       SVM uses kernel functions to implicitly map the data into a higher-dimensional space where the classes are more likely to be linearly separable.
    *  Common kernel functions include polynomial kernels, radial basis function (RBF) kernels, and sigmoid kernels.

# 4. Types of SVM:
    *  Linear SVM: used for data that is linearly separable, or nearly so.
    *  Non-linear SVM: uses kernel functions to handle data that is not linearly separable.
    *  Support Vector Regression (SVR): used for regression problems, predicting continuous values rather than discrete classes.
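A minimal SVR sketch, assuming a synthetic noisy sine curve as data (not from the original). The `epsilon` parameter sets a tube around the prediction inside which errors are not penalized, which is how SVR adapts the margin idea to continuous targets.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sine curve (illustrative assumption)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# RBF-kernel SVR; errors smaller than epsilon are ignored by the loss
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)

y_hat = svr.predict(X)
print("R^2 on training data:", round(r2_score(y, y_hat), 3))
print("number of support vectors:", len(svr.support_))
```

Only points on or outside the epsilon-tube become support vectors, so a wider tube usually means fewer of them.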
# 5. Applications:
    *  Classification: spam detection, image classification, fraud detection.
    *  Regression: predicting house prices, stock prices.


2.   Explain the difference between Hard Margin and Soft Margin SVM.
  -> Hard Margin SVM:

# Assumes linear separability:
      Hard margin SVMs can only be applied to datasets where a hyperplane can perfectly divide the data into classes without any errors.
# Maximizes margin:
      It aims to find the hyperplane that maximizes the distance (margin) between the hyperplane and the closest data points of each class.
# Sensitive to outliers:
     A single outlier close to the other class can drastically shrink the margin, and if even one point makes the data non-separable, the hard margin SVM has no feasible solution at all.
# Not robust to noise:
     It cannot be trained on data that is not linearly separable, and it tends to overfit noisy points near the boundary.


# Soft Margin SVM:

# Handles non-linear separability:
     Soft margin SVMs are designed to work with datasets that are not perfectly linearly separable.
# Introduces slack variables:
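     It allows some points to fall inside the margin or on the wrong side of the hyperplane by introducing "slack variables", which measure how much each point violates the margin and are penalized in the optimization objective.
# Balances margin maximization and error minimization:
     Soft margin SVMs balance maximizing the margin against minimizing the total slack (margin violations and misclassifications). The regularization parameter C controls this trade-off: a large C penalizes violations heavily and approaches a hard margin, while a small C allows a wider margin with more violations.
# More robust to outliers:
     By allowing some misclassifications, soft margin SVMs are more robust to outliers and noisy data than hard margin SVMs.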
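The role of C can be seen by counting support vectors as C changes. This sketch uses synthetic overlapping clusters (an assumption, not from the original) and treats a very large C as an approximation of a hard margin.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a perfect separating hyperplane may not exist
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=1)

n_sv = {}
for C in [0.01, 1.0, 1000.0]:
    # Very large C: violations are heavily penalized (close to a hard margin).
    # Small C: softer, wider margin that tolerates more violations.
    model = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = len(model.support_)
    print(f"C={C:>7}: {n_sv[C]:3d} support vectors, "
          f"training accuracy={model.score(X, y):.3f}")
```

A wider margin contains more points, so smaller values of C typically produce more support vectors.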




3.  What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.
   ->  The Kernel Trick is a mathematical technique used in SVM to handle non-linearly separable data.
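    * The Kernel Trick allows SVMs to find non-linear decision boundaries in the original feature space by implicitly mapping the data
      into a higher-dimensional feature space where the classes are more likely to be linearly separable. Because the SVM only needs
      inner products between data points, a kernel function K(x_i, x_j) computes the inner product in that feature space directly,
      without ever computing the mapped coordinates.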
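The "implicit mapping" can be checked by hand with a simple kernel. This sketch uses the degree-2 homogeneous polynomial kernel K(x, z) = (x . z)^2 on 2-D points (chosen for illustration; it is not the RBF example below), whose explicit feature map is phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2) -> 3-D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # map both points to 3-D, then take the inner product
implicit = poly_kernel(x, z)        # same value, computed in the original 2-D space
print(explicit, implicit)           # both equal 121 (up to floating-point rounding)
```

The kernel never builds the 3-D vectors, which is what makes very high-dimensional (even infinite-dimensional, for RBF) feature spaces affordable.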

# EXAMPLE:
    One common example of a kernel function is the Radial Basis Function (RBF) kernel, also known as the Gaussian kernel.
# Its formula is:

    K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
# where:
   *  x_i and x_j are two data points.
   *  ||x_i - x_j||^2 is the squared Euclidean distance between x_i and x_j.
   *  gamma is a hyperparameter that controls how far the influence of a single training sample reaches: a large gamma makes
      the influence very local (more complex boundaries, higher risk of overfitting), while a small gamma produces smoother boundaries.
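The formula can be written out directly and compared with scikit-learn's `rbf_kernel`. A minimal sketch with two made-up points whose squared distance is 5, showing how the similarity shrinks faster as gamma grows:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), written out directly."""
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 0.0])   # squared distance = 1 + 4 = 5

for gamma in [0.01, 0.1, 1.0]:
    manual = rbf(x_i, x_j, gamma)
    library = rbf_kernel(x_i.reshape(1, -1), x_j.reshape(1, -1), gamma=gamma)[0, 0]
    print(f"gamma={gamma}: manual={manual:.4f}, sklearn={library:.4f}")
```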


# Use Case:
The RBF kernel is particularly useful when the data is not linearly separable and shows a complex, non-linear relationship between features
and classes. For instance, consider a dataset where points of one class are clustered in the center, while points of another class form a ring
around this cluster. In the original 2D space, no linear boundary can separate these classes. The RBF kernel implicitly maps the data into a
higher-dimensional (in fact, infinite-dimensional) space where a linear hyperplane can separate the central cluster from the surrounding ring;
in the original 2D space this hyperplane appears as a circular, non-linear decision boundary.
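The cluster-inside-a-ring scenario can be generated with scikit-learn's `make_circles`. This sketch (synthetic data and parameter values are assumptions for illustration) compares a linear kernel with an RBF kernel on held-out points.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A central cluster surrounded by a ring (synthetic data)
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

A straight line can do little better than chance here, while the RBF kernel's circular boundary separates the two classes almost perfectly.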


4.  What is a Naïve Bayes Classifier, and why is it called “naïve”?
   -> A Naive Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem with a strong assumption that the features
     are independent of each other given the class label.
# Bayes' Theorem:
   The core of the classifier is based on Bayes' theorem, which calculates the probability of a hypothesis (in this case, a class label) given some observed evidence (the features).
# Independence Assumption:
   The "naive" part comes from the assumption that the features are independent of each other, meaning the presence or absence of one feature doesn't affect the probability of another feature, given the class label.
# Why "Naive"?
   This assumption is often not true in real-world data, where features are frequently correlated.
   However, despite this simplification, Naive Bayes classifiers can be surprisingly effective, especially in text classification and other high-dimensional domains.
# Example:
   Imagine classifying emails as spam or not spam. A Naive Bayes classifier might consider features such as the presence of certain words
   (e.g., "discount," "free") and the sender's address. It would assume that, given the class, the presence of "discount" in an email is
   independent of the sender's address. This might not always be true, but the classifier still tends to work well.
# Benefits:
   Naive Bayes classifiers are computationally efficient, easy to implement, and perform well with high-dimensional data.
   They are also fairly robust to irrelevant features, meaning they can still perform reasonably well even if some features are not informative.
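The spam example can be worked through by hand. In this sketch every probability is invented for illustration; it multiplies per-word likelihoods (the naive independence assumption) and then normalizes with Bayes' theorem.

```python
# Hypothetical probabilities, invented for illustration only
p_class = {"spam": 0.3, "not_spam": 0.7}
p_word_given_class = {
    "spam":     {"free": 0.6, "discount": 0.4},
    "not_spam": {"free": 0.1, "discount": 0.05},
}

def naive_bayes_posterior(words):
    scores = {}
    for c, prior in p_class.items():
        # Naive assumption: P(w1, w2 | class) = P(w1 | class) * P(w2 | class)
        score = prior
        for w in words:
            score *= p_word_given_class[c][w]
        scores[c] = score
    # Bayes' theorem: divide by the evidence P(words) = sum of the class scores
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

posterior = naive_bayes_posterior(["free", "discount"])
print(posterior)
```

Here the spam score is 0.3 * 0.6 * 0.4 = 0.072 and the not-spam score is 0.7 * 0.1 * 0.05 = 0.0035, so P(spam | "free", "discount") = 0.072 / 0.0755, roughly 0.954.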

5.  Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?

   -> Gaussian Naive Bayes is used for continuous data assuming a normal distribution, Multinomial Naive Bayes handles discrete data with counts
       (like word frequencies), and Bernoulli Naive Bayes is appropriate for binary or boolean features representing the presence or absence of an attribute.

# Gaussian Naive Bayes:

# Data Type:
   Continuous, numerical features that are (approximately) normally distributed within each class.
# Example:
   Predicting flower species based on petal length and width, where these measurements are assumed to be normally distributed.
# How it works:
   It assumes that each feature within a class is normally distributed, estimates a mean and variance for every feature in each class from the
   training data, and uses the Gaussian Probability Density Function (PDF) to calculate the likelihood of a data point belonging to a specific class.
# When to use:
   When dealing with data where features are continuous and approximately normally distributed, such as sensor readings,
    test scores, or financial data.

# Multinomial Naive Bayes:

# Data Type:
    Discrete data with counts or frequencies, like word counts in text documents.
# Example:
    Spam filtering, where the frequency of certain words (like "free," "discount," etc.) is a strong indicator of spam.
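# How it works:
    It uses the Multinomial Probability Mass Function (PMF) to calculate the probability of observing a specific set of feature
    counts (word frequencies) given a class (e.g., spam or not spam).
# When to use:
    When features are non-negative counts or frequencies, such as word counts or TF-IDF weights in document classification.

# Bernoulli Naive Bayes:

# Data Type:
    Binary (boolean) features that record the presence or absence of an attribute.
# Example:
    Spam filtering where each feature indicates whether a given word appears in the email at all, regardless of how often.
# How it works:
    It models each feature with a Bernoulli distribution per class. Unlike Multinomial Naive Bayes, it also explicitly penalizes
    the absence of a feature that is typical of a class.
# When to use:
    When features are binary, or for short texts where word presence may matter more than word counts.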
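The Multinomial and Bernoulli variants can be compared on the same word-count matrix. The corpus below is tiny and made up for illustration; MultinomialNB uses the counts, while BernoulliNB (with `binarize=0.0`) only sees whether each word is present.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# A tiny, made-up corpus (1 = spam, 0 = not spam), for illustration only
docs = [
    "free discount offer now",
    "win free prize now",
    "cheap discount free pills",
    "meeting agenda for monday",
    "project report attached",
    "lunch meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]
new_docs = ["free prize discount", "monday project meeting"]

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)   # word counts per document
X_new = vectorizer.transform(new_docs)

# Multinomial NB models how often each word occurs in each class
multinomial = MultinomialNB().fit(X_counts, labels)
# Bernoulli NB only looks at presence/absence: counts > 0 become 1
bernoulli = BernoulliNB(binarize=0.0).fit(X_counts, labels)

print("MultinomialNB:", multinomial.predict(X_new))
print("BernoulliNB:  ", bernoulli.predict(X_new))
```

On this toy data both variants label the first new message as spam and the second as not spam; on real data their results diverge more, especially for longer documents where repeated words matter.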



6.  Write a Python program to:
● Load the Iris dataset
● Train an SVM Classifier with a linear kernel
● Print the model's accuracy and support vectors


from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2. Train an SVM Classifier with a linear kernel
# C is the regularization parameter. A smaller C means more regularization.
svm_model = SVC(kernel='linear', C=1.0, random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_model.predict(X_test)

# 3. Print the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

# Print the support vectors
# Support vectors are the training points that lie on or inside the margin; they alone determine the hyperplane.
# svm_model.support_vectors_ contains the coordinates of the support vectors.
print("\nSupport Vectors:")
for sv in svm_model.support_vectors_:
    print(sv)


Model Accuracy: 1.0000

Support Vectors:
[4.8 3.4 1.9 0.2]
[5.1 3.3 1.7 0.5]
[4.5 2.3 1.3 0.3]
[5.6 3.  4.5 1.5]
[5.4 3.  4.5 1.5]
[6.7 3.  5.  1.7]
[5.9 3.2 4.8 1.8]
[5.1 2.5 3.  1.1]
[6.  2.7 5.1 1.6]
[6.3 2.5 4.9 1.5]
[6.1 2.9 4.7 1.4]
[6.5 2.8 4.6 1.5]
[6.9 3.1 4.9 1.5]
[6.3 2.3 4.4 1.3]
[6.3 2.8 5.1 1.5]
[6.3 2.7 4.9 1.8]
[6.  3.  4.8 1.8]
[6.  2.2 5.  1.5]
[6.2 2.8 4.8 1.8]
[6.5 3.  5.2 2. ]
[7.2 3.  5.8 1.6]
[5.6 2.8 4.9 2. ]
[5.9 3.  5.1 1.8]
[4.9 2.5 4.5 1.7]
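Besides the coordinates, a fitted `SVC` also reports how many support vectors each class contributed (`n_support_`), their row indices in the training set (`support_`), and, for a linear kernel only, the hyperplane weights (`coef_`, `intercept_`). This sketch repeats the same split as above so it runs on its own.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
svm_model = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Number of support vectors contributed by each class
for name, count in zip(iris.target_names, svm_model.n_support_):
    print(f"{name}: {count} support vectors")

# Row indices of the support vectors within X_train
print("indices:", svm_model.support_)

# SVC trains one-vs-one, so there is one (w, b) pair per pair of classes:
# 3 pairs for the 3 Iris species, each with 4 feature weights.
print("coef_ shape:", svm_model.coef_.shape)
print("intercept_:", svm_model.intercept_)
```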


7.  Write a Python program to:
● Load the Breast Cancer dataset
● Train a Gaussian Naïve Bayes model
● Print its classification report including precision, recall, and F1-score.


# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Step 1: Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 3: Train Gaussian Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Step 4: Make predictions
y_pred = model.predict(X_test)

# Step 5: Print classification report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Classification Report:

              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171



8. Write a Python program to:
● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best
C and gamma.
● Print the best hyperparameters and accuracy.



from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']  # RBF kernel is commonly used with C and gamma
}

# Create an SVM classifier
svm = SVC()

# Create GridSearchCV object
grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# Fit GridSearchCV to the training data
grid_search.fit(X_train, y_train)

# Print the best hyperparameters
print("Best Hyperparameters:", grid_search.best_params_)

# Get the best estimator (model with best hyperparameters)
best_svm = grid_search.best_estimator_

# Make predictions on the test set using the best model
y_pred = best_svm.predict(X_test)

# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Test Set:", accuracy)

Best Hyperparameters: {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
Accuracy on Test Set: 0.7777777777777778
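The test accuracy of about 0.78 above is likely held back by feature scale. The RBF kernel depends on Euclidean distances, and in the Wine data some features (e.g., proline) take values in the hundreds while others are below 1. A possible variant, not the original solution, puts a `StandardScaler` in front of the SVC inside a `Pipeline`, so each cross-validation fold fits the scaler only on its own training part.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling is part of the model, so it is re-fitted inside every CV fold
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

print("Best Hyperparameters:", grid_search.best_params_)
# score() uses the best pipeline, refitted on the full training set
print("Accuracy on Test Set:", grid_search.score(X_test, y_test))
```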


9.  Write a Python program to:
● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
sklearn.datasets.fetch_20newsgroups).
● Print the model's ROC-AUC score for its predictions.

# Import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

# Step 1: Load dataset (using a binary classification subset for ROC-AUC)
categories = ['alt.atheism', 'sci.space']  # two categories for binary classification
data = fetch_20newsgroups(subset='all', categories=categories, remove=('headers','footers','quotes'))

X = data.data
y = data.target

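# Step 2: Convert text to numerical features using TF-IDF
# Note: fitting the vectorizer on the full corpus lets the IDF statistics see the test documents;
# for a strictly leakage-free evaluation, split first and fit the vectorizer on the training texts only.
vectorizer = TfidfVectorizer(stop_words='english')
X_tfidf = vectorizer.fit_transform(X)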

# Step 3: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.3, random_state=42
)

# Step 4: Train Naïve Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 5: Predict probabilities
y_proba = model.predict_proba(X_test)[:, 1]  # probability for class 1

# Step 6: Calculate ROC-AUC score
roc_auc = roc_auc_score(y_test, y_proba)
print("ROC-AUC Score: {:.4f}".format(roc_auc))


ROC-AUC Score: 0.9918
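ROC-AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. This sketch checks that interpretation against `roc_auc_score` on six made-up labels and scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted probabilities for six documents
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90])

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every (positive, negative) pair; ties count as half a win
wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
manual_auc = wins / (len(pos) * len(neg))

print("manual AUC: ", manual_auc)                      # 8/9 ≈ 0.889
print("sklearn AUC:", roc_auc_score(y_true, scores))
```

Eight of the nine positive/negative pairs are ranked correctly (only 0.35 vs 0.40 is wrong), which gives the same 8/9 that `roc_auc_score` reports.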


10. Imagine you’re working as a data scientist for a company that handles
email communications.
Your task is to automatically classify emails as Spam or Not Spam. The emails may
contain:
● Text with diverse vocabulary
● Potential class imbalance (far more legitimate emails than spam)
● Some incomplete or missing data
Explain the approach you would take to:
● Preprocess the data (e.g. text vectorization, handling missing data)
● Choose and justify an appropriate model (SVM vs. Naïve Bayes)
● Address class imbalance
● Evaluate the performance of your solution with suitable metrics
And explain the business impact of your solution.


# Data Preprocessing:
# Text Cleaning:
Lowercase conversion: Convert all text to lowercase for consistent comparison.
Punctuation removal: Remove punctuation marks that carry little meaning for classification (e.g., commas, periods); consider keeping symbols such as "!" or "$", which can be useful spam signals.
Stop word removal: Eliminate common words like "the," "and," "a" that don't add much meaning to the classification task.
Stemming/Lemmatization: Normalize words to a common base form to reduce variations (e.g., "running" and "runs" become "run"; lemmatization can also map irregular forms such as "ran" to "run").
# Text Vectorization:
Bag-of-Words (BoW): Represent each email as a vector where each element corresponds to a word in the vocabulary, and the value represents the word count in the email.
Term Frequency-Inverse Document Frequency (TF-IDF): Weights words based on their frequency in the document and overall corpus, giving more importance to rare but relevant words.
N-grams: Consider sequences of n words (unigrams, bigrams, trigrams) to capture context.
# Handling Missing Data:
Imputation:
     Fill missing values with a predefined value (e.g., "unknown" for text fields, average value for numerical features).
Dropping Missing Values:
     If the percentage of missing values is high in a specific feature, consider dropping that feature.
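A minimal preprocessing sketch, assuming each email has a subject and a body that may be missing (the emails are hypothetical). Missing text is imputed as an empty string, then `TfidfVectorizer` handles lowercasing, stop-word removal, and unigrams plus bigrams. Stemming or lemmatization would need an extra library such as NLTK or spaCy and is left out here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical emails: (subject, body); None marks a missing field
emails = [
    ("Huge discount offer", "Claim your prize today!!!"),
    (None, "Agenda for the Monday project meeting"),
    ("Lunch tomorrow?", None),
    ("WIN a FREE cruise", "Click the link to claim"),
]

# Impute missing text with an empty string. A placeholder such as "unknown"
# also works, but it then becomes a token the model can learn from.
texts = [f"{subject or ''} {body or ''}".strip() for subject, body in emails]

vectorizer = TfidfVectorizer(
    lowercase=True,          # "FREE" and "free" become the same token
    stop_words="english",    # drop common words such as "the" and "for"
    ngram_range=(1, 2),      # unigrams and bigrams to keep some context
)
X = vectorizer.fit_transform(texts)

print("matrix shape:", X.shape)
print("sample features:", sorted(vectorizer.vocabulary_)[:8])
```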
# Model Selection:
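# Naïve Bayes:
Advantages:
      Simple, efficient, performs well with high-dimensional data like text, and handles sparse data well.
      It can be biased toward the majority class when classes are imbalanced; the Complement Naive Bayes variant (ComplementNB in scikit-learn) is designed for imbalanced text data.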
Disadvantages:
     Assumes independence of features, which might not be entirely true in real-world data.
# Support Vector Machines (SVM):
Advantages:
    Effective for classification tasks, especially with non-linear data, can handle high-dimensional data, and robust to outliers.
Disadvantages:
   Can be computationally expensive for large datasets (especially with non-linear kernels) and requires parameter tuning (e.g., C and gamma).
# Addressing Class Imbalance:
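Oversampling:
     Replicate minority class examples (random oversampling) or generate synthetic ones (e.g., SMOTE) to balance the dataset. Resample only the training split, never the test set.
Undersampling:
     Randomly remove examples from the majority class.
Cost-sensitive learning:
     Assign a higher misclassification cost to the minority class (e.g., class_weight='balanced' in scikit-learn's SVC), so errors on spam are penalized more heavily during training.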
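Both remedies can be sketched with scikit-learn utilities, assuming a hypothetical training set of 90 legitimate emails and 10 spam emails (placeholder features, not real data). `compute_class_weight("balanced", ...)` returns n_samples / (n_classes * count_per_class), the weighting that `class_weight="balanced"` applies in SVC and LinearSVC; `resample` performs simple random oversampling.

```python
import numpy as np
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced training labels: 90 legitimate (0) and 10 spam (1)
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder features (row ids only)

# Cost-sensitive learning: weights inversely proportional to class frequency
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print("class weights:", dict(zip([0, 1], weights)))   # spam errors weigh 9x more

# Random oversampling: draw spam rows with replacement until both classes have 90 rows
X_spam_up, y_spam_up = resample(X[y == 1], y[y == 1], replace=True, n_samples=90, random_state=0)
X_bal = np.vstack([X[y == 0], X_spam_up])
y_bal = np.concatenate([y[y == 0], y_spam_up])
print("class counts after oversampling:", np.bincount(y_bal))
```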
# Evaluation Metrics:
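Accuracy:
     Overall proportion of correctly classified emails. Under heavy class imbalance this can be misleading: a model that labels every email as Not Spam still scores high accuracy.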
Precision:
     Proportion of predicted spam emails that are actually spam.
Recall:
      Proportion of actual spam emails that are correctly predicted as spam.
F1-Score:
    Harmonic mean of precision and recall, providing a balanced measure of performance.
ROC-AUC (Receiver Operating Characteristic - Area Under Curve):
     Measures the model's ability to distinguish between spam and non-spam emails across all classification thresholds.
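Why accuracy alone is not enough under imbalance: this sketch scores a useless "model" that labels every email as Not Spam on a hypothetical test set with 95 legitimate emails and 5 spam emails.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical test set: 95 legitimate emails (0) and 5 spam emails (1)
y_true = np.array([0] * 95 + [1] * 5)

# A trivial "model" that predicts Not Spam for every email
y_pred = np.zeros_like(y_true)

print("accuracy: ", accuracy_score(y_true, y_pred))                     # 0.95
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0 (no spam predicted)
print("recall:   ", recall_score(y_true, y_pred))                       # 0.0
print("F1-score: ", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```

The 95% accuracy hides the fact that not a single spam email is caught, which precision, recall, and F1 expose immediately.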
# Business Impact:
Reduced Spam:
    Implementing a spam filter significantly reduces the number of unwanted emails received by users, improving user experience and productivity.
Improved Security:
    Spam emails can contain phishing attempts or malicious links, so a robust spam filter protects users from security threats.
Cost Savings:
    By filtering out spam, companies can save on storage and processing resources.