**Question 1:** What is a Support Vector Machine (SVM), and how does it work?

**Answer:**

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification, regression, and outlier detection. It is best known for binary classification and tends to remain effective in high-dimensional spaces, including cases where the number of features exceeds the number of samples.

At its core, an SVM looks for the decision boundary (a hyperplane) that separates the classes with the largest possible margin, where the margin is the distance between the hyperplane and the nearest data points of each class. These nearest points are called support vectors, because they alone determine where the hyperplane lies.

**How Does SVM Work?**

Step-by-step:
* Input labeled training data.
* If the classes are not linearly separable, implicitly map the data into a higher-dimensional feature space using the kernel trick.
* Find the optimal hyperplane that separates classes with the maximum margin.
* Classify new data points based on which side of the hyperplane they fall on.
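The four steps above can be sketched with scikit-learn's `SVC` on a tiny, hand-made 2-D dataset (the points are illustrative assumptions, not from any real dataset). A very large `C` approximates a hard margin, and new points are classified by the sign of `decision_function`, i.e. by which side of the hyperplane they fall on.

```python
import numpy as np
from sklearn.svm import SVC

# Step 1: hypothetical, linearly separable toy data (two clusters in 2-D)
X = np.array([[0, 0], [0, 1], [1, 0],    # class 0
              [3, 3], [3, 4], [4, 3]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Step 3: find the maximum-margin hyperplane (a large C approximates a hard margin)
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

# The learned hyperplane is w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)

# Step 4: classify new points by the side of the hyperplane they fall on
new_points = np.array([[0.5, 0.5], [3.5, 3.5]])
scores = clf.decision_function(new_points)   # signed score w . x + b
predictions = clf.predict(new_points)
print("scores:", scores)
print("predictions:", predictions)
```

For this data, only the points closest to the other cluster (such as `[0, 1]`, `[1, 0]` and `[3, 3]`) should end up as support vectors; moving any other point slightly would not change the hyperplane.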

**Question 2:** Explain the difference between Hard Margin and Soft Margin SVM.

**Answer:**

**Hard Margin SVM**

Concept:
* Assumes that the data is perfectly linearly separable — no overlap, no noise.
* Finds the hyperplane that separates the classes with the maximum margin while allowing no misclassifications.

Limitations:
* Very sensitive to outliers.
* Fails if the data isn’t perfectly separable (which is common in real-world data).

**Soft Margin SVM**

Concept:
* Introduced to handle noisy data and data that is not perfectly linearly separable.
* Allows some misclassifications (i.e., some points can be on the wrong side of the margin or even the hyperplane).
* Introduces slack variables (ξᵢ) to tolerate violations.

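Adds a penalty:
* A term $C \sum_i \xi_i$ is added to the objective, and the parameter $C$ balances maximizing the margin against minimizing margin violations. A large $C$ penalizes violations heavily (approaching a hard margin); a small $C$ tolerates more violations in exchange for a wider margin.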
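Written out with training pairs $(x_i, y_i)$, labels $y_i \in \{-1, +1\}$, weight vector $w$ and bias $b$ (standard notation that this answer does not otherwise define), the two problems differ only in the slack terms:

$$\text{Hard margin:}\quad \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 \ \ \forall i$$

$$\text{Soft margin:}\quad \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \ \ \forall i$$

The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^2$ maximizes the margin; on separable data, letting $C \to \infty$ drives every $\xi_i$ to zero and recovers the hard-margin problem.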

**Key Differences Table**

| Feature                             | Hard Margin                             | Soft Margin                               |
| ----------------------------------- | --------------------------------------- | ----------------------------------------- |
| **Data Type**                       | Perfectly separable                     | Non-separable or noisy                    |
| **Tolerance for Misclassification** | No                                      | Yes                                       |
| **Slack Variables**                 | None                                    | Used ($\xi_i$)                            |
| **Outlier Sensitivity**             | Very high                               | More robust                               |
| **Use Case**                        | Clean, ideal datasets                   | Real-world, noisy datasets                |
| **Parameter $C$**                   | Not used (equivalent to $C \to \infty$) | Used to trade margin width against errors |
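One way to see the role of $C$ in practice is to fit a linear soft-margin SVM on overlapping data with a small and a large `C` and count the support vectors. This is a sketch on synthetic blobs (`make_blobs` with arbitrary centers and seed): a small `C` tolerates more margin violations, so it typically yields a wider margin and many more support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping 2-class data (centers and seed are arbitrary choices)
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.0, random_state=0)

n_support = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = int(clf.n_support_.sum())   # support vectors across both classes
    print(f"C={C:>6}: {n_support[C]} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```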


**Question 3:** What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.

**Answer:**

The Kernel Trick is a mathematical technique that allows Support Vector Machines (SVMs) to solve non-linearly separable problems without explicitly mapping data to a higher-dimensional space.

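Instead of transforming the data explicitly, the kernel trick uses a kernel function $K(x, z) = \phi(x) \cdot \phi(z)$, where $\phi$ is the (implicit) feature map, to return the dot product of two points in the higher-dimensional space directly from their original coordinates. Because the dual form of the SVM only needs these dot products, the mapping itself never has to be computed.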

In many real-world problems, data cannot be separated by a straight line (or hyperplane) in its original space.

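Rather than mapping every data point into a high-dimensional space (which is computationally expensive, and impossible when that space is infinite-dimensional, as it is for the RBF kernel), the kernel trick lets us:
* Learn non-linear decision boundaries in the original space.
* Keep each similarity computation cheap, since its cost depends on the original number of features rather than on the dimension of the feature space.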
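The identity behind the trick can be checked numerically. This sketch uses the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D inputs (an illustrative choice; the RBF kernel discussed next has an infinite-dimensional feature map, so it cannot be checked this way). Its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and the kernel returns the same dot product without building $\phi$.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # map to 3-D first, then take the dot product
implicit = poly2_kernel(x, z)       # kernel trick: phi(x) and phi(z) are never built
print(explicit, implicit)           # both are 121 up to floating-point rounding
```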

**Example: Radial Basis Function (RBF) Kernel**

Formula:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$

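* γ controls how far the influence of a single training example reaches.
  * Large γ: each point influences only its close neighbours, giving tighter, more complex boundaries (with a higher risk of overfitting).
  * Small γ: points farther away still affect the decision boundary, giving smoother boundaries.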
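The formula translates directly into code. This sketch compares a hand-written version with scikit-learn's `sklearn.metrics.pairwise.rbf_kernel` (the two points and the `gamma` values are arbitrary) and shows that a larger γ makes the similarity between the same two points decay faster.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

xi = np.array([0.0, 0.0])
xj = np.array([1.0, 1.0])   # squared distance = 2

results = {}
for gamma in (0.1, 0.5, 5.0):
    manual = rbf(xi, xj, gamma)
    # rbf_kernel expects 2-D arrays of shape (n_samples, n_features)
    library = rbf_kernel(xi.reshape(1, -1), xj.reshape(1, -1), gamma=gamma)[0, 0]
    results[gamma] = (manual, library)
    print(f"gamma={gamma}: manual={manual:.4f}, sklearn={library:.4f}")
```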

Use Case:
* RBF kernel is widely used for non-linear classification problems, especially when:
  * The data is not linearly separable.
  * You have complex, curved decision boundaries.
* Examples: image recognition, bioinformatics, handwriting classification.
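A common way to illustrate this use case is scikit-learn's `make_circles` toy dataset, where one class forms a ring around the other so no straight line separates them. This sketch (dataset parameters are arbitrary) compares a linear kernel with an RBF kernel on the same split; the RBF model should fit the curved boundary far better.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:>6} kernel test accuracy: {scores[kernel]:.2f}")
```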



**Question 4:** What is a Naïve Bayes Classifier, and why is it called “naïve”?

**Answer:**

A Naïve Bayes Classifier is a probabilistic machine learning algorithm used for classification tasks, based on Bayes' Theorem.

Bayes' Theorem provides a way to calculate the probability of a class given a set of features, using prior knowledge:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

Where:

* P(C∣X): Posterior probability of class C given data X
* P(X∣C): Likelihood of data X given class C
* P(C): Prior probability of class C
* P(X): Evidence, the probability of the data. It is the same for every class, so it acts as a normalizing constant and can be ignored when comparing classes.

**Why is it called “naïve”?**

It’s called naïve because it assumes that all features are conditionally independent of each other given the class.

That is:

$$P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$$

This assumption is rarely true in real-world data, but it greatly simplifies the computation, and surprisingly, Naïve Bayes often works very well in practice — especially with high-dimensional data like text.
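A small worked example shows how the naïve assumption turns the posterior into a product of per-feature terms. All probabilities below are made-up, illustrative numbers for a two-class spam filter; they are not estimated from any dataset. For simplicity only the observed words are scored (a Bernoulli model would also multiply in the probability of each absent word).

```python
# Hypothetical priors and per-word likelihoods (illustrative numbers only)
priors = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},   # P(word present | spam)
    "ham":  {"free": 0.1, "meeting": 0.4},   # P(word present | ham)
}
observed_words = ["free", "meeting"]         # an email containing both words

# Numerator of Bayes' theorem: P(C) * product of P(X_i | C) (naive assumption)
unnormalized = {}
for c in priors:
    score = priors[c]
    for word in observed_words:
        score *= likelihood[c][word]
    unnormalized[c] = score

# Evidence P(X) is the sum over classes; it only rescales the scores
evidence = sum(unnormalized.values())
posterior = {c: s / evidence for c, s in unnormalized.items()}
print(posterior)   # spam ≈ 0.4545, ham ≈ 0.5455
```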



**Question 5:** Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants.
When would you use each one?

**Answer:**

Gaussian Naive Bayes is for continuous data, Multinomial Naive Bayes is for discrete count data (like word frequencies in text), and Bernoulli Naive Bayes is for binary/boolean features.

**Gaussian Naive Bayes:**

Description:

Assumes features are normally distributed within each class and uses the mean and variance of each feature to calculate probabilities.

Use Cases:

Best for continuous data like sensor readings, measurements, or any data that can be reasonably approximated by a normal distribution.

Example:

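Classifying tumours as malignant or benign from continuous cell measurements such as radius, texture, and area (the Breast Cancer dataset used in Question 7). Note that Naïve Bayes is a classifier, so it is not suited to predicting a continuous target such as a house price.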

**Multinomial Naive Bayes:**

Description:

Uses the multinomial distribution to model discrete data, often used for text classification where features represent word counts or term frequencies.

Use Cases:

Text classification (spam detection, sentiment analysis), document categorization, where feature vectors represent the frequency of words or terms.

Example:

Classifying emails as spam or not spam based on the frequency of certain words.

**Bernoulli Naive Bayes:**

Description:

Suitable for binary or boolean features (presence/absence of a feature).

Use Cases:

Feature vectors where each feature represents a binary outcome, like whether a word appears in a document (1 for present, 0 for absent).

Example:

Determining if a document is relevant to a topic based on the presence or absence of specific keywords.
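This sketch fits each variant on the kind of data it models: `GaussianNB` on the continuous Iris measurements, `MultinomialNB` on word counts, and `BernoulliNB` on word presence/absence. The four-email corpus is a hypothetical toy example, and `CountVectorizer(binary=True)` is used to turn counts into 0/1 features.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Gaussian NB: continuous features (sepal/petal measurements in cm)
X_iris, y_iris = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X_iris, y_iris)
print("GaussianNB training accuracy:", round(gnb.score(X_iris, y_iris), 3))

# Hypothetical toy corpus: 1 = spam, 0 = not spam
docs = ["free money now", "free free offer",
        "meeting schedule today", "project meeting notes"]
labels = [1, 1, 0, 0]
new_docs = ["free offer", "meeting today"]

# Multinomial NB: word counts ("free" counts twice in the second email)
count_vec = CountVectorizer()
mnb = MultinomialNB().fit(count_vec.fit_transform(docs), labels)
mnb_pred = mnb.predict(count_vec.transform(new_docs))
print("MultinomialNB:", mnb_pred)

# Bernoulli NB: binary presence/absence of each word
binary_vec = CountVectorizer(binary=True)
bnb = BernoulliNB().fit(binary_vec.fit_transform(docs), labels)
bnb_pred = bnb.predict(binary_vec.transform(new_docs))
print("BernoulliNB:", bnb_pred)
```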

**Dataset Info:**
* You can use any suitable datasets like Iris, Breast Cancer, or Wine from sklearn.datasets or a CSV file you have.

**Question 6:** Write a Python program to:
* Load the Iris dataset
* Train an SVM Classifier with a linear kernel
* Print the model's accuracy and support vectors.

(Include your Python code and output in the code box below.)

**Answer:**


In [361]:
# Load the Iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

In [362]:
X, y = iris.data, iris.target

In [363]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [364]:
# Split dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [365]:
# Train an SVM Classifier with a linear kernel
from sklearn.svm import SVC
svm_model = SVC(kernel='linear')
svm_model

In [366]:
svm_model.fit(X_train, y_train)

In [367]:
# Predict on the test set
y_pred = svm_model.predict(X_test)
y_pred

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])

In [368]:
# Calculate accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
accuracy

1.0
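The question also asks for the support vectors, which the cells above never print. A self-contained version that repeats the same split and model and prints both could look like this (its output is not reproduced here).

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

svm_model = SVC(kernel="linear").fit(X_train, y_train)
accuracy = accuracy_score(y_test, svm_model.predict(X_test))

print(f"Accuracy: {accuracy:.2f}")
# support_vectors_ holds the training points that define the margins
print("Number of support vectors per class:", svm_model.n_support_)
print("Support vectors:\n", svm_model.support_vectors_)
```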

**Question 7:** Write a Python program to:
* Load the Breast Cancer dataset
* Train a Gaussian Naïve Bayes model
* Print its classification report including precision, recall, and F1-score.

(Include your Python code and output in the code box below.)

**Answer:**


In [369]:
# Load the Breast Cancer dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

In [370]:
X, y = data.data, data.target

In [371]:
X

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [372]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [373]:
# Split dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [374]:
# Train a Gaussian Naïve Bayes model
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb

In [375]:
gnb.fit(X_train, y_train)

In [376]:
# Predict on the test set
y_pred = gnb.predict(X_test)
y_pred

array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0])

In [377]:
# Print the classification report
from sklearn.metrics import classification_report
cls_rep = classification_report(y_test, y_pred, target_names=data.target_names)

In [378]:
print(cls_rep)

              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171



**Question 8:** Write a Python program to:
* Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma.
* Print the best hyperparameters and accuracy.

(Include your Python code and output in the code box below.)

**Answer:**


In [379]:
# Load the Wine dataset
from sklearn.datasets import load_wine
data = load_wine()

In [380]:
X, y = data.data, data.target

In [381]:
X

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]])

In [382]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])

In [383]:
# Split into train and test sets
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [384]:
# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}
param_grid

{'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1], 'kernel': ['rbf']}

In [385]:
# Set up the GridSearchCV
from sklearn.svm import SVC
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid

In [386]:
grid.fit(X_train, y_train)

In [387]:
# Predict using the best model
best_model = grid.best_estimator_
best_model

In [388]:
y_pred = best_model.predict(X_test)
y_pred

array([2, 0, 2, 0, 1, 0, 1, 2, 0, 0, 2, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       2, 2, 2, 1, 1, 2, 1, 0, 0, 1, 2, 0, 0, 0, 2, 2, 2, 1, 0, 1, 1, 0,
       2, 0, 2, 1, 2, 0, 1, 0, 0, 2])

In [389]:
grid.best_params_

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

In [390]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.7777777777777778
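The RBF kernel depends on Euclidean distances, so it is sensitive to feature scale, and the Wine features differ widely in magnitude (the last column, proline, is in the hundreds to thousands while several others are around 1, as the `X` output above shows). That is a likely reason for the modest 0.78 test accuracy. A sketch of the usual remedy, standardizing inside a `Pipeline` so the scaler is refitted on each training fold, is below; its parameters and accuracy are not reproduced here.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling happens inside the pipeline, so each CV fold fits its own scaler
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefixes
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, grid.predict(X_test))
print("Best hyperparameters:", grid.best_params_)
print(f"Test accuracy: {test_accuracy:.4f}")
```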

**Question 9:** Write a Python program to:
* Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups).
* Print the model's ROC-AUC score for its predictions.

(Include your Python code and output in the code box below.)

**Answer:**


In [391]:
# Load a binary classification subset of 20 Newsgroups (e.g., 'sci.space' vs 'rec.sport.baseball')
from sklearn.datasets import fetch_20newsgroups
categories = ['sci.space', 'rec.sport.baseball']
categories

['sci.space', 'rec.sport.baseball']

In [392]:
newsgroups = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

In [393]:
# Split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(newsgroups.data, newsgroups.target, test_size=0.3, random_state=42)

In [394]:
# Convert text to TF-IDF features
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

In [395]:
vectorizer

In [396]:
# Train Multinomial Naïve Bayes
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(X_train_tfidf, y_train)

In [397]:
# Predict probabilities
y_prob = nb.predict_proba(X_test_tfidf)[:, 1]  # Probabilities for class 1

In [398]:
# Compute ROC-AUC
from sklearn.metrics import roc_auc_score
roc_auc = roc_auc_score(y_test, y_prob)
roc_auc

np.float64(0.9952927331568109)

In [399]:
print(f"ROC-AUC Score: {roc_auc:.4f}")

ROC-AUC Score: 0.9953
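ROC-AUC is computed from scores or probabilities rather than hard labels, which is why the cells above pass `predict_proba(...)[:, 1]`. It equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels and scores below are arbitrary toy values; the pairwise count and `roc_auc_score` agree.

```python
from itertools import product

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])   # e.g. predicted P(class 1)

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count as half
pairs = list(product(pos, neg))
manual_auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)
library_auc = roc_auc_score(y_true, y_score)

print(manual_auc, library_auc)   # both 0.75
```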


**Question 10:** Imagine you’re working as a data scientist for a company that handles
email communications.

Your task is to automatically classify emails as Spam or Not Spam. The emails may
contain:
* Text with diverse vocabulary
* Potential class imbalance (far more legitimate emails than spam)
* Some incomplete or missing data

Explain the approach you would take to:
* Preprocess the data (e.g. text vectorization, handling missing data)
* Choose and justify an appropriate model (SVM vs. Naïve Bayes)
* Address class imbalance
* Evaluate the performance of your solution with suitable metrics

And explain the business impact of your solution.

(Include your Python code and output in the code box below.)

**Answer:**


In [400]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.utils import resample
import numpy as np

In [401]:
# Step 1: Simulate Spam vs Not-Spam emails
categories = ['rec.sport.hockey', 'talk.politics.misc']  # Not spam vs Spam proxy
categories

['rec.sport.hockey', 'talk.politics.misc']

In [402]:
data = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

In [403]:
X_raw = np.array(data.data)
y_raw = np.array(data.target)  # 0 = not spam, 1 = spam

In [404]:
y_raw

array([1, 1, 0, ..., 0, 1, 0])

In [405]:
# Step 2: Handle missing data
valid_indices = [i for i, text in enumerate(X_raw) if text.strip()]
X = X_raw[valid_indices]
y = y_raw[valid_indices]

In [406]:
# Step 3: Simulate class imbalance (more legit than spam)
X_spam = X[y == 1]
X_ham = X[y == 0]
y_spam = y[y == 1]
y_ham = y[y == 0]

In [407]:
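# resample() samples WITH replacement by default (made explicit here), so despite the "_down" suffix
# this oversamples: 2x the spam count exceeds the number of ham emails, many ham emails are duplicated,
# and copies can land in both the train and test sets, which likely inflates the scores reported below.
X_ham_down, y_ham_down = resample(X_ham, y_ham, replace=True, n_samples=len(X_spam)*2, random_state=42)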

In [408]:
# Despite its name, X_balanced is deliberately imbalanced (about 2 ham : 1 spam)
X_balanced = np.concatenate([X_spam, X_ham_down])
y_balanced = np.concatenate([y_spam, y_ham_down])

In [409]:
y_balanced

array([1, 1, 1, ..., 0, 0, 0])

In [410]:
# Step 4: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_balanced, y_balanced, test_size=0.3, stratify=y_balanced, random_state=42)

In [411]:
# Step 5: TF-IDF + Naive Bayes pipeline
vectorizer = TfidfVectorizer(stop_words='english', lowercase=True)
vectorizer

In [412]:
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

In [413]:
model = MultinomialNB()
model.fit(X_train_vec, y_train)

In [414]:
# Step 6: Predict and evaluate
y_pred = model.predict(X_test_vec)
y_pred

array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0,

In [415]:
y_prob = model.predict_proba(X_test_vec)[:, 1]

In [416]:
print(classification_report(y_test, y_pred, target_names=["Not Spam", "Spam"]))

              precision    recall  f1-score   support

    Not Spam       0.86      1.00      0.92       454
        Spam       1.00      0.67      0.81       227

    accuracy                           0.89       681
   macro avg       0.93      0.84      0.86       681
weighted avg       0.91      0.89      0.88       681



In [417]:
roc_auc = roc_auc_score(y_test, y_prob)
roc_auc

np.float64(0.9987191678472317)
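The cells above create an imbalanced dataset but do not counter the imbalance, and spam recall stays at 0.67. One possible direction is sketched below on a hypothetical toy corpus rather than the newsgroup data: drop missing or blank emails, use `ComplementNB` (which scikit-learn documents as particularly suited to imbalanced data), lower the decision threshold on the spam probability to trade precision for recall, and report threshold-free average precision next to precision, recall and F1. Naïve Bayes probabilities are often poorly calibrated, so in a real project the threshold should be chosen on a validation split, and any resampling should happen only after the train/test split so duplicated emails cannot leak into the test set.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import average_precision_score, precision_recall_fscore_support
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

def drop_missing(texts, labels):
    """Keep only emails whose text is present and non-blank."""
    kept = [(t, l) for t, l in zip(texts, labels)
            if isinstance(t, str) and t.strip()]
    return [t for t, _ in kept], np.array([l for _, l in kept])

# Hypothetical toy data: 1 = spam, 0 = not spam (imbalanced, with missing texts)
train_texts = ["win a free prize now", "cheap loans click here",
               "meeting moved to friday", "please review the attached report",
               "lunch tomorrow", "quarterly budget figures attached",
               "team offsite agenda", None, "   "]
train_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0]
test_texts = ["free prize click here", "agenda for friday meeting",
              "cheap prize loans", "budget report attached"]
test_labels = np.array([1, 0, 1, 0])

X_train, y_train = drop_missing(train_texts, train_labels)

model = make_pipeline(TfidfVectorizer(lowercase=True), ComplementNB())
model.fit(X_train, y_train)

spam_prob = model.predict_proba(test_texts)[:, 1]

# A threshold below 0.5 flags more emails as spam: higher recall, lower precision
threshold = 0.4   # illustrative value; tune it on a validation set
y_pred = (spam_prob >= threshold).astype(int)

precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, y_pred, average="binary", zero_division=0)
ap = average_precision_score(test_labels, spam_prob)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"average precision (PR-AUC): {ap:.2f}")
```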