In [None]:
# 1 What is a Support Vector Machine (SVM), and how does it work?
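"""A Support Vector Machine (SVM) is a supervised machine learning algorithm used mainly for classification (the SVR variant handles regression).
It works by finding the hyperplane that separates the classes with the maximum margin, i.e. the largest possible distance to the nearest training points of each class.
These closest points are called support vectors: they alone determine where the boundary lies.
For non-linear data, SVM uses the kernel trick (e.g. polynomial or RBF kernels) to implicitly map the data into a higher-dimensional space where a linear separator may exist.
A soft margin allows some points to fall inside the margin or be misclassified, trading training accuracy for better generalization.
Overall, SVMs work well on high-dimensional data, but kernel SVM training scales poorly with the number of samples (at least quadratically) and can struggle when the classes overlap heavily or the data is very noisy.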
"""

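A small sketch of the max-margin idea, assuming scikit-learn: on a hypothetical, linearly separable toy set, a linear SVC with a very large C (approximating a hard margin) recovers the hyperplane midway between the classes. The margin width is 2/‖w‖, and the support vectors are the points where the decision function w·x + b equals roughly -1 or +1.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 lies at x1 <= 0, class 1 at x1 >= 2
X = np.array([[0.0, 0.0], [0.0, 2.0], [-1.0, 1.0],
              [2.0, 0.0], [2.0, 2.0], [3.0, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for slack, approximating a hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]           # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, " b =", b)             # roughly w = [1, 0], b = -1, i.e. the line x1 = 1
print("margin width:", margin_width)   # roughly 2.0, the gap between x1 = 0 and x1 = 2
print("support vectors:\n", clf.support_vectors_)
# Support vectors sit on the margin boundaries, where w.x + b is about -1 or +1
print("decision values:", clf.decision_function(clf.support_vectors_))
```

The far-away points (-1, 1) and (3, 1) do not become support vectors, so moving them would not change the boundary.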

In [None]:
# 2 Explain the difference between Hard Margin and Soft Margin SVM.
"""Hard Margin SVM: Tries to find a hyperplane that separates the classes perfectly, with no point inside the margin and no misclassification. A solution exists only if the data is linearly separable, and even then a single outlier can drastically shrink or shift the margin.

Soft Margin SVM: Allows some points to violate the margin or be misclassified by introducing slack variables. The parameter C controls the trade-off: a high C penalizes violations heavily (narrower margin, fewer training errors, higher risk of overfitting), while a low C tolerates more violations (wider margin, which often generalizes better on noisy data).
"""

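In the standard soft-margin formulation, the SVM minimizes ½‖w‖² + C·Σξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0, so C prices each unit of slack ξ_i. The sketch below, assuming scikit-learn and a synthetic two-blob dataset (hypothetical data, chosen to overlap), fits a linear SVC with a small and a large C and reports the margin width 2/‖w‖ and the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a perfect (hard-margin) separation is unlikely
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])   # distance between the two margin lines
    results[C] = (width, len(clf.support_))
    print(f"C={C:>6}: margin width={width:.3f}, support vectors={len(clf.support_)}")
```

The lower C should give the wider margin; it typically also keeps more points as support vectors, since more of them end up inside the wider margin.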

In [None]:
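# 3 What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.
r"""The Kernel Trick in SVM is a method that lets the algorithm handle data that is not linearly separable by implicitly mapping it into a higher-dimensional space without ever computing that transformation. Instead of transforming the data, it uses a kernel function K(x_i, x_j) that returns the dot product of the two points in the higher-dimensional space directly. Because the SVM optimization only needs these dot products, this is much cheaper than building the mapped features, and it even works when the feature space is infinite-dimensional, as it is for the RBF kernel.

Example: Radial Basis Function (RBF) Kernel

The RBF kernel is defined as:

    K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)

where \gamma > 0 controls how quickly the similarity decays with distance (a larger \gamma gives a more local, more complex decision boundary).

Use case: RBF is the usual default kernel when the classes are separated by a curved, non-linear boundary and there is no prior knowledge of its shape, for example when one class forms a compact cluster surrounded by the other.
"""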

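A short sketch, assuming NumPy and scikit-learn: it computes the RBF kernel by hand and checks it against `sklearn.metrics.pairwise.rbf_kernel`, then uses the degree-2 polynomial kernel (swapped in because its feature map is finite and easy to write out) to show the trick itself: (a·b)² equals the dot product of explicitly mapped features. `rbf` and `phi_quadratic` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(xi, xj, gamma):
    """RBF kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def phi_quadratic(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])
gamma = 0.5

# Hand-written RBF versus scikit-learn's implementation
k_manual = rbf(a, b, gamma)
k_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]
print(k_manual, k_sklearn)      # both about 0.0439, i.e. exp(-3.125)

# Kernel trick: (a.b)^2 equals a dot product in the 3-D feature space,
# but is computed without ever building phi(a) or phi(b)
k_implicit = np.dot(a, b) ** 2
k_explicit = np.dot(phi_quadratic(a), phi_quadratic(b))
print(k_implicit, k_explicit)   # both approximately 16
```

For RBF the equivalent feature map is infinite-dimensional, which is why only the kernel form is usable in practice.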

In [None]:
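# 4 What is a Naïve Bayes Classifier, and why is it called “naïve”?
"""A Naïve Bayes Classifier is a supervised machine learning algorithm based on Bayes’ Theorem, which estimates the probability that a data point belongs to each class and predicts the most probable one. It is especially popular for text classification tasks (like spam filtering and sentiment analysis).

It is called “naïve” because it assumes that all features are conditionally independent of each other given the class. This assumption rarely holds exactly in real data (words in a sentence are clearly correlated), but it lets the class likelihood be computed as a simple product of per-feature probabilities, which makes the model fast and data-efficient, and it often still classifies well in practice.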
"""

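Under the independence assumption, the posterior is proportional to P(c) multiplied by the product of P(x_i | c) over the features. The sketch below applies that rule by hand to a spam example; all probabilities are made-up values chosen only for illustration, and it scores only the two observed words (a Bernoulli model would also factor in the absent ones).

```python
# Hypothetical probabilities, chosen only for illustration
prior = {"spam": 0.4, "ham": 0.6}
# P(word appears | class)
likelihood = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham":  {"free": 0.02, "meeting": 0.20},
}
observed = ["free", "meeting"]   # words present in the new email

scores = {}
for c in prior:
    score = prior[c]
    for word in observed:
        score *= likelihood[c][word]   # the "naive" step: multiply per-feature likelihoods
    scores[c] = score

# Normalize so the posteriors sum to 1
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(posterior)   # spam about 0.714, ham about 0.286
```

Real implementations sum log-probabilities instead of multiplying, to avoid underflow when there are thousands of features.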

In [None]:
# 5 Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?
"""Gaussian Naïve Bayes is used when features are continuous and assumed to follow a normal distribution within each class (e.g., height, weight, medical measurements).

Multinomial Naïve Bayes is used when features are discrete counts, such as word frequencies in text classification (TF-IDF weights also tend to work in practice).

Bernoulli Naïve Bayes is used when features are binary (0/1), indicating presence or absence of something (e.g., whether a word appears in a document).
"""

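A minimal sketch of matching each variant to its feature type with scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The height/weight numbers and the four-document corpus are hypothetical toy data; `CountVectorizer(binary=True)` produces the 0/1 presence features that the Bernoulli model expects.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

# Continuous features (height in cm, weight in kg): GaussianNB
X_cont = np.array([[150, 50], [155, 55], [160, 58],
                   [180, 80], [185, 85], [190, 90]], dtype=float)
y_cont = np.array([0, 0, 0, 1, 1, 1])
gnb = GaussianNB().fit(X_cont, y_cont)
pred_gnb = gnb.predict([[158, 56], [183, 82]])

# Hypothetical tiny corpus: 1 = spam, 0 = ham
docs = ["win free money now", "free prize win win",
        "meeting agenda for monday", "project meeting notes"]
labels = [1, 1, 0, 0]
new_doc = ["free money prize"]

# Word counts: MultinomialNB
count_vec = CountVectorizer()
mnb = MultinomialNB().fit(count_vec.fit_transform(docs), labels)
pred_mnb = mnb.predict(count_vec.transform(new_doc))

# Word presence/absence (0/1): BernoulliNB
bin_vec = CountVectorizer(binary=True)
bnb = BernoulliNB().fit(bin_vec.fit_transform(docs), labels)
pred_bnb = bnb.predict(bin_vec.transform(new_doc))

print(pred_gnb, pred_mnb, pred_bnb)
```

Note that `BernoulliNB` also uses the absence of vocabulary words as evidence, whereas `MultinomialNB` only looks at the words that occur.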

In [None]:
# 6 You can use any suitable dataset, such as Iris, Breast Cancer, or Wine from
# sklearn.datasets, or a CSV file you have.
# Question 6: Write a Python program to:
# - Load the Iris dataset
# - Train an SVM classifier with a linear kernel
# - Print the model's accuracy and support vectors

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data   # features
y = iris.target # labels

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train an SVM classifier with a linear kernel
svm_clf = SVC(kernel='linear')
svm_clf.fit(X_train, y_train)

# Predict on test data
y_pred = svm_clf.predict(X_test)

# Print accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

# Print support vectors
print("Support Vectors:\n", svm_clf.support_vectors_)


Model Accuracy: 1.0
Support Vectors:
 [[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]
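Beyond printing the support vectors, a quick way to see that they alone define the boundary is to refit the same linear SVC on only the support vectors and compare predictions. This is an illustrative check, not part of the original task; it fits on all 150 Iris samples, and any small disagreement would come from solver tolerance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

full = SVC(kernel="linear").fit(X, y)
print("Support vectors per class:", full.n_support_)
print("Indices of the first few support vectors:", full.support_[:5])

# Refit using only the support vectors: non-support vectors have zero weight
# in the dual solution, so the decision boundaries should be (nearly) unchanged
sv_only = SVC(kernel="linear").fit(X[full.support_], y[full.support_])
agreement = np.mean(full.predict(X) == sv_only.predict(X))
print("Prediction agreement on all 150 samples:", agreement)
```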


In [None]:
# 7 Load the Breast Cancer dataset
# Train a Gaussian Naïve Bayes model
# Print its classification report including precision, recall, and F1-score

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Print classification report (precision, recall, f1-score)
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))


Classification Report:

              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171



In [None]:
# 8 Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma.
# Print the best hyperparameters and accuracy.
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Define parameter grid for GridSearch
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']   # RBF kernel for non-linear decision boundaries
}

# Train with GridSearchCV
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Best hyperparameters
print("Best Hyperparameters:", grid.best_params_)

# Predict on test data
y_pred = grid.best_estimator_.predict(X_test)

# Print accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy:", accuracy)


Best Hyperparameters: {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
Test Accuracy: 0.7777777777777778
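The Wine features are on very different scales (proline values are in the hundreds to thousands, while features such as hue are close to 1), and the RBF kernel is distance-based, so the unscaled search above is likely dominated by a few large features; that may explain the tiny best gamma and the modest test accuracy. A sketch that standardizes inside a pipeline, assuming scikit-learn, so the scaler is refit on each cross-validation training fold:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# make_pipeline names the steps "standardscaler" and "svc"
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],          # step-name prefix routes params to SVC
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

test_acc = accuracy_score(y_test, grid.predict(X_test))
print("Best hyperparameters:", grid.best_params_)
print("Test accuracy:", test_acc)
```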


In [10]:
# 9 Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups).
# Print the model's ROC-AUC score for its predictions
# Import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

# 1. Load dataset
categories = ['alt.atheism', 'comp.graphics', 'sci.space', 'talk.religion.misc']
newsgroups = fetch_20newsgroups(subset='all', categories=categories)
X = newsgroups.data
y = newsgroups.target

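# 2. Convert text to TF-IDF features
#    (fitting on all documents lets test-set IDF statistics leak into training;
#    in practice, fit the vectorizer on the training split only)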
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)

# 3. Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

# 4. Train Naïve Bayes classifier
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# 5. Predict probabilities for ROC-AUC
y_prob = nb_model.predict_proba(X_test)

# 6. Binarize labels for multi-class ROC-AUC
y_test_bin = label_binarize(y_test, classes=[0,1,2,3])

# 7. Compute ROC-AUC score (macro-average)
roc_auc = roc_auc_score(y_test_bin, y_prob, average='macro', multi_class='ovr')

print("ROC-AUC Score (Macro, One-vs-Rest):", roc_auc)


ROC-AUC Score (Macro, One-vs-Rest): 0.9921642075021605
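The one-vs-rest macro score treats each class as "this class vs. all others", computes an ordinary binary AUC from that class's probability column, and averages the per-class AUCs without weights. The sketch below checks this against `roc_auc_score(..., multi_class="ovr")` on integer labels, which needs no `label_binarize` step. To keep it runnable offline, it swaps the newsgroups text for a synthetic `make_classification` problem and `GaussianNB`; only the AUC computation carries over.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB().fit(X_train, y_train)
y_prob = model.predict_proba(X_test)   # rows sum to 1, as multi_class="ovr" requires

# Built-in one-vs-rest macro AUC on integer labels
auc_builtin = roc_auc_score(y_test, y_prob, multi_class="ovr", average="macro")

# Same thing by hand: class k vs. the rest, then an unweighted mean
per_class = [roc_auc_score(y_test == k, y_prob[:, k]) for k in model.classes_]
auc_manual = np.mean(per_class)

print(auc_builtin, auc_manual)
```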


In [11]:
# 10
"""Example 1: Email Spam Detection

Data & Preprocessing:

Clean email text, remove missing emails.

Convert text to TF-IDF vectors with unigrams and bigrams.

Model Choice:

Multinomial Naïve Bayes: works well for sparse text data.

Class Imbalance Handling:

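MultinomialNB has no class_weight parameter, so pass balanced sample_weight to fit() or adjust class_prior; class_weight='balanced' applies if switching to a linear SVM.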

Oversample spam emails using SMOTE or simple duplication.

Evaluation Metrics:

Precision, Recall, F1-Score for the spam class.

ROC-AUC for model separability.

Business Impact:

Reduce spam reaching inbox, improve user trust.

Protect users from phishing/malware.

Example 2: Customer Churn Prediction

Data & Preprocessing:

Handle missing values in demographics and usage patterns (impute with median or mode).

Encode categorical variables with one-hot encoding.

Standardize numerical features (not needed for tree-based models such as Random Forest, but useful if linear models are compared).

Model Choice:

Random Forest Classifier: robust, handles mixed data types, interpretable feature importance.

Class Imbalance Handling:

Use class_weight='balanced' or oversample churned customers.

Evaluation Metrics:

Precision/Recall/F1 for churned customers.

ROC-AUC to measure overall performance.

Business Impact:

Proactively retain customers, reducing revenue loss.

Optimize marketing campaigns.

Example 3: Fraud Transaction Detection

Data & Preprocessing:

Handle missing transaction metadata.

Normalize numerical features (amount, time).

Encode categorical fields like merchant type.

Model Choice:

Gradient Boosting (XGBoost or LightGBM): effective for rare event detection.

Class Imbalance Handling:

Oversample fraud cases or set scale_pos_weight.

Threshold tuning to minimize false negatives.

Evaluation Metrics:

Recall and Precision for fraudulent transactions.

PR-AUC in particular, since ROC-AUC can look overly optimistic under extreme imbalance; report ROC-AUC alongside it.

Business Impact:

Prevent financial loss, protect customers.

Reduce chargeback costs and increase trust.

Example 4: Sentiment Analysis for Product Reviews

Data & Preprocessing:

Clean review text, remove stopwords/punctuation.

Convert to TF-IDF or embeddings (e.g., Word2Vec, BERT).

Model Choice:

SVM with linear kernel: performs well on high-dimensional text data.

Class Imbalance Handling:

Weighted SVM or oversample minority sentiment class.

Evaluation Metrics:

Accuracy, F1-score for positive/negative classes.

Confusion matrix to understand misclassification.

Business Impact:

Understand customer sentiment to improve products.

Inform marketing strategies and product improvements.
"""
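A minimal sketch of the spam-detection plan, assuming scikit-learn: TF-IDF with unigrams and bigrams feeding MultinomialNB. Because MultinomialNB has no class_weight option, it balances the classes by passing `compute_sample_weight("balanced", ...)` as `sample_weight`, an alternative to oversampling. The eight emails are hypothetical toy data; a real system would be evaluated on a held-out split with precision, recall, and F1 for the spam class.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical toy data: 1 = spam, 0 = ham (spam is the minority class)
emails = [
    "win a free prize now", "claim your free money today",
    "meeting moved to monday", "please review the project notes",
    "lunch at noon tomorrow", "agenda for the team meeting",
    "quarterly report attached", "can we reschedule the call",
]
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0])

# Unigrams and bigrams
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(emails)

# Up-weight the minority (spam) class so both classes carry equal total weight
weights = compute_sample_weight(class_weight="balanced", y=labels)
model = MultinomialNB().fit(X, labels, sample_weight=weights)

new_emails = ["free prize money", "notes from the project meeting"]
pred = model.predict(vectorizer.transform(new_emails))
print(pred)
```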
