Q1 - What is a Support Vector Machine (SVM)?

Ans - SVM is a supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest possible margin.

Q2 - What is the difference between Hard Margin and Soft Margin SVM?

Ans - Hard Margin SVM strictly separates classes with a hyperplane, allowing no misclassification but requiring data to be linearly separable.
Soft Margin SVM allows some misclassification by introducing a penalty (C parameter) to handle noise and overlapping data.

Q3 - What is the mathematical intuition behind SVM?

Ans - SVM maximizes the margin between the decision boundary and the closest points of each class (the support vectors) while penalizing points that violate the margin, typically through the hinge loss.
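
As a sketch of this trade-off, the standard soft-margin formulation can be written as below. The symbols are the usual textbook notation and are assumptions here, since these notes do not define them: w and b describe the hyperplane, the slack variable \xi_i measures how far point i violates the margin, and C weights the penalty. The margin width is 2 / \lVert w \rVert, so minimizing \lVert w \rVert^2 maximizes the margin.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```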

Q4 - What is the role of Lagrange Multipliers in SVM?

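Ans - Lagrange multipliers convert the constrained SVM optimization problem into its dual form, a quadratic program that depends on the training data only through inner products. This makes the problem easier to solve with quadratic programming and is what allows kernels to be used.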
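
For reference, the dual problem in its common textbook form looks roughly as follows. The notation is assumed rather than taken from these notes: \alpha_i is the Lagrange multiplier for training point i, y_i \in \{-1, +1\} is its label, and C is the soft-margin penalty. Points with \alpha_i > 0 are the support vectors.

```latex
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
- \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{subject to} \quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```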

Q5 - What are Support Vectors in SVM?

Ans - Support vectors are the data points closest to the decision boundary, determining the hyperplane's position and orientation.
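
A minimal sketch of how to inspect the support vectors of a fitted scikit-learn model. The synthetic blobs and the choice of C=1.0 are illustrative assumptions, not part of the original question.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two synthetic clusters (illustrative data)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only a subset of the training points ends up defining the boundary
print("Support vectors per class:", clf.n_support_)
print("Indices of support vectors:", clf.support_)
print("Support vector array shape:", clf.support_vectors_.shape)
```

Every other training point could be removed without changing the fitted hyperplane.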

Q6 - What is a Support Vector Classifier (SVC)?

Ans - SVC is an SVM model used for classification tasks.

Q7 - What is a Support Vector Regressor (SVR)?

Ans - SVR is an SVM model used for regression. It predicts continuous values while ignoring errors that fall within a margin of tolerance (epsilon) around the prediction.
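
The effect of the tolerance can be sketched by varying scikit-learn's `epsilon` parameter on a noisy sine curve. The data and the three epsilon values are assumptions chosen for illustration. A wider tube typically leaves fewer points outside it, and therefore fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (illustrative data)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # Points inside the epsilon tube do not become support vectors
    sv_counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors")
```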

Q8 - What is the Kernel Trick in SVM?

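Ans - The Kernel Trick lets an SVM operate in a higher-dimensional feature space, where the data may become linearly separable, without explicitly computing the transformation. A kernel function returns the inner product of two points in that space directly from their original coordinates.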
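
A small numerical check makes this concrete. The sketch below uses a degree-2 polynomial kernel on 2-D inputs; the helper names `phi` and `poly2_kernel` and the sample vectors are hypothetical, chosen only for illustration. The kernel value matches the inner product of the explicitly mapped vectors, up to floating-point rounding.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)       # same quantity, computed in the 2-D input space

print(explicit, implicit)
```

For higher degrees or the RBF kernel the explicit feature space becomes very large or infinite-dimensional, which is why computing the kernel directly matters.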

Q9 - Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.

Ans - Linear Kernel: Best for linearly separable data.
Polynomial Kernel: Captures non-linearity but may be computationally expensive.
RBF Kernel: Most commonly used as it works well with non-linear relationships.
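
For reference, the three kernels can be written as follows, using the parametrization found in scikit-learn (\gamma corresponds to `gamma`, r to `coef0`, and d to `degree`; x and z are two input vectors):

```latex
K_{\text{linear}}(x, z) = x^\top z
\qquad
K_{\text{poly}}(x, z) = \left(\gamma\, x^\top z + r\right)^{d}
\qquad
K_{\text{rbf}}(x, z) = \exp\!\left(-\gamma\, \lVert x - z \rVert^{2}\right)
```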

Q10 - What is the effect of the C parameter in SVM?

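Ans - The C parameter controls the trade-off between maximizing the margin and minimizing classification error. A higher C penalizes misclassified training points more heavily, giving a narrower margin that fits the training data closely; a lower C tolerates more violations in exchange for a wider margin.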

Q11 - What is the role of the Gamma parameter in RBF Kernel SVM?

Ans - Gamma defines how far the influence of a single training point reaches. A higher gamma gives each point a short reach, producing a more complex boundary that can overfit; a lower gamma gives a longer reach and a smoother, simpler model.
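
One way to see this is to compare training and test accuracy across a few gamma values. The sketch below assumes a noisy two-moons dataset and three arbitrary gamma values; a large gap between training and test accuracy at high gamma would suggest overfitting.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy non-linear data (illustrative)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for gamma in (0.01, 1, 100):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each gamma
    results[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={results[gamma][0]:.2f}, test={results[gamma][1]:.2f}")
```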

Q12 - What is the Naïve Bayes classifier, and why is it called "Naïve"?

Ans - It is a probabilistic classifier based on Bayes' theorem, assuming that features are independent given the class label, hence "Naïve."

Q13 - What is Bayes’ Theorem?

Ans - Bayes' theorem states that the probability of a class given the data is proportional to the prior probability of the class times the likelihood of the data given the class.
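
As a worked example, the sketch below applies Bayes' theorem to a toy spam filter. All probabilities are made-up numbers used only to show the arithmetic: posterior = likelihood × prior / evidence.

```python
# Hypothetical probabilities for a toy spam filter
p_spam = 0.2               # prior P(spam)
p_word_given_spam = 0.6    # likelihood P("free" | spam)
p_word_given_ham = 0.05    # likelihood P("free" | not spam)

# Evidence P("free"), via the law of total probability
evidence = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(spam | "free")
posterior = p_word_given_spam * p_spam / evidence
print(f"P(spam | 'free') = {posterior:.2f}")
```

With these numbers the evidence is 0.12 + 0.04 = 0.16, so the posterior is 0.12 / 0.16 = 0.75.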

Q14 - Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

Ans - Gaussian Naïve Bayes: Assumes features are normally distributed.
Multinomial Naïve Bayes: Used for text classification with word counts.
Bernoulli Naïve Bayes: Works with binary data (presence or absence of features).
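
The sketch below pairs each variant with the kind of feature it expects. The tiny arrays are invented for illustration and are far too small for a real model.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Illustrative toy features for each variant
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])  # continuous values
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])    # word counts
X_binary = (X_counts > 0).astype(int)                                # presence/absence

preds = {}
for name, model, X in [
    ("GaussianNB", GaussianNB(), X_cont),
    ("MultinomialNB", MultinomialNB(), X_counts),
    ("BernoulliNB", BernoulliNB(), X_binary),
]:
    model.fit(X, y)
    # Predict the class of the first training row
    preds[name] = int(model.predict(X[:1])[0])
    print(name, "->", preds[name])
```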

Q15 - When should you use Gaussian Naïve Bayes over other variants?

Ans - When the features are continuous and approximately normally distributed within each class.

Q16 - What are the key assumptions made by Naïve Bayes?

Ans - Features are conditionally independent given the class label.
All features contribute equally.
The prior probabilities of classes are correct.

Q17 - What are the advantages and disadvantages of Naïve Bayes?

Ans - Advantages: Simple, fast, works well with small datasets and text classification.
Disadvantages: Assumes feature independence, which is rarely true.

Q18 - Why is Naïve Bayes a good choice for text classification?

Ans - Because it trains and predicts quickly and handles high-dimensional, sparse data such as word counts effectively.

Q19 - Compare SVM and Naïve Bayes for classification tasks:

Ans - SVM handles complex, non-linear decision boundaries well, but kernel SVM training scales poorly as the number of samples grows.
Naïve Bayes is faster and works well for text classification.

Q20 - How does Laplace Smoothing help in Naïve Bayes?

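Ans - It prevents zero-probability issues by adding a small constant (typically +1) to every feature count before the probabilities are computed, so a feature value never seen with a class in training does not zero out the whole product.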
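
A minimal sketch of the arithmetic, using made-up counts. The helper `smoothed_prob` is hypothetical and written only for this example; in scikit-learn the same idea is exposed through the `alpha` parameter of `MultinomialNB` and `BernoulliNB`.

```python
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    """P(word | class) with additive (Laplace) smoothing."""
    return (count + alpha) / (total + alpha * vocab_size)

# Hypothetical counts: the word never appears in this class during training
count, total, vocab_size = 0, 100, 50

unsmoothed = count / total                               # exactly zero
smoothed = smoothed_prob(count, total, vocab_size)       # (0 + 1) / (100 + 50)

print(f"Unsmoothed: {unsmoothed}, smoothed: {smoothed:.4f}")
```

The smoothed estimate is small but non-zero, so one unseen word no longer forces the class probability to zero.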

Q21 - Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM classifier
svm_clf = SVC(kernel='linear')
svm_clf.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = svm_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Q22 - Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM classifiers with different kernels
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# Predictions
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# Compare accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Linear Kernel Accuracy: {accuracy_linear:.2f}")
print(f"RBF Kernel Accuracy: {accuracy_rbf:.2f}")

Q23 - Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE).

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load dataset
housing = datasets.fetch_california_housing()
X, y = housing.data, housing.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVR model (standardize features first; SVR is sensitive to feature scale)
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
svr.fit(X_train, y_train)

# Predict and evaluate MSE
y_pred = svr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

Q24 - Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Create dataset
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Train SVM classifier with polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X, y)

# Plot decision boundary
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.title("SVM with Polynomial Kernel")
plt.show()

Q25 - Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
breast_cancer = datasets.load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Q26 - Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

In [None]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
newsgroups = fetch_20newsgroups(subset='all', categories=['sci.space', 'comp.graphics'])
X, y = newsgroups.data, newsgroups.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline for text processing and classification
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB())
])

# Train classifier
text_clf.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = text_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Q27 - Train an SVM Classifier with different C values and compare decision boundaries visually.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate synthetic data
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_clusters_per_class=1, random_state=42)

# Define different C values
C_values = [0.1, 1, 10]
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for idx, C in enumerate(C_values):
    svm_clf = SVC(kernel='linear', C=C)
    svm_clf.fit(X, y)

    # Plot decision boundary
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
    Z = svm_clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axes[idx].contourf(xx, yy, Z, levels=[Z.min(), 0, Z.max()], alpha=0.2)
    axes[idx].scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    axes[idx].set_title(f"SVM with C={C}")

plt.show()

Q28 - Train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.

In [None]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Generate binary dataset
X, y = make_classification(n_samples=200, n_features=10, n_classes=2, random_state=42)
X = (X > 0).astype(int)  # Convert to binary features

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train BernoulliNB classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Evaluate accuracy
y_pred = bnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Q29 - Apply feature scaling before training an SVM model and compare results with unscaled data.

In [None]:
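from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a dataset whose features are on very different scales
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM without scaling
svm_unscaled = SVC(kernel='rbf')
svm_unscaled.fit(X_train, y_train)
accuracy_unscaled = accuracy_score(y_test, svm_unscaled.predict(X_test))

# Train SVM with scaling (the scaler is fit on the training data only)
pipeline = Pipeline([('scaler', StandardScaler()), ('svm', SVC(kernel='rbf'))])
pipeline.fit(X_train, y_train)
accuracy_scaled = accuracy_score(y_test, pipeline.predict(X_test))

print(f"Unscaled Accuracy: {accuracy_unscaled:.2f}")
print(f"Scaled Accuracy: {accuracy_scaled:.2f}")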

Q30 - Train a Gaussian Naïve Bayes model and compare predictions before and after Laplace Smoothing.

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Note: GaussianNB has no Laplace (additive) smoothing; that applies to the
# count-based variants through `alpha`. Its closest analogue is `var_smoothing`,
# which adds a fraction of the largest feature variance to all variances.

# Train GaussianNB with the default variance smoothing
gnb_default = GaussianNB(var_smoothing=1e-9)
gnb_default.fit(X_train, y_train)
accuracy_default = accuracy_score(y_test, gnb_default.predict(X_test))

# Train GaussianNB with increased variance smoothing
gnb_smooth = GaussianNB(var_smoothing=1e-2)
gnb_smooth.fit(X_train, y_train)
accuracy_smooth = accuracy_score(y_test, gnb_smooth.predict(X_test))

print(f"Accuracy with default var_smoothing (1e-9): {accuracy_default:.2f}")
print(f"Accuracy with increased var_smoothing (1e-2): {accuracy_smooth:.2f}")

Q31 - Train an SVM Classifier and use GridSearchCV to tune hyperparameters (C, gamma, kernel).

In [None]:
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 'auto'], 'kernel': ['linear', 'rbf']}

# Grid search with cross-validation
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Best parameters and accuracy
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Accuracy: {grid_search.best_score_:.2f}")

Q32 - Train an SVM Classifier on an imbalanced dataset and apply class weighting.

In [None]:
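import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_class_weight

# Generate imbalanced dataset (roughly 90% / 10%)
X_imb, y_imb = make_classification(n_samples=1000, weights=[0.9, 0.1], n_classes=2, random_state=42)

# Stratified split keeps the class ratio in both sets
X_train_imb, X_test_imb, y_train_imb, y_test_imb = train_test_split(
    X_imb, y_imb, test_size=0.2, stratify=y_imb, random_state=42)

# Compute class weights from the training labels
classes = np.unique(y_train_imb)
class_weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train_imb)
class_weight_dict = dict(zip(classes, class_weights))

# Train SVM with class weighting on the imbalanced data
svm_weighted = SVC(kernel='rbf', class_weight=class_weight_dict)
svm_weighted.fit(X_train_imb, y_train_imb)
accuracy_weighted = accuracy_score(y_test_imb, svm_weighted.predict(X_test_imb))

print(f"Weighted SVM Accuracy: {accuracy_weighted:.2f}")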

Q33 - Implement a Naïve Bayes classifier for spam detection using email data.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Example email dataset (use actual spam dataset if available)
emails = ["Free money now!!!", "Hey, are we meeting tomorrow?", "Limited time offer, claim now!", "Project update attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Train Naïve Bayes classifier
spam_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB())
])
spam_clf.fit(emails, labels)

# Test on new email
print(spam_clf.predict(["Win a free iPhone!"]))

Q34 - Train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare accuracy.

In [None]:
from sklearn.naive_bayes import GaussianNB

# Train SVM
svm_clf = SVC(kernel='rbf')
svm_clf.fit(X_train, y_train)
accuracy_svm = accuracy_score(y_test, svm_clf.predict(X_test))

# Train Naïve Bayes
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
accuracy_nb = accuracy_score(y_test, nb_clf.predict(X_test))

print(f"SVM Accuracy: {accuracy_svm:.2f}")
print(f"Naïve Bayes Accuracy: {accuracy_nb:.2f}")

Q35 - Perform feature selection before training a Naïve Bayes classifier and compare results.

In [None]:
from sklearn.feature_selection import SelectKBest, chi2

# Feature selection
selector = SelectKBest(chi2, k=5)  # Select top 5 features
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Train Naïve Bayes classifier with selected features
nb_selected = GaussianNB()
nb_selected.fit(X_train_selected, y_train)
accuracy_selected = accuracy_score(y_test, nb_selected.predict(X_test_selected))

print(f"Accuracy with Feature Selection: {accuracy_selected:.2f}")

Q36 - Train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) on the Wine dataset.

In [None]:
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train One-vs-Rest (OvR)
ovr_clf = OneVsRestClassifier(SVC(kernel='linear'))
ovr_clf.fit(X_train, y_train)
ovr_acc = accuracy_score(y_test, ovr_clf.predict(X_test))

# Train One-vs-One (OvO)
ovo_clf = OneVsOneClassifier(SVC(kernel='linear'))
ovo_clf.fit(X_train, y_train)
ovo_acc = accuracy_score(y_test, ovo_clf.predict(X_test))

print(f"OvR Accuracy: {ovr_acc:.2f}")
print(f"OvO Accuracy: {ovo_acc:.2f}")

Q37 - Train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset.

In [None]:
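from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the Breast Cancer data into train and test sets
X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and compare kernels
kernels = ['linear', 'poly', 'rbf']
for kernel in kernels:
    svm_clf = SVC(kernel=kernel)
    svm_clf.fit(X_train_bc, y_train_bc)
    acc = accuracy_score(y_test_bc, svm_clf.predict(X_test_bc))
    print(f"{kernel.capitalize()} Kernel Accuracy: {acc:.2f}")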

Q38 - Train an SVM Classifier using Stratified K-Fold Cross-Validation and compute average accuracy.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified K-Fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
svm_clf = SVC(kernel='linear')

# Cross-validation accuracy
scores = cross_val_score(svm_clf, X, y, cv=skf, scoring='accuracy')
print(f"Average Accuracy: {scores.mean():.2f}")

Q39 - Train a Naïve Bayes classifier using different prior probabilities and compare performance.

In [None]:
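from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Priors need one entry per class, so use the two-class Breast Cancer data
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)

# Define different priors (each list must sum to 1)
priors_list = [[0.7, 0.3], [0.5, 0.5], [0.3, 0.7]]

for priors in priors_list:
    nb_clf = GaussianNB(priors=priors)
    nb_clf.fit(X_train_bc, y_train_bc)
    acc = accuracy_score(y_test_bc, nb_clf.predict(X_test_bc))
    print(f"Prior {priors} - Accuracy: {acc:.2f}")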

Q40 - Perform Recursive Feature Elimination (RFE) before training an SVM Classifier.

In [None]:
from sklearn.feature_selection import RFE

# Perform RFE
svm_clf = SVC(kernel='linear')
rfe = RFE(estimator=svm_clf, n_features_to_select=5)
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)

# Train SVM with selected features
svm_clf.fit(X_train_rfe, y_train)
acc_rfe = accuracy_score(y_test, svm_clf.predict(X_test_rfe))

print(f"Accuracy after RFE: {acc_rfe:.2f}")

Q41 - Train an SVM Classifier and evaluate using Precision, Recall, and F1-Score.

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

# Train SVM
svm_clf = SVC(kernel='rbf')
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)

# Compute metrics
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1-Score: {f1:.2f}")

Q42 - Train a Naïve Bayes Classifier and evaluate using Log Loss (Cross-Entropy Loss).

In [None]:
from sklearn.metrics import log_loss

# Train Naïve Bayes
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)

# Compute Log Loss
y_prob = nb_clf.predict_proba(X_test)
log_loss_value = log_loss(y_test, y_prob)

print(f"Log Loss: {log_loss_value:.2f}")

Q43 - Train an SVM Classifier and visualize the Confusion Matrix using seaborn.

In [None]:
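import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Train an SVM on the Breast Cancer data so predictions match the class names below
cancer = load_breast_cancer()
X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)
y_pred_bc = SVC(kernel='rbf').fit(X_train_bc, y_train_bc).predict(X_test_bc)

# Compute Confusion Matrix
cm = confusion_matrix(y_test_bc, y_pred_bc)

# Plot using seaborn
plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=cancer.target_names, yticklabels=cancer.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()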

Q44 - Train an SVM Regressor (SVR) and evaluate using Mean Absolute Error (MAE).

In [None]:
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.datasets import make_regression

# Generate regression data
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Train SVR
svr = SVR(kernel='rbf')
svr.fit(X_train_reg, y_train_reg)

# Compute MAE
y_pred_reg = svr.predict(X_test_reg)
mae = mean_absolute_error(y_test_reg, y_pred_reg)

print(f"Mean Absolute Error (MAE): {mae:.2f}")

Q45 - Train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Train Naïve Bayes
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)

# Predict probabilities
y_prob = nb_clf.predict_proba(X_test)

# Binarize labels for multi-class classification
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Compute ROC-AUC score for each class and average
roc_auc = roc_auc_score(y_test_bin, y_prob, multi_class="ovr")

print(f"ROC-AUC Score: {roc_auc:.2f}")

Q46 - Train an SVM Classifier and visualize the Precision-Recall Curve.

In [None]:
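import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, auc
from sklearn.svm import SVC

# precision_recall_curve needs binary labels, so use the Breast Cancer data
X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)

# Train SVM Classifier
svm_clf = SVC(kernel='rbf', probability=True)
svm_clf.fit(X_train_bc, y_train_bc)

# Predict probabilities
y_prob = svm_clf.predict_proba(X_test_bc)[:, 1]  # Probability of class 1

# Compute Precision-Recall curve
precision, recall, _ = precision_recall_curve(y_test_bc, y_prob)

# Compute area under the Precision-Recall curve
pr_auc = auc(recall, precision)

# Plot Precision-Recall Curve
plt.plot(recall, precision, marker='.', label=f'PR AUC = {pr_auc:.2f}')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.grid()
plt.show()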