1. What is a Support Vector Machine (SVM)?

ANS:A Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for both classification and regression tasks. It works by finding the optimal hyperplane that best separates data points into different classes, maximizing the margin between them. This margin maximization helps the SVM generalize well to new, unseen data.

2. What is the difference between Hard Margin and Soft Margin SVM?

ANS: Both hard-margin and soft-margin SVMs look for a hyperplane that separates the classes; they differ in how they treat points that violate the margin. A hard-margin SVM requires linearly separable data and finds the widest margin with no violations at all, which makes it sensitive to outliers. A soft-margin SVM allows some points to fall inside the margin or be misclassified, with the penalty controlled by the C parameter, so it can handle data that isn't perfectly separable or trade a few training errors for a wider margin.
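
The contrast can be sketched with scikit-learn by fitting the same linear SVC with a small and a large C. The dataset, cluster centers, and C values below are illustrative assumptions; a very large C only approximates a hard margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two nearby clusters that are likely to overlap
# (centers, spread and seed are illustrative choices).
X, y = make_blobs(n_samples=100, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

def margin_width(C):
    """Fit a linear SVC; return (margin width 2/||w||, number of support vectors)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return 2.0 / np.linalg.norm(clf.coef_), int(clf.n_support_.sum())

soft_margin, soft_n_sv = margin_width(C=0.01)   # tolerant of margin violations
hard_margin, hard_n_sv = margin_width(C=100.0)  # close to hard-margin behaviour

print(f"C=0.01: margin={soft_margin:.2f}, support vectors={soft_n_sv}")
print(f"C=100 : margin={hard_margin:.2f}, support vectors={hard_n_sv}")
```

The smaller C should report the wider margin, since violations are cheap and the optimizer prefers a small weight vector.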

3. What is the mathematical intuition behind SVM?

ANS:The core mathematical idea behind Support Vector Machines (SVM) is to find the optimal hyperplane that best separates data points into different classes, maximizing the margin between the hyperplane and the closest data points (support vectors). This margin acts as a buffer, improving the model's ability to generalize to unseen data and handle noisy data points.
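
Written out, this intuition is the standard hard-margin optimization problem. The notation is the conventional one and is an assumption here, not taken from the answer above: labels y_i are in {-1, +1}, w is the normal vector of the hyperplane, and b is its offset.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n

\text{margin width} = \frac{2}{\lVert \mathbf{w}\rVert}
```

Because the margin width is 2/||w||, minimizing ||w|| is the same as maximizing the margin; the constraints keep every training point on the correct side of it.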

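4. What is the role of Lagrange Multipliers in SVM?

ANS: In Support Vector Machines (SVMs), Lagrange multipliers are used to transform the constrained problem of finding the optimal separating hyperplane into a form that is more convenient to optimize, namely the dual problem. Each training point gets one multiplier, only the support vectors end up with a non-zero value, and the dual depends on the data only through inner products, which is what makes kernels possible.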
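
A minimal sketch of how the dual solution shows up in scikit-learn, assuming a linear kernel and a two-class subset of Iris (both are illustrative choices). The fitted `dual_coef_` attribute holds the products alpha_i * y_i for the support vectors, and the weight vector can be rebuilt from them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Binary problem: keep only the first two Iris classes.
X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors only.
alpha_times_y = clf.dual_coef_              # shape (1, n_support_vectors)

# Primal weights recovered from the dual solution: w = sum_i alpha_i * y_i * x_i
w_from_dual = alpha_times_y @ clf.support_vectors_

print("w from dual variables:", w_from_dual.ravel())
print("w reported by sklearn:", clf.coef_.ravel())
print("sum of alpha_i * y_i :", alpha_times_y.sum())  # dual constraint, should be ~0
```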

5. What are Support Vectors in SVM?

ANS: In Support Vector Machines (SVMs), support vectors are the data points that lie closest to the decision boundary (hyperplane) separating the classes. These points are crucial because they alone determine the position and orientation of the hyperplane, and therefore the margin between classes. In a soft-margin SVM, points that violate the margin are support vectors as well.
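
To illustrate that only the support vectors matter, the sketch below fits a linear SVC on a tiny hand-made dataset (hypothetical points, chosen so the two classes are closest at (0, 0) and (2, 0)) and then refits on the support vectors alone. A very large C is used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: the classes are closest at (0, 0) and (2, 0).
X = np.array([[0, 0], [-1, 0], [-2, 1], [2, 0], [3, 0], [4, 1]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print("Support vector indices:", clf.support_)
print("Support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors: the hyperplane should be nearly
# unchanged, because the remaining points do not constrain the solution.
sv_only = SVC(kernel="linear", C=1e6).fit(X[clf.support_], y[clf.support_])

print("w, b on all points     :", clf.coef_.ravel(), clf.intercept_)
print("w, b on support vectors:", sv_only.coef_.ravel(), sv_only.intercept_)
```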

6. What is a Support Vector Classifier (SVC)?

ANS:A Support Vector Classifier (SVC) is a specific type of Support Vector Machine (SVM) used for classification tasks. It aims to find the best hyperplane that separates data points into different classes, maximizing the margin between the hyperplane and the nearest data points (support vectors). Essentially, SVC is an SVM tailored for predicting categorical outcomes.

7. What is a Support Vector Regressor (SVR)?

ANS: A Support Vector Regressor (SVR) is a machine learning algorithm for regression tasks, where the goal is to predict continuous numerical values. It adapts the Support Vector Machine (SVM) idea to regression: instead of separating classes, SVR fits a function that is as flat as possible while keeping most training points within a tube of half-width epsilon around its predictions. Errors smaller than epsilon are ignored, and only the points on or outside the tube (the support vectors) influence the fitted function.
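
The epsilon-tube can be checked directly on synthetic data. In this sketch the data, C, and epsilon are illustrative assumptions; the point is that training samples that are not support vectors all sit inside the tube.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy samples of a straight line (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 60).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=60)

epsilon = 1.0
svr = SVR(kernel="linear", C=10.0, epsilon=epsilon).fit(X, y)

residuals = np.abs(y - svr.predict(X))
is_sv = np.zeros(len(X), dtype=bool)
is_sv[svr.support_] = True

# Points strictly inside the epsilon-tube incur no loss and are not support
# vectors; the support vectors lie on or outside the tube boundary.
print("Support vectors:", is_sv.sum(), "of", len(X))
print("Largest residual among non-support vectors:", residuals[~is_sv].max())
```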

8. What is the Kernel Trick in SVM?

ANS:The kernel trick in Support Vector Machines (SVMs) is a technique that allows SVMs to classify data that is not linearly separable by implicitly mapping the data into a higher-dimensional feature space where a linear separator can be used.
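
A small numeric check makes the "implicit mapping" concrete. For the degree-2 polynomial kernel k(a, b) = (a . b)^2 on 2-D inputs, one possible explicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2); the function names below are made up for this sketch.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input (one possible choice)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: k(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)       # same value, computed in the original 2-D space

print(explicit, implicit)           # both equal 121 up to floating-point rounding
```

The kernel never builds phi(x); it gets the same inner product from the original coordinates, which is what keeps high-dimensional (or infinite-dimensional, for RBF) feature spaces tractable.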

9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel?


ANS:Linear, polynomial, and RBF (Radial Basis Function) kernels are fundamental components of Support Vector Machines (SVMs), each with distinct characteristics and applications. Linear kernels are simple and efficient for linearly separable data, while polynomial and RBF kernels handle non-linear data, with RBF often being the default choice due to its ability to capture complex relationships.
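
The three kernels can be compared by evaluating them on one pair of points. The formulas below follow scikit-learn's parameterization (gamma, coef0, degree); the input vectors and hyperparameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
gamma, coef0, degree = 0.5, 1.0, 3     # illustrative hyperparameter values

dot = np.dot(x, z)                     # x . z = 2
sq_dist = np.sum((x - z) ** 2)         # ||x - z||^2 = 5

k_linear = dot                               # linear:     x . z
k_poly = (gamma * dot + coef0) ** degree     # polynomial: (gamma * x . z + coef0)^degree
k_rbf = np.exp(-gamma * sq_dist)             # RBF:        exp(-gamma * ||x - z||^2)

# Cross-check against scikit-learn's pairwise kernel functions (2-D inputs required).
X2, Z2 = x.reshape(1, -1), z.reshape(1, -1)
sk_linear = linear_kernel(X2, Z2)[0, 0]
sk_poly = polynomial_kernel(X2, Z2, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
sk_rbf = rbf_kernel(X2, Z2, gamma=gamma)[0, 0]

print(k_linear, k_poly, k_rbf)
```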

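10. What is the effect of the C parameter in SVM?

ANS: In Support Vector Machines (SVMs), the C parameter acts as a regularization parameter that controls the trade-off between maximizing the margin (the distance between the hyperplane and the nearest data points) and minimizing classification errors on the training data. A small C tolerates more margin violations in exchange for a wider margin (stronger regularization), while a large C penalizes violations heavily, giving a narrower margin that fits the training data more closely and may overfit.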

11. What is the role of the Gamma parameter in RBF Kernel SVM?


ANS:In an RBF (Radial Basis Function) kernel SVM, the gamma parameter controls the influence of individual training examples on the decision boundary. A small gamma value means that a training example will have a wider influence, creating a smoother, simpler decision boundary. Conversely, a large gamma value means a training example will have a more local influence, resulting in a more complex, potentially overfitted decision boundary.
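
A quick experiment along these lines, assuming a synthetic two-moons dataset and three arbitrary gamma values. Exact scores depend on the data and seed, so none are quoted here.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data with a curved boundary (illustrative settings).
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
for gamma in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_train, y_train)
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma:<5} train acc={scores[gamma][0]:.3f}  "
          f"test acc={scores[gamma][1]:.3f}")
```

A large gap between training and test accuracy at the largest gamma is the usual sign of the overfitting described above.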

12. What is the Naïve Bayes classifier, and why is it called "Naïve"?

ANS:The Naïve Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem with the strong assumption of feature independence. It's called "naïve" because it assumes that the presence of one feature in a class is unrelated to the presence of any other feature. This simplification, while often unrealistic, makes the algorithm computationally efficient and surprisingly effective for many real-world problems.

13. What is Bayes’ Theorem?


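ANS: Bayes' Theorem is a mathematical formula that describes the probability of an event based on prior knowledge of conditions related to the event. For events A and B with P(B) > 0, it states P(A|B) = P(B|A) · P(A) / P(B), where P(A) is the prior, P(B|A) is the likelihood, and P(A|B) is the posterior.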
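
A worked example with made-up numbers for a spam filter shows how the prior is updated once a word is observed. All probabilities below are hypothetical.

```python
# Hypothetical numbers for a spam filter, chosen only to make the arithmetic easy.
p_spam = 0.2                 # prior P(spam)
p_word_given_spam = 0.6      # likelihood P(word | spam)
p_word_given_ham = 0.05      # likelihood P(word | ham)

# Law of total probability: P(word) = sum over classes of P(word | class) * P(class)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(spam | word) = {p_spam_given_word:.2f}")   # 0.12 / 0.16 = 0.75
```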

14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes?

ANS: The three variants differ in the distribution they assume for each feature given the class. Bernoulli Naïve Bayes models binary (present/absent) features, Multinomial Naïve Bayes models discrete counts such as word frequencies, and Gaussian Naïve Bayes models continuous features with a normal distribution per class.
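
The pairing of variant and feature type can be sketched on three tiny hand-made arrays (hypothetical data, six samples each):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> GaussianNB (per-class normal distribution)
X_cont = np.array([[1.0], [1.2], [0.9], [3.0], [3.2], [2.9]])
pred_gauss = GaussianNB().fit(X_cont, y).predict([[1.1], [3.1]])

# Count features (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0], [4, 1], [2, 0], [0, 3], [1, 4], [0, 2]])
pred_multi = MultinomialNB().fit(X_counts, y).predict([[5, 0], [0, 5]])

# Binary presence/absence features -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
pred_bern = BernoulliNB().fit(X_binary, y).predict([[1, 0], [0, 1]])

print(pred_gauss, pred_multi, pred_bern)
```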

15. When should you use Gaussian Naïve Bayes over other variants?


ANS:Gaussian Naive Bayes is best used when dealing with datasets containing continuous numerical features that are roughly normally distributed. This means the data points tend to cluster around a mean value, forming a bell curve shape. If your features are not continuous or don't follow a normal distribution, other Naive Bayes variants like Multinomial Naive Bayes (for discrete features) or Bernoulli Naive Bayes (for binary features) might be more appropriate.

16. What are the key assumptions made by Naïve Bayes?



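ANS: The key assumption in Naïve Bayes is conditional independence: given the class label, every feature is assumed to be independent of every other feature. In other words, once the class is known, the value of one feature carries no additional information about the value of another. Each variant also assumes a particular distribution for the features within a class (for example, Gaussian for continuous features).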

17. What are the advantages and disadvantages of Naïve Bayes?


ANS:Naive Bayes classifiers offer advantages like simplicity, speed, and efficiency with large datasets, especially in text-based applications. However, they are based on the strong assumption of feature independence, which rarely holds true in real-world scenarios, potentially impacting accuracy.

18. Why is Naïve Bayes a good choice for text classification?


ANS:Naïve Bayes is a popular choice for text classification due to its simplicity, speed, and effectiveness with high-dimensional data like text. Its ability to handle large datasets and its relatively low computational cost make it well-suited for tasks like spam filtering, sentiment analysis, and document categorization.

19. Compare SVM and Naïve Bayes for classification tasks.


ANS:SVM and Naive Bayes are both popular classification algorithms, but they differ significantly in their approaches and performance characteristics. SVM, with its focus on finding the optimal hyperplane, generally outperforms Naive Bayes in complex, non-linearly separable datasets, especially when dealing with interactions between features. However, Naive Bayes excels in speed and simplicity, making it a good choice for large datasets or when computational resources are limited.

20. How does Laplace Smoothing help in Naïve Bayes?

ANS:Laplace smoothing, also known as add-one smoothing, is a technique used in Naive Bayes to prevent zero probabilities, which can lead to inaccurate classifications. It works by adding a small value (usually 1) to all the frequency counts in the training data, ensuring that even unseen features have a non-zero probability. This prevents the model from completely disregarding an instance during classification simply because it contains a word not encountered during training.
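
The effect is easy to verify by hand. This sketch uses a made-up 3-word vocabulary with one toy document per class, computes the add-one estimate manually, and compares it with what `MultinomialNB(alpha=1.0)` stores in `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Word counts for a 3-word vocabulary; one toy document per class (hypothetical data).
X = np.array([[3, 0, 1],    # class 0: the second word never appears
              [0, 2, 2]])   # class 1
y = np.array([0, 1])

alpha = 1.0
counts = X[0]

# Without smoothing the unseen word gets probability 0, which would zero out
# the whole product of likelihoods for any document containing it.
unsmoothed = counts / counts.sum()                                   # [0.75, 0.0, 0.25]

# Laplace smoothing: add alpha to every count, and alpha * vocabulary size to the total.
smoothed = (counts + alpha) / (counts.sum() + alpha * len(counts))   # [4/7, 1/7, 2/7]

nb = MultinomialNB(alpha=alpha).fit(X, y)
sklearn_probs = np.exp(nb.feature_log_prob_[0])

print("unsmoothed:", unsmoothed)
print("smoothed  :", smoothed)
print("sklearn   :", sklearn_probs)
```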

In [None]:
# Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy
# 1. Import necessary libraries
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# 2. Load the Iris dataset
iris = sns.load_dataset("iris")
X = iris.iloc[:, :-1].values  # features: sepal/petal measurements
y = iris['species'].values    # labels: setosa, versicolor, virginica

# 3. Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 4. Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 5. Initialize and train the SVM classifier
svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train, y_train)

# 6. Make predictions on the test set
y_pred = svm.predict(X_test)

# 7. Evaluation
print("Training accuracy: {:.3f}".format(accuracy_score(y_train, svm.predict(X_train))))
print("Test accuracy:     {:.3f}".format(accuracy_score(y_test, y_pred)))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris['species'].unique()))

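# 8. (Optional) Cross-validation for a more robust estimate
# Scaling is wrapped in a pipeline so it is re-fit on each training fold,
# matching the scaled setup used for the train/test evaluation above.
from sklearn.pipeline import make_pipeline
cv_model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
cv_scores = cross_val_score(cv_model, X, y, cv=5)
print("5-fold cross-validation accuracy: {:.3f} ± {:.3f}".format(cv_scores.mean(), cv_scores.std()))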


In [None]:
# 1. Imports
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 2. Load dataset
data = load_wine()
X, y = data.data, data.target

# 3. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 4. Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 5. Initialize SVM models
svm_lin = SVC(kernel='linear', C=1.0, random_state=42)
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)

# 6. Train
svm_lin.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

# 7. Predict
y_pred_lin = svm_lin.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# 8. Evaluate
acc_lin = accuracy_score(y_test, y_pred_lin)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Linear SVM Test Accuracy: {acc_lin:.4f}")
print(f"RBF SVM Test Accuracy:    {acc_rbf:.4f}\n")

print("Linear Kernel Metrics:")
print(confusion_matrix(y_test, y_pred_lin))
print(classification_report(y_test, y_pred_lin))

print("RBF Kernel Metrics:")
print(confusion_matrix(y_test, y_pred_rbf))
print(classification_report(y_test, y_pred_rbf))

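# 9. (Optional) Cross-validation comparison
# X is still unscaled here, so scale inside a pipeline; otherwise the RBF
# kernel is evaluated on raw features and the comparison is not like-for-like.
from sklearn.pipeline import make_pipeline
cv_lin = cross_val_score(make_pipeline(StandardScaler(), svm_lin), X, y, cv=5)
cv_rbf = cross_val_score(make_pipeline(StandardScaler(), svm_rbf), X, y, cv=5)
print(f"5-fold CV Accuracy: Linear={cv_lin.mean():.4f}±{cv_lin.std():.4f}, RBF={cv_rbf.mean():.4f}±{cv_rbf.std():.4f}")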


In [None]:
# 1. Imports
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# 2. Load data
X, y = fetch_california_housing(return_X_y=True)

# 3. Train–test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5. Initialize SVR model (RBF kernel by default)
svr = SVR(kernel='rbf', C=10.0, gamma='scale', epsilon=0.1)

# 6. Train model
svr.fit(X_train_scaled, y_train)

# 7. Predict on test data
y_pred = svr.predict(X_test_scaled)

# 8. Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (test set): {mse:.4f}")
print(f"R² score (test set): {r2:.4f}")

# 9. Optional: Hyperparameter tuning via GridSearchCV
param_grid = {
    'C': [1.0, 10.0, 100.0],
    'epsilon': [0.01, 0.1, 0.5],
    'kernel': ['rbf', 'linear'],
    'gamma': ['scale', 'auto']
}
grid = GridSearchCV(SVR(), param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid.fit(X_train_scaled, y_train)

best = grid.best_estimator_
y_pred_best = best.predict(X_test_scaled)
mse_best = mean_squared_error(y_test, y_pred_best)
r2_best = r2_score(y_test, y_pred_best)

print("\nBest parameters found:", grid.best_params_)
print(f"Tuned SVR Mean Squared Error: {mse_best:.4f}")
print(f"Tuned SVR R² score: {r2_best:.4f}")


Mean Squared Error (test set): 0.3237
R² score (test set): 0.7530


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.inspection import DecisionBoundaryDisplay

# 1️⃣ Load dataset & select 2D features
iris = datasets.load_iris()
X = iris.data[:, :2]       # use only the first two features for plotting
y = iris.target

# 2️⃣ Build an SVM pipeline with polynomial kernel
poly_svm = make_pipeline(
    StandardScaler(),
    svm.SVC(kernel='poly', degree=3, gamma='auto', coef0=1, C=1)
)
poly_svm.fit(X, y)

# 3️⃣ Plot decision boundary for polynomial SVM
disp = DecisionBoundaryDisplay.from_estimator(
    poly_svm, X, response_method="predict",
    grid_resolution=200,
    cmap=plt.cm.coolwarm, alpha=0.8
)

# 4️⃣ Overlay data points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("SVM with Polynomial Kernel (degree 3)")
plt.show()


In [None]:
# 1. Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

# 2. Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target  # target: 0=malignant, 1=benign

# 3. Split into train and test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Feature scaling (recommended but not strictly required)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 5. Initialize and train the Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# 6. Predict on the test set
y_pred = gnb.predict(X_test)

# 7. Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Test-set Accuracy: {accuracy:.4f}")

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names,
            yticklabels=data.target_names)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix - GaussianNB")
plt.show()

# Classification report (precision, recall, F1-score)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))


In [None]:
# 1. Import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# 2. Load a subset of categories (optional)
categories = ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']
newsgroups = fetch_20newsgroups(subset='all', categories=categories, remove=('headers','footers','quotes'))

# 3. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    newsgroups.data, newsgroups.target,
    test_size=0.2, random_state=42, stratify=newsgroups.target
)

# 4. Build a pipeline for feature extraction (count → TF-IDF) and model
from sklearn.pipeline import Pipeline
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB(alpha=1.0)),
])

# 5. Train the classifier
text_clf.fit(X_train, y_train)

# 6. Predict and evaluate
y_pred = text_clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test-set Accuracy: {acc:.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))

# 7. Optional: Plot confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=newsgroups.target_names,
            yticklabels=newsgroups.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix – MultinomialNB')
plt.tight_layout()
plt.show()



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Load the Iris dataset (only the first two features for easy visualization)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Define different C (regularization) values to compare
C_values = [0.1, 1, 10, 100]

# Generate meshgrid for plotting
h = 0.02
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# Plot setup
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

for idx, C in enumerate(C_values):
    clf = make_pipeline(StandardScaler(),
                        svm.SVC(kernel='linear', C=C))
    clf.fit(X, y)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    ax = axes[idx]
    ax.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.6)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    ax.set_title(f"C = {C}")
    ax.set_xlabel(iris.feature_names[0])
    ax.set_ylabel(iris.feature_names[1])
    ax.set_xticks(())
    ax.set_yticks(())

plt.tight_layout()
plt.show()


In [None]:
# 1. Imports
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 2. Create a synthetic binary dataset
X = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1]
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # binary target labels

# 3. Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 4. Initialize and train BernoulliNB
bnb = BernoulliNB(alpha=1.0, binarize=None)  # default smoothing
bnb.fit(X_train, y_train)

# 5. Predict and evaluate
y_pred = bnb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
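
# 6. Optional: hyperparameter tuning
# cv=3 because the training set holds only 3 samples per class;
# stratified 5-fold splitting would raise a ValueError here.
param_grid = {'alpha': [0.5, 1.0, 2.0], 'binarize': [0.0, 0.5, None]}
grid = GridSearchCV(BernoulliNB(), param_grid, cv=3, scoring='accuracy')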
grid.fit(X_train, y_train)
best = grid.best_estimator_

print("Best params:", grid.best_params_)
y_pred_best = best.predict(X_test)
print(f"Tuned Accuracy: {accuracy_score(y_test, y_pred_best):.4f}")


In [None]:
# 1. Imports
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 2. Load data & split
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Train & evaluate without scaling
svm_no_scaling = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_no_scaling.fit(X_train, y_train)
y_pred_no = svm_no_scaling.predict(X_test)
acc_no = accuracy_score(y_test, y_pred_no)

# 4. Apply scaling: fit scaler on training data only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5. Train & evaluate with scaling
svm_scaled = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)

# 6. Print results
print(f"Test accuracy without scaling: {acc_no:.4f}")
print(f"Test accuracy with scaling:    {acc_scaled:.4f}")


In [None]:
# 1. Imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.metrics import accuracy_score

# 2. Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
X = X[y != 2]  # Use only two classes for binary classification
y = y[y != 2]

# 3. Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 4. Train Gaussian Naïve Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)
acc_gnb = accuracy_score(y_test, y_pred_gnb)

# 5. Train Multinomial Naïve Bayes with Laplace smoothing
# Note: MultinomialNB is designed for non-negative count features.
# The Iris measurements are continuous but non-negative, so they can be
# passed in unchanged; treat this comparison as illustrative only.
X_train_mnb = X_train
X_test_mnb = X_test
mnb = MultinomialNB(alpha=1.0)  # Laplace smoothing with alpha=1
mnb.fit(X_train_mnb, y_train)
y_pred_mnb = mnb.predict(X_test_mnb)
acc_mnb = accuracy_score(y_test, y_pred_mnb)

# 6. Display results
print(f"Gaussian Naïve Bayes Accuracy: {acc_gnb:.4f}")
print(f"Multinomial Naïve Bayes (Laplace smoothed) Accuracy: {acc_mnb:.4f}")


In [None]:
# 1. Imports
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 2. Load Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 3. Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 4. Define parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.1, 1, 'scale', 'auto'],
    'kernel': ['linear', 'rbf', 'poly']
}

# 5. Initialize SVM classifier
svm = SVC()

# 6. Set up GridSearchCV
grid_search = GridSearchCV(
    estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2
)

# 7. Fit GridSearchCV
grid_search.fit(X_train, y_train)

# 8. Best hyperparameters and accuracy
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-validation Accuracy: {:.2f}%".format(grid_search.best_score_ * 100))

# 9. Evaluate on test set
best_svm = grid_search.best_estimator_
y_pred = best_svm.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}%".format(test_accuracy * 100))


In [None]:
# 1. Imports
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# 2. Load Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

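# 3. Create an imbalanced binary dataset: keep all 50 samples of class 0
#    but only the first 10 samples of class 1 (the minority class)
X, y = X[y != 2], y[y != 2]  # Keep only classes 0 and 1
is_minority = (y == 1)
keep = (y == 0) | (is_minority & (is_minority.cumsum() <= 10))
X, y = X[keep], y[keep]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)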

# 4. Train SVM without class weighting
svm_no_weight = SVC(kernel='linear', random_state=42)
svm_no_weight.fit(X_train, y_train)
y_pred_no_weight = svm_no_weight.predict(X_test)

# 5. Train SVM with class weighting
svm_with_weight = SVC(kernel='linear', class_weight='balanced', random_state=42)
svm_with_weight.fit(X_train, y_train)
y_pred_with_weight = svm_with_weight.predict(X_test)

# 6. Evaluate both models
print("SVM without class weighting:")
print("Accuracy:", accuracy_score(y_test, y_pred_no_weight))
print(classification_report(y_test, y_pred_no_weight))

print("\nSVM with class weighting:")
print("Accuracy:", accuracy_score(y_test, y_pred_with_weight))
print(classification_report(y_test, y_pred_with_weight))


In [None]:
import os
import re
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Define paths to the dataset
ham_dir = 'path_to_easy_ham'
spam_dir = 'path_to_spam'

# Function to read emails from a directory
def read_emails_from_dir(directory, label):
    emails = []
    for filename in os.listdir(directory):
        with open(os.path.join(directory, filename), 'r', encoding='latin-1') as file:
            emails.append(file.read())
    return pd.DataFrame({'text': emails, 'label': [label] * len(emails)})

# Load ham and spam emails
ham_emails = read_emails_from_dir(ham_dir, 'ham')
spam_emails = read_emails_from_dir(spam_dir, 'spam')

# Combine ham and spam emails into a single DataFrame
emails_df = pd.concat([ham_emails, spam_emails], ignore_index=True)

# Shuffle the dataset
emails_df = emails_df.sample(frac=1, random_state=42).reset_index(drop=True)

# Preprocess the text data
emails_df['text'] = emails_df['text'].apply(lambda x: re.sub(r'\W', ' ', x.lower()))

# Split the dataset into features and labels
X = emails_df['text']
y = emails_df['label']

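# Split the raw text into training and testing sets first
X_train_text, X_test_text, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert text to TF-IDF features; fit the vocabulary on the training set only
# so that no information from the test set leaks into the model
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)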

# Initialize and train the Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = nb_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(classification_report(y_test, y_pred))


In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Load dataset (replace with your dataset path)
df = pd.read_csv('path_to_your_dataset.csv')

# Preprocessing: Convert text to lowercase
df['text'] = df['text'].str.lower()

# Split dataset into features and target variable
X = df['text']
y = df['label']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# Initialize and train SVM classifier
svm_classifier = SVC(kernel='linear', random_state=42)
svm_classifier.fit(X_train_tfidf, y_train)

# Make predictions
svm_predictions = svm_classifier.predict(X_test_tfidf)

# Evaluate performance
svm_accuracy = accuracy_score(y_test, svm_predictions)
print("SVM Accuracy:", svm_accuracy)
print("Classification Report:")
print(classification_report(y_test, svm_predictions))
# Initialize and train Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_tfidf, y_train)

# Make predictions
nb_predictions = nb_classifier.predict(X_test_tfidf)

# Evaluate performance
nb_accuracy = accuracy_score(y_test, nb_predictions)
print("Naïve Bayes Accuracy:", nb_accuracy)
print("Classification Report:")
print(classification_report(y_test, nb_predictions))



In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

# Extract the raw documents and labels (TfidfVectorizer lowercases text by default)
X = newsgroups.data
y = newsgroups.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
# Select top 1000 features using chi-square test
selector = SelectKBest(chi2, k=1000)
X_train_selected = selector.fit_transform(X_train_tfidf, y_train)
X_test_selected = selector.transform(X_test_tfidf)
# Initialize and train Naïve Bayes classifier
nb_classifier_all = MultinomialNB()
nb_classifier_all.fit(X_train_tfidf, y_train)

# Make predictions
y_pred_all = nb_classifier_all.predict(X_test_tfidf)

# Evaluate performance
accuracy_all = accuracy_score(y_test, y_pred_all)
print("Naïve Bayes with All Features Accuracy:", accuracy_all)
print("Classification Report:")
print(classification_report(y_test, y_pred_all))
# Initialize and train Naïve Bayes classifier
nb_classifier_selected = MultinomialNB()
nb_classifier_selected.fit(X_train_selected, y_train)

# Make predictions
y_pred_selected = nb_classifier_selected.predict(X_test_selected)

# Evaluate performance
accuracy_selected = accuracy_score(y_test, y_pred_selected)
print("Naïve Bayes with Selected Features Accuracy:", accuracy_selected)
print("Classification Report:")
print(classification_report(y_test, y_pred_selected))


In [None]:
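# Imports needed by this cell
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load the Wine dataset
wine = load_wine()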
X = wine.data
y = wine.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train One-vs-Rest SVM classifier
svm_ovr = OneVsRestClassifier(SVC(kernel='linear', random_state=42))
svm_ovr.fit(X_train, y_train)

# Make predictions
y_pred_ovr = svm_ovr.predict(X_test)

# Evaluate performance
accuracy_ovr = accuracy_score(y_test, y_pred_ovr)
print("One-vs-Rest SVM Accuracy:", accuracy_ovr)
print("Classification Report:")
print(classification_report(y_test, y_pred_ovr))
# Initialize and train One-vs-One SVM classifier
svm_ovo = OneVsOneClassifier(SVC(kernel='linear', random_state=42))
svm_ovo.fit(X_train, y_train)

# Make predictions
y_pred_ovo = svm_ovo.predict(X_test)

# Evaluate performance
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)
print("One-vs-One SVM Accuracy:", accuracy_ovo)
print("Classification Report:")
print(classification_report(y_test, y_pred_ovo))
print(f"Accuracy with One-vs-Rest: {accuracy_ovr * 100:.2f}%")
print(f"Accuracy with One-vs-One: {accuracy_ovo * 100:.2f}%")



In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train SVM classifier with linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train_scaled, y_train)

# Make predictions
y_pred_linear = svm_linear.predict(X_test_scaled)

# Evaluate performance
accuracy_linear = accuracy_score(y_test, y_pred_linear)
print("Linear Kernel SVM Accuracy:", accuracy_linear)
print("Classification Report:")
print(classification_report(y_test, y_pred_linear))
# Initialize and train SVM classifier with polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, random_state=42)
svm_poly.fit(X_train_scaled, y_train)

# Make predictions
y_pred_poly = svm_poly.predict(X_test_scaled)

# Evaluate performance
accuracy_poly = accuracy_score(y_test, y_pred_poly)
print("Polynomial Kernel SVM Accuracy:", accuracy_poly)
print("Classification Report:")
print(classification_report(y_test, y_pred_poly))
# Initialize and train SVM classifier with RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train_scaled, y_train)

# Make predictions
y_pred_rbf = svm_rbf.predict(X_test_scaled)

# Evaluate performance
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
print("RBF Kernel SVM Accuracy:", accuracy_rbf)
print("Classification Report:")
print(classification_report(y_test, y_pred_rbf))


In [None]:
#Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the
#average accuracy
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Initialize Stratified K-Fold with 5 splits
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# List to store accuracy scores
accuracies = []

# Perform Stratified K-Fold Cross-Validation
for train_index, test_index in skf.split(X, y):
    # Fit the scaler on the training fold only, so that no statistics
    # from the held-out fold leak into training
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X[train_index])
    X_test = scaler.transform(X[test_index])
    y_train, y_test = y[train_index], y[test_index]

    # Initialize and train SVM classifier with RBF kernel
    svm = SVC(kernel='rbf', random_state=42)
    svm.fit(X_train, y_train)

    # Make predictions
    y_pred = svm.predict(X_test)

    # Compute accuracy
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)

# Compute average accuracy
average_accuracy = np.mean(accuracies)
print(f"Average Accuracy: {average_accuracy * 100:.2f}%")
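
The same evaluation can be written more compactly by wrapping the scaler and the classifier in a pipeline and passing it to `cross_val_score`, which refits the scaler on each training fold. This is a minimal sketch assuming the same dataset and fold settings as the cell above; the names `pipeline` and `scores` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits StandardScaler on every training fold
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy value per fold
scores = cross_val_score(pipeline, X, y, cv=skf, scoring='accuracy')
print(f"Fold accuracies: {np.round(scores, 4)}")
print(f"Average Accuracy: {scores.mean() * 100:.2f}%")
```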


In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
def evaluate_naive_bayes_with_priors(priors):
    # Initialize Gaussian Naïve Bayes classifier with custom priors
    model = GaussianNB(priors=priors)

    # Train the model
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test)

    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy
# Define different prior probabilities
priors_list = [
    [0.5, 0.5],  # Equal priors
    [0.3, 0.7],  # Skewed priors
    [0.7, 0.3]   # Skewed priors
]

# Evaluate performance for each set of priors
for priors in priors_list:
    accuracy = evaluate_naive_bayes_with_priors(priors)
    print(f"Priors: {priors} => Accuracy: {accuracy * 100:.2f}%")
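
To see what the `priors` argument actually changes, it can help to inspect the fitted `class_prior_` attribute. The sketch below, which fits on the full dataset purely for inspection, suggests that without explicit priors `GaussianNB` falls back to the empirical class frequencies, and with explicit priors it uses the supplied values as given.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Without explicit priors, the class frequencies in the training data are used
default_model = GaussianNB().fit(X, y)
empirical = np.bincount(y) / len(y)
print("Empirical class frequencies:", np.round(empirical, 3))
print("Default class_prior_:       ", np.round(default_model.class_prior_, 3))

# With explicit priors, the supplied values replace the empirical frequencies
custom_model = GaussianNB(priors=[0.7, 0.3]).fit(X, y)
print("Custom class_prior_:        ", custom_model.class_prior_)
```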


In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train SVM classifier with RBF kernel
svm_no_rfe = SVC(kernel='rbf', random_state=42)
svm_no_rfe.fit(X_train_scaled, y_train)

# Make predictions
y_pred_no_rfe = svm_no_rfe.predict(X_test_scaled)

# Evaluate performance
accuracy_no_rfe = accuracy_score(y_test, y_pred_no_rfe)
print(f"Accuracy without RFE: {accuracy_no_rfe * 100:.2f}%")
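# RFE ranks features using coef_ or feature_importances_, which an RBF-kernel
# SVC does not expose, so a linear-kernel SVC is used for the ranking step
svm = SVC(kernel='linear', random_state=42)

# Initialize RFE with the linear SVM estimator and select top 10 features
rfe = RFE(estimator=svm, n_features_to_select=10)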
rfe.fit(X_train_scaled, y_train)

# Select the features identified by RFE
X_train_rfe = rfe.transform(X_train_scaled)
X_test_rfe = rfe.transform(X_test_scaled)

# Train SVM classifier on selected features
svm_rfe = SVC(kernel='rbf', random_state=42)
svm_rfe.fit(X_train_rfe, y_train)

# Make predictions
y_pred_rfe = svm_rfe.predict(X_test_rfe)

# Evaluate performance
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)
print(f"Accuracy with RFE: {accuracy_rfe * 100:.2f}%")
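
It is often useful to check which features RFE kept. The sketch below is an illustration only: it scales the full dataset because nothing is being evaluated, and it uses a linear-kernel SVC as the ranking estimator, since RFE needs an estimator that exposes `coef_` or `feature_importances_`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
# Scaling the full dataset is acceptable here only because no score is reported
X_scaled = StandardScaler().fit_transform(data.data)

# A linear kernel exposes coef_, which RFE uses to rank features
rfe = RFE(estimator=SVC(kernel='linear'), n_features_to_select=10)
rfe.fit(X_scaled, data.target)

# support_ is a boolean mask over features; ranking_ is 1 for selected features
selected = data.feature_names[rfe.support_]
print("Selected features:")
for name in selected:
    print(" -", name)
```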


In [None]:


# Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and
# F1-Score instead of accuracy
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report
from sklearn.preprocessing import StandardScaler
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train SVM classifier with RBF kernel
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)

# Make predictions
y_pred = svm.predict(X_test_scaled)

# Evaluate performance (the positive class is label 1, benign)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")

# Per-class breakdown of the same metrics
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))


In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss
from sklearn.preprocessing import StandardScaler
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train Naïve Bayes classifier
nb = GaussianNB()
nb.fit(X_train_scaled, y_train)

# Predict probabilities
y_pred_proba = nb.predict_proba(X_test_scaled)

# Calculate Log Loss
loss = log_loss(y_test, y_pred_proba)
print(f"Log Loss: {loss:.4f}")
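
For intuition about what `log_loss` reports, the value can be reproduced by hand: it is the negative mean of the log-probability the model assigned to the true class. The sketch below uses a small made-up set of labels and probabilities (not output from the model above) and compares the manual result with scikit-learn's.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical labels and predicted class probabilities (columns: class 0, class 1)
y_true = np.array([1, 0, 1])
proba = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7]])

# Probability assigned to the true class of each sample
p_true = proba[np.arange(len(y_true)), y_true]

# Log loss = negative mean log-probability of the true class
manual_loss = -np.mean(np.log(p_true))
sklearn_loss = log_loss(y_true, proba)

print(f"Manual log loss:  {manual_loss:.4f}")
print(f"sklearn log loss: {sklearn_loss:.4f}")
```

Confident wrong predictions are penalized heavily because the log of a probability near zero is a large negative number.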


In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Initialize and train SVM classifier with RBF kernel
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)

# Make predictions
y_pred = svm.predict(X_test_scaled)

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix using Seaborn
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['Malignant', 'Benign'],
            yticklabels=['Malignant', 'Benign'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix for SVM Classifier')
plt.show()
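
The four cells of a binary confusion matrix are enough to recover precision, recall, and F1. The sketch below uses a small made-up pair of label vectors (not the SVM predictions above) and checks the hand-computed values against scikit-learn's metric functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true labels and predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision: {precision:.4f} (sklearn: {precision_score(y_true, y_hat):.4f})")
print(f"Recall:    {recall:.4f} (sklearn: {recall_score(y_true, y_hat):.4f})")
print(f"F1-Score:  {f1:.4f} (sklearn: {f1_score(y_true, y_hat):.4f})")
```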


In [None]:
import numpy as np
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

# Load a housing dataset (downloaded and cached on first use)
data = fetch_california_housing()
X, y = data.data, data.target

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 1. Without scaling
svr = SVR(kernel='rbf', C=1.0, gamma='scale')
svr.fit(X_train, y_train)
y_pred_no_scaling = svr.predict(X_test)
mae_no = mean_absolute_error(y_test, y_pred_no_scaling)
print(f"MAE without scaling: {mae_no:.3f}")

# 2. With feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svr_scaled = SVR(kernel='rbf', C=1.0, gamma='scale')
svr_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svr_scaled.predict(X_test_scaled)
mae_scaled = mean_absolute_error(y_test, y_pred_scaled)
print(f"MAE with scaling:    {mae_scaled:.3f}")


In [None]:
# Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC
# score
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_curve, roc_auc_score, RocCurveDisplay
from sklearn.preprocessing import StandardScaler

# 1. Load dataset
data = load_breast_cancer()
X, y = data.data, data.target  # binary target (0/1)

# 2. Split train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Scale features
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Train Gaussian Naïve Bayes
gnb = GaussianNB()
gnb.fit(X_train_scaled, y_train)

# 5. Predict probabilities for ROC-AUC
y_probs = gnb.predict_proba(X_test_scaled)[:, 1]

# 6. Compute ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc_score = roc_auc_score(y_test, y_probs)

print(f"ROC AUC Score: {auc_score:.4f}")

# 7. Plot ROC curve
RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc_score, estimator_name="GaussianNB").plot()
plt.title("ROC Curve for Gaussian Naïve Bayes")
plt.show()
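
One way to read the ROC-AUC score is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below checks that interpretation on a small made-up set of labels and scores (not the model output above).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Compare every positive score with every negative score
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

sklearn_auc = roc_auc_score(y_true, y_score)
print(f"Pairwise ranking AUC: {pairwise_auc:.4f}")
print(f"roc_auc_score:        {sklearn_auc:.4f}")
```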


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import average_precision_score, precision_recall_curve, PrecisionRecallDisplay

# Load binary classification dataset
iris = datasets.load_iris()
X = iris.data[iris.target < 2]
y = iris.target[iris.target < 2]

# Split into train & test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y
)

# Build pipeline: scaling + SVM with probability estimates enabled
svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel='linear', probability=True, random_state=42)
)

# Train classifier
svm_clf.fit(X_train, y_train)

# Compute decision scores or probabilities
y_scores = svm_clf.predict_proba(X_test)[:, 1]

# Compute average precision (area under PR curve)
avg_prec = average_precision_score(y_test, y_scores)

print(f"Average precision‑recall score = {avg_prec:.2f}")

# Compute precision and recall for all thresholds
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)

# Plot manually (optional)
plt.figure(figsize=(6,4))
plt.step(recall, precision, where="post", color="b", alpha=0.5)
plt.fill_between(recall, precision, step="post", alpha=0.2, color="b")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title(f"Precision‑Recall curve: AP={avg_prec:.2f}")
plt.show()

# Or use built‑in display API
disp = PrecisionRecallDisplay(precision=precision, recall=recall, average_precision=avg_prec)
disp.plot()
plt.title("Precision‑Recall curve (via PrecisionRecallDisplay)")
plt.show()
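
The average precision reported above is a weighted sum of the precision values along the curve, where each weight is the increase in recall from the previous threshold. The sketch below reproduces it from the output of `precision_recall_curve` on a small made-up set of labels and scores (not the SVM output above).

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

precision, recall, _ = precision_recall_curve(y_true, y_scores)

# recall is returned in decreasing order, so np.diff(recall) is non-positive;
# the leading minus sign turns the sum into sum_n (R_n - R_{n-1}) * P_n
manual_ap = -np.sum(np.diff(recall) * precision[:-1])
sklearn_ap = average_precision_score(y_true, y_scores)

print(f"Manual average precision:  {manual_ap:.4f}")
print(f"average_precision_score:   {sklearn_ap:.4f}")
```

This step-wise sum is not the same as applying the trapezoidal rule to the curve, which interpolates linearly between points and can give a more optimistic value.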
