**Theoretical**

In [None]:
### Q.1) What is a Support Vector Machine (SVM)?

In [None]:
ans) A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks, though it is more commonly used for classification. It works by finding the hyperplane that separates the classes with the largest possible margin, i.e. the greatest distance to the nearest training points of each class.

In [None]:
### Q.2) What is the difference between Hard Margin and Soft Margin SVM?

In [None]:
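ans) 1) Hard Margin SVM: Requires every training point to lie on the correct side of the margin, with no misclassifications or margin violations. It therefore only works when the data are linearly separable, and it is very sensitive to outliers.
2) Soft Margin SVM: Introduces slack variables that let some points violate the margin or be misclassified, with the total penalty controlled by C. This makes it robust to noise and usable when the classes overlap.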
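scikit-learn's `SVC` has no dedicated hard-margin mode, but a very large `C` approximates one. The sketch below assumes two synthetic, well-separated Gaussian blobs, so a hard margin exists. For a near-hard margin and for a soft margin, it counts margin violations, i.e. training points with y·f(x) < 1. The exact numbers depend on the made-up data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters, so the data are linearly separable
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)
y_signed = np.where(y == 1, 1, -1)  # labels as -1/+1 for the margin check

results = {}
for name, C in [("approx. hard margin", 1e6), ("soft margin", 0.01)]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Functional margin y_i * f(x_i); values below 1 are margin violations
    functional_margin = y_signed * clf.decision_function(X)
    violations = int(np.sum(functional_margin < 1 - 1e-2))
    results[name] = (clf.score(X, y), violations)
    print(f"{name:>20}: train accuracy={results[name][0]:.2f}, "
          f"margin violations={violations}")
```

With a small `C` the optimizer is allowed to trade violations for a wider margin, which is exactly the soft-margin behaviour described above.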

In [None]:
### Q.3)  What is the mathematical intuition behind SVM?

In [None]:
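ans) SVM looks for the hyperplane w·x + b = 0 that separates two classes with the largest margin. Because the margin width equals 2/||w||, maximizing it is the same as minimizing ½||w||² subject to y_i(w·x_i + b) ≥ 1 for every training point (x_i, y_i), where y_i ∈ {-1, +1}. The soft-margin version adds a penalty C·Σξ_i on slack variables ξ_i. This constrained problem is solved through its dual form, built with Lagrange multipliers. Kernel functions replace the dot products in that dual, which is what lets SVM handle non-linearly separable data.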
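The sketch below reads w and b from a fitted linear `SVC` through `coef_` and `intercept_`. It computes the margin width 2/||w|| and checks that the support vectors lie on the lines w·x + b = ±1. It assumes a separable toy dataset and uses a very large `C` to approximate the hard-margin problem.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Separable toy data: two well-separated clusters (illustrative assumption)
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=1)

# A very large C approximates the hard-margin problem
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)

# For a hard margin, support vectors satisfy |w.x + b| = 1
sv_scores = clf.support_vectors_ @ w + b
print("w =", w, "b =", b)
print("margin width:", round(float(margin_width), 3))
print("|w.x + b| at the support vectors:", np.round(np.abs(sv_scores), 3))
```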

In [None]:
### Q.4) What is the role of Lagrange Multipliers in SVM?

In [None]:
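ans) Lagrange multipliers (one α_i ≥ 0 per training constraint) turn SVM's constrained primal problem into its dual. The dual is a quadratic program in the α_i with simpler constraints: 0 ≤ α_i ≤ C and Σ α_i y_i = 0. It is still constrained, but it depends on the data only through dot products x_i·x_j, which is what makes the kernel trick possible. The training points with α_i > 0 are exactly the support vectors.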
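The dual variables can be inspected in scikit-learn: for a binary `SVC`, `dual_coef_` holds y_i·α_i for each support vector. The sketch below uses toy data chosen only for illustration. It checks the two dual constraints and, because the kernel is linear, rebuilds w = Σ α_i y_i x_i from the multipliers.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class data (an illustrative assumption)
X, y = make_blobs(n_samples=80, centers=2, cluster_std=2.0, random_state=0)
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# For a binary SVC, dual_coef_ holds y_i * alpha_i for each support vector
y_alpha = clf.dual_coef_[0]
alpha = np.abs(y_alpha)

# Dual constraints: sum_i alpha_i * y_i = 0 and 0 <= alpha_i <= C
print("sum of y_i * alpha_i:", float(y_alpha.sum()))
print("all alpha_i within [0, C]:", bool(np.all(alpha <= C + 1e-8)))

# With a linear kernel, w = sum_i alpha_i * y_i * x_i over the support vectors
w_from_dual = y_alpha @ clf.support_vectors_
print("w rebuilt from the dual matches coef_:", np.allclose(w_from_dual, clf.coef_[0]))
```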



In [None]:
### Q.5) What are Support Vectors in SVM?

In [None]:
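ans) Support vectors are the training points that lie on the margin boundaries or, in a soft-margin SVM, inside the margin or on the wrong side of it. These are the points with non-zero Lagrange multipliers. Only they determine the position of the optimal hyperplane; removing any other training point leaves the solution unchanged.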
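The sketch below illustrates that only the support vectors matter, using synthetic blobs assumed for illustration. It fits a linear `SVC`, refits it on the support vectors alone (selected through the `support_` indices), and compares the two weight vectors. A tight `tol` is used so that the two solver results are numerically comparable.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=42)

# Tight tolerance so the two fits can be compared numerically
full = SVC(kernel="linear", C=1.0, tol=1e-8).fit(X, y)
print("support vectors per class:", full.n_support_)

# Refit using ONLY the support vectors; every other point has alpha = 0,
# so dropping it should leave the hyperplane's normal vector unchanged
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1.0, tol=1e-8).fit(X[sv_idx], y[sv_idx])

print("w (all data):       ", full.coef_[0])
print("w (support vectors):", reduced.coef_[0])
```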

In [None]:
### Q.6) What is a Support Vector Classifier (SVC)?

In [None]:
ans) SVC is the classification variant of SVM that assigns data points to predefined categories based on learned decision boundaries.

In [None]:
### Q.7) What is a Support Vector Regressor (SVR)?

In [None]:
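ans) SVR is the regression variant of SVM. It fits a function that is as flat as possible while keeping the predictions within a tolerance ε of the true targets. Errors smaller than ε are ignored (the "ε-tube"), and only points falling outside the tube are penalized.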
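The tolerance corresponds to the ε-insensitive loss max(0, |y − f(x)| − ε). Residuals inside the tube cost nothing, and larger ones are penalized linearly. Below is a minimal NumPy sketch of that loss. The function name is hypothetical; in scikit-learn the tube width is the `epsilon` parameter of `SVR`.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Hypothetical helper: SVR-style loss, zero for residuals inside the epsilon-tube."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.00, 2.00, 3.00])
y_pred = np.array([1.05, 2.50, 2.00])
# Residuals 0.05, 0.5 and 1.0: the first lies inside the tube and costs nothing
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
```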

In [None]:
### Q.8) What is the Kernel Trick in SVM?

In [None]:
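ans) The kernel trick replaces every dot product in the SVM's dual problem with a kernel function K(x, z) = φ(x)·φ(z). This lets the SVM find a linear separator in a higher-dimensional (even infinite-dimensional) feature space, where the data may be linearly separable, without ever computing the mapping φ explicitly.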
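A small numeric check of the idea, assuming the homogeneous degree-2 polynomial kernel K(x, z) = (x·z)² on 2-D inputs. Its explicit feature map is φ(x) = (x1², √2·x1·x2, x2²). The helper names `phi` and `poly_kernel` are hypothetical.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
explicit = np.dot(phi(x), phi(z))   # dot product in the 3-D feature space
implicit = poly_kernel(x, z)        # same value without ever building phi
print(explicit, implicit)
```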

In [None]:
### Q.9) Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.

In [None]:
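ans) 1) Linear Kernel, K(x, z) = x·z: Fast. Works well when the data are (close to) linearly separable or have very many features, e.g. text.
2) Polynomial Kernel, K(x, z) = (γ·x·z + r)^d: Captures feature interactions up to degree d, producing curved boundaries. High degrees can overfit.
3) RBF Kernel, K(x, z) = exp(-γ·||x - z||²): Corresponds to an infinite-dimensional feature space. A flexible default for non-linearly separable data, but sensitive to the choice of γ and C.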
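These formulas match `linear_kernel`, `polynomial_kernel` and `rbf_kernel` in scikit-learn's `sklearn.metrics.pairwise`. Note that `SVC` itself defaults to `coef0=0` for the polynomial kernel, while `polynomial_kernel` defaults to `coef0=1`. The sketch below compares the library functions against hand-written versions on two tiny, made-up matrices.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[1.0, 2.0], [0.0, 1.0]])
Z = np.array([[2.0, 0.0]])
gamma, coef0, degree = 0.5, 1.0, 3

K_lin = linear_kernel(X, Z)                                                # x.z
K_poly = polynomial_kernel(X, Z, degree=degree, gamma=gamma, coef0=coef0)  # (gamma x.z + r)^d
K_rbf = rbf_kernel(X, Z, gamma=gamma)                                      # exp(-gamma ||x - z||^2)

# The same quantities written out by hand
dots = X @ Z.T
sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
print("linear matches:", np.allclose(K_lin, dots))
print("poly matches:  ", np.allclose(K_poly, (gamma * dots + coef0) ** degree))
print("rbf matches:   ", np.allclose(K_rbf, np.exp(-gamma * sq_dists)))
```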

In [None]:
### Q.10) What is the effect of the C parameter in SVM?

In [None]:
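ans) The C parameter controls the trade-off between a wide margin and few training errors. A high C penalizes margin violations heavily, giving a narrower margin and fewer training misclassifications, at the risk of overfitting. A low C tolerates more violations in exchange for a wider margin and stronger regularization, at the risk of underfitting.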
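The sketch below shows the trade-off on overlapping synthetic data, assumed for illustration. As `C` grows, the margin width 2/||w|| shrinks, and typically fewer points end up as support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes, so the effect of C is visible
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=0)

margins, n_svs = [], []
for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins.append(2 / np.linalg.norm(clf.coef_[0]))  # margin width 2/||w||
    n_svs.append(int(clf.n_support_.sum()))            # total support vectors
    print(f"C={C:>6}: margin width={margins[-1]:.3f}, "
          f"support vectors={n_svs[-1]}, train accuracy={clf.score(X, y):.3f}")
```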

In [None]:
### Q.11) What is the role of the Gamma parameter in RBF Kernel SVM?

In [None]:
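ans) The Gamma parameter defines how far the influence of a single training example reaches, since K(x, z) = exp(-γ·||x - z||²). A high gamma makes each point's influence very local, producing complex, wiggly boundaries that can overfit. A low gamma gives smoother boundaries that can underfit.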
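The effect is visible in the kernel values themselves. The sketch below uses scikit-learn's `rbf_kernel` to print the similarity between a point and neighbours at distances 0.5, 1 and 2, for a few arbitrarily chosen gamma values.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
neighbours = np.array([[0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])  # distances 0.5, 1, 2

for gamma in [0.1, 1.0, 10.0]:
    similarity = rbf_kernel(x, neighbours, gamma=gamma)[0]
    # Larger gamma -> similarity decays faster -> each point's influence is more local
    print(f"gamma={gamma:>4}: similarity at distance 0.5, 1, 2 -> {np.round(similarity, 4)}")
```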

In [None]:
### Q.12) What is the Naïve Bayes classifier, and why is it called "Naïve"?

In [None]:
ans) Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It is called "naïve" because it assumes that features are conditionally independent, which is often not true in real-world data.

In [None]:
### Q.13) What is Bayes' Theorem?

In [None]:
ans) P(A|B) = [P(B|A) × P(A)] / P(B)

Where:
- P(A|B) is the posterior probability: the probability of A given that B occurred
- P(B|A) is the likelihood: the probability of B given that A is true
- P(A) is the prior probability of A
- P(B) is the evidence, i.e. the overall (marginal) probability of B occurring
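A worked example with hypothetical numbers shows why the prior matters: a disease with 1% prevalence, and a test with 99% sensitivity and a 5% false-positive rate.

```python
# Hypothetical numbers for a diagnostic test (illustrative only)
p_disease = 0.01              # prior P(D)
p_pos_given_disease = 0.99    # likelihood P(+|D), the sensitivity
p_pos_given_healthy = 0.05    # false-positive rate P(+|not D)

# Evidence P(+) by the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(D|+) = P(+|D) * P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.4f}")
```

Even after a positive result, the posterior is only about 17%, because healthy people vastly outnumber sick ones.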

In [None]:
### Q.14) Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

In [None]:
ans) 1) Gaussian Naïve Bayes: Used for continuous features. Assumes each feature is normally distributed within each class.
2) Multinomial Naïve Bayes: Used for discrete count features, such as word counts in text classification (TF-IDF weights also work in practice).
3) Bernoulli Naïve Bayes: Used for binary features (0 or 1), such as the presence or absence of a word. Unlike Multinomial NB, it explicitly models the absence of a feature.
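The sketch below matches each variant to the kind of input it expects. The three small datasets are synthetic and invented for illustration: continuous values, Poisson-distributed "word counts", and the presence/absence of those words.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)  # 50 samples per class

# Continuous features: class 1 is shifted by 2 units -> GaussianNB
X_cont = rng.normal(loc=2.0 * y[:, None], scale=1.0, size=(100, 3))

# Word-count-like features: class 1 favours word 0, class 0 favours word 2 -> MultinomialNB
rates = np.where(y[:, None] == 1, [4.0, 1.0, 0.1], [0.1, 1.0, 4.0])
X_counts = rng.poisson(lam=rates)

# Presence/absence of each word -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

scores = {
    "GaussianNB": GaussianNB().fit(X_cont, y).score(X_cont, y),
    "MultinomialNB": MultinomialNB().fit(X_counts, y).score(X_counts, y),
    "BernoulliNB": BernoulliNB().fit(X_binary, y).score(X_binary, y),
}
for name, acc in scores.items():
    print(f"{name}: training accuracy {acc:.2f}")
```

Training accuracy is used only to keep the example short; a real comparison should use held-out data.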

In [None]:
### Q.15) When should you use Gaussian Naïve Bayes over other variants?

In [None]:
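ans) When the features are continuous, real-valued measurements (such as lengths or sensor readings) that are roughly normally distributed within each class. For count data, Multinomial NB is usually a better fit; for binary features, Bernoulli NB is.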



In [None]:
### Q.16) What are the key assumptions made by Naïve Bayes?

In [None]:
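ans) 1) Features are conditionally independent given the class label.
2) Each feature contributes independently to the prediction: the per-feature log-likelihoods are simply added, with no interaction terms.
3) Within each class, every feature follows the distribution assumed by the variant (Gaussian, multinomial or Bernoulli).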

In [None]:
### Q.17) What are the advantages and disadvantages of Naïve Bayes?

In [None]:
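ans) Advantages: Simple, fast to train and predict, needs relatively little training data, and scales well to high-dimensional data.
Disadvantages: The independence assumption rarely holds, so correlated features are effectively counted more than once. Predicted probabilities are often poorly calibrated. Feature values never seen with a class get zero probability unless smoothing is used.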

In [None]:
### Q.18) Why is Naïve Bayes a good choice for text classification?

In [None]:
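ans) Text produces very high-dimensional, sparse features: a vocabulary of thousands of words, most of them absent from any given document. Naïve Bayes handles this efficiently because training amounts to counting words per class, and it needs relatively little labelled data. Even though word independence is unrealistic, the resulting ranking of classes is often still accurate.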

In [None]:
### Q.19) Compare SVM and Naïve Bayes for classification tasks.

In [None]:
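ans) 1) SVM: Often more accurate, especially with a well-chosen kernel, but more expensive to train. Kernel SVM training time grows at least quadratically with the number of samples, so it is best suited to small and medium-sized datasets. It also needs feature scaling and hyperparameter tuning.
2) Naïve Bayes: Much faster to train and predict, and works well on large-scale, high-dimensional text data. However, its feature-independence assumption can limit accuracy, and its probability estimates are often poorly calibrated.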

In [None]:
### Q.20) How does Laplace Smoothing help in Naïve Bayes?

In [None]:
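ans) It prevents zero probabilities for feature values (e.g. words) that never appeared with a class during training. It does this by adding a constant α to every count (α = 1 is classic Laplace smoothing):

P(w|c) = (count(w, c) + α) / (total count in c + α·|V|), where |V| is the vocabulary size.

Without smoothing, a single unseen word would make the whole class probability zero.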
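The formula can be checked against scikit-learn's `MultinomialNB`, whose `feature_log_prob_` stores log P(w|c). The word-count matrix below is made up for illustration. Word 2 never appears in class 0, yet it still gets probability 1/9.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix: 4 documents x 3 vocabulary words (hypothetical data)
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 4],
              [0, 2, 3]])
y = np.array([0, 0, 1, 1])
alpha = 1.0  # alpha = 1 is classic Laplace (add-one) smoothing

nb = MultinomialNB(alpha=alpha).fit(X, y)

# Manual estimate: P(w | c) = (count(w, c) + alpha) / (total count in c + alpha * |V|)
counts = np.array([X[y == c].sum(axis=0) for c in (0, 1)])
manual = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])

print(np.round(manual, 3))
print("matches feature_log_prob_:", np.allclose(np.log(manual), nb.feature_log_prob_))
```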



**Practical**

In [None]:
### Q.21) Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.

In [None]:
# ans)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the SVM Classifier
svm_classifier = SVC(kernel='rbf', random_state=42)
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred = svm_classifier.predict(X_test)

# Calculate and print accuracy
accuracy = svm_classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


In [None]:
### Q.22)  Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.


In [None]:
# ans)
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Wine dataset
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=42)

# Wine features span very different ranges, so standardize them inside each
# pipeline (the RBF kernel in particular is driven by Euclidean distances)

# Train SVM with Linear kernel
linear_svm = make_pipeline(StandardScaler(), svm.SVC(kernel='linear'))
linear_svm.fit(X_train, y_train)

# Train SVM with RBF kernel
rbf_svm = make_pipeline(StandardScaler(), svm.SVC(kernel='rbf'))
rbf_svm.fit(X_train, y_train)

# Evaluate and compare accuracies
print("Linear Kernel Accuracy:", linear_svm.score(X_test, y_test))
print("RBF Kernel Accuracy:", rbf_svm.score(X_test, y_test))


In [None]:
### Q.23) Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE).

In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load the California housing dataset
housing = datasets.fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the SVM Regressor with a radial basis function kernel
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)

# Make predictions
y_pred = svr.predict(X_test)

# Evaluate using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')


In [None]:
### Q.24) Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.


In [None]:
# ans)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Generate sample data
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create and train the SVM with polynomial kernel
svm = SVC(kernel='poly', degree=3, C=1.0, random_state=42)
svm.fit(X_scaled, y)

# Create a mesh grid to plot the decision boundary
x_min, x_max = X_scaled[:, 0].min() - 0.5, X_scaled[:, 0].max() + 0.5
y_min, y_max = X_scaled[:, 1].min() - 0.5, X_scaled[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Make predictions on the mesh grid
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundary and data points
plt.figure(figsize=(10, 8))
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_scaled[y == 0, 0], X_scaled[y == 0, 1], color='blue', label='Class 0')
plt.scatter(X_scaled[y == 1, 0], X_scaled[y == 1, 1], color='red', label='Class 1')

# Plot support vectors
plt.scatter(svm.support_vectors_[:, 0], svm.support_vectors_[:, 1],
           s=100, linewidth=1, facecolors='none', edgecolors='black',
           label='Support Vectors')

plt.title('SVM with Polynomial Kernel Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid(True)

# Calculate and print accuracy
accuracy = svm.score(X_scaled, y)
print(f"Model Accuracy: {accuracy:.2f}")

# Show number of support vectors
print(f"Number of support vectors: {len(svm.support_vectors_)}")
plt.show()


In [None]:
### Q.25) Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the Gaussian Naïve Bayes classifier
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)

# Make predictions
y_pred = nb_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


In [None]:
### Q.26) Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

In [None]:
# ans)
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import Pipeline
import numpy as np

# Load the 20 newsgroups dataset
# Using a subset of categories for simplicity
categories = ['alt.atheism', 'soc.religion.christian',
              'comp.graphics', 'sci.med']
newsgroups_train = fetch_20newsgroups(subset='train',
                                     categories=categories,
                                     remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test',
                                    categories=categories,
                                    remove=('headers', 'footers', 'quotes'))

# Create a pipeline with TF-IDF vectorizer and Multinomial NB classifier
text_clf = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english',
                             max_features=5000,
                             ngram_range=(1, 2))),
    ('clf', MultinomialNB(alpha=1.0))
])

# Train the classifier
text_clf.fit(newsgroups_train.data, newsgroups_train.target)

# Make predictions
predictions = text_clf.predict(newsgroups_test.data)

# Calculate accuracy
accuracy = text_clf.score(newsgroups_test.data, newsgroups_test.target)

# Print results
print("Text Classification Results")
print("--------------------------")
print(f"Model Accuracy: {accuracy:.4f}")

print("\nClassification Report:")
# fetch_20newsgroups orders target_names alphabetically, which differs from the
# order of `categories` above, so always use the dataset's own target_names
print(classification_report(newsgroups_test.target, predictions,
                          target_names=newsgroups_train.target_names))

# Get feature names and their importance
tfidf = text_clf.named_steps['tfidf']
nb = text_clf.named_steps['clf']

# Get top features for each category
def print_top_features(classifier, vectorizer, class_labels, n=5):
    feature_names = np.array(vectorizer.get_feature_names_out())
    for i, category in enumerate(class_labels):
        top_features_idx = np.argsort(classifier.feature_log_prob_[i])[-n:]
        top_features = feature_names[top_features_idx]
        print(f"\nTop {n} features for {category}:")
        for feature in reversed(top_features):
            print(f"- {feature}")

print("\nMost Informative Features per Category:")
print_top_features(nb, tfidf, newsgroups_train.target_names)

# Example of prediction with new text
def predict_category(text):
    prediction = text_clf.predict([text])
    probability = text_clf.predict_proba([text])[0]
    return newsgroups_train.target_names[prediction[0]], probability[prediction[0]]

# Test with a sample text
sample_text = "The patient was prescribed antibiotics for the infection"
category, confidence = predict_category(sample_text)
print(f"\nSample Text Classification:")
print(f"Text: {sample_text}")
print(f"Predicted Category: {category}")
print(f"Confidence: {confidence:.4f}")

In [None]:
### Q.27) Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually.

In [None]:
# ans)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Different values of C to compare
C_values = [0.1, 1, 10, 100]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for i, C in enumerate(C_values):
    svm_clf = SVC(kernel='linear', C=C, random_state=42)
    svm_clf.fit(X_train, y_train)

    # Create a mesh grid in the standardized feature space the model was trained on
    x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
    y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))

    # Predict for each point in the mesh grid
    Z = svm_clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot the decision boundary and the (standardized) training points
    axes[i].contourf(xx, yy, Z, alpha=0.3)
    axes[i].scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolor='k', cmap=plt.cm.jet)
    axes[i].set_xlabel('Feature 1')
    axes[i].set_ylabel('Feature 2')
    axes[i].set_title(f'SVM with C={C}')

plt.tight_layout()
plt.show()



In [None]:
### Q.28)  Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.

In [None]:
# ans)
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Generate a sample dataset with binary features
np.random.seed(0)
X = np.random.randint(2, size=(100, 10))  # 100 samples, 10 binary features
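# Labels are random and unrelated to X, so expect roughly chance-level (~50%)
# accuracy; substitute a real binary-feature dataset for a meaningful result
y = np.random.randint(2, size=(100,))  # Binary target variable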

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Bernoulli Naive Bayes classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bnb.predict(X_test)

# Evaluate the classifier's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))



In [None]:
### Q.29) Apply feature scaling before training an SVM model and compare results with unscaled data.

In [None]:
# ans)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for visualization
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM without feature scaling
svm_clf_unscaled = SVC(kernel='linear', C=1.0, random_state=42)
svm_clf_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_clf_unscaled.predict(X_test)
unscaled_accuracy = accuracy_score(y_test, y_pred_unscaled)

# Apply feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM with feature scaling
svm_clf_scaled = SVC(kernel='linear', C=1.0, random_state=42)
svm_clf_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_clf_scaled.predict(X_test_scaled)
scaled_accuracy = accuracy_score(y_test, y_pred_scaled)

# Print accuracy comparison
print(f'Accuracy without scaling: {unscaled_accuracy:.2f}')
print(f'Accuracy with scaling: {scaled_accuracy:.2f}')

# Plot decision boundaries
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for i, (clf, title, X_train_plot, X_test_plot) in enumerate([
    (svm_clf_unscaled, 'Without Scaling', X_train, X_test),
    (svm_clf_scaled, 'With Scaling', X_train_scaled, X_test_scaled)
]):
    x_min, x_max = X_train_plot[:, 0].min() - 1, X_train_plot[:, 0].max() + 1
    y_min, y_max = X_train_plot[:, 1].min() - 1, X_train_plot[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    axes[i].contourf(xx, yy, Z, alpha=0.3)
    axes[i].scatter(X_train_plot[:, 0], X_train_plot[:, 1], c=y_train, edgecolor='k', cmap=plt.cm.jet)
    axes[i].set_xlabel('Feature 1')
    axes[i].set_ylabel('Feature 2')
    axes[i].set_title(title)

plt.tight_layout()
plt.show()



In [None]:
### Q.30) Train a Gaussian Naïve Bayes model and compare predictions before and after Laplace Smoothing.



In [None]:
# ans)
# Note: Laplace (additive) smoothing belongs to count-based Naïve Bayes models
# (MultinomialNB, BernoulliNB, CategoricalNB) and is set with `alpha`.
# GaussianNB has no Laplace smoothing; its closest analogue is `var_smoothing`,
# which adds a fraction of the largest feature variance to every class variance.
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, classification_report

# Generate a sample dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=3, n_repeated=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes with (almost) no variance smoothing vs. stronger smoothing
for var_smoothing in [1e-12, 1e-9, 1e-2, 1.0]:
    gnb = GaussianNB(var_smoothing=var_smoothing)
    gnb.fit(X_train, y_train)
    acc = accuracy_score(y_test, gnb.predict(X_test))
    print(f"GaussianNB (var_smoothing={var_smoothing:g}) accuracy: {acc:.3f}")

# Laplace smoothing proper: MultinomialNB needs non-negative features, so rescale
# to [0, 1] (fit on training data only; clip test values outside the training range)
scaler = MinMaxScaler(clip=True)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Essentially no smoothing (alpha close to 0)
mnb_raw = MultinomialNB(alpha=1e-10)
mnb_raw.fit(X_train_scaled, y_train)
y_pred_raw = mnb_raw.predict(X_test_scaled)
print("\nAccuracy without Laplace Smoothing:", accuracy_score(y_test, y_pred_raw))
print("Classification Report without Laplace Smoothing:")
print(classification_report(y_test, y_pred_raw))

# Laplace (add-one) smoothing
mnb_smoothed = MultinomialNB(alpha=1.0)
mnb_smoothed.fit(X_train_scaled, y_train)
y_pred_smoothed = mnb_smoothed.predict(X_test_scaled)
print("Accuracy with Laplace Smoothing:", accuracy_score(y_test, y_pred_smoothed))
print("Classification Report with Laplace Smoothing:")
print(classification_report(y_test, y_pred_smoothed))

# Fraction of test predictions that changed after smoothing
print("Predictions changed by smoothing:", (y_pred_raw != y_pred_smoothed).mean())



In [None]:
### Q.31) Train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel).



In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter tuning space
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto'],
    'kernel': ['linear', 'rbf', 'poly']
}

# Initialize the SVM classifier
svm_classifier = svm.SVC()

# Perform grid search with cross-validation
grid_search = GridSearchCV(estimator=svm_classifier, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best-performing model and its hyperparameters
best_model = grid_search.best_estimator_
best_hyperparameters = grid_search.best_params_

# Evaluate the best model on the test set
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Best Hyperparameters:", best_hyperparameters)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))


In [None]:
### Q.32)   Train an SVM Classifier on an imbalanced dataset and apply class weighting.

In [None]:
# ans)
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Load the iris dataset (we'll make it imbalanced)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Make the dataset imbalanced by removing some samples from one class
np.random.seed(0)
indices_to_remove = np.random.choice(np.where(y == 0)[0], size=40, replace=False)
y = np.delete(y, indices_to_remove)
X = np.delete(X, indices_to_remove, axis=0)

# Split the dataset into training and testing sets
# (stratify keeps the small class represented in both splits)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Compute class weights (inversely proportional to class frequencies)
classes = np.unique(y_train)
class_weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)

# Map each class label to its weight (zip stays correct even if labels are not 0..k-1)
class_weight_dict = dict(zip(classes, class_weights))

# Train an SVM classifier with class weighting
svm_classifier = svm.SVC(class_weight=class_weight_dict, kernel='rbf', C=1)
svm_classifier.fit(X_train, y_train)

# Evaluate the classifier on the test set
y_pred = svm_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='macro')
print("Accuracy:", accuracy)
print("F1-score (macro):", f1)
print("Classification Report:")
print(classification_report(y_test, y_pred))



In [None]:
### Q.33) Implement a Naïve Bayes classifier for spam detection using email data.

In [None]:
# ans)
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

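# Load the email dataset (assumes a local CSV file with an 'email' text column
# and a 'label' column such as 'spam'/'ham'; adjust the names to your data)
data = pd.read_csv('email_data.csv')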

# Split the data into training and testing sets
X = data['email']
y = data['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a bag-of-words representation of the email data
vectorizer = CountVectorizer(stop_words='english')
X_train_count = vectorizer.fit_transform(X_train)
X_test_count = vectorizer.transform(X_test)

# Train a Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_count, y_train)

# Make predictions on the test set
y_pred = nb_classifier.predict(X_test_count)

# Evaluate the classifier's performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))



In [None]:
### Q.34) Train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare accuracy.



In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets (shared by both models)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_classifier = svm.SVC(kernel='rbf', C=1)
svm_classifier.fit(X_train, y_train)

# Train a Naïve Bayes Classifier on the SAME training data.
# GaussianNB suits Iris's continuous measurements (MultinomialNB would also run,
# since the features are non-negative, but it is designed for count data).
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions on the same test set with both classifiers
y_pred_svm = svm_classifier.predict(X_test)
y_pred_nb = nb_classifier.predict(X_test)

# Evaluate the accuracy of both classifiers
accuracy_svm = accuracy_score(y_test, y_pred_svm)
accuracy_nb = accuracy_score(y_test, y_pred_nb)

print("SVM Classifier Accuracy:", accuracy_svm)
print("Naïve Bayes Classifier Accuracy:", accuracy_nb)

print("SVM Classifier Classification Report:")
print(classification_report(y_test, y_pred_svm, target_names=iris.target_names))

print("Naïve Bayes Classifier Classification Report:")
print(classification_report(y_test, y_pred_nb, target_names=iris.target_names))



In [None]:
### Q.35) Perform feature selection before training a Naïve Bayes classifier and compare results.



In [None]:
# ans)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform feature selection using chi-squared test
selector = SelectKBest(chi2, k=2)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Train a Naïve Bayes classifier on the original dataset
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Train a Naïve Bayes classifier on the dataset with selected features
nb_classifier_selected = MultinomialNB()
nb_classifier_selected.fit(X_train_selected, y_train)

# Make predictions using both classifiers
y_pred = nb_classifier.predict(X_test)
y_pred_selected = nb_classifier_selected.predict(X_test_selected)

# Evaluate the accuracy of both classifiers
accuracy = accuracy_score(y_test, y_pred)
accuracy_selected = accuracy_score(y_test, y_pred_selected)

print("Accuracy without feature selection:", accuracy)
print("Accuracy with feature selection:", accuracy_selected)

print("Classification Report without feature selection:")
print(classification_report(y_test, y_pred))

print("Classification Report with feature selection:")
print(classification_report(y_test, y_pred_selected))


In [None]:
### Q.36) Train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset.



In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM using One-vs-Rest (OvR) strategy
svm_ovr = OneVsRestClassifier(SVC(kernel='linear', C=1.0, random_state=42))
svm_ovr.fit(X_train, y_train)
y_pred_ovr = svm_ovr.predict(X_test)
accuracy_ovr = accuracy_score(y_test, y_pred_ovr)

# Train SVM using One-vs-One (OvO) strategy
svm_ovo = OneVsOneClassifier(SVC(kernel='linear', C=1.0, random_state=42))
svm_ovo.fit(X_train, y_train)
y_pred_ovo = svm_ovo.predict(X_test)
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)

# Print accuracy comparison
print(f'Accuracy with One-vs-Rest (OvR): {accuracy_ovr:.2f}')
print(f'Accuracy with One-vs-One (OvO): {accuracy_ovo:.2f}')


In [None]:
### Q.37) Train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset.

In [None]:
# ans)
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features by scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0, random_state=42)
svm_poly.fit(X_train, y_train)
y_pred_poly = svm_poly.predict(X_test)
accuracy_poly = accuracy_score(y_test, y_pred_poly)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', C=1.0, random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print accuracy comparison
print(f'Accuracy with Linear Kernel: {accuracy_linear:.2f}')
print(f'Accuracy with Polynomial Kernel: {accuracy_poly:.2f}')
print(f'Accuracy with RBF Kernel: {accuracy_rbf:.2f}')
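The three near-identical train/predict blocks can be collapsed into a loop. The sketch below wraps the scaler and the SVC in a `Pipeline`, so the scaler is always fitted on the training split only. It assumes the same split settings as above and the default polynomial degree of 3.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

accuracies = {}
for kernel in ['linear', 'poly', 'rbf']:
    # The pipeline fits StandardScaler on X_train only, then fits the SVC
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_train, y_train)
    # For a classifier, Pipeline.score returns mean accuracy on the given data
    accuracies[kernel] = model.score(X_test, y_test)

for kernel, acc in accuracies.items():
    print(f'Accuracy with {kernel} kernel: {acc:.2f}')
```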


In [None]:
### Q.38) Train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy.

In [None]:
ans) from sklearn import datasets
from sklearn.model_selection import StratifiedKFold
from sklearn import svm
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Initialize the Stratified K-Fold Cross-Validation object
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Initialize the list to store the accuracy scores
accuracy_scores = []

# Train an SVM Classifier using Stratified K-Fold Cross-Validation
for train_index, test_index in kfold.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train an SVM Classifier on the current fold
    svm_classifier = svm.SVC(kernel='rbf', C=1)
    svm_classifier.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = svm_classifier.predict(X_test)

    # Compute the accuracy score
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

# Compute the average accuracy
average_accuracy = sum(accuracy_scores) / len(accuracy_scores)
print("Average Accuracy:", average_accuracy)
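The manual fold loop can be condensed with `cross_val_score`. It accepts the `StratifiedKFold` object as `cv` and returns one accuracy per fold. The sketch below reuses the same fold settings and SVC parameters as the loop.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# cross_val_score runs the same fit / predict / score loop over each fold
scores = cross_val_score(SVC(kernel='rbf', C=1), X, y, cv=kfold, scoring='accuracy')
print("Fold accuracies:", scores)
print("Average Accuracy:", scores.mean())
```

Because `random_state` is fixed, the folds match those produced by `kfold.split(X, y)` in the loop above.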



In [None]:
### Q.39) Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance.

In [None]:
ans) from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define different prior probabilities
prior_probabilities = [
    None,  # Default: priors estimated from the class frequencies in y_train
    [0.4, 0.3, 0.3],  # Custom prior probabilities
    [0.6, 0.2, 0.2],  # Custom prior probabilities
]

# Train a Naïve Bayes classifier using different prior probabilities
for prior_probability in prior_probabilities:
    # Initialize the Naïve Bayes classifier with the specified priors
    # (priors=None is GaussianNB's default, so no special case is needed)
    nb_classifier = GaussianNB(priors=prior_probability)

    # Train the Naïve Bayes classifier
    nb_classifier.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = nb_classifier.predict(X_test)

    # Evaluate the accuracy of the classifier
    accuracy = accuracy_score(y_test, y_pred)
    print("Prior Probability:", prior_probability)
    print("Accuracy:", accuracy)
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print()
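With `priors=None`, `GaussianNB` does not assume a uniform prior. It estimates each class prior from the class frequencies in the training labels, and the fitted `class_prior_` attribute makes this easy to check. The sketch below assumes the same iris split as above; after the split the three classes are no longer exactly balanced.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# priors=None: class priors are estimated from training class frequencies
default_nb = GaussianNB().fit(X_train, y_train)
empirical = np.bincount(y_train) / len(y_train)
print("Learned priors:   ", default_nb.class_prior_)
print("Class frequencies:", empirical)

# Explicit priors: the given values are used as-is
custom_nb = GaussianNB(priors=[0.6, 0.2, 0.2]).fit(X_train, y_train)
print("Custom priors:    ", custom_nb.class_prior_)
```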




In [None]:
### Q.40) Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy.

In [None]:
ans) from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, classification_report

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier without RFE
svm_classifier = svm.SVC(kernel='rbf', C=1)
svm_classifier.fit(X_train, y_train)

# Make predictions without RFE
y_pred = svm_classifier.predict(X_test)

# Evaluate the accuracy without RFE
accuracy_without_rfe = accuracy_score(y_test, y_pred)
print("Accuracy without RFE:", accuracy_without_rfe)
print("Classification Report without RFE:")
print(classification_report(y_test, y_pred))

# Perform Recursive Feature Elimination (RFE)
# RFE ranks features by the estimator's coef_, which SVC only exposes with kernel='linear'
rfe = RFE(estimator=svm.SVC(kernel='linear', C=1), n_features_to_select=2)
rfe.fit(X_train, y_train)

# Train an SVM Classifier with RFE
X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)
svm_classifier_rfe = svm.SVC(kernel='rbf', C=1)
svm_classifier_rfe.fit(X_train_rfe, y_train)

# Make predictions with RFE
y_pred_rfe = svm_classifier_rfe.predict(X_test_rfe)

# Evaluate the accuracy with RFE
accuracy_with_rfe = accuracy_score(y_test, y_pred_rfe)
print("Accuracy with RFE:", accuracy_with_rfe)
print("Classification Report with RFE:")
print(classification_report(y_test, y_pred_rfe))
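RFE needs an estimator that exposes `coef_` or `feature_importances_`. An RBF-kernel `SVC` exposes neither, so the selection step uses a linear kernel. The sketch below inspects which iris features were kept, assuming the same split as above. `support_` marks the selected features and `ranking_` gives 1 for them and larger numbers for features eliminated earlier.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# A linear kernel is used so RFE can rank features by the fitted coef_
rfe = RFE(estimator=SVC(kernel='linear', C=1), n_features_to_select=2)
rfe.fit(X_train, y_train)

# Kept features have rank 1; higher ranks were eliminated earlier
for name, keep, rank in zip(iris.feature_names, rfe.support_, rfe.ranking_):
    print(f'{name:20s} selected={keep} rank={rank}')
```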



In [None]:
### Q.41) Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy.

In [None]:
ans) import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM classifier
svm_clf = SVC(kernel='linear', C=1.0, random_state=42)
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)

# Evaluate performance using Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print evaluation metrics
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1-Score: {f1:.2f}')
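By default, `precision_score`, `recall_score` and `f1_score` treat label 1 as the positive class. In the Breast Cancer dataset, label 1 is *benign*, so the scores above describe the benign class. The sketch below shows how `pos_label=0` scores the malignant class instead, and how `classification_report` lists both classes at once. It reuses the split and SVC settings from above, with the scaler wrapped in a pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
print(cancer.target_names)  # ['malignant' 'benign'] -> 0 is malignant, 1 is benign

X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42)
model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0)).fit(X_train, y_train)
y_pred = model.predict(X_test)

# pos_label=0 scores the malignant class instead of the default label 1 (benign)
precision_malignant = precision_score(y_test, y_pred, pos_label=0)
recall_malignant = recall_score(y_test, y_pred, pos_label=0)
f1_malignant = f1_score(y_test, y_pred, pos_label=0)
print(f'Malignant precision / recall / F1: '
      f'{precision_malignant:.2f} / {recall_malignant:.2f} / {f1_malignant:.2f}')

# Per-class precision, recall and F1 for both labels
print(classification_report(y_test, y_pred, target_names=cancer.target_names))
```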


In [None]:
### Q.42) Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss).

In [None]:
ans) import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_prob = gnb.predict_proba(X_test)

# Evaluate performance using Log Loss
log_loss_value = log_loss(y_test, y_prob)

# Print evaluation metric
print(f'Log Loss: {log_loss_value:.4f}')
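Log loss is the mean of `-log(p)`, where `p` is the probability the model assigned to each sample's true class. Confident wrong predictions are therefore penalized very heavily; scikit-learn clips probabilities so the value stays finite. Gaussian Naïve Bayes often produces overconfident probabilities, which can inflate log loss even when accuracy is good. The sketch below checks the formula against `log_loss` on a few hypothetical, hand-picked probabilities.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical labels and predicted probabilities (columns: class 0, class 1)
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.3, 0.7],
                   [0.6, 0.4]])

# Probability assigned to each sample's true class: [0.9, 0.8, 0.7, 0.6]
p_true = y_prob[np.arange(len(y_true)), y_true]
manual = -np.mean(np.log(p_true))

print(f'Manual:  {manual:.4f}')                      # 0.2990
print(f'sklearn: {log_loss(y_true, y_prob):.4f}')    # 0.2990
```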


In [None]:
### Q.43) Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.



In [None]:
ans) from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_classifier = svm.SVC(kernel='rbf', C=1)
svm_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Create a confusion matrix
conf_mat = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix using seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(conf_mat, annot=True, cmap='Blues', fmt='d',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
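In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels. Dividing each diagonal entry by its row sum therefore gives per-class recall. The sketch below checks this against `recall_score(average=None)` using small hypothetical labels instead of the iris predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical 3-class labels, used only to show how the matrix is read
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]

# Diagonal (correct predictions) / row sums (true counts per class) = per-class recall
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(per_class_recall)
print(recall_score(y_true, y_pred, average=None))
```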



In [None]:
### Q.44) Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE.



In [None]:
ans) import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Load the California housing dataset (downloaded and cached on first use)
housing = datasets.fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVR model
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)

# Evaluate performance using Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)

# Print evaluation metric
print(f'Mean Absolute Error (MAE): {mae:.4f}')
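MAE averages absolute errors and stays in the target's units. For California housing, the target is median house value in units of $100,000. MSE averages squared errors, so a single large miss dominates it. The sketch below uses hypothetical values with one prediction that is off by 4.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical targets and predictions; the last prediction misses by 4
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 11.0])

mae = mean_absolute_error(y_true, y_pred)  # mean(|error|)
mse = mean_squared_error(y_true, y_pred)   # mean(error ** 2)
print(f'MAE: {mae:.4f}')  # (0.5 + 0.5 + 0 + 4) / 4 = 1.2500
print(f'MSE: {mse:.4f}')  # (0.25 + 0.25 + 0 + 16) / 4 = 4.1250
```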


In [None]:
### Q.45) Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.



In [None]:
ans) from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

# Generate a sample dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=3, n_repeated=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Naïve Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = nb_classifier.predict(X_test_scaled)

# Evaluate the classifier's performance using ROC-AUC score
y_pred_proba = nb_classifier.predict_proba(X_test_scaled)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred_proba)
print("ROC-AUC Score:", roc_auc)

# Evaluate the classifier's performance using accuracy score and classification report
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))
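ROC-AUC only depends on how the scores rank samples. It equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counting as one half. This is why it is computed from `predict_proba` rather than from hard predictions. The sketch below checks the pairwise definition against `roc_auc_score` on hypothetical scores.

```python
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical binary labels and predicted scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Fraction of (positive, negative) pairs ranked correctly; ties count 0.5
pairs = list(itertools.product(pos, neg))
manual_auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(manual_auc)                      # 0.75
print(roc_auc_score(y_true, y_score))  # 0.75
```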


In [None]:
### Q.46) Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

In [None]:
ans) import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve, auc

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM classifier
svm_clf = SVC(kernel='linear', C=1.0, probability=True, random_state=42)
svm_clf.fit(X_train, y_train)
y_prob = svm_clf.predict_proba(X_test)[:, 1]

# Compute Precision-Recall curve
precision, recall, _ = precision_recall_curve(y_test, y_prob)
pr_auc = auc(recall, precision)

# Plot Precision-Recall curve
plt.figure(figsize=(6, 4))
plt.plot(recall, precision, marker='.', label=f'PR AUC = {pr_auc:.2f}')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend()
plt.grid()
plt.show()
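A precision-recall curve only needs a score that ranks the test samples. `SVC.decision_function` can therefore replace `probability=True`, which adds an internal cross-validated Platt scaling step. For a single summary number, `average_precision_score` computes a step-wise weighted mean of precisions. Unlike `auc(recall, precision)`, it does not use linear interpolation, which can be optimistic on PR curves. The sketch below assumes the same split and linear SVC as above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# No probability=True: signed distances from the hyperplane are enough to rank samples
model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0)).fit(X_train, y_train)
scores = model.decision_function(X_test)

# The curve points can be plotted exactly as in the example above
precision, recall, thresholds = precision_recall_curve(y_test, scores)
ap = average_precision_score(y_test, scores)
print(f'Average precision: {ap:.2f}')
```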

