# Theoretical Questions

**1: What is a Support Vector Machine (SVM)?**  
SVM is a supervised learning algorithm used for classification and regression. It finds the optimal hyperplane that maximizes the margin between different classes in a dataset. It works well in high-dimensional spaces and can handle non-linearly separable data using kernel functions.

---

**2: What is the difference between Hard Margin and Soft Margin SVM?**  
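Hard Margin SVM requires every training point to be classified correctly and to lie outside the margin. It therefore only works when the data are linearly separable, and it is very sensitive to noise and outliers. Soft Margin SVM introduces slack variables that let some points violate the margin or be misclassified. The C parameter balances margin width against these violations, which makes soft margins better suited to real-world, noisy datasets.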
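In the usual textbook formulation, the soft-margin problem adds one slack variable \( \xi_i \ge 0 \) per training point and a penalty \( C \) on their sum; the hard-margin SVM is the special case in which every \( \xi_i \) is forced to zero:

\[
\min_{w,\, b,\, \xi} \; \frac{1}{2}||w||^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
\]

A point with \( 0 < \xi_i \le 1 \) lies inside the margin but on the correct side of the hyperplane; \( \xi_i > 1 \) means it is misclassified.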

---

**3: What is the mathematical intuition behind SVM?**  
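SVM looks for a hyperplane \( w \cdot x + b = 0 \) that separates the two classes with the largest possible margin. With labels \( y_i \in \{-1, +1\} \), the margin width is \( 2 / ||w|| \), so maximizing it is equivalent to minimizing \( \frac{1}{2}||w||^2 \) subject to \( y_i (w \cdot x_i + b) \ge 1 \) for every training point. When the data are not linearly separable in the original space, kernel functions implicitly map them to a higher-dimensional space where a separating hyperplane may exist.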
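A minimal sketch, assuming scikit-learn and a tiny made-up dataset: fitting a linear `SVC` with a very large `C` approximates the hard-margin problem, so the margin width \( 2/||w|| \) can be read off `coef_`, and the support vectors should have functional margin \( y_i(w \cdot x_i + b) \approx 1 \).

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data chosen for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]
b = clf.intercept_[0]

margin = 2.0 / np.linalg.norm(w)      # geometric margin width 2 / ||w||
functional = y * (X @ w + b)          # y_i (w . x_i + b) for every point

print("w =", w, "b =", b)
print("margin width =", margin)
print("smallest functional margin =", functional.min())
```

For these six points the classes are symmetric about the line \( x_1 = x_2 \), so the exact optimum is \( w = (0.4, 0.4) \), \( b = -2.2 \), with margin \( 5/\sqrt{2} \approx 3.54 \); the fitted values should agree up to the solver's tolerance.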

---

**4: What is the role of Lagrange Multipliers in SVM?**  
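Lagrange multipliers turn the constrained margin-maximization problem into a Lagrangian, from which an equivalent dual problem is derived. At the optimum the Karush-Kuhn-Tucker (KKT) conditions hold: each multiplier \( \alpha_i \) is zero unless its point lies on or inside the margin, so only those points (the support vectors) define the hyperplane. Because the dual depends on the data only through dot products, it is also what makes the kernel trick possible.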
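For reference, the standard dual of the soft-margin problem is shown below, with one multiplier \( \alpha_i \) per training point and \( C \) the soft-margin penalty (the hard-margin case drops the upper bound \( C \)). The data appear only through the dot products \( x_i \cdot x_j \), which a kernel \( K(x_i, x_j) \) can replace:

\[
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
\]

The weight vector is recovered as \( w = \sum_{i=1}^{n} \alpha_i y_i x_i \), so only points with \( \alpha_i > 0 \) contribute to it.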

---

**5: What are Support Vectors in SVM?**  
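Support vectors are the training points that lie on the margin boundaries or, in soft-margin SVM, inside the margin or on the wrong side of the hyperplane. They are exactly the points with nonzero Lagrange multipliers, and they alone determine the position and orientation of the hyperplane: removing any other training point leaves the solution unchanged.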
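As an illustration (a sketch assuming scikit-learn and synthetic, well-separated blobs), refitting a linear `SVC` on its support vectors alone should recover essentially the same hyperplane, because every other point has a zero multiplier. A large `C` is used so that the problem behaves like a hard-margin SVM.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data for illustration)
X, y = make_blobs(n_samples=40, centers=[[0, 0], [6, 6]], cluster_std=1.0, random_state=0)

full = SVC(kernel="linear", C=1e3).fit(X, y)
sv_idx = full.support_                      # indices of the support vectors

# Refit using only the support vectors
reduced = SVC(kernel="linear", C=1e3).fit(X[sv_idx], y[sv_idx])

print("support vectors:", len(sv_idx), "of", len(X))
print("all points: w =", full.coef_[0], "b =", full.intercept_[0])
print("SVs only:   w =", reduced.coef_[0], "b =", reduced.intercept_[0])
```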

---

**6: What is a Support Vector Classifier (SVC)?**  
SVC is the classification variant of SVM that finds an optimal hyperplane for separating different classes. It can handle both linearly and non-linearly separable data using kernel functions.

---

**7: What is a Support Vector Regressor (SVR)?**  
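SVR applies SVM ideas to regression. It fits a function that keeps as many points as possible inside a tube of width ε around the prediction: errors smaller than ε are ignored (the ε-insensitive loss), and only points outside the tube are penalized, weighted by C. Keeping ||w|| small at the same time keeps the fitted function as flat (simple) as possible.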
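The ε-insensitive loss can be sketched in a few lines of NumPy; `epsilon_insensitive_loss` is a hypothetical helper written for this illustration, not a scikit-learn function. SVR minimizes \( \frac{1}{2}||w||^2 \) plus \( C \) times the sum of this loss over the training points.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon-tube cost nothing; errors outside
    grow linearly with their distance from the tube."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 2.0, 4.08])   # toy predictions

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)   # residuals 0.05 and 0.08 fall inside the tube and cost 0
```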

---

**8: What is the Kernel Trick in SVM?**  
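The kernel trick replaces every dot product \( x_i \cdot x_j \) in the SVM's (dual) optimization problem with a kernel function \( K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \), which equals a dot product in a higher-dimensional feature space. The data can then be separated linearly in that space, where they may become separable, without ever computing the mapping \( \phi \) explicitly. Common kernels include linear, polynomial, and radial basis function (RBF).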
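A small numeric check of the idea, using hand-written example values: for 2-D inputs, the degree-2 kernel \( K(x, z) = (x \cdot z)^2 \) equals the dot product of the explicit feature map \( \phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2) \), yet it never builds \( \phi \). The names `phi` and `poly_kernel` are illustrative only.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # map to 3-D, then take the dot product
implicit = poly_kernel(x, z)        # same quantity computed directly in 2-D
print(explicit, implicit)           # both equal 121 up to floating-point rounding
```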

---

**9: Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.**  
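- **Linear Kernel:** Best when the data are (close to) linearly separable, e.g. high-dimensional text features; simple and fast.  
- **Polynomial Kernel:** Captures interactions between features up to the chosen degree, but is slower to train and needs its degree (and coef0) tuned.  
- **RBF Kernel:** Handles highly non-linear data; its implicit feature space is infinite-dimensional, and it requires tuning C and gamma.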
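The three kernels can be written out directly as Gram-matrix formulas and checked against scikit-learn's `sklearn.metrics.pairwise` functions, which implement the same formulas documented for `SVC`'s `'linear'`, `'poly'`, and `'rbf'` kernels. The random data and the values of `gamma`, `coef0`, and `degree` are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 random points in 3-D

gamma, coef0, degree = 0.5, 1.0, 3

# Hand-written Gram matrices
K_lin = X @ X.T                                      # x . x'
K_poly = (gamma * (X @ X.T) + coef0) ** degree       # (gamma x . x' + coef0)^degree
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)                    # exp(-gamma ||x - x'||^2)

checks = {
    "linear": np.allclose(K_lin, linear_kernel(X)),
    "poly": np.allclose(K_poly, polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)),
    "rbf": np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)),
}
print(checks)
```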

---

**10: What is the effect of the C parameter in SVM?**  
The **C parameter** controls the trade-off between a wide margin and few training errors. A **high C** penalizes misclassification heavily, giving a narrower margin that fits the training data closely and risks overfitting. A **low C** tolerates more misclassified points in exchange for a wider margin, which often improves generalization but can underfit if C is too low.
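A quick way to see the trade-off, sketched with scikit-learn on synthetic overlapping blobs (an assumption for illustration): as `C` grows, the margin width \( 2/||w|| \) shrinks and typically fewer points end up as support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some training errors are unavoidable
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]], cluster_std=1.5, random_state=1)

results = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    # (margin width, number of support vectors, training accuracy)
    results[C] = (margin, clf.n_support_.sum(), clf.score(X, y))
    print(f"C={C:>6}: margin={margin:.3f}, support vectors={results[C][1]}, "
          f"train accuracy={results[C][2]:.3f}")
```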

---

**11: What is the role of the Gamma parameter in RBF Kernel SVM?**  
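Gamma controls how far the influence of a single training point reaches, since the RBF kernel is \( K(x, x') = \exp(-\gamma ||x - x'||^2) \). **High gamma** makes each point's influence very local, capturing fine details but risking overfitting, while **low gamma** spreads the influence widely, which generalizes better but may underfit.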
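Since the RBF kernel value between two points at squared distance 1 is simply \( e^{-\gamma} \), a short sketch with scikit-learn's `rbf_kernel` shows how quickly a point's influence fades as gamma grows (the two points are arbitrary).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
z = np.array([[1.0, 0.0]])   # squared distance between x and z is 1

values = {}
for gamma in [0.1, 1.0, 10.0]:
    values[gamma] = rbf_kernel(x, z, gamma=gamma)[0, 0]
    print(f"gamma={gamma:>5}: K(x, z) = {values[gamma]:.5f}")
```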

---

**12: What is the Naïve Bayes classifier, and why is it called "Naïve"?**  
Naïve Bayes is a probabilistic classifier based on Bayes' Theorem. It’s called "naïve" because it assumes that the features are conditionally independent given the class, so the likelihood of a sample is just the product of per-feature likelihoods. This assumption rarely holds exactly in real-world data.

---

**13: What is Bayes’ Theorem?**  
Bayes’ Theorem states that:  
\[
P(A|B) = \frac{P(B|A) P(A)}{P(B)}
\]
It gives the posterior probability of A after observing B in terms of the prior \( P(A) \), the likelihood \( P(B|A) \), and the evidence \( P(B) \).
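A worked example with hypothetical numbers (a test with 95% sensitivity, a 5% false-positive rate, and a 1% base rate, chosen purely for illustration) shows why the prior matters: even after a positive result, the posterior probability is only about 16%.

```python
# Hypothetical numbers for a diagnostic test (illustration only)
p_disease = 0.01              # prior P(A)
p_pos_given_disease = 0.95    # likelihood P(B|A), the test's sensitivity
p_pos_given_healthy = 0.05    # false-positive rate P(B|not A)

# Evidence P(B) via the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(A|B) via Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")   # 0.0095 / 0.059
```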

---

**14: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.**  
- **Gaussian NB:** Assumes features follow a normal distribution. Best for continuous data.  
- **Multinomial NB:** Works with frequency-based discrete data. Used in text classification.  
- **Bernoulli NB:** Handles binary feature data, often applied in spam detection.

---

**15: When should you use Gaussian Naïve Bayes over other variants?**  
Use Gaussian NB when features are continuous and roughly normally distributed within each class, such as in medical diagnosis or sensor data classification.

---

**16: What are the key assumptions made by Naïve Bayes?**  
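1. **Conditional independence:** features are independent of one another given the class (rarely true in real-world data).  
2. **Equal contribution:** every feature enters the prediction as its own likelihood term, with no extra weighting or interaction terms.  
3. **Correct distributional form:** each feature follows the assumed distribution (Gaussian, multinomial, or Bernoulli), and the class priors estimated from the training data reflect the real class frequencies.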

---

**17: What are the advantages and disadvantages of Naïve Bayes?**  
**Advantages:** Fast, requires little training data, works well with high-dimensional data, and is easy to interpret.  
**Disadvantages:** Assumes feature independence, struggles with correlated features, and may not perform well on complex datasets.

---

**18: Why is Naïve Bayes a good choice for text classification?**  
Naïve Bayes is effective for text classification because it efficiently handles high-dimensional sparse data, works well with bag-of-words models, and provides fast predictions.

---

**19: Compare SVM and Naïve Bayes for classification tasks.**  
- **SVM:** Works well with complex, high-dimensional data but is computationally expensive.  
- **Naïve Bayes:** Fast, simple, and effective for text data but relies on strong independence assumptions.

---

**20: How does Laplace Smoothing help in Naïve Bayes?**  
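Laplace Smoothing (additive smoothing with \( \alpha = 1 \)) prevents zero probabilities by adding a constant to every feature count before the probabilities are estimated: \( P(x_i|y) = \frac{N_{yi} + \alpha}{N_y + \alpha n} \), where \( N_{yi} \) is the count of feature \( i \) in class \( y \), \( N_y \) is the total count for class \( y \), and \( n \) is the number of features. Without it, a single word never seen with a class during training would make that class's whole probability product zero.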
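A minimal sketch of the additive-smoothing estimate (count + α) / (total count + α × vocabulary size) on made-up word counts, compared with scikit-learn's `MultinomialNB`, where `alpha=1.0` corresponds to Laplace smoothing. The word that never occurs in class 0 still receives a nonzero probability.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Word counts for 3 documents over a 4-word vocabulary (toy data)
X = np.array([[2, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 3, 1]])
y = np.array([0, 0, 1])

alpha = 1.0   # alpha = 1 is Laplace smoothing; 0 < alpha < 1 is Lidstone smoothing

# Manual estimate for class 0: (count + alpha) / (total count + alpha * vocabulary size)
counts_c0 = X[y == 0].sum(axis=0)            # [3, 1, 0, 1]
manual = (counts_c0 + alpha) / (counts_c0.sum() + alpha * X.shape[1])

clf = MultinomialNB(alpha=alpha).fit(X, y)
print(manual)                                # the third word (index 2) is unseen in class 0 but gets 1/9
print(np.exp(clf.feature_log_prob_[0]))
```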

# Practical Questions

In [None]:
# 21: Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the SVM Classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred = svm_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of SVM Classifier on Iris dataset: {accuracy:.2f}")

In [None]:
#22: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM Classifier with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Compare accuracies
print(f"Accuracy of SVM Classifier with Linear kernel: {accuracy_linear:.2f}")
print(f"Accuracy of SVM Classifier with RBF kernel: {accuracy_rbf:.2f}")

In [None]:
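#23: Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE)
# Note: load_boston was removed in scikit-learn 1.2, so the California housing dataset is used instead
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load the California housing dataset (downloaded on first use)
housing = fetch_california_housing()
X = housing.data
y = housing.target  # median house value, in units of $100,000

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the SVM Regressor; SVR is sensitive to feature scale, so standardize the features first
svr = make_pipeline(StandardScaler(), SVR(kernel='linear'))
svr.fit(X_train, y_train)

# Make predictions
y_pred = svr.predict(X_test)

# Evaluate using Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error of SVR on housing dataset: {mse:.2f}")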

In [None]:
#24: Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Create a synthetic dataset for visualization
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_informative=2, n_redundant=0, random_state=42)

# Train SVM Classifier with Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X, y)

# Create a mesh to plot the decision boundary
xx, yy = np.meshgrid(np.linspace(X[:, 0].min()-1, X[:, 0].max()+1, 100),
                     np.linspace(X[:, 1].min()-1, X[:, 1].max()+1, 100))
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plotting
plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
plt.title('SVM Classifier with Polynomial Kernel')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

In [None]:
#25: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Gaussian Naïve Bayes on Breast Cancer dataset: {accuracy:.2f}")

In [None]:
#26: Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.
from sklearn.datasets import fetch_20newsgroups
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Load the 20 Newsgroups dataset
data = fetch_20newsgroups(subset='train')
X = data.data
y = data.target

# Create a pipeline with CountVectorizer and MultinomialNB
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X, y)

# Evaluate on the test set
test_data = fetch_20newsgroups(subset='test')
y_pred = model.predict(test_data.data)
accuracy = accuracy_score(test_data.target, y_pred)

print(f"Accuracy of Multinomial Naïve Bayes on 20 Newsgroups dataset: {accuracy:.2f}")

In [None]:
#27 Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually.
# Create a synthetic dataset for visualization
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_informative=2, n_redundant=0, random_state=42)

# Set up the figure
plt.figure(figsize=(12, 6))

# Train SVM Classifier with different C values
for i, C in enumerate([0.1, 1, 10]):
    svm = SVC(kernel='linear', C=C)
    svm.fit(X, y)

    # Create a mesh to plot the decision boundary
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min()-1, X[:, 0].max()+1, 100),
                         np.linspace(X[:, 1].min()-1, X[:, 1].max()+1, 100))
    Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plotting
    plt.subplot(1, 3, i + 1)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title(f'SVM Classifier with C={C}')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()

In [None]:
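#28 Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.
from sklearn.naive_bayes import BernoulliNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Create a synthetic dataset, then binarize it so that every feature is 0/1
X, y = make_classification(n_samples=100, n_features=10, n_classes=2, n_informative=2, random_state=42)
X_binary = (X > 0).astype(int)

# Hold out a test set so accuracy is not measured on the training data
X_train, X_test, y_train, y_test = train_test_split(X_binary, y, test_size=0.3, random_state=42)

# Train the Bernoulli Naïve Bayes classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Make predictions
y_pred = bnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Bernoulli Naïve Bayes: {accuracy:.2f}")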

In [None]:
#29 Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data.
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM without scaling
svm_unscaled = SVC(kernel='linear')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM with scaling
svm_scaled = SVC(kernel='linear')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Compare accuracies
print(f"Accuracy without scaling: {accuracy_unscaled:.2f}")
print(f"Accuracy with scaling: {accuracy_scaled:.2f}")

In [None]:
#30 Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing.
# Note: Laplace smoothing applies to count-based models (MultinomialNB, BernoulliNB, via `alpha`).
# GaussianNB's closest analogue is `var_smoothing`, which adds a fraction of the largest
# feature variance to every variance for numerical stability; its default is already 1e-9.

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes with variance smoothing switched off
gnb_no_smoothing = GaussianNB(var_smoothing=0.0)
gnb_no_smoothing.fit(X_train, y_train)
y_pred_no_smoothing = gnb_no_smoothing.predict(X_test)
accuracy_no_smoothing = accuracy_score(y_test, y_pred_no_smoothing)

# Train Gaussian Naïve Bayes with stronger variance smoothing (1e-2 is an example value)
gnb_with_smoothing = GaussianNB(var_smoothing=1e-2)
gnb_with_smoothing.fit(X_train, y_train)
y_pred_with_smoothing = gnb_with_smoothing.predict(X_test)
accuracy_with_smoothing = accuracy_score(y_test, y_pred_with_smoothing)

# Compare accuracies and count how many test predictions changed
print(f"Accuracy without smoothing: {accuracy_no_smoothing:.2f}")
print(f"Accuracy with var_smoothing=1e-2: {accuracy_with_smoothing:.2f}")
print(f"Test predictions that changed: {(y_pred_no_smoothing != y_pred_with_smoothing).sum()}")

In [None]:
#31 Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel).
from sklearn.model_selection import GridSearchCV

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Define the model
svm = SVC()

# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.01, 0.1, 1],
    'kernel': ['linear', 'rbf']
}

# Set up GridSearchCV
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X, y)

# Best parameters and score
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.2f}")

In [None]:
#32 Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and check whether it improves performance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Create an imbalanced dataset (about 90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], flip_y=0, random_state=42)

# Split the dataset, keeping the class ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Train SVM Classifiers without and with class weighting
for name, weight in [("unweighted", None), ("balanced", "balanced")]:
    svm = SVC(class_weight=weight)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    # Plain accuracy is dominated by the majority class, so also report minority recall and balanced accuracy
    print(f"{name:>10}: accuracy={accuracy_score(y_test, y_pred):.2f}, "
          f"minority recall={recall_score(y_test, y_pred):.2f}, "
          f"balanced accuracy={balanced_accuracy_score(y_test, y_pred):.2f}")

In [None]:
#33 Write a Python program to implement a Naïve Bayes classifier for spam detection using email data
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sample email data
emails = [
    "Free money now!!!",
    "Hi, how are you?",
    "Win a free ticket to Bahamas!",
    "Meeting at 10am tomorrow.",
    "Congratulations! You've won a lottery!"
]
labels = [1, 0, 1, 0, 1]  # 1 for spam, 0 for not spam

# Create a pipeline with CountVectorizer and MultinomialNB
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# Test the model
test_emails = ["Get rich quick!", "Let's have lunch tomorrow."]
predictions = model.predict(test_emails)

print("Predictions:", predictions)  # Expected: [1 0]; "Get rich quick!" shares no words with the training emails, so it is labeled by the class prior alone

In [None]:
#34 Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM Classifier
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
svm_preds = svm_model.predict(X_test)

# Train a Naïve Bayes Classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_preds = nb_model.predict(X_test)

# Compare accuracy
svm_accuracy = accuracy_score(y_test, svm_preds)
nb_accuracy = accuracy_score(y_test, nb_preds)

print(f"SVM Accuracy: {svm_accuracy:.2f}")
print(f"Naïve Bayes Accuracy: {nb_accuracy:.2f}")

In [None]:
#35 Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare results.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Perform feature selection
selector = SelectKBest(score_func=f_classif, k=10)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Train Gaussian Naïve Bayes on selected features
gnb = GaussianNB()
gnb.fit(X_train_selected, y_train)
y_pred_selected = gnb.predict(X_test_selected)
accuracy_selected = accuracy_score(y_test, y_pred_selected)

# Train Gaussian Naïve Bayes on all features
gnb_full = GaussianNB()
gnb_full.fit(X_train, y_train)
y_pred_full = gnb_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Compare accuracies
print(f"Accuracy with feature selection: {accuracy_selected:.2f}")
print(f"Accuracy without feature selection: {accuracy_full:.2f}")


In [None]:
#36 Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy.
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier with One-vs-Rest
ovr_classifier = OneVsRestClassifier(SVC(kernel='linear'))
ovr_classifier.fit(X_train, y_train)
y_pred_ovr = ovr_classifier.predict(X_test)
accuracy_ovr = accuracy_score(y_test, y_pred_ovr)

# Train SVM Classifier with One-vs-One
ovo_classifier = OneVsOneClassifier(SVC(kernel='linear'))
ovo_classifier.fit(X_train, y_train)
y_pred_ovo = ovo_classifier.predict(X_test)
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)

# Compare accuracies
print(f"Accuracy of SVM with One-vs-Rest: {accuracy_ovr:.2f}")
print(f"Accuracy of SVM with One-vs-One: {accuracy_ovo:.2f}")

In [None]:
#37. Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy.
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM Classifier with Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X_train, y_train)
y_pred_poly = svm_poly.predict(X_test)
accuracy_poly = accuracy_score(y_test, y_pred_poly)

# Train SVM Classifier with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Compare accuracies
print(f"Accuracy of SVM with Linear kernel: {accuracy_linear:.2f}")
print(f"Accuracy of SVM with Polynomial kernel: {accuracy_poly:.2f}")
print(f"Accuracy of SVM with RBF kernel: {accuracy_rbf:.2f}")

In [None]:
#38. Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy.
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Set up Stratified K-Fold
skf = StratifiedKFold(n_splits=5)
accuracies = []

# Perform Stratified K-Fold Cross-Validation
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    svm = SVC(kernel='linear')
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

# Compute average accuracy
average_accuracy = np.mean(accuracies)
print(f"Average accuracy using Stratified K-Fold: {average_accuracy:.2f}")

In [None]:
# 39. Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance.
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes with default priors
gnb_default = GaussianNB()
gnb_default.fit(X_train, y_train)
y_pred_default = gnb_default.predict(X_test)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Train Gaussian Naïve Bayes with custom priors
priors = [0.6, 0.4]  # Example of custom priors
gnb_custom = GaussianNB(priors=priors)
gnb_custom.fit(X_train, y_train)
y_pred_custom = gnb_custom.predict(X_test)
accuracy_custom = accuracy_score(y_test, y_pred_custom)

# Compare accuracies
print(f"Accuracy with default priors: {accuracy_default:.2f}")
print(f"Accuracy with custom priors: {accuracy_custom:.2f}")

In [None]:
#40. Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy.
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Perform Recursive Feature Elimination
svm = SVC(kernel='linear')
selector = RFE(svm, n_features_to_select=10)
X_train_rfe = selector.fit_transform(X_train, y_train)
X_test_rfe = selector.transform(X_test)

# Train SVM on selected features
svm.fit(X_train_rfe, y_train)
y_pred_rfe = svm.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

# Train SVM on all features
svm_full = SVC(kernel='linear')
svm_full.fit(X_train, y_train)
y_pred_full = svm_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Compare accuracies
print(f"Accuracy with RFE: {accuracy_rfe:.2f}")
print(f"Accuracy without RFE: {accuracy_full:.2f}")

In [None]:
#41.Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy.
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)

# Evaluate performance
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

In [None]:
#42. Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss).
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict class probabilities (log loss is computed from probabilities, not hard labels)
y_pred_proba = gnb.predict_proba(X_test)

# Evaluate using Log Loss
loss = log_loss(y_test, y_pred_proba)
print(f"Log Loss of Naïve Bayes Classifier: {loss:.2f}")

In [None]:
#43 Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.
import seaborn as sns
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)

# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Visualize confusion matrix (in this dataset label 0 = malignant and 1 = benign, so take tick labels from target_names)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=cancer.target_names, yticklabels=cancer.target_names)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()

In [None]:
#44.Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=200, n_features=1, noise=0.1, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the SVR model with different kernels
kernels = ['linear', 'poly', 'rbf']
svr_models = {}
mae_scores = {}

for kernel in kernels:
    # Create and train the SVR model
    svr = SVR(kernel=kernel)
    svr.fit(X_train_scaled, y_train)

    # Make predictions
    y_pred = svr.predict(X_test_scaled)

    # Calculate MAE
    mae = mean_absolute_error(y_test, y_pred)

    # Store the model and score
    svr_models[kernel] = svr
    mae_scores[kernel] = mae

    print(f"SVR with {kernel} kernel - MAE: {mae:.4f}")

# Find the best kernel based on MAE
best_kernel = min(mae_scores, key=mae_scores.get)
print(f"\nBest kernel: {best_kernel} with MAE: {mae_scores[best_kernel]:.4f}")

# Fine-tune the best model using GridSearchCV
if best_kernel == 'rbf':
    param_grid = {
        'C': [0.1, 1, 10, 100],
        'gamma': [0.01, 0.1, 1, 'scale', 'auto'],
        'epsilon': [0.01, 0.1, 0.2, 0.5]
    }
elif best_kernel == 'poly':
    param_grid = {
        'C': [0.1, 1, 10, 100],
        'degree': [2, 3, 4],
        'epsilon': [0.01, 0.1, 0.2, 0.5]
    }
else:  # linear
    param_grid = {
        'C': [0.1, 1, 10, 100],
        'epsilon': [0.01, 0.1, 0.2, 0.5]
    }

grid_search = GridSearchCV(
    SVR(kernel=best_kernel),
    param_grid,
    cv=5,
    scoring='neg_mean_absolute_error',
    verbose=1
)

grid_search.fit(X_train_scaled, y_train)
best_params = grid_search.best_params_
best_score = -grid_search.best_score_  # Convert back to positive MAE

print(f"\nBest parameters: {best_params}")
print(f"Best cross-validation MAE: {best_score:.4f}")

# Evaluate the optimized model on the test set
best_svr = grid_search.best_estimator_
y_pred_best = best_svr.predict(X_test_scaled)
final_mae = mean_absolute_error(y_test, y_pred_best)
print(f"Final test set MAE with optimized model: {final_mae:.4f}")

# Visualize the results
plt.figure(figsize=(12, 6))

# Sort the data for better visualization
sort_idx = np.argsort(X_test.flatten())
X_test_sorted = X_test[sort_idx]
y_test_sorted = y_test[sort_idx]
y_pred_best_sorted = y_pred_best[sort_idx]

plt.scatter(X_test, y_test, color='blue', label='Actual data')
plt.plot(X_test_sorted, y_pred_best_sorted, color='red', linewidth=2, label=f'SVR prediction (MAE: {final_mae:.4f})')
plt.title(f'SVR Regression with {best_kernel} kernel (Optimized)')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.grid(True)
plt.show()

In [None]:
#45.Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian Naïve Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred_proba = gnb.predict_proba(X_test)[:, 1]  # Probability of class 1 (benign in this dataset), the positive class for roc_auc_score

# Evaluate using ROC-AUC score
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f"ROC-AUC score of Naïve Bayes Classifier: {roc_auc:.4f}")

In [None]:
#46. Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier
svm = SVC(kernel='linear', probability=True)  # Enable probability estimates
svm.fit(X_train, y_train)

# Get predicted probabilities
y_scores = svm.predict_proba(X_test)[:, 1]

# Calculate precision and recall
precision, recall, _ = precision_recall_curve(y_test, y_scores)

# Plot Precision-Recall Curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, marker='.')
plt.title('Precision-Recall Curve')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.grid()
plt.show()