# 🧪 Naive Bayes Classifier on Breast Cancer Dataset

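In this notebook, we apply the **Gaussian Naive Bayes** classifier to the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn.  
We go step by step:  
1. Import libraries and load the dataset  
2. Explore the class distribution  
3. Split the data into training and test sets  
4. Standardize the features  
5. Train a Gaussian Naive Bayes model  
6. Evaluate performance with metrics (accuracy, classification report) and plots (confusion matrix, ROC, precision-recall, actual vs predicted)  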


## 1. Import required libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report, roc_curve, auc,
    precision_recall_curve, average_precision_score
)

## 2. Load the Breast Cancer dataset

In [None]:
# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")  # 0 = malignant, 1 = benign

print("Dataset shape:", X.shape)
print("Target classes:", data.target_names)

# Show first rows
X.head()

## 3. Explore dataset: class distribution

In [None]:
# Visualize class distribution
sns.countplot(x=y)
plt.title("Class Distribution (0=Malignant, 1=Benign)")
plt.show()

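Because the two classes are not equally frequent, a useful reference point for every later metric is the accuracy of a model that always predicts the most common class. The sketch below computes class proportions and that majority-class baseline from a plain list of labels. The function name `class_balance` and the toy labels are hypothetical; in the notebook you would pass `y.tolist()`.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class proportions and the majority-class baseline accuracy."""
    counts = Counter(labels)
    n = len(labels)
    proportions = {cls: count / n for cls, count in sorted(counts.items())}
    # A "model" that always predicts the most frequent class is right this often.
    baseline = max(counts.values()) / n
    return proportions, baseline


# Hypothetical toy labels: 0 = malignant, 1 = benign
toy_labels = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
props, baseline = class_balance(toy_labels)
print(props)     # {0: 0.3, 1: 0.7}
print(baseline)  # 0.7
```

Any accuracy reported later is best read against this baseline rather than against 50%.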
## 4. Train-test split

In [None]:
# Hold out 20% for testing; stratify=y keeps the malignant/benign ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

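`stratify=y` asks `train_test_split` to draw the test set so that each class keeps approximately the same proportion it has in the full data. A minimal stdlib sketch of the idea follows, assuming a simple "round `test_size` times the class size" rule per class; scikit-learn's own allocation logic differs in detail, so the exact indices will not match. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Simplified allocation rule (assumption): round(test_size * class size).
        n_test = round(test_size * len(indices))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 30 + [1] * 70  # hypothetical 30/70 class split
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
print(len(train_idx), len(test_idx))               # 80 20
print(test_labels.count(0), test_labels.count(1))  # 6 14
```

Without stratification, a small test set can end up with a noticeably different class ratio by chance, which makes the test metrics harder to compare across runs.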
## 5. Standardize features

In [None]:
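# Fit the scaler on the training set only, then reuse its mean/std on the test set (avoids leakage).
# GaussianNB estimates a mean and variance per feature, so per-feature scaling leaves its posteriors
# unchanged apart from var_smoothing, which adds a fraction of the largest feature variance to all variances.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)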

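`StandardScaler` learns each feature's mean and standard deviation from the training data (`fit_transform`) and reuses those same statistics on the test data (`transform`). The sketch below mirrors that for a single feature using only the standard library; like `StandardScaler`, it uses the population standard deviation (dividing by n) and leaves a zero-variance feature unscaled. The function names `fit_scaler` and `apply_scaler` are hypothetical.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Learn mean and population std from training values (like StandardScaler.fit)."""
    mean = fmean(values)
    std = pstdev(values)
    # Zero-variance features are divided by 1 instead of 0.
    return mean, (std if std > 0 else 1.0)


def apply_scaler(values, mean, std):
    """Standardize with previously learned statistics (like StandardScaler.transform)."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 11.0]

mean, std = fit_scaler(train)          # mean = 5.0, std = 2.0
print(apply_scaler(train, mean, std))
print(apply_scaler(test, mean, std))   # [-1.0, 3.0]
```

Note that the scaled test values are not forced to have mean 0 and standard deviation 1; only the training values are. That asymmetry is exactly what keeps test-set information out of preprocessing.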
## 6. Train Gaussian Naive Bayes

In [None]:
nb = GaussianNB()
nb.fit(X_train_scaled, y_train)

# Predicted class labels, plus the predicted probability of class 1 (benign) for the ROC/PR curves
y_pred = nb.predict(X_test_scaled)
y_prob = nb.predict_proba(X_test_scaled)[:, 1]

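Gaussian Naive Bayes assumes each feature is normally distributed within each class and that features are independent given the class. Training therefore only estimates a class prior plus a per-class mean and variance for every feature; prediction adds the per-feature log densities to the log prior and normalizes across classes. The sketch below implements that idea in plain Python. It follows the `var_smoothing` rule described in the `GaussianNB` documentation (a fraction of the largest feature variance is added to every variance), but it is not scikit-learn's code, and the class name `SimpleGaussianNB` is hypothetical.

```python
import math
from collections import defaultdict


def _variance(values):
    """Population variance (divide by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes for lists of numeric feature vectors."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        n_features = len(X[0])
        # Smoothing term: a fraction of the largest per-feature variance over all training rows.
        largest_var = max(_variance([row[j] for row in X]) for j in range(n_features))
        epsilon = self.var_smoothing * largest_var

        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)

        self.classes_ = sorted(rows_by_class)
        self.params_ = {}
        for c in self.classes_:
            rows = rows_by_class[c]
            log_prior = math.log(len(rows) / len(X))
            means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            variances = [_variance([r[j] for r in rows]) + epsilon for j in range(n_features)]
            self.params_[c] = (log_prior, means, variances)
        return self

    def _joint_log_likelihood(self, row):
        """log P(class) + sum of per-feature Gaussian log densities, for each class."""
        scores = []
        for c in self.classes_:
            log_prior, means, variances = self.params_[c]
            log_lik = sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(row, means, variances)
            )
            scores.append(log_prior + log_lik)
        return scores

    def predict_proba(self, X):
        probs = []
        for row in X:
            scores = self._joint_log_likelihood(row)
            # Subtract the max before exponentiating (log-sum-exp trick) to avoid underflow.
            top = max(scores)
            exp_scores = [math.exp(s - top) for s in scores]
            total = sum(exp_scores)
            probs.append([e / total for e in exp_scores])
        return probs

    def predict(self, X):
        return [self.classes_[max(range(len(p)), key=p.__getitem__)] for p in self.predict_proba(X)]


X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 4.2], [2.8, 3.8]]
y_toy = [0, 0, 0, 1, 1, 1]
model = SimpleGaussianNB().fit(X_toy, y_toy)
print(model.predict([[1.1, 2.1], [2.9, 4.1]]))  # [0, 1]
```

On the notebook's data, `SimpleGaussianNB().fit(X_train_scaled.tolist(), y_train.tolist())` is expected to track `nb.predict(X_test_scaled)` closely, since it uses the same estimation rules, though small numerical differences are possible.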
## 7. Model Evaluation: Accuracy & Report

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=data.target_names))

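`classification_report` lists, for each class, precision (the share of predictions of that class that were correct), recall (the share of true members of that class that were found), F1 (the harmonic mean of the two), and support (the number of true samples). The sketch below computes those per-class values directly from label lists. The function name `per_class_report` and the toy labels are hypothetical; the toy variables are prefixed with `toy_` so they do not overwrite the notebook's `y_pred`.

```python
def per_class_report(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)} computed from raw labels."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = (precision, recall, f1, tp + fn)  # support = number of true samples
    return report


toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
print(toy_accuracy)  # 0.8
for cls, (prec, rec, f1, support) in per_class_report(toy_true, toy_pred).items():
    print(cls, round(prec, 3), round(rec, 3), round(f1, 3), support)
```

In the report, the `macro avg` row is the unweighted mean of these per-class values and `weighted avg` weights them by support.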
## 8. Confusion Matrix

In [None]:
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names,
            yticklabels=data.target_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

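In scikit-learn's `confusion_matrix`, row *i* holds samples whose true class is *i* and column *j* holds samples predicted as class *j*, with labels in sorted order (here 0 = malignant, 1 = benign). That is why the heatmap above puts `Actual` on the y-axis and `Predicted` on the x-axis. The counting itself is simple, as this stdlib sketch shows; the function name `confusion_counts` and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels=None):
    """Rows = actual class, columns = predicted class (same layout as sklearn)."""
    labels = sorted(set(y_true) | set(y_pred)) if labels is None else list(labels)
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[position[t]][position[p]] += 1
    return matrix


toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
toy_cm = confusion_counts(toy_true, toy_pred)
print(toy_cm)  # [[3, 1], [1, 5]]
# toy_cm[0][1] counts malignant samples (actual 0) that were predicted benign (1).
```

Reading the off-diagonal cells separately matters here, because predicting a malignant tumour as benign and the reverse are different kinds of error even though accuracy counts them the same way.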
## 9. ROC Curve & AUC

In [None]:
# Positive class = 1 (benign), matching y_prob
fpr, tpr, roc_thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6,5))
plt.plot(fpr, tpr, color="blue", label=f"AUC = {roc_auc:.2f}")
plt.plot([0,1], [0,1], color="red", linestyle="--", label="Chance")
plt.title("ROC Curve")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()

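`roc_curve` sweeps a decision threshold over the predicted scores (`y_prob`, the probability of class 1) and records the false and true positive rates at each step; `auc` then integrates that curve with the trapezoidal rule. An equivalent reading of ROC AUC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The hypothetical sketch below computes AUC both ways on toy scores to show that they agree; the function names and data are illustrative only.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


toy_true = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(trapezoid_auc(roc_points(toy_true, toy_scores)))  # both are 7/9, about 0.778
print(rank_auc(toy_true, toy_scores))
```

In the notebook, `sklearn.metrics.roc_auc_score(y_test, y_prob)` returns the same number as `auc(fpr, tpr)` in a single call.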
## 10. Precision-Recall Curve

In [None]:
precision, recall, pr_thresholds = precision_recall_curve(y_test, y_prob)
ap = average_precision_score(y_test, y_prob)

plt.figure(figsize=(6,5))
plt.plot(recall, precision, label=f"AP = {ap:.2f}", color="green")
plt.title("Precision-Recall Curve")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()

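`average_precision_score` summarizes the precision-recall curve as a step-wise sum, `AP = sum_n (R_n - R_{n-1}) * P_n`, where `P_n` and `R_n` are the precision and recall at the n-th threshold. Unlike a trapezoidal area, it does not interpolate between points. The hypothetical sketch below evaluates that formula directly on toy scores, assuming 0/1 labels with 1 as the positive class.

```python
def average_precision(y_true, scores):
    """AP = sum over thresholds of (recall gained) * precision, without interpolation."""
    total_pos = sum(1 for t in y_true if t == 1)
    ap, prev_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        selected = [t for t, s in zip(y_true, scores) if s >= threshold]
        tp = sum(selected)  # labels are 0/1, so the sum counts true positives
        precision = tp / len(selected)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


toy_true = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(average_precision(toy_true, toy_scores))  # 13/15, about 0.867
```

A classifier that scores at random has a precision-recall curve that hovers around the positive-class rate (the fraction of samples labelled 1), so that rate is the natural reference line on this plot, in the same way the diagonal is for ROC.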
## 11. Actual vs Predicted (Line Plot)

In [None]:
comparison = pd.DataFrame({"Actual": y_test.values, "Predicted": y_pred}).reset_index(drop=True)

plt.figure(figsize=(12,5))
plt.plot(comparison.index, comparison["Actual"], label="Actual", color="blue", linewidth=2)
plt.plot(comparison.index, comparison["Predicted"], label="Predicted", color="red", linestyle="--", linewidth=2, alpha=0.7)
plt.title("Actual vs Predicted Classes (Naive Bayes)")
plt.xlabel("Sample Index")
plt.ylabel("Class (0=Malignant, 1=Benign)")
plt.legend()
plt.show()

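In the line plot above, errors appear as the indices where the Actual and Predicted lines disagree; with only two class values, those points are easy to miss. Listing them directly is often clearer. The sketch below assumes the same Actual/Predicted layout as the `comparison` DataFrame but uses plain lists so it runs on its own; the function name `misclassified` is hypothetical.

```python
def misclassified(actual, predicted, names=("malignant", "benign")):
    """Return (index, actual name, predicted name) for every disagreement."""
    return [
        (i, names[a], names[p])
        for i, (a, p) in enumerate(zip(actual, predicted))
        if a != p
    ]


toy_actual = [0, 1, 1, 0, 1]
toy_predicted = [0, 1, 0, 1, 1]
for row in misclassified(toy_actual, toy_predicted):
    print(row)
# (2, 'benign', 'malignant')
# (3, 'malignant', 'benign')
```

With pandas, the same rows can be selected from the notebook's DataFrame with `comparison[comparison["Actual"] != comparison["Predicted"]]`.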
## 12. Scatter Plot: Actual vs Predicted

In [None]:
plt.figure(figsize=(8,5))
plt.scatter(comparison.index, comparison["Actual"], label="Actual", alpha=0.7, color="blue")
plt.scatter(comparison.index, comparison["Predicted"], label="Predicted", alpha=0.5, color="red", marker="x")
plt.title("Scatter Plot: Actual vs Predicted")
plt.xlabel("Sample Index")
plt.ylabel("Class (0=Malignant, 1=Benign)")
plt.legend()
plt.show()