## Using Naive Bayes

Data Loading:

The breast cancer dataset is loaded with load_breast_cancer(), and the features (X) and labels (y) are extracted. The data is then split 70/30 into training and test sets.
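
Before modelling, it can help to check the shape and class balance of the data. The snippet below is a minimal sketch of such a check; it only uses the standard scikit-learn loader and NumPy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the dataset and pull out features and labels
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)            # (569, 30): 569 samples, 30 continuous features
print(data.target_names)  # ['malignant' 'benign']
print(np.bincount(y))     # [212 357]: 212 malignant (0), 357 benign (1)
```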
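
Data Preprocessing:

- MinMaxScaler rescales every feature to the [0, 1] range. ComplementNB requires non-negative input, so it needs this step; GaussianNB does not require scaling but is trained on the same scaled data here.
- Binarizer converts the scaled features to 0/1 values for BernoulliNB and MultinomialNB. With the default threshold of 0.0, every value above a feature's minimum becomes 1, so nearly all binarized features equal 1.
- KBinsDiscretizer bins the raw features into five ordinal categories for CategoricalNB.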
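
The effect of the binarization threshold on MinMax-scaled data can be checked directly. This is a minimal sketch, assuming the same 70/30 split (random_state=42) as the notebook code; the 0.5 threshold is an illustrative choice and not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Binarizer, MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scale each feature to [0, 1] using statistics from the training split only
X_train_scaled = MinMaxScaler().fit_transform(X_train)

# Default threshold (0.0): anything above a feature's minimum becomes 1
ones_default = Binarizer().fit_transform(X_train_scaled).mean()

# Illustrative alternative threshold (an assumption, not from the notebook)
ones_half = Binarizer(threshold=0.5).fit_transform(X_train_scaled).mean()

print(f"Share of ones, threshold=0.0: {ones_default:.3f}")
print(f"Share of ones, threshold=0.5: {ones_half:.3f}")
```

If almost every binarized value is 1, the features carry little information, which is one plausible reason for a Bernoulli model to fall back on predicting the majority class.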
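
Model Training:

Each Naive Bayes model is trained on the preprocessed data that matches its input requirements: BernoulliNB and MultinomialNB use the binarized data, CategoricalNB uses the discretized data, and GaussianNB and ComplementNB use the scaled data.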
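
One way to keep each model paired with its preprocessing is a scikit-learn pipeline. The sketch below is an alternative arrangement, not the notebook's own loop; the `pipelines` and `scores` names and the `min_categories=5` setting are assumptions added for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, ComplementNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each pipeline bundles a model with the preprocessing it is trained on
pipelines = {
    "GaussianNB": make_pipeline(MinMaxScaler(), GaussianNB()),
    "ComplementNB": make_pipeline(MinMaxScaler(), ComplementNB()),
    "CategoricalNB": make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform"),
        # min_categories=5 (assumption) reserves a slot for every bin
        CategoricalNB(min_categories=5),
    ),
}

# Fit on the training split and record test accuracy per pipeline
scores = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)
    print(f"{name}: {scores[name]:.4f}")
```

Because the preprocessing is fitted inside the pipeline, the test split is only ever transformed with statistics learned from the training split.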

Evaluation:

Each model is evaluated on the test set using accuracy and a classification report. The results are stored in a dictionary keyed by model name for comparison.
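
The evaluation step can be sketched for a single model as follows. This is a minimal example using GaussianNB on the unscaled features; the `results` layout mirrors the dictionary described above, and `zero_division=0` is an added assumption that avoids warnings when a class is never predicted.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Fit one model and predict on the held-out test split
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Store the metrics under the model's name so models can be compared later
results = {}
results["GaussianNB"] = {
    "accuracy": accuracy_score(y_test, y_pred),
    "classification_report": classification_report(
        y_test, y_pred, target_names=data.target_names, zero_division=0
    ),
}

print(f"Accuracy: {results['GaussianNB']['accuracy']:.4f}")
print(results["GaussianNB"]["classification_report"])
```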

Model Selection:

The model with the highest test accuracy is selected as the best model.
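
Picking the winner by test accuracy reuses the test set for selection. A common alternative, sketched here under the assumption that 5-fold cross-validation on the training split is acceptable, is to select by mean cross-validated accuracy and keep the test set for a final check. The `candidates` and `cv_scores` names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import ComplementNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Two candidate models; scaling lives inside the pipeline so it is
# refitted on each training fold
candidates = {
    "GaussianNB": GaussianNB(),
    "ComplementNB": make_pipeline(MinMaxScaler(), ComplementNB()),
}

# Mean 5-fold accuracy on the training split only
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Same selection rule as the notebook: take the key with the highest score
best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print(f"Selected by cross-validation: {best_name}")
```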
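
Insights:

- GaussianNB models each feature as normally distributed within a class and often performs well on continuous data.
- BernoulliNB is suited to binary (0/1) features. In this run it predicted benign for every test sample (accuracy 0.6316).
- MultinomialNB is designed for count data such as word frequencies.
- ComplementNB is a variant of MultinomialNB designed for imbalanced datasets.
- CategoricalNB is intended for categorical features. On the discretized features it reached an accuracy of 0.9357 in this run.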
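
To illustrate the point about count data, here is a small toy example with MultinomialNB. The counts and labels are made up for illustration and are not taken from the breast cancer dataset.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical count features (e.g. how often three terms occur per document)
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 3, 2],
])
y_counts = np.array([0, 0, 1, 1])

# Default Laplace smoothing (alpha=1.0)
model = MultinomialNB().fit(X_counts, y_counts)

# The first sample only uses term 0 (typical of class 0),
# the second only uses term 1 (typical of class 1)
pred = model.predict(np.array([[1, 0, 0], [0, 2, 0]]))
print(pred)  # [0 1]
```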

In [None]:
import warnings

warnings.filterwarnings("ignore", category=DeprecationWarning, module=".*pandas.*")

In [6]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, CategoricalNB, ComplementNB, GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler, Binarizer
from sklearn.metrics import classification_report, accuracy_score

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Preprocessing for different Naive Bayes algorithms
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

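# Binarize the scaled features for BernoulliNB and MultinomialNB.
# Note: with the default threshold of 0.0, every value above a feature's
# minimum becomes 1, so nearly all binarized features equal 1.
binarizer = Binarizer()
X_train_binarized = binarizer.fit_transform(X_train_scaled)
X_test_binarized = binarizer.transform(X_test_scaled)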

# Initialize models
models = {
    "BernoulliNB": BernoulliNB(),
    "CategoricalNB": CategoricalNB(),
    "ComplementNB": ComplementNB(),
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB()
}

from sklearn.preprocessing import KBinsDiscretizer

# Preprocessing for CategoricalNB
discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
X_train_discretized = discretizer.fit_transform(X_train)
X_test_discretized = discretizer.transform(X_test)

# Train and evaluate each model on its matching preprocessed data
results = {}  # maps model name -> accuracy and classification report
for name, model in models.items():
    if name == "CategoricalNB":
        model.fit(X_train_discretized, y_train)
        y_pred = model.predict(X_test_discretized)
    elif name in ["BernoulliNB", "MultinomialNB"]:
        model.fit(X_train_binarized, y_train)
        y_pred = model.predict(X_test_binarized)
    else:
        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)
    
    accuracy = accuracy_score(y_test, y_pred)
    # zero_division=0 reports 0.00 without a warning when a class is never predicted
    report = classification_report(y_test, y_pred, target_names=data.target_names, zero_division=0)
    results[name] = {
        "accuracy": accuracy,
        "classification_report": report
    }

# Display results
for name, result in results.items():
    print(f"Model: {name}")
    print(f"Accuracy: {result['accuracy']:.4f}")
    print("Classification Report:")
    print(result['classification_report'])
    print("-" * 50)

# Select the best model
best_model = max(results, key=lambda x: results[x]["accuracy"])
print(f"The best model is {best_model} with an accuracy of {results[best_model]['accuracy']:.4f}")

Model: BernoulliNB
Accuracy: 0.6316
Classification Report:
              precision    recall  f1-score   support

   malignant       0.00      0.00      0.00        63
      benign       0.63      1.00      0.77       108

    accuracy                           0.63       171
   macro avg       0.32      0.50      0.39       171
weighted avg       0.40      0.63      0.49       171

--------------------------------------------------
Model: CategoricalNB
Accuracy: 0.9357
Classification Report:
              precision    recall  f1-score   support

   malignant       0.92      0.90      0.91        63
      benign       0.94      0.95      0.95       108

    accuracy                           0.94       171
   macro avg       0.93      0.93      0.93       171
weighted avg       0.94      0.94      0.94       171

--------------------------------------------------
Model: ComplementNB
Accuracy: 0.8480
Classification Report:
              precision    recall  f1-score   support

   malign

