# Naive Bayes

## Problem Type
**Naive Bayes** is primarily used for:
- **Classification** problems
- **Supervised** learning

### How Naive Bayes Works
- **Bayes' Theorem:**
  - Uses Bayes' theorem to calculate the probability of a class given the feature values.
  - Assumes features are conditionally independent given the class: once the class is known, the value of one feature carries no information about the value of any other feature.
- **Types of Naive Bayes Classifiers:**
  - **Gaussian Naive Bayes:** Assumes that features follow a Gaussian distribution (used for continuous data).
  - **Multinomial Naive Bayes:** Assumes feature vectors represent frequencies or counts (used for discrete data); typically used for text classification.
  - **Bernoulli Naive Bayes:** Used when features are binary (e.g., the presence or absence of a word in text classification).
- **Decision rule:**
  - Classifies instances by selecting the class with the highest posterior probability given the feature values.
- **Training process:**
  - Involves calculating the prior probabilities of each class and the likelihood of each feature given the class.
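
The training process and decision rule described above can be sketched from scratch for the Gaussian case. This is a minimal illustration, not a reference implementation: the helper `log_posterior` and the variable names are hypothetical, and the variance floor is an assumption chosen to mirror scikit-learn's default `var_smoothing`. The sketch fits on the full Iris dataset and compares its predictions with `GaussianNB`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Training: class priors P(y) plus per-class, per-feature mean and variance
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])
# Variance floor (assumption: same form as scikit-learn's var_smoothing default)
variances += 1e-9 * X.var(axis=0).max()


def log_posterior(X_new):
    """Unnormalized log P(y | x) = log P(y) + sum_i log P(x_i | y)."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X_new[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    # Independence assumption: per-feature log-likelihoods are simply summed
    return np.log(priors)[None, :] + log_lik.sum(axis=2)


# Decision rule: pick the class with the highest posterior
manual_pred = classes[np.argmax(log_posterior(X), axis=1)]

sk_model = GaussianNB().fit(X, y)
sk_pred = sk_model.predict(X)
print("Agreement with GaussianNB:", np.mean(manual_pred == sk_pred))
```

Working in log space avoids underflow when many small per-feature probabilities are multiplied together.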

### Key Tuning Parameters
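- **`var_smoothing` (for Gaussian Naive Bayes):**
  - **Description:** Portion of the largest variance of all features that is added to every feature variance for calculation stability.
  - **Impact:** Keeps near-zero variances from causing division-by-zero errors or numerical instability; larger values smooth the per-class distributions more.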
  - **Default:** `1e-9`.
- **`alpha` (for Multinomial and Bernoulli Naive Bayes):**
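  - **Description:** Additive smoothing parameter that prevents zero probabilities (`alpha=1` is Laplace smoothing; values between 0 and 1 are Lidstone smoothing).
  - **Impact:** Keeps a feature that never occurs with a class in the training set (e.g., an unseen word) from forcing that class's probability to zero, which matters especially in text classification.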
  - **Default:** `1.0`.
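- **`fit_prior` (for Multinomial and Bernoulli Naive Bayes):**
  - **Description:** Whether to learn class prior probabilities from the training data (`True`) or use uniform priors (`False`).
  - **Impact:** Learned priors bias predictions toward frequent classes; uniform priors remove that bias, which can help on imbalanced datasets. Gaussian Naive Bayes has no `fit_prior`; it accepts an explicit `priors` array instead.
  - **Default:** `True`.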
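
The effect of `alpha` and `fit_prior` can be checked on a small example. The sketch below uses a made-up word-count matrix (the data and variable names are illustrative assumptions) and recomputes the smoothed likelihoods by hand to compare them with what `MultinomialNB` stores in `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix (made-up data): rows are documents, columns are terms
X = np.array([
    [2, 1, 0],
    [3, 0, 0],
    [0, 1, 4],
    [0, 2, 3],
    [0, 0, 5],
])
y = np.array([0, 0, 1, 1, 1])

alpha = 1.0
model = MultinomialNB(alpha=alpha, fit_prior=True).fit(X, y)

# Smoothed likelihood:
# (term count in class + alpha) / (total count in class + alpha * n_terms)
counts = np.array([X[y == c].sum(axis=0) for c in model.classes_])
manual = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
print("Matches feature_log_prob_:", np.allclose(np.exp(model.feature_log_prob_), manual))

# Term 2 never occurs in class 0, yet its smoothed probability stays above zero
print("P(term 2 | class 0):", np.exp(model.feature_log_prob_)[0, 2])

# fit_prior=True learns priors from class frequencies; False uses uniform priors
uniform = MultinomialNB(alpha=alpha, fit_prior=False).fit(X, y)
print("Learned priors:", np.exp(model.class_log_prior_))
print("Uniform priors:", np.exp(uniform.class_log_prior_))
```

Without smoothing, the zero count for term 2 in class 0 would zero out the class-0 likelihood of any document containing that term, regardless of its other words.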

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Simple and fast to train                              | Strong assumption of conditional feature independence, which rarely holds in real data |
| Few parameters to estimate, so it can work with relatively small datasets | Correlated features are effectively counted more than once, which can hurt accuracy and distort predicted probabilities |
| Performs well with high-dimensional data              | Per-class estimates become unreliable when a class has very few samples or the data is noisy |
| Effective for text classification and spam filtering  | Gaussian Naive Bayes assumes normally distributed features within each class, which may not hold |
| Robust to irrelevant features                         | Smoothing parameters like `alpha` require careful tuning in some cases |
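
The cost of the independence assumption can be seen in a small experiment. The sketch below uses synthetic data (an assumption made purely for illustration) with equal class sizes and appends an exact copy of every feature, the most extreme form of correlation. Because each duplicated feature's evidence is counted twice, the predicted probabilities move toward 0 and 1 even though no new information was added.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 100  # samples per class, so the class priors are equal

# Two overlapping Gaussian classes in two dimensions (synthetic data)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n, 2)),
    rng.normal(1.0, 1.0, size=(n, 2)),
])
y = np.repeat([0, 1], n)

# Append an exact copy of each feature: perfectly correlated columns
X_dup = np.hstack([X, X])

# Mean probability assigned to the predicted class, before and after duplication
conf_base = GaussianNB().fit(X, y).predict_proba(X).max(axis=1).mean()
conf_dup = GaussianNB().fit(X_dup, y).predict_proba(X_dup).max(axis=1).mean()

print(f"Mean top-class probability, original features:   {conf_base:.3f}")
print(f"Mean top-class probability, duplicated features: {conf_dup:.3f}")
```

With equal priors the duplicated model roughly doubles each sample's log-odds, so the ranking of classes is unchanged but the confidence is inflated.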

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
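  - **Good Value:** Higher is better, but only relative to a baseline such as always predicting the most frequent class; there is no universal threshold.
  - **Bad Value:** At or below the majority-class baseline (for example, 0.90 accuracy is poor if 95% of samples belong to one class).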
- **Precision (Classification):**
  - **Description:** Proportion of positive identifications that were actually correct.
  - **Good Value:** Higher values indicate fewer false positives; important when false positives are costly or the positive class is rare.
  - **Bad Value:** Low values suggest many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives that were correctly identified.
  - **Good Value:** Higher values indicate fewer false negatives; important in recall-sensitive applications.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall.
  - **Good Value:** Higher values indicate a good balance between Precision and Recall.
  - **Bad Value:** Low values mean that Precision, Recall, or both are low, i.e., many false positives or many false negatives.
- **AUC-ROC (Classification):**
  - **Description:** Measures the ability of the model to distinguish between classes across all thresholds.
  - **Good Value:** Values closer to 1 indicate strong separability between classes.
  - **Bad Value:** Values near 0.5 suggest random guessing.
- **Log Loss (Classification):**
  - **Description:** Average negative log of the probability the model assigns to the true class; it penalizes confident wrong predictions heavily.
  - **Good Value:** Lower values indicate better-calibrated and more accurate probabilities; 0 is a perfect score.
  - **Bad Value:** Values near or above log(K) for K balanced classes, which is the score of a model that predicts uniform probabilities.
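
The metrics above can all be computed with `sklearn.metrics`. The following is a minimal sketch on the Iris dataset; the stratified split, macro averaging, and one-vs-rest AUC are assumptions chosen for this three-class example rather than the only reasonable choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, f1_score, log_loss,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# Stratify so every class appears in the test set (assumption for this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)          # hard labels for threshold metrics
y_proba = model.predict_proba(X_test)   # class probabilities for AUC and log loss

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision (macro)": precision_score(y_test, y_pred, average="macro"),
    "recall (macro)": recall_score(y_test, y_pred, average="macro"),
    "f1 (macro)": f1_score(y_test, y_pred, average="macro"),
    "roc_auc (one-vs-rest)": roc_auc_score(y_test, y_proba, multi_class="ovr"),
    "log_loss": log_loss(y_test, y_proba),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, Precision, Recall, and F1 use the predicted labels, while AUC-ROC and Log Loss need the predicted probabilities from `predict_proba`.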



In [None]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

In [None]:
# Load the Iris dataset
iris = load_iris()

# Convert to a DataFrame for easier exploration
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df['species'] = df['target'].apply(lambda x: iris.target_names[x])

df.head()

In [None]:
# Features (X) and target (y)
X = iris.data
y = iris.target

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Display the shapes of the splits
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")

In [None]:
# Initialize the Gaussian Naive Bayes model (1e-9 is the default var_smoothing)
gnb = GaussianNB(var_smoothing=1e-9)

# Train the model
gnb.fit(X_train, y_train)

In [None]:
# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Generate the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Display the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))