# Naive Bayes Classifier

### **1. GaussianNB()**

-   **What**: **Gaussian Naive Bayes** is used when the features are
    continuous and assumed to follow a **Gaussian (Normal)
    distribution**.

-   **How it works**:

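    -   The model assumes that, within each class, every feature follows
        a normal distribution (bell curve).
    -   For each class, it estimates the mean and variance of every
        feature from the training data. To classify a new sample, it
        evaluates the Gaussian density of each feature value under each
        class, combines these with the class prior, and picks the most
        probable class.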

-   **When to use**:

    -   Use **GaussianNB** when your data consists of **continuous
        features** that are roughly normally distributed within each
        class (e.g., height, weight, or test scores).

-   **Example**:

    ``` python
    from sklearn.naive_bayes import GaussianNB
    gnb = GaussianNB()
    gnb.fit(X_train, y_train)
    ```
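
The "how it works" steps above can be sketched from scratch with NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy height/weight data, and the small variance floor of `1e-9` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances away from zero (an assumption of this sketch).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log prior plus summed Gaussian log-density."""
    # Broadcasting gives arrays of shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    log_posterior = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_posterior, axis=1)]

# Toy data: height (cm) and weight (kg) for two made-up groups.
X_toy = np.array([[150.0, 50.0], [155.0, 52.0], [160.0, 55.0],
                  [180.0, 80.0], [185.0, 85.0], [190.0, 90.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_toy, y_toy)
preds = predict_gaussian_nb(np.array([[152.0, 51.0], [188.0, 88.0]]), *params)
print(preds)
```

Working in log space avoids numerical underflow when many small per-feature densities are multiplied together.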

### **2. MultinomialNB()**

-   **What**: **Multinomial Naive Bayes** is used for **discrete count
    data**, modeled with the **multinomial distribution**. (Unordered
    categorical features are a better fit for scikit-learn's
    `CategoricalNB`.)

-   **How it works**:

    -   It’s commonly used for text classification, where the features
        are **word counts** (e.g., the number of times a word appears in
        a document).
    -   For each class, it estimates how likely each feature (e.g., a
        word) is from its smoothed counts in that class's training
        samples, then scores a new sample by combining those
        likelihoods with the class prior.

-   **When to use**:

    -   **MultinomialNB** is great for datasets where features represent
        **counts** or **frequency** (e.g., word count vectors in NLP
        tasks).

-   **Example**:

    ``` python
    from sklearn.naive_bayes import MultinomialNB
    mnb = MultinomialNB()
    mnb.fit(X_train, y_train)
    ```
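
As a runnable illustration of the word-count use case, the sketch below pairs `CountVectorizer` with `MultinomialNB` on a tiny made-up corpus. The documents, the labels, and the choice of `alpha=1.0` are assumptions for demonstration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training documents and labels (illustrative only).
train_docs = [
    "free prize win money",
    "win cash prize now",
    "meeting schedule for project",
    "project report due monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Turn each document into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(train_docs)

# alpha=1.0 is Laplace smoothing, so a word unseen in a class
# does not force that class's probability to zero.
mnb = MultinomialNB(alpha=1.0)
mnb.fit(X_counts, train_labels)

# New documents must be transformed with the same fitted vocabulary.
X_new = vectorizer.transform(["win free money", "project meeting monday"])
pred = mnb.predict(X_new)
print(pred)
```

Reusing the fitted vectorizer for new text matters: calling `fit_transform` again would build a different vocabulary and misalign the feature columns.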

### **3. BernoulliNB()**

-   **What**: **Bernoulli Naive Bayes** is used when the features are
    binary (0 or 1), i.e., they represent the presence or absence of a
    particular feature.

-   **How it works**:

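    -   The model assumes that each feature follows a **Bernoulli
        distribution** (present or absent, encoded as 1 or 0).
    -   Unlike **MultinomialNB**, it explicitly penalizes the absence of
        a feature that is typical of a class, rather than ignoring
        features that do not occur.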

-   **When to use**:

    -   **BernoulliNB** is good when the features represent binary data,
        like the presence or absence of a word in a text document (e.g.,
        binary bag-of-words).

-   **Example**:

    ``` python
    from sklearn.naive_bayes import BernoulliNB
    bnb = BernoulliNB()
    bnb.fit(X_train, y_train)
    ```
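
The presence/absence idea can be seen in a small sketch that feeds count data to `BernoulliNB` and lets its `binarize` threshold convert the counts to 0/1. The four-word vocabulary and the counts are invented for the example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Columns stand for a hypothetical vocabulary: ["free", "win", "meeting", "report"].
X_counts = np.array([
    [2, 1, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham (made-up labels)

# binarize=0.0 maps every value greater than 0 to 1, so only presence matters.
bnb = BernoulliNB(binarize=0.0)
bnb.fit(X_counts, y)

pred = bnb.predict(np.array([[5, 0, 0, 0], [0, 0, 0, 1]]))
print(pred)

# A word seen five times and a word seen once give identical probabilities.
same_proba = np.allclose(
    bnb.predict_proba(np.array([[5, 0, 0, 0]])),
    bnb.predict_proba(np.array([[1, 0, 0, 0]])),
)
print(same_proba)
```

If the features are already 0/1, passing `binarize=None` skips the thresholding step.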

### **4. ComplementNB()**

-   **What**: **Complement Naive Bayes** is a variant of
    **MultinomialNB** and is designed to handle **imbalanced datasets**
    better.

-   **How it works**:

    -   It corrects the bias of **MultinomialNB** toward majority
        classes by estimating each class's feature statistics from the
        **complement** of that class (the training samples of all
        *other* classes), then choosing the class whose complement
        matches the new sample worst.

-   **When to use**:

    -   **ComplementNB** is helpful when your data has imbalanced
        classes, typically with count features such as word counts.

-   **Example**:

    ``` python
    from sklearn.naive_bayes import ComplementNB
    cnb = ComplementNB()
    cnb.fit(X_train, y_train)
    ```
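
To make the "complement" idea concrete, here is a rough NumPy sketch of the weight calculation: each class's weights come from the smoothed feature counts of all other classes, and the predicted class is the one with the lowest score. The helper names, the toy counts, and the omission of weight normalization are assumptions of this sketch; it is not scikit-learn's exact code.

```python
import numpy as np

def complement_weights(X, y, alpha=1.0):
    """Log-probabilities of each feature, estimated from every class except c."""
    classes = np.unique(y)
    weights = []
    for c in classes:
        # Smoothed feature counts over all samples NOT in class c.
        comp_counts = X[y != c].sum(axis=0) + alpha
        weights.append(np.log(comp_counts / comp_counts.sum()))
    return classes, np.array(weights)

def predict_complement(X, classes, weights):
    """Choose the class whose complement explains the sample worst (lowest score)."""
    scores = X @ weights.T  # shape (n_samples, n_classes)
    return classes[np.argmin(scores, axis=1)]

# Imbalanced toy data: five samples of class 0, one sample of class 1.
X_toy = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [4, 0, 0],
    [3, 1, 1],
    [2, 0, 1],
    [0, 3, 1],
])
y_toy = np.array([0, 0, 0, 0, 0, 1])

classes, weights = complement_weights(X_toy, y_toy)
preds = predict_complement(np.array([[0, 2, 0], [3, 0, 0]]), classes, weights)
print(preds)
```

Because the minority class's weights are estimated from the five majority samples rather than its single own sample, its estimates rest on more data, which is the motivation for the complement trick.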

-   **Worked example (GaussianNB): Iris**

    ``` python
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load data
    iris = load_iris()
    X = iris.data
    y = iris.target

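    # Train-test split first, so the scaler never sees the test rows
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scale: fit on the training split only, then apply to both splits
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)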

    # Model
    model = GaussianNB()
    model.fit(X_train, y_train)

    # Predict & evaluate
    y_pred = model.predict(X_test)
    print(f"Iris Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
    plt.title("Confusion Matrix - Iris")
    plt.show()
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
    ```

-   **Worked example (GaussianNB): Breast Cancer Wisconsin**

    ``` python
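    # Reuses the imports from the Iris example.
    # Assumes a local CSV with an 'id' column, a 'diagnosis' column (M/B),
    # and numeric feature columns.
    df = pd.read_csv('breast_cancer.csv')
    # Drop the ID column and any unnamed (empty or index) columns
    df = df.loc[:, ~df.columns.str.contains(r'^Unnamed|^id$', case=False)]
    df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

    X = df.drop('diagnosis', axis=1)
    y = df['diagnosis']

    # Split first, then fit the scaler on the training rows only
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)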

    model = GaussianNB()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print(f"Breast Cancer Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues', xticklabels=['Benign', 'Malignant'], yticklabels=['Benign', 'Malignant'])
    plt.title("Confusion Matrix - Breast Cancer")
    plt.show()
    print(classification_report(y_test, y_pred, target_names=['Benign', 'Malignant']))
    ```

-   **Worked example (GaussianNB): Titanic**

    ``` python
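    # Titanic: reuses the imports from the Iris example and assumes a
    # local CSV containing at least the columns selected below
    df = pd.read_csv('titanic.csv')

    # Example preprocessing (adjust based on your CSV):
    # keep the needed columns, drop rows with missing values, and copy
    # so the later assignment does not trigger SettingWithCopyWarning
    df = df[['Pclass', 'Sex', 'Age', 'Fare', 'Survived']].dropna().copy()
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

    X = df.drop('Survived', axis=1)
    y = df['Survived']

    # Train-test split first, so the scaler is fit on training rows only
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scale
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)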

    # Naive Bayes
    model = GaussianNB()
    model.fit(X_train, y_train)

    # Predict & evaluate
    y_pred = model.predict(X_test)
    print(f"Titanic Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap='Blues', xticklabels=['Died', 'Survived'], yticklabels=['Died', 'Survived'])
    plt.title("Confusion Matrix - Titanic")
    plt.show()
    print(classification_report(y_test, y_pred, target_names=['Died', 'Survived']))
    ```