# Module 1: Introduction to Scikit-Learn

## Section 2: Supervised Learning Algorithms

### Part 13: Naive Bayes Classifiers

In this section, we will explore Naive Bayes classifiers, a family of simple yet powerful supervised learning algorithms based on Bayes' theorem. Naive Bayes classifiers are widely used for classification tasks and are particularly effective when dealing with high-dimensional data.

### 13.1 Understanding Naive Bayes Classifiers

Naive Bayes classifiers are probabilistic models that use Bayes' theorem to make predictions. They assume that the features are conditionally independent of each other given the class label. This assumption simplifies the computation and makes Naive Bayes classifiers computationally efficient.

Naive Bayes classifiers calculate the probability of each class label given the observed feature values and select the label with the highest probability as the predicted class.
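Written out as a formula, the decision rule looks roughly as follows. The symbols are introduced here for illustration and are not used elsewhere in this section: y is a class label and x_1, ..., x_n are the observed feature values. The denominator does not depend on y, so it can be dropped when comparing classes.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over individual features is where the conditional independence assumption enters.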

The conditional independence assumption rarely holds exactly in real datasets. Even so, Naive Bayes classifiers often perform well in practice, because a correct prediction only requires the true class to receive the highest score, not an accurate probability estimate.

With imbalanced datasets, Naive Bayes classifiers may produce models biased toward the majority class, because the class priors are estimated from the class frequencies in the training data.
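As a minimal sketch of how class imbalance shows up in the model, the snippet below fits `GaussianNB` on a made-up dataset with 95 samples of one class and 5 of the other, then refits with uniform priors passed through the `priors` parameter. The data and variable names are assumptions chosen for illustration; whether overriding the priors is appropriate depends on the problem.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical imbalanced data: 95 samples of class 0, 5 samples of class 1
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(95, 2)),
    rng.normal(1.0, 1.0, size=(5, 2)),
])
y = np.array([0] * 95 + [1] * 5)

# Default behavior: priors are estimated from the class frequencies
default_nb = GaussianNB().fit(X, y)
print("Estimated priors:", default_nb.class_prior_)

# Alternative: supply uniform priors so the majority class is not favored
uniform_nb = GaussianNB(priors=[0.5, 0.5]).fit(X, y)
print("Supplied priors:", uniform_nb.class_prior_)

# Count how often each model predicts the minority class
print("Minority predictions (default):", int((default_nb.predict(X) == 1).sum()))
print("Minority predictions (uniform):", int((uniform_nb.predict(X) == 1).sum()))
```

Both models learn the same per-class feature distributions; only the prior term differs, so the uniform-prior model predicts the minority class at least as often as the default one.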

The three main variants of Naive Bayes classifiers are:
- Gaussian Naive Bayes: Assumes continuous features follow a Gaussian distribution. Suitable for real-valued data.
- Multinomial Naive Bayes: Designed for discrete data, such as text documents with word counts. It models the probability of observing specific counts of discrete features.
- Bernoulli Naive Bayes: Suited for binary or Boolean features, where each feature is a binary variable, representing presence or absence.

It is important to choose the appropriate Naive Bayes classifier based on the nature of the features in your dataset.

### 13.2 Training and Evaluation

To train a Naive Bayes classifier, we need a labeled dataset with the target variable and the corresponding feature values. The model learns the probabilities of the feature values given each class label from the training data.
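To make "learning the probabilities" concrete, here is a small sketch using `GaussianNB` on a hand-made, one-feature dataset (the data is an assumption for illustration). After fitting, the estimated class priors and per-class feature means can be read from the fitted attributes `class_prior_` and `theta_`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny hypothetical dataset: one feature, two classes
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB().fit(X, y)

print(model.classes_)      # class labels seen during training
print(model.class_prior_)  # estimated prior probability of each class
print(model.theta_)        # per-class mean of each feature
```

Here each class makes up half of the data, so both priors are 0.5, and the learned means are 2.0 for class 0 and 8.0 for class 1.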

Once trained, we can evaluate the model's performance using evaluation metrics suitable for classification tasks, such as accuracy, precision, recall, F1-score, or area under the ROC curve (AUC-ROC).
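The metrics listed above are all available in `sklearn.metrics`. The following sketch computes them for a Gaussian Naive Bayes model on a synthetic binary dataset from `make_classification`. The dataset and its parameters are assumptions for illustration, not part of the examples in this section.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem (illustrative only)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1-score per class, plus overall accuracy
print(classification_report(y_test, y_pred))

# AUC-ROC needs a score per sample; use the predicted probability of class 1
y_score = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_score)
print(f"AUC-ROC: {auc:.2f}")
```

Note that AUC-ROC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is used for it.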

#### Gaussian Naive Bayes Example

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions

# Generate three classes of 100 samples each, drawn from unit-variance
# Gaussians centered at (2, 2, 2), (4, 4, 4), and (6, 6, 6)
np.random.seed(0)
class_samples = 100
class1_attributes = np.random.randn(class_samples, 3) + np.array([2, 2, 2])
class1_labels = np.zeros(class_samples)
class2_attributes = np.random.randn(class_samples, 3) + np.array([4, 4, 4])
class2_labels = np.ones(class_samples)
class3_attributes = np.random.randn(class_samples, 3) + np.array([6, 6, 6])
class3_labels = 2 * np.ones(class_samples)

# Combine the classes
X = np.vstack((class1_attributes, class2_attributes, class3_attributes))
y = np.hstack((class1_labels, class2_labels, class3_labels))

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Plot the test data colored by true label (left) and by predicted label (right)
fig = plt.figure(figsize=(8, 4))
ax1 = fig.add_subplot(121, projection='3d')
ax1.scatter(X_test[:, 0], X_test[:, 1], X_test[:, 2], c=y_test, cmap=plt.cm.Set1, edgecolor='k')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.set_zlabel('Feature 3')
ax1.set_title('Original Test Data')
ax2 = fig.add_subplot(122, projection='3d')
ax2.scatter(X_test[:, 0], X_test[:, 1], X_test[:, 2], c=y_pred, cmap=plt.cm.Set1, edgecolor='k')
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.set_zlabel('Feature 3')
ax2.set_title('Test Data with Predictions')
plt.tight_layout()
plt.show()
```

In this example, we generate a custom dataset with three classes, each forming a cluster over three continuous attributes, and split it into training and testing sets. A Gaussian Naive Bayes classifier is trained on the training data and used to predict class labels for the test data. The classifier achieved an accuracy of 92%, so it assigns most test points to the correct class. The 3D scatter plots show the test data colored by true label and by predicted label. Despite its simplicity, Gaussian Naive Bayes handles this multi-class dataset of continuous attributes well, which illustrates its suitability for classification tasks involving continuous data with multiple classes.

#### Multinomial Naive Bayes Example

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Sample text data
corpus = [
    "this is a positive sentence",
    "negative sentiment in this text",
    "we have a neutral document here",
    "positive feedback is always appreciated",
    "negative comments are not welcome",
    "neutral statements are neither good nor bad"
]
# Labels: 0 = negative, 1 = positive, 2 = neutral
labels = [1, 0, 2, 1, 0, 2]

# Split the documents into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(corpus, labels, test_size=0.3, random_state=42)

# Convert text to word-count features; fit the vocabulary on the training set only
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train a Multinomial Naive Bayes classifier and predict on the test set
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
y_pred = clf.predict(X_test_vec)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

In this example, we start with a collection of text documents and their corresponding labels. We split the data into training and testing sets, then use the CountVectorizer to convert the text data into numerical features. Finally, we train a Multinomial Naive Bayes classifier on the training data and evaluate its accuracy on the test data. Because the corpus contains only six documents, the test set holds just two of them, so the reported accuracy is illustrative rather than a meaningful estimate.

This example demonstrates how to use Multinomial Naive Bayes for text classification tasks, where the input features are counts of words or other discrete items.
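To see what the word-count features look like, the sketch below runs `CountVectorizer` on two short made-up documents (an assumption for illustration) and prints the learned vocabulary and the resulting count matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical documents
docs = ["good good movie", "bad movie"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each word to its column index (assigned alphabetically)
print(vectorizer.vocabulary_)

# Each row is a document, each column a word count
print(counts.toarray())
# [[0 2 1]
#  [1 0 1]]
```

The columns correspond to "bad", "good", and "movie", so the first document contains "good" twice and "movie" once. These counts are the discrete features that Multinomial Naive Bayes models.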

#### Bernoulli Naive Bayes Example

```python
from sklearn.datasets import load_wine
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert the feature values to binary: 1 if above the per-feature mean of
# the training set, 0 otherwise (the same thresholds are applied to the test set)
threshold = X_train.mean(axis=0)
X_train_binary = (X_train > threshold).astype(int)
X_test_binary = (X_test > threshold).astype(int)

# Train a Bernoulli Naive Bayes classifier
clf = BernoulliNB()
clf.fit(X_train_binary, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test_binary)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

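In this example, we load the "wine" dataset, which contains features related to the chemical composition of wines. We convert each feature to a binary value by comparing it against that feature's mean in the training set: values above the mean become 1, all others become 0. The same training-set thresholds are applied to the test set so that no information from the test data influences preprocessing. Then, we train a Bernoulli Naive Bayes classifier to predict the wine classes from the binary features and calculate its accuracy on the test set.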
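The thresholding step can be checked by hand on a tiny array. The sketch below uses made-up values (an assumption for illustration) with two features, whose training means are 3 and 20.

```python
import numpy as np

# Hypothetical training and test data with two features
X_train = np.array([[1.0, 10.0],
                    [3.0, 30.0],
                    [5.0, 20.0]])
X_test = np.array([[4.0, 15.0]])

# Per-feature mean of the training set: [3.0, 20.0]
threshold = X_train.mean(axis=0)

# 1 where the value is strictly above the threshold, 0 otherwise
X_train_binary = (X_train > threshold).astype(int)
X_test_binary = (X_test > threshold).astype(int)

print(X_train_binary)
# [[0 0]
#  [0 1]
#  [1 0]]
print(X_test_binary)
# [[1 0]]
```

Values equal to the mean map to 0 because the comparison is strict.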

### 13.3 Summary

Naive Bayes classifiers are a family of probabilistic machine learning algorithms commonly used for classification tasks. They are based on Bayes' theorem and the "naive" assumption of feature independence, which simplifies calculations. Despite this simplification, they often perform surprisingly well in practice, especially when dealing with high-dimensional datasets.

Naive Bayes classifiers are efficient, have low computational requirements, and work well even with limited training data. They find applications in spam detection, sentiment analysis, document classification, and more, where they can provide competitive accuracy with minimal tuning. However, the "naive" independence assumption may not hold in all real-world datasets, impacting their performance.