<img src="./images/banner.png" width="800">

# Naive Bayes Classifiers

Naive Bayes is a popular machine learning algorithm used for classification tasks. It's based on Bayes' theorem and makes a strong (naive) assumption about the independence of features. Despite its simplicity, Naive Bayes often performs surprisingly well in many real-world scenarios, particularly in text classification and spam filtering.


Naive Bayes is a probabilistic classifier that makes predictions based on the probability of an object belonging to a particular class. The algorithm gets its name from two key aspects:

1. **Naive**: It assumes that features are independent of each other once the class is known (conditional independence), which is often not true in real-world scenarios. This "naive" assumption simplifies the computation and is what makes the algorithm efficient.

2. **Bayes**: It's based on Bayes' theorem, a fundamental theorem in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event.


<img src="./images/bayes.png" width="800">

<img src="./images/nb-dist.avif" width="800">

Naive Bayes classifies by calculating the probability of an instance belonging to each class and selecting the class with the highest probability.


To understand Naive Bayes, let's consider a simple example:

Imagine you're trying to classify fruits as either apples or oranges based on their color and shape. You have observed the following:

- 70% of the fruits in your dataset are apples, 30% are oranges.
- 80% of apples are red, 20% are green.
- 70% of oranges are orange-colored, 20% are red, 10% are green.
- 90% of apples are round, 10% are slightly elongated.
- 80% of oranges are round, 20% are slightly elongated.


Now, if you encounter a new fruit that is red and round, Naive Bayes would calculate:

1. The probability of it being an apple given it's red and round.
2. The probability of it being an orange given it's red and round.

It would then compare these probabilities and classify the fruit as the class with the higher probability.


Naive Bayes has several advantages that make it a popular choice in machine learning:

1. **Simplicity**: It's easy to implement and understand.
2. **Efficiency**: It's computationally fast and requires less training data than many other algorithms.
3. **Performance**: Despite its simplicity, it often performs well, especially in text classification tasks.


Naive Bayes is particularly effective when the dimensionality of the input is high, making it a go-to algorithm for text classification and spam filtering.


At its core, Naive Bayes applies Bayes' theorem to make predictions. The theorem is expressed as:

$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$

Where:
- $P(A|B)$ is the probability of A given B is true.
- $P(B|A)$ is the probability of B given A is true.
- $P(A)$ and $P(B)$ are the probabilities of A and B on their own, without conditioning on each other.


In the context of classification, we're interested in finding the probability of a class given certain features. This can be written as:

$P(Class|Features) = \frac{P(Features|Class) \cdot P(Class)}{P(Features)}$


❗️ **Important Note:** The "naive" assumption comes into play when calculating $P(Features|Class)$. We assume that features are conditionally independent given the class, which allows us to multiply their individual probabilities.


In the next sections, we'll dive deeper into Bayes' theorem, explore different types of Naive Bayes classifiers, and learn how to implement this algorithm in practice.

**Table of contents**<a id='toc0_'></a>    
- [Bayes' Theorem and Its Application in Classification](#toc1_)    
  - [Understanding Bayes' Theorem](#toc1_1_)    
  - [Applying Bayes' Theorem to Classification](#toc1_2_)    
  - [The Classification Rule](#toc1_3_)    
  - [An Illustrative Example](#toc1_4_)    
  - [The Naive Assumption](#toc1_5_)    
- [Types of Naive Bayes Classifiers](#toc2_)    
  - [Gaussian Naive Bayes](#toc2_1_)    
  - [Multinomial Naive Bayes](#toc2_2_)    
  - [Bernoulli Naive Bayes](#toc2_3_)    
  - [Complement Naive Bayes](#toc2_4_)    
  - [Choosing the Right Naive Bayes Variant](#toc2_5_)    
- [The 'Naive' Assumption and Its Implications](#toc3_)    
  - [Understanding the Naive Assumption](#toc3_1_)    
  - [Implications of the Naive Assumption](#toc3_2_)    
  - [When the Naive Assumption Fails](#toc3_3_)    
  - [Addressing the Limitations](#toc3_4_)    
  - [Interpreting Naive Bayes Results](#toc3_5_)    
- [Implementing Naive Bayes](#toc4_)    
  - [Implementation from Scratch](#toc4_1_)    
  - [Implementation using scikit-learn](#toc4_2_)    
  - [Handling Text Data](#toc4_3_)    
- [Advantages and Disadvantages of Naive Bayes](#toc5_)    
  - [Advantages of Naive Bayes](#toc5_1_)    
  - [Disadvantages of Naive Bayes](#toc5_2_)    
- [Real-world Applications of Naive Bayes](#toc6_)    
  - [Text Classification and Spam Filtering](#toc6_1_)    
  - [Sentiment Analysis](#toc6_2_)    
  - [Medical Diagnosis](#toc6_3_)    
  - [Recommendation Systems](#toc6_4_)    
  - [Weather Prediction](#toc6_5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Bayes' Theorem and Its Application in Classification](#toc0_)

Bayes' Theorem is the cornerstone of Naive Bayes classification. It provides a way to calculate the probability of a hypothesis given observed evidence, which is exactly what we need for classification tasks.


### <a id='toc1_1_'></a>[Understanding Bayes' Theorem](#toc0_)


Bayes' Theorem, named after Reverend Thomas Bayes, is a fundamental principle in probability theory. It describes the probability of an event based on prior knowledge of conditions that might be related to the event.


<img src="./images/bayes-2.png" width="800">

The theorem is expressed mathematically as:

$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$

Where:
- $P(A|B)$ is the posterior probability: the probability of event A occurring given that B is true.
- $P(B|A)$ is the likelihood: the probability of B occurring given that A is true.
- $P(A)$ is the prior probability: the probability of A occurring regardless of any other information.
- $P(B)$ is the marginal likelihood: the probability of B occurring regardless of any other information.


🔑 **Key Concept:** Bayes' Theorem allows us to update our beliefs about the probability of an event as we gather more evidence.
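To make the formula concrete, the short sketch below computes $P(Apple|Red)$ with the fruit numbers from the introduction, assuming $P(Red|Orange) = 0.2$ as in the worked example later in this section. The denominator $P(Red)$ is expanded with the law of total probability over the two classes.

```python
# Bayes' theorem on the fruit example: P(Apple | Red).
# Assumed inputs: priors from the introduction, P(Red | Orange) = 0.2
# as in the worked example later in this section.
p_apple, p_orange = 0.7, 0.3
p_red_given_apple, p_red_given_orange = 0.8, 0.2

# Marginal likelihood P(Red) via the law of total probability
p_red = p_red_given_apple * p_apple + p_red_given_orange * p_orange

# Posterior P(Apple | Red) = P(Red | Apple) * P(Apple) / P(Red)
p_apple_given_red = p_red_given_apple * p_apple / p_red

print(f"P(Red) = {p_red:.2f}")                      # P(Red) = 0.62
print(f"P(Apple | Red) = {p_apple_given_red:.3f}")  # P(Apple | Red) = 0.903
```

Observing a red fruit moves the belief in Apple from the prior 0.7 to about 0.90, which is the belief update described in the Key Concept above.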


### <a id='toc1_2_'></a>[Applying Bayes' Theorem to Classification](#toc0_)


In the context of classification, we can rewrite Bayes' Theorem as:

$P(Class|Features) = \frac{P(Features|Class) \cdot P(Class)}{P(Features)}$


<img src="./images/naive-bayes-classifier.webp" width="800">

Here's what each term means in a classification context:

1. $P(Class|Features)$: The probability of a class given the observed features. This is what we're trying to calculate to make a classification.

2. $P(Features|Class)$: The probability of observing these features given the class. This is estimated from the training data.

3. $P(Class)$: The prior probability of the class, regardless of the features. This is also estimated from the training data.

4. $P(Features)$: The probability of observing these features across all classes. This acts as a normalizing constant.


### <a id='toc1_3_'></a>[The Classification Rule](#toc0_)


To classify an instance, we calculate $P(Class|Features)$ for each possible class and choose the class with the highest probability. This is known as the Maximum A Posteriori (MAP) decision rule:

$Class_{predicted} = \arg\max_{Class} P(Class|Features) = \arg\max_{Class} P(Features|Class) \cdot P(Class)$


In practice, we often don't need to calculate $P(Features)$ because it's constant for all classes. We can simply compare $P(Features|Class) \cdot P(Class)$ across classes.
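A minimal sketch of the MAP rule, using a hypothetical helper `map_predict` (not a library function) and made-up numbers for two classes "A" and "B". It compares classes by $\log P(Features|Class) + \log P(Class)$, which has the same argmax as the product but avoids floating-point underflow when probabilities get very small. It then normalizes the scores to show that dividing by $P(Features)$ rescales them without changing the winner.

```python
import math

def map_predict(priors, likelihoods):
    """Hypothetical helper: MAP class plus normalized posteriors.

    priors:      dict mapping class -> P(Class)
    likelihoods: dict mapping class -> P(Features | Class)
    """
    # Unnormalized log scores: log P(Features | Class) + log P(Class)
    log_scores = {c: math.log(likelihoods[c]) + math.log(priors[c]) for c in priors}
    predicted = max(log_scores, key=log_scores.get)

    # Normalizing divides every score by the same P(Features), so the argmax is
    # unchanged. Subtracting the max log score first keeps exp() from underflowing.
    max_log = max(log_scores.values())
    unnormalized = {c: math.exp(s - max_log) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    posteriors = {c: u / total for c, u in unnormalized.items()}
    return predicted, posteriors

# Made-up example: P(A) = 0.4, P(B) = 0.6, P(Features|A) = 0.05, P(Features|B) = 0.01
predicted, posteriors = map_predict({"A": 0.4, "B": 0.6}, {"A": 0.05, "B": 0.01})
print(predicted)                   # A
print(round(posteriors["A"], 3))   # 0.769
```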


### <a id='toc1_4_'></a>[An Illustrative Example](#toc0_)


Let's revisit our fruit classification example:

Suppose we want to classify a fruit that is red and round. We have two classes: Apple and Orange.

Given:
- $P(Apple) = 0.7$, $P(Orange) = 0.3$
- $P(Red|Apple) = 0.8$, $P(Red|Orange) = 0.2$
- $P(Round|Apple) = 0.9$, $P(Round|Orange) = 0.8$

We can calculate:

- $P(Apple|Red,Round) \propto P(Red|Apple) \cdot P(Round|Apple) \cdot P(Apple) = 0.8 \cdot 0.9 \cdot 0.7 = 0.504$

- $P(Orange|Red,Round) \propto P(Red|Orange) \cdot P(Round|Orange) \cdot P(Orange) = 0.2 \cdot 0.8 \cdot 0.3 = 0.048$


Since 0.504 > 0.048, we would classify this fruit as an Apple.
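The calculation above can be reproduced in a few lines of Python. The sketch multiplies the per-feature likelihoods (the naive assumption discussed next) and then divides each score by their sum, which turns the proportional scores into posterior probabilities.

```python
# Probabilities from the worked example above
priors = {"Apple": 0.7, "Orange": 0.3}
likelihoods = {
    "Apple": {"Red": 0.8, "Round": 0.9},
    "Orange": {"Red": 0.2, "Round": 0.8},
}
observed = ["Red", "Round"]

# Unnormalized score: P(Class) times the product of per-feature likelihoods
scores = {}
for cls, prior in priors.items():
    score = prior
    for feature in observed:
        score *= likelihoods[cls][feature]
    scores[cls] = score

# Dividing by the sum of scores (the evidence) yields P(Class | Red, Round)
evidence = sum(scores.values())
posteriors = {cls: s / evidence for cls, s in scores.items()}

print({c: round(s, 3) for c, s in scores.items()})      # {'Apple': 0.504, 'Orange': 0.048}
print({c: round(p, 3) for c, p in posteriors.items()})  # {'Apple': 0.913, 'Orange': 0.087}
print(max(posteriors, key=posteriors.get))              # Apple
```

So under these numbers the model does not merely prefer Apple; it assigns it a posterior probability of roughly 91%.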


### <a id='toc1_5_'></a>[The Naive Assumption](#toc0_)


The "naive" in Naive Bayes comes from the assumption that features are conditionally independent given the class. This means:

$P(Features|Class) = P(Feature_1|Class) \cdot P(Feature_2|Class) \cdot ... \cdot P(Feature_n|Class)$


This assumption greatly simplifies the computation, but it's often violated in real-world scenarios. Despite this, Naive Bayes often performs well in practice.


🤔 **Why This Matters:** The naive assumption allows us to easily compute probabilities even with many features, making Naive Bayes computationally efficient and effective for high-dimensional problems like text classification.


In the next sections, we'll explore different types of Naive Bayes classifiers and see how to implement this algorithm in practice.

## <a id='toc2_'></a>[Types of Naive Bayes Classifiers](#toc0_)

Naive Bayes classifiers come in several variants, each designed to handle different types of data and distributions. The main difference between these variants lies in the assumptions they make about the distribution of features.


<img src="./images/nb-types.jpg" width="800">

### <a id='toc2_1_'></a>[Gaussian Naive Bayes](#toc0_)


Gaussian Naive Bayes assumes that the continuous values associated with each class are distributed according to a Gaussian (normal) distribution. Here are the key characteristics:
- Suitable for continuous data
- Assumes features follow a normal distribution for each class
- Calculates mean and standard deviation of features for each class


The probability of a feature given a class is calculated as:

$P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$

Where $\mu_y$ is the mean and $\sigma_y^2$ is the variance of feature $i$ for class $y$.
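The density formula translates directly into code. In the sketch below, `gaussian_likelihood` is a hypothetical helper, and the per-class mean and variance of a single feature (a fruit diameter in cm) are invented for illustration; in practice they would be estimated from each class's training data.

```python
import numpy as np

def gaussian_likelihood(x, mean, var):
    """P(x_i | y) under a normal distribution with the class's mean and variance."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical per-class statistics for one feature: (mean, variance)
stats = {"Apple": (7.0, 1.0), "Orange": (8.5, 0.5)}

x = 7.5  # observed diameter
for cls, (mean, var) in stats.items():
    print(cls, round(float(gaussian_likelihood(x, mean, var)), 4))
# Apple 0.3521
# Orange 0.2076
```

These are density values rather than probabilities, so they can exceed 1 when a class's variance is small; only their relative size matters when comparing classes.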


🔑 **Key Concept:** Gaussian Naive Bayes is often used when dealing with continuous data, such as in many scientific and engineering applications.


### <a id='toc2_2_'></a>[Multinomial Naive Bayes](#toc0_)


Multinomial Naive Bayes is typically used for discrete data, and it's particularly popular for text classification tasks. Here are the key characteristics:
- Suitable for discrete data (e.g., word counts for text classification)
- Assumes features follow a multinomial distribution
- Often used with term frequency features


The probability of a feature given a class is calculated as:

$P(x_i | y) = \frac{N_{yi} + \alpha}{N_y + \alpha n}$

Where $N_{yi}$ is the count of feature $i$ in class $y$, $N_y$ is the total count of all features in class $y$, $n$ is the number of features, and $\alpha$ is a smoothing parameter.
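Here is a minimal NumPy sketch of that estimate with $\alpha = 1$ (Laplace smoothing). The word counts and vocabulary are hypothetical. The last lines score a new document by adding $\log P(y)$ to each word's log-probability weighted by its count; the multinomial coefficient is omitted because it is the same for every class.

```python
import numpy as np

# Hypothetical training counts; columns are the words "free", "money", "meeting"
classes = ["spam", "ham"]
counts = np.array([
    [3, 2, 0],  # N_yi for class "spam"
    [0, 1, 4],  # N_yi for class "ham"
])
alpha = 1.0                  # Laplace smoothing
n = counts.shape[1]          # number of features (vocabulary size)

# P(x_i | y) = (N_yi + alpha) / (N_y + alpha * n)
N_y = counts.sum(axis=1, keepdims=True)
probs = (counts + alpha) / (N_y + alpha * n)
print(probs)  # each row sums to 1; no word gets probability 0

# Score a new document "free free money": log P(y) + sum_i x_i * log P(x_i | y)
x = np.array([2, 1, 0])
log_prior = np.log([0.5, 0.5])          # assumed equal class priors
log_scores = log_prior + x @ np.log(probs).T
print(classes[int(np.argmax(log_scores))])  # spam
```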


💡 **Pro Tip:** Multinomial Naive Bayes often works well for text classification, even with a small amount of training data.


### <a id='toc2_3_'></a>[Bernoulli Naive Bayes](#toc0_)

Bernoulli Naive Bayes is used for binary/boolean features. It's similar to Multinomial Naive Bayes but penalizes the non-occurrence of a feature that's indicative of a class. Here are the key characteristics:
- Suitable for binary/boolean features
- Assumes features are binary-valued (e.g., word presence/absence)
- Penalizes non-occurrence of features


The probability of a feature given a class is calculated as:

$P(x_i | y) = P(i | y)x_i + (1 - P(i | y))(1 - x_i)$

Where $P(i | y)$ is the probability of feature $i$ appearing in class $y$, and $x_i$ is either 1 or 0.
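The sketch below evaluates that formula for a hypothetical three-word vocabulary; the per-class probabilities $P(i|y)$ are made up. The document contains only "free", so the absent words still contribute factors $1 - P(i|y)$.

```python
import numpy as np

# Hypothetical P(i | y) for the words "free", "money", "meeting"
p = {
    "spam": np.array([0.8, 0.6, 0.1]),
    "ham":  np.array([0.1, 0.2, 0.7]),
}

def bernoulli_likelihood(x, p_y):
    """Product over features of P(i|y) * x_i + (1 - P(i|y)) * (1 - x_i)."""
    return np.prod(p_y * x + (1 - p_y) * (1 - x))

x = np.array([1, 0, 0])  # document contains "free" only
for cls, p_y in p.items():
    print(cls, round(float(bernoulli_likelihood(x, p_y)), 3))
# spam 0.288
# ham 0.024
```

For example, the missing word "money" multiplies the spam likelihood by $1 - 0.6 = 0.4$. A multinomial model would skip words with a zero count entirely, which is the difference described above.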


❗️ **Important Note:** Bernoulli Naive Bayes considers both the presence and absence of features, which can make it particularly effective for short texts.


### <a id='toc2_5_'></a>[Choosing the Right Naive Bayes Variant](#toc0_)


The choice of Naive Bayes variant depends on the nature of your data:

1. For continuous data, use Gaussian Naive Bayes.
2. For discrete data, especially in text classification, use Multinomial Naive Bayes.
3. For binary features or short text classification, consider Bernoulli Naive Bayes.
4. For imbalanced text datasets, try Complement Naive Bayes.
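Complement Naive Bayes, mentioned in the last item, can be sketched roughly as follows. Loosely following the approach described for scikit-learn's `ComplementNB` (without its optional weight normalization), it estimates word probabilities for each class from the documents of all *other* classes and assigns a document to the class whose complement explains it worst. The counts are hypothetical, and `complement_nb_predict` is an illustrative helper, not a library function.

```python
import numpy as np

# Hypothetical word-count matrix: rows are documents, columns are terms
X = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 2, 3, 1],
    [0, 0, 1, 4],
])
y = np.array([0, 0, 1, 2])
alpha = 1.0
classes = np.unique(y)

# Complement counts: for each class, word counts summed over all OTHER classes
complement = np.array([X[y != c].sum(axis=0) for c in classes]) + alpha
theta_comp = complement / complement.sum(axis=1, keepdims=True)
weights = np.log(theta_comp)

def complement_nb_predict(X_new):
    # A document that fits a class's complement poorly (lowest log-likelihood)
    # is assigned to that class, hence argmin instead of argmax.
    return classes[np.argmin(X_new @ weights.T, axis=1)]

print(complement_nb_predict(np.array([[2, 0, 0, 0], [0, 0, 0, 3]])))  # [0 2]
```

Because each class's estimates come from the other classes' documents, even a small class has plenty of data behind its weights, which is why this variant is suggested for imbalanced text datasets. In practice, scikit-learn's `ComplementNB` provides a tested implementation.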


🤔 **Why This Matters:** Understanding the different types of Naive Bayes classifiers allows you to choose the most appropriate variant for your specific problem, potentially improving your model's performance.


In the next sections, we'll explore the implications of the naive assumption and learn how to implement Naive Bayes classifiers in practice.

## <a id='toc3_'></a>[The 'Naive' Assumption and Its Implications](#toc0_)

The 'naive' assumption is a fundamental aspect of Naive Bayes classifiers that greatly simplifies the model but also has important implications for its performance and interpretation.


### <a id='toc3_1_'></a>[Understanding the Naive Assumption](#toc0_)


The naive assumption, also known as the conditional independence assumption, states that all features are independent of each other given the class label.


Mathematically, this can be expressed as:

$P(X_1, X_2, ..., X_n | Y) = P(X_1 | Y) \cdot P(X_2 | Y) \cdot ... \cdot P(X_n | Y)$

Where $X_1, X_2, ..., X_n$ are features and $Y$ is the class label.


🔑 **Key Concept:** This assumption allows us to simplify the computation of the joint probability distribution over features, making Naive Bayes computationally efficient and scalable to high-dimensional data.


### <a id='toc3_2_'></a>[Implications of the Naive Assumption](#toc0_)


The naive assumption dramatically **reduces the computational complexity of the model**. Instead of having to learn the parameters of a full joint probability distribution, we only need to learn the parameters for each feature independently.
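A quick count shows how large that saving is. The sketch assumes binary features and two classes, and leaves out the class priors, which both models need.

```python
def full_joint_params(n_features, n_classes):
    # A joint distribution over n binary features has 2**n outcomes;
    # one probability per class is fixed by the sum-to-one constraint.
    return n_classes * (2 ** n_features - 1)

def naive_params(n_features, n_classes):
    # Naive Bayes stores one value P(x_i = 1 | y) per feature and class.
    return n_classes * n_features

for n in (10, 20, 30):
    print(f"{n} features: full joint {full_joint_params(n, 2):,} vs naive {naive_params(n, 2)}")
# 10 features: full joint 2,046 vs naive 20
# 20 features: full joint 2,097,150 vs naive 40
# 30 features: full joint 2,147,483,646 vs naive 60
```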


💡 **Pro Tip:** This efficiency makes Naive Bayes particularly useful for high-dimensional problems like text classification, where the number of features (words) can be very large.


Despite its simplicity, Naive Bayes often **performs surprisingly well in practice**, even when the independence assumption is violated. This is partly because:

1. The classification doesn't require precise probability estimates, only that the correct class has the highest probability.
2. Dependencies between features can affect all classes similarly or cancel each other out, distorting the probability estimates without changing which class ranks highest.


The naive assumption **introduces bias into the model**, as it simplifies the true relationship between features. However, this bias often leads to lower variance, which can be beneficial when dealing with limited training data.

$\text{Expected Prediction Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$


🤔 **Why This Matters:** The bias-variance tradeoff is crucial in machine learning. Naive Bayes often achieves a good balance, making it less prone to overfitting, especially with small datasets.


### <a id='toc3_3_'></a>[When the Naive Assumption Fails](#toc0_)


While Naive Bayes is surprisingly robust, there are situations where the naive assumption can lead to poor performance:

- When features are strongly correlated, Naive Bayes can overemphasize their importance, leading to skewed probability estimates.


For example, in a text classification task, the words "machine" and "learning" might frequently appear together. Naive Bayes would treat these as independent pieces of evidence, potentially overestimating their combined importance.
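A minimal way to see this over-counting is to repeat the same piece of evidence, as if a perfectly correlated copy of a feature had been added. The sketch reuses the fruit probabilities from the worked example earlier in this notebook and counts "red" k times as though each copy were independent.

```python
# Fruit probabilities from the worked example earlier in this notebook
prior = {"Apple": 0.7, "Orange": 0.3}
p_red = {"Apple": 0.8, "Orange": 0.2}
p_round = {"Apple": 0.9, "Orange": 0.8}

def posterior_apple(k):
    """P(Apple | evidence) when 'red' is (wrongly) counted k times as independent."""
    scores = {c: prior[c] * p_red[c] ** k * p_round[c] for c in prior}
    return scores["Apple"] / sum(scores.values())

for k in (1, 2, 5):
    print(k, round(posterior_apple(k), 4))
```

The predicted class stays Apple for every k, but the posterior climbs from about 0.91 toward 1: the model grows more confident without receiving any new information. The ranking survives even though the probabilities are badly off, which matches point 1 in the previous subsection.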

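- If a feature value never occurs together with a class in the training data (for example, a category that only shows up at test time), its estimated conditional probability for that class is zero. Because the likelihoods are multiplied, this single zero forces the whole score for the class to zero, regardless of the other features. This is known as the "zero frequency" problem.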


❗️ **Important Note:** This issue is typically addressed through smoothing techniques, such as Laplace smoothing, which adds a small count to all feature/class combinations.
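The effect of Laplace smoothing is easy to see on a single feature. In this sketch the category counts within one class are hypothetical; `alpha=0` reproduces the unsmoothed estimate, and `alpha=1` is standard Laplace smoothing.

```python
import numpy as np

# Hypothetical counts of the feature "color" within one class
categories = ["red", "green", "yellow"]
counts = np.array([8, 2, 0])  # "yellow" never co-occurred with this class

def category_probs(counts, alpha):
    # (count + alpha) / (total + alpha * number_of_categories)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

for alpha in (0.0, 1.0):
    print(alpha, dict(zip(categories, category_probs(counts, alpha).round(3).tolist())))
# 0.0 {'red': 0.8, 'green': 0.2, 'yellow': 0.0}
# 1.0 {'red': 0.692, 'green': 0.231, 'yellow': 0.077}
```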


### <a id='toc3_4_'></a>[Addressing the Limitations](#toc0_)


Several techniques can help mitigate the limitations of the naive assumption:

1. **Feature Selection**: Removing redundant or highly correlated features can help reduce the impact of the independence assumption.

2. **Smoothing**: Techniques like Laplace smoothing help address the zero frequency problem.

3. **Ensemble Methods**: Combining Naive Bayes with other models can help compensate for its limitations.

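4. **Kernel Density Estimation**: For continuous features, using kernel density estimation instead of assuming a specific distribution (like Gaussian) can capture more complex feature distributions, such as skewed or multimodal ones. It changes only how each feature is modeled; features are still treated as independent.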
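A minimal NumPy sketch of the kernel density idea for one feature of one class: the estimate averages Gaussian bumps centered on the training values instead of fitting a single normal curve. The bimodal toy data and the Silverman bandwidth rule are assumptions for illustration; SciPy's `scipy.stats.gaussian_kde` offers a fuller implementation.

```python
import numpy as np

def gaussian_pdf(x, samples):
    """Single normal fit, as Gaussian Naive Bayes assumes."""
    mu, var = samples.mean(), samples.var()
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def kde_pdf(x, samples, bandwidth=None):
    """Gaussian kernel density estimate: average of bumps centred on each sample."""
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed default; other rules exist)
        bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)
    z = (x - samples) / bandwidth
    return np.exp(-0.5 * z ** 2).sum() / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical bimodal feature for one class: clusters near -3 and +3
samples = np.concatenate([rng.normal(-3, 0.5, 100), rng.normal(3, 0.5, 100)])

for x in (0.0, 3.0):
    print(f"x={x}: Gaussian fit {gaussian_pdf(x, samples):.3f}, KDE {kde_pdf(x, samples):.3f}")
```

On this bimodal feature the single normal fit puts its peak in the empty gap near 0, while the KDE places most of its density near ±3, where the samples actually lie. In a classifier, `kde_pdf` would replace the Gaussian term for each feature and class; the product over features, and hence the independence assumption, stays the same.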


### <a id='toc3_5_'></a>[Interpreting Naive Bayes Results](#toc0_)


When interpreting Naive Bayes results, it's important to remember that:

1. The raw probability estimates may not be well-calibrated due to the independence assumption.
2. The relative rankings of probabilities are often more reliable than their absolute values.
3. Feature importance can be assessed by examining the conditional probabilities, but be cautious of overinterpreting when features are correlated.
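As a sketch of point 3, one common reading of "importance" is the log ratio $\log P(x_i|Like) - \log P(x_i|Dislike)$ for each feature. The code below computes it by hand for the movie-genre data used in the recommendation example later in this notebook, with the smoothed multinomial estimate from the Multinomial Naive Bayes formula and $\alpha = 1$ (an assumption matching `MultinomialNB`'s default).

```python
import numpy as np

# Movie-genre data from the recommendation example (1 = like, 0 = dislike)
genres = ["Action", "Comedy", "Romance", "Sci-Fi"]
X = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1]])
y = np.array([1, 0, 1, 0])
alpha = 1.0

def smoothed_probs(X_c):
    # (N_yi + alpha) / (N_y + alpha * n), as in the multinomial formula
    counts = X_c.sum(axis=0) + alpha
    return counts / counts.sum()

# Positive values favour "like", negative values favour "dislike"
log_ratio = np.log(smoothed_probs(X[y == 1])) - np.log(smoothed_probs(X[y == 0]))
for genre, r in zip(genres, log_ratio):
    print(f"{genre:8s} {r:+.2f}")
```

Action comes out positive and Comedy negative, while Romance and Sci-Fi are neutral because they appear equally often in both classes. If two features were near-copies of each other, both would show large ratios, which is the over-interpretation risk mentioned in point 3.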


In conclusion, while the naive assumption is a simplification of reality, it often leads to a good balance between model complexity and performance. Understanding its implications is crucial for effectively applying and interpreting Naive Bayes classifiers.

## <a id='toc4_'></a>[Implementing Naive Bayes](#toc0_)

In this section, we'll explore how to implement Naive Bayes classifiers both from scratch and using the scikit-learn library. This dual approach will give you a deeper understanding of the algorithm's mechanics while also introducing you to practical tools for real-world applications.


### <a id='toc4_1_'></a>[Implementation from Scratch](#toc0_)


Let's implement a simple Gaussian Naive Bayes classifier from scratch. This will help us understand the inner workings of the algorithm.


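```python
import numpy as np
from scipy.stats import norm

class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: one normal distribution per feature and class."""

    def __init__(self, var_epsilon=1e-9):
        # Small constant added to every variance so that a feature that is
        # constant within a class does not cause a division by zero.
        self.var_epsilon = var_epsilon

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.parameters = {}

        for c in self.classes:
            X_c = X[y == c]
            self.parameters[c] = {
                'mean': X_c.mean(axis=0),
                'var': X_c.var(axis=0) + self.var_epsilon,
                'prior': X_c.shape[0] / X.shape[0],  # fraction of samples in class c
            }
        return self

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        posteriors = []
        for c in self.classes:
            params = self.parameters[c]
            prior = np.log(params['prior'])
            # Sum of per-feature log densities = log of their product (naive assumption).
            # norm.logpdf avoids the underflow of np.log(norm.pdf(...)) for extreme values.
            likelihood = np.sum(norm.logpdf(x, params['mean'], np.sqrt(params['var'])))
            posteriors.append(prior + likelihood)
        return self.classes[np.argmax(posteriors)]
```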

Let's test our implementation on scikit-learn's built-in breast cancer dataset, a binary classification task with 30 continuous features:


```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset (569 samples, 30 continuous features)
X, y = load_breast_cancer(return_X_y=True)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and test our model
gnb = GaussianNaiveBayes()
gnb.fit(X_train, y_train)
predictions = gnb.predict(X_test)

# Calculate accuracy
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy:.2f}")
```

```
Accuracy: 0.96
```


💡 **Pro Tip:** This implementation assumes Gaussian distribution for features. For other types of data, you'd need to modify the likelihood calculation accordingly.


### <a id='toc4_2_'></a>[Implementation using scikit-learn](#toc0_)


Now, let's see how to implement Naive Bayes using scikit-learn, which provides optimized implementations of various Naive Bayes classifiers.


```python
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset used above
X, y = load_breast_cancer(return_X_y=True)

# Split the data (same random_state as before, so the test set matches)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)
```

```python
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

```
Accuracy: 0.97
```


🔑 **Key Concept:** Scikit-learn provides implementations for different types of Naive Bayes classifiers. Use `GaussianNB` for continuous data, `MultinomialNB` for discrete data (like text classification), and `BernoulliNB` for binary features.


### <a id='toc4_3_'></a>[Handling Text Data](#toc0_)


Naive Bayes is particularly popular for text classification. Here's how you can use it for a simple text classification task:


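```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Sample data
texts = ["I love this movie", "This movie is awful", "Great acting", "Terrible plot", "I enjoyed it"]
labels = [1, 0, 1, 0, 1]  # 1 for positive, 0 for negative

# Create a pipeline: word counts feed into a multinomial model
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB()),
])

# Train the model
text_clf.fit(texts, labels)

# Make a prediction
new_texts = ["This film is amazing", "I hated every minute of it"]
predictions = text_clf.predict(new_texts)

print("Predictions:", predictions)
```

```
Predictions: [0 1]
```

Both predictions are wrong, which shows the limits of a five-sentence training set. `CountVectorizer` drops one-character tokens such as "I" by default, and "film", "amazing", "hated", "every", "minute", and "of" never occur in training. The first prediction therefore rests on "this" and "is" ("is" appears only in a negative review), and the second on "it", which appears only in a positive one.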


❗️ **Important Note:** For text data, we typically use `MultinomialNB` instead of `GaussianNB`, as it's better suited for discrete counts (like word frequencies).


By implementing Naive Bayes both from scratch and using scikit-learn, we gain a deeper understanding of the algorithm while also learning practical tools for real-world applications. The scikit-learn implementation is optimized and provides additional features, making it the preferred choice for most practical applications.

## <a id='toc5_'></a>[Advantages and Disadvantages of Naive Bayes](#toc0_)

Understanding the strengths and limitations of Naive Bayes classifiers is crucial for effectively applying them to real-world problems. In this section, we'll explore the key advantages and disadvantages of this popular algorithm.


### <a id='toc5_1_'></a>[Advantages of Naive Bayes](#toc0_)


**Simplicity and Efficiency**

Naive Bayes is remarkably simple to implement and computationally efficient. Its training and prediction processes are typically much faster than more complex algorithms.


```python
import time
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification

# Generate a large synthetic dataset: 1,000,000 samples, 20 features
X, y = make_classification(n_samples=1000000, n_features=20, random_state=42)

# Measure training time (perf_counter is the clock intended for timing code)
start_time = time.perf_counter()
gnb = GaussianNB()
gnb.fit(X, y)
training_time = time.perf_counter() - start_time

print(f"Training time: {training_time:.4f} seconds")
```

```
Training time: 0.1630 seconds
```

Exact timings depend on the hardware.


💡 **Pro Tip:** This efficiency makes Naive Bayes particularly suitable for real-time prediction tasks and large datasets.


**Performance with Small Datasets**

Naive Bayes can perform well even with limited training data. It doesn't require large amounts of data to estimate the necessary parameters.


**Handles High-Dimensional Data**

The algorithm performs well with high-dimensional data, such as text classification problems where the number of features (words) can be very large.


**Multiclass Classification**

Naive Bayes naturally extends to multiclass classification problems, making it versatile for various applications.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load iris dataset (a classic multiclass problem)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train and evaluate
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print(f"Accuracy on multiclass problem: {accuracy_score(y_test, y_pred):.2f}")
```

```
Accuracy on multiclass problem: 0.98
```


**Insensitive to Irrelevant Features**

Naive Bayes is relatively robust to irrelevant features. It can handle situations where some features are not informative for the classification task.


### <a id='toc5_2_'></a>[Disadvantages of Naive Bayes](#toc0_)


**Independence Assumption**

The "naive" assumption of feature independence is often violated in real-world scenarios, which can lead to suboptimal performance in some cases.


🔑 **Key Concept:** While this assumption simplifies the model, it can sometimes lead to oversimplified predictions, especially when features are strongly correlated.


**Limited Expressiveness**

Naive Bayes cannot learn interactions between features. This limitation can make it less suitable for complex relationships in the data.


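```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Generate data where the class depends on the interaction of two features
np.random.seed(42)
X = np.random.randn(1000, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Train and evaluate (on the training data itself)
gnb = GaussianNB()
gnb.fit(X, y)
y_pred = gnb.predict(X)

print(f"Accuracy on interaction-dependent data: {accuracy_score(y, y_pred):.2f}")
```

```
Accuracy on interaction-dependent data: 0.59
```

With two roughly balanced classes, 0.59 is only slightly better than guessing: each feature on its own says almost nothing about the class (only the sign of their product does), and the model has no term for the interaction between them.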


**Continuous Data Assumptions**

For continuous features, Naive Bayes often assumes a specific distribution (e.g., Gaussian), which may not always hold true for the data.


**Zero Frequency Problem**

When a categorical variable has a category in the test data that was never observed with a given class in the training data, the model assigns that class a zero probability, no matter how strongly the other features support it. This is known as the "zero frequency" problem.

❗️ **Important Note:** This issue is typically addressed through smoothing techniques, but it's important to be aware of its potential impact.


🤔 **Why This Matters:** Understanding these advantages and disadvantages helps in deciding when to use Naive Bayes and how to interpret its results. It's often a good baseline model and can be particularly effective for text classification and spam filtering tasks.


In practice, it's important to compare Naive Bayes with other models and consider ensemble methods to leverage its strengths while mitigating its weaknesses.

## <a id='toc6_'></a>[Real-world Applications of Naive Bayes](#toc0_)

Naive Bayes classifiers, despite their simplicity, have found widespread use in various real-world applications. Their efficiency, ability to handle high-dimensional data, and surprisingly good performance make them suitable for many practical scenarios.


### <a id='toc6_1_'></a>[Text Classification and Spam Filtering](#toc0_)


One of the most common and successful applications of Naive Bayes is in text classification, particularly spam filtering.


```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample email data
emails = [
    "Get rich quick! Buy now!",
    "Meeting scheduled for tomorrow",
    "Claim your prize money now",
    "Project report due next week",
    "You've won a free iPhone",
    "Congratulations! You've won a free iPhone",
    "You've won a free iPhone",
    "You've won a huge amount of money",
    "Hey, I'm looking for a new job",
    "Hello, how are you?"
]
labels = [1, 0, 1, 0, 1, 1, 1, 1, 0, 0]  # 1 for spam, 0 for ham

# Create a pipeline
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB()),
])

# Split the data
X_train, X_test, y_train, y_test = train_test_split(emails, labels, test_size=0.3, random_state=42)

# Train and evaluate
text_clf.fit(X_train, y_train)
y_pred = text_clf.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```

```
Accuracy: 0.33
```

An accuracy of 0.33 means only one of the three held-out emails was classified correctly. With seven training emails this number says little about the method; a meaningful evaluation needs a far larger labeled corpus (and the duplicated emails in this toy list can end up on both sides of the split).


🔑 **Key Concept:** Naive Bayes works well for text classification because it can handle the high dimensionality of text data efficiently and is less prone to overfitting on small datasets.


### <a id='toc6_2_'></a>[Sentiment Analysis](#toc0_)


Naive Bayes is often used for sentiment analysis, determining whether a piece of text expresses positive, negative, or neutral sentiment.


```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

# Sample reviews
reviews = [
    "This product is amazing!",
    "Terrible service, would not recommend",
    "Average experience, nothing special",
    "Absolutely love it, best purchase ever",
    "Disappointing quality, not worth the price"
]
sentiments = [1, -1, 0, 1, -1]  # 1 for positive, -1 for negative, 0 for neutral

# Create and train the model
sentiment_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB()),
])
sentiment_clf.fit(reviews, sentiments)

# Evaluate on the training reviews themselves
predictions = sentiment_clf.predict(reviews)
print(f"Training accuracy: {accuracy_score(sentiments, predictions):.2f}")
```

```
Training accuracy: 1.00
```

A perfect score on the training reviews only shows that the model can fit five sentences; it is not an estimate of accuracy on unseen reviews.

```python
# Predict sentiment for a new review
new_review = ["The product exceeded my expectations"]
prediction = sentiment_clf.predict(new_review)
sentiment_map = {1: "Positive", 0: "Neutral", -1: "Negative"}
print(f"Predicted sentiment: {sentiment_map[prediction[0]]}")
```

```
Predicted sentiment: Positive
```


### <a id='toc6_3_'></a>[Medical Diagnosis](#toc0_)


Naive Bayes can be used in medical diagnosis to predict the likelihood of a disease based on symptoms.


```python
from sklearn.naive_bayes import BernoulliNB
import numpy as np

# Sample medical data (simplified)
# Binary features: [fever, cough, fatigue, difficulty breathing]
X = np.array([
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0]
])
y = np.array([1, 0, 1, 0, 0])  # 1 for COVID-19 positive, 0 for negative

# Train the model. BernoulliNB suits binary symptom indicators; GaussianNB would
# estimate near-zero variances here (difficulty breathing is always 1 in the
# positive class) and return degenerate 0/1 probabilities.
bnb = BernoulliNB()
bnb.fit(X, y)

# Predict for a new patient
new_patient = np.array([[1, 1, 1, 0]])
prediction = bnb.predict(new_patient)
probability = bnb.predict_proba(new_patient)

print(f"COVID-19 prediction: {'Positive' if prediction[0] == 1 else 'Negative'}")
print(f"Probability of being positive: {probability[0][1]:.2f}")
```

```
COVID-19 prediction: Negative
Probability of being positive: 0.29
```


💡 **Pro Tip:** In real medical applications, much more comprehensive data and expert knowledge would be required. This example is highly simplified for illustration purposes.


### <a id='toc6_4_'></a>[Recommendation Systems](#toc0_)


Naive Bayes can be used in recommendation systems, particularly for content-based filtering.


```python
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# Sample user preferences data
# Features: [Action, Comedy, Romance, Sci-Fi]
X = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1]
])
y = np.array([1, 0, 1, 0])  # 1 for like, 0 for dislike

# Train the model
mnb = MultinomialNB()
mnb.fit(X, y)

# Predict preference for a new movie
new_movie = np.array([[1, 0, 0, 1]])  # An action sci-fi movie
prediction = mnb.predict(new_movie)
probability = mnb.predict_proba(new_movie)

print(f"Prediction: {'Like' if prediction[0] == 1 else 'Dislike'}")
print(f"Probability of liking: {probability[0][1]:.2f}")
```

```
Prediction: Like
Probability of liking: 0.75
```


### <a id='toc6_5_'></a>[Weather Prediction](#toc0_)


Naive Bayes can be applied to simple weather prediction tasks based on observed conditions.


```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Sample weather data
X = [
    ['Sunny', 'Hot', 'High', 'Weak'],
    ['Sunny', 'Hot', 'High', 'Strong'],
    ['Overcast', 'Hot', 'High', 'Weak'],
    ['Rainy', 'Mild', 'High', 'Weak'],
    ['Rainy', 'Cool', 'Normal', 'Weak']
]
y = ['No', 'No', 'Yes', 'Yes', 'Yes']  # Play tennis or not

# Encode categorical features
enc = OrdinalEncoder()
X_encoded = enc.fit_transform(X)

# Train the model
cnb = CategoricalNB()
cnb.fit(X_encoded, y)

# Predict for new weather conditions
new_day = enc.transform([['Sunny', 'Cool', 'High', 'Strong']])
prediction = cnb.predict(new_day)
probability = cnb.predict_proba(new_day)

print(f"Prediction: {prediction[0]}")
print(f"Probability of playing tennis: {probability[0][1]:.2f}")
```

```
Prediction: No
Probability of playing tennis: 0.18
```


🤔 **Why This Matters:** These real-world applications demonstrate the versatility of Naive Bayes. Its simplicity, efficiency, and effectiveness in various domains make it a valuable tool in a data scientist's toolkit.


While Naive Bayes might not always be the best performing model for these tasks, it often serves as an excellent baseline and can be particularly useful in scenarios with limited computational resources or when quick results are needed.