# Naive Bayes: Probability-Based Classification

Welcome to the tenth notebook in our **Machine Learning Basics for Beginners** series! After exploring K-Means Clustering for unsupervised learning, let's dive into **Naive Bayes**, a simple yet powerful supervised learning algorithm used for classification based on probability.

**What You'll Learn in This Notebook:**
- What Naive Bayes is and when to use it.
- How Naive Bayes works in simple terms.
- A hands-on example of classifying text messages as spam or not spam.
- An interactive exercise to classify your own messages.
- Visualizations to understand the probability distributions.

Let's get started!

## 1. What is Naive Bayes?

**Naive Bayes** is a supervised learning algorithm used for classification tasks. It is based on Bayes’ Theorem, a fundamental concept in probability, and is particularly effective for problems involving text data.

- **Goal**: Predict the category (class) of a data point by calculating the probability of it belonging to each class based on its features, and choosing the class with the highest probability.
- **When to Use It**: Use Naive Bayes for classification tasks, especially with text data, such as spam detection, sentiment analysis, or document categorization. It works well with high-dimensional data and small datasets.
- **Examples**:
  - Classifying emails or text messages as spam or not spam based on word content.
  - Determining if a movie review is positive or negative based on the words used.
  - Categorizing news articles into topics like sports, politics, or technology.

**Analogy**: Imagine you’re a librarian trying to decide if a book belongs to the mystery or romance genre based on certain keywords. You’ve noticed that words like "detective" often appear in mystery books, while "love" appears in romance. You calculate which genre is more likely based on the words in the book. Naive Bayes does this by using probabilities of features (like words) to predict a class (like genre).

## 2. How Does Naive Bayes Work?

Naive Bayes might sound complex because of its probabilistic nature, but the idea is straightforward. It uses Bayes’ Theorem to calculate the likelihood of a class given the features. Let’s break it down:

1. **Bayes’ Theorem**: This theorem helps us calculate the probability of a class (e.g., spam) given the features (e.g., words in a message). It combines:
   - The prior probability of the class (how common spam is overall).
   - The likelihood of the features given the class (how often certain words appear in spam).
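   - The overall probability of the features (how common the words are in general). This term is the same for every class, so it scales the probabilities but does not change which class comes out on top.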
   Mathematically, for a class C and features F: P(C|F) = [P(F|C) * P(C)] / P(F)
2. **'Naive' Assumption**: Naive Bayes assumes that all features are independent of each other given the class. This is often not true in reality (e.g., words in a sentence are related), but this simplification makes the computation much faster and often still works well.
3. **Training**: During training, the algorithm learns:
   - The probability of each class (e.g., fraction of messages that are spam).
   - The probability of each feature given each class (e.g., how often "free" appears in spam vs. not spam).
4. **Prediction**: For a new data point, it calculates the probability of each class given the features and picks the class with the highest probability.
5. **Types of Naive Bayes**: There are variations like Gaussian (for continuous data), Multinomial (for discrete counts like word frequencies), and Bernoulli (for binary features).
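
The steps above can be sketched in a few lines of plain Python. The example applies Bayes’ Theorem with the naive independence assumption to a message containing two words. All probabilities are made-up numbers chosen for illustration, and `naive_posterior` is a hypothetical helper, not part of any library.

```python
from math import prod


def naive_posterior(prior_spam, word_probs_spam, word_probs_ham):
    """Return P(spam | words), treating words as independent given the class."""
    # Numerator of Bayes' Theorem for each class: P(F|C) * P(C).
    # Under the naive assumption, P(F|C) is the product of per-word likelihoods.
    score_spam = prior_spam * prod(word_probs_spam)
    score_ham = (1 - prior_spam) * prod(word_probs_ham)
    # P(F) is the sum of the two class scores, so dividing by it normalizes.
    return score_spam / (score_spam + score_ham)


# Hypothetical numbers: 40% of all messages are spam, and the new message
# contains the words "free" and "now".
p_spam = naive_posterior(
    prior_spam=0.4,
    word_probs_spam=[0.6, 0.5],   # P("free" | spam), P("now" | spam)
    word_probs_ham=[0.05, 0.2],   # P("free" | not spam), P("now" | not spam)
)
print(f"P(spam | 'free', 'now') = {p_spam:.3f}")
```

With these assumed inputs the posterior is roughly 0.95: both words are much more typical of spam, so the evidence outweighs the 40% prior.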

**Analogy**: Think of Naive Bayes as a detective guessing if a suspect committed a crime. The detective knows how often crimes happen (prior probability), how often certain clues appear in crimes (likelihood), and uses these to estimate the chance the suspect is guilty. The 'naive' part is assuming clues like footprints and fingerprints don’t influence each other, even if they might.

**Key Advantage**: Naive Bayes is fast, easy to implement, and works surprisingly well for text classification, even with the naive independence assumption. It’s also good at handling high-dimensional data like word counts.

## 3. Example: Classifying Text Messages as Spam or Not Spam

Let’s see Naive Bayes in action with a small dataset of text messages labeled as spam or not spam. We’ll focus on a few key words as features to predict the class.

**Dataset** (simplified):
- Messages: ["win free money", "call now free", "hi meet today", "see you soon", "free prize now"]
- Labels: Spam (1), Spam (1), Not Spam (0), Not Spam (0), Spam (1)
- Features: Presence of words like "free", "win", "money", "call", "now", etc.
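
To see what training and prediction involve on this dataset, here is a minimal from-scratch sketch that uses only the standard library. It counts words per class, applies add-one (Laplace) smoothing, and scores a new message. The names `ALPHA`, `word_likelihood`, and `predict_proba` are illustrative choices, and the smoothing strength of 1.0 is an assumption.

```python
from collections import Counter

messages = ["win free money", "call now free", "hi meet today",
            "see you soon", "free prize now"]
labels = [1, 1, 0, 0, 1]  # 1 = Spam, 0 = Not Spam
ALPHA = 1.0  # assumed smoothing strength (add-one / Laplace smoothing)

# Training, part 1: class priors = fraction of messages in each class.
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# Training, part 2: how often each word occurs within each class.
word_counts = {c: Counter() for c in priors}
for text, label in zip(messages, labels):
    word_counts[label].update(text.split())
vocabulary = sorted({word for text in messages for word in text.split()})


def word_likelihood(word, c):
    """Smoothed estimate of P(word | class c)."""
    class_total = sum(word_counts[c].values())
    return (word_counts[c][word] + ALPHA) / (class_total + ALPHA * len(vocabulary))


def predict_proba(text):
    """Return {class: P(class | text)} for a whitespace-separated message."""
    scores = {}
    for c in priors:
        score = priors[c]
        for word in text.split():
            if word in vocabulary:  # words never seen in training are skipped
                score *= word_likelihood(word, c)
        scores[c] = score
    evidence = sum(scores.values())  # plays the role of P(F)
    return {c: s / evidence for c, s in scores.items()}


probs = predict_proba("win free prize")
print(f"P(Not Spam) = {probs[0]:.2f}, P(Spam) = {probs[1]:.2f}")
```

For "win free prize" this gives a Spam probability of about 0.94. A library model with the same smoothing setting should produce comparable numbers, since it follows the same counting logic.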

We’ll use Python’s `scikit-learn` library with a Multinomial Naive Bayes model to classify messages based on word frequencies. Focus on the steps and output, not the code details.

**Instructions**: Run the code below to see how Naive Bayes classifies messages and predicts a new message.

In [None]:
# Import necessary libraries
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import matplotlib.pyplot as plt
import seaborn as sns

# Our small dataset of messages
messages = ["win free money", "call now free", "hi meet today", "see you soon", "free prize now"]
labels = np.array([1, 1, 0, 0, 1])  # 1 for Spam, 0 for Not Spam

# Convert text to word frequency features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
feature_names = vectorizer.get_feature_names_out()
print(f"Features (words) used for classification: {feature_names}")
print(f"Feature matrix (word counts per message):\n{X.toarray()}")

# Train the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X, labels)

# Predict for a new message
new_message = ["win free prize"]
new_X = vectorizer.transform(new_message)
prediction = model.predict(new_X)[0]
probabilities = model.predict_proba(new_X)[0]
print(f"New Message: '{new_message[0]}'")
print(f"Predicted Class: {'Spam' if prediction == 1 else 'Not Spam'}")
print(f"Probability of Not Spam: {probabilities[0]:.2f}")
print(f"Probability of Spam: {probabilities[1]:.2f}")

# Visualize the probabilities for the new message
plt.figure(figsize=(6, 4))
# A single color avoids the warning recent seaborn versions raise for 'palette' without 'hue'
sns.barplot(x=['Not Spam', 'Spam'], y=probabilities, color='steelblue')
plt.title(f"Probability Distribution for '{new_message[0]}'")
plt.ylabel('Probability')
plt.ylim(0, 1)
plt.show()

print("Look at the plot above:")
print("- The bars show the model’s confidence (probability) for each class.")
print("- A taller bar indicates a higher likelihood of the message belonging to that class.")

## 4. Interactive Exercise: Classify Your Own Message

Now it’s your turn to experiment with Naive Bayes! In this exercise, you can type a short message and see if the model classifies it as spam or not spam. You’ll also see the probability distribution for each class.

**Instructions**:
- Run the code below.
- Enter a short message (e.g., "free offer now" or "let’s meet soon").
- Observe the predicted class and the probabilities for spam and not spam.
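
One detail worth knowing before you experiment: the model can only use words it saw during training. The sketch below approximates how `CountVectorizer` handles a new message with its default settings (lowercasing, keeping words of two or more characters, and ignoring anything outside the learned vocabulary). The helper `known_words` is hypothetical and exists only to illustrate the idea.

```python
import re

# The five training messages used throughout this notebook.
training_messages = ["win free money", "call now free", "hi meet today",
                     "see you soon", "free prize now"]
vocabulary = {word for message in training_messages for word in message.split()}


def known_words(message):
    """Lowercase a message, split it into words of 2+ characters,
    and keep only the words that appeared in the training data."""
    tokens = re.findall(r"\b\w\w+\b", message.lower())
    return [token for token in tokens if token in vocabulary]


for message in ["free offer now", "let's meet soon", "good morning"]:
    print(f"{message!r} -> {known_words(message)}")
```

Here "offer" is dropped because it never appeared in training, and "good morning" contains no known words at all. In that last case the model has no evidence to work with, so its prediction should simply reflect how common each class was in the training data.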

In [None]:
# Interactive exercise for Naive Bayes
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import matplotlib.pyplot as plt
import seaborn as sns

print("Welcome to the 'Classify Your Message' Exercise!")
print("You’ll type a message and see if it’s classified as Spam or Not Spam.")

# Original dataset
messages = ["win free money", "call now free", "hi meet today", "see you soon", "free prize now"]
labels = np.array([1, 1, 0, 0, 1])  # 1 for Spam, 0 for Not Spam

# Convert text to word frequency features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Train the model
model = MultinomialNB()
model.fit(X, labels)

# Ask user for a new message
user_message = input("Enter a short message (e.g., 'free offer now' or 'let’s meet soon'): ")
if not user_message.strip():
    user_message = "free offer now"
    print("No input provided. Defaulting to 'free offer now'.")

new_message = [user_message]
new_X = vectorizer.transform(new_message)
prediction = model.predict(new_X)[0]
probabilities = model.predict_proba(new_X)[0]
print(f"Your Message: '{user_message}'")
print(f"Predicted Class: {'Spam' if prediction == 1 else 'Not Spam'}")
print(f"Probability of Not Spam: {probabilities[0]:.2f}")
print(f"Probability of Spam: {probabilities[1]:.2f}")

# Visualize the probabilities
plt.figure(figsize=(6, 4))
# A single color avoids the warning recent seaborn versions raise for 'palette' without 'hue'
sns.barplot(x=['Not Spam', 'Spam'], y=probabilities, color='steelblue')
plt.title(f"Probability Distribution for '{user_message}'")
plt.ylabel('Probability')
plt.ylim(0, 1)
plt.show()

print("Look at the plot above:")
print("- The bars show the model’s confidence (probability) for each class.")
print("- A taller bar indicates a higher likelihood of the message belonging to that class.")

## 5. Key Considerations for Naive Bayes

Naive Bayes is a fast and effective algorithm for classification, especially with text data, but it comes with some considerations to keep in mind:

- **Independence Assumption**: The 'naive' assumption that features are independent given the class is often unrealistic (e.g., words in a sentence are related). Despite this, it often performs well in practice, especially for text classification.
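- **Zero Probability Problem**: If a feature (e.g., a word) in a new data point never appeared in training for a class, its estimated probability for that class is zero. Because the per-feature probabilities are multiplied together, that single zero wipes out the whole class score, no matter what the other features say. Laplace smoothing (adding a small count to every feature) avoids this, and scikit-learn’s `MultinomialNB` applies it by default (`alpha=1.0`).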
- **Works Best with Categorical or Count Data**: Naive Bayes, especially Multinomial, excels with discrete data like word counts. For continuous data, Gaussian Naive Bayes can be used, but it assumes features follow a normal distribution, which may not always hold.
- **Sensitive to Imbalanced Data**: If one class is much more common in the training data, the model might overly favor that class. Balancing the dataset or adjusting priors can help.
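
The zero probability problem and its fix can be seen with a little arithmetic. The sketch below uses counts that mirror this notebook’s toy dataset (6 words in total in the Not Spam messages, 12 distinct words overall). The helper `word_likelihood` and the two other-word likelihoods (0.2 and 0.3) are hypothetical values chosen for illustration.

```python
def word_likelihood(count, class_total, vocab_size, alpha):
    """Estimate P(word | class) with additive smoothing of strength alpha."""
    return (count + alpha) / (class_total + alpha * vocab_size)


# "free" never appears in the Not Spam messages, so its count there is 0.
unsmoothed = word_likelihood(count=0, class_total=6, vocab_size=12, alpha=0.0)
smoothed = word_likelihood(count=0, class_total=6, vocab_size=12, alpha=1.0)

print(f"Without smoothing:      P('free' | Not Spam) = {unsmoothed:.3f}")
print(f"With add-one smoothing: P('free' | Not Spam) = {smoothed:.3f}")

# Multiply by the (hypothetical) likelihoods of two other words in the message.
other_words = 0.2 * 0.3
print(f"Unsmoothed class score: {unsmoothed * other_words}")
print(f"Smoothed class score:   {smoothed * other_words:.4f}")
```

Without smoothing the class score is exactly zero, so Not Spam could never be predicted for any message containing "free". With smoothing the score is small but positive, which lets the other words still influence the outcome.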

**Analogy**: Naive Bayes is like guessing someone’s favorite food based on ingredients they mention. If they say "tomato" and "cheese", you might guess pizza, treating each ingredient as a separate clue that doesn’t influence the others (naive). If they mention something rare you’ve never heard of (zero probability), you’re stuck unless you give every ingredient at least a small likelihood (smoothing). If most of the people you’ve talked to love pizza, you might over-predict pizza unless you balance your guesses.

Despite these limitations, Naive Bayes is a go-to algorithm for quick and effective classification, particularly in natural language processing tasks.

## 6. Key Takeaways

- **Naive Bayes** is a supervised learning algorithm for classification that uses Bayes’ Theorem to predict class probabilities based on features.
- It works by calculating the likelihood of features given classes, assuming feature independence (the 'naive' part), and picking the most probable class.
- Use it for tasks like spam detection, sentiment analysis, or document classification, especially with text data and high-dimensional features.
- Be aware of limitations: the independence assumption isn’t always realistic, zero probabilities need smoothing, and it can be biased by imbalanced data.

You’ve now learned a probabilistic approach to classification! Naive Bayes introduces the power of using probability to make decisions, which is especially useful for text-based problems.

**What's Next?**
Move on to **Notebook 11: Evaluating Models** in Part 3: Intermediate Concepts to learn how to assess the performance of machine learning models. See you there!