### **Naive Bayes Classifier: An Overview**

The **Naive Bayes classifier** is a probabilistic classification algorithm based on **Bayes' Theorem**, with the **naive assumption** that the features are conditionally independent given the class. This assumption rarely holds exactly, yet Naive Bayes often works well in practice, particularly on high-dimensional problems such as text classification and spam filtering. Because training reduces to counting, it also scales easily to large datasets.

Bayes' Theorem states:
\[
P(C|X) = \frac{P(X|C) P(C)}{P(X)}
\]
Where:
- \( P(C|X) \) is the **posterior probability** of the class \( C \) given the features \( X \).
- \( P(X|C) \) is the **likelihood** of the features \( X \) given the class \( C \).
- \( P(C) \) is the **prior probability** of the class \( C \).
- \( P(X) \) is the **evidence** or the probability of the features \( X \).
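The theorem can be made concrete with a small Python sketch. `bayes_posterior` is a hypothetical helper, and the probabilities in the usage line are made-up illustrative values. The point is that, when the classes are mutually exclusive and exhaustive, the evidence \( P(X) \) is just the sum of \( P(X|C) P(C) \) over all classes, which is what turns the numerators into probabilities that add up to 1.

```python
def bayes_posterior(priors, likelihoods):
    """Apply Bayes' theorem to every class for one fixed observation X.

    priors:      dict mapping each class C to P(C)
    likelihoods: dict mapping each class C to P(X|C)
    Returns a dict mapping each class C to P(C|X).
    """
    # Numerator of Bayes' theorem for each class: P(X|C) * P(C)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(X) by the law of total probability: sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Illustrative numbers only, not estimated from any dataset
posterior = bayes_posterior(priors={"Yes": 0.6, "No": 0.4},
                            likelihoods={"Yes": 0.05, "No": 0.15})
print(posterior)
```

Only the ratio of the numerators matters for choosing a class; dividing by \( P(X) \) just rescales them so they sum to 1.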

### **Steps for Naive Bayes Classifier**
1. **Assume Conditional Independence**:
   We assume that the features \( X_1, X_2, \dots, X_n \) are conditionally independent given the class label \( C \). This is the "naive" assumption.
   
   So, we can write:
   \[
   P(X|C) = P(X_1|C) \cdot P(X_2|C) \cdot \dots \cdot P(X_n|C)
   \]

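2. **Calculate the Prior Probability \( P(C) \)**:
   The prior is estimated as the fraction of training samples that belong to class \( C \).

3. **Calculate the Likelihood \( P(X|C) \)**:
   For each feature \( X_i \), estimate the conditional probability \( P(X_i|C) \) as the fraction of class-\( C \) samples that have the observed value of \( X_i \), often with smoothing so that a value never seen with class \( C \) does not get probability zero.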

4. **Calculate the Posterior \( P(C|X) \)**:
   Using Bayes' theorem, we calculate the posterior probability of each class \( C \) given the input features \( X \).

5. **Make the Prediction**:
   The class with the highest posterior probability is predicted.
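Taken together, these steps amount to a single decision rule. Because the evidence \( P(X) \) does not depend on \( C \), it can be dropped when comparing classes, and taking logarithms (a monotonic transformation) turns the product into a sum without changing which class wins:

\[
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(X_i|C) = \arg\max_{C} \left[ \log P(C) + \sum_{i=1}^{n} \log P(X_i|C) \right]
\]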

### **Example**

Let's consider a dataset of weather conditions and whether someone plays tennis in that weather:
- **Features**: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High, Low)
- **Class Label**: PlayTennis (Yes, No)

Here's a simple dataset:

| Outlook   | Temperature | Humidity | PlayTennis |
|-----------|-------------|----------|------------|
| Sunny     | Hot         | High     | No         |
| Sunny     | Hot         | High     | No         |
| Overcast  | Hot         | High     | Yes        |
| Rain      | Mild        | High     | Yes        |
| Rain      | Cool        | Low      | Yes        |
| Rain      | Cool        | Low      | No         |
| Overcast  | Cool        | Low      | Yes        |
| Sunny     | Mild        | High     | No         |
| Sunny     | Cool        | Low      | Yes        |
| Rain      | Mild        | Low      | Yes        |

### **Step-by-Step Approach**:

1. **Calculate Prior Probabilities**: The fraction of training samples labeled "Yes" and the fraction labeled "No".
   
2. **Calculate Likelihood for Each Feature**: The likelihood for each feature (e.g., Outlook = Sunny) given the class (e.g., PlayTennis = Yes).

3. **Apply Bayes' Theorem**: For a given test instance, apply Bayes' theorem to compute the posterior probability for each class ("Yes" and "No").

4. **Prediction**: Choose the class with the highest posterior probability.
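Applied to the table, these four steps reduce to counting. The sketch below uses only the standard library and exact fractions so every number can be checked by hand. It scores the instance `('Sunny', 'Cool', 'High')` without smoothing, which works here because none of its feature values has a zero count in either class. The helper names `likelihood` and `score` are illustrative, not part of any library.

```python
from collections import Counter
from fractions import Fraction

# The ten training rows from the table: (Outlook, Temperature, Humidity, PlayTennis)
rows = [
    ("Sunny", "Hot", "High", "No"),
    ("Sunny", "Hot", "High", "No"),
    ("Overcast", "Hot", "High", "Yes"),
    ("Rain", "Mild", "High", "Yes"),
    ("Rain", "Cool", "Low", "Yes"),
    ("Rain", "Cool", "Low", "No"),
    ("Overcast", "Cool", "Low", "Yes"),
    ("Sunny", "Mild", "High", "No"),
    ("Sunny", "Cool", "Low", "Yes"),
    ("Rain", "Mild", "Low", "Yes"),
]

# Step 1: prior P(C) = (rows in class C) / (all rows)
class_counts = Counter(row[-1] for row in rows)
prior = {c: Fraction(n, len(rows)) for c, n in class_counts.items()}

# Step 2: likelihood P(X_j = v | C) = (class-C rows with feature j == v) / (class-C rows)
feature_counts = Counter((j, row[j], row[-1]) for row in rows for j in range(3))

def likelihood(j, value, label):
    return Fraction(feature_counts[(j, value, label)], class_counts[label])

# Step 3: unnormalized posterior P(C) * prod_j P(X_j = x_j | C)
def score(x, label):
    s = prior[label]
    for j, value in enumerate(x):
        s *= likelihood(j, value, label)
    return s

x = ("Sunny", "Cool", "High")
scores = {c: score(x, c) for c in prior}
print(prior["Yes"], prior["No"])      # 3/5 2/5
print(scores["Yes"], scores["No"])    # 1/60 9/160

# Step 4: predict the class with the larger score
print(max(scores, key=scores.get))    # No
```

Note that `likelihood(0, "Overcast", "No")` is exactly 0, because "Overcast" never occurs with "No"; any instance with Outlook = Overcast would therefore get a "No" score of zero unless the estimates are smoothed.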

### **Naive Bayes Classifier Code from Scratch**

```python
import numpy as np
from collections import defaultdict

# Sample data: Outlook, Temperature, Humidity, PlayTennis (class label)
data = [
    ['Sunny', 'Hot', 'High', 'No'],
    ['Sunny', 'Hot', 'High', 'No'],
    ['Overcast', 'Hot', 'High', 'Yes'],
    ['Rain', 'Mild', 'High', 'Yes'],
    ['Rain', 'Cool', 'Low', 'Yes'],
    ['Rain', 'Cool', 'Low', 'No'],
    ['Overcast', 'Cool', 'Low', 'Yes'],
    ['Sunny', 'Mild', 'High', 'No'],
    ['Sunny', 'Cool', 'Low', 'Yes'],
    ['Rain', 'Mild', 'Low', 'Yes']
]

# Convert to a NumPy array of strings
data = np.array(data)

# Features and target (class)
X = data[:, :-1]  # All columns except the last one
y = data[:, -1]   # Last column (target)

# Step 1: Calculate prior probabilities P(C)
def calculate_prior(y):
    class_counts = defaultdict(int)
    for label in y:
        class_counts[label] += 1
    total_samples = len(y)
    prior = {label: count / total_samples for label, count in class_counts.items()}
    return prior

# Step 2: Calculate likelihoods P(X_j = v | C) with Laplace (add-alpha) smoothing
def calculate_likelihood(X, y, alpha=1.0):
    class_counts = defaultdict(int)                       # N_C: training samples in class C
    value_counts = defaultdict(lambda: defaultdict(int))  # class-C samples with feature j == v

    for i in range(len(X)):
        label = y[i]
        class_counts[label] += 1  # count each sample once, not once per feature
        for j in range(X.shape[1]):
            value_counts[label][(X[i, j], j)] += 1

    # P(X_j = v | C) = (count + alpha) / (N_C + alpha * K_j), where K_j is the number of
    # distinct training values of feature j; values never seen with C get a small non-zero probability
    feature_likelihood = {}
    for label in class_counts:
        feature_likelihood[label] = {}
        for j in range(X.shape[1]):
            values = np.unique(X[:, j])
            for value in values:
                count = value_counts[label][(value, j)]
                feature_likelihood[label][(value, j)] = (count + alpha) / (class_counts[label] + alpha * len(values))

    return feature_likelihood

# Step 3: Predict class labels
def predict(X, prior, feature_likelihood):
    predictions = []

    for x in X:
        class_scores = {}

        for label in prior:
            score = np.log(prior[label])  # log P(C)

            # Add log P(X_j = x_j | C): a sum of logs replaces the product of probabilities
            for j in range(len(x)):
                feature = (x[j], j)
                # Values never seen in training (for any class) carry no evidence and are skipped
                if feature in feature_likelihood[label]:
                    score += np.log(feature_likelihood[label][feature])

            class_scores[label] = score

        # Highest unnormalized log-posterior wins; str() turns NumPy string scalars into plain str
        predicted_class = max(class_scores, key=class_scores.get)
        predictions.append(str(predicted_class))

    return predictions

# Training phase
prior = calculate_prior(y)
feature_likelihood = calculate_likelihood(X, y)

# Prediction phase (example test instances)
X_test = np.array([
    ['Sunny', 'Cool', 'High'],
    ['Overcast', 'Mild', 'Low']
])

# Predict the class labels for the test samples
predictions = predict(X_test, prior, feature_likelihood)

print("Predictions:", predictions)
```
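`predict` returns only the winning label, because it never computes \( P(X) \). If the posterior probabilities themselves are needed, the per-class log scores that `predict` compares can be normalized with the log-sum-exp trick. The sketch below assumes those scores are collected into a dict; `log_scores_to_posteriors` is a hypothetical helper using only the standard library.

```python
import math

def log_scores_to_posteriors(log_scores):
    """Turn {class: log P(C) + sum_i log P(X_i|C)} into {class: P(C|X)} summing to 1."""
    # Subtract the largest score before exponentiating so the largest term becomes exp(0) = 1
    # and the exponentials cannot all underflow to zero
    m = max(log_scores.values())
    shifted = {c: math.exp(s - m) for c, s in log_scores.items()}
    # This total is the evidence P(X), up to the common factor exp(-m)
    total = sum(shifted.values())
    return {c: v / total for c, v in shifted.items()}


# Illustrative scores: exp(-1000) alone underflows to 0.0, so naive normalization would divide 0 by 0
posteriors = log_scores_to_posteriors({"Yes": -1000.0, "No": -1001.0})
print(posteriors)
```

Shifting by the maximum does not change the result, since the common factor cancels between numerator and denominator.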

### **Explanation of the Code**

1. **Prior Probability**:
   - We calculate the prior probability \( P(C) \) for each class. This is the proportion of each class in the dataset.
   
   ```python
   def calculate_prior(y):
       class_counts = defaultdict(int)
       for label in y:
           class_counts[label] += 1
       total_samples = len(y)
       prior = {label: count / total_samples for label, count in class_counts.items()}
       return prior
   ```

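2. **Likelihood**:
   - We estimate the likelihood \( P(X_i | C) \), the probability that feature \( X_i \) takes a given value within class \( C \). This is done by counting how many samples of class \( C \) have that value and dividing by the number of samples in class \( C \). Laplace (add-`alpha`) smoothing adds `alpha` to every count and `alpha` times the number of possible values to the denominator, so a value never observed with a class gets a small probability instead of zero, which would otherwise zero out the whole product.
   
   ```python
   def calculate_likelihood(X, y, alpha=1.0):
       class_counts = defaultdict(int)
       value_counts = defaultdict(lambda: defaultdict(int))

       for i in range(len(X)):
           label = y[i]
           class_counts[label] += 1  # once per sample
           for j in range(X.shape[1]):
               value_counts[label][(X[i, j], j)] += 1

       # (count + alpha) / (N_C + alpha * K_j), with K_j distinct values of feature j
       feature_likelihood = {}
       for label in class_counts:
           feature_likelihood[label] = {}
           for j in range(X.shape[1]):
               values = np.unique(X[:, j])
               for value in values:
                   count = value_counts[label][(value, j)]
                   feature_likelihood[label][(value, j)] = (count + alpha) / (class_counts[label] + alpha * len(values))

       return feature_likelihood
   ```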

3. **Prediction**:
   - Multiplying many probabilities smaller than 1 quickly underflows to zero in floating point, so we sum their **logarithms** instead: \( \log P(C) + \sum_i \log P(X_i|C) \). The evidence \( P(X) \) is the same for every class, so it is omitted; the resulting score is an unnormalized log-posterior, and the class with the highest score is predicted.
   
   ```python
   def predict(X, prior, feature_likelihood):
       predictions = []

       for x in X:
           class_scores = {}

           for label in prior:
               score = np.log(prior[label])  # log P(C)

               for j in range(len(x)):
                   feature = (x[j], j)
                   # Values never seen in training carry no evidence and are skipped
                   if feature in feature_likelihood[label]:
                       score += np.log(feature_likelihood[label][feature])

               class_scores[label] = score

           predicted_class = max(class_scores, key=class_scores.get)
           predictions.append(str(predicted_class))

       return predictions
   ```
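The underflow that motivates the logarithm is easy to reproduce. In this standard-library sketch, 500 features each with an illustrative likelihood of 0.01 have a true product of \( 10^{-1000} \), far below the smallest positive double (about \( 5 \times 10^{-324} \)). The running product collapses to 0.0, so every class would tie at zero, while the sum of logarithms stays a finite number that can still be compared across classes.

```python
import math

# 500 features, each with an illustrative likelihood of 0.01
likelihoods = [0.01] * 500

# Direct product: underflows to 0.0 partway through the loop
product = 1.0
for p in likelihoods:
    product *= p

# Log-space sum: stays finite (500 * ln(0.01), roughly -2302.6)
log_sum = sum(math.log(p) for p in likelihoods)

print(product)
print(log_sum)
```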

### **Example Output**:

```
Predictions: ['No', 'Yes']
```

In this example:
- The test data `['Sunny', 'Cool', 'High']` is predicted as "No".
- The test data `['Overcast', 'Mild', 'Low']` is predicted as "Yes".
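Both predictions can be checked by hand. The sketch below assumes add-one (Laplace) smoothing, \( (\text{count} + 1) / (N_C + K_j) \), where \( N_C \) is the number of training rows in class \( C \) and \( K_j \) is the number of possible values of feature \( j \) (3 for Outlook, 3 for Temperature, 2 for Humidity). `laplace` is a hypothetical helper, and the counts are read off the training table. For `['Overcast', 'Mild', 'Low']` smoothing matters: "Overcast" never occurs with "No", so without it the "No" score would be exactly zero.

```python
from fractions import Fraction

def laplace(count, class_total, n_values, alpha=1):
    """Add-alpha estimate of P(X_j = v | C) = (count + alpha) / (class_total + alpha * n_values)."""
    return Fraction(count + alpha, class_total + alpha * n_values)

# "Yes" has 6 of the 10 training rows, "No" has 4
prior = {"Yes": Fraction(6, 10), "No": Fraction(4, 10)}

# ['Sunny', 'Cool', 'High']: counts (Outlook, Temperature, Humidity) per class
sunny_cool_high = {
    "Yes": prior["Yes"] * laplace(1, 6, 3) * laplace(3, 6, 3) * laplace(2, 6, 2),
    "No": prior["No"] * laplace(3, 4, 3) * laplace(1, 4, 3) * laplace(3, 4, 2),
}
# ['Overcast', 'Mild', 'Low']: "Overcast" has count 0 in class "No"
overcast_mild_low = {
    "Yes": prior["Yes"] * laplace(2, 6, 3) * laplace(2, 6, 3) * laplace(4, 6, 2),
    "No": prior["No"] * laplace(0, 4, 3) * laplace(1, 4, 3) * laplace(1, 4, 2),
}

for name, scores in [("Sunny, Cool, High", sunny_cool_high),
                     ("Overcast, Mild, Low", overcast_mild_low)]:
    winner = max(scores, key=scores.get)
    print(f"{name}: Yes = {scores['Yes']}, No = {scores['No']} -> {winner}")
# Sunny, Cool, High: Yes = 1/45, No = 32/735 -> No
# Overcast, Mild, Low: Yes = 1/24, No = 4/735 -> Yes
```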

