# **Problem Statement**  
## **5. Implement a simple Naive Bayes classifier using probability rules.**

Implement a Naive Bayes classifier from scratch using basic probability rules.

Given a labeled dataset:
- Compute prior probabilities for each class
- Compute likelihood probabilities for feature values
- Apply Bayes’ Theorem to predict the class of unseen data points

No external ML libraries (such as sklearn) are allowed.

### Constraints & Example Inputs/Outputs

- Classification problem (binary or multi-class)
- Categorical features (simplest Naive Bayes)
- Conditional independence assumption holds
- No missing values

Example Input:
```python
X = [
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"]
]

y = ["No", "No", "Yes", "Yes"]

```
Example Prediction Input:
```python
["Sunny", "Mild", "High", "Weak"]
```

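Expected Output:
```text
Predicted Class → No
```

Note: with only these four training rows, the "No" and "Yes" scores tie exactly (each class has one query value it never saw in training). This output therefore relies on ties being broken in favour of "No", e.g. by iterating over the classes in sorted order. The seven-row dataset in Test Case 1 gives a clear "No".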

### Solution Approach

#### Naive Bayes Logic

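1. Bayes’ Theorem:

   $$P(\text{Class} \mid X) = \frac{P(X \mid \text{Class}) \, P(\text{Class})}{P(X)}$$

2. Since $P(X)$ is the same for every class, it can be dropped when comparing classes. With the naive assumption that the features $x_1, \dots, x_d$ are conditionally independent given the class, $P(X \mid \text{Class})$ factorizes into a product:

   $$P(\text{Class} \mid X) \propto P(\text{Class}) \prod_{i=1}^{d} P(x_i \mid \text{Class})$$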
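To see why dropping $P(X)$ is safe, here is a small sketch that computes exact scores with `fractions.Fraction` for the sample `["Rain", "Cool", "Normal", "Weak"]`, assuming the seven-row weather dataset used in Test Case 1 (every value of this sample was seen with both classes, so no fallback probability is needed). Dividing each unnormalized score by their sum recovers $P(\text{Class} \mid X)$, because under the naive model that sum is exactly $P(X)$; the ranking of the classes is identical before and after the division. The helper name `unnormalized_score` is illustrative only.

```python
from fractions import Fraction

# Seven-row weather dataset (the same data as Test Case 1).
X = [
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"],
    ["Rain", "Cool", "Normal", "Weak"],
    ["Rain", "Cool", "Normal", "Strong"],
    ["Overcast", "Cool", "Normal", "Strong"],
]
y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes"]
sample = ["Rain", "Cool", "Normal", "Weak"]


def unnormalized_score(cls):
    """P(cls) * prod_i P(x_i | cls), computed exactly from counts."""
    rows = [row for row, label in zip(X, y) if label == cls]
    score = Fraction(len(rows), len(y))  # prior P(cls)
    for i, value in enumerate(sample):
        matches = sum(1 for row in rows if row[i] == value)
        score *= Fraction(matches, len(rows))  # likelihood P(x_i | cls)
    return score


scores = {cls: unnormalized_score(cls) for cls in sorted(set(y))}
evidence = sum(scores.values())  # P(X) under the naive model
posteriors = {cls: s / evidence for cls, s in scores.items()}

print(scores)      # {'No': Fraction(1, 189), 'Yes': Fraction(3, 56)}
print(posteriors)  # {'No': Fraction(8, 89), 'Yes': Fraction(81, 89)}
```

Exact fractions make it easy to check by hand: $\tfrac{3}{7}\cdot(\tfrac{1}{3})^4 = \tfrac{1}{189}$ for "No" and $\tfrac{4}{7}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{3}{4} = \tfrac{3}{56}$ for "Yes".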


#### Steps

1. Compute the prior probabilities $P(\text{Class})$.
2. Compute the likelihood probabilities $P(\text{feature value} \mid \text{Class})$.
3. Multiply the prior by the likelihoods of the observed feature values.
4. Choose the class with the maximum posterior score.
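A compact, standalone sketch of the four steps on the four-row Example Input. It assumes the same `1e-6` fallback for unseen values that the solution code uses; the names `pair_counts`, `likelihood` and `UNSEEN` are illustrative and not part of the solution.

```python
from collections import Counter

# The four-row example input from the problem statement.
X = [
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"],
]
y = ["No", "No", "Yes", "Yes"]
query = ["Sunny", "Mild", "High", "Weak"]
UNSEEN = 1e-6  # assumed fallback for a value never seen with a class

# Step 1: priors P(Class)
class_counts = Counter(y)
priors = {cls: n / len(y) for cls, n in class_counts.items()}

# Step 2: counts for P(feature value | Class), keyed by (class, feature index, value)
pair_counts = Counter(
    (label, i, value) for row, label in zip(X, y) for i, value in enumerate(row)
)


def likelihood(cls, i, value):
    count = pair_counts[(cls, i, value)]  # Counter returns 0 for missing keys
    return count / class_counts[cls] if count else UNSEEN


# Step 3: multiply the prior by the likelihood of each observed value
scores = {}
for cls in sorted(priors):  # sorted order makes tie-breaking deterministic
    score = priors[cls]
    for i, value in enumerate(query):
        score *= likelihood(cls, i, value)
    scores[cls] = score

# Step 4: max() returns the first class with the highest score
prediction = max(scores, key=scores.get)
print(priors, scores, prediction)
```

Both scores come out to 2.5e-7 here, which is why the final `max` needs a deterministic class order.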

### Solution Code

```python
# Approach 1: Brute Force Naive Bayes (Direct Probability Computation)
from collections import defaultdict, Counter

class NaiveBayesBruteForce:
    def __init__(self):
        self.class_priors = {}   # P(Class)
        self.likelihoods = {}    # likelihoods[cls][feature_idx][value] = P(value | cls)
        self.classes = []

    def fit(self, X, y):
        # Sorted so iteration order (and therefore tie-breaking in predict) is deterministic
        self.classes = sorted(set(y))
        total_samples = len(y)
        class_counts = Counter(y)

        # Prior probabilities: P(Class) = count(Class) / n
        for cls in self.classes:
            self.class_priors[cls] = class_counts[cls] / total_samples

        # Count how often each feature value occurs within each class
        self.likelihoods = {cls: defaultdict(lambda: defaultdict(int)) for cls in self.classes}

        for features, label in zip(X, y):
            for idx, value in enumerate(features):
                self.likelihoods[label][idx][value] += 1

        # Convert counts to probabilities: P(value | Class) = count / count(Class)
        for cls in self.classes:
            for idx in self.likelihoods[cls]:
                total = sum(self.likelihoods[cls][idx].values())
                for val in self.likelihoods[cls][idx]:
                    self.likelihoods[cls][idx][val] /= total

    def predict(self, X):
        predictions = []

        for features in X:
            class_scores = {}

            for cls in self.classes:
                score = self.class_priors[cls]
                for idx, value in enumerate(features):
                    # A value never seen with this class gets a small fixed probability
                    # instead of 0, so one unseen value does not zero out the score
                    score *= self.likelihoods[cls][idx].get(value, 1e-6)
                class_scores[cls] = score

            # max() returns the first class with the highest score
            predictions.append(max(class_scores, key=class_scores.get))

        return predictions
```


### Alternative Solution

```python
# Approach 2: Optimized Naive Bayes (Log Probabilities, Numerically Stable)
import math
from collections import defaultdict, Counter

class NaiveBayesOptimized:
    def __init__(self):
        self.class_priors = {}   # log P(Class)
        self.likelihoods = {}    # likelihoods[cls][feature_idx][value] = log P(value | cls)
        self.classes = []

    def fit(self, X, y):
        # Sorted so tie-breaking in predict is deterministic
        self.classes = sorted(set(y))
        total_samples = len(y)
        class_counts = Counter(y)

        # Log priors: log P(Class)
        for cls in self.classes:
            self.class_priors[cls] = math.log(class_counts[cls] / total_samples)

        # Count how often each feature value occurs within each class
        self.likelihoods = {cls: defaultdict(lambda: defaultdict(int)) for cls in self.classes}

        for features, label in zip(X, y):
            for idx, value in enumerate(features):
                self.likelihoods[label][idx][value] += 1

        # Log likelihoods: log P(value | Class)
        for cls in self.classes:
            for idx in self.likelihoods[cls]:
                total = sum(self.likelihoods[cls][idx].values())
                for val in self.likelihoods[cls][idx]:
                    self.likelihoods[cls][idx][val] = math.log(
                        self.likelihoods[cls][idx][val] / total
                    )

    def predict(self, X):
        unseen_log_prob = math.log(1e-6)  # log of the same fallback used in Approach 1
        predictions = []

        for features in X:
            scores = {}

            for cls in self.classes:
                # Sum of log-probabilities replaces the product of probabilities
                score = self.class_priors[cls]
                for idx, value in enumerate(features):
                    score += self.likelihoods[cls][idx].get(value, unseen_log_prob)
                scores[cls] = score

            predictions.append(max(scores, key=scores.get))

        return predictions
```
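Why work in log space? A minimal illustration, using made-up numbers (400 features, each with likelihood 0.1), of the underflow that the log-probability version avoids: the direct product falls to `0.0` in double-precision floats, while the sum of logs stays finite and can still be compared between classes.

```python
import math

likelihoods = [0.1] * 400  # hypothetical: 400 features, each with P(x_i | Class) = 0.1

product = 1.0
for p in likelihoods:
    product *= p  # Approach 1 style: repeated multiplication

log_score = sum(math.log(p) for p in likelihoods)  # Approach 2 style: sum of logs

print(product)    # 0.0 (1e-400 is below the smallest positive double, about 4.9e-324)
print(log_score)  # about -921.03
```

Once every class underflows to `0.0`, the `max` in the brute-force version can no longer tell the classes apart; the log scores keep that information.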


### Alternative Approaches

- Gaussian Naive Bayes (for numerical features)
- Multinomial Naive Bayes (for text data)
- Bernoulli Naive Bayes (for binary features)
- Laplace (add-one) smoothing for zero-frequency feature values, in place of the fixed `1e-6` fallback used above
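A sketch of Laplace (add-one) smoothing for a single categorical feature, assuming the standard add-alpha estimate $\frac{\text{count} + \alpha}{\text{count(Class)} + \alpha k}$, where $k$ is the number of distinct values the feature takes in training. The function `laplace_likelihood` and its arguments are illustrative, not part of the solution code. Unlike the fixed `1e-6` floor, an unseen value gets a data-dependent probability, and the estimates for a feature still sum to 1 over its training values.

```python
# Four-row example input from the problem statement.
X = [
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"],
]
y = ["No", "No", "Yes", "Yes"]


def laplace_likelihood(X, y, cls, feature_idx, value, alpha=1.0):
    """Add-alpha estimate of P(value | cls); alpha=1 is Laplace smoothing."""
    k = len({row[feature_idx] for row in X})  # distinct values of this feature
    class_rows = [row for row, label in zip(X, y) if label == cls]
    count = sum(1 for row in class_rows if row[feature_idx] == value)
    return (count + alpha) / (len(class_rows) + alpha * k)


print(laplace_likelihood(X, y, "No", 1, "Mild"))  # 0.25 instead of 0 (or the 1e-6 floor)
print(laplace_likelihood(X, y, "No", 1, "Hot"))   # 0.75
```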

### Test Cases

```python
# Test Case 1: Weather Dataset (Classic Example)

X = [
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"],
    ["Rain", "Cool", "Normal", "Weak"],
    ["Rain", "Cool", "Normal", "Strong"],
    ["Overcast", "Cool", "Normal", "Strong"]
]

y = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes"]

model = NaiveBayesOptimized()
model.fit(X, y)

test_sample = [["Sunny", "Mild", "High", "Weak"]]
print("Prediction:", model.predict(test_sample))
```

Output:

```
Prediction: ['No']
```


```python
# Test Case 2: Multiple Predictions (reuses `model` trained in Test Case 1)

X_test = [
    ["Rain", "Cool", "Normal", "Weak"],
    ["Overcast", "Hot", "High", "Strong"]
]

print(model.predict(X_test))
```

Output:

```
['Yes', 'Yes']
```


```python
# Test Case 3: Binary Feature Dataset

X = [
    ["Yes", "Yes"],
    ["Yes", "No"],
    ["No", "Yes"],
    ["No", "No"]
]

y = ["True", "True", "False", "False"]

model = NaiveBayesBruteForce()
model.fit(X, y)

print(model.predict([["Yes", "Yes"], ["No", "Yes"]]))
```

Output:

```
['True', 'False']
```


## Complexity Analysis

### Time Complexity
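- Training: O(n × d) (one counting pass over the data; the normalization pass only touches stored counts, of which there are at most n × d)
- Prediction: O(c × d) per sample

### Space Complexity
- O(c × d × v)

where n = number of training samples, d = number of features, c = number of classes, and v = maximum number of distinct values per feature.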

#### Thank You!!