
**NAIVE BAYES IMPLEMENTATION**

The following code is a simple implementation of a Naive Bayes classifier.

You have a few tasks to finish; they are marked in the comments. The code is plain Python, and the exercise is meant to help you understand the algorithm better and to give you some coding practice.

Here is a quick walkthrough of the code and the tasks:

### Walkthrough of the Code

#### Dataset and Labels
```python
# Given dataset
X = [
    [1, 0, 1],  # Sample 1
    [1, 1, 0],  # Sample 2
    [0, 0, 1],  # Sample 3
    [1, 0, 0],  # Sample 4
    [0, 1, 1],  # Sample 5
    [0, 1, 0],  # Sample 6
    [1, 1, 1],  # Sample 7
    [0, 0, 0],  # Sample 8
    [1, 0, 1],  # Sample 9
    [1, 1, 1],  # Sample 10
    [0, 1, 0],  # Sample 11
    [1, 0, 0],  # Sample 12
    [0, 0, 1],  # Sample 13
    [0, 1, 1],  # Sample 14
    [1, 1, 0],  # Sample 15
]

y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # Labels corresponding to the samples

# Possible values of y
y_values = list(set(y))
```
- `X` is the dataset containing 15 samples, each with three features.
- `y` is the list of labels corresponding to each sample in `X`.
- `y_values` contains the unique values of `y`, which represent the possible classes.
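Task 1 only needs to know how many samples fall into each class. As a quick sanity check on the data (not part of the exercise), the standard library's `collections.Counter` gives those counts directly:

```python
from collections import Counter

y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # labels from the dataset above

class_counts = Counter(y)  # maps each class label to its number of samples
print(class_counts[0], class_counts[1])  # 5 10
print(sorted(class_counts))  # [0, 1], the same classes as y_values
```

So the priors for this dataset are 5/15 for class 0 and 10/15 for class 1.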

#### Naive Bayes Prediction Function
```python
# Function to predict the class for a given input
def predict_naive_bayes(x_test, X, y, y_values):
    max_prob = -1
    best_class = None
    for y_val in y_values:  # For each value of y - we need P(X|Y) * P(Y)

        # Calculate P(y)
        count_y = ...  # Replace with the count of y_val in y
        p_y = ...  # Replace with count_y divided by the length of y

        # Calculate P(x|y)
        p_x_given_y = 1.0
        for feature_idx in range(len(x_test)):
            count_feature_and_y = 0
            for i in range(len(X)):
                if y[i] == y_val and X[i][feature_idx] == x_test[feature_idx]:
                    count_feature_and_y += 1
            if count_y > 0:
                p_x_given_y *= ...  # Replace with the conditional probability calculation

        # Calculate P(x|y) * P(y), which is proportional to P(y|x)
        p_y_given_x = ...  # Replace with p_x_given_y multiplied by p_y

        # Update max_prob and best_class
        if p_y_given_x > max_prob:
            max_prob = p_y_given_x
            best_class = y_val
    return best_class
```
- `predict_naive_bayes` is the function that will predict the class of a test input `x_test`.
- `max_prob` and `best_class` are used to keep track of the highest posterior probability and the corresponding class.
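Strictly speaking, the value the loop maximizes is \( P(x|y)\,P(y) \). It is only proportional to the posterior because the denominator \( P(x) \) is left out. That is fine for picking the best class, since \( P(x) \) is the same for every class. If you want scores that actually sum to 1, divide each one by their total. The sketch below uses a hypothetical helper, `normalize_scores`. It is fed the two scores that a correct implementation produces for the test input `[0, 1, 1]`: 8/125 for class 0 and 3/50 for class 1.

```python
from fractions import Fraction

def normalize_scores(scores):
    """Turn unnormalized scores P(x|y) * P(y) into posteriors P(y|x).

    Dividing by the total is the same as dividing by P(x), because
    P(x) is the sum over all classes y of P(x|y) * P(y).
    """
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Unnormalized scores for x_test = [0, 1, 1] on the exercise dataset
scores = {0: Fraction(8, 125), 1: Fraction(3, 50)}
posteriors = normalize_scores(scores)
print(posteriors)  # {0: Fraction(16, 31), 1: Fraction(15, 31)}
```

Class 0 wins, but only narrowly: about 0.52 versus 0.48. Using `Fraction` keeps the arithmetic exact. With floats the values would be the same up to rounding.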

#### Test Input and Prediction
```python
# Test input
X_test = [0, 1, 1]

# Predicting the class for the test input
prediction = predict_naive_bayes(X_test, X, y, y_values)
print("Predicted class:", prediction)
```
- `X_test` is the input sample for which we want to predict the class.
- `predict_naive_bayes` is called with `X_test`, the dataset `X`, the labels `y`, and the possible values of `y`.
- The predicted class is printed.

### Reference for Naive Bayes Classifier

#### Steps for Naive Bayes Classifier

1. **Calculate Prior Probability \( P(y) \)**:
   - Count the occurrences of each class in `y`.
   - Divide the count by the total number of samples to get \( P(y) \).

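2. **Calculate Likelihood \( P(x|y) \)**:
   - For each feature in the test sample, calculate the probability of that feature value given the class label.
   - Count how many times the feature value appears in the training data for the given class.
   - Divide this count by the number of samples with the given class label.
   - Multiply these per-feature probabilities together. This is the "naive" assumption: the features are treated as conditionally independent given the class, so \( P(x|y) = \prod_j P(x_j|y) \).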

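3. **Calculate Posterior Probability \( P(y|x) \)**:
   - Multiply the prior probability \( P(y) \) by the likelihood \( P(x|y) \) for each class.
   - By Bayes' theorem, \( P(y|x) = P(x|y)\,P(y) / P(x) \). The evidence \( P(x) \) is the same for every class. The product \( P(x|y)\,P(y) \) is therefore proportional to \( P(y|x) \), and it is enough for comparing classes.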

4. **Predict the Class**:
   - The class with the highest posterior probability is chosen as the prediction.
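As a worked example, here are the steps above applied by hand to the test input \( x = (0, 1, 1) \).

Class 0 has 5 samples. Among them:
- 4 have feature 0 equal to 0.
- 3 have feature 1 equal to 1.
- 2 have feature 2 equal to 1.

Class 1 has 10 samples, and the corresponding counts are 3, 5 and 6:

\[
\begin{aligned}
y = 0:&\quad P(y) = \tfrac{5}{15},\quad P(x \mid y) = \tfrac{4}{5}\cdot\tfrac{3}{5}\cdot\tfrac{2}{5} = \tfrac{24}{125},\quad P(x \mid y)\,P(y) = \tfrac{8}{125} = 0.064 \\
y = 1:&\quad P(y) = \tfrac{10}{15},\quad P(x \mid y) = \tfrac{3}{10}\cdot\tfrac{5}{10}\cdot\tfrac{6}{10} = \tfrac{9}{100},\quad P(x \mid y)\,P(y) = \tfrac{3}{50} = 0.06
\end{aligned}
\]

Since \( 0.064 > 0.06 \), the predicted class for \( (0, 1, 1) \) is 0.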

### Tasks for Students

1. **Calculate \( P(y) \)**:
   - Implement the calculation of the prior probability.

2. **Calculate \( P(x|y) \)**:
   - Implement the calculation of the likelihood for each feature given the class.

3. **Calculate \( P(y|x) \)**:
   - Multiply the prior and the likelihood to get a score proportional to the posterior probability.

4. **Update max_prob and best_class**:
   - Compare the calculated posterior probability with the current maximum and update if it's higher.
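Tasks 1 and 2 only use counts. Instead of rescanning `X` for every feature of every prediction, you could count once up front and look the numbers up at prediction time. The sketch below is an optional variation, not part of the exercise. The helper names `fit_counts` and `predict_from_counts` are made up for illustration.

```python
from collections import Counter

def fit_counts(X, y):
    """Count class frequencies and (class, feature index, value) co-occurrences."""
    class_counts = Counter(y)
    feature_counts = Counter()
    for row, label in zip(X, y):
        for feature_idx, value in enumerate(row):
            feature_counts[(label, feature_idx, value)] += 1
    return class_counts, feature_counts

def predict_from_counts(x_test, class_counts, feature_counts):
    """Return the class with the largest P(x|y) * P(y), using the stored counts."""
    n_samples = sum(class_counts.values())
    best_class, max_prob = None, -1.0
    for label, count_y in class_counts.items():
        score = count_y / n_samples  # P(y)
        for feature_idx, value in enumerate(x_test):
            # P(x_j = value | y); a Counter returns 0 for unseen combinations
            score *= feature_counts[(label, feature_idx, value)] / count_y
        if score > max_prob:
            best_class, max_prob = label, score
    return best_class

X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1],
     [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 1],
     [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 0]]
y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

class_counts, feature_counts = fit_counts(X, y)
print(predict_from_counts([0, 1, 1], class_counts, feature_counts))  # 0
```

The counting pass runs once over the training data. Each prediction then does one lookup per class and feature, which matters when you classify many inputs.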

This walkthrough should help you complete the Naive Bayes classifier and understand its core concepts.

In [None]:
# Skeleton code for Naive Bayes Classifier

# Given dataset
X = [
    [1, 0, 1],  # Sample 1
    [1, 1, 0],  # Sample 2
    [0, 0, 1],  # Sample 3
    [1, 0, 0],  # Sample 4
    [0, 1, 1],  # Sample 5
    [0, 1, 0],  # Sample 6
    [1, 1, 1],  # Sample 7
    [0, 0, 0],  # Sample 8
    [1, 0, 1],  # Sample 9
    [1, 1, 1],  # Sample 10
    [0, 1, 0],  # Sample 11
    [1, 0, 0],  # Sample 12
    [0, 0, 1],  # Sample 13
    [0, 1, 1],  # Sample 14
    [1, 1, 0],  # Sample 15
]

y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # Labels corresponding to the samples

# Possible values of y
y_values = list(set(y))

# Function to predict the class for a given input
def predict_naive_bayes(x_test, X, y, y_values):
    max_prob = -1
    best_class = None
    for y_val in y_values:  # For each value of y - we need P(X|Y) * P(Y)

        # Task 1: Calculate P(y)
        # Hint: P(y) is the number of times y_val appears in y divided by the total number of samples
        count_y = y.count(y_val)  # Number of samples whose label is y_val
        p_y = count_y / len(y)  # Fraction of all samples with label y_val

        # Task 2: Calculate P(x|y)
        # Hint: Loop through each feature and calculate the conditional probability
        p_x_given_y = 1.0
        for feature_idx in range(len(x_test)):
            count_feature_and_y = 0
            for i in range(len(X)):
                if y[i] == y_val and X[i][feature_idx] == x_test[feature_idx]:
                    count_feature_and_y += 1
            if count_y > 0:
                p_x_given_y *= count_feature_and_y / count_y  # Fraction of class-y_val samples whose feature matches x_test

        # Task 3: Calculate P(x|y) * P(y), which is proportional to P(y|x)
        p_y_given_x = p_x_given_y * p_y  # Unnormalized posterior score

        # Task 4: Update max_prob and best_class
        if p_y_given_x > max_prob:
            max_prob = p_y_given_x
            best_class = y_val
    return best_class

# Test input
X_test = [0, 1, 1]

# Predicting the class for the test input
prediction = predict_naive_bayes(X_test, X, y, y_values)
print("Predicted class:", prediction)


**SOLUTION**

In [None]:
# Given dataset
X = [
    [1, 0, 1],  # Sample 1
    [1, 1, 0],  # Sample 2
    [0, 0, 1],  # Sample 3
    [1, 0, 0],  # Sample 4
    [0, 1, 1],  # Sample 5
    [0, 1, 0],  # Sample 6
    [1, 1, 1],  # Sample 7
    [0, 0, 0],  # Sample 8
    [1, 0, 1],  # Sample 9
    [1, 1, 1],  # Sample 10
    [0, 1, 0],  # Sample 11
    [1, 0, 0],  # Sample 12
    [0, 0, 1],  # Sample 13
    [0, 1, 1],  # Sample 14
    [1, 1, 0],  # Sample 15
]

y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # Labels corresponding to the samples

# Possible values of y
y_values = list(set(y))

# Function to predict the class for a given input
def predict_naive_bayes(x_test, X, y, y_values):
    max_prob = -1
    best_class = None
    for y_val in y_values:  # For each value of y - we need P(X|Y) * P(Y)

        # Calculate P(y)
        count_y = y.count(y_val)
        p_y = count_y / len(y)

        # Calculate P(x|y)
        p_x_given_y = 1.0
        for feature_idx in range(len(x_test)):
            count_feature_and_y = 0
            for i in range(len(X)):
                if y[i] == y_val and X[i][feature_idx] == x_test[feature_idx]:
                    count_feature_and_y += 1
            if count_y > 0:
                p_x_given_y *= count_feature_and_y / count_y

        # Calculate P(x|y) * P(y), which is proportional to P(y|x)
        p_y_given_x = p_x_given_y * p_y

        # Update max_prob and best_class
        if p_y_given_x > max_prob:
            max_prob = p_y_given_x
            best_class = y_val
    return best_class

# Test input
X_test = [0, 1, 1]

# Predicting the class for the test input
prediction = predict_naive_bayes(X_test, X, y, y_values)
print("Predicted class:", prediction)
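One weakness of the solution above is how it handles unseen combinations. If a feature value never occurs together with a class in the training data, its count is 0. The whole product \( P(x|y) \) then becomes 0, no matter what the other features say. A common remedy, not used in this exercise, is Laplace (add-one) smoothing: add `alpha` to every count, and add `alpha` times the number of possible values to the denominator. With many features, multiplying many small probabilities can also underflow. The sketch below therefore adds log-probabilities instead. Since the logarithm is increasing, the class with the largest log-score is also the class with the largest score.

This sketch rests on a few assumptions:
- The function name `predict_naive_bayes_smoothed` and the `alpha` parameter are its own inventions.
- A feature's possible values are exactly the ones seen in the training data.
- Only the likelihoods are smoothed, not the prior.

```python
import math

def predict_naive_bayes_smoothed(x_test, X, y, alpha=1.0):
    """Naive Bayes prediction with Laplace smoothing, computed in log space."""
    best_class, best_log_score = None, -math.inf
    for y_val in sorted(set(y)):  # sorted for a deterministic tie-break
        count_y = y.count(y_val)
        log_score = math.log(count_y / len(y))  # log P(y), not smoothed
        for feature_idx, value in enumerate(x_test):
            # Assumption: the possible values are those seen in training
            n_values = len({row[feature_idx] for row in X})
            count_feature_and_y = sum(
                1 for row, label in zip(X, y)
                if label == y_val and row[feature_idx] == value
            )
            # Smoothed P(x_j = value | y); never zero when alpha > 0
            p = (count_feature_and_y + alpha) / (count_y + alpha * n_values)
            log_score += math.log(p)
        if log_score > best_log_score:
            best_class, best_log_score = y_val, log_score
    return best_class

X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1],
     [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 1],
     [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [1, 1, 0]]
y = [0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

print(predict_naive_bayes_smoothed([0, 1, 1], X, y))  # 1
```

On this dataset the smoothed version predicts class 1 for `[0, 1, 1]` instead of 0. The unsmoothed scores (0.064 versus 0.06) are very close. Smoothing shifts class 0's estimates more because they rest on only 5 samples, which is enough to flip the result. With `alpha=0.0` the function reproduces the unsmoothed prediction here, because none of the relevant counts are zero.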
