
The Naive Bayes classifier algorithm proceeds in these steps:

1. **Data Prep**: Get your dataset ready.

2. **Class Probabilities**:
   - Calculate how often each class occurs:
     - P(C) = (Number of instances with class C) / (Total number of instances)
     

3. **Feature Probabilities**:
   - For each feature, find out how often it appears in each class:
     - P(X_i | C) = (Number of instances with feature value X_i and class C) / (Number of instances with class C)

4. **Predictions**:
   - Use Bayes' theorem to calculate the probability of each class for a new data point:
     - P(C | x) = (P(C) * P(x | C)) / P(x)
     - Here, P(x) is the same for every class, so it is enough to compare P(C) * P(x | C) across classes.
   - Pick the class with the highest probability as the predicted class for the new data point.

5. **Evaluation**: Test the model with some data you've held back and see how well it does.
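
Step 4 leaves the "naive" assumption implicit: the features are treated as conditionally independent given the class, which is what allows P(x | C) to be built from the per-feature probabilities of step 3. As a sketch, writing the data point as x = (X_1, ..., X_n) (the index n is notation introduced here for illustration), the decision rule can be stated as:

$$
P(x \mid C) = \prod_{i=1}^{n} P(X_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(X_i \mid C)
$$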


Let's consider an example where we have a dataset of weather conditions (sunny, rainy) and corresponding labels indicating whether people went for a picnic (yes, no).

| Weather   | Picnic |
|-----------|--------|
| Sunny     | Yes    |
| Rainy     | No     |
| Sunny     | Yes    |
| Sunny     | No     |
| Rainy     | Yes    |

Now, let's say we want to predict whether people will go for a picnic given the weather condition "Sunny".

Here are the steps:

1. **Calculate Class Probabilities**: Calculate the prior probability of each class (Picnic = Yes, Picnic = No).
   - P(Picnic = Yes) = 3/5
   - P(Picnic = No) = 2/5

2. **Calculate Conditional Probabilities**: For each feature (Weather), calculate the conditional probability of each value (Sunny, Rainy) given each class (Picnic = Yes, Picnic = No).
   - P(Weather = Sunny | Picnic = Yes) = 2/3
   - P(Weather = Sunny | Picnic = No) = 1/2
   - P(Weather = Rainy | Picnic = Yes) = 1/3
   - P(Weather = Rainy | Picnic = No) = 1/2

3. **Make Predictions**: For the new data point (Weather = Sunny), calculate the posterior probability of each class given the weather condition using Bayes' theorem and the conditional probabilities calculated earlier.
   - P(Picnic = Yes | Weather = Sunny) ∝ P(Picnic = Yes) * P(Weather = Sunny | Picnic = Yes)
   - P(Picnic = No | Weather = Sunny) ∝ P(Picnic = No) * P(Weather = Sunny | Picnic = No)

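   Here the unnormalized scores are 3/5 * 2/3 = 2/5 for Yes and 2/5 * 1/2 = 1/5 for No. Normalizing them to sum to 1 gives P(Picnic = Yes | Weather = Sunny) = 2/3 and P(Picnic = No | Weather = Sunny) = 1/3, so the predicted class is Yes.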

4. **Evaluate Model**: Test the model on a separate testing dataset and evaluate its performance using metrics like accuracy, precision, recall, or F1-score.
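
The counting steps of the picnic example can be reproduced in a few lines of Python. This is a minimal sketch, not a general implementation: the function name `picnic_posterior` is hypothetical, exact fractions are used so the results are easy to compare with the hand calculation, and no smoothing is applied.

```python
from collections import Counter
from fractions import Fraction

# The five (weather, picnic) rows from the table above
rows = [("Sunny", "Yes"), ("Rainy", "No"), ("Sunny", "Yes"),
        ("Sunny", "No"), ("Rainy", "Yes")]

def picnic_posterior(rows, weather):
    """Return {class: P(class | weather)} using plain frequency counts."""
    class_counts = Counter(label for _, label in rows)
    joint_counts = Counter(rows)  # counts of (weather, label) pairs
    total = len(rows)

    # Unnormalized score for each class: P(C) * P(weather | C)
    scores = {}
    for label, n_label in class_counts.items():
        prior = Fraction(n_label, total)
        likelihood = Fraction(joint_counts[(weather, label)], n_label)
        scores[label] = prior * likelihood

    # Normalize so the posteriors sum to 1
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}

posterior = picnic_posterior(rows, "Sunny")
prediction = max(posterior, key=posterior.get)
for label, p in posterior.items():
    print(label, p)   # Yes 2/3, then No 1/3
print("Predicted:", prediction)  # Predicted: Yes
```

Because no smoothing is used, a weather value that never appears in the table gives a zero evidence term and the normalization raises `ZeroDivisionError`.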


Gaussian Naive Bayes Classifier for Discrete Classes:

In [None]:
import numpy as np

# Step 1: Prepare the Dataset
# Let's generate a simple dataset for classification
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [1, 5], [2, 4], [3, 3], [4, 2], [5, 1]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # Binary classes: 0 and 1

# Step 2: Implement the Gaussian Naive Bayes Classifier
class GaussianNaiveBayesClassifier:
    def __init__(self):
        self.class_probs = None
        self.mean = None
        self.std = None

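    def fit(self, X, y):
        # Prior probability of each class.
        # np.bincount assumes integer labels 0..K-1, matching the order of np.unique(y).
        self.class_probs = np.bincount(y) / len(y)

        # Per-class mean and standard deviation of each feature, shape (n_classes, n_features).
        # A feature that is constant within a class gives std = 0 and a division by zero in predict.
        self.mean = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
        self.std = np.array([X[y == c].std(axis=0) for c in np.unique(y)])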

    def predict(self, X):
        # Gaussian density of every feature under every class.
        # X[:, None] has shape (n_samples, 1, n_features), so broadcasting against
        # self.mean and self.std gives shape (n_samples, n_classes, n_features).
        z = (X[:, None] - self.mean) / self.std
        likelihood = np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * self.std)

        # Unnormalized posterior: product over features (naive independence) times the prior
        posterior = likelihood.prod(axis=2) * self.class_probs

        # Predict the class with the largest posterior
        y_pred = np.argmax(posterior, axis=1)
        return y_pred

# Step 3: Train the Model
# Instantiate the Gaussian Naive Bayes classifier
gnb_cls = GaussianNaiveBayesClassifier()

# Train the classifier
gnb_cls.fit(X, y)

# Step 4: Test the Model
# Predict on the training data itself (no held-out set is used here)
y_pred = gnb_cls.predict(X)

# Step 5: Evaluate the Model
from sklearn.metrics import accuracy_score

# Accuracy on the training data (not a held-out estimate)
accuracy = accuracy_score(y, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.6
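
Multiplying many small densities can underflow to zero when there are many features, so a common variation is to compare log-posteriors instead. The following is a sketch of that idea on the same toy data, not a change to the class above. The function name `gaussian_nb_log_predict` and the small `eps` added to the variances (to avoid dividing by zero) are assumptions made for illustration.

```python
import numpy as np

def gaussian_nb_log_predict(X_train, y_train, X_new, eps=1e-9):
    """Predict classes by comparing log-posteriors instead of raw products."""
    classes = np.unique(y_train)
    # Log prior of each class, from class frequencies
    log_prior = np.log(np.array([(y_train == c).mean() for c in classes]))
    # Per-class feature means and variances, shape (n_classes, n_features)
    mean = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    var = np.array([X_train[y_train == c].var(axis=0) for c in classes]) + eps

    # Log Gaussian density, shape (n_samples, n_classes, n_features)
    diff = X_new[:, None, :] - mean
    log_density = -0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var)

    # A sum of logs replaces the product over features
    log_posterior = log_density.sum(axis=2) + log_prior
    return classes[np.argmax(log_posterior, axis=1)]

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6],
              [1, 5], [2, 4], [3, 3], [4, 2], [5, 1]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

y_pred = gaussian_nb_log_predict(X, y, X)
print("Accuracy:", (y_pred == y).mean())  # Accuracy: 0.6
```

The accuracy matches the class above because the logarithm is monotonic, so the ranking of the classes does not change. On this dataset both classes have the same mean and spread in the first feature, so only the second feature separates them.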


Gaussian Naive Bayes Regressor for Continuous Values:

In [None]:
# Step 1: Prepare the Dataset
# Let's generate a simple dataset for regression
np.random.seed(0)
X = np.random.rand(100, 1)  # Feature
y = 2 * X.squeeze() + np.random.randn(100)  # Continuous target

# Step 2: Implement the Gaussian Naive Bayes Regressor
class GaussianNaiveBayesRegressor:
    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, X, y):
        # Calculate mean and standard deviation of the target variable
        self.mean = np.mean(y)
        self.std = np.std(y)

    def predict(self, X):
        # Evaluate the Gaussian fitted to the target, N(mean, std^2), at each value in X.
        # Note: the result is a density value, not an estimate of the target, and the
        # features were not used in fit, so this does not model how y depends on X.
        likelihood = (1 / (np.sqrt(2 * np.pi) * self.std)) * np.exp(-(X - self.mean) ** 2 / (2 * self.std ** 2))

        # There are no classes here, so there is no prior to multiply by
        y_pred = likelihood
        return y_pred

# Step 3: Train the Model
# Instantiate the Gaussian Naive Bayes regressor
gnb_reg = GaussianNaiveBayesRegressor()

# Train the regressor
gnb_reg.fit(X, y)

# Step 4: Test the Model
# Make predictions on the testing set (same as training set for simplicity)
y_pred = gnb_reg.predict(X)

# Step 5: Evaluate the Model
# Evaluation is skipped: predict returns density values rather than target estimates,
# so an error metric against y would not be meaningful here
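
One way to get actual target estimates under a Gaussian assumption is to treat (x, y) as jointly Gaussian and predict the conditional mean E[y | x]. This is a different technique from the class above, swapped in for illustration; for a single feature it coincides with an ordinary least-squares line. The helper name `fit_conditional_gaussian` is hypothetical.

```python
import numpy as np

# Same synthetic data as the cell above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 * X.squeeze() + np.random.randn(100)

def fit_conditional_gaussian(x, y):
    """Fit a joint Gaussian to (x, y); return slope and intercept of E[y | x]."""
    cov = np.cov(x, y)              # 2x2 sample covariance matrix
    slope = cov[0, 1] / cov[0, 0]   # Cov(x, y) / Var(x)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

x = X.squeeze()
slope, intercept = fit_conditional_gaussian(x, y)
y_hat = intercept + slope * x       # conditional mean E[y | x]

# Compare against always predicting mean(y)
mse_model = np.mean((y - y_hat) ** 2)
mse_baseline = np.mean((y - y.mean()) ** 2)
print("MSE (conditional mean):", mse_model)
print("MSE (mean-only baseline):", mse_baseline)
```

Both errors are computed on the training data, so they only show that the conditional mean fits at least as well as the constant baseline, not how well it generalizes.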


Gaussian Naive Bayes Classifier using scikit-learn:

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Step 1: Prepare the Dataset
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Step 2: Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train the Model
# Instantiate the Gaussian Naive Bayes classifier
gnb_cls = GaussianNB()

# Train the classifier
gnb_cls.fit(X_train, y_train)

# Step 4: Test the Model
# Make predictions on the testing set
y_pred = gnb_cls.predict(X_test)

# Step 5: Evaluate the Model
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 1.0
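
Accuracy is only one of the metrics mentioned earlier. As a sketch, the same split can be used to print per-class precision, recall and F1-score with `classification_report`, and to inspect the normalized posteriors with `predict_proba`. The block repeats the setup from the cell above so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

gnb_cls = GaussianNB().fit(X_train, y_train)
y_pred = gnb_cls.predict(X_test)

# Per-class precision, recall and F1-score on the held-out split
print(classification_report(y_test, y_pred,
                            labels=gnb_cls.classes_,
                            target_names=iris.target_names))

# Normalized posterior probabilities, one row per test sample
proba = gnb_cls.predict_proba(X_test)
print(proba[:3].round(3))
```

Each row of `proba` sums to 1, which corresponds to the normalization step in the picnic example.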
