# Understanding Naive Bayes Classification

## Overview of Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' theorem that assumes the features are __conditionally independent__ given the class label. For a sample with features $x_1, x_2, \dots, x_n$, the classifier predicts the class $y$ that maximizes the posterior probability. For two features, as in the experiment later in this notebook, the posterior is proportional to:

$
P(y \mid x_1, x_2) \propto P(y) \cdot P(x_1\mid y) \cdot P(x_2\mid y)
$

Where:
  - $P(y)$: prior probability of class $y$
  - $P(x_i \mid y)$: likelihood of $x_i$ given $y$
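
As a small worked example of the proportionality above, the sketch below multiplies a prior by two per-feature likelihoods for each class and then normalizes. All numbers are hypothetical and chosen only for illustration; they are not taken from the dataset used later.

```python
# Hypothetical priors and per-feature likelihoods for two classes (illustration only).
priors = {0: 0.4, 1: 0.6}
likelihoods = {
    0: [0.5, 0.2],  # P(x_1 | y=0), P(x_2 | y=0)
    1: [0.1, 0.3],  # P(x_1 | y=1), P(x_2 | y=1)
}

# Unnormalized posterior for each class: P(y) * P(x_1 | y) * P(x_2 | y)
scores = {}
for label, prior in priors.items():
    score = prior
    for value in likelihoods[label]:
        score *= value
    scores[label] = score

# Dividing by the sum over classes turns the scores into probabilities.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

# The predicted class is the one with the largest posterior.
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

Here the unnormalized scores are 0.04 for class 0 and 0.018 for class 1, so class 0 is predicted even though class 1 has the larger prior.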


For continuous features, Gaussian Naive Bayes assumes that, within each class, each feature follows a normal (Gaussian) distribution. This allows us to compute $P(x_i \mid y)$ using the Gaussian probability density function.

### Training
During training we calculate, for each class $y \in \{0, 1\}$:
 - the mean of each feature ($\mu_{x_1,y},\mu_{x_2,y},\dots$)
 - the variance of each feature ($\sigma^2_{x_1,y},\sigma^2_{x_2,y},\dots$)
 - the class prior, $P(y=0)$ or $P(y=1)$

The above calculations are stored as class parameters.
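
The training step described above can be sketched as a short function. This is a minimal illustration, not scikit-learn's implementation: the name `fit_gaussian_nb`, the dictionary layout, and the tiny dataset are all assumptions made for this example.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances for each class."""
    params = {}
    for label in np.unique(y):
        X_c = X[y == label]  # rows belonging to this class
        params[int(label)] = {
            "prior": X_c.shape[0] / X.shape[0],
            "mean": X_c.mean(axis=0),
            "var": X_c.var(axis=0),  # population variance (ddof=0)
        }
    return params

# Tiny made-up dataset: 5 samples, 2 features, 2 classes.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 2.0], [9.0, 4.0]])
y_demo = np.array([0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_demo, y_demo)
print(params)
```

For class 0 this yields a prior of 0.4, means `[2, 3]` and variances `[1, 1]`. The variance uses NumPy's default `ddof=0`, which is the same convention as the `.std(axis=0)` calls used later in this notebook.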

### Prediction

During prediction, the stored parameters are used to compute the likelihood of each feature value under each class:

$
P(x_i\mid y) = \frac{1}{\sqrt{2\pi\sigma_{i,y}^2}} \exp\left(-\frac{(x_i - \mu_{i,y})^2}{2\sigma_{i,y}^2}\right)
$

where the following are estimated during training:
 - $\mu_{i,y}$: mean of feature $x_i$ for class $y$.
 - $\sigma_{i,y}^2$: variance of feature $x_i$ for class $y$.

We then plug these likelihoods into the expression for $P(y \mid x_1, x_2)$ above to obtain an unnormalized score for each class. Because that expression omits the denominator $P(x_1, x_2)$, the scores are finally normalized so that they sum to 1 across classes.
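
The prediction step can also be carried out on log-probabilities, which avoids numerical underflow when many small densities are multiplied. The sketch below is a variation on the direct product described above, not the method used in the rest of this notebook; the function name `predict_proba_log_space` and all parameter values are hypothetical.

```python
import numpy as np

def predict_proba_log_space(X, priors, means, variances):
    """Normalized Gaussian Naive Bayes posteriors computed from log-probabilities.

    priors has shape (n_classes,); means and variances have shape
    (n_classes, n_features); X has shape (n_samples, n_features).
    """
    # log N(x_i; mu, var) for every sample, class and feature
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                      + diff**2 / variances[None, :, :])
    # log P(y) + sum_i log P(x_i | y), shape (n_samples, n_classes)
    joint = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    # Subtract the row maximum before exponentiating (log-sum-exp trick)
    joint = joint - joint.max(axis=1, keepdims=True)
    proba = np.exp(joint)
    return proba / proba.sum(axis=1, keepdims=True)

# Hypothetical parameters for 2 classes and 2 features.
priors = np.array([0.4, 0.6])
means = np.array([[0.9, 0.4], [-0.6, -0.3]])
variances = np.array([[0.8, 0.5], [0.3, 0.9]])
X_new = np.array([[1.0, 0.5], [-0.5, -0.2]])

proba = predict_proba_log_space(X_new, priors, means, variances)
print(proba)
```

Each row of `proba` sums to 1. With only two features the direct product is numerically safe, so the two approaches give the same result here; the log-space form mainly matters as the number of features grows.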

In [47]:
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np

## Prepare Dataset

Keep only the first two features.

In [48]:
data = load_breast_cancer()
X = data.data[:, :2] # Only two features for experiment
y = data.target
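# Note: for simplicity the scaler is fit on all samples before splitting;
# in practice, fit it on the training split only to avoid leaking test statistics.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)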
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=242, stratify=y)

## Preparation

Training consists of calculating the mean and variance of each feature for each class. We compute these values manually and compare them with the parameters fitted by scikit-learn's `GaussianNB` class.

In [49]:
# Calculate mean and variance for each feature for each class
mu_0 = X_train[y_train == 0].mean(axis=0)
mu_1 = X_train[y_train == 1].mean(axis=0)

sigma_0 = X_train[y_train == 0].std(axis=0)
sigma_1 = X_train[y_train == 1].std(axis=0)

# prior probabilities
P_1 = y_train.mean()
P_0 = 1 - P_1

# Print the estimated parameters
print(f"Prior probabilities: P(y=0) = {P_0:.4f}, P(y=1) = {P_1:.4f}")
print(f"Mean values: mu_0 = {mu_0}, mu_1 = {mu_1}")
print(f"Standard deviation: sigma_0 = {sigma_0}, sigma_1 = {sigma_1}")


Prior probabilities: P(y=0) = 0.3719, P(y=1) = 0.6281
Mean values: mu_0 = [0.88464823 0.44463023], mu_1 = [-0.58503963 -0.32430205]
Standard deviation: sigma_0 = [0.91768878 0.73103412], sigma_1 = [0.52567253 0.92684928]


In [50]:
# Create Gaussian Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Print parameters
print(f"Classes: {model.classes_}")
print(f"Class priors: P(y=0) = {model.class_prior_[0]:.4f}, P(y=1) = {model.class_prior_[1]:.4f}")
print(f"Class means: mu_0 = {model.theta_[0]}, mu_1 = {model.theta_[1]}")
std_dev = np.sqrt(model.var_)
print(f"Class standard deviations: sigma_0 = {std_dev[0]}, sigma_1 = {std_dev[1]}")


Classes: [0 1]
Class priors: P(y=0) = 0.3719, P(y=1) = 0.6281
Class means: mu_0 = [0.88464823 0.44463023], mu_1 = [-0.58503963 -0.32430205]
Class standard deviations: sigma_0 = [0.91768878 0.73103412], sigma_1 = [0.52567253 0.92684928]


## Calculate Prediction Probabilities

This section manually calculates the class probabilities for each test observation and compares them with those returned by the `GaussianNB` class.

In [70]:
xt_0 = X_test[:, 0] # Feature 0
xt_1 = X_test[:, 1] # Feature 1
# Calculate probabilities for each class, for each feature
# Calculate P(x_i|y=0)
prob_0_0 = np.exp(-(xt_0 - mu_0[0])**2 / (2 * sigma_0[0]**2)) / (sigma_0[0] * np.sqrt(2 * np.pi))
prob_0_1 = np.exp(-(xt_1 - mu_0[1])**2 / (2 * sigma_0[1]**2)) / (sigma_0[1] * np.sqrt(2 * np.pi))

# Calculate P(x_i|y=1)
prob_1_0 = np.exp(-(xt_0 - mu_1[0])**2 / (2 * sigma_1[0]**2)) / (sigma_1[0] * np.sqrt(2 * np.pi))
prob_1_1 = np.exp(-(xt_1 - mu_1[1])**2 / (2 * sigma_1[1]**2)) / (sigma_1[1] * np.sqrt(2 * np.pi))

# P = P(y=0) * P(x_i|y=0) * P(x_j|y=0)
prob_0_combined = prob_0_0 * prob_0_1 * P_0
prob_1_combined = prob_1_0 * prob_1_1 * P_1
prob_combined = np.column_stack((prob_0_combined, prob_1_combined))

# Normalize to get probabilities
prob_combined_normalized = prob_combined / prob_combined.sum(axis=1, keepdims=True)
prob_combined_normalized[:3]

array([[1.65574824e-01, 8.34425176e-01],
       [9.99896641e-01, 1.03359317e-04],
       [7.77212714e-01, 2.22787286e-01]])

In [71]:
# Make predictions
y_pred = model.predict_proba(X_test)

y_pred[:3]

array([[1.65574824e-01, 8.34425176e-01],
       [9.99896641e-01, 1.03359321e-04],
       [7.77212714e-01, 2.22787286e-01]])

In [69]:
np.allclose(y_pred, prob_combined_normalized)

True

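__Remark:__ The manually computed probabilities match those returned by `predict_proba` to within floating-point tolerance, as `np.allclose` confirms. The last digits can differ slightly (compare the second row of the two outputs), most likely because `GaussianNB` adds a small `var_smoothing` term to the variances.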
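
To turn these probabilities into class labels, pick the class with the largest probability in each row. The sketch below reuses the first three rows of the manual output shown above; the `classes` array is written out by hand and assumed to follow the same ordering as `model.classes_`.

```python
import numpy as np

# First three rows of the normalized probabilities from the manual calculation.
proba = np.array([
    [1.65574824e-01, 8.34425176e-01],
    [9.99896641e-01, 1.03359317e-04],
    [7.77212714e-01, 2.22787286e-01],
])
classes = np.array([0, 1])  # assumed to match the ordering of model.classes_

# The column index of the largest probability selects the predicted class.
predicted_labels = classes[proba.argmax(axis=1)]
print(predicted_labels)  # [1 0 0]
```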