In [3]:
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

Gaussian Naive Bayes (GNB) is a variant of the Naive Bayes classifier that assumes each feature follows a Gaussian (normal) distribution within each class. Here's how Gaussian Naive Bayes classification works:

### 1. Naive Bayes Algorithm Overview:
Naive Bayes is a probabilistic classifier based on Bayes' theorem, which gives the probability of a class given a set of features. The "naive" assumption is that the features are conditionally independent given the class: once the class is known, the value of one feature carries no further information about the value of another.

### 2. Gaussian Naive Bayes:
In Gaussian Naive Bayes, each continuous feature is assumed to follow a Gaussian (normal) distribution within each class. Under this assumption, every likelihood term is fully described by two parameters, a mean and a variance.
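
As a minimal sketch of the density this assumption relies on, the function below evaluates the univariate Gaussian probability density using only the standard library. The name `gaussian_pdf` and its parameters are illustrative choices, not part of scikit-learn.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Density of the standard normal at its mean, which equals 1 / sqrt(2 * pi)
print(gaussian_pdf(0.0, mean=0.0, var=1.0))

# A narrow distribution (small variance) has a tall peak
print(gaussian_pdf(0.0, mean=0.0, var=0.01))
```

Note that a density is not a probability: when the variance is small, the value at the peak can exceed 1.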

### 3. Training Phase:
During the training phase, GNB estimates the parameters of a Gaussian distribution (mean and variance) for each feature in each class. Specifically:
- For each class, GNB calculates the mean and variance of each feature from the training instances belonging to that class.
- These parameters define the likelihood of each feature given the class.
- The prior probability of each class is estimated as its share of the training instances, unless priors are specified explicitly.
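
The training step can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not scikit-learn's implementation: the function name `fit_gnb`, the dictionary layout, and the toy data are all hypothetical, and the variance is the population variance (dividing by n).

```python
from collections import defaultdict
from statistics import fmean, pvariance

def fit_gnb(X, y):
    """Estimate the prior, per-feature means and per-feature variances of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    params = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        params[label] = {
            "prior": len(rows) / len(X),               # share of training rows
            "mean": [fmean(col) for col in columns],
            "var": [pvariance(col) for col in columns],  # population variance
        }
    return params

# Hypothetical toy data: two features, two classes
X = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (10.0, 1.0), (12.0, 3.0)]
y = [0, 0, 0, 1, 1]
params = fit_gnb(X, y)
print(params[0])
print(params[1])
```

scikit-learn's `GaussianNB` additionally adds a small `var_smoothing` term to the variances for numerical stability, which this sketch omits.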

### 4. Prediction Phase:
During the prediction phase, GNB calculates the posterior probability of each class given the features of a new instance using Bayes' theorem:
\[ P(C_k | \mathbf{x}) = \frac{P(C_k) \times P(\mathbf{x} | C_k)}{P(\mathbf{x})} \]

Where:
- \( P(C_k | \mathbf{x}) \) is the posterior probability of class \( C_k \) given the features \( \mathbf{x} \) of the instance.
- \( P(C_k) \) is the prior probability of class \( C_k \).
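- \( P(\mathbf{x} | C_k) \) is the likelihood of the features \( \mathbf{x} \) given class \( C_k \). Under the naive assumption it factorizes as \( P(\mathbf{x} | C_k) = \prod_i P(x_i | C_k) \), where each \( P(x_i | C_k) \) is the Gaussian probability density with that feature's class-specific mean and variance.
- \( P(\mathbf{x}) = \sum_j P(C_j) \, P(\mathbf{x} | C_j) \) is the evidence, which serves as a normalization factor and is the same for every class.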
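
A sketch of this computation is shown below. It treats the likelihood as a product of per-feature Gaussian densities (the naive assumption) and works in log space, a common way to avoid numerical underflow when many small densities are multiplied. The function names, the `params` layout, and the parameter values are hypothetical.

```python
import math

def log_gaussian_pdf(x, mean, var):
    """Logarithm of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def posterior(x, params):
    """Return P(C_k | x) for every class k."""
    # log P(C_k) + sum_i log P(x_i | C_k): the numerator of Bayes' theorem, in logs
    log_joint = {
        label: math.log(p["prior"])
        + sum(log_gaussian_pdf(xi, m, v) for xi, m, v in zip(x, p["mean"], p["var"]))
        for label, p in params.items()
    }
    # log P(x), computed with the log-sum-exp trick for stability
    top = max(log_joint.values())
    log_evidence = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {label: math.exp(v - log_evidence) for label, v in log_joint.items()}

# Hypothetical parameters for two classes and two features
params = {
    "a": {"prior": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    "b": {"prior": 0.5, "mean": [2.0, 2.0], "var": [1.0, 1.0]},
}
print(posterior([1.0, 1.0], params))  # equidistant from both means: about 0.5 each
print(posterior([0.0, 0.0], params))  # at the mean of class "a": "a" dominates
```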

### 5. Prediction:
- GNB selects the class with the highest posterior probability as the predicted class for the new instance.

### 6. Advantages of Gaussian Naive Bayes:
- GNB is computationally efficient and scales well to large datasets.
- It has few parameters (one mean and one variance per feature per class), so it copes with high-dimensional data and is less prone to overfitting than more flexible models.
- Despite its "naive" assumption, GNB can perform surprisingly well in practice, especially when the independence assumption holds reasonably well or when the features are correlated but the class-conditional distributions are still close to Gaussian.

### 7. Limitations:
- The assumption of Gaussian distribution may not hold true for all datasets, especially if the features are not continuous or do not follow a normal distribution.
- The "naive" assumption of feature independence may not be valid in all cases and can lead to suboptimal performance if violated.
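
The cost of violating the independence assumption can be illustrated with an extreme case: the same measurement entered several times as if it were several features. The sketch below is a hypothetical setup (two classes with means 0 and 1, unit variance, equal priors), and `posterior_class_a` is an illustrative helper, not a library function.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian probability density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_class_a(x, mean_a, mean_b, var=1.0, prior_a=0.5):
    """P(class a | x) for two classes, treating every entry of x as independent."""
    like_a = math.prod(gaussian_pdf(xi, mean_a, var) for xi in x)
    like_b = math.prod(gaussian_pdf(xi, mean_b, var) for xi in x)
    joint_a = prior_a * like_a
    joint_b = (1.0 - prior_a) * like_b
    return joint_a / (joint_a + joint_b)

# One measurement at 0.0
once = posterior_class_a([0.0], mean_a=0.0, mean_b=1.0)
# The same measurement copied three times carries no new information,
# but naive Bayes counts it as three independent pieces of evidence
thrice = posterior_class_a([0.0, 0.0, 0.0], mean_a=0.0, mean_b=1.0)
print(once, thrice)
```

The predicted class is the same in both cases, but the posterior moves from about 0.62 to about 0.82. This is why naive Bayes probabilities tend to be overconfident when features are strongly correlated.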

In summary, Gaussian Naive Bayes classification works by estimating the parameters of Gaussian distributions for each class and each feature during training, and then using these distributions to calculate the posterior probability of each class given the features of a new instance during prediction. It's a simple yet powerful classifier, especially suitable for datasets with continuous features.

In [4]:
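iris = datasets.load_iris()
X = iris.data    # shape (150, 4): sepal and petal length and width
Y = iris.target  # class labels 0, 1, 2
# Default split is 75% train / 25% test; no random_state is set, so the split varies between runs
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)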

In [5]:
model = GaussianNB()
model.fit(X_train, Y_train)

In [6]:
predicted = model.predict(X_test)
expected = Y_test
metrics.accuracy_score(expected, predicted)

0.9210526315789473

Let's illustrate Gaussian Naive Bayes classification with a simple numerical example. Suppose we have a dataset with two continuous features (X1 and X2) and two classes (Class 0 and Class 1). We'll assume that the features follow a Gaussian distribution within each class.

### Dataset:
```
Class 0:
(1, 1), (1, 2), (2, 1)

Class 1:
(4, 4), (4, 5), (5, 4)
```

### Training Phase:
1. **Calculate Mean and Variance:**
   - For each feature (X1 and X2) and each class, calculate the mean and variance.

```
Class 0:
  Mean(X1) = (1+1+2)/3 = 4/3
  Variance(X1) = ((1-4/3)^2 + (1-4/3)^2 + (2-4/3)^2)/3 = 2/9
  Mean(X2) = (1+2+1)/3 = 4/3
  Variance(X2) = ((1-4/3)^2 + (2-4/3)^2 + (1-4/3)^2)/3 = 2/9

Class 1:
  Mean(X1) = (4+4+5)/3 = 13/3
  Variance(X1) = ((4-13/3)^2 + (4-13/3)^2 + (5-13/3)^2)/3 = 2/9
  Mean(X2) = (4+5+4)/3 = 13/3
  Variance(X2) = ((4-13/3)^2 + (5-13/3)^2 + (4-13/3)^2)/3 = 2/9
```
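
The arithmetic above can be checked exactly with Python's `fractions` and `statistics` modules. This is a small sketch; `pvariance` computes the population variance (dividing by n = 3), the same convention as the formulas above, and the variable names are illustrative.

```python
from fractions import Fraction
from statistics import mean, pvariance

# Training points of the toy dataset, grouped by class
data = {
    0: [(1, 1), (1, 2), (2, 1)],
    1: [(4, 4), (4, 5), (5, 4)],
}

stats = {}
for label, rows in data.items():
    # zip(*rows) turns the list of points into one column per feature
    for name, column in zip(("X1", "X2"), zip(*rows)):
        values = [Fraction(v) for v in column]  # exact arithmetic, no rounding
        stats[(label, name)] = (mean(values), pvariance(values))
        m, v = stats[(label, name)]
        print(f"Class {label} {name}: mean={m}, variance={v}")
```

Each of the four variances evaluates to 2/9. The sample variance (dividing by n - 1 = 2) would give 1/3 instead.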

### Prediction Phase:
Suppose we have a new instance with features (3, 3).

1. **Calculate Likelihood:**
   - For each class, calculate the likelihood of the features using the Gaussian probability density function. Every feature has population variance 2/9 in both classes (squared deviations summing to 6/9, divided by 3).

```
Class 0:
  Likelihood(X1=3) = (1/sqrt(2π*2/9)) * exp(-((3-4/3)^2)/(2*2/9)) ≈ 0.001634
  Likelihood(X2=3) = (1/sqrt(2π*2/9)) * exp(-((3-4/3)^2)/(2*2/9)) ≈ 0.001634
  P(X|Class 0) ≈ 0.001634 * 0.001634 ≈ 2.669e-6

Class 1:
  Likelihood(X1=3) = (1/sqrt(2π*2/9)) * exp(-((3-13/3)^2)/(2*2/9)) ≈ 0.01550
  Likelihood(X2=3) = (1/sqrt(2π*2/9)) * exp(-((3-13/3)^2)/(2*2/9)) ≈ 0.01550
  P(X|Class 1) ≈ 0.01550 * 0.01550 ≈ 2.403e-4
```

2. **Calculate Prior:**
   - Each class contributes three of the six training points, so the prior probabilities are equal: \( P(\text{Class 0}) = P(\text{Class 1}) = 0.5 \).

3. **Calculate Posterior:**
   - Using Bayes' theorem, calculate the posterior probability of each class.

```
P(Class 0|X) ≈ (0.5 * 2.669e-6) / ((0.5 * 2.669e-6) + (0.5 * 2.403e-4)) ≈ 0.011
P(Class 1|X) ≈ (0.5 * 2.403e-4) / ((0.5 * 2.669e-6) + (0.5 * 2.403e-4)) ≈ 0.989
```

### Prediction:
Since \( P(\text{Class 1}|X) > P(\text{Class 0}|X) \), the model predicts Class 1 for the new instance (3, 3). This matches intuition: (3, 3) lies closer to the Class 1 mean (13/3, 13/3) than to the Class 0 mean (4/3, 4/3), and both classes have the same variance and prior.

In this example, we trained a Gaussian Naive Bayes classifier using a simple dataset with two classes and two continuous features. We then used the trained classifier to predict the class of a new instance based on its features.
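
The prediction for (3, 3) can be reproduced with a few lines of plain Python. This is a sketch under the assumptions of the example: per-class means of 4/3 and 13/3, a population variance of 2/9 for every feature, and equal priors. The helper `gaussian_pdf` and the dictionary names are illustrative.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian probability density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Parameters of the toy dataset; X1 and X2 share the same mean and
# population variance within each class
params = {0: {"mean": 4 / 3, "var": 2 / 9}, 1: {"mean": 13 / 3, "var": 2 / 9}}
prior = {0: 0.5, 1: 0.5}
x_new = (3.0, 3.0)

# Prior times the product of per-feature densities (numerator of Bayes' theorem)
joint = {
    label: prior[label] * math.prod(gaussian_pdf(xi, p["mean"], p["var"]) for xi in x_new)
    for label, p in params.items()
}
evidence = sum(joint.values())  # normalization factor P(x)
posterior = {label: value / evidence for label, value in joint.items()}

predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

Because both classes share the same variance and prior, the class whose mean is nearer to the new point receives the higher posterior, which here is Class 1.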