# Naive Bayes

Naive Bayes is one of the earliest probabilistic classifiers and is often regarded as the "hello world" of data analysis. It is called naive because it assumes that the features are independent of one another, given the class, when Bayes' theorem is applied.

Let us take a vector $\textbf{x} = (x_1, \dots, x_n)$ representing $n$ features. The probability of a certain outcome $C_k$, using Bayes' theorem, is:

$$ p(C_k | \textbf{x}) = \frac{p(C_k) p(\textbf{x}|C_k)}{p(\textbf{x})}$$

for each $k \in \{1, \dots, K\}$.

The denominator, $p(\textbf{x})$, does not depend on the class $C_k$, and the $x_i$ values are given, so it acts as a normalization constant. Expanding the joint probability and assuming that the features are independent given the class, we obtain:

$$ p(C_k|\textbf{x}) = \frac{ p(C_k) \prod_{i=1}^n p(x_i|C_k) }{ \sum_{j=1}^{K} p(C_j) \prod_{i=1}^n p(x_i|C_j)}$$
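
As a minimal numerical sketch of the formula above, the snippet below evaluates the factorized posterior for two classes and three features. The priors and the per-feature likelihoods are made-up numbers chosen only for illustration, and the names `priors`, `likelihoods`, `joint` and `posterior` are hypothetical.

```python
import numpy as np

# Hypothetical priors p(C_k) for K = 2 classes (illustrative values only)
priors = np.array([0.5, 0.5])

# Hypothetical likelihoods p(x_i | C_k): one row per class, one column per feature
likelihoods = np.array([[0.20, 0.60, 0.30],
                        [0.05, 0.40, 0.50]])

# Numerator: p(C_k) * prod_i p(x_i | C_k), one value per class
joint = priors * likelihoods.prod(axis=1)

# Denominator: the same quantity summed over all classes
posterior = joint / joint.sum()

print(posterior)
```

Because the denominator is the same for every class, dividing by it rescales the values so that they sum to 1 but does not change which class has the largest posterior.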

## Classifier

What we have seen so far is the probability model. By combining that model with a decision rule, we obtain the Naive Bayes classifier. A common rule is to pick the most probable hypothesis. In this case, the function that assigns the label $y$ to a class $C_k$ for some $k$ is:

$$y = \underset{k\in\{1,...,K\}}{\textrm{argmax}} p(C_k)\prod_{i=1}^n p(x_i|C_k)$$ 
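
The product of many small probabilities can underflow in floating point, so a common variant (not the one used in the code of this notebook) maximizes the sum of the logarithms instead. Since the logarithm is monotonic, the selected class is the same. A minimal sketch with made-up numbers, where `scores`, `log_scores` and `y` are illustrative names:

```python
import numpy as np

# Hypothetical priors p(C_k) and likelihoods p(x_i | C_k) (illustrative values only)
priors = np.array([0.5, 0.5])
likelihoods = np.array([[0.20, 0.60, 0.30],
                        [0.05, 0.40, 0.50]])

# Decision rule as written above: argmax of prior times product of likelihoods
scores = priors * likelihoods.prod(axis=1)

# Equivalent rule in log space: log prior plus sum of log likelihoods
log_scores = np.log(priors) + np.log(likelihoods).sum(axis=1)

# Index of the most probable class
y = int(np.argmax(log_scores))
print(y)
```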

## Example

Training set:

| Object | mean radius | radius ratio | roundness |
|--------|-------------|--------------|----------|
| track  | 5           | 1.1          | 1.3      |
| track  | 6           | 1.5          | 1.2      |
| track  | 12          | 1            | 1        |
| track  | 7           | 1.2          | 1.4      |
| track  | 8           | 1.3          | 1.9      |
| track  | 4           | 1.7          | 1.2      |
| track  | 9           | 2            | 2.2      |
| track  | 5           | 1.4          | 1.5      |
| track  | 7           | 1.3          | 1.2      |
| track  | 3           | 1.0          | 1.8      |
| track  | 9           | 1.4          | 1.2      |
| dust   | 10          | 1.7          | 3        |
| dust   | 12          | 1.8          | 1.7      |
| dust   | 14          | 2            | 1.9      |
| dust   | 20          | 1            | 2.5      |
| dust   | 9           | 2.1          | 3        |
| dust   | 8           | 1.3          | 2.6      |
| dust   | 14          | 1.4          | 1.4      |
| dust   | 16          | 1.6          | 1.8      |
| dust   | 11          | 1.8          | 1.2      |
| dust   | 13          | 2            | 1.8      |
| dust   | 10          | 1.5          | 1.9      |

In [None]:
import numpy as np

In [None]:
# Object Dictionary that defines the first column of the data array
ObjectDict  = {"track":0, 
               "dust":1}
# Features Dictionary which defines the remaining columns of the data array
FeatureDict = {"radius":1,
               "rr":2,
               "roundness":3}
# The data containing:
# Object type, radius, radius ratio, roundness
train_data =   [[0,  5, 1.1, 1.3],
                [0,  6, 1.5, 1.2],
                [0, 12, 1,   1],
                [0,  7, 1.2, 1.4],
                [0,  8, 1.3, 1.9],
                [0,  4, 1.7, 1.2],
                [0,  9, 2,   2.2],
                [0,  5, 1.4, 1.5],
                [0,  7, 1.3, 1.2],
                [0,  3, 1.0, 1.8],
                [0,  9, 1.4, 1.2],
                [1,  10, 1.7, 3],
                [1,  12, 1.8, 1.7],
                [1,  14, 2,   1.9],
                [1,  20, 1,   2.5],
                [1,   9, 2.1, 3],
                [1,   8, 1.3, 2.6],
                [1,  14, 1.4, 1.4],
                [1,  16, 1.6, 1.8],
                [1,  11, 1.8, 1.2],
                [1,  13, 2,   1.8],
                [1,  10, 1.5, 1.9]
               ]
train_data=np.array(train_data)

# Classifier training dictionary, holding the info for the training
# in this case we are going to use a gaussian classifier
classifier_training = {}

print("{:10s} {:20s} {:>10s} {:>10s}".format("Class","Feature","Mean","Var"))
# for each object
for Object in ObjectDict:
  classifier_training[Object]={}
  # rows of the training set that belong to this class
  class_rows = train_data[train_data[:,0]==ObjectDict[Object]]
  # for each feature
  for Feature in FeatureDict:
    mean = class_rows[:,FeatureDict[Feature]].mean()
    # population variance (ddof=0), which is NumPy's default
    var  = class_rows[:,FeatureDict[Feature]].var()
    # store the mean and the variance of this feature for this class
    classifier_training[Object][Feature]=[mean,var]
    print("{:10s} {:20s} {:10.2f} {:10.2f}".format(Object,Feature,mean,var))

## Let us classify

Parameters obtained from the training step (mean +- variance of each feature, per class):

| Class | radius         | radius ratio | roundness    |
|-------|----------------|--------------|--------------|
| track |  6.82 +- 6.15  | 1.35 +- 0.08 | 1.45 +- 0.12 |
| dust  | 12.45 +- 10.98 | 1.65 +- 0.10 | 2.07 +- 0.34 |
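
For reference, the Gaussian model assumes that, within class $C_k$, feature $x_i$ follows a normal distribution whose mean $\mu_{ik}$ and variance $\sigma_{ik}^2$ are estimated from the training set. These two symbols are introduced here only for illustration:

$$ p(x_i|C_k) = \frac{1}{\sqrt{2\pi\sigma_{ik}^2}} \exp\left(-\frac{(x_i-\mu_{ik})^2}{2\sigma_{ik}^2}\right) $$

As a worked example, reading the second number in each table cell as the variance computed by the training cell, a radius of 7 for the track class gives, with the rounded table values:

$$ p(7|\textrm{track}) \approx \frac{1}{\sqrt{2\pi \cdot 6.15}} \exp\left(-\frac{(7-6.82)^2}{2 \cdot 6.15}\right) \approx 0.16 $$

This is a probability density, so individual values may exceed 1 when the variance is small.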

In [None]:
# Gaussian function
def gaussian_prob(value,mean,var):
  return np.exp(-(value-mean)**2/(2*var))/(np.sqrt(2*np.pi*var))

# Prediction function
# data_to_predict       List containing [mean radius, radius ratio, roundness]
# weights               Training dictionary with [mean, variance] per class and feature
# model                 Function used as the likelihood model
# ObjectDict            Object names
# FeatureDict           Feature names
def predict(data_to_predict , weights=classifier_training,model=gaussian_prob, ObjectDict=ObjectDict, FeatureDict=FeatureDict):
  # array which will be used as a container for the predictions
  predictions = np.zeros(len(ObjectDict))
  print(data_to_predict)
  # for each object
  for Object in ObjectDict:
    print(Object)
    # start from the prior p(object); the classes are taken as equally
    # likely (the training array holds 11 samples of each)
    P = 1 / len(ObjectDict)
    # for each feature
    for Feature in FeatureDict:
      # P(f|o); the -1 accounts for the class column, which is present
      # in train_data but not in data_to_predict
      Pfo = model(data_to_predict[FeatureDict[Feature]-1],*weights[Object][Feature])
      # multiply the running product by the likelihood of this feature
      P *= Pfo
      print("     P({}|{}) = {}".format(Feature,Object,Pfo))
    # once the loop over all features has finished we assign the
    # unnormalized posterior to the predictions array, at the index of the object
    predictions[ObjectDict[Object]]=P
    print("P({}) * P(x|{}) = {}".format(Object,Object,P))
  # now I invert the Object Dictionary to be used with the argmax function
  inv_ObjectDict = {v: k for k, v in ObjectDict.items()}
  return inv_ObjectDict[np.argmax(predictions)]

In [None]:
# We now have two test samples and do not know whether they are track or dust; let's ask the model
values = [[7,1.4,1.7],
          [15,1.2,1.4]]

for i in range(len(values)):
  print("\nRESULT:\n{} corresponds to {} \n".format(values[i],predict(values[i])))

We find that the first set of values corresponds to a track, while the second corresponds to dust. This was a very simple example, written to follow each step of the calculation explicitly. The next one is more Pythonic and uses a well-known dataset.

## Now something more serious

We will use the [iris dataset](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) which comes with scikit-learn.

If you need to install this Python library, you can do it from within this notebook by writing and executing the following line in a *code block*:

```
conda install scikit-learn -y
```

Once the block has finished executing, you can start using the **sklearn** modules. We begin by loading the data and splitting it into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Import the Iris dataset
iris = load_iris()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

Once the data is loaded, let's define a NumPy-based function that applies Naive Bayes through a Gaussian classifier (as in the example above).

In [None]:
# Define a function to train the classifier
def train_naive_bayes(X, y):
    n_samples, n_features = X.shape
    classes = np.unique(y)
    n_classes = len(classes)
    
    # Calculate class probabilities
    class_probs = np.zeros(n_classes)
    for i in range(n_classes):
        class_probs[i] = np.mean(y == classes[i])
    
    # Calculate mean and variance for each feature for each class
    means = np.zeros((n_classes, n_features))
    variances = np.zeros((n_classes, n_features))
    for i in range(n_classes):
        means[i, :] = np.mean(X[y == classes[i], :], axis=0)
        variances[i, :] = np.var(X[y == classes[i], :], axis=0)
    
    # Define the function to calculate the likelihood of a sample given a class:
    # the product of the per-feature Gaussian densities
    def calculate_probability(sample, mean, var):
        # the small constant 1e-9 guards against division by zero when a variance is 0
        exponent = -((sample - mean) ** 2) / (2 * var + 1e-9)
        return np.prod(1 / np.sqrt(2 * np.pi * var + 1e-9) * np.exp(exponent))
    
    # Define the function to predict the class of a sample
    def predict(sample):
        probabilities = np.zeros(n_classes)
        for i in range(n_classes):
            # prior times likelihood; calculate_probability already returns the product over features
            probabilities[i] = class_probs[i] * calculate_probability(sample, means[i, :], variances[i, :])
        return classes[np.argmax(probabilities)]
    
    return predict

# Train the classifier using the training data
predictor = train_naive_bayes(X_train, y_train)

# Make predictions on the testing data (stored as an array for the element-wise comparison below)
y_pred = np.array([predictor(sample) for sample in X_test])

# Calculate the accuracy of the classifier
accuracy = np.mean(y_pred == y_test)
print("Accuracy:", accuracy)

Of course, there are simpler ways to use a Naive Bayes classifier: **sklearn** provides the *GaussianNB* class, which does this in just a few lines of code:

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Initialize the Naive Bayes classifier
clf = GaussianNB()

# Train the classifier using the training data
clf.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = clf.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

## Conclusions and exercises

In the code above we have seen two datasets and three ways to apply the Naive Bayes classifier to the data. The aim of the notebook is to show that training amounts to computing the mean and the variance of each feature for each class, under the assumption that the values of each feature follow a Gaussian distribution. Even with this strong assumption, the classifier reaches an accuracy of just under 0.98 on the Iris test split.

**Evaluate your comprehension by answering these questions:**

1. What is the size of the Iris dataset?
2. How many features are there?
3. How many classes are there?
4. Could you produce a scatter plot of the first two features where each dot has the appropriate class color?
5. Why does the prediction accuracy decrease if we split the data as follows:

```
# Split the dataset into features and labels
X = iris.data
y = iris.target

## Split the dataset into training and testing sets
split = int(0.7 * len(X))
X_train = X[:split]
y_train = y[:split]
X_test = X[split:]
y_test = y[split:]
```

Tip: try this using the `train_naive_bayes()` function and the `sklearn` Gaussian classifier `GaussianNB`.

In [None]:
# Split the dataset into features and labels
X = iris.data
y = iris.target

## Split the dataset into training and testing sets
split = int(0.7 * len(X))
X_train = X[:split]
y_train = y[:split]
X_test = X[split:]
y_test = y[split:]

In [None]:
# Initialize the Naive Bayes classifier
clf = GaussianNB()

# Train the classifier using the training data
clf.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = clf.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)