# Naive Bayes

In machine learning, a naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem. Its feature model makes a strong ("naive") independence assumption: given the class, the value of any one feature is assumed to be independent of the value of every other feature.

Definition of independent events:

Two events $E$ and $F$ are independent if both have positive probability and if $P(E|F) = P(E)$ and $P(F|E) = P(F)$.
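To make the definition concrete, the following sketch enumerates all outcomes of two fair dice and checks both conditions by counting. The events chosen here ($E$: the first die is even, $F$: the sum is 7) and the helper names are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=None):
    """P(event) or, if `given` is set, P(event | given), by counting outcomes."""
    space = [o for o in outcomes if given is None or given(o)]
    return Fraction(sum(1 for o in space if event(o)), len(space))

def first_even(o):       # event E (illustrative choice)
    return o[0] % 2 == 0

def sum_is_seven(o):     # event F (illustrative choice)
    return o[0] + o[1] == 7

p_e = prob(first_even)
p_f = prob(sum_is_seven)
independent = (prob(first_even, given=sum_is_seven) == p_e
               and prob(sum_is_seven, given=first_even) == p_f)
print(p_e, p_f, independent)  # 1/2 1/6 True
```

Knowing that the sum is 7 does not change the chance that the first die is even, and vice versa, so the two events are independent in the sense defined above.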

As stated above, the naive Bayes classifier is based on Bayes' theorem, which in turn builds on conditional probability, defined next.

$P(A|B)$ stands for "the conditional probability of $A$ given $B$", or "the probability of $A$ under the condition $B$", i.e. the probability of some event $A$ under the assumption that the event $B$ took place. When the event $B$ is known to have occurred in a random experiment, the possible outcomes of the experiment are reduced to those in $B$, and hence the probability of $A$ changes from the unconditional probability to the conditional probability given $B$.

The joint probability is the probability of two events occurring together. There are three common notations for the joint probability of $A$ and $B$. It can be written as

- $P(A ∩ B)$
- $P(AB)$ or
- $P(A,B)$

For $P(B) > 0$, the conditional probability is defined by
$$ P(A|B) = \frac{P(A ∩ B)}{P(B)} $$
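As a small worked example of this formula, the sketch below computes the probability that a card drawn from a standard 52-card deck is a king, given that it is a face card. The deck, the two events and the function names are assumptions introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def prob(event):
    """Probability of an event as (favourable cards) / (all cards)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_king(card):   # event A
    return card[0] == "K"

def is_face(card):   # event B
    return card[0] in {"J", "Q", "K"}

p_joint = prob(lambda card: is_king(card) and is_face(card))  # P(A ∩ B)
p_cond = p_joint / prob(is_face)                              # P(A|B)
print(prob(is_king), p_cond)  # 1/13 1/3
```

Restricting the outcomes to the 12 face cards raises the probability of a king from 1/13 to 1/3, which is exactly the "reduced set of outcomes" reading of $P(A|B)$.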


## Bayes' Theorem

Bayes' theorem relates the two conditional probabilities $P(A|B)$ and $P(B|A)$. $P(A|B)$ is the conditional probability of $A$ given $B$ (the posterior probability), $P(A)$ is the prior probability of $A$, and $P(B)$ is the prior probability of $B$. $P(B|A)$ is the conditional probability of $B$ given $A$, called the likelihood.

$$ P(A|B) = \frac{P(B|A) P(A)}{P(B)} $$
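A short numeric sketch may help to see how prior and likelihood combine. Here $A$ is "a person has a condition" and $B$ is "a test is positive"; the numbers (1% prior, 95% true-positive rate, 5% false-positive rate) are made up for illustration, and the function `posterior` is a hypothetical helper. $P(B)$ is expanded with the law of total probability, $P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A)$.

```python
from fractions import Fraction

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)  # P(B)
    return p_b_given_a * prior_a / evidence

# illustrative, made-up numbers
p = posterior(prior_a=Fraction(1, 100),
              p_b_given_a=Fraction(95, 100),
              p_b_given_not_a=Fraction(5, 100))
print(p, round(float(p), 3))  # 19/118 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior $P(A)$ is small. This weighting of the likelihood by the prior is what a Bayes classifier does for each class.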

In [6]:
# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
# load the iris dataset
dataset = datasets.load_iris()
# fit a Gaussian Naive Bayes model to the data
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions on the same data the model was trained on,
# so the scores below measure the fit, not generalization
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

GaussianNB()
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.94      0.94      0.94        50
           2       0.94      0.94      0.94        50

    accuracy                           0.96       150
   macro avg       0.96      0.96      0.96       150
weighted avg       0.96      0.96      0.96       150

[[50  0  0]
 [ 0 47  3]
 [ 0  3 47]]
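The idea behind a Gaussian naive Bayes classifier can be sketched in plain Python: for each class $c$, estimate the prior $P(c)$ and a mean and variance per feature, then predict the class that maximizes $P(c) \prod_i p(x_i|c)$, where each $p(x_i|c)$ is a normal density. This is a minimal sketch under those assumptions, not scikit-learn's implementation (which, among other things, applies its own variance smoothing via the `var_smoothing` parameter). The function names and the tiny two-feature dataset are made up for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate prior, per-feature means and variances for every class."""
    rows_by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        rows_by_class[y].append(x)
    model = {}
    for y, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # small constant keeps the variance strictly positive (an assumption of this sketch)
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)
    return model

def log_normal_pdf(x, mean, var):
    """Logarithm of the normal density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, x):
    """Class with the largest log prior plus summed per-feature log-likelihoods."""
    scores = {}
    for y, (prior, means, variances) in model.items():
        scores[y] = math.log(prior) + sum(
            log_normal_pdf(v, m, s) for v, m, s in zip(x, means, variances))
    return max(scores, key=scores.get)

# made-up measurements (two features per sample), two classes
X = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (1.6, 0.4),
     (4.5, 1.5), (4.1, 1.3), (4.7, 1.4), (4.0, 1.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

toy_model = fit_gaussian_nb(X, y)
print(predict(toy_model, (1.5, 0.25)), predict(toy_model, (4.4, 1.4)))  # 0 1
```

Working with logarithms turns the product of densities into a sum, which avoids numerical underflow when there are many features.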


In [18]:
import numpy as np
import pandas as pd

def prepare_person_dataset(fname):
    """Read a whitespace-separated person file into ((height, weight), gender) tuples."""
    dataset = []
    with open(fname) as fh:
        for line in fh:
            person = line.strip().split()
            if not person:
                continue  # skip blank lines
            # columns: first name, last name, body size, weight, gender
            height_weight = (float(person[2]), float(person[3]))
            dataset.append((height_weight, person[4]))
    return dataset

columns = ['First Name', 'Last Name', 'Body Size', 'Weight', 'Gender']
learnset_df = pd.read_csv('https://python-course.eu/data/person_data.txt',
                          names=columns, delimiter=' ')
testset_df = pd.read_csv('https://python-course.eu/data/person_data_testset.txt',
                         names=columns, delimiter=' ')

learnset_df

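|     | First Name | Last Name | Body Size | Weight | Gender |
|-----|------------|-----------|-----------|--------|--------|
| 0   | Randy      | Carter    | 184       | 73.0   | male   |
| 1   | Stephanie  | Smith     | 149       | 52.0   | female |
| 2   | Cynthia    | Watson    | 174       | 63.0   | female |
| 3   | Jessie     | Morgan    | 175       | 67.0   | male   |
| 4   | Katherine  | Carter    | 183       | 81.0   | female |
| ... | ...        | ...       | ...       | ...    | ...    |
| 95  | Jessie     | Thomas    | 168       | 69.0   | female |
| 96  | Emily      | Gonzalez  | 156       | 51.0   | female |
| 97  | Doris      | Nelson    | 167       | 40.0   | female |
| 98  | Louis      | Bennett   | 161       | 18.0   | male   |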


In [38]:
# Gaussian Naive Bayes on the person data
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

# train on body size and weight, with gender as the class label
X_train = learnset_df[['Body Size', 'Weight']].values
y_train = learnset_df['Gender'].values
model = GaussianNB()
model.fit(X_train, y_train)

# predict the gender for the separate test set
X_test = testset_df[['Body Size', 'Weight']].values
y_test = testset_df['Gender'].values
predicted = model.predict(X_test)

# evaluate the predictions on the test set
print(metrics.classification_report(y_test, predicted))
print(metrics.confusion_matrix(y_test, predicted))


              precision    recall  f1-score   support

      female       0.68      0.80      0.73        50
        male       0.76      0.62      0.68        50

    accuracy                           0.71       100
   macro avg       0.72      0.71      0.71       100
weighted avg       0.72      0.71      0.71       100

[[40 10]
 [19 31]]
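The numbers in the classification report can be recomputed from the confusion matrix, which makes it easier to see what precision, recall and f1-score mean. The sketch below assumes scikit-learn's convention that rows are the true classes and columns the predicted classes, with labels in sorted order (`female`, `male`); the helper `scores` is a hypothetical name.

```python
# confusion matrix printed above: rows = true class, columns = predicted class
labels = ["female", "male"]
cm = [[40, 10],
      [19, 31]]

def scores(cm, k):
    """Precision, recall and f1-score for the class with index k."""
    true_positives = cm[k][k]
    predicted_as_k = sum(row[k] for row in cm)   # column sum
    actually_k = sum(cm[k])                      # row sum
    precision = true_positives / predicted_as_k
    recall = true_positives / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k, name in enumerate(labels):
    p, r, f = scores(cm, k)
    print(f"{name:>6}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")

accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(f"accuracy={accuracy:.2f}")
```

For example, 40 of the 50 women are recognized as female (recall 0.80), but 19 men are also predicted as female, so only 40 of 59 "female" predictions are correct (precision 0.68).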
