Naive Bayes. Let's start with Bayes' formula:

$P(A|x) = \frac{P(x|A)P(A)}{P(x)} = \frac{P(A\cap x)}{P(x)}$

Here $P(A)$ is the probability of $A$ occurring (more generally, $A$ can be a model parameter), and $P(x)$ is the probability of observing the data $x$.

Now some explaining to do. We have an event $A$ that occurs with probability $P(A)$; this is called the prior, i.e. the probability we assign before taking any helpful information into account. We are interested in what the data $x$ tells us about $A$. The probability of $A$ after seeing $x$ is called the posterior (a conditional probability) and is written $P(A|x)$. Then there is the likelihood $P(x|A)$, another conditional probability: it quantifies how likely the evidence $x$ is to be observed if our hypothesis $A$ is true (hence "likelihood"). Lastly, $P(x)$ is the marginal likelihood (also called the marginal probability or evidence), i.e. the overall probability of observing $x$.
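To make the terms concrete, here is a small worked example with made-up numbers (a hypothetical rare event $A$ and an observation $x$; none of these values come from the notebook). The marginal likelihood $P(x)$ is obtained by summing over $A$ and its complement:

```python
# Hypothetical numbers, chosen only to illustrate Bayes' formula
p_A = 0.01              # prior P(A)
p_x_given_A = 0.9       # likelihood P(x|A)
p_x_given_not_A = 0.05  # likelihood P(x|not A)

# Marginal likelihood P(x) via the law of total probability
p_x = p_x_given_A * p_A + p_x_given_not_A * (1 - p_A)

# Posterior P(A|x) from Bayes' formula
p_A_given_x = p_x_given_A * p_A / p_x
print(round(p_x, 4), round(p_A_given_x, 4))  # → 0.0585 0.1538
```

Observing $x$ moves the probability of $A$ from the 1% prior to a posterior of roughly 15%.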

So we want to calculate $P(A|x)$ from $P(x|A)$, $P(A)$ and $P(x)$. Let's expand it for a single class $A_i$, where the data $x = (x_1, \dots, x_K)$ consists of $K$ features:

$P(A_i|x) = \frac{P(x|A_i)P(A_i)}{P(x)} \propto P(x|A_i)P(A_i) = P(A_i)\prod_{k=1}^{K} P(x_k|A_i) = P(A_i)\prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\sigma^2_{ik}}}\exp\left(-\frac{(x_k-\mu_{ik})^2}{2\sigma^2_{ik}}\right)$

There's a lot going on in the equation above. First, we dropped $P(x)$ because it is the same for all classes. By doing so we no longer get the exact posterior probability, but we preserve the ordering of the classes, which is all we need to pick the most probable one. Then there's $P(A_i)\prod_{k} P(x_k|A_i)$, which encodes the main ("naive") assumption of the model: the features $x_k$ are conditionally independent given the class. The last step assumes that within each class every feature follows a Gaussian distribution with mean $\mu_{ik}$ and variance $\sigma^2_{ik}$.
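Both steps (the naive product over features and dropping $P(x)$) can be checked numerically. The sketch below uses made-up Gaussian parameters for two classes and two features (all values hypothetical) and confirms that normalizing by $P(x) = \sum_i P(A_i)\prod_k P(x_k|A_i)$ does not change which class wins:

```python
import numpy as np

# Hypothetical per-class Gaussian parameters: rows = classes i, columns = features k
mu = np.array([[1.0, 2.0],
               [3.0, 1.0]])
var = np.array([[0.5, 1.0],
                [1.0, 0.5]])
prior = np.array([0.6, 0.4])
x = np.array([2.5, 1.2])  # one observation with K = 2 features

# Gaussian density of each feature under each class, multiplied over features (naive assumption)
dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
unnormalized = prior * dens.prod(axis=1)  # P(A_i) * prod_k P(x_k | A_i)

# Dividing by P(x) rescales every class score by the same constant
posterior = unnormalized / unnormalized.sum()
print(unnormalized.argmax() == posterior.argmax())
```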

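One more point: we work with log-likelihoods instead of likelihoods. A product of many small probabilities can underflow to zero, whereas a sum of logarithms stays numerically stable, and because the logarithm is monotonic the most probable class does not change:  
$\log P(A_i|x) = \log P(A_i) + \sum_{k=1}^{K} \log P(x_k|A_i) - \log P(x)$, and since $\log P(x)$ is the same for every class we only need to compare $\log P(A_i) + \sum_{k=1}^{K} \log P(x_k|A_i)$.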
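A quick illustration of the underflow problem, using 500 hypothetical per-feature likelihoods of $10^{-3}$ each:

```python
import numpy as np

# 500 hypothetical per-feature likelihoods, each fairly small
p = np.full(500, 1e-3)

product = np.prod(p)         # 1e-1500 is far below the float64 range, so this underflows to 0.0
log_sum = np.sum(np.log(p))  # the same quantity on the log scale stays finite
print(product, log_sum)
```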


So let's list what we need:  
1. Priors for the classes $P(A_i)$: for these we can take the class frequencies in the training data
2. The parameters $\mu_{ik}$ and $\sigma^2_{ik}$ of each feature within each class: we can estimate these from the data
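The two estimation steps above can be sketched with plain NumPy on a tiny hypothetical dataset (two features, two classes). The variance uses `ddof=1`, which is assumed here to match the pandas `.var()` default used in the next cells:

```python
import numpy as np

# Hypothetical toy data: 6 samples, 2 features, classes 0 and 1
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [3.0, 0.9], [3.2, 1.1], [2.8, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
# 1. Priors: relative class frequencies
priors = np.array([np.mean(y == c) for c in classes])
# 2. Per-class, per-feature mean and sample variance (ddof=1)
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0, ddof=1) for c in classes])
print(priors, means, variances, sep="\n")
```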


In [1]:
import pandas as pd
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
df=pd.DataFrame(iris['data'])
df.columns=iris.feature_names
df['y']=iris['target']

train = df.sample(frac = 0.8, random_state = 369)
test = df.drop(train.index)


In [2]:
X=train.iloc[:,0:4]
y=train.loc[:,["y"]]

In [3]:
means = train.groupby(["y"]).mean() # Estimate mean of each class, feature
variances = train.groupby(["y"]).var() # Estimate variance of each class, feature
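# Estimate prior probabilities: count() gives the same per-class count in every column,
# so .iloc[:,1] just picks one of them (which is why the Series below is named "sepal width (cm)")
priors = (train.groupby("y").count() / len(train)).iloc[:,1]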
classes = np.unique(train["y"].tolist()) # Storing all possible classes
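A more direct way to get the same class frequencies is `value_counts(normalize=True)`. The sketch below runs it on a hypothetical label Series standing in for `train["y"]`; the name of the resulting Series depends on the pandas version, but it is no longer borrowed from an arbitrary feature column:

```python
import pandas as pd

# Hypothetical labels standing in for train["y"]
labels = pd.Series([0, 0, 1, 2, 2, 2], name="y")

# Relative class frequencies, ordered by class label
priors = labels.value_counts(normalize=True).sort_index()
print(priors)
```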

In [7]:
means

| y | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 4.952381 | 3.388095 | 1.45 | 0.245238 |
| 1 | 6.0 | 2.774359 | 4.325641 | 1.330769 |
| 2 | 6.566667 | 2.964103 | 5.515385 | 1.987179 |


In [8]:
variances

| y | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 0.105482 | 0.136196 | 0.027439 | 0.012294 |
| 1 | 0.223684 | 0.093536 | 0.176694 | 0.035344 |
| 2 | 0.456491 | 0.110256 | 0.359757 | 0.0722 |


In [9]:
priors

y
0    0.350
1    0.325
2    0.325
Name: sepal width (cm), dtype: float64

In [10]:
def log_likelihood_normal(X, mean, var):
    # Log of the Gaussian density N(X | mean, var), evaluated elementwise
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * ((X - mean) ** 2 / var)
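As a sanity check, the function should agree with SciPy's Gaussian log-density. The sketch below assumes SciPy is installed (it is a dependency of scikit-learn) and plugs in the class 2 sepal-length mean and variance from the tables below; note that `scipy.stats.norm` takes the standard deviation, not the variance:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood_normal(X, mean, var):
    # Same formula as in the notebook: log of the Gaussian density
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * ((X - mean) ** 2 / var)

x, mean, var = 7.7, 6.566667, 0.456491  # class 2, sepal length
ours = log_likelihood_normal(x, mean, var)
ref = norm.logpdf(x, loc=mean, scale=np.sqrt(var))  # scale is the standard deviation
print(ours, ref)
```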

In [11]:
X.iloc[0]

sepal length (cm)    7.7
sepal width (cm)     2.6
petal length (cm)    6.9
petal width (cm)     2.3
Name: 118, dtype: float64

In [13]:
log_likelihood_normal(X.iloc[0], means, variances)

| y | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | -35.579634 | -2.202253 | -540.3666 | -170.433452 |
| 1 | -6.630179 | 0.103257 | -18.805982 | -12.53708 |
| 2 | -1.933712 | -0.417658 | -3.072295 | -0.282459 |
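The call above evaluates all classes at once because of pandas alignment: `X.iloc[0]` is a Series indexed by feature name, and combining it with a DataFrame whose columns are those feature names applies the operation to every row (class). A tiny illustration with hypothetical feature names `a` and `b`:

```python
import pandas as pd

# A row (Series indexed by feature) minus a DataFrame (columns = features):
# pandas aligns the Series index with the DataFrame columns and repeats it for every row.
row = pd.Series({"a": 1.0, "b": 2.0})
means = pd.DataFrame({"a": [0.0, 1.0], "b": [2.0, 0.0]}, index=pd.Index([0, 1], name="y"))
diff = row - means
print(diff)
```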


In [18]:
np.log(priors)

y
0   -1.049822
1   -1.123930
2   -1.123930
Name: sepal width (cm), dtype: float64

In [19]:
X.iloc[1]

sepal length (cm)    5.8
sepal width (cm)     2.6
petal length (cm)    4.0
petal width (cm)     1.2
Name: 92, dtype: float64

In [25]:
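def add_vector_to_df(dff, vector):
    # Add `vector` (indexed by class) to every column of `dff`, aligning on the row index.
    # DataFrame.iteritems() was removed in pandas 2.0; DataFrame.add(..., axis=0) does the same job.
    return dff.add(vector, axis=0)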
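For reference, the column-by-column loop and `DataFrame.add(..., axis=0)` give identical results. The sketch below uses small hypothetical score and log-prior values indexed by class:

```python
import pandas as pd

scores = pd.DataFrame({"f1": [-1.0, -2.0], "f2": [-0.5, -3.0]},
                      index=pd.Index([0, 1], name="y"))
log_prior = pd.Series([-0.7, -0.7], index=pd.Index([0, 1], name="y"))

# Loop version: add the vector to each column in turn
looped = scores.copy()
for col in looped.columns:
    looped[col] = looped[col] + log_prior

# Vectorized version: align the vector with the row index (axis=0)
vectorized = scores.add(log_prior, axis=0)
print(looped.equals(vectorized))
```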

In [29]:
i=0


In [32]:
l_i = log_likelihood_normal(X.iloc[i], means, variances)  # per-class, per-feature log-likelihoods
scores_i = l_i.sum(axis=1) + np.log(priors)              # add the log prior once per class

In [34]:
np.argmax(scores_i)

2
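Following the log formula, the log prior should be added once per class. Adding it to every feature column and then summing over the $K$ features instead counts it $K$ times, as this sketch with hypothetical log-likelihoods shows:

```python
import numpy as np

# Hypothetical log-likelihoods: 2 classes x K = 4 features
loglik = np.array([[-1.0, -2.0, -0.5, -1.5],
                   [-1.2, -1.8, -0.6, -1.4]])
log_prior = np.log(np.array([0.35, 0.325]))
K = loglik.shape[1]

per_column = (loglik + log_prior[:, None]).sum(axis=1)  # prior added to every feature column
once = loglik.sum(axis=1) + log_prior                   # log P(A_i) + sum_k log P(x_k|A_i)
print(np.allclose(per_column, once + (K - 1) * log_prior))
```

With priors as close as 0.35 and 0.325 the winning class rarely changes, but only the second form matches the formula.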

In [36]:
y.iloc[0]

y    2
Name: 118, dtype: int64

In [351]:
preds=[]
for i in range(X.shape[0]):
    l1 = log_likelihood_normal(X.iloc[i], means, variances)
    # Sum the per-feature log-likelihoods, then add the log prior once per class
    preds.append(np.argmax(l1.sum(axis=1) + np.log(priors)))
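The row-by-row loop can also be replaced by NumPy broadcasting over all rows at once. `predict_vectorized` below is a hypothetical helper, not part of the notebook; it assumes the columns of `means` and `variances` are in the same feature order as `X`, and it returns positions, which equal the class labels here because the classes are 0, 1, 2:

```python
import numpy as np

def predict_vectorized(X, means, variances, priors):
    """Gaussian naive Bayes predictions for all rows at once (hypothetical helper).

    X: (n, K) array; means, variances: (C, K) arrays; priors: (C,) array.
    """
    X = np.asarray(X, dtype=float)[:, None, :]            # (n, 1, K)
    mu = np.asarray(means, dtype=float)[None, :, :]       # (1, C, K)
    var = np.asarray(variances, dtype=float)[None, :, :]  # (1, C, K)
    loglik = -0.5 * np.log(2 * np.pi * var) - 0.5 * (X - mu) ** 2 / var
    scores = loglik.sum(axis=2) + np.log(np.asarray(priors, dtype=float))  # (n, C)
    return scores.argmax(axis=1)

# Toy check with two well-separated hypothetical classes
X_toy = np.array([[0.1, -0.2], [4.9, 5.3]])
preds = predict_vectorized(X_toy, [[0.0, 0.0], [5.0, 5.0]], np.ones((2, 2)), [0.5, 0.5])
print(preds)  # → [0 1]
```

In the notebook this would be called as `predict_vectorized(X, means, variances, priors)`.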

In [352]:
np.mean(preds==y.values.flatten())

0.9583333333333334

In [39]:
X_test=test.iloc[:,0:4]
y_test=test.loc[:,["y"]]

In [42]:
preds_oos=[]
for i in range(X_test.shape[0]):
    l1 = log_likelihood_normal(X_test.iloc[i], means, variances)
    # Sum the per-feature log-likelihoods, then add the log prior once per class
    preds_oos.append(np.argmax(l1.sum(axis=1) + np.log(priors)))

In [48]:
np.mean(preds_oos==y_test['y'].values)

0.9666666666666667
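One way to cross-check this result is scikit-learn's `GaussianNB` on the same split. This is a swapped-in library implementation, not the notebook's code: it estimates variances with `ddof=0` and adds a small `var_smoothing` term, so its accuracy may differ slightly from the hand-written version:

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Rebuild the same train/test split as in the notebook
iris = datasets.load_iris()
df = pd.DataFrame(iris["data"], columns=iris.feature_names)
df["y"] = iris["target"]
train = df.sample(frac=0.8, random_state=369)
test = df.drop(train.index)

clf = GaussianNB().fit(train.iloc[:, 0:4], train["y"])
acc = np.mean(clf.predict(test.iloc[:, 0:4]) == test["y"].values)
print(acc)
```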

In [73]:
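class naive_bayes:
    """Gaussian naive Bayes classifier assembled from the steps above."""
    def __init__(self):
        self.name = "Naive Bayes classifier"
        self.preds = None

    @staticmethod
    def log_likelihood_normal(X, mean, var):
        # Elementwise log of the Gaussian density N(X | mean, var)
        return -0.5 * np.log(2 * np.pi * var) - 0.5 * ((X - mean) ** 2 / var)

    def fit(self, X, y):
        # Estimate per-class feature means/variances and class priors from the training data
        grouped = X.groupby(y)
        self.means = grouped.mean()
        self.variances = grouped.var()
        self.priors = grouped.size() / len(X)
        return self

    def predict(self, X):
        preds = list()
        for i in range(X.shape[0]):
            l1 = self.log_likelihood_normal(X.iloc[i], self.means, self.variances)
            # Sum log-likelihoods over features and add the log prior once per class
            preds.append(np.argmax(l1.sum(axis=1) + np.log(self.priors)))
        self.preds = preds

    def get_accuracy(self, y):
        if self.preds is None:
            print('Run predict() method first')
        else:
            self.accuracy = np.mean(np.asarray(self.preds) == np.asarray(y))

In [74]:
nb = naive_bayes().fit(X, y['y'])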

In [75]:
nb.predict(X_test)

In [81]:
nb.get_accuracy(y_test['y'])

In [82]:
nb.accuracy

0.9666666666666667
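Because we dropped $P(x)$, the scores above only rank the classes. If actual posterior probabilities are wanted, $\log P(x)$ can be recovered from the log scores with the log-sum-exp trick. The sketch below uses SciPy's `logsumexp` on hypothetical log scores that are so negative that exponentiating them directly underflows to zero:

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical unnormalized log scores log P(A_i) + sum_k log P(x_k|A_i) for 3 classes
scores = np.array([-1500.0, -1012.3, -1003.1])

naive_total = np.exp(scores).sum()  # every term underflows, so this is 0.0
log_px = logsumexp(scores)          # log P(x), computed stably
posterior = np.exp(scores - log_px) # P(A_i | x), sums to 1
print(naive_total, posterior.round(4))
```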