## Table of Contents:
* [Gaussian Naive Bayes](#gaussian_naive_bayes)
* [Data load](#data_load)
* [Math Explanation](#math_expl)
* [SciKit GaussianNB](#sci_gnb)
* [Questions](#questions)

In [2]:
import pandas as pd
import traceback
import math
from sklearn.naive_bayes import GaussianNB

## Gaussian Naive Bayes <a class="anchor" id="gaussian_naive_bayes"></a>
$
\begin{align}
& \hat{y} = \underset{k \in \lbrace 1, \dots, K \rbrace}{\mathrm{arg\,max}} \; P(y_k) \prod_{i=1}^{d} p(x_i \mid y_k) \\
& p(x_{i}\mid y_{k}) = \frac{1}{\sigma_{i,k}\sqrt{2\pi}} 
  \exp\left( -\frac{1}{2}\left(\frac{x_{i}-\mu_{i,k}}{\sigma_{i,k}}\right)^{\!2}\,\right) \\
& \log p(x_{i}\mid y_{k}) = -\frac{1}{2}\log\left(2\pi\sigma_{i,k}^2\right) - \frac{1}{2}\left(\frac{x_{i}-\mu_{i,k}}{\sigma_{i,k}}\right)^{\!2}
\end{align}
$
<br><br>
Here $\mu_{i,k}$ and $\sigma_{i,k}$ are the mean and standard deviation of feature $i$ among the training points of class $k$. <br><br>
--> <b>Assumptions:</b> <br>
1) features are conditionally independent given the class (in particular, uncorrelated within each class). <br>
2) (Gaussian) Naive Bayes assumes that each feature follows a Gaussian distribution within each class.
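
The decision rule above can be sketched in a few lines of plain Python. This is a minimal sketch and not scikit-learn's implementation: `log_gaussian`, `log_joint` and `predict` are hypothetical helper names, the `params` layout is an assumption, and the toy parameters are made up purely for illustration.

```python
import math

def log_gaussian(x, mu, sigma):
    """log N(x; mu, sigma^2), the per-feature log-likelihood."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

def log_joint(x, prior, mus, sigmas):
    """log P(y_k) + sum_i log p(x_i | y_k) for a single class."""
    return math.log(prior) + sum(
        log_gaussian(x_i, mu, sigma) for x_i, mu, sigma in zip(x, mus, sigmas)
    )

def predict(x, params):
    """Return the label with the largest log joint score.

    params maps label -> (prior, mus, sigmas); this layout is an
    assumption made for the sketch.
    """
    return max(params, key=lambda label: log_joint(x, *params[label]))

# Toy parameters (made up for illustration): two classes, two features.
toy_params = {
    "a": (0.5, [0.0, 0.0], [1.0, 1.0]),
    "b": (0.5, [3.0, 3.0], [1.0, 1.0]),
}
print(predict([0.2, -0.1], toy_params))
print(predict([2.9, 3.4], toy_params))
```

Working with sums of logs rather than products of densities gives the same arg max and avoids numerical underflow when many small densities are multiplied.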

## Data load <a class="anchor" id="data_load"></a>
https://www.kaggle.com/datasets/whenamancodes/fraud-detection <br>
-- selected three features ['V1', 'V4', 'V7'] <br>
-- target is 'Class' ~ Fraudulent (1) or Genuine (0) <br>

In [3]:
def get_conf():
    # Path to the Kaggle credit-card fraud CSV; adjust to the local layout.
    conf = {
        "data_fl_path": "../DataSets/creditcard.csv"
    }
    return conf

In [4]:
def load_data(conf):
    # Read the CSV and keep only the three selected features and the target.
    df = pd.read_csv(conf["data_fl_path"])
    df = df[['V1', 'V4', 'V7', 'Class']]
    return df

In [5]:
def data_explor():
    try:
        conf = get_conf()
        df = load_data(conf)
        display("Number of  Fraudulent (1) or Genuine (0) in dataset")
        display(df['Class'].value_counts())
        display(df.head())
        return df
    except Exception:
        traceback.print_exc()
        raise  # re-raise so data_df is never silently set to None
        
data_df = data_explor()

'Number of  Fraudulent (1) or Genuine (0) in dataset'

0    284315
1       492
Name: Class, dtype: int64

| | V1 | V4 | V7 | Class |
|---|---|---|---|---|
| 0 | -1.359807 | 1.378155 | 0.239599 | 0 |
| 1 | 1.191857 | 0.448154 | -0.078803 | 0 |
| 2 | -1.358354 | 0.37978 | 0.791461 | 0 |
| 3 | -0.966272 | -0.863291 | 0.237609 | 0 |
| 4 | -1.158233 | 0.403034 | 0.592941 | 0 |


## Math Explanation <a class="anchor" id="math_expl"></a>

In [6]:
'''
From our training data
'''
stats_1=pd.DataFrame()
df_1 = data_df.loc[data_df['Class'] == 1]
stats_1["mean- mu"]=df_1[['V1', 'V4', 'V7']].mean()
stats_1["Std.Dev - sigma"]=df_1[['V1', 'V4', 'V7']].std()
stats_1["Var"]=df_1[['V1', 'V4', 'V7']].var()


stats_0=pd.DataFrame()
df_0 = data_df.loc[data_df['Class'] == 0]
stats_0["mean - mu"]=df_0[['V1', 'V4', 'V7']].mean()
stats_0["Std.Dev - sigma"]=df_0[['V1', 'V4', 'V7']].std()
stats_0["Var"]=df_0[['V1', 'V4', 'V7']].var()

display("Distribution Parameter of class Fraudulent (1)")
display(stats_1)

display("Distribution Parameter of class Genuine (0)")
display(stats_0)

'Distribution Parameter of class Fraudulent (1)'

| | mean- mu | Std.Dev - sigma | Var |
|---|---|---|---|
| V1 | -4.771948 | 6.783687 | 46.018406 |
| V4 | 4.542029 | 2.873318 | 8.255955 |
| V7 | -5.568731 | 7.206773 | 51.937575 |


'Distribution Parameter of class Genuine (0)'

| | mean - mu | Std.Dev - sigma | Var |
|---|---|---|---|
| V1 | 0.008258 | 1.929814 | 3.724182 |
| V4 | -0.00786 | 1.399333 | 1.958134 |
| V7 | 0.009637 | 1.178812 | 1.389598 |


--> <b>Let's take two TEST instances</b> <br>

<table style="float:left">
    <tr>
        <td>V1</td> <td>V4</td> <td>V7</td> <td>CLASS</td>
    </tr>
    <tr>
        <td>1.191857</td> <td>0.448154</td> <td>-0.078803</td> <td>?</td>
    </tr>
    <tr>
        <td>-3.0435406239976</td> <td>2.2886436183814</td> <td>0.325574266158614</td> <td>?</td>
    </tr>   
</table>

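So we have to calculate the following. The evidence $P(X)$ is the same for both classes, so it is dropped and each posterior is only computed up to proportionality. <br><br>
For Fraudulent (1) <br>
$
\begin{align}
& P(y = 1 \mid X) \propto P(y = 1) \prod_{i=1}^{d} p(x_i \mid y = 1) \\
& = P(y = 1) \, p(x_{V1} \mid y = 1) \, p(x_{V4} \mid y = 1) \, p(x_{V7} \mid y = 1) \\
\end{align}
$

For Genuine (0) <br>
$
\begin{align}
& P(y = 0 \mid X) \propto P(y = 0) \prod_{i=1}^{d} p(x_i \mid y = 0) \\
& = P(y = 0) \, p(x_{V1} \mid y = 0) \, p(x_{V4} \mid y = 0) \, p(x_{V7} \mid y = 0) \\
\end{align}
$

After this, we compare the two scores and assign the test instance to whichever class (0/1) has the larger one. The code below works with logarithms, so each product becomes the sum $\log P(y_k) + \sum_i \log p(x_i \mid y_k)$.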

In [7]:
'''
First, the prior calculation,
'''
P_y1 = len(data_df.loc[data_df['Class'] == 1]) / len(data_df)
P_y0 = len(data_df.loc[data_df['Class'] == 0]) / len(data_df)

display(f'(P = y1):- {P_y1}')
display(f'(P = y0):- {P_y0}')

'(P = y1):- 0.001727485630620034'

'(P = y0):- 0.9982725143693799'

***
<b>TestCase-1</b> <br>
V1: 1.191857 V4: 0.448154 V7: -0.078803 Class: ? 
***

In [8]:
'''
now calculate the likelihood of y1 ~ Fraudulent (1)
'''

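def pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density N(x; mu, sigma^2)."""
    # normalisation constant 1 / (sigma * sqrt(2*pi))
    comp_1 = float(1 / (sigma * math.sqrt(2.0*math.pi)))
    
    # exponential term exp(-0.5 * ((x - mu) / sigma)^2)
    comp_2_1 = math.pow(float((x - mu) / sigma), 2)
    comp_2_2 = (-1/2.0)*comp_2_1
    comp_2 = math.exp(comp_2_2)
    
    comp = comp_1 * comp_2
    
    return comp

def pdf_log(x, mu=0.0, sigma=1.0):
    """Log of the Gaussian density; sums of these avoid underflow."""
    # -0.5 * log(2*pi*sigma^2)
    comp_1 = -0.5*math.log((2.0*math.pi*math.pow(sigma,2)))
    
    # -0.5 * ((x - mu) / sigma)^2
    comp_2 = -0.5*math.pow(float((x - mu) / sigma), 2)
    
    comp = comp_1 + comp_2
    
    return comp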

x_y1_v1 = 1.191857
mu_y1_v1 = -4.771948
sigma_y1_v1 = 6.783687
P_y1_xv1 = pdf_log(x_y1_v1, mu=mu_y1_v1, sigma=sigma_y1_v1)

x_y1_v4 = 0.448154
mu_y1_v4 = 4.542029
sigma_y1_v4 = 2.873318
P_y1_xv4 = pdf_log(x_y1_v4, mu=mu_y1_v4, sigma=sigma_y1_v4)

x_y1_v7 = -0.078803
mu_y1_v7 = -5.568731
sigma_y1_v7 = 7.206773
P_y1_xv7 = pdf_log(x_y1_v7, mu=mu_y1_v7, sigma=sigma_y1_v7)

# pdf
# P_y1_xv1v4v7 = P_y1 * P_y1_xv1 * P_y1_xv4 * P_y1_xv7
# log pdf
P_y1_xv1v4v7 = math.log(P_y1) + P_y1_xv1 + P_y1_xv4 + P_y1_xv7

display(f"P_y1:- {P_y1}, P_y1_xv1:- {P_y1_xv1}, P_y1_xv4:- {P_y1_xv4}, P_y1_xv7:- {P_y1_xv7}")
display(f"P_y1_xv1v4v7:- {P_y1_xv1v4v7}")

'P_y1:- 0.001727485630620034, P_y1_xv1:- -3.2199021381220847, P_y1_xv4:- -2.9894193871916257, P_y1_xv7:- -3.1841091710093594'

'P_y1_xv1v4v7:- -15.754519016700355'

In [9]:
'''
now calculate the likelihood of y0 ~ Genuine (0)
'''

x_y0_v1 = 1.191857
mu_y0_v1 = 0.008258
sigma_y0_v1 = 1.929814
P_y0_xv1 = pdf_log(x_y0_v1, mu=mu_y0_v1, sigma=sigma_y0_v1)

x_y0_v4 = 0.448154
mu_y0_v4 = -0.007860
sigma_y0_v4 = 1.399333
P_y0_xv4 = pdf_log(x_y0_v4, mu=mu_y0_v4, sigma=sigma_y0_v4)

x_y0_v7 = -0.078803
mu_y0_v7 = 0.009637
sigma_y0_v7 = 1.178812
P_y0_xv7 = pdf_log(x_y0_v7, mu=mu_y0_v7, sigma=sigma_y0_v7)

# pdf
# P_y0_xv1v4v7 = P_y0 * P_y0_xv1 * P_y0_xv4 * P_y0_xv7
# log pdf
P_y0_xv1v4v7 = math.log(P_y0) + P_y0_xv1 + P_y0_xv4 + P_y0_xv7

display(f"P_y0:- {P_y0}, P_y0_xv1:- {P_y0_xv1}, P_y0_xv4:- {P_y0_xv4}, P_y0_xv7:- {P_y0_xv7}")
display(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}")

'P_y0:- 0.9982725143693799, P_y0_xv1:- -1.7644446104514482, P_y0_xv4:- -1.3080329663381352, P_y0_xv7:- -1.0862600366231299'

'P_y0_xv1v4v7:- -4.160466592867256'

In [10]:
if P_y0_xv1v4v7 > P_y1_xv1v4v7:
    print("test data point is Genuine (0)")
    print(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}, P_y1_xv1v4v7:- {P_y1_xv1v4v7}")
else:
    print("test data point is Fraudulent (1)")
    print(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}, P_y1_xv1v4v7:- {P_y1_xv1v4v7}")

test data point is Genuine (0)
P_y0_xv1v4v7:- -4.160466592867256, P_y1_xv1v4v7:- -15.754519016700355
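
The two numbers compared above are log joint scores, not probabilities. If normalised posteriors are wanted, the scores can be converted with the log-sum-exp trick. This is a minimal sketch using the two scores printed above; `posteriors_from_log_joint` is a hypothetical helper name.

```python
import math

def posteriors_from_log_joint(log_scores):
    """Normalise log joint scores into posterior probabilities.

    Subtracting the maximum before exponentiating (log-sum-exp) avoids
    underflow when the scores are very negative.
    """
    m = max(log_scores)
    log_evidence = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return [math.exp(s - log_evidence) for s in log_scores]

# Log joint scores for TestCase-1, copied from the output above,
# in the order [Genuine (0), Fraudulent (1)].
log_scores = [-4.160466592867256, -15.754519016700355]
p_genuine, p_fraud = posteriors_from_log_joint(log_scores)
print(f"P(y=0|X) = {p_genuine:.6f}, P(y=1|X) = {p_fraud:.2e}")
```

Normalising does not change which class wins; it only rescales both scores by the same evidence term.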


## SciKit GaussianNB <a class="anchor" id="sci_gnb"></a>

In [11]:
X_train = data_df[['V1', 'V4', 'V7']]
y_train = data_df.Class

print(X_train.shape, y_train.shape)

# instantiate the model
gnb = GaussianNB()

# fit the model
gnb.fit(X_train, y_train)

print('Sklearn values:')
print('Mean is',gnb.theta_)
print('Variance is',gnb.var_)

(284807, 3) (284807,)
Sklearn values:
Mean is [[ 0.00825774 -0.00785987  0.00963655]
 [-4.77194844  4.5420291  -5.56873108]]
Variance is [[ 3.72416918  1.95812662  1.3895933 ]
 [45.92487267  8.23917414 51.83201046]]
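
The means match the manual statistics, but the variances differ slightly (for example 46.018406 versus 45.92487267 for V1 in the fraud class). A likely explanation is the estimator: pandas `.var()` divides by n - 1 by default, while GaussianNB uses the maximum-likelihood variance (dividing by n) plus a tiny `var_smoothing` term. The sketch below rescales the pandas values by (n - 1)/n with n = 492 fraud rows; the dictionaries simply copy numbers from the outputs above.

```python
n_fraud = 492  # number of Fraudulent (1) rows, from value_counts above

# Sample variances (ddof=1) printed by pandas for class 1
pandas_var = {"V1": 46.018406, "V4": 8.255955, "V7": 51.937575}
# Variances reported by gnb.var_ for class 1
sklearn_var = {"V1": 45.92487267, "V4": 8.23917414, "V7": 51.83201046}

for name, s2 in pandas_var.items():
    # Convert the (n - 1) estimate into the population (n) estimate
    population_var = s2 * (n_fraud - 1) / n_fraud
    print(name, round(population_var, 6), sklearn_var[name])
```

This difference, together with the rounding of the hard-coded means and standard deviations, would also account for small gaps between hand-computed log scores and scikit-learn's. Passing `ddof=0` to the pandas `.std()` and `.var()` calls should bring the two closer.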


In [12]:
arr_test = [[1.191857, 0.448154, -0.078803]]
X_test=pd.DataFrame(arr_test, columns=['V1', 'V4', 'V7'])
y_pred = gnb.predict(X_test)
print(gnb._joint_log_likelihood(X_test))
print(y_pred)

[[ -4.16046253 -15.75491257]]
[0]


***
<b>TestCase-2</b> <br>
V1: -3.0435406239976 V4: 2.2886436183814 V7: 0.325574266158614 Class: ?  
***

In [13]:
'''
now calculate the likelihood of y1 ~ Fraudulent (1)
'''

x_y1_v1 = -3.0435406239976
mu_y1_v1 = -4.771948
sigma_y1_v1 = 6.783687
P_y1_xv1 = pdf_log(x_y1_v1, mu=mu_y1_v1, sigma=sigma_y1_v1)

x_y1_v4 = 2.2886436183814
mu_y1_v4 = 4.542029
sigma_y1_v4 = 2.873318
P_y1_xv4 = pdf_log(x_y1_v4, mu=mu_y1_v4, sigma=sigma_y1_v4)

x_y1_v7 = 0.325574266158614
mu_y1_v7 = -5.568731
sigma_y1_v7 = 7.206773
P_y1_xv7 = pdf_log(x_y1_v7, mu=mu_y1_v7, sigma=sigma_y1_v7)

# pdf
# P_y1_xv1v4v7 = P_y1 * P_y1_xv1 * P_y1_xv4 * P_y1_xv7
# log pdf
P_y1_xv1v4v7 = math.log(P_y1) + P_y1_xv1 + P_y1_xv4 + P_y1_xv7

display(f"P_y1:- {P_y1}, P_y1_xv1:- {P_y1_xv1}, P_y1_xv4:- {P_y1_xv4}, P_y1_xv7:- {P_y1_xv7}")
display(f"P_y1_xv1v4v7:- {P_y1_xv1v4v7}")

'P_y1:- 0.001727485630620034, P_y1_xv1:- -2.8659179554189693, P_y1_xv4:- -2.281926131908617, P_y1_xv7:- -3.22842703665569'

'P_y1_xv1v4v7:- -14.737359444360562'

In [65]:
'''
now calculate the likelihood of y0 ~ Genuine (0)
'''

x_y0_v1 = -3.0435406239976
mu_y0_v1 = 0.008258
sigma_y0_v1 = 1.929814
P_y0_xv1 = pdf_log(x_y0_v1, mu=mu_y0_v1, sigma=sigma_y0_v1)

x_y0_v4 = 2.2886436183814
mu_y0_v4 = -0.007860
sigma_y0_v4 = 1.399333
P_y0_xv4 = pdf_log(x_y0_v4, mu=mu_y0_v4, sigma=sigma_y0_v4)

x_y0_v7 = 0.325574266158614
mu_y0_v7 = 0.009637
sigma_y0_v7 = 1.178812
P_y0_xv7 = pdf_log(x_y0_v7, mu=mu_y0_v7, sigma=sigma_y0_v7)

# pdf
# P_y0_xv1v4v7 = P_y0 * P_y0_xv1 * P_y0_xv4 * P_y0_xv7
# log pdf
P_y0_xv1v4v7 = math.log(P_y0) + P_y0_xv1 + P_y0_xv4 + P_y0_xv7

display(f"P_y0:- {P_y0}, P_y0_xv1:- {P_y0_xv1}, P_y0_xv4:- {P_y0_xv4}, P_y0_xv7:- {P_y0_xv7}")
display(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}")

'P_y0:- 0.9982725143693799, P_y0_xv1:- -2.826767570250749, P_y0_xv4:- -2.6016071266066216, P_y0_xv7:- -1.11936124299125'

'P_y0_xv1v4v7:- -6.549464919303164'

In [67]:
if P_y0_xv1v4v7 > P_y1_xv1v4v7:
    print("test data point is Genuine (0)")
    print(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}, P_y1_xv1v4v7:- {P_y1_xv1v4v7}")
else:
    print("test data point is Fraudulent (1)")
    print(f"P_y0_xv1v4v7:- {P_y0_xv1v4v7}, P_y1_xv1v4v7:- {P_y1_xv1v4v7}")

test data point is Genuine (0)
P_y0_xv1v4v7:- -6.549464919303164, P_y1_xv1v4v7:- -14.737359444360562


## SciKit GaussianNB (TestCase-2) <a class="anchor" id="sci_gnb_tc2"></a>

In [14]:
arr_test = [[-3.0435406239976, 2.2886436183814, 0.325574266158614]]
X_test=pd.DataFrame(arr_test, columns=['V1', 'V4', 'V7'])
y_pred = gnb.predict(X_test)
print(gnb._joint_log_likelihood(X_test))
print(y_pred)

[[ -6.54946846 -14.73568115]]
[0]


## Resources
1) https://www.kaggle.com/code/prashant111/naive-bayes-classifier-in-python
2) https://www.youtube.com/watch?v=IvTCdrx1SHQ
3) https://www.kaggle.com/code/ryanluoli2/a-complete-guide-to-naive-bayes-classifiers
4) https://towardsdatascience.com/how-i-was-using-naive-bayes-incorrectly-till-now-part-1-4ed2a7e2212b
5) https://towardsdatascience.com/how-i-was-using-naive-bayes-incorrectly-till-now-part-2-d31feff72483 
6) http://blog.axelmendoza.fr/naive-bayes/alcohol/pytorch/eda/from-scratch/2020/09/17/Naive-Bayes-Classifier.html
7) https://pub.towardsai.net/gaussian-naive-bayes-explained-and-hands-on-with-scikit-learn-4183b8cb0e4c
8) https://www.youtube.com/watch?v=3I8oX3OUL6I
9) https://www.youtube.com/watch?v=Po6lsacF5pw
10) https://medium.com/@christopherfielding/na%C3%AFve-bayes-classification-for-discrete-and-continuous-variables-cb1103155488
11) https://towardsdatascience.com/why-how-to-use-the-naive-bayes-algorithms-in-a-regulated-industry-with-sklearn-python-code-dbd8304ab2cf

## Questions <a class="anchor" id="questions"></a>

1) <b>Why Gaussian distribution?</b> <br>

Naive Bayes can be extended to real-valued attributes, most commonly by assuming a Gaussian distribution. <br>

This extension of naive Bayes is called Gaussian Naive Bayes. Other densities can be used to model the data, 
but the Gaussian (Normal) distribution is the easiest to work with because you only need to estimate the mean and the standard deviation of each feature per class from your training data.

It is better to check whether the features are approximately normally distributed before relying on this assumption: <br>
1) plot the distribution - sns.kdeplot/histogram <br>
2) Q-Q plot <br>
3) Shapiro–Wilk test <br>
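
As a rough numeric companion to the Q-Q plot in item 2, the sorted sample can be compared with the quantiles of a standard normal; a correlation close to 1 suggests approximate normality. This is a minimal standard-library sketch on synthetic data, not on the notebook's dataset. `qq_correlation` is a hypothetical helper, and the plotting positions (i + 0.5)/n are one common convention among several.

```python
import math
import random
from statistics import NormalDist

def qq_correlation(sample):
    """Pearson correlation between sorted data and standard-normal quantiles."""
    n = len(sample)
    data = sorted(sample)
    # Theoretical quantiles at plotting positions (i + 0.5) / n
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_d = sum(data) / n
    mean_t = sum(theo) / n
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(data, theo))
    var_d = sum((d - mean_d) ** 2 for d in data)
    var_t = sum((t - mean_t) ** 2 for t in theo)
    return cov / math.sqrt(var_d * var_t)

rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

print("normal-like:", round(qq_correlation(normal_sample), 4))
print("skewed     :", round(qq_correlation(skewed_sample), 4))
```

For Gaussian Naive Bayes the check is most meaningful when run per feature and per class, since the model fits a separate Gaussian to each such pair.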

2) <b>Features Independence?</b> <br>

Naive Bayes assumes that the features are conditionally independent given the class. In practice, it is almost impossible to get a set of predictors that are entirely independent.
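
One hedged way to see how far a dataset is from this assumption is to look at feature correlations within a single class. The sketch below uses synthetic data and a hand-written Pearson correlation (`pearson` is a hypothetical helper). On the notebook's data the analogous check would presumably be a per-class `df.corr()`, which is not run here.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(1)
# Synthetic features for one class: f2 depends on f1, f3 does not.
f1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
f2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in f1]
f3 = [rng.gauss(0.0, 1.0) for _ in range(5000)]

print("corr(f1, f2):", round(pearson(f1, f2), 3))  # strongly correlated pair
print("corr(f1, f3):", round(pearson(f1, f3), 3))  # close to zero
```

Correlation only captures linear dependence, so a value near zero is necessary but not sufficient for independence.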

3) <b>Zero Probability?</b> <br>

If a feature value never occurs together with a particular class in the training data, its estimated conditional probability is zero. Because the likelihoods are multiplied, the posterior for that class then becomes zero regardless of the other features. 
This problem is known as the Zero Probability/Frequency Problem. It affects Naive Bayes on categorical (count-based) features; the usual fix is additive (Laplace) smoothing. <br>

$
\begin{align}
& P(\mathrm{cloudy} \mid \mathrm{rain}) = \frac{\text{no. of points where day is cloudy and rain is yes}}{\text{no. of points where rain is yes}} \\ 
& P_{\alpha}(\mathrm{cloudy} \mid \mathrm{rain}) = \frac{\text{no. of points where day is cloudy and rain is yes} + \alpha}{\text{no. of points where rain is yes} + c\alpha} \\ 
\end{align}
$
<br>
$\alpha$ is the smoothing hyperparameter <br>
$c$ is the number of distinct values the feature can take <br>
$\alpha$ is small -> overfitting <br>
$\alpha$ is big -> underfitting
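
The smoothed estimate can be sketched as follows. The weather records are invented for illustration, `smoothed_conditional` is a hypothetical helper, and $c$ is taken here to be the number of distinct values of the feature, which is an assumption of this sketch.

```python
from collections import Counter

def smoothed_conditional(records, feature_value, label, feature_values, alpha=1.0):
    """Estimate P(feature = feature_value | class = label) with additive smoothing.

    records        : list of (feature, label) pairs
    feature_values : all values the feature can take (its size is c)
    alpha          : smoothing strength; alpha = 0 gives the raw frequency
    """
    pair_counts = Counter(records)
    label_count = sum(1 for _, y in records if y == label)
    c = len(feature_values)
    return (pair_counts[(feature_value, label)] + alpha) / (label_count + c * alpha)

# Invented toy data: (sky, rain). "cloudy" never occurs with rain = "no".
records = [
    ("cloudy", "yes"), ("cloudy", "yes"), ("sunny", "yes"),
    ("sunny", "no"), ("sunny", "no"), ("windy", "no"),
]
sky_values = ["cloudy", "sunny", "windy"]

print(smoothed_conditional(records, "cloudy", "no", sky_values, alpha=0.0))  # 0.0
print(smoothed_conditional(records, "cloudy", "no", sky_values, alpha=1.0))  # (0 + 1) / (3 + 3)
```

With $\alpha > 0$ the unseen combination gets a small non-zero probability, and the smoothed probabilities over all feature values still sum to 1 for each class.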