## Table of Contents:
* [Multinomial Naive Bayes](#multinomial_naive_bayes)
* [Data load ~ 1](#data_load_1)
* [Math Explanation ~ 1](#math_expl_1)
* [SciKit MultinomialNB ~ 1](#sci_mnb_1)
* [Data load ~ 2](#data_load_2)
* [SciKit MultinomialNB ~ 2](#sci_mnb_2)
* [Questions](#questions)

In [1]:
import pandas as pd
import traceback
import numpy as np
import string

import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelBinarizer
from sklearn.naive_bayes import MultinomialNB

## Multinomial Naive Bayes <a class="anchor" id="multinomial_naive_bayes"></a>

<!-- 1) A multinomial distribution is useful to model feature vectors where each value represents, for example, the number of occurrences of a term or its relative frequency. 
If the feature vector $X$ has n features and each of the feature $x_{i}$ is the count (i.e. frequency) . can assume k different values with probability pk, then: <br>

$
\begin{align}
& \hat{y} = \underset{k \in {1, .., K}}{\mathrm{arg\,max}} P(y_k|X) \\
& = \underset{k \in {1, .., K}}{\mathrm{arg\,max}} P(y_k) \prod_{i=1}^{d} p(x_i | y_k) \\
\end{align}
$
<br>
$
\begin{align}
  & P(X \mid y_{k}) = \frac{(\sum_{i=1}^{d}x_{i})!}{\prod_{i=1}^{d}x_{i}!} \prod_{i=1}^{d} p(x_i | y_k)\\
  & log(P(X \mid y_{k})) = \sum_{i=1}^{d} x_{i} * logp(x_{i} \mid y_{k})
\end{align}
$
<br>
Now, we can say, <br>
$
\begin{align}
& \hat{y} = \underset{k \in {1, .., K}}{\mathrm{arg\,max}} \mid log(P(y_k)) + \sum_{i=1}^{d} x_{i} * logp(x_{i} \mid y_{k}) \mid
\end{align}
$ -->
$
\begin{align}
& \hat{y} = \underset{k \in \{1, .., K\}}{\mathrm{arg\,max}} P(y_k) \prod_{i=1}^{d} P(x_i | y_k)^{x_i} \\
& P(x_{i}\mid y_{k}) = \frac{(N_{ki} + \alpha)}{(N_{k} + \alpha d)}
\end{align}
$
<br><br>
where $N_{ki}$ is the <b>sum</b> of the values (counts) of feature $x_{i}$ over all training samples with $y=k$, <br>
$N_{k} = \sum_{i=1}^{d} N_{ki}$ is the total of all feature values for class $k$, <br>
$\alpha$ is the smoothing hyperparameter that keeps a feature unseen in a class from getting zero probability, and <br>
$d$ is the number of features.<br>
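
As a rough illustration of additive smoothing in the multinomial setting this notebook works through (feature totals per class), the sketch below computes the smoothed probabilities for a single class. The function name `smoothed_likelihoods` and the toy totals are assumptions made for this example only; it is not meant to show how scikit-learn implements the estimator internally.

```python
def smoothed_likelihoods(feature_totals, alpha=1.0):
    """Return (feature_total + alpha) / (class_total + alpha * n_features).

    feature_totals: summed value of each feature over the training
    samples of one class (hypothetical input for this sketch).
    """
    n_features = len(feature_totals)
    class_total = sum(feature_totals)
    return [
        (total + alpha) / (class_total + alpha * n_features)
        for total in feature_totals
    ]

# Toy totals for one class: feature 0 seen 3 times, feature 1 once, feature 2 never.
toy_totals = [3, 1, 0]

unsmoothed = smoothed_likelihoods(toy_totals, alpha=0.0)
smoothed = smoothed_likelihoods(toy_totals, alpha=1.0)

print(unsmoothed)  # the unseen feature gets probability 0
print(smoothed)    # with alpha=1 every feature keeps some probability mass
```

With `alpha=0` a single unseen feature would zero out the whole product for that class; any positive `alpha` avoids this.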

In [2]:
def get_conf():
    try:
        conf = {
            "data1_fl_path": "../DataSets/iris.csv",
            "data2_fl_path": "../DataSets/BBCNews.csv"
        }       
        return conf
    except Exception as e:
        raise e

***
<b>IRIS DATA</b>
***

## Data load <a class="anchor" id="data_load_1"></a>
<b>Data1:</b> <br>
https://www.kaggle.com/datasets/saurabh00007/iriscsv <br>
-- selected two features ['sepal.length', 'sepal.width'] <br>
-- target is 'variety' (renamed to 'Class') ~ Setosa, Versicolor, Virginica <br>

In [3]:
def load_iris(conf):
    try:
        df = pd.read_csv(conf["data1_fl_path"])
        df = df[['sepal.length', 'sepal.width', 'variety']]
        df.rename({'variety': 'Class'}, axis=1, inplace=True)
        return df
    except Exception as e:
        raise e

In [4]:
def data_explor():
    try:
        conf = get_conf()
        iris_df = load_iris(conf)
        display(iris_df.head())
        
        display(iris_df['Class'].value_counts())
        
        display(iris_df.describe().T)
        
        return iris_df
    except Exception as e:
        traceback.print_exc()
        
iris_df = data_explor()

   sepal.length  sepal.width   Class
0           5.1          3.5  Setosa
1           4.9          3.0  Setosa
2           4.7          3.2  Setosa
3           4.6          3.1  Setosa
4           5.0          3.6  Setosa


Setosa        50
Versicolor    50
Virginica     50
Name: Class, dtype: int64

              count      mean       std  min  25%  50%  75%  max
sepal.length  150.0  5.843333  0.828066  4.3  5.1  5.8  6.4  7.9
sepal.width   150.0  3.057333  0.435866  2.0  2.8  3.0  3.3  4.4


## Math Explanation <a class="anchor" id="math_expl_1"></a>

In [5]:
sepal_length_sum = iris_df.groupby('Class')['sepal.length'].sum()
sepal_width_sum = iris_df.groupby('Class')['sepal.width'].sum()

sepal_sum_df = pd.concat([sepal_length_sum, sepal_width_sum], axis=1)
display(sepal_sum_df)

            sepal.length  sepal.width
Class
Setosa             250.3        171.4
Versicolor         296.8        138.5
Virginica          329.4        148.7


<table style="float:left">
    <tr>
         <td>
            <table>
                <tr>
                    <td colspan=3> Frequency Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td>  </td>
                    <td>  </td>
                    <td> 50 </td>
                    <td> 50 </td>
                    <td> 50 </td>
                    <td> 150 </td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Prior Probability Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td>  </td>
                    <td>  </td>
                    <td> 50/150 </td>
                    <td> 50/150 </td>
                    <td> 50/150 </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Prior Probability Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td>  </td>
                    <td>  </td>
                    <td> 0.333334 </td>
                    <td> 0.333334 </td>
                    <td> 0.333334 </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Log Prior Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td>  </td>
                    <td>  </td>
                    <td> -1.0986102887 </td>
                    <td> -1.0986102887 </td>
                    <td> -1.0986102887 </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
    </tr>    
</table>

<table style="float:left">
    <tr>
         <td>
            <table>
                <tr>
                    <td colspan=3> Frequency Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td> S-length </td>
                    <td> </td>
                    <td> 250.3 </td>
                    <td> 296.8 </td>
                    <td> 329.4 </td>
                </tr>
                <tr>
                    <td> S-width </td>
                    <td> </td>
                    <td> 171.4 </td>
                    <td> 138.5 </td>
                    <td> 148.7 </td>
                </tr>
                <tr>
                    <td> Total </td>
                    <td> </td>
                    <td> 421.7 </td>
                    <td> 435.3 </td>
                    <td> 478.1</td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Likelihood Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td> S-length </td>
                    <td> </td>
                    <td> 250.3/421.7 </td>
                    <td> 296.8/435.3 </td>
                    <td> 329.4/478.1 </td>
                </tr>
                <tr>
                    <td> S-width </td>
                    <td> </td>
                    <td> 171.4/421.7 </td>
                    <td> 138.5/435.3 </td>
                    <td> 148.7/478.1 </td>
                </tr>
                <tr>
                    <td> Total </td>
                    <td> </td>
                    <td>  </td>
                    <td>  </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Likelihood Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td> S-length </td>
                    <td> </td>
                    <td> 0.593549 </td>
                    <td> 0.6818286 </td>
                    <td> 0.6889772 </td>
                </tr>
                <tr>
                    <td> S-width </td>
                    <td> </td>
                    <td> 0.40645008 </td>
                    <td> 0.3181714 </td>
                    <td> 0.3110228 </td>
                </tr>
                <tr>
                    <td> Total </td>
                    <td> </td>
                    <td>  </td>
                    <td>  </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
        <td>
            <table>
                <tr>
                    <td colspan=3> Log Likelihood Table </td>
                </tr>
                <tr>
                    <td> Class </td>
                    <td>  </td>
                    <td> Setosa </td>
                    <td> Versicolor </td>
                    <td> Virginica </td>
                    <td> Total </td>
                </tr>
                <tr>
                    <td> S-length </td>
                    <td> </td>
                    <td> -0.52163550726 </td>
                    <td> -0.38297697237 </td>
                    <td> -0.3725470999 </td>
                </tr>
                <tr>
                    <td> S-width </td>
                    <td> </td>
                    <td> -0.90029416196 </td>
                    <td> -1.1451650476 </td>
                    <td> -1.16788905759 </td>
                </tr>
                <tr>
                    <td> Total </td>
                    <td> </td>
                    <td>  </td>
                    <td>  </td>
                    <td>  </td>
                </tr>
            </table>
        </td>
    </tr>    
</table>
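
The likelihood and log-likelihood tables above can be reproduced with a few lines of plain Python. This is only a sketch: the dictionary layout and variable names are assumptions for illustration, and no smoothing is applied, matching the hand calculation.

```python
import math

# Per-class feature sums copied from the frequency table above.
feature_sums = {
    "Setosa":     {"sepal.length": 250.3, "sepal.width": 171.4},
    "Versicolor": {"sepal.length": 296.8, "sepal.width": 138.5},
    "Virginica":  {"sepal.length": 329.4, "sepal.width": 148.7},
}

likelihood = {}
log_likelihood = {}
for cls, sums in feature_sums.items():
    total = sum(sums.values())  # column total, e.g. 421.7 for Setosa
    likelihood[cls] = {feat: value / total for feat, value in sums.items()}
    log_likelihood[cls] = {feat: math.log(p) for feat, p in likelihood[cls].items()}

for cls in feature_sums:
    print(cls, likelihood[cls], log_likelihood[cls])
```

Within each class the two likelihoods sum to 1, because both are divided by the same class total.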

--> <b>Let's take two TEST instances</b> <br>
sepal.length : 4.6 &nbsp; sepal.width : 3.1 &nbsp; Class : ? <br>
sepal.length: 6.3 &nbsp; sepal.width: 3.3  &nbsp; Class : ? <br>

--> <b>TestCase-1</b> <br>
sepal.length : 4.6 &nbsp; sepal.width : 3.1 &nbsp; Class : ? <br>

<b>Let's calculate $ P(y_{Class}|X_{test}) $ for test sample 1 </b>
***

<b>Probability $ P(X|y):- $ </b><br>
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa ) = \\
(P(X_{S-length} | y_{Class} = Setosa ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Setosa ) ** 3.1)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa) = (0.593549 ** 4.6) * (0.40645008 ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Setosa) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Setosa) = 0.333334 * (0.593549 ** 4.6) * (0.40645008 ** 3.1) = 0.0018565311155443827
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor ) = \\
(P(X_{S-length} | y_{Class} = Versicolor ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Versicolor ) ** 3.1)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor) = (0.6818286 ** 4.6) * (0.3181714 ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Versicolor) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Versicolor) = 0.333334 * (0.6818286 ** 4.6) * (0.3181714 ** 3.1) = 0.0016445047409222961
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica ) = \\
(P(X_{S-length} | y_{Class} = Virginica ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Virginica ) ** 3.1)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica) = (0.6889772 ** 4.6) * (0.3110228 ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Virginica) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Virginica) = 0.333334 * (0.6889772 ** 4.6) * (0.3110228 ** 3.1) = 0.00160796957891167
\end{align}
$

***
$
\begin{align}
& evidence = 0.0018565311155443827 + 0.0016445047409222961 + 0.00160796957891167 = 0.005109005435378349 \\
& P(y_{Class|X_{test}} = Setosa) = 0.0018565311155443827/0.005109005435378349 = 0.363384 \\
& P(y_{Class|X_{test}} = Versicolor) = 0.0016445047409222961/0.005109005435378349 = 0.321883 \\
& P(y_{Class|X_{test}} = Virginica) = 0.00160796957891167/0.005109005435378349 = 0.31473240716 \\
\end{align}
$
***
So, the answer is Class = Setosa
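
The probability-space calculation for test sample 1 can be checked with a short script. This is a sketch that reuses the rounded table values from this notebook; the variable names are chosen for illustration only.

```python
# Rounded priors and likelihoods taken from the tables above.
prior = {"Setosa": 0.333334, "Versicolor": 0.333334, "Virginica": 0.333334}
likelihood = {
    "Setosa":     (0.593549, 0.40645008),
    "Versicolor": (0.6818286, 0.3181714),
    "Virginica":  (0.6889772, 0.3110228),
}
x_test = (4.6, 3.1)  # (sepal.length, sepal.width)

# Unnormalized posterior: prior times each likelihood raised to the feature value.
unnormalized = {}
for cls, (p_len, p_wid) in likelihood.items():
    unnormalized[cls] = prior[cls] * (p_len ** x_test[0]) * (p_wid ** x_test[1])

# Dividing by the evidence makes the three values sum to 1.
evidence = sum(unnormalized.values())
posterior = {cls: value / evidence for cls, value in unnormalized.items()}

predicted = max(posterior, key=posterior.get)
print(posterior)
print(predicted)
```

The normalization does not change which class wins; it only rescales the scores so they can be read as probabilities.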

<b>Joint Log-Likelihood $ log(P(y)) + log(P(X|y)):- $ </b><br>
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa ) = \\
(P(X_{S-length} | y_{Class} = Setosa ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Setosa ) ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Setosa ) )= \\
4.6 * log(P(X_{S-length} | y_{Class} = Setosa )) + \\
3.1 * log(P(X_{S-width} | y_{Class} = Setosa ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Setosa)) = (4.6 * -0.52163550726) + (3.1 * -0.90029416196)
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Setosa)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Setosa)) = -1.0986102887 + (4.6 * -0.52163550726) + (3.1 * -0.90029416196) = -6.28904552417
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor ) = \\
(P(X_{S-length} | y_{Class} = Versicolor ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Versicolor ) ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Versicolor ) )= \\
4.6 * log(P(X_{S-length} | y_{Class} = Versicolor )) + \\
3.1 * log(P(X_{S-width} | y_{Class} = Versicolor ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Versicolor)) = (4.6 * -0.38297697237) + (3.1 * -1.1451650476)
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Versicolor)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Versicolor)) = -1.0986102887 + (4.6 * -0.38297697237) + (3.1 * -1.1451650476) = -6.41031600916
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica ) = \\
(P(X_{S-length} | y_{Class} = Virginica ) ** 4.6) * \\
(P(X_{S-width} | y_{Class} = Virginica ) ** 3.1)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Virginica ) )= \\
4.6 * log(P(X_{S-length} | y_{Class} = Virginica )) + \\
3.1 * log(P(X_{S-width} | y_{Class} = Virginica ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Virginica)) = (4.6 * -0.3725470999) + (3.1 * -1.16788905759 )
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Virginica)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Virginica)) = -1.0986102887 + (4.6 * -0.3725470999) + (3.1 * -1.16788905759 ) = -6.43278302677
\end{align}
$

***
So, the answer is Class = Setosa
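
The same decision can be reached in log space, where products become sums. The sketch below reuses the rounded log values from the tables; the names are illustrative, and the final lines are an added illustration (not part of the iris example) of why log space is usually preferred.

```python
import math

# Rounded log prior and log likelihoods from the tables above.
log_prior = -1.0986102887
log_likelihood = {
    "Setosa":     (-0.52163550726, -0.90029416196),
    "Versicolor": (-0.38297697237, -1.1451650476),
    "Virginica":  (-0.3725470999, -1.16788905759),
}
x_test = (4.6, 3.1)  # (sepal.length, sepal.width)

# Score per class: log prior plus feature value times log likelihood.
scores = {
    cls: log_prior + x_test[0] * ll_len + x_test[1] * ll_wid
    for cls, (ll_len, ll_wid) in log_likelihood.items()
}
predicted = max(scores, key=scores.get)
print(scores)
print(predicted)

# A product of many small probabilities underflows to 0.0,
# while the equivalent sum of logs stays finite.
tiny_product = 0.01 ** 400
log_sum = 400 * math.log(0.01)
print(tiny_product, log_sum)
```

Because the logarithm is monotonic, the class with the largest log score is also the class with the largest unnormalized posterior.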


--> <b>TestCase-2</b> <br>
sepal.length: 6.3 &nbsp; sepal.width: 3.3  &nbsp; Class : ? <br>

<b>Probability $ P(X|y):- $ </b><br>
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa ) = \\
(P(X_{S-length} | y_{Class} = Setosa ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Setosa ) ** 3.3)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa) = (0.593549 ** 6.3) * (0.40645008 ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Setosa) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Setosa) = 0.333334 * (0.593549 ** 6.3) * (0.40645008 ** 3.3) = 0.0006388219896140624
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor ) = \\
(P(X_{S-length} | y_{Class} = Versicolor ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Versicolor ) ** 3.3)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor) = (0.6818286 ** 6.3) * (0.3181714 ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Versicolor) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Versicolor) = 0.333334 * (0.6818286 ** 6.3) * (0.3181714 ** 3.3) = 0.0006820484428875236
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica ) = \\
(P(X_{S-length} | y_{Class} = Virginica ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Virginica ) ** 3.3)
\end{align}
$
<br>

$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica) = (0.6889772 ** 6.3) * (0.3110228 ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
P(y_{Class} = Virginica) = 0.333334
\end{align}
$
<br>
So, the unnormalized posterior $ P(y) * P(X|y) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
P(y_{Class|X_{test}} = Virginica) = 0.333334 * (0.6889772 ** 6.3) * (0.3110228 ** 3.3) = 0.0006757476108074129
\end{align}
$

***
$
\begin{align}
& evidence = 0.0006388219896140624 + 0.0006820484428875236 + 0.0006757476108074129 = 0.001996618043308999 \\
& P(y_{Class|X_{test}} = Setosa) = 0.0006388219896140624/0.001996618043308999 = 0.319952 \\
& P(y_{Class|X_{test}} = Versicolor) = 0.0006820484428875236/0.001996618043308999 = 0.3416018 \\
& P(y_{Class|X_{test}} = Virginica) = 0.0006757476108074129/0.001996618043308999 = 0.338446 \\
\end{align}
$
***
So, the answer is Class = Versicolor

<b>Joint Log-Likelihood $ log(P(y)) + log(P(X|y)):- $ </b><br>
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Setosa ) = \\
(P(X_{S-length} | y_{Class} = Setosa ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Setosa ) ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Setosa ) )= \\
6.3 * log(P(X_{S-length} | y_{Class} = Setosa )) + \\
3.3 * log(P(X_{S-width} | y_{Class} = Setosa ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Setosa)) = (6.3 * -0.52163550726) + (3.3 * -0.90029416196)
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Setosa)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Setosa)) = -1.0986102887 + (6.3 * -0.52163550726) + (3.3 * -0.90029416196) = -7.35588471891
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Versicolor ) = \\
(P(X_{S-length} | y_{Class} = Versicolor ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Versicolor ) ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Versicolor ) )= \\
6.3 * log(P(X_{S-length} | y_{Class} = Versicolor )) + \\
3.3 * log(P(X_{S-width} | y_{Class} = Versicolor ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Versicolor)) = (6.3 * -0.38297697237) + (3.3 * -1.1451650476)
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Versicolor)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Versicolor)) = -1.0986102887 + (6.3 * -0.38297697237) + (3.3 * -1.1451650476) = -7.29040987171
\end{align}
$


***
$ 
\begin{align}
P(X_{S-length, S-width}|y_{Class} = Virginica ) = \\
(P(X_{S-length} | y_{Class} = Virginica ) ** 6.3) * \\
(P(X_{S-width} | y_{Class} = Virginica ) ** 3.3)
\end{align}
$
<br>
$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Virginica ) )= \\
6.3 * log(P(X_{S-length} | y_{Class} = Virginica )) + \\
3.3 * log(P(X_{S-width} | y_{Class} = Virginica ))
\end{align}
$
<br>

$ 
\begin{align}
log(P(X_{S-length, S-width}|y_{Class} = Virginica)) = (6.3 * -0.3725470999) + (3.3 * -1.16788905759 )
\end{align}
$
<br>
$ 
\begin{align}
log(P(y_{Class} = Virginica)) = -1.0986102887
\end{align}
$
<br>
So, the log of the unnormalized posterior $ log(P(y)) + log(P(X|y)) $ is [ignoring the denominator $P(X)$] <br>
$
\begin{align}
log(P(y_{Class|X_{test}} = Virginica)) = -1.0986102887 + (6.3 * -0.3725470999) + (3.3 * -1.16788905759 ) = -7.29969090812
\end{align}
$

***
So, the answer is Class = Versicolor


## SciKit MultinomialNB <a class="anchor" id="sci_mnb_1"></a>

In [6]:
# store the feature matrix (X) and response vector (y)
X_train = iris_df[['sepal.length', 'sepal.width']]
y_train = iris_df['Class']

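# alpha close to 0 switches smoothing off in practice, so the fitted values match the hand calculation above
mnb = MultinomialNB(alpha=1.0e-10)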
mnb.fit(X_train, y_train)

print('Sklearn values:')
print('feature class_count_\n',mnb.class_count_)
print('feature class_log_prior_\n',mnb.class_log_prior_)
print('feature feature_count_\n',mnb.feature_count_)
print('feature log-probabilities\n',mnb.feature_log_prob_)
print(mnb.feature_names_in_)

Sklearn values:
feature class_count_
 [50. 50. 50.]
feature class_log_prior_
 [-1.09861229 -1.09861229 -1.09861229]
feature feature_count_
 [[250.3 171.4]
 [296.8 138.5]
 [329.4 148.7]]
feature log-probabilities
 [[-0.52163396 -0.90029415]
 [-0.38297694 -1.14516512]
 [-0.3725471  -1.16788906]]
['sepal.length' 'sepal.width']


In [7]:
arr_test = [[4.6, 3.1]]
X_test=pd.DataFrame(arr_test, columns=['sepal.length', 'sepal.width'])
y_pred = mnb.predict(X_test)
print("X_test\n" ,X_test)


print(y_pred)
print('Sklearn predict_proba\n', mnb.predict_proba(X_test))
print('Sklearn predict_log_proba\n',mnb.predict_log_proba(X_test))
print('Sklearn _joint_log_likelihood\n',mnb._joint_log_likelihood(X_test))

X_test
    sepal.length  sepal.width
0           4.6          3.1
['Setosa']
Sklearn predict_proba
 [[0.36338571 0.32188269 0.3147316 ]]
Sklearn predict_log_proba
 [[-1.01229044 -1.13356812 -1.15603507]]
Sklearn _joint_log_likelihood
          0         1         2
0 -6.28904 -6.410318 -6.432785
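
The printed `predict_log_proba` values are consistent with subtracting the log of the evidence from each joint log-likelihood. The sketch below checks this with a hand-written log-sum-exp; it is an illustration based on the numbers printed above, not a copy of scikit-learn's internal code.

```python
import math

# Joint log-likelihoods printed above for the test instance [4.6, 3.1].
joint_log_likelihood = [-6.28904, -6.410318, -6.432785]

# log-sum-exp with the maximum factored out for numerical stability.
max_jll = max(joint_log_likelihood)
log_evidence = max_jll + math.log(
    sum(math.exp(v - max_jll) for v in joint_log_likelihood)
)

# Normalized log posteriors and posteriors.
log_proba = [v - log_evidence for v in joint_log_likelihood]
proba = [math.exp(v) for v in log_proba]

print(log_proba)
print(proba)
```

Factoring out the maximum keeps every exponent at or below zero, so the sum cannot overflow.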


In [8]:
arr_test = [[6.3, 3.3]]
X_test=pd.DataFrame(arr_test, columns=['sepal.length', 'sepal.width'])
y_pred = mnb.predict(X_test)
print(X_test)


print(y_pred)
print('Sklearn predict_proba\n', mnb.predict_proba(X_test))
print('Sklearn predict_log_proba\n',mnb.predict_log_proba(X_test))
print('Sklearn _joint_log_likelihood\n',mnb._joint_log_likelihood(X_test))

   sepal.length  sepal.width
0           6.3          3.3
['Versicolor']
Sklearn predict_proba
 [[0.31995415 0.34160079 0.33844506]]
Sklearn predict_log_proba
 [[-1.13957757 -1.0741125  -1.08339351]]
Sklearn _joint_log_likelihood
           0         1         2
0 -7.355877 -7.290412 -7.299693


***
<b>BBC DATA</b>
***

## Data load <a class="anchor" id="data_load_2"></a>
https://www.kaggle.com/competitions/learn-ai-bbc/data <br>
-- selected features ['Article'] ~ text of the header and article<br>
-- target is 'Category' ~ tech, business, sport, entertainment, politics <br>

In [9]:
def load_bbc(conf):
    try:
        columns = ['Article', 'Category']
        df = pd.read_csv(conf["data2_fl_path"])
        df = df[columns]
        df.rename({'Category': 'Class'}, axis=1, inplace=True)
        # keep only the first 100 articles; copy so later column assignment does not act on a slice
        return df.head(100).copy()
    except Exception as e:
        raise e

In [10]:
def data_explor():
    try:
        conf = get_conf()
        bbc_df = load_bbc(conf)
        display(bbc_df.head())
        
        count_df=pd.DataFrame()
        
        cls_cnt = bbc_df['Class'].value_counts().to_frame()
        
        count_df = pd.concat([cls_cnt], axis=1)
        display(count_df)
        
        return bbc_df
    except Exception as e:
        traceback.print_exc()
        
bbc_df = data_explor()

                                             Article     Class
0  worldcom ex-boss launches defence lawyers defe...  business
1  german business confidence slides german busin...  business
2  bbc poll indicates economic gloom citizens in ...  business
3     lifestyle governs mobile choice faster bett...      tech
4  enron bosses in $168m payout eighteen former e...  business


               Class
sport             31
business          23
tech              19
politics          14
entertainment     13


In [11]:
def gen_clean_data(df_row):
    try:
        remove_punc = [char for char in df_row if char not in string.punctuation]
        remove_punc = "".join(remove_punc)
        remove_punc = remove_punc.split()
        
        lower_word = [word.lower() for word in remove_punc]
        
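        # build the stop-word set once instead of reloading the list for every word
        stop_words = set(stopwords.words('english'))
        clean_word = [word for word in lower_word if word not in stop_words]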
        
        join_word = " ".join(clean_word)
        
        return join_word       
    except Exception as e:
        raise e

In [12]:
def convert_to_BOW(corpus):
    '''
    Convert the text into BOW using CountVectorizer
    '''
    try:
        # Given text return the BOW representation of the words
        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform(corpus)
        # Save vectorizer.vocabulary_
        # pickle.dump(vectorizer.vocabulary_,open("vocab.pkl","wb"))
        return X.toarray(), vectorizer
    except Exception as e:
        raise e

In [13]:
bbc_df['Article'] = bbc_df.iloc[:,0].apply(gen_clean_data)
# display(df.head(5))
display(bbc_df.head(5))
X, vectorizer = convert_to_BOW(bbc_df["Article"].values)
y = bbc_df["Class"].values

                                             Article     Class
0  worldcom exboss launches defence lawyers defen...  business
1  german business confidence slides german busin...  business
2  bbc poll indicates economic gloom citizens maj...  business
3  lifestyle governs mobile choice faster better ...      tech
4  enron bosses 168m payout eighteen former enron...  business


In [14]:
display(X.shape) # (number of articles, vocabulary size): 100 articles x 6369 distinct words
display(X[0:5]) # each article becomes a vector holding the count of every vocabulary word
counts = pd.DataFrame(X[0:5], index=['Article_1','Article_2','Article_3','Article_4','Article_5'], columns=vectorizer.get_feature_names_out()) #only first 5 Articles
display(counts[['worldcom','german', 'bbc', 'business']]) # display frequency for selected words

(100, 6369)

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

           worldcom  german  bbc  business
Article_1         9       0    0         1
Article_2         0       4    0         3
Article_3         0       0    5         0
Article_4         0       0    1         0
Article_5         0       0    0         0
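
To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. It assumes a simplified tokenization (lowercase, split on whitespace), which is not identical to what `CountVectorizer` does by default; the toy corpus and variable names are made up for illustration.

```python
from collections import Counter

corpus = [
    "german business confidence slides",
    "bbc poll indicates economic gloom",
    "business business bbc",
]

# Simplified tokenization: lowercase and split on whitespace (assumption for this sketch).
tokenized = [doc.lower().split() for doc in corpus]

# Vocabulary: sorted unique tokens, each mapped to a column index.
vocabulary = sorted({token for doc in tokenized for token in doc})
column = {token: idx for idx, token in enumerate(vocabulary)}

# Document-term matrix: one row per document, one count per vocabulary word.
bow = []
for doc in tokenized:
    counts = Counter(doc)
    bow.append([counts.get(token, 0) for token in vocabulary])

print(vocabulary)
for row in bow:
    print(row)
```

Each row has the same length as the vocabulary, which is why the real matrix above has 6369 columns.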


## SciKit MultinomialNB <a class="anchor" id="sci_mnb_2"></a>

In [15]:
X_train = X
y_train = y

print(X_train.shape, y_train.shape)

# instantiate the model
mnb = MultinomialNB()

# fit the model
mnb.fit(X_train, y_train)

print('Sklearn values:')
print('feature log-probabilities',mnb.feature_log_prob_)

(100, 6369) (100,)
Sklearn values:
feature log-probabilities [[-7.47231123 -9.26407069 -9.26407069 ... -9.26407069 -9.26407069
  -9.26407069]
 [-7.01806602 -9.09750756 -9.09750756 ... -8.40436038 -9.09750756
  -9.09750756]
 [-7.37525578 -8.47386807 -9.16701525 ... -9.16701525 -9.16701525
  -9.16701525]
 [-8.00903069 -9.39532505 -7.78588713 ... -9.39532505 -8.70217787
  -8.70217787]
 [-7.18496605 -9.38219063 -9.38219063 ... -9.38219063 -9.38219063
  -9.38219063]]


In [16]:
arr_test = [['bbc poll indicates economic gloom citizens']]
X_test_df=pd.DataFrame(arr_test, columns=['Article'])
# Convert into BOW
#loaded_vec = CountVectorizer(decode_error="replace",vocabulary=pickle.load(open("vocab.pkl", "rb")))
# loaded_vec = CountVectorizer(decode_error="replace",vocabulary=vectorizer.vocabulary_)
X_test = vectorizer.transform(X_test_df.Article)
y_pred = mnb.predict(X_test)


print(y_pred)
print('Sklearn predict_proba\n', mnb.predict_proba(X_test))
print('Sklearn predict_log_proba\n',mnb.predict_log_proba(X_test))
print('Sklearn _joint_log_likelihood\n',mnb._joint_log_likelihood(X_test))

['business']
Sklearn predict_proba
 [[9.93464868e-01 8.51245278e-04 2.21508303e-03 2.26641201e-03
  1.20239175e-03]]
Sklearn predict_log_proba
 [[-6.55657953e-03 -7.06881025e+00 -6.11246539e+00 -6.08955731e+00
  -6.72344258e+00]]
Sklearn _joint_log_likelihood
 [[-48.46440025 -55.52665392 -54.57030907 -54.54740099 -55.18128626]]


In [17]:
arr_test = [['lifestyle governs mobile choice faster better']]
X_test_df=pd.DataFrame(arr_test, columns=['Article'])
# Convert into BOW
#loaded_vec = CountVectorizer(decode_error="replace",vocabulary=pickle.load(open("vocab.pkl", "rb")))
# loaded_vec = CountVectorizer(decode_error="replace",vocabulary=vectorizer.vocabulary_)
X_test = vectorizer.transform(X_test_df.Article)
y_pred = mnb.predict(X_test)


print(y_pred)
print('Sklearn predict_proba\n', mnb.predict_proba(X_test))
print('Sklearn predict_log_proba\n',mnb.predict_log_proba(X_test))
print('Sklearn _joint_log_likelihood\n',mnb._joint_log_likelihood(X_test))

['tech']
Sklearn predict_proba
 [[0.00998176 0.00255444 0.01087705 0.00238039 0.97420636]]
Sklearn predict_log_proba
 [[-4.6069962  -5.96992175 -4.52110041 -6.04049094 -0.02613212]]
Sklearn _joint_log_likelihood
 [[-54.16372838 -55.52665392 -54.07783258 -55.59722311 -49.5828643 ]]
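
For intuition about what happens behind these text predictions, the sketch below trains a tiny multinomial Naive Bayes classifier by hand with `alpha = 1`. The four training sentences, the labels, and the helper name `joint_log_likelihood` are all invented for this example; it is a simplified stand-in, not the scikit-learn implementation.

```python
import math
from collections import Counter, defaultdict

# Toy training set (hypothetical data for illustration).
train = [
    ("shares profit market economy", "business"),
    ("market growth profit bank", "business"),
    ("mobile phone software broadband", "tech"),
    ("software users mobile internet", "tech"),
]

alpha = 1.0
class_docs = Counter(label for _, label in train)   # documents per class
word_counts = defaultdict(Counter)                   # word totals per class
for text, label in train:
    word_counts[label].update(text.split())

vocabulary = sorted({w for counts in word_counts.values() for w in counts})

def joint_log_likelihood(text):
    """Return log prior + sum of smoothed log word probabilities per class."""
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / len(train))
        for word in text.split():
            if word not in vocabulary:
                continue  # words outside the training vocabulary are ignored
            p = (word_counts[label][word] + alpha) / (total + alpha * len(vocabulary))
            score += math.log(p)
        scores[label] = score
    return scores

scores = joint_log_likelihood("mobile software market")
predicted = max(scores, key=scores.get)
print(scores)
print(predicted)
```

The word "market" never appears in a tech document, yet smoothing keeps its probability above zero, so the two tech words can still decide the outcome.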


## Resources
1) https://towardsdatascience.com/how-i-was-using-naive-bayes-incorrectly-till-now-part-1-4ed2a7e2212b
2) https://towardsdatascience.com/multinomial-na%C3%AFve-bayes-for-documents-classification-and-natural-language-processing-nlp-e08cc848ce6
3) https://towardsdatascience.com/multi-class-text-classification-with-scikit-learn-12f1e60e0a9f
4) https://www.ritchieng.com/machine-learning-multinomial-naive-bayes-vectorization/
5) https://developer.nvidia.com/blog/faster-text-classification-with-naive-bayes-and-gpus/
6) https://www.youtube.com/watch?v=xrB6JzTRzmA
7) https://www.youtube.com/watch?v=oq68P8Kv7nE
8) https://www.youtube.com/watch?v=IvTCdrx1SHQ
9) https://taylanbil.github.io/multinbvsbinomnb
10) https://towardsdatascience.com/why-how-to-use-the-naive-bayes-algorithms-in-a-regulated-industry-with-sklearn-python-code-dbd8304ab2cf

In [None]:
# import nltk
# import ssl

# try:
#     _create_unverified_https_context = ssl._create_unverified_context
# except AttributeError:
#     pass
# else:
#     ssl._create_default_https_context = _create_unverified_https_context
    
# nltk.download("all")