# Bayes Theorem

<img src="imgs/probability.png" width=150 align = "right">
<br><br>


1. Bayes theorem is one of the most important concepts in Probability Theory.
1. Bayes theorem is widely used for text classification.
1. Naive Bayes, which is built on Bayes theorem, was among the earliest algorithms used for email spam filtering.

<img src="imgs/bayes.png">

# Derivation of Bayes theorem
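
A short derivation, sketched from the standard definition of conditional probability (this is textbook material, not a transcription of the figure above). It assumes $P(A) > 0$ and $P(B) > 0$.

By definition of conditional probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)}$$

Both expressions can be solved for the joint probability:

$$P(A \cap B) = P(A|B) \, P(B) = P(B|A) \, P(A)$$

Dividing by $P(B)$ gives Bayes theorem:

$$P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}$$

When $P(B)$ is not given directly, it can be expanded with the law of total probability:

$$P(B) = P(B|A) \, P(A) + P(B|A^c) \, P(A^c)$$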

In [112]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Bayes Theorem Question

**Q- The probability that it rains on Saturday is 25%. If it rains on Saturday, the probability that it rains on Sunday is 50%. If it does not rain on Saturday, the probability that it rains on Sunday is 25%. 
Given that it rained on Sunday, what is the probability that it rained on Saturday?**

Let A = rain on Saturday. B = rain on Sunday.

Given:
1. P(A) = 25%
1. P(B|A) = 50%
1. P(B|$A^c$) = 25%

Find:
 P(A|B)

`P(A|B) = P(B|A).P(A) / P(B)`
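
The question can be worked through numerically. The sketch below plugs the three given probabilities into the formula above, expanding P(B) with the law of total probability. The variable names are illustrative only.

```python
# Worked solution of the rain question using Bayes theorem.
p_a = 0.25              # P(A): rain on Saturday
p_b_given_a = 0.50      # P(B|A): rain on Sunday given rain on Saturday
p_b_given_not_a = 0.25  # P(B|A^c): rain on Sunday given no rain on Saturday

# Law of total probability: P(B) = P(B|A).P(A) + P(B|A^c).P(A^c)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes theorem: P(A|B) = P(B|A).P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print("P(B)   =", p_b)          # 0.3125
print("P(A|B) =", p_a_given_b)  # 0.4
```

So, given rain on Sunday, the probability that it also rained on Saturday is 40%.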


# Naive Bayes Algorithm

# Spam or Non Spam Emails?

<br>
<img src="imgs/email.png" width=200>

**Problem Statement**: Given the content X of an email, predict whether the email is spam or non-spam.

**Solution**:

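Find the posterior probability of both classes, Y=1 (spam) and Y=0 (non-spam):

$$ P(Y=1|X)$$ and $$ P(Y=0|X)$$

then predict the class with the larger posterior:

$$ \arg\max_i \; P(Y=i|X)$$

How to find $P(Y=i|X)$? Apply Bayes theorem:

$$P(Y=i|X) = \frac{P(X|Y=i) \, P(Y=i)}{P(X)}$$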

**Prior** :

$$P(Y=1) = \frac{\text{number of spam emails}}{\text{total number of emails}}$$

X = `get unlimited mobile data at 95% discounted price`

$P(X|Y = 1)$ : How often do these words occur in spam emails? Count the frequency of each word in the spam emails.

$P(X|Y = 0)$ : How often do these words occur in non-spam emails? Count the frequency of each word in the non-spam emails.

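**The marginal likelihood $P(X)$ only normalizes the posterior so that the class probabilities sum to 1. It is the same for every class, so it does not change which class wins the argmax.**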

# Naive Bayes Classification

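**Prior Formula**

$$P(Y=c) = \frac{count(Y=c)}{N}$$

where $N$ is the total number of training examples.

**Likelihood Formula**

Under the naive assumption that the features are conditionally independent given the class:

$$P(X|Y=c) = \prod_{i=1}^{n} P(x_i|Y=c), \qquad P(x_i|Y=c) = \frac{count(x_i, Y=c)}{count(Y=c)}$$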

# Play Golf Dataset

<table>
    <tr>
        <td>        
            <img src="imgs/golf.png" width=500 align=left>
        </td>
        <td>
            <img src="https://media.giphy.com/media/l2JdTikgn9Y6HPNmg/giphy.gif" width=300 align=right>
        </td>
    </tr>
</table>


### Test Data
`today = (Sunny, Hot, Normal, False)`

# Naive Bayes Code - Scratch

In [1]:
import pandas as pd
import numpy as np
np.set_printoptions(legacy='1.25')
import json
import pprint

In [114]:
golf = pd.read_csv("golf.csv")

In [115]:
golf.head(16)

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [116]:
golf.shape[0]

14

In [117]:

class_examples = (golf['Play'] == 'yes').sum()
print(class_examples)

9


In [118]:
class_examples = (golf['Play'] == 'no').sum()
print(class_examples)

5


In [119]:
def prior_prob(golf, label):
    # label is the class value in the 'Play' column: 'yes' or 'no'
    total_examples = golf.shape[0]
    class_examples = (golf['Play'] == label).sum()
    
    return class_examples/total_examples

In [120]:
print("Total Prior Probability of Yes:- ",prior_prob(golf,'yes'))
print("Total Prior Probability of No :- ",prior_prob(golf,'no'))


Total Prior Probability of Yes:-  0.6428571428571429
Total Prior Probability of No :-  0.35714285714285715


In [121]:
# Dictionary to store prior probability
PRIOR = {
    'yes' : prior_prob(golf, 'yes'),
    'no' : prior_prob(golf, 'no')
}
print(PRIOR)

{'yes': 0.6428571428571429, 'no': 0.35714285714285715}


In [122]:
#filtered_data = golf[golf['Play'] == 'yes']
#print(filtered_data)
#filtered_data.shape[0]

In [123]:
#filtered_data = golf[golf['Play'] == 'no']
#print(filtered_data)
#filtered_data.shape[0]

In [124]:
## Naive Bayes posterior: P(Y=1 | X) ∝ {Π P(Xi|Y=1)} . P(Y=1)
## cond_prob estimates one factor of that product: P(Xi = feature_value | Y = label)
## Features: Outlook, Temperature, Humidity, Windy
## Feature values: Outlook(sunny, overcast, rainy), Temperature(hot, mild, cool), Humidity(high, normal), Windy(False, True)

def cond_prob(golf, feature, feature_value, label):
    filtered_data = golf[golf['Play'] == label]
    numerator = np.sum(filtered_data[feature] == feature_value)
    denominator = filtered_data.shape[0]
   
    return numerator/denominator

In [125]:
filtered_data = golf[golf['Play'] == 'yes']
numerator = np.sum(filtered_data['Windy'] == False)
denominator = filtered_data.shape[0]
print(filtered_data)
print("numerator :-",numerator)
print("denominator :-", denominator)
numerator/denominator

     Outlook Temperature Humidity  Windy Play
2   overcast         hot     high  False  yes
3      rainy        mild     high  False  yes
4      rainy        cool   normal  False  yes
6   overcast        cool   normal   True  yes
8      sunny        cool   normal  False  yes
9      rainy        mild   normal  False  yes
10     sunny        mild   normal   True  yes
11  overcast        mild     high   True  yes
12  overcast         hot   normal  False  yes
numerator :- 6
denominator :- 9


0.6666666666666666

In [126]:
cond_prob(golf,'Outlook', 'sunny', 'yes' )

0.2222222222222222

In [127]:
cond_prob(golf,'Windy', False, 'yes' )

0.6666666666666666

In [128]:
list(golf.columns)

['Outlook', 'Temperature', 'Humidity', 'Windy', 'Play']

In [129]:
list(golf.columns)[:-1]
features = list(golf.columns)[:-1]
print(features)

['Outlook', 'Temperature', 'Humidity', 'Windy']


In [130]:
features = list(golf.columns)[:-1]
COND_PROB = {} # dictionary to store conditional probability cases.


for label in golf['Play'].unique():
    COND_PROB[label] = {}         # create dictionary for label(yes,no)
    for feature in features:
        COND_PROB[label][feature] = {}  # create dictionary for labels with features
        
        feature_values = golf[feature].unique()
        
        for fea_value in feature_values:
            # no, Outlook, sunny
            prob = round(cond_prob(golf, feature, fea_value, label), 2)
            COND_PROB[label][feature][fea_value] = prob
            print(label, feature, fea_value, prob)
    print()

no Outlook sunny 0.6
no Outlook overcast 0.0
no Outlook rainy 0.4
no Temperature hot 0.4
no Temperature mild 0.4
no Temperature cool 0.2
no Humidity high 0.8
no Humidity normal 0.2
no Windy False 0.4
no Windy True 0.6

yes Outlook sunny 0.22
yes Outlook overcast 0.44
yes Outlook rainy 0.33
yes Temperature hot 0.22
yes Temperature mild 0.44
yes Temperature cool 0.33
yes Humidity high 0.33
yes Humidity normal 0.67
yes Windy False 0.67
yes Windy True 0.33



In [131]:
COND_PROB

{'no': {'Outlook': {'sunny': 0.6, 'overcast': 0.0, 'rainy': 0.4},
  'Temperature': {'hot': 0.4, 'mild': 0.4, 'cool': 0.2},
  'Humidity': {'high': 0.8, 'normal': 0.2},
  'Windy': {False: 0.4, True: 0.6}},
 'yes': {'Outlook': {'sunny': 0.22, 'overcast': 0.44, 'rainy': 0.33},
  'Temperature': {'hot': 0.22, 'mild': 0.44, 'cool': 0.33},
  'Humidity': {'high': 0.33, 'normal': 0.67},
  'Windy': {False: 0.67, True: 0.33}}}

In [132]:
pprint.pprint(COND_PROB)

{'no': {'Humidity': {'high': 0.8, 'normal': 0.2},
        'Outlook': {'overcast': 0.0, 'rainy': 0.4, 'sunny': 0.6},
        'Temperature': {'cool': 0.2, 'hot': 0.4, 'mild': 0.4},
        'Windy': {False: 0.4, True: 0.6}},
 'yes': {'Humidity': {'high': 0.33, 'normal': 0.67},
         'Outlook': {'overcast': 0.44, 'rainy': 0.33, 'sunny': 0.22},
         'Temperature': {'cool': 0.33, 'hot': 0.22, 'mild': 0.44},
         'Windy': {False: 0.67, True: 0.33}}}


# Prediction

In [133]:
X_test = ["sunny", "hot", "normal", False]

In [134]:
features

['Outlook', 'Temperature', 'Humidity', 'Windy']

In [135]:
for label in golf['Play'].unique():
    
    prior = PRIOR[label]
    likelihood = 1.0
    
    
    for i in range(len(features)):
        feature = features[i]
        fea_value = X_test[i]
        
        likelihood *= COND_PROB[label][feature][fea_value]
    
    posterior = likelihood*prior
    
    print(label, posterior)

no 0.006857142857142858
yes 0.013967202857142858


In [136]:
0.006/ (0.006 + 0.013)

0.31578947368421056

In [137]:
0.01396/ (0.006 + 0.013)

0.7347368421052631
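
The two cells above plug in hand-truncated values, which is why their results (about 0.316 and 0.735) do not sum to 1. A minimal sketch of the same normalization using the full scores printed by the prediction loop is shown below. Note that the "yes" score itself was computed from conditional probabilities rounded to two decimals, so it is still an approximation.

```python
# Unnormalized posteriors, copied from the output of the prediction loop.
scores = {"no": 0.006857142857142858, "yes": 0.013967202857142858}

# The marginal likelihood P(X) is the sum of the unnormalized scores.
evidence = sum(scores.values())

# Dividing by P(X) turns the scores into probabilities that sum to 1.
posterior = {label: score / evidence for label, score in scores.items()}

prediction = max(posterior, key=posterior.get)
print(posterior)
print("prediction:", prediction)
```

The predicted class is the same with or without normalization, because dividing every score by the same constant does not change their order.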

# Implementation - Naive Bayes Sklearn


In [138]:
golf = pd.read_csv('golf.csv')

In [139]:
golf

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [140]:
from sklearn.preprocessing import LabelEncoder

In [141]:
le1 = LabelEncoder()
golf['Outlook'] = le1.fit_transform(golf['Outlook'])

In [142]:
golf

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,2,hot,high,False,no
1,2,hot,high,True,no
2,0,hot,high,False,yes
3,1,mild,high,False,yes
4,1,cool,normal,False,yes
5,1,cool,normal,True,no
6,0,cool,normal,True,yes
7,2,mild,high,False,no
8,2,cool,normal,False,yes
9,1,mild,normal,False,yes


In [143]:
le2 = LabelEncoder()
golf['Temperature'] = le2.fit_transform(golf['Temperature'])

In [144]:
le3 = LabelEncoder()
golf['Humidity'] = le3.fit_transform(golf['Humidity'])

In [145]:
le4 = LabelEncoder()
golf['Windy'] = le4.fit_transform(golf['Windy'])

In [146]:
le5 = LabelEncoder()
golf['Play'] = le5.fit_transform(golf['Play'])

In [147]:
golf

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play
0,2,1,0,0,0
1,2,1,0,1,0
2,0,1,0,0,1
3,1,2,0,0,1
4,1,0,1,0,1
5,1,0,1,1,0
6,0,0,1,1,1
7,2,2,0,0,0
8,2,0,1,0,1
9,1,2,1,0,1
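
The per-column `LabelEncoder` cells above work, but scikit-learn documents `LabelEncoder` as a tool for target labels and provides `OrdinalEncoder` for feature columns. As a hedged alternative, the sketch below encodes all four features in one step. It uses only the ten rows displayed above, typed in by hand so that it runs on its own. The names `demo`, `encoder` and `X_encoded` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The ten rows shown in the table output above (not the full 14-row file).
rows = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
]
columns = ["Outlook", "Temperature", "Humidity", "Windy", "Play"]
demo = pd.DataFrame(rows, columns=columns)

feature_cols = columns[:-1]

# OrdinalEncoder learns one sorted list of categories per feature column.
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(demo[feature_cols])

print(encoder.categories_)  # learned categories, one array per column
print(X_encoded[0])         # first row: sunny, hot, high, False
```

A fitted `OrdinalEncoder` can also transform a new test row in a single call, instead of calling four separate encoders.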


In [148]:
X = golf.iloc[:, :-1]
y = golf.iloc[:, -1]



In [149]:
from sklearn.naive_bayes import CategoricalNB

In [150]:
model = CategoricalNB()

In [151]:
model.fit(X, y)

CategoricalNB(alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None, min_categories=None)


In [152]:
X_test = ["sunny", "hot", "normal", False]

In [153]:
le1.transform(['sunny'])

array([2])

In [154]:
le2.transform(['hot'])

array([1])

In [155]:
le3.transform(['normal'])

array([1])

In [156]:
le4.transform([False])

array([0])

In [157]:
X_test = np.array([[2,1,1,0]])

In [158]:
model.predict(X_test)



array([1])

In [159]:
model.predict_proba(X_test)



array([[0.33508723, 0.66491277]])
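
These probabilities differ from the from-scratch result because `CategoricalNB` was fitted with `alpha=1.0`, that is, with Laplace smoothing. The sketch below tries to reproduce the output by hand. It assumes the smoothed estimate is (count + alpha) / (class count + alpha * number of categories of that feature), and it reads the counts off the class totals and conditional probabilities printed earlier (for example 0.6 of 5 "no" rows are sunny, so the count is 3).

```python
# Class totals from the golf data: 5 "no" rows and 9 "yes" rows.
class_counts = {"no": 5, "yes": 9}

# Test point: (sunny, hot, normal, False).
# Each entry: (rows with this value in "no", rows in "yes", categories of the feature)
feature_counts = {
    "Outlook=sunny": (3, 2, 3),
    "Temperature=hot": (2, 2, 3),
    "Humidity=normal": (1, 6, 2),
    "Windy=False": (2, 6, 2),
}
alpha = 1.0
total = sum(class_counts.values())

scores = {}
for idx, label in enumerate(["no", "yes"]):
    score = class_counts[label] / total  # prior P(Y=label)
    for counts in feature_counts.values():
        count, n_categories = counts[idx], counts[2]
        # Laplace-smoothed conditional probability of this feature value
        score *= (count + alpha) / (class_counts[label] + alpha * n_categories)
    scores[label] = score

# Normalize so the two class probabilities sum to 1.
evidence = sum(scores.values())
proba = [scores["no"] / evidence, scores["yes"] / evidence]
print(proba)
```

Under these assumptions the hand computation agrees with the `predict_proba` output above to several decimal places.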

# Naive Bayes Classifier for Text Data

<img src="https://media.giphy.com/media/dQpUkK59l5Imxsh8jN/giphy.gif" width=300 > 

#### Multinomial Naive Bayes
- The key step is computing the likelihood of each word given the class:

$$P(x_i|Y_i = c) = \frac {count(x_i, Y_i = c)} {\sum_{w \in V}{count(w, Y_i=c)}} $$


# Laplace Smoothing

<img src="https://media.giphy.com/media/SqmkZ5IdwzTP2/giphy.gif" width=300 align=left>

$$P(x_i|Y_i = c) = \frac {count(x_i, Y_i = c) + \alpha} {\sum_{w \in V}{count(w, Y_i=c)} + \alpha |V|} $$

# A Practical Example of Multinomial Naive Bayes

<table align=left>
    <tr style="font-weight:bold;">
        <td></td>
        <td>docID</td>
        <td>words in document</td>
        <td>c = China?</td>
    </tr> 
    <tr>
        <td  rowspan=4> training set </td>
        <td> 1 </td>
        <td> Chinese Beijing Chinese </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 2 </td>
        <td> Chinese Chinese Shanghai </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 3 </td>
        <td> Chinese Macao </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 4 </td>
        <td> Tokyo Japan Chinese </td>
        <td> no </td>
    </tr>
    <tr>
        <td> test set </td>
        <td> 5 </td>
        <td> Chinese Chinese Chinese Tokyo Japan </td>
        <td> ? </td>
    </tr>
</table>
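
The table can be worked through in a few lines of plain Python. This is a minimal sketch of Multinomial Naive Bayes with Laplace smoothing ($\alpha = 1$), using whitespace tokenization as a simplifying assumption. The variable names are illustrative.

```python
from collections import Counter

train = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"
alpha = 1.0

# Vocabulary V and per-class word counts from the training set.
vocab = {w for text, _ in train for w in text.split()}
doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(text.split())

scores = {}
for label in doc_counts:
    score = doc_counts[label] / len(train)          # prior P(Y=c)
    total_words = sum(word_counts[label].values())  # word occurrences in class c
    for w in test_doc.split():
        # Laplace-smoothed likelihood P(w | Y=c); repeated words count each time
        score *= (word_counts[label][w] + alpha) / (total_words + alpha * len(vocab))
    scores[label] = score

print(scores)
print("prediction:", max(scores, key=scores.get))
```

The unnormalized scores are roughly 0.0003 for "yes" and 0.0001 for "no", so document 5 is classified as c = China. The three occurrences of "Chinese" outweigh the single occurrences of "Tokyo" and "Japan".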

# Bernoulli Naive Bayes

<img src="https://media.giphy.com/media/xTiN0h0Kh5gH7yQYUw/giphy.gif" width=300 align=right>
<br><br>

1. Bernoulli Naive Bayes ignores how often a feature/word occurs.
1. It only considers whether a word is present or not (1 or 0).

**Likelihood :**

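$$P(x_i|Y_i = c) = \frac {count(d_i \; contains \; x_i, Y_i = c) + \alpha} {{count(Y_i=c)} + 2 \alpha} $$

Here $count(Y_i=c)$ is the number of documents in class $c$. The smoothing term in the denominator is $2\alpha$ because each word has only two outcomes, present or absent.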

**Prediction:**

$$P(Y=1|X) \propto P(Y=1) \prod_{i=1}^{|V|} { P(x_i|Y=1)^{b_i} \, \big(1 - P(x_i|Y=1)\big)^{1-b_i}}$$

where $b_i = 1$ if word $x_i$ occurs in the document and $b_i = 0$ otherwise.

## Example of Bernoulli Naive Bayes

<table align=left>
    <tr style="font-weight:bold;">
        <td></td>
        <td>docID</td>
        <td>words in document</td>
        <td>c = China?</td>
    </tr> 
    <tr>
        <td  rowspan=4> training set </td>
        <td> 1 </td>
        <td> Chinese Beijing Chinese </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 2 </td>
        <td> Chinese Chinese Shanghai </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 3 </td>
        <td> Chinese Macao </td>
        <td> yes </td>
    </tr>
    <tr>
        <td> 4 </td>
        <td> Tokyo Japan Chinese </td>
        <td> no </td>
    </tr>
    <tr>
        <td> test set </td>
        <td> 5 </td>
        <td> Chinese Chinese Chinese Tokyo Japan </td>
        <td> ? </td>
    </tr>
</table>
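
The same table can be scored with a minimal Bernoulli Naive Bayes sketch. It assumes whitespace tokenization and the smoothing denominator count(Y=c) + 2α, the usual choice for a present/absent feature. Every vocabulary word contributes a factor, including the words that are absent from the test document.

```python
train = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"
alpha = 1.0

vocab = sorted({w for text, _ in train for w in text.split()})
labels = sorted({label for _, label in train})
test_words = set(test_doc.split())  # presence only, frequency is discarded

scores = {}
for label in labels:
    # Each training document of this class, reduced to its set of words.
    docs = [set(text.split()) for text, y in train if y == label]
    score = len(docs) / len(train)  # prior P(Y=c)
    for w in vocab:
        docs_with_w = sum(w in d for d in docs)
        # Smoothed probability that a class-c document contains w
        p = (docs_with_w + alpha) / (len(docs) + 2 * alpha)
        # Present words contribute p, absent words contribute (1 - p)
        score *= p if w in test_words else (1 - p)
    scores[label] = score

print(scores)
print("prediction:", max(scores, key=scores.get))
```

Here the scores are roughly 0.005 for "yes" and 0.022 for "no", so the Bernoulli model classifies document 5 as not China. Ignoring the repeated "Chinese" and penalizing the absent words Beijing, Macao and Shanghai flips the decision compared with the multinomial model.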

# Bias Variance Tradeoff   <img src="imgs/alpha.png" width=100 >

# Gaussian Naive Bayes
<img src="imgs/gaussian.png" width=500>

# Scikit Learn code for Naive Bayes
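
As one possible illustration for this section, the sketch below fits `MultinomialNB` and `BernoulliNB` from scikit-learn on the China example used earlier. It assumes `CountVectorizer` with its default tokenizer for building the word-count matrix. The variable names are illustrative, and `alpha=1.0` corresponds to Laplace smoothing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

train_docs = [
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese",
]
train_labels = ["yes", "yes", "yes", "no"]
test_docs = ["Chinese Chinese Chinese Tokyo Japan"]

# Turn each document into a vector of word counts over the training vocabulary.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(train_docs)
X_test_counts = vectorizer.transform(test_docs)

# Multinomial NB uses the word counts directly.
multinomial = MultinomialNB(alpha=1.0).fit(X_train_counts, train_labels)

# Bernoulli NB binarizes the counts (default binarize=0.0), keeping presence only.
bernoulli = BernoulliNB(alpha=1.0).fit(X_train_counts, train_labels)

multinomial_pred = multinomial.predict(X_test_counts)[0]
bernoulli_pred = bernoulli.predict(X_test_counts)[0]

print("MultinomialNB:", multinomial_pred, multinomial.predict_proba(X_test_counts))
print("BernoulliNB:  ", bernoulli_pred, bernoulli.predict_proba(X_test_counts))
```

On this tiny dataset the two models disagree: the multinomial model predicts "yes" and the Bernoulli model predicts "no", which shows how much the choice of event model can matter.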