Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, the classifier treats each property as contributing independently to the probability that the fruit is an apple, which is why it is called ‘Naive’.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes often performs competitively with, and is sometimes reported to outperform, much more sophisticated classification methods.

Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from the prior probability P(c), the predictor prior P(x), and the likelihood P(x|c), as shown in the equation below:


$P(c|x) = \frac{P(x|c) P(c)}{P(x)}$

Here:

- P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
- P(c) is the prior probability of the class.
- P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
- P(x) is the prior probability of the predictor.


https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Given a class variable $y$ and a dependent feature vector $x_{1}$ through $x_{n}$, Bayes’ theorem states the following relationship:


$P(y|x_{1},...,x_{n}) = \frac{P(x_{1},...,x_{n}|y)P(y)}{P(x_{1},...,x_{n})}$

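Using the naive conditional independence assumption that each feature is independent of the others given the class,

$P(x_{i}|y,x_{1},...,x_{i-1},x_{i+1},...,x_{n}) = P(x_{i}|y), \quad \forall i$

this relationship simplifies to

$P(y|x_{1},...,x_{n}) = \frac{P(y)\prod_{i=1}^{n}P(x_{i}|y)}{P(x_{1},...,x_{n})}$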

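Since $P(x_{1},...,x_{n})$ is constant given the input, the posterior is proportional to the numerator:

$P(y|x_{1},...,x_{n}) \propto P(y)\prod_{i=1}^{n}P(x_{i}|y)$

and the predicted class is the one that maximizes it:

$\hat{y}=\underset{y}{\arg\max}\,P(y)\prod_{i=1}^{n}P(x_{i}|y)$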
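The classification rule above can be sketched in plain Python for categorical features, estimating $P(y)$ and $P(x_{i}|y)$ as relative frequencies. This is a minimal illustration under stated assumptions, not the method used later in this notebook: the helper names `fit_counts` and `predict` and the toy rows are hypothetical, and no smoothing is applied, so a feature value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def fit_counts(rows, labels):
    """Count each class and, within each class, each value of each feature."""
    class_counts = Counter(labels)
    # feature_counts[(i, y)][v] = number of rows of class y whose i-th feature equals v
    feature_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, y)][v] += 1
    return class_counts, feature_counts

def predict(row, class_counts, feature_counts):
    """Return argmax_y P(y) * prod_i P(x_i | y), using relative frequencies."""
    n = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / n  # prior P(y)
        for i, v in enumerate(row):
            score *= feature_counts[(i, y)][v] / n_y  # likelihood P(x_i | y)
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# Hypothetical toy data: (Outlook, Windy) -> Play golf
rows = [("Sunny", False), ("Sunny", True), ("Rainy", False),
        ("Rainy", True), ("Sunny", False)]
labels = ["Yes", "No", "Yes", "No", "Yes"]

class_counts, feature_counts = fit_counts(rows, labels)
print(predict(("Sunny", False), class_counts, feature_counts))  # Yes
print(predict(("Rainy", True), class_counts, feature_counts))   # No
```

With many features the product of small probabilities can underflow; summing log probabilities instead is the usual remedy.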

The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of $P(x_{i}|y)$.

#### Gaussian Naive Bayes

The likelihood of the features is assumed to be Gaussian:

$P(x_{i}|y) = \frac{1}{\sqrt{2\pi \sigma_{y}^{2}}} e^{-\frac{1}{2}(\frac{(x_{i}-\mu_{y} )}{\sigma_{y} })^2}$
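A minimal NumPy sketch of Gaussian Naive Bayes built directly on the likelihood above: it estimates a mean and a variance for every (class, feature) pair, then picks the class with the largest $\log P(y) + \sum_{i} \log P(x_{i}|y)$. Working in logs avoids underflow when many small likelihoods are multiplied. The class `SimpleGaussianNB`, the small `eps` added to the variances, and the toy data are hypothetical choices for illustration; scikit-learn's `GaussianNB`, used later in this notebook, is the practical option.

```python
import numpy as np

def log_gaussian(x, mu, var):
    """log P(x_i | y) for the Gaussian likelihood above, element-wise."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

class SimpleGaussianNB:
    """Hypothetical minimal Gaussian naive Bayes: one mean and variance per (class, feature)."""

    def fit(self, X, y, eps=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # eps (an assumption) keeps every variance strictly positive
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Column k holds log P(y_k) + sum_i log P(x_i | y_k) for each sample
        scores = np.column_stack([
            lp + log_gaussian(X, m, v).sum(axis=1)
            for lp, m, v in zip(self.log_priors_, self.means_, self.vars_)
        ])
        return self.classes_[np.argmax(scores, axis=1)]

# Hypothetical one-feature data: class 0 near 1.0, class 1 near 5.0
X_toy = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]]
y_toy = [0, 0, 0, 1, 1, 1]
model = SimpleGaussianNB().fit(X_toy, y_toy)
print(model.predict([[1.1], [5.1]]))  # [0 1]
```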

```python
import pandas as pd
```

```python
df = pd.read_csv('Downloads/weather_nominal.csv')
```

```python
df.head()
```

| | Outlook | Temperature | Humidity | Windy | Play golf |
|---|---|---|---|---|---|
| 0 | Rainy | Hot | High | False | No |
| 1 | Rainy | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Sunny | Mild | High | False | Yes |
| 4 | Sunny | Cool | Normal | False | Yes |


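Create a DataFrame with only the `Outlook` and `Play golf` columns:

```python
outlook = df[['Outlook','Play golf']]
```

```python
outlook
```

| | Outlook | Play golf |
|---|---|---|
| 0 | Rainy | No |
| 1 | Rainy | No |
| 2 | Overcast | Yes |
| 3 | Sunny | Yes |
| 4 | Sunny | Yes |
| 5 | Sunny | No |
| 6 | Overcast | Yes |
| 7 | Rainy | No |
| 8 | Rainy | Yes |
| 9 | Sunny | Yes |

Only the first 10 of the dataset's 14 rows are shown.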


`DataFrame.groupby()` splits the data into groups by `Outlook`. Summing a column of strings concatenates them, so each group lists all of its `Play golf` labels:

```python
outlook.groupby('Outlook').sum()
```

| Outlook | Play golf |
|---|---|
| Overcast | YesYesYesYes |
| Rainy | NoNoNoYesYes |
| Sunny | YesYesNoYesNo |


Selecting the `Play golf` column from the full DataFrame gives the same result:

```python
df.groupby('Outlook')[['Play golf']].sum()
```

| Outlook | Play golf |
|---|---|
| Overcast | YesYesYesYes |
| Rainy | NoNoNoYesYes |
| Sunny | YesYesNoYesNo |


```python
from sklearn.preprocessing import LabelEncoder

target = LabelEncoder()
```

Encode the `Play golf` column as numbers:

```python
df['target'] = target.fit_transform(df['Play golf'])
```

```python
df
```

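| | Outlook | Temperature | Humidity | Windy | Play golf | target |
|---|---|---|---|---|---|---|
| 0 | Rainy | Hot | High | False | No | 0 |
| 1 | Rainy | Hot | High | True | No | 0 |
| 2 | Overcast | Hot | High | False | Yes | 1 |
| 3 | Sunny | Mild | High | False | Yes | 1 |
| 4 | Sunny | Cool | Normal | False | Yes | 1 |
| 5 | Sunny | Cool | Normal | True | No | 0 |
| 6 | Overcast | Cool | Normal | True | Yes | 1 |
| 7 | Rainy | Mild | High | False | No | 0 |
| 8 | Rainy | Cool | Normal | False | Yes | 1 |
| 9 | Sunny | Mild | Normal | False | Yes | 1 |

Only the first 10 of the 14 rows are shown.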
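`LabelEncoder` assigns integer codes in the sorted order of the distinct labels, which is why No became 0 and Yes became 1 in the table above. A small standalone check, using a made-up list of labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# Made-up labels; the encoder learns the sorted set of distinct values
codes = encoder.fit_transform(["No", "No", "Yes", "Yes", "No"])

print(encoder.classes_.tolist())                   # ['No', 'Yes']
print(codes.tolist())                              # [0, 0, 1, 1, 0]
print(encoder.inverse_transform([1, 0]).tolist())  # ['Yes', 'No']
```

`inverse_transform` maps codes back to the original labels, which is handy for reading predictions.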


Next, group by `Outlook` and sum the `target` column. Because Yes is encoded as 1 and No as 0, the sum is the number of days with `Play golf` == Yes for each outlook:

```python
ply = df.groupby('Outlook')[['target']].sum()
```

```python
ply
```

| Outlook | target |
|---|---|
| Overcast | 4 |
| Rainy | 2 |
| Sunny | 3 |


To find the number of days with `Play golf` == No, keep only the rows where `target` == 0 and count them for each outlook:

```python
nt_ply = df[df['target'] == 0].groupby('Outlook')[['target']].count()
```

```python
nt_ply
```

| Outlook | target |
|---|---|
| Rainy | 3 |
| Sunny | 2 |


Combine both DataFrames with an outer merge, so that an outlook missing from one of them is still kept (Overcast has no No days, so it is absent from `nt_ply`). The `on` argument names the key to merge on, and because both DataFrames have a column called `target`, `suffixes` tells the two apart:

```python
seg_outlook = pd.merge(ply, nt_ply, how="outer", on="Outlook", suffixes=['_Yes', '_No'])
```

```python
seg_outlook
```

| Outlook | target_Yes | target_No |
|---|---|---|
| Overcast | 4 | NaN |
| Rainy | 2 | 3.0 |
| Sunny | 3 | 2.0 |
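As an alternative sketch (not the approach used above), `pd.crosstab` builds the same Yes/No counts per outlook in a single call and fills missing combinations with 0 instead of NaN. Dividing each class column by its total then gives the likelihoods $P(\text{Outlook}|\text{Play golf})$ used in the worked problem that follows. The 14 rows are rebuilt from the grouped counts above, so their order differs from the CSV; the names `weather`, `counts`, and `likelihood` are illustrative.

```python
import pandas as pd

# Rebuilt from the grouped counts above: 14 (Outlook, Play golf) pairs, grouped by outlook
pairs = ([("Overcast", "Yes")] * 4
         + [("Rainy", "No")] * 3 + [("Rainy", "Yes")] * 2
         + [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2)
weather = pd.DataFrame(pairs, columns=["Outlook", "Play golf"])

# Frequency table: one row per outlook, one column per class; absent pairs count as 0
counts = pd.crosstab(weather["Outlook"], weather["Play golf"])
print(counts)

# Likelihood table P(Outlook | Play golf): divide each class column by its total
likelihood = counts / counts.sum()
print(likelihood.round(2))
```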


### Problem: Will players play if the weather is sunny?

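We can answer this with Bayes’ theorem, using the posterior probability discussed above:

$P(\text{Yes}|\text{Sunny}) = \frac{P(\text{Sunny}|\text{Yes})\,P(\text{Yes})}{P(\text{Sunny})}$

From the counts above (9 Yes and 5 No days out of 14, of which 5 are Sunny):

- $P(\text{Sunny}|\text{Yes}) = 3/9 \approx 0.33$
- $P(\text{Sunny}) = 5/14 \approx 0.36$
- $P(\text{Yes}) = 9/14 \approx 0.64$

Now, $P(\text{Yes}|\text{Sunny}) = 0.33 \times 0.64 / 0.36 \approx 0.60$ (exactly $3/5$). By the same calculation, $P(\text{No}|\text{Sunny}) = (2/5)(5/14)/(5/14) = 0.40$, so Yes is the more probable outcome: the players are likely to play when it is sunny.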
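The same arithmetic can be checked exactly with Python's `fractions` module, here computing $P(\text{Sunny})$ from the law of total probability instead of reading it off the table. The counts come from the grouped tables above; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the grouped tables: 9 Yes and 5 No days out of 14,
# of which 3 Yes days and 2 No days are Sunny
n, n_yes, n_no = 14, 9, 5
sunny_yes, sunny_no = 3, 2

p_yes, p_no = Fraction(n_yes, n), Fraction(n_no, n)
p_sunny_given_yes = Fraction(sunny_yes, n_yes)
p_sunny_given_no = Fraction(sunny_no, n_no)

# Evidence P(Sunny) = P(Sunny|Yes)P(Yes) + P(Sunny|No)P(No)
p_sunny = p_sunny_given_yes * p_yes + p_sunny_given_no * p_no

# Bayes' theorem for each class
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny
print(p_sunny, p_yes_given_sunny, p_no_given_sunny)  # 5/14 3/5 2/5
```

The two posteriors sum to 1, as they must when Yes and No are the only classes.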

Naive Bayes uses a similar method to predict the probability of each class based on various attributes. The algorithm is commonly used in text classification and in problems with multiple classes.
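For categorical features, scikit-learn's `CategoricalNB` performs this calculation (it is available in scikit-learn 0.22 and later, an assumption about your installed version). The sketch below fits it on the Outlook column alone, rebuilt from the grouped counts above, and sets `alpha=1e-10` to effectively turn off the default Laplace smoothing so the posterior matches the hand calculation. It is an illustration, not the notebook's own code.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# The 14 (Outlook, Play golf) pairs implied by the grouped counts above
outlooks = ["Overcast"] * 4 + ["Rainy"] * 5 + ["Sunny"] * 5
labels = ["Yes"] * 4 + ["No"] * 3 + ["Yes"] * 2 + ["Yes"] * 3 + ["No"] * 2

# CategoricalNB expects integer category codes; OrdinalEncoder sorts the
# categories, so Overcast -> 0, Rainy -> 1, Sunny -> 2
encoder = OrdinalEncoder()
X = encoder.fit_transform(np.array(outlooks).reshape(-1, 1))

# A tiny alpha effectively disables Laplace smoothing, matching the hand calculation
model = CategoricalNB(alpha=1e-10).fit(X, labels)

sunny = encoder.transform(np.array([["Sunny"]]))
proba = model.predict_proba(sunny)[0]
print(dict(zip(model.classes_.tolist(), proba.round(2).tolist())))  # {'No': 0.4, 'Yes': 0.6}
```

With the default `alpha=1.0` the posterior shifts slightly, because smoothing adds a pseudo-count to every (class, category) pair.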



Load the Iris dataset, which ships with scikit-learn in the `sklearn.datasets` module, and store it in an object `iris`:

```python
from sklearn.datasets import load_iris

iris = load_iris()
```

Use the feature measurements as `X` and the target (the species of each flower) as `y`:

```python
X = iris.data
y = iris.target
```

Split the data into training and test sets, with 80% for training and 20% for testing:

```python
from sklearn.model_selection import train_test_split

# No random_state is set, so the split (and the results below) changes between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

We use the Gaussian Naive Bayes algorithm because our data (sepal length and width, petal length and width) is continuous:

```python
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb.fit(X_train, y_train)
```

```
GaussianNB(priors=None, var_smoothing=1e-09)
```

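```python
y_pred = gnb.predict(X_test)
y_pred
```

```
array([2, 2, 0, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 0, 2, 2, 2, 1, 0,
       0, 2, 1, 1, 2, 1, 1, 1])
```

```python
y_test
```

```
array([2, 1, 0, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 0, 2, 2, 2, 1, 0,
       0, 2, 1, 1, 2, 1, 1, 1])
```

```python
gnb.score(X_test, y_test)
```

```
0.9
```

The model misclassifies 3 of the 30 test samples (all confusions between classes 1 and 2, versicolor and virginica), giving an accuracy of 27/30 = 0.9.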
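Beyond the predicted labels, `GaussianNB` exposes the posterior $P(y|x)$ through `predict_proba`, and `predict` returns the class with the largest posterior. The sketch below repeats the Iris pipeline with `random_state=0`, an assumption added so the split is reproducible (the run above used an unseeded split, so its numbers will differ), and checks that `score` is plain accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# random_state=0 is an assumption made here so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Posterior P(y | x): one row per test sample, one column per class in gnb.classes_
proba = gnb.predict_proba(X_test)
print(proba[:3].round(3))

# predict() picks the class with the largest posterior ...
print(np.array_equal(y_pred, gnb.classes_[proba.argmax(axis=1)]))
# ... and score() is the accuracy of those predictions
print(accuracy_score(y_test, y_pred), gnb.score(X_test, y_test))
```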