### Naïve Bayes Classifier 

- Based on Bayes' theorem and used for solving classification problems.
- Mainly used in text classification, where the training dataset is high-dimensional.
- It is a probabilistic classifier, which means it predicts the class that is most probable given the observed features.

In [2]:
# define probability

The probability of an event is defined as the ratio of the number of occurrences of the event to the total number of trials.
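The ratio definition can be sketched in a few lines. The die experiment and the helper name `probability` are assumptions made up for illustration, not part of the notebook's data.

```python
from fractions import Fraction

def probability(favourable: int, total: int) -> Fraction:
    """Ratio of favourable outcomes to the total number of trials."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(favourable, total)

# Hypothetical experiment: one roll of a fair six-sided die, event = "roll is even"
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]

p_even = probability(len(even), len(outcomes))
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```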

In [3]:
# Joint Probability

The chance or likelihood that two events occur at the same time. If the two events are independent, meaning the outcome of one cannot influence the outcome of the other, the joint probability is simply the product of the individual probabilities.

Let's say we are working on an email classification problem, and we want to classify messages as spam or not spam based on whether they contain the word "lottery".

Not every message containing the word ‘lottery’ is spam, and not every spam message contains the word ‘lottery’.

Now if we have to calculate the joint probability of both, that is, the probability of the message being spam and the word ‘lottery’ occurring in the message, and we assume the two events are independent with p(spam) = 0.1 and p(lottery) = 0.05, we can do the following.

p(spam ∩ lottery) = p(spam) * p(lottery) = 0.1 * 0.05 = 0.005

We can say that 0.5 percent of messages are spam containing the word ‘lottery’.
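Multiplying the two individual probabilities is only valid when the events are independent. The sketch below compares that product with a joint probability counted directly from data. All message counts are hypothetical and chosen only to match p(spam) = 0.1 and p(lottery) = 0.05.

```python
from fractions import Fraction

# Hypothetical message counts (illustrative only, not from a real dataset)
total = 1000
spam = 100              # gives p(spam) = 0.1
lottery = 50            # gives p(lottery) = 0.05
spam_and_lottery = 40   # messages that are spam AND contain "lottery"

p_spam = Fraction(spam, total)
p_lottery = Fraction(lottery, total)

# Joint probability under the independence assumption
p_joint_independent = p_spam * p_lottery
# Joint probability counted directly from the (hypothetical) data
p_joint_observed = Fraction(spam_and_lottery, total)

print(float(p_joint_independent))  # 0.005
print(float(p_joint_observed))     # 0.04

# The two values differ, so in this toy data the events are not independent
events_look_independent = p_joint_independent == p_joint_observed
print(events_look_independent)     # False
```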

In [1]:
# Conditional Probability

The probability of occurrence of an event A1, given that A2 has already occurred.

    It is denoted by P(A1∣A2) 
    P(A1∣A2) = P(A1∩A2) / P(A2), provided P(A2) ≠ 0.
    
- Here, A1 and A2 are events and P(A2) ≠ 0.
- Event A2 is also termed the evidence.
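As a small worked example, the conditional probability of spam given the word "lottery" can be computed from a joint and a marginal probability. The counts and the helper name `conditional` are hypothetical.

```python
from fractions import Fraction

def conditional(p_joint: Fraction, p_given: Fraction) -> Fraction:
    """P(A1 | A2) = P(A1 and A2) / P(A2); undefined when P(A2) is zero."""
    if p_given == 0:
        raise ValueError("the conditioning event must have non-zero probability")
    return p_joint / p_given

# Hypothetical counts (illustrative only)
total = 1000
lottery = 50            # messages containing "lottery"
spam_and_lottery = 40   # messages that are spam and contain "lottery"

p_lottery = Fraction(lottery, total)
p_joint = Fraction(spam_and_lottery, total)

p_spam_given_lottery = conditional(p_joint, p_lottery)
print(p_spam_given_lottery)  # 4/5
```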

In [1]:
# Why is it called Naïve Bayes?

- The name is composed of two words, Naïve and Bayes.
- Naïve because it makes the naive assumption that the occurrence of a certain feature is independent of the occurrence of other features.
- Bayes because it depends on the principle of Thomas Bayes' theorem.
- It also assumes that the inputs have an equal effect on the outcomes or responses in the data.


For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
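The fruit example can be sketched numerically: under the naive assumption, the score of each class is its prior multiplied by the per-feature likelihoods, as if the features were independent. Every probability below is a made-up value for illustration.

```python
import math

# Hypothetical priors and per-feature likelihoods P(feature | fruit)
priors = {"apple": 0.5, "orange": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "spherical": 0.9, "sweet": 0.7},
    "orange": {"red": 0.1, "spherical": 0.9, "sweet": 0.6},
}

observed = ["red", "spherical", "sweet"]

# Naive assumption: multiply the per-feature likelihoods as if independent
scores = {
    fruit: priors[fruit] * math.prod(likelihoods[fruit][f] for f in observed)
    for fruit in priors
}

# Normalise the scores so that they sum to 1 (posterior probabilities)
evidence = sum(scores.values())
posteriors = {fruit: s / evidence for fruit, s in scores.items()}

best = max(posteriors, key=posteriors.get)
print(best)  # apple
```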

In [1]:
# Define bayes rule or Theorem

Bayes' theorem (also known as Bayes' rule or Bayes' law):
- used to determine the probability of a hypothesis with prior knowledge. 
- depends on the conditional probability.

    P(A|B) = (P(B|A) * P(A)) / P(B)
    
    See formula below

<img src='bays theorem.jpg'>

Where,

    P(A|B) is Posterior probability: 
    Probability of hypothesis A given that event B has occurred.

    P(B|A) is Likelihood probability: 
    Probability of B given that A has occurred.
    
P(A) is Prior Probability: Probability of hypothesis A before B has occurred.

P(B) is Marginal Probability: Probability of the evidence.
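A minimal sketch of Bayes' theorem with the four quantities above. The helper name `bayes_posterior` and the likelihood value of 0.4 are assumptions for illustration. The prior 0.1 and the evidence 0.05 reuse the spam/lottery numbers from earlier.

```python
from fractions import Fraction

def bayes_posterior(likelihood: Fraction, prior: Fraction, evidence: Fraction) -> Fraction:
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence

p_lottery_given_spam = Fraction(2, 5)   # likelihood P(B|A), hypothetical value
p_spam = Fraction(1, 10)                # prior P(A)
p_lottery = Fraction(1, 20)             # marginal P(B), the evidence

p_spam_given_lottery = bayes_posterior(p_lottery_given_spam, p_spam, p_lottery)
print(p_spam_given_lottery)  # 4/5
```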

### Working of Naïve Bayes' Classifier:

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

sns.get_dataset_names()

In [18]:
df=pd.read_csv('weather2.csv')
df

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No
2,Overcast,Hot,High,False,Yes
3,Sunny,Mild,High,False,Yes
4,Sunny,Cool,Normal,False,Yes
5,Sunny,Cool,Normal,True,No
6,Overcast,Cool,Normal,True,Yes
7,Rainy,Mild,High,False,No
8,Rainy,Cool,Normal,False,Yes
9,Sunny,Mild,Normal,False,Yes


The fundamental Naive Bayes assumption is that each feature makes an:

    - independent
    - and equal contribution to the outcome, i.e.  each feature is given the same weight(or importance)

Now consider Bayes' formula:

<img src='bays theorem.jpg'>

    where A and B are events and P(B) ≠ 0.

    Basically, we are trying to find the probability of event A, given that event B is true. 
    Event B is also termed the evidence.
    P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). 
    
    P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.

<img src='nb.jpeg'>

    All calculations are done on the basis of these equations.
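To make the calculation concrete, here is a hand-rolled sketch that applies the formula to the ten weather rows displayed earlier. It uses plain counting with no smoothing, so it is not the same computation scikit-learn performs. The query and the helper name `class_score` are chosen only for illustration.

```python
from fractions import Fraction

# The ten rows displayed above: (Outlook, Temperature, Humidity, Windy, Play Golf)
rows = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
]

def class_score(query, label):
    """Prior times the product of per-feature likelihoods (no smoothing)."""
    in_class = [r for r in rows if r[-1] == label]
    score = Fraction(len(in_class), len(rows))  # prior P(label)
    for i, value in enumerate(query):
        matches = sum(1 for r in in_class if r[i] == value)
        score *= Fraction(matches, len(in_class))  # P(feature_i = value | label)
    return score

# Hypothetical day to classify: Sunny, Cool, Normal humidity, not windy
query = ("Sunny", "Cool", "Normal", False)
scores = {label: class_score(query, label) for label in ("Yes", "No")}

# Normalise so that the two posteriors sum to 1
posterior_yes = scores["Yes"] / sum(scores.values())
print(posterior_yes)         # 80/83
print(float(posterior_yes))
```

Because no smoothing is applied, a feature value that never occurs with a class would force that class's score to zero, which is why library implementations add a smoothing term.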

In [23]:
# let's proceed further with the dataset we have loaded

In [34]:
df=pd.read_csv('weather2.csv')
df.head(2)

Unnamed: 0,Outlook,Temperature,Humidity,Windy,Play Golf
0,Rainy,Hot,High,False,No
1,Rainy,Hot,High,True,No


In [35]:
df.dtypes

Outlook        object
Temperature    object
Humidity       object
Windy            bool
Play Golf      object
dtype: object

In [36]:
x=df.drop('Play Golf',axis=1)
y=df['Play Golf']

In [37]:
# All the features need to be converted into numerical variables before fitting any Naive Bayes algorithm

In [38]:
from sklearn.preprocessing import StandardScaler, LabelEncoder
ss=StandardScaler()
le=LabelEncoder()

In [40]:
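# Before creating dummy variables, every column should have object dtype.
# Note: this converts Windy on df, but x was already split off from df above,
# so x['Windy'] stays bool and get_dummies leaves it unchanged.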

df['Windy']=df['Windy'].astype('object')

In [46]:
# df.dtypes
# df.head(2)

In [47]:
x=pd.get_dummies(x,drop_first=True)

In [48]:
x.head(2)

Unnamed: 0,Windy,Outlook_Rainy,Outlook_Sunny,Temperature_Hot,Temperature_Mild,Humidity_Normal
0,False,1,0,1,0,0
1,True,1,0,1,0,0


In [49]:
# since Windy was not converted to a dummy variable, label encode it
x['Windy']=le.fit_transform(x['Windy'])

In [50]:
x.head(2)

Unnamed: 0,Windy,Outlook_Rainy,Outlook_Sunny,Temperature_Hot,Temperature_Mild,Humidity_Normal
0,0,1,0,1,0,0
1,1,1,0,1,0,0


In [52]:
# scale Windy (not strictly needed here: BernoulliNB binarizes features at a threshold of 0.0 by default)

x['Windy']=ss.fit_transform(x[['Windy']])

In [53]:
x.head(2)

Unnamed: 0,Windy,Outlook_Rainy,Outlook_Sunny,Temperature_Hot,Temperature_Mild,Humidity_Normal
0,-0.866025,1,0,1,0,0
1,1.154701,1,0,1,0,0


In [54]:
# model fitting
from sklearn.naive_bayes import BernoulliNB
nb=BernoulliNB()
nb.fit(x,y)

BernoulliNB()

In [55]:
ypred=nb.predict(x)

In [56]:
ypred

array(['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes',
       'Yes', 'Yes', 'Yes', 'Yes'], dtype='<U3')

> So we can see that we don't need to label encode the output variable when using the BernoulliNB algorithm: the model accepts string class labels directly.
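A minimal sketch of this behaviour on a tiny made-up dataset (the arrays below are hypothetical, not the weather data): the fitted model stores the string labels in `classes_` and returns them from `predict`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features with string class labels
X_toy = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y_toy = np.array(["No", "No", "Yes", "Yes"])

clf = BernoulliNB()
clf.fit(X_toy, y_toy)

print(clf.classes_)           # the string labels, sorted
pred = clf.predict([[1, 0]])  # the first feature is 1 only in the "No" rows
print(pred)
```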

In [19]:
### Advantages of Naïve Bayes Classifier:

- Fast and easy to implement
- Used for binary as well as multi-class classification
- A popular choice for text classification problems
- It often performs well even when the data does not strictly satisfy the independence assumption.

In [20]:
### Disadvantages of Naïve Bayes Classifier:

- Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.
- When the features have a strong relationship with each other, Naïve Bayes is usually a poor choice.

### Types of Naïve Bayes Model:

There are three common types of Naive Bayes model:

    1. Gaussian
    2. Multinomial
    3. Bernoulli

In [5]:
# Gaussian Naive bayes algorithm

The Gaussian model assumes that features follow a normal distribution.

This means that if predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.

More clearly:
- GaussianNB is used when the features are continuous in nature 
- The features present in the data seem to follow a Gaussian distribution
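The Gaussian assumption can be sketched by hand for a single continuous feature: estimate a mean and variance per class, then score a new value with the normal density. The sample values and class names are hypothetical, and this is a simplified illustration rather than scikit-learn's implementation.

```python
import math
from statistics import fmean, pvariance

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training values of one continuous feature, grouped by class
samples = {"a": [1.0, 2.0, 3.0], "b": [6.0, 7.0, 8.0]}
total = sum(len(values) for values in samples.values())

def score(x, label):
    """Prior P(label) times the Gaussian likelihood of x under that class."""
    values = samples[label]
    prior = len(values) / total
    return prior * gaussian_pdf(x, fmean(values), pvariance(values))

x_new = 2.5
best = max(samples, key=lambda label: score(x_new, label))
print(best)  # a
```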

In [21]:
# Multinomial Naive bayes algorithm

Multinomial:
- Used when the data is multinomially distributed.
- Primarily used for document classification problems. The features required for this classification are the frequencies of the words extracted from the text documents.
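As a rough sketch of how word frequencies drive the multinomial model, the following pure-Python version counts words per class and scores a new message with Laplace (add-one) smoothing. The toy corpus and the helper names `log_score` and `predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus (illustrative only)
train = [
    ("win lottery prize now", "spam"),
    ("lottery prize claim", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    word_counts[label].update(text.split())
    doc_counts[label] += 1

vocab = {word for counts in word_counts.values() for word in counts}

def log_score(text, label, alpha=1.0):
    """log P(label) plus the sum of log P(word | label), with Laplace smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(train))
    for word in text.split():
        if word not in vocab:
            continue  # ignore words never seen during training
        smoothed = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
        score += math.log(smoothed)
    return score

def predict(text):
    return max(word_counts, key=lambda label: log_score(text, label))

print(predict("claim your lottery prize"))  # spam
print(predict("see you at lunch"))          # ham
```

Working in log space avoids multiplying many small probabilities together, which can underflow for long documents.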

In [22]:
# Bernoulli Naive bayes algorithm

Bernoulli:
- This algorithm is useful for data with binary features. The features can take values such as Yes or No, Granted or Not Granted, useful or useless, etc.
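Unlike the multinomial model, the Bernoulli likelihood also accounts for features that are absent. A minimal sketch, assuming made-up values for P(feature present | class):

```python
import math

def bernoulli_likelihood(x, p):
    """Product over features of p_i if the feature is present, else (1 - p_i)."""
    return math.prod(p_i if x_i else 1.0 - p_i for x_i, p_i in zip(x, p))

# Hypothetical P(feature present | class) for two binary features:
# [contains "lottery", contains "meeting"]
feature_probs = {"spam": [0.8, 0.1], "ham": [0.1, 0.6]}
priors = {"spam": 0.5, "ham": 0.5}

x = [1, 0]  # message contains "lottery" but not "meeting"
scores = {c: priors[c] * bernoulli_likelihood(x, feature_probs[c]) for c in priors}

best = max(scores, key=scores.get)
print(best)  # spam
```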

### Python Implementation of the Naïve Bayes

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
from sklearn.datasets import load_iris
iris = load_iris()

In [3]:
X = iris.data
y = iris.target

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

In [5]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

GaussianNB()

In [6]:
y_pred = gnb.predict(X_test)

In [7]:
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)

Gaussian Naive Bayes model accuracy(in %): 95.0
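Accuracy alone hides which classes get confused with each other. As a possible follow-up, the sketch below repeats the same split and model and then prints a confusion matrix. It is self-contained so it can run on its own, and the exact counts may vary with the scikit-learn version, so no output is shown.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the cells above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)

print(cm)
print(acc)
```

The diagonal of the matrix counts the correct predictions, so its sum divided by the total number of test samples equals the accuracy.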


### Email spam detection

In [4]:
import pandas as pd

In [7]:
df=pd.read_excel('spam.xlsx')

In [8]:
df.head()

Unnamed: 0,Category,Message,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,


In [9]:
df.columns

Index(['Category', 'Message ', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], dtype='object')

In [11]:
df=df[['Category','Message ']]

In [12]:
df.columns

Index(['Category', 'Message '], dtype='object')

In [14]:
df.columns=['category','message']

In [15]:
df.head(2)

Unnamed: 0,category,message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...


In [16]:
df.groupby('category').describe()

Unnamed: 0_level_0,message,message,message,message
Unnamed: 0_level_1,count,unique,top,freq
category,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
ham,4825,4516,"Sorry, I'll call later",30
spam,747,653,Please call our customer service representativ...,4


In [17]:
df.shape

(5572, 2)

In [18]:
df['spam']=df['category'].apply(lambda x: 1 if x=='spam' else 0)
df.head()

Unnamed: 0,category,message,spam
0,ham,"Go until jurong point, crazy.. Available only ...",0
1,ham,Ok lar... Joking wif u oni...,0
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,1
3,ham,U dun say so early hor... U c already then say...,0
4,ham,"Nah I don't think he goes to usf, he lives aro...",0


In [20]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.message,df.spam)

In [21]:
from sklearn.feature_extraction.text import CountVectorizer
v=CountVectorizer()

In [23]:
X_train[0:5]

786     It does it on its own. Most of the time it fix...
3095    We walked from my moms. Right on stagwood pass...
2093    Final Chance! Claim ur å£150 worth of discount...
3877         What you need. You have a person to give na.
2554                      I'll reach in ard 20 mins ok...
Name: message, dtype: object

In [26]:
type(X_train)

pandas.core.series.Series

In [28]:
X_train.values[0:5]

array(['It does it on its own. Most of the time it fixes my spelling. But sometimes it gets a completely diff word. Go figure',
       'We walked from my moms. Right on stagwood pass right on winterstone left on victors hill. Address is &lt;#&gt;',
       'Final Chance! Claim ur å£150 worth of discount vouchers today! Text YES to 85023 now! SavaMob, member offers mobile! T Cs SavaMob POBOX84, M263UZ. å£3.00 Subs 16',
       'What you need. You have a person to give na.',
       "I'll reach in ard 20 mins ok..."], dtype=object)

    X_train_count = v.fit_transform(X_train.values)
    X_train_count.toarray()[:2]

    from sklearn.naive_bayes import MultinomialNB
    model = MultinomialNB()
    model.fit(X_train_count,y_train)

    emails = [
        'Hey mohan, can we get together to watch footbal game tomorrow?',
        'Upto 20% discount on parking, exclusive offer just for you. Dont miss this reward!'
    ]
    emails_count = v.transform(emails)
    model.predict(emails_count)

    X_test_count = v.transform(X_test)
    model.score(X_test_count, y_test)

    from sklearn.pipeline import Pipeline
    clf = Pipeline([
        ('vectorizer', CountVectorizer()),
        ('nb', MultinomialNB())
    ])

    clf.fit(X_train, y_train)

    clf.score(X_test,y_test)

    clf.predict(emails)