### Naive Bayes Classifiers

Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that, within a given class, the presence of a particular feature is unrelated to the presence of any other feature. In other words, the features are treated as conditionally independent given the class.
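Written out for a feature vector $x = (x_1, \dots, x_n)$ and a class $c$, the independence assumption lets the class-conditional likelihood factorise into per-feature terms. The predicted class is then the one with the largest prior times product of likelihoods. The evidence $P(x)$ is the same for every class, so it can be dropped from the comparison:

$$
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$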

For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other in reality, a Naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple. This simplifying assumption is why the method is called ‘naive’.
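The fruit example can be made concrete with a small sketch. All priors and per-feature likelihoods below are invented for illustration (they are not estimated from any data), and `naive_bayes_posterior` is a hypothetical helper. Each class is scored as its prior times the product of the feature likelihoods, and the scores are normalised so they sum to 1:

```python
# Hypothetical priors and per-feature likelihoods, invented for illustration only.
fruit_priors = {"apple": 0.5, "orange": 0.5}
fruit_likelihoods = {
    "apple":  {"red": 0.7, "round": 0.9, "about_3_inches": 0.6},
    "orange": {"red": 0.1, "round": 0.95, "about_3_inches": 0.5},
}

def naive_bayes_posterior(features, priors, likelihoods):
    """Score each class as prior * product of per-feature likelihoods, then normalise."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:
            # Independence assumption: each feature's likelihood is simply multiplied in
            score *= likelihoods[cls][f]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

fruit_posterior = naive_bayes_posterior(
    ["red", "round", "about_3_inches"], fruit_priors, fruit_likelihoods
)
print(fruit_posterior)
```

With these made-up numbers the apple score is 0.5 × 0.7 × 0.9 × 0.6 = 0.189 and the orange score is 0.5 × 0.1 × 0.95 × 0.5 = 0.02375, so "apple" receives roughly 89% of the posterior mass.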

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it often performs competitively with more sophisticated classification methods, especially on text tasks such as spam filtering.

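Bayes’ theorem provides a way of calculating the posterior probability P(c|x) of class c given predictor x from the class prior P(c), the predictor prior P(x), and the likelihood P(x|c):

$$
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}
$$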
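As a quick numeric check of Bayes’ theorem in the spam setting used later in this notebook, the sketch below computes P(spam | word) for a single word. The numbers are hypothetical: 30% of emails are spam, and the word "free" appears in 60% of spam and 5% of non-spam emails. P(x) is obtained from the law of total probability:

```python
def bayes_posterior(prior_c, likelihood_x_given_c, likelihood_x_given_not_c):
    """Bayes' theorem for a binary class c: P(c|x) = P(x|c) * P(c) / P(x)."""
    # P(x) by the law of total probability over c and not-c
    evidence = likelihood_x_given_c * prior_c + likelihood_x_given_not_c * (1 - prior_c)
    return likelihood_x_given_c * prior_c / evidence

# Hypothetical numbers: P(spam) = 0.3, P("free" | spam) = 0.6, P("free" | not spam) = 0.05
p_spam_given_free = bayes_posterior(0.3, 0.6, 0.05)
print(round(p_spam_given_free, 3))  # 0.837
```

Here P(x) = 0.6 × 0.3 + 0.05 × 0.7 = 0.215, so P(spam | "free") = 0.18 / 0.215 ≈ 0.837.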

### Import Libraries

In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Import Dataset

In [14]:
df = pd.read_csv(r'emails.csv')

In [15]:
del df['Unnamed: 2']
del df['Unnamed: 3']
df.head()

|   | text | spam |
|---|------|------|
| 0 | Subject: naturally irresistible your corporate... | 1 |
| 1 | Subject: the stock trading gunslinger fanny i... | 1 |
| 2 | Subject: unbelievable new homes made easy im ... | 1 |
| 3 | Subject: 4 color printing special request add... | 1 |
| 4 | Subject: do not have money , get software cds ... | 1 |


In [16]:
df.isnull().sum()

text    0
spam    0
dtype: int64
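The `del` statements above drop the two empty `Unnamed:` columns by name. A slightly more general sketch, assuming that every column whose name starts with `Unnamed` is a spurious artefact (pandas gives that name to columns with an empty header, for example from trailing commas in the CSV), removes them all at once. The tiny in-memory `toy_df` is a hypothetical stand-in for `emails.csv`, which is not reproduced here:

```python
import numpy as np
import pandas as pd

# Stand-in for emails.csv: two real columns plus two empty "Unnamed" columns
toy_df = pd.DataFrame({
    "text": ["Subject: win money now", "Subject: meeting at noon"],
    "spam": [1, 0],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Keep only the columns whose name does not start with "Unnamed"
toy_df = toy_df.loc[:, ~toy_df.columns.str.startswith("Unnamed")]

print(toy_df.columns.tolist())  # ['text', 'spam']
print(toy_df.isnull().sum())    # no missing values remain
```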

### Separate Dependent & Independent Variables

In [17]:
x = df.text.values
y = df.spam.values

### Split Train and Test Dataset

In [18]:
from sklearn.model_selection import train_test_split

In [19]:
xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size=0.3)
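The split above is random on every run and does not control the spam/non-spam ratio in each part. A sketch of a reproducible, stratified split follows, using a hypothetical set of 100 toy messages (70 non-spam, 30 spam). `stratify=y_toy` keeps the class ratio equal in both parts, and `random_state` fixes the shuffle:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 70 non-spam (0) and 30 spam (1) messages
x_toy = np.array([f"message {i}" for i in range(100)])
y_toy = np.array([0] * 70 + [1] * 30)

# stratify keeps the spam ratio the same in both splits; random_state makes it reproducible
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=42
)

print(len(x_tr), len(x_te))                           # 70 30
print(round(y_tr.mean(), 2), round(y_te.mean(), 2))   # 0.3 0.3
```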

### Data Preprocessing

In [20]:
from sklearn.feature_extraction.text import CountVectorizer

In [21]:
cv = CountVectorizer()

In [22]:
x_train = cv.fit_transform(xtrain)

In [23]:
x_train.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [3, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

### Apply the Naive Bayes Classifier

In [24]:
from sklearn.naive_bayes import MultinomialNB

In [25]:
model = MultinomialNB()

In [26]:
model.fit(x_train,ytrain)
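To see what `MultinomialNB` computes, the sketch below re-derives its class probabilities by hand on a tiny, made-up word-count matrix (`X_toy` and `y_toy` are hypothetical). It computes log priors from label frequencies and smoothed per-class word frequencies with additive smoothing `alpha=1.0`, the estimator's default. It then scores each class as log P(c) plus the count-weighted sum of log P(word | c) and normalises with a softmax. It assumes scikit-learn's documented multinomial formulation; the final check compares the result against `predict_proba`:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix (rows = emails, columns = vocabulary words) and labels
X_toy = np.array([
    [2, 1, 0, 0],
    [3, 0, 1, 0],
    [0, 0, 1, 2],
    [0, 1, 0, 3],
])
y_toy = np.array([1, 1, 0, 0])
alpha = 1.0  # additive (Laplace) smoothing

toy_model = MultinomialNB(alpha=alpha).fit(X_toy, y_toy)

classes = np.unique(y_toy)  # same class order scikit-learn uses
# log P(c): class priors from label frequencies
log_prior = np.log(np.array([np.mean(y_toy == c) for c in classes]))
# log P(word | c): smoothed relative word frequencies within each class
word_counts = np.array([X_toy[y_toy == c].sum(axis=0) for c in classes]) + alpha
log_likelihood = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))

# Joint log score per (email, class): log P(c) + sum_i x_i * log P(w_i | c)
joint = X_toy @ log_likelihood.T + log_prior
manual_proba = np.exp(joint - joint.max(axis=1, keepdims=True))
manual_proba /= manual_proba.sum(axis=1, keepdims=True)

print(np.allclose(manual_proba, toy_model.predict_proba(X_toy)))  # True
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters for long emails with large vocabularies.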

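Transform the test emails with the vocabulary learned from the training set (`transform`, not `fit_transform`), so that the test matrix has the same columns the model was trained on.

In [27]:
x_test = cv.transform(xtest)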

In [28]:
x_test.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
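Test emails must be encoded with the vocabulary learned from the training emails. A small sketch with made-up documents shows why refitting the vectorizer on test data is a problem. A freshly fitted `CountVectorizer` builds a different vocabulary, so its columns no longer line up with the features the model was trained on. A separate `CountVectorizer()` is used for the incorrect case so that `vec` keeps its training vocabulary:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents
train_docs = ["win money now", "meeting at noon"]
test_docs = ["free money"]

vec = CountVectorizer()
train_counts = vec.fit_transform(train_docs)  # learns a 6-word vocabulary

# Correct: reuse the training vocabulary; the unseen word "free" is ignored
test_counts_ok = vec.transform(test_docs)

# Incorrect: refitting learns a new 2-word vocabulary with different columns
test_counts_bad = CountVectorizer().fit_transform(test_docs)

print(train_counts.shape, test_counts_ok.shape, test_counts_bad.shape)  # (2, 6) (1, 6) (1, 2)
```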

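In [29]:
model.score(x_train,ytrain)

0.991519082065353

This is the accuracy on the training data, which overstates how well the model will do on unseen emails. For a held-out estimate, score the transformed test set with `model.score(x_test, ytest)`.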
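A minimal end-to-end sketch of held-out evaluation is shown below. It uses a hypothetical six-email training corpus and two test emails in place of `emails.csv`, so the numbers say nothing about the real dataset. It fits the vectorizer on the training texts only, reuses that vocabulary for the test texts, and reports test accuracy alongside a confusion matrix (rows are true classes, columns are predicted classes, in label order 0 then 1):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for emails.csv (1 = spam, 0 = not spam)
train_texts = [
    "win money now", "free money offer", "cheap software offer",
    "meeting at noon", "project report attached", "lunch meeting tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["free software now", "report for the meeting"]
test_labels = [1, 0]

toy_cv = CountVectorizer()
toy_x_train = toy_cv.fit_transform(train_texts)
toy_x_test = toy_cv.transform(test_texts)  # reuse the training vocabulary

toy_nb = MultinomialNB().fit(toy_x_train, train_labels)
pred = toy_nb.predict(toy_x_test)

print("train accuracy:", toy_nb.score(toy_x_train, train_labels))
print("test accuracy:", accuracy_score(test_labels, pred))
print(confusion_matrix(test_labels, pred))  # rows = true class, columns = predicted class
```

On a real dataset the test accuracy is usually lower than the training accuracy, and the confusion matrix shows whether the errors are mostly missed spam or legitimate emails flagged as spam.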