# Bayesian models with Naïve Bayes

Naïve Bayes classification is a machine learning method that predicts the probability that an observation belongs to a class, given the evidence present in your data.

## Three types of Naïve Bayes model

- Multinomial
- Bernoulli
- Gaussian

The **Multinomial** Naïve Bayes model is good for making predictions from features that represent discrete counts or frequencies, such as how often each word appears in a document.  
The **Bernoulli** Naïve Bayes model is good for making predictions from binary features.  
The **Gaussian** Naïve Bayes model is good for making predictions from normally distributed features.
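
As a rough illustration of how each variant lines up with a feature type, the sketch below fits all three on small synthetic data. The data, the class rates, and every variable name are invented for this example and are not part of the Spambase walkthrough.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y_toy = np.array([0] * 50 + [1] * 50)

# Count features: each class favours a different column (like different words)
rates = np.where(y_toy[:, None] == 1, [1.0, 1.0, 5.0], [5.0, 1.0, 1.0])
X_counts = rng.poisson(lam=rates)

# Binary features: 1 if the count exceeds 2, else 0
X_binary = (X_counts > 2).astype(int)

# Continuous features: normally distributed around a class-specific mean
means = np.where(y_toy[:, None] == 1, 2.0, -2.0)
X_real = rng.normal(loc=means, scale=1.0, size=(100, 3))

# Training accuracy only, to show that each model fits its matching feature type
scores = {
    "multinomial": MultinomialNB().fit(X_counts, y_toy).score(X_counts, y_toy),
    "bernoulli": BernoulliNB().fit(X_binary, y_toy).score(X_binary, y_toy),
    "gaussian": GaussianNB().fit(X_real, y_toy).score(X_real, y_toy),
}
print(scores)
```

Note that the count features use a different rate profile per class: the multinomial model looks at how counts are distributed across features, so classes that differ only in total count would be hard for it to separate.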


## Use cases for Naïve Bayes include:

- spam detection
- customer classification
- credit risk prediction
- health risk prediction


## Assumptions of the Naïve Bayes model

- Predictors are independent of each other
- An a priori assumption: the conditions that produced the historical data still hold. Predictions made from historical values will be incorrect if present circumstances have changed.
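
The independence assumption is what makes the model tractable: the joint likelihood factors into one term per feature. Written out for a class $y$ and features $x_1, \dots, x_n$ (notation introduced here for illustration):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The Multinomial, Bernoulli, and Gaussian variants differ in the distribution they assume for the per-feature likelihood $P(x_i \mid y)$.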


In [4]:
import numpy as np
import pandas as pd
import urllib.request  # the request submodule must be imported explicitly for urlopen
import sklearn

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score

In [2]:
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

In [8]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
raw_data = urllib.request.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=',')
print(dataset[0])

[  0.      0.64    0.64    0.      0.32    0.      0.      0.      0.
   0.      0.      0.64    0.      0.      0.      0.32    0.      1.29
   1.93    0.      0.96    0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.778   0.      0.
   3.756  61.    278.      1.   ]


In [11]:
# isolate features: the first 48 columns are the word-frequency attributes
x = dataset[:,0:48]

# isolate target variable
y = dataset[:,-1]

In [12]:
# split the data into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=17)

### Bernoulli

In [13]:
# binarize is a numeric threshold; True acts as 1.0 here (values above 1.0 become 1, the rest 0)
BernNB = BernoulliNB(binarize=True)
BernNB.fit(x_train, y_train)

BernoulliNB(binarize=True)

In [14]:
y_pred = BernNB.predict(x_test)

accuracy_score(y_test, y_pred)

0.8577633007600435
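
To make the effect of the `binarize` threshold concrete, here is a minimal NumPy sketch. It assumes that `binarize=True` behaves like a threshold of 1.0 (because `True` compares as 1) and that values strictly above the threshold become 1. The sample values are copied from the first row printed earlier; the helper name `binarize_features` is hypothetical.

```python
import numpy as np

# A few word-frequency values copied from the first Spambase row shown earlier
sample = np.array([0.0, 0.64, 0.32, 1.29, 1.93])


def binarize_features(values, threshold):
    """Map values strictly above the threshold to 1 and everything else to 0."""
    return (values > threshold).astype(int)


at_one = binarize_features(sample, 1.0)    # assumed equivalent to binarize=True
at_tenth = binarize_features(sample, 0.1)  # a lower, illustrative threshold

print(at_one)
print(at_tenth)
```

With a threshold of 1.0, the frequencies 0.64 and 0.32 are treated as absent, so the threshold controls how much of the signal survives binarization.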

### Multinomial

In [16]:
MultiNB = MultinomialNB()
MultiNB.fit(x_train, y_train)

MultinomialNB()

In [17]:
y_pred = MultiNB.predict(x_test)

accuracy_score(y_test, y_pred)

0.8816503800217155

### Gaussian

In [19]:
GausNB = GaussianNB()
GausNB.fit(x_train, y_train)

GaussianNB()

In [20]:
y_pred = GausNB.predict(x_test)

accuracy_score(y_test, y_pred)

0.8197611292073833
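
Accuracy alone does not show which kind of error a classifier makes, and for spam filtering a legitimate message flagged as spam is usually the costlier mistake. The sketch below shows how `metrics.confusion_matrix` separates the two error types. The labels are made up for illustration and are not the Spambase predictions.

```python
from sklearn import metrics

# Hypothetical labels: 1 = spam, 0 = not spam
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("false positives:", fp, "false negatives:", fn)
print("accuracy:", metrics.accuracy_score(y_true_demo, y_pred_demo))
```

The same call could be applied to `y_test` and any of the `y_pred` arrays in this notebook.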

### Improve scores

In [22]:
# lower the binarization threshold to 0.1 so that small non-zero word frequencies count as present
BernNBi = BernoulliNB(binarize=.1)
BernNBi.fit(x_train, y_train)

BernoulliNB(binarize=0.1)

In [23]:
y_pred = BernNBi.predict(x_test)

accuracy_score(y_test, y_pred)

0.9109663409337676
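
Picking the threshold by checking accuracy on the test set risks tuning to that particular split. One possible alternative, sketched here under stated assumptions, is to compare candidate thresholds with cross-validation on the training data only. The synthetic `X_demo` and `y_demo` arrays stand in for `x_train` and `y_train`, and the candidate thresholds are arbitrary.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(17)

# Hypothetical stand-in for the training data: non-negative "frequency"
# features whose typical size depends on the class
y_demo = rng.integers(0, 2, size=400)
scale = np.where(y_demo[:, None] == 1, 0.6, 0.2)
X_demo = rng.exponential(scale=scale, size=(400, 10))

thresholds = [0.05, 0.1, 0.25, 0.5, 1.0]
cv_scores = {}
for t in thresholds:
    model = BernoulliNB(binarize=t)
    # Mean accuracy over 5 folds of the training data
    cv_scores[t] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

best_threshold = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("best threshold:", best_threshold)
```

The test set is then used once, at the end, to report the score of the chosen threshold.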