# Naive Bayes Classifier

* Naive Bayes is a supervised learning algorithm based on Bayes' theorem and used for classification problems.
* It is widely used in text classification, where the training data is high-dimensional.
* It is a probabilistic classifier: it predicts the class with the highest estimated probability given the input features.
* Types of Naive Bayes classifier:
  1) Gaussian Naive Bayes - best for continuous data
  2) Multinomial Naive Bayes - best for discrete count data (e.g. word counts)
  3) Bernoulli Naive Bayes - best for binary data (0/1 or True/False)
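
To make the "probabilistic" part concrete, here is a minimal sketch of the calculation a Naive Bayes classifier performs: multiply a class prior by per-feature likelihoods (treating the features as conditionally independent given the class), then normalize. The priors and word likelihoods are made-up numbers for illustration only; they are not taken from any dataset in this notebook, and `posterior` is a hypothetical helper name.

```python
# Hypothetical priors and per-word likelihoods, chosen only for illustration.
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": {"free": 0.30, "offer": 0.20},
    "ham": {"free": 0.02, "offer": 0.01},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        # Unnormalized score: P(class) * product of P(word | class)
        score = prior
        for word in words:
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the class probabilities sum to 1
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "offer"])
print(result)
```

With these assumed numbers the "spam" class wins by a wide margin, because both words are far more likely under spam than under ham, even though ham has the larger prior.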

### Gaussian Naive Bayes

In [129]:
import pandas as pd 

df = pd.read_csv("ML Practice Files/Naive Bayes Classifier/titanic.csv")
# Drop identifier-like and unused columns; keep Survived, Pclass, Sex, Age, Fare
df.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked'],axis="columns",inplace=True)

# One-hot encode Sex into 'female' and 'male' columns
dummies = pd.get_dummies(df.Sex)

merged = pd.concat([df,dummies],axis="columns")
merged.drop(['Sex'],axis="columns",inplace=True)
# Fill missing ages with the mean age
merged.Age = merged.Age.fillna(merged.Age.mean())

merged.head()

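|   | Survived | Pclass | Age | Fare | female | male |
|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 22.0 | 7.25 | False | True |
| 1 | 1 | 1 | 38.0 | 71.2833 | True | False |
| 2 | 1 | 3 | 26.0 | 7.925 | True | False |
| 3 | 1 | 1 | 35.0 | 53.1 | True | False |
| 4 | 0 | 3 | 35.0 | 8.05 | False | True |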
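
The two preprocessing steps used above (one-hot encoding and mean imputation) can be seen in isolation on a tiny frame. This is a sketch on invented rows, not the Titanic file; the name `toy` and the `dtype=int` choice (which yields 0/1 instead of booleans) are illustrative assumptions.

```python
import pandas as pd

# Toy rows standing in for the Titanic data (values are invented).
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 38.0],
})

# One-hot encode the categorical column; dtype=int gives 0/1 instead of booleans.
encoded = pd.concat(
    [toy.drop(columns="Sex"), pd.get_dummies(toy["Sex"], dtype=int)],
    axis="columns",
)

# Replace the missing age with the mean of the observed ages: (22 + 38) / 2 = 30.
encoded["Age"] = encoded["Age"].fillna(encoded["Age"].mean())
print(encoded)
```

Note that the `female` and `male` columns are exact complements of each other, so either one alone carries the same information.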


In [130]:
from sklearn.model_selection import train_test_split

x = merged.drop(['Survived'],axis="columns")
y = merged.Survived

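# test_size=0.02 holds out only about 2% of the rows, so the score below comes from a very small test set
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.02)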
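
Because `train_test_split` shuffles randomly, each run of the cell above produces a different split and therefore a different score. A minimal sketch of a reproducible, class-balanced split follows; it uses synthetic arrays rather than the Titanic data, and the `random_state=42`, `test_size=0.2`, and `_demo` names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a feature matrix and labels (not the Titanic data).
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 60 + [1] * 40)

# random_state fixes the shuffle; stratify keeps the 60/40 class ratio in both parts.
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr_demo.shape, X_te_demo.shape)
print("share of class 1 in test:", y_te_demo.mean())
```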

In [131]:
from sklearn.naive_bayes import GaussianNB

gaussian_model = GaussianNB()
gaussian_model.fit(x_train,y_train)
gaussian_model.score(x_test,y_test)

0.8888888888888888
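
A single score from a small held-out set (here `test_size=0.02`) can vary a lot between runs. One common way to get a steadier estimate is k-fold cross-validation. The sketch below uses a synthetic dataset from `make_classification` as a stand-in, since the Titanic file is not bundled with scikit-learn; the sample count, feature count, and `cv=5` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data standing in for the real features and labels.
X_syn, y_syn = make_classification(n_samples=300, n_features=5, random_state=0)

# Fit and score GaussianNB on 5 different train/test partitions.
cv_scores = cross_val_score(GaussianNB(), X_syn, y_syn, cv=5)
print(cv_scores)
print("mean accuracy:", cv_scores.mean())
```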

In [132]:
y_test[:10]

125    1
511    0
349    0
57     0
620    0
328    1
687    0
573    1
588    0
98     1
Name: Survived, dtype: int64

In [133]:
gaussian_model.predict(x_test[:10])

array([0, 0, 0, 0, 0, 1, 0, 1, 0, 1])
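
The two outputs above can be compared directly. The sketch below copies the ten actual labels and ten predictions shown in the previous cells and counts where they agree; `confusion_matrix` then breaks the result down by class (rows are actual labels, columns are predicted labels).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Values copied from the y_test[:10] and predict(x_test[:10]) outputs above.
actual = np.array([1, 0, 0, 0, 0, 1, 0, 1, 0, 1])
predicted = np.array([0, 0, 0, 0, 0, 1, 0, 1, 0, 1])

matches = int((actual == predicted).sum())
print(matches, "of", len(actual), "correct")

cm = confusion_matrix(actual, predicted)
print(cm)
```

Only the first passenger (index 125), a survivor predicted as a non-survivor, is misclassified in this sample.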

### Multinomial Naive Bayes (Spam Detection)

In [134]:
import pandas as pd 

df1 = pd.read_csv("ML Practice Files/Naive Bayes Classifier/spam.csv")
df1.head()

|   | Category | Message |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


In [135]:
df1.groupby('Category').describe()

| Category | Message count | Message unique | Message top | Message freq |
|---|---|---|---|---|
| ham | 4825 | 4516 | Sorry, I'll call later | 30 |
| spam | 747 | 641 | Please call our customer service representativ... | 4 |


In [136]:
# Encode the label as a number: 1 for spam, 0 for ham
df1['spam'] = df1['Category'].apply(lambda x: 1 if x == 'spam' else 0)
df1.head()

|   | Category | Message | spam |
|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | 0 |
| 1 | ham | Ok lar... Joking wif u oni... | 0 |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | 1 |
| 3 | ham | U dun say so early hor... U c already then say... | 0 |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | 0 |


In [137]:
from sklearn.model_selection import train_test_split
x_train1, x_test1, y_train1, y_test1 = train_test_split(df1.Message,df1.spam)

In [138]:
from sklearn.feature_extraction.text import CountVectorizer
v = CountVectorizer()
x_train_count = v.fit_transform(x_train1.values)
x_train_count.toarray()[:2]

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [139]:
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train_count,y_train1)

In [140]:
emails = [
    'Hey mohan, can we get together to watch footbal game tomorrow?',
    'Upto 20% discount on parking, exclusive offer just for you. Dont miss this reward!'
]
emails_count = v.transform(emails)
model.predict(emails_count)

array([0, 1])

In [141]:
# Transform the test messages with the vocabulary fitted on the training set (no refitting)
x_test_count = v.transform(x_test1)
model.score(x_test_count, y_test1)

0.9842067480258435
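
To put this accuracy in context, it helps to know what a trivial classifier would score. The dataset is imbalanced (4825 ham vs. 747 spam in the `groupby` output above), so always predicting "ham" is already right most of the time. The arithmetic below uses those counts for the full dataset; the share in any particular test split will differ slightly.

```python
# Class counts taken from the groupby('Category').describe() output above.
ham_count = 4825
spam_count = 747

# Accuracy of a classifier that always predicts "ham".
baseline_accuracy = ham_count / (ham_count + spam_count)
print(round(baseline_accuracy, 4))
```

The model's score should be judged against this baseline of roughly 0.87 rather than against 0.5.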

#### Sklearn pipeline

In [142]:
from sklearn.pipeline import Pipeline

clf = Pipeline([
    ("vectorizer",CountVectorizer()),
    ("model",MultinomialNB())
])

clf.fit(x_train1,y_train1)
clf.score(x_test1,y_test1)

0.9842067480258435

In [143]:
clf.predict(emails)

array([0, 1])
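
A pipeline also exposes `predict_proba`, which returns the estimated class probabilities behind each hard 0/1 prediction. The sketch below trains on four invented messages rather than `spam.csv`, so the `toy_` names and the resulting probabilities are purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny invented corpus: 1 = spam, 0 = ham.
toy_messages = [
    "win a free prize now",
    "free entry claim your reward",
    "are we meeting for lunch today",
    "see you at the game tomorrow",
]
toy_labels = [1, 1, 0, 0]

toy_clf = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("model", MultinomialNB()),
])
toy_clf.fit(toy_messages, toy_labels)

# One row per message; columns follow the order of toy_clf.classes_ (here 0, then 1).
proba = toy_clf.predict_proba(["free prize reward", "lunch tomorrow"])
print(toy_clf.classes_)
print(proba.round(3))
```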

<h2 style="color:green">Exercise</h2>

**Use the wine dataset from sklearn.datasets to classify wines into 3 categories. Load the dataset and split it into train and test sets. Then train Gaussian and Multinomial Naive Bayes classifiers and report which model performs better. Use the trained model to make some predictions on the test data.**

In [144]:
import pandas as pd
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine

wine = load_wine()
print("Directory of wine dataset = ",dir(wine))

wine_df = pd.DataFrame(wine.data,columns=wine.feature_names)
wine_df['target'] = wine.target
print(wine_df.head())

input_train, input_test, target_train, target_test = train_test_split(wine.data,wine.target, test_size=0.2)

model1 = GaussianNB()
model1.fit(input_train,target_train)
print("Score of Gaussian Naive Bayes Classifier = ",model1.score(input_test,target_test))

model2 = MultinomialNB()
model2.fit(input_train,target_train)
print("Score of Multinomial Naive Bayes Classifier = ",model2.score(input_test,target_test))


Directory of wine dataset =  ['DESCR', 'data', 'feature_names', 'frame', 'target', 'target_names']
   alcohol  malic_acid   ash  alcalinity_of_ash  ...   hue  od280/od315_of_diluted_wines  proline  target
0    14.23        1.71  2.43               15.6  ...  1.04                          3.92   1065.0       0
1    13.20        1.78  2.14               11.2  ...  1.05                          3.40   1050.0       0
2    13.16        2.36  2.67               18.6  ...  1.03                          3.17   1185.0       0
3    14.37        1.95  2.50               16.8  ...  0.86                          3.45   1480.0       0
4    13.24        2.59  2.87               21.0  ...  1.04                          2.93    735.0       0

[5 rows x 14 columns]
Score of Gaussian Naive Bayes Classifier =  0.9722222222222222
Score of Multinomial Naive Bayes Classifier =  0.8611111111111112
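
The exercise also asks for predictions on the test data, which the cell above does not show. A minimal sketch of that last step follows, using the Gaussian model since it scored higher in the run above. The `random_state=0` and `stratify` arguments and the variable names are assumptions added for reproducibility, so the split (and score) will not match the output above exactly.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()

# Fixed seed and stratified split so the result is repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=0, stratify=wine.target
)

gnb = GaussianNB().fit(X_tr, y_tr)

# Predict the first five test samples and show class names next to the true labels.
pred = gnb.predict(X_te[:5])
print("predicted:", wine.target_names[pred])
print("actual:   ", wine.target_names[y_te[:5]])
```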
