# Naive Bayes Classifier - Email Classification
---
Naive Bayes Classifier is a supervised learning classifier that assigns a class to each sample by applying conditional probability to the features of the given data.

The word `Naive` in `Naive Bayes Classifier` is there because it assumes that, given the class, the presence of a particular feature is unrelated to the presence of any other feature.

The word `Bayes` in `Naive Bayes Classifier` is there because it is based on `Bayes' Theorem`, which uses conditional probability.
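
For reference, the two ideas above can be written out as follows. The symbols are introduced here for illustration: $y$ is the class (for example `spam` or `ham`) and $x_1, \dots, x_n$ are the features (for example word counts).

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

$$P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y) \quad\Rightarrow\quad \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The first line is Bayes' theorem. The second is the naive independence assumption and the resulting decision rule; the denominator is dropped because it is the same for every class.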

`Naive Bayes` has several use cases:
- Spam Filtering
- News Classification
- Emotion Detection
- Face Detection
- Fact Checking

> ## Import the required libraries

In [2]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix  # plot_confusion_matrix was removed in scikit-learn 1.2 and is not used below

> ## Import the Dataset
The `SMS Spam` dataset was downloaded from `Kaggle Datasets`.

In [5]:
data = pd.read_csv('./sms_spam.csv', encoding='latin-1')

In [6]:
data.head()

| | type | text |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


In [7]:
data.isnull().sum()

type    0
text    0
dtype: int64

> ## Change feature names

In [8]:
data.rename(columns={'type': 'label'}, inplace=True)  # 'text' already has the desired name
data.head()

| | label | text |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


> ## Extract Independent and Dependent features

In [12]:
X, y = data.text, data.label

In [13]:
X.head()

0    Go until jurong point, crazy.. Available only ...
1                         Ok lar... Joking wif u oni...
2    Free entry in 2 a wkly comp to win FA Cup fina...
3    U dun say so early hor... U c already then say...
4    Nah I don't think he goes to usf, he lives aro...
Name: text, dtype: object

In [14]:
y.head()

0     ham
1     ham
2    spam
3     ham
4     ham
Name: label, dtype: object

> ## Split data into Train and Test datasets

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=0)
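
As a quick sanity check on the split sizes, the arithmetic behind `test_size = 0.3` can be sketched as below. The total of 5574 rows is an assumption inferred from the rest of this notebook (3901 training rows in the count matrix plus 1673 test predictions in the confusion matrix). When only `test_size` is given as a fraction, scikit-learn rounds the test share up and assigns the remainder to training.

```python
import math

n_samples = 5574  # assumed total: 3901 training rows + 1673 test rows
test_size = 0.3   # same fraction as in the train_test_split call above

# The test share is rounded up; the training set receives the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 3901 1673
```
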

> ## Vectorize the input data

In [16]:
vectorizer = CountVectorizer()
count = vectorizer.fit_transform(X_train.values)

In [17]:
count

<3901x7242 sparse matrix of type '<class 'numpy.int64'>'
	with 51857 stored elements in Compressed Sparse Row format>
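
To make the count matrix less abstract, here is a minimal pure-Python sketch of the bag-of-words idea on three made-up messages. It is not scikit-learn's implementation; `docs`, `tokenize`, and `to_counts` are hypothetical names used only for illustration. The tokenizer lowercases the text and keeps tokens of two or more word characters, which mirrors the default behaviour of `CountVectorizer`.

```python
import re
from collections import Counter

# Toy messages (invented for illustration, not taken from the dataset)
docs = ["Win a free prize now", "Are you free now", "Ok see you later"]

def tokenize(text):
    # Lowercase, then keep tokens made of at least two word characters
    return re.findall(r"\b\w\w+\b", text.lower())

# The vocabulary is learned from the training documents only
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})

def to_counts(text):
    # One column per vocabulary term; words outside the vocabulary are ignored
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

matrix = [to_counts(doc) for doc in docs]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row corresponds to one message and each column to one vocabulary term, which is the same layout as the 3901x7242 matrix above. Calling `to_counts` on a new message that contains unseen words simply drops those words, which is also what happens when the fitted vectorizer transforms test data.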

> ## Fit the Classifier

In [18]:
clf = MultinomialNB()
targets = y_train.values
clf.fit(count, targets)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
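
The following is a simplified from-scratch sketch of what fitting a multinomial Naive Bayes model involves: estimating class priors and smoothed per-class word probabilities. It is an illustration under stated assumptions, not scikit-learn's code. The toy data and the function names `train_multinomial_nb` and `predict` are hypothetical, and `alpha=1.0` matches the Laplace smoothing value shown in the output above.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized docs."""
    vocab = sorted({w for tokens in docs for w in tokens})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Additive smoothing keeps unseen (word, class) pairs from getting probability 0
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood

def predict(tokens, log_prior, log_likelihood):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy training data, invented for illustration
train_docs = [
    ["win", "free", "prize"],
    ["free", "entry", "win"],
    ["see", "you", "later"],
    ["ok", "see", "you"],
    ["are", "you", "free"],
]
train_labels = ["spam", "spam", "ham", "ham", "ham"]

log_prior, log_likelihood = train_multinomial_nb(train_docs, train_labels)
print(predict(["free", "prize", "win"], log_prior, log_likelihood))
print(predict(["see", "you", "later"], log_prior, log_likelihood))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for long messages.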

> ## Model Predictions

In [19]:
y_pred = clf.predict(vectorizer.transform(X_test))

> ## Evaluate the metrics

In [20]:
confusion_matrix(y_test, y_pred)

array([[1425,    6],
       [  23,  219]])
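
The confusion matrix can be unpacked into per-class metrics. The sketch below assumes the default `confusion_matrix` layout, where rows are true labels, columns are predicted labels, and labels are sorted (`ham` first, then `spam`), so `spam` is treated as the positive class.

```python
# Values copied from the confusion matrix above
cm = [[1425, 6],
      [23, 219]]

tn, fp = cm[0]  # true ham:  predicted ham, predicted spam
fn, tp = cm[1]  # true spam: predicted ham, predicted spam

precision = tp / (tp + fp)  # share of predicted spam that really is spam, about 0.973
recall = tp / (tp + fn)     # share of actual spam that was caught, about 0.905
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Under this reading, only 6 legitimate messages were flagged as spam, while 23 spam messages slipped through as `ham`.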

In [25]:
accuracy_score(y_test, y_pred)

0.9826658696951583
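
The accuracy can be cross-checked against the confusion matrix, and it helps to compare it with a trivial baseline. The sketch below assumes the same matrix layout as `confusion_matrix` uses by default (rows are true labels, `ham` before `spam`); the "always predict ham" baseline is added here for illustration only.

```python
# Counts copied from the confusion matrix reported earlier
tn, fp, fn, tp = 1425, 6, 23, 219
total = tn + fp + fn + tp

# Accuracy is the share of correct predictions (the matrix diagonal)
accuracy = (tn + tp) / total

# Baseline: a classifier that labels every message as ham
baseline = (tn + fp) / total

print(total, round(accuracy, 4), round(baseline, 4))  # 1673 0.9827 0.8553
```

Because roughly 86% of the test messages are `ham`, accuracy alone overstates how hard the problem is, so precision and recall for the `spam` class are worth reporting alongside it.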