# Hands-On Exercise 6.3:
# Working with Naïve Bayes in Python
***

## Objectives

#### In this exercise, you will train a Naive Bayes model on unstructured text data in Python. You will predict a target variable (whether an SMS message is spam or ham) from the text of the message. The goal is to show how a Naive Bayes model trained on an existing data set can be used to predict unknown values.

### Overview

You will work with a data set called sms_spam that you will import from a CSV file. You will:<br>
● Preprocess the unstructured data into a format suitable for Naive Bayes<br>
● Examine the target variable and separate it from the predictor variable<br>
● Train a Naive Bayes model that can be used to make future predictions<br><br>

1. ❏ Import the **CountVectorizer** class from the **sklearn.feature_extraction.text** module

In [15]:
from sklearn.feature_extraction.text import CountVectorizer

2. ❏ Import the **MultinomialNB** class from the **sklearn.naive_bayes** module

In [16]:
from sklearn.naive_bayes import MultinomialNB

3. ❏ Import the **pandas** library and use the **read_csv()** function to import the **sms_spam.csv** dataset

In [17]:
import pandas as pd
sms = pd.read_csv('sms_spam.csv')

4. ❏ Count the unique values in the target variable **type**

In [18]:
sms['type'].value_counts()

ham     4827
spam     747
Name: type, dtype: int64

5. ❏ Examine the proportions in the target variable

In [19]:
100 * sms['type'].value_counts() / len(sms['type'])

ham     86.598493
spam    13.401507
Name: type, dtype: float64
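
These proportions also set a baseline: a model that always predicts ham would already be right about 86.6% of the time, so accuracy alone says little on this data set. The following minimal sketch reproduces the percentages from the counts in step 4 using only the standard library. The variable names are illustrative and not part of the exercise.

```python
from collections import Counter

# Class counts copied from the value_counts() output in step 4.
counts = Counter({"ham": 4827, "spam": 747})
total = sum(counts.values())

# Percentage of each class, mirroring 100 * value_counts() / len(...).
percentages = {label: 100 * n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

print(percentages)
print(majority_label, round(baseline_accuracy, 3))
```

Any trained model should be judged against this baseline, which is why the later steps look at per-class precision and recall rather than accuracy only.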

6. ❏ Separate the **text** variable into a pandas Series called **Pred**, and the **type** variable into a Series called **target**

In [20]:
Pred = sms['text']
target = sms['type']

7. ❏ Split the dataset into training and test datasets using the **train_test_split()** function<br><br>
*Hint: train_test_split will need to be imported from the sklearn.model_selection module*

In [21]:
from sklearn.model_selection import train_test_split
Pred_train, Pred_test, target_train, target_test = train_test_split(Pred, target, test_size=0.1, random_state=0)
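
To make the effect of test_size and random_state concrete, here is a hand-rolled sketch of a seeded shuffle split. It is not scikit-learn's implementation, and the function name `shuffle_split` is hypothetical. Rounding the test count up is an assumption chosen to match the 558 test rows that appear in the evaluation later in the exercise.

```python
import math
import random

def shuffle_split(items, test_size=0.1, seed=0):
    """Shuffle indices with a fixed seed, then hold out the first n_test items."""
    # Assumption: the test count is rounded up, e.g. 0.1 * 5574 -> 558.
    n_test = math.ceil(test_size * len(items))
    indices = list(range(len(items)))
    # A fixed seed makes the shuffle, and therefore the split, repeatable.
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [items[i] for i in train_idx], [items[i] for i in test_idx]

# Stand-in for the 5574 messages in sms_spam.csv.
messages = list(range(5574))
train, test = shuffle_split(messages, test_size=0.1, seed=0)
print(len(train), len(test))
```

Running the split twice with the same seed gives the same partition, which is the role random_state plays in the cell above.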

8. ❏ Use **CountVectorizer()** and its **fit_transform()** method to produce a document-term matrix (one row per message, one column per token)

In [22]:
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(Pred_train.values)
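
As a rough illustration of what this step produces, the sketch below builds a small dense count matrix by hand. It lowercases the text and keeps tokens of two or more word characters, which mirrors CountVectorizer's default settings, but it is a simplified stand-in: the real class returns a sparse matrix and offers many more options. The helper names are hypothetical.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters, so "a" or "I" would be dropped.
TOKEN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

def fit_transform(docs):
    """Learn a sorted vocabulary, then count each token per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows

docs = ["Free prize now", "Call me now, now!"]
vocab, matrix = fit_transform(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row corresponds to one message and each column to one vocabulary term, so the second row records that "now" occurs twice.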

9. ❏ Assign the target values from the training dataset into a variable called **targets**

In [23]:
targets = target_train.values

10. ❏ Train the Naive Bayes model by creating a **MultinomialNB()** classifier and fitting it to the matrix of token counts from step 8 and the **targets** variable

In [24]:
classifier = MultinomialNB()

classifier.fit(counts, targets)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
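
The output above shows alpha=1.0 and fit_prior=True, meaning additive (Laplace) smoothing and class priors estimated from the training labels. The following sketch shows the idea behind such a model on pre-tokenized toy data. It is a hand-rolled illustration under those two assumptions, not scikit-learn's code, and the function names and example messages are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Return log priors and smoothed per-class log likelihoods for each word."""
    vocab = sorted({w for tokens in docs for w in tokens})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    # Priors are the observed class frequencies (the fit_prior=True behaviour).
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Adding alpha to every count keeps unseen word/class pairs from scoring zero.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like

def predict_nb(tokens, log_prior, log_like):
    """Pick the class with the highest log prior plus summed word log likelihoods."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(log_like[c][w] for w in tokens if w in log_like[c])
    return max(scores, key=scores.get)

docs = [["free", "prize", "now"], ["win", "free", "cash"],
        ["see", "you", "now"], ["call", "me", "later"]]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_like = train_nb(docs, labels)
print(predict_nb(["free", "cash", "prize"], log_prior, log_like))
print(predict_nb(["call", "you", "later"], log_prior, log_like))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.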

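11. ❏ Vectorize the predictor data in the test dataset into token counts<br><br>
*Hint: Reuse the fitted vectorizer object from step 8 and call transform() rather than fit_transform(), so that the test data is encoded with the vocabulary learned from the training data*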

In [25]:
test_count = vectorizer.transform(Pred_test)
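
The reason for calling transform() here is that the test messages must be mapped onto the same columns the classifier was trained on. The sketch below illustrates this with a fixed, hypothetical vocabulary: tokens the vocabulary does not contain are simply dropped. It is a simplified stand-in for the fitted vectorizer, not its actual implementation.

```python
import re

def transform(text, vocab_index):
    """Count tokens against a fixed vocabulary; unknown tokens are ignored."""
    row = [0] * len(vocab_index)
    for tok in re.findall(r"\b\w\w+\b", text.lower()):
        col = vocab_index.get(tok)
        if col is not None:
            row[col] += 1
    return row

# Hypothetical vocabulary learned from training data (term -> column index).
vocab_index = {"call": 0, "free": 1, "me": 2, "now": 3, "prize": 4}

row = transform("Claim your FREE prize now", vocab_index)
print(row)
```

The words "claim" and "your" do not appear in the vocabulary, so they contribute nothing, and the row keeps the same width as the training matrix.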

12. ❏ Use the trained model to make predictions for the vectorized test dataset, and display them

In [26]:
predictions = classifier.predict(test_count)
predictions

array(['ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'spam', 'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'spam', 'ham', 'ham', 'spam', 'spam', 'spam', 'ham',
       'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'spam', 'ham',
       'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'spam', 'ham', 'spam', 'ham', 'ham', 'ham', 'spam',
       'ham', 'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'spam', 'ham', 'ham',
       'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'ham', 'spam', 'ham', 'ham', 'spam', 'ham', 'ham',
       'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
       'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
   

13. ❏ Evaluate the model using the **classification_report()** function<br><br>
*Hint: This will need to be imported from sklearn.metrics*

In [27]:
from sklearn.metrics import classification_report
print(classification_report(target_test, predictions))

              precision    recall  f1-score   support

         ham       0.98      0.99      0.99       469
        spam       0.96      0.90      0.93        89

    accuracy                           0.98       558
   macro avg       0.97      0.95      0.96       558
weighted avg       0.98      0.98      0.98       558



14. ❏ Evaluate the model using the **confusion_matrix()** function<br><br>
*Hint: This will need to be imported from sklearn.metrics*

In [28]:
from sklearn.metrics import confusion_matrix
confusion_matrix(target_test, predictions)

array([[466,   3],
       [  9,  80]], dtype=int64)
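
The confusion matrix and the classification report describe the same predictions. As a check, this minimal sketch recomputes precision, recall, F1 and accuracy from the four counts above, assuming rows are actual classes and columns are predicted classes in the order ham, spam. The function name `per_class_metrics` is illustrative.

```python
# Counts copied from the confusion matrix output above.
labels = ["ham", "spam"]
matrix = [[466, 3],
          [9, 80]]

def per_class_metrics(matrix, labels):
    """Derive precision, recall and F1 for each class from a confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column total
        actual = sum(matrix[k])                    # row total
        precision = true_pos / predicted
        recall = true_pos / actual
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

metrics = per_class_metrics(matrix, labels)
correct = sum(matrix[k][k] for k in range(len(labels)))
accuracy = correct / sum(sum(row) for row in matrix)

for label, m in metrics.items():
    print(label, {name: round(value, 2) for name, value in m.items()})
print("accuracy", round(accuracy, 2))
```

The 9 spam messages classified as ham are what pull spam recall down to 0.90, while only 3 ham messages were flagged as spam.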

## <center>**Congratulations! You have completed the exercise.**</center>


# <center>**This is the end of the exercise.**</center>