# Bayesian Network - Documentation

## About Multinomial Naive Bayes (implemented here)

- One of the **popular supervised learning classifiers** used in the analysis of **text** data.

- It is a **probabilistic learning** method widely used in Natural Language Processing (**NLP**).

- It applies Bayes' theorem to predict the **label** of the text given as input.

<hr>

- Naive Bayes is easier to understand once **Bayes' theorem** is known:

**P(A|B) = [P(A) * P(B|A)] / P(B)**

- In words: the probability of A given B equals the probability of A times the probability of B given A, divided by the probability of B.

- Training reduces to counting and computing probabilities, so the computational cost is low. Accuracy is often good for text classification, despite the "naive" assumption that features are independent given the class.

- As a result, it scales to large datasets with ease. (*Here it is implemented on a small scale.*)

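- The multinomial variant expects discrete features such as word counts; continuous data is usually handled by other variants such as Gaussian Naive Bayes.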
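
To make the formula concrete, here is a minimal worked example. The probabilities are hypothetical numbers chosen for illustration (they are not taken from the dataset used below), and `posterior` is an illustrative helper name.

```python
def posterior(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior_a * likelihood_b_given_a / evidence_b


# Hypothetical numbers: 40% of messages are positive, and the word "love"
# appears in 30% of positive messages and 5% of negative ones.
p_pos = 0.4
p_love_given_pos = 0.30
p_love_given_neg = 0.05

# The law of total probability gives the evidence P(love).
p_love = p_pos * p_love_given_pos + (1 - p_pos) * p_love_given_neg

# Probability that a message is positive given that it contains "love".
p_pos_given_love = posterior(p_pos, p_love_given_pos, p_love)
print(round(p_pos_given_love, 3))
```

Under these assumed numbers, seeing the word "love" raises the probability of a positive message from the prior of 0.4 to a noticeably higher posterior.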

## Importing Necessary Libraries

- ```Pandas:``` Loading and manipulating tabular data
- ```Sklearn:``` Library providing the ML algorithm, text vectorizer, and metrics used here

**NOTE:** *Other Required Libraries are imported as and when Required.*

In [21]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

## Importing Dataset

```Dataset:``` A small, manually created dataset of short sentences, each labelled pos or neg.

In [22]:
msg = pd.read_csv("naivetext.csv", names=["message", "label"])
print("The dimensions of the dataset", msg.shape)

The dimensions of the dataset (18, 2)
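
Passing ```names=``` tells pandas that the file has no header row, so the first line is read as data. A minimal sketch of that behaviour, using a hypothetical two-row in-memory stand-in for the CSV file:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for naivetext.csv (no header row).
csv_text = "I love this sandwich,pos\nMy boss is horrible,neg\n"

# With names= given and no header= argument, the first line is treated as data.
sample = pd.read_csv(io.StringIO(csv_text), names=["message", "label"])
print(sample.shape)
print(sample.columns.tolist())
```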


In [23]:
msg

| | message | label |
|---|---|---|
| 0 | I love this sandwich | pos |
| 1 | This is an amazing place | pos |
| 2 | I feel very good about these beers | pos |
| 3 | This is my best work | pos |
| 4 | What an awesome view | pos |
| 5 | I do not like this restaurant | neg |
| 6 | I am tired of this stuff | neg |
| 7 | I can't deal with this | neg |
| 8 | He is my sworn enemy | neg |
| 9 | My boss is horrible | neg |


## Splitting the Dataset

**Encoding the Target Column: LABEL**
> **Positive** Sentence: *1*

> **Negative** Sentence: *0*

*The numeric values are arbitrary, but use the same encoding throughout for clarity.*

In [24]:
msg["labelnum"] = msg.label.map({"pos": 1, "neg": 0})
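
One thing to watch with ```map```: any label that is missing from the dictionary silently becomes NaN instead of raising an error. A minimal sketch of a check for this, using a hypothetical series with a deliberate typo:

```python
import pandas as pd

# Hypothetical labels; "Pos" is a deliberate typo that the mapping does not cover.
labels = pd.Series(["pos", "neg", "Pos", "neg"])
encoded = labels.map({"pos": 1, "neg": 0})

# Values missing from the dictionary become NaN rather than raising an error.
unmapped = labels[encoded.isna()]
print(unmapped.tolist())
```

Running a check like this before training helps catch inconsistent label spellings early.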

**Splitting the DataFrame into Independent and Dependent Variables.**

> **X** --> *Independent*

> **y** --> *Dependent*

In [25]:
X = msg['message']
y = msg['labelnum']

**Splitting the dataset into training and test sets, and displaying the size of each**

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Feature (X) shapes: test set, then training set
print(X_test.shape)
print(X_train.shape)


# Label (y) shapes: test set, then training set
print(y_test.shape)
print(y_train.shape)

# Display Input-Training Data
print('-'*45)
print("\t\tTrain Data")
print('-'*45)
print(X_train)

(5,)
(13,)
(5,)
(13,)
---------------------------------------------
		Train Data
---------------------------------------------
5          I do not like this restaurant
12                       I love to dance
17      I went to my enemy's house today
9                    My boss is horrible
2     I feel very good about these beers
6               I am tired of this stuff
0                   I love this sandwich
14                  What a great holiday
15        That is a bad locality to stay
13     I am sick and tired of this place
16        We will have good fun tomorrow
7                 I can't deal with this
1               This is an amazing place
Name: message, dtype: object
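
Because ```train_test_split``` shuffles the rows, the split above differs from run to run. A minimal sketch of a reproducible, class-balanced split follows; the data is a hypothetical stand-in of 18 rows, and the choice of ```random_state=42``` is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 18 messages, half positive (1) and half negative (0).
X_demo = pd.Series([f"message {i}" for i in range(18)])
y_demo = pd.Series([1] * 9 + [0] * 9)

# random_state fixes the shuffle; stratify keeps both classes in each subset.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)
```

With a dataset this small, stratifying helps avoid a test set that contains only one class, which would make recall or precision undefined.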


## Preprocessing Text Data (using basic NLP)

```CountVectorizer()``` A scikit-learn class that converts text into a matrix of token counts (a bag-of-words representation), widely used in NLP.

In [27]:
# Text Vectorizer
count_vect = CountVectorizer()

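# Learn the vocabulary from the training data and build its document-term matrix
X_train_dtm = count_vect.fit_transform(X_train)
# Reuse the same vocabulary to encode the test data
X_test_dtm = count_vect.transform(X_test)

# Displaying the feature names (vocabulary) after vectorisation
# get_feature_names_out() requires scikit-learn >= 1.0
feature_names = count_vect.get_feature_names_out().tolist()
print(feature_names)

# Creating a DataFrame of the per-document token counts
df = pd.DataFrame(X_train_dtm.toarray(), columns=feature_names)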
print(df)
print(X_train_dtm)

['about', 'am', 'amazing', 'an', 'and', 'bad', 'beers', 'boss', 'can', 'dance', 'deal', 'do', 'enemy', 'feel', 'fun', 'good', 'great', 'have', 'holiday', 'horrible', 'house', 'is', 'like', 'locality', 'love', 'my', 'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stay', 'stuff', 'that', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'we', 'went', 'what', 'will', 'with']
    about  am  amazing  an  and  bad  beers  boss  can  dance  ...  tired  to  \
0       0   0        0   0    0    0      0     0    0      0  ...      0   0   
1       0   0        0   0    0    0      0     0    0      1  ...      0   1   
2       0   0        0   0    0    0      0     0    0      0  ...      0   1   
3       0   0        0   0    0    0      0     1    0      0  ...      0   0   
4       1   0        0   0    0    0      1     0    0      0  ...      0   0   
5       0   1        0   0    0    0      0     0    0      0  ...      1   0   
6       0   0        0   0    0    0  

## Multinomial Naive Bayes Model

### Building the Bayesian Network: Multinomial Naive Bayes Model

> Available in Sklearn Library

> Link to <a href="https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html">Docs.</a>

In [28]:
clf = MultinomialNB().fit(X_train_dtm, y_train)
predicted = clf.predict(X_test_dtm)
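
To show what the fitted model computes, the sketch below reproduces the multinomial Naive Bayes decision rule by hand with Laplace smoothing and compares it with scikit-learn. The count matrix and labels are hypothetical, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 vocabulary terms.
X_counts = np.array([[2, 1, 0],
                     [1, 0, 0],
                     [0, 1, 2],
                     [0, 0, 3]])
y_labels = np.array([1, 1, 0, 0])

alpha = 1.0  # Laplace smoothing, the MultinomialNB default
classes = np.unique(y_labels)

# log P(class): fraction of training documents in each class.
log_prior = np.log(np.array([(y_labels == c).mean() for c in classes]))

# log P(term | class): smoothed share of each term among the class's tokens.
term_totals = np.array([X_counts[y_labels == c].sum(axis=0) for c in classes])
smoothed = term_totals + alpha
log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# Score a new document: prior plus count-weighted term log-likelihoods.
x_new = np.array([1, 1, 0])
manual_pred = classes[np.argmax(log_prior + log_likelihood @ x_new)]

model = MultinomialNB(alpha=alpha).fit(X_counts, y_labels)
print(manual_pred, model.predict([x_new])[0])
```

The smoothing term keeps a word that never appeared in a class from forcing that class's probability to zero.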

### Metrics of the Model

In [29]:
print("-------------Metrics--------------")

print("Accuracy of the classifier is", metrics.accuracy_score(y_test, predicted)*100)

print("Confusion Matrix")
print(metrics.confusion_matrix(y_test, predicted))

print("Recall Score: ", end='')
print(metrics.recall_score(y_test, predicted))

print('Precision: ',metrics.precision_score(y_test, predicted)*100)

-------------Metrics--------------
Accuracy of the classifier is 80.0
Confusion Matrix
[[2 0]
 [1 2]]
Recall Score: 0.6666666666666666
Precision:  100.0
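
The three scores can be recomputed by hand from the confusion matrix shown in the output above. This sketch assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, both ordered 0 then 1.

```python
import numpy as np

# Confusion matrix from the output above: rows = true labels, columns = predictions.
cm = np.array([[2, 0],
               [1, 2]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions / all predictions
recall = tp / (tp + fn)           # positives found / all actual positives
precision = tp / (tp + fp)        # positives found / all predicted positives
print(accuracy, recall, precision)
```

One positive sentence was classified as negative, which lowers recall while precision stays perfect.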


## Prediction

*Check how the model behaves on new, unseen data.*

In [30]:
docs_new = ['I like this place']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
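# Print the predicted class itself (1 = pos, 0 = neg); msg.labelnum[category]
# would instead look up the label stored in row 0 or row 1 of the dataset
for doc, category in zip(docs_new, predictednew):
    print('%s || %s' % (doc, category))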

I like this place || 1
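
To report a readable label and the model's confidence instead of a bare class number, something like the following sketch could be used. The four training sentences are a hypothetical subset chosen for illustration, and ```label_names``` is an illustrative helper dictionary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical four-message training set, labelled 1 (pos) or 0 (neg).
train_docs = ["I love this sandwich", "What an awesome view",
              "My boss is horrible", "I am tired of this stuff"]
train_labels = [1, 1, 0, 0]

vec = CountVectorizer()
model = MultinomialNB().fit(vec.fit_transform(train_docs), train_labels)

label_names = {1: "pos", 0: "neg"}
new_docs = ["What an awesome sandwich"]
new_counts = vec.transform(new_docs)

# predict_proba columns follow model.classes_, here [0, 1].
for doc, pred, proba in zip(new_docs, model.predict(new_counts),
                            model.predict_proba(new_counts)):
    print(f"{doc} || {label_names[int(pred)]} || P(pos) = {proba[1]:.2f}")
```

Words that were never seen during training are ignored by ```transform```, so predictions on text with mostly unseen words fall back towards the class priors.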
