# Bayesian Network - Documentation

## Importing Necessary Libraries

- ```Pandas:``` Loads the CSV file and holds the data in a DataFrame
- ```Sklearn:``` Provides the train/test split, the text vectorizer, the Naive Bayes classifier, and the evaluation metrics

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

## Importing Dataset

```Dataset:``` A small, hand-written set of 18 short sentences, each labelled *pos* or *neg*.

In [2]:
msg = pd.read_csv("naivetext.csv", names=["message", "label"])
print("The dimensions of the dataset", msg.shape)

The dimensions of the dataset (18, 2)


In [3]:
msg

|   | message | label |
|---|---|---|
| 0 | I love this sandwich | pos |
| 1 | This is an amazing place | pos |
| 2 | I feel very good about these beers | pos |
| 3 | This is my best work | pos |
| 4 | What an awesome view | pos |
| 5 | I do not like this restaurant | neg |
| 6 | I am tired of this stuff | neg |
| 7 | I can't deal with this | neg |
| 8 | He is my sworn enemy | neg |
| 9 | My boss is horrible | neg |


## Splitting the Dataset

**Encoding the Target Column: LABEL**
> **Positive** Sentence: *1*

> **Negative** Sentence: *0*

*The choice of numbers is arbitrary, but the same encoding must be used consistently throughout.*

In [4]:
msg["labelnum"] = msg.label.map({"pos": 1, "neg": 0})

**Splitting DataFrame as Independent and Dependent Variables.**

> **X** --> *Independent*

> **y** --> *Dependent*

In [5]:
X = msg['message']
y = msg['labelnum']

**Splitting the dataset into training and test sets, and printing the size of each**

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Feature (X) shapes: test set first, then training set
print(X_test.shape)
print(X_train.shape)


# Label (y) shapes: test set first, then training set
print(y_test.shape)
print(y_train.shape)

print('-'*45)
print("Train Data")
print('-'*45)
print(X_train)

(5,)
(13,)
(5,)
(13,)
---------------------------------------------
Train Data
---------------------------------------------
17      I went to my enemy's house today
13     I am sick and tired of this place
0                   I love this sandwich
15        That is a bad locality to stay
16        We will have good fun tomorrow
6               I am tired of this stuff
2     I feel very good about these beers
9                    My boss is horrible
3                   This is my best work
12                       I love to dance
14                  What a great holiday
7                 I can't deal with this
10              This is an awesome place
Name: message, dtype: object
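The 13/5 split above follows from the default test fraction of 0.25, rounded up to a whole number of rows. The sketch below only illustrates that arithmetic and how a fixed seed makes the shuffle repeatable. It is not scikit-learn's implementation, and `shuffle_split` and its `seed` parameter are hypothetical names used for illustration.

```python
import math
import random

def shuffle_split(items, test_size=0.25, seed=42):
    """Shuffle indices with a fixed seed, then cut off the first n_test as the test set."""
    rng = random.Random(seed)          # a fixed seed makes the shuffle repeatable
    indices = list(range(len(items)))
    rng.shuffle(indices)
    n_test = math.ceil(test_size * len(items))   # 0.25 * 18 = 4.5, rounded up to 5
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [items[i] for i in train_idx], [items[i] for i in test_idx]

rows = list(range(18))                 # stand-in for the 18 rows of the dataset
train, test = shuffle_split(rows)
print(len(train), len(test))           # 13 5
```

With ```train_test_split``` itself, passing ```random_state``` plays the role of the seed, so the same rows land in the training set on every run.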


## Preprocessing Text Data (using basic NLP)

```CountVectorizer()``` A scikit-learn class that converts text into a matrix of token counts (a bag-of-words representation).

In [7]:
count_vect = CountVectorizer()

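# Learn the vocabulary from the training data and vectorise it;
# the test data is transformed with that same vocabulary
X_train_dtm = count_vect.fit_transform(X_train)
X_test_dtm = count_vect.transform(X_test)

# Printing the feature names (the vocabulary) learned during vectorisation
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = count_vect.get_feature_names_out().tolist()
print(feature_names)

# Converting the document-term matrix into a DataFrame
df = pd.DataFrame(X_train_dtm.toarray(), columns=feature_names)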

print(df)

print(X_train_dtm)

['about', 'am', 'an', 'and', 'awesome', 'bad', 'beers', 'best', 'boss', 'can', 'dance', 'deal', 'enemy', 'feel', 'fun', 'good', 'great', 'have', 'holiday', 'horrible', 'house', 'is', 'locality', 'love', 'my', 'of', 'place', 'sandwich', 'sick', 'stay', 'stuff', 'that', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'we', 'went', 'what', 'will', 'with', 'work']
    about  am  an  and  awesome  bad  beers  best  boss  can  ...  to  today  \
0       0   0   0    0        0    0      0     0     0    0  ...   1      1   
1       0   1   0    1        0    0      0     0     0    0  ...   0      0   
2       0   0   0    0        0    0      0     0     0    0  ...   0      0   
3       0   0   0    0        0    1      0     0     0    0  ...   1      0   
4       0   0   0    0        0    0      0     0     0    0  ...   0      0   
5       0   1   0    0        0    0      0     0     0    0  ...   0      0   
6       1   0   0    0        0    0      1     0     0    0  ..
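The vocabulary printed above contains 'can' but neither 'i' nor 'a'. That is a result of the default tokenisation, which lowercases the text and keeps only tokens of two or more word characters. A minimal pure-Python sketch of the same idea follows. The regular expression is the default ```token_pattern``` of ```CountVectorizer```, while the helper names `tokenize`, `build_vocabulary` and `to_count_vector` are hypothetical.

```python
import re
from collections import Counter

# Same pattern as CountVectorizer's default token_pattern: 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return TOKEN_RE.findall(text.lower())

def build_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def to_count_vector(doc, vocabulary):
    """Count known tokens in doc; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    row = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            row[vocabulary[tok]] = n
    return row

docs = ["I can't deal with this", "What a great holiday"]
vocab = build_vocabulary(docs)
print(sorted(vocab))
# ['can', 'deal', 'great', 'holiday', 'this', 'what', 'with']
print(to_count_vector("I like this cake!!", vocab))
# [0, 0, 0, 0, 1, 0, 0]
```

Words that were never seen during fitting ('like' and 'cake' here) contribute nothing to the vector, which is why unseen text should be passed to ```transform``` rather than ```fit_transform```.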

## Multinomial Naive Bayes Model

### Building the Bayesian Network: Multinomial Naive Bayes Model

> Available in Sklearn Library

In [8]:
clf = MultinomialNB().fit(X_train_dtm, y_train)
predicted = clf.predict(X_test_dtm)
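
To make the fitting step less of a black box, here is a rough pure-Python sketch of what a multinomial Naive Bayes classifier estimates: a log prior per class and Laplace-smoothed log likelihoods per word. It assumes a smoothing value of 1.0, the default ```alpha``` of ```MultinomialNB```. The function names and the four-sentence toy data are hypothetical, and this is not the library's implementation.

```python
import math
from collections import Counter

def train_multinomial_nb(token_lists, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods for each class."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    classes = sorted(set(labels))
    log_prior, log_likelihood = {}, {}
    for c in classes:
        docs_c = [toks for toks, y in zip(token_lists, labels) if y == c]
        log_prior[c] = math.log(len(docs_c) / len(labels))
        counts = Counter(tok for toks in docs_c for tok in toks)
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((counts[tok] + alpha) / total) for tok in vocab
        }
    return classes, log_prior, log_likelihood

def predict_label(tokens, model):
    """Return the class with the highest log posterior; unknown tokens are skipped."""
    classes, log_prior, log_likelihood = model
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][t] for t in tokens if t in log_likelihood[c])
        for c in classes
    }
    return max(classes, key=lambda c: scores[c])

# Toy training data (already tokenised); 1 = positive, 0 = negative
train_tokens = [
    ["love", "this", "sandwich"],
    ["this", "is", "my", "best", "work"],
    ["my", "boss", "is", "horrible"],
    ["am", "tired", "of", "this", "stuff"],
]
train_labels = [1, 1, 0, 0]

model = train_multinomial_nb(train_tokens, train_labels)
print(predict_label(["love", "this", "work"], model))   # 1
print(predict_label(["boss", "horrible"], model))       # 0
```

Smoothing matters on a dataset this small: without it, a single word that never appeared in one class would force that class's probability to zero.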

### Displaying the Metrics of the Model

In [9]:
print("Accuracy metrics")
print("Accuracy of the classifier is", metrics.accuracy_score(y_test, predicted)*100)
print("Confusion matrix")
print(metrics.confusion_matrix(y_test, predicted))
print("Recall and precision")
print(metrics.recall_score(y_test, predicted))
print('Precision: ', metrics.precision_score(y_test, predicted)*100)

Accuracy metrics
Accuracy of the classifier is 80.0
Confusion matrix
[[2 1]
 [0 2]]
Recall and precision
1.0
Precision:  66.66666666666666
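
The reported scores can be checked by hand from the confusion matrix. In scikit-learn's layout, rows are the true labels and columns are the predicted labels, in the order 0 then 1. The short sketch below redoes the arithmetic on the matrix printed above; the variable names are only illustrative.

```python
# Confusion matrix from the output above: rows = true label, columns = predicted label
cm = [[2, 1],
      [0, 2]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all test sentences classified correctly
precision = tp / (tp + fp)                   # share of predicted positives that are truly positive
recall = tp / (tp + fn)                      # share of true positives that were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.80 precision=0.67 recall=1.00
```

With only five test sentences, a single misclassification shifts accuracy by 20 percentage points, so these figures are a rough indication at best.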


## Prediction

*Use the trained model to classify a new, unseen sentence.*

In [12]:
docs_new = ['I like this cake!!']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
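    # category is already the predicted label number (1 = pos, 0 = neg);
    # indexing msg.labelnum with it would look up a dataset row instead
    print('%s || %s' % (doc, category))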

I like this cake!! || 1
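
To show the label name instead of its numeric code, the encoding used earlier can be inverted. The sketch below assumes the same mapping ({"pos": 1, "neg": 0}) and takes the predicted value 1 from the output above. `num_to_label` and `describe` are hypothetical names.

```python
# The encoding applied to the label column, and its inverse
label_to_num = {"pos": 1, "neg": 0}
num_to_label = {num: name for name, num in label_to_num.items()}

def describe(doc, predicted_num):
    """Format a document next to the name of its predicted label."""
    return "%s || %s" % (doc, num_to_label[predicted_num])

print(describe("I like this cake!!", 1))
# I like this cake!! || pos
```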


**This is only one of the Bayesian models; many others are available in ```Sklearn``` and other libraries.**