# Classification Using a Bayesian Classifier

### Bayesian Classifiers
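These classifiers are probabilistic classifiers based on Bayes' theorem: they estimate the posterior probability of each class given the input features and predict the most probable class. Naïve Bayes classifiers in particular scale well, because the number of parameters grows linearly with the number of features, so they are often used when the dimensionality of the inputs is high.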

### Types
1. Naïve Bayes
2. Bayesian Belief Network
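
For reference, the rule behind the first type can be written out. In the notation below (introduced here, not taken from the notebook), $y$ is a class label and $x_1, \dots, x_n$ are the input features. Bayes' theorem gives the posterior:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Naïve Bayes additionally assumes that the features are conditionally independent given the class, which reduces the prediction to:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

A Bayesian belief network relaxes this assumption by modelling dependencies between features explicitly.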

### Problem Statement

UCI dataset: Skin Segmentation Data Set (https://archive.ics.uci.edu/ml/machine-learning-databases/00229/).
The Skin Segmentation dataset is defined over the B, G, R color space. The skin and non-skin samples were generated from skin textures of face images of people of diverse age, gender, and race. Each record holds one B, G, R triple and a class label (1 for skin, 2 for non-skin). The task is to decide whether a given B, G, R combination is a skin color.


In [26]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

The following code reads a text file into a DataFrame. Modify it so that the B, G, R values and the class label are split into separate columns, named as specified below:

```
Column No.   Expected Column Name
1            BLUE
2            GREEN
3            RED
4            RESULT     
```

In [27]:
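# The UCI file is tab-separated, so the delimiter must be given explicitly.
# Caution: the file has no header line, so with the default header=0 the first
# record is consumed as column names; pass header=None to keep it.
df = pd.read_csv(filepath_or_buffer='data/Skin_NonSkin.txt', sep='\t')
df.columns = ['BLUE', 'GREEN', 'RED', 'RESULT']
df.head(10)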

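|   | BLUE | GREEN | RED | RESULT |
|---|------|-------|-----|--------|
| 0 | 73 | 84 | 122 | 1 |
| 1 | 72 | 83 | 121 | 1 |
| 2 | 70 | 81 | 119 | 1 |
| 3 | 70 | 81 | 119 | 1 |
| 4 | 69 | 80 | 118 | 1 |
| 5 | 70 | 81 | 119 | 1 |
| 6 | 70 | 81 | 119 | 1 |
| 7 | 76 | 87 | 125 | 1 |
| 8 | 76 | 87 | 125 | 1 |
| 9 | 77 | 88 | 126 | 1 |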
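
The raw UCI file is tab-separated and has no header line, so both the separator and the header handling can be stated explicitly in the `read_csv` call. A minimal sketch on an in-memory sample (the three rows and the name `df_sample` are illustrative, not read from the dataset file):

```python
import io

import pandas as pd

# Three made-up rows in the same layout as the UCI file: B, G, R, label (tab-separated)
sample = "74\t85\t123\t1\n73\t84\t122\t1\n200\t198\t190\t2\n"

# header=None stops pandas from consuming the first record as a header row;
# names= assigns the expected column names in the same call.
df_sample = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["BLUE", "GREEN", "RED", "RESULT"],
)
print(df_sample.shape)
print(df_sample.dtypes)
```

Passing `names=` here replaces the separate `df.columns = [...]` assignment.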


Write some code to define `X`, a DataFrame holding the B, G, R components, and `y`, the column of class labels. These are then split into training and test sets; the held-out test set is used to measure the accuracy of the classification model.

In [28]:
# Write your code here
X = df[['BLUE','GREEN','RED']]
y = df['RESULT']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
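
Because the two classes are not equally frequent, it may be worth keeping their ratio identical in both partitions. The sketch below uses the `stratify` argument of `train_test_split` on synthetic stand-in data; the arrays `X_demo` and `y_demo` and the 20/80 class ratio are assumptions made for illustration, not the notebook's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows of B, G, R values, 20 labelled 1 and 80 labelled 2
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 3))
y_demo = np.array([1] * 20 + [2] * 80)

# stratify=y_demo keeps the 20/80 class ratio in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)   # (75, 3) (25, 3)
print(np.bincount(y_te)[1:])    # per-class counts in the test split
```

Without `stratify`, the split is purely random, which is usually adequate for a dataset of this size but can skew small classes in smaller samples.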

The code in the next cell classifies the test data in the following steps:

1. Import the Gaussian Naïve Bayes classifier
2. Fit the model with the training data (X: attributes, y: labels)
3. Use the trained model to predict the labels of the test data (X_test)
4. Calculate the accuracy score from the actual labels (y_test) and the predicted labels (y_pred)

In [29]:
from sklearn.metrics import accuracy_score
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
accuracy_score(y_test, y_pred)

0.9238867850613737
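
To make the fitting step less of a black box, here is a rough NumPy sketch of what a Gaussian Naïve Bayes model estimates and how it predicts: one prior per class, plus one mean and one variance per class and feature. The function names, the toy data, and the fixed `1e-9` variance floor are assumptions of this sketch; scikit-learn's `GaussianNB` applies its own variance smoothing and may differ numerically.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances strictly positive (an assumption of this sketch)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Pick the class with the largest log-posterior for each row of X."""
    classes, priors, means, variances = model
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Sum of per-feature Gaussian log-densities for every sample/class pair
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

# Toy B, G, R rows (illustrative only): label 1 clusters low, label 2 clusters high
X_toy = np.array([[70, 80, 120], [75, 85, 125], [72, 83, 121],
                  [200, 200, 200], [190, 195, 205], [210, 205, 198]], dtype=float)
y_toy = np.array([1, 1, 1, 2, 2, 2])

model = fit_gaussian_nb(X_toy, y_toy)
toy_pred = predict_gaussian_nb(model, np.array([[73.0, 82.0, 122.0], [205.0, 199.0, 201.0]]))
print(toy_pred)
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.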

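Write some code to calculate the precision, recall, and F1-score of the predictions produced by the GaussianNB model above. Accuracy alone can mislead here because the classes are imbalanced: the test split holds roughly four non-skin samples for every skin sample.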


In [30]:
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

             precision    recall  f1-score   support

          1       0.87      0.74      0.80     12634
          2       0.93      0.97      0.95     48630

avg / total       0.92      0.92      0.92     61264
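
The three metrics in the report follow directly from per-class counts of true positives, false positives, and false negatives. A small pure-Python sketch (the helper `binary_metrics` and the toy labels are hypothetical, introduced only for illustration):

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (illustrative only): 1 = skin, 2 = non-skin
y_true_toy = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred_toy = [1, 1, 1, 2, 1, 2, 2, 2, 2, 2]

metrics_skin = binary_metrics(y_true_toy, y_pred_toy, positive=1)
print(metrics_skin)
```

As a check against the report above, the rounded class 1 values (precision 0.87, recall 0.74) give F1 = 2 × 0.87 × 0.74 / (0.87 + 0.74) ≈ 0.80, which matches the reported f1-score.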



Write some code to train a Multinomial Naive Bayes or Bernoulli Naive Bayes classifier on `X_train` and `y_train`, then calculate the accuracy, precision, recall, and F1-score of the trained model on the test data. Use the scikit-learn library for this task.

In [31]:
from sklearn.naive_bayes import MultinomialNB

mnb = MultinomialNB()
y_pred_mnb = mnb.fit(X_train,y_train).predict(X_test)
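# Score the MultinomialNB predictions (y_pred_mnb), not the GaussianNB ones (y_pred)
accuracy_score(y_test, y_pred_mnb)

The output recorded for this cell, 0.9238867850613737, was computed from `y_pred` and therefore repeats the GaussianNB accuracy. The MultinomialNB accuracy equals the weighted-average recall in the report below, about 0.94.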

In [32]:
print(classification_report(y_test,y_pred_mnb))

             precision    recall  f1-score   support

          1       0.76      1.00      0.86     12634
          2       1.00      0.92      0.96     48630

avg / total       0.95      0.94      0.94     61264
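
The exercise also allows a Bernoulli Naive Bayes classifier. `BernoulliNB` models binary features, so the channel intensities have to be thresholded first, which its `binarize` parameter does. A minimal sketch on made-up data, where the threshold of 128 and the toy arrays are assumptions chosen for illustration rather than tuned values:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy B, G, R rows (illustrative only): label 1 has a high third channel,
# label 2 has high first and second channels
X_toy = np.array([[10, 20, 200], [15, 25, 210], [12, 22, 220],
                  [200, 210, 30], [190, 220, 20], [210, 200, 10]])
y_toy = np.array([1, 1, 1, 2, 2, 2])

# Values above 128 become 1 and all others 0 before the model is fitted
bnb = BernoulliNB(binarize=128.0)
bnb.fit(X_toy, y_toy)

bnb_pred = bnb.predict(np.array([[20, 30, 205], [205, 215, 25]]))
print(bnb_pred)
```

Reducing each channel to a single bit discards most of the color information, so on real pixel data this variant would be expected to trail the Gaussian and multinomial models unless the threshold is chosen carefully.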

