# <span style="color:red"><strong>Diabetes Classification Using Naive Bayes</strong></span>

## Load Data

The dataset used here is the `Pima Diabetes Dataset` from Kaggle. It contains eight feature columns (`Pregnancies`, `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, `BMI`, `DiabetesPedigreeFunction`, and `Age`) describing each patient. The last column, `Outcome`, is a binary variable indicating whether the patient has diabetes.

In [2]:
import pandas as pd
df = pd.read_csv("Datasets/diabetes.csv")
df.head()

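|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |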


In [4]:
# Count the null values in each column
df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

There are no null values in any column of this dataset.

## Build A Naive Bayes Model

In [12]:
# Split the data into features (X) and target (y)
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Outcome"])
y = df[["Outcome"]]

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)


The `sklearn.naive_bayes` module implements Naive Bayes algorithms. These are supervised learning methods that apply Bayes' theorem under the "naive" assumption that the features are conditionally independent of one another given the class.
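Written out for a class $y$ and features $x_1, \dots, x_n$ (notation chosen here for illustration, not taken from the notebook), Bayes' theorem combined with the independence assumption gives the standard decision rule:

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The variants differ only in the distribution they assume for each $P(x_i \mid y)$.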

The module provides several variants, each tailored to a different kind of feature:

* [naive_bayes.BernoulliNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html#sklearn.naive_bayes.BernoulliNB): Naive Bayes classifier for multivariate Bernoulli models
* [naive_bayes.CategoricalNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB): Naive Bayes classifier for categorical features
* [naive_bayes.ComplementNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.ComplementNB.html#sklearn.naive_bayes.ComplementNB): Complement Naive Bayes classifier, which corrects the "severe assumptions" made by the standard Multinomial Naive Bayes classifier and is particularly suited to imbalanced datasets
* [naive_bayes.GaussianNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB): Gaussian Naive Bayes, for continuous features assumed to be normally distributed within each class
* [naive_bayes.MultinomialNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB): Naive Bayes classifier for multinomial models
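To make the idea concrete, here is a minimal sketch of what a Gaussian Naive Bayes classifier computes, written with the standard library only. It is an illustration, not scikit-learn's implementation (which differs in details such as variance smoothing). The helper names `fit_gaussian_nb`, `log_gaussian`, and `predict_proba` are hypothetical, and the toy data are just the `Glucose` and `BMI` values of the five rows shown by `df.head()`, far too few for a meaningful model.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and the per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    params = {}
    for label, members in by_class.items():
        count = len(members)
        columns = list(zip(*members))
        means = [sum(col) / count for col in columns]
        # Population variance plus a tiny constant (an assumption of this
        # sketch) so that a zero variance never causes a division by zero.
        variances = [
            sum((value - mean) ** 2 for value in col) / count + 1e-9
            for col, mean in zip(columns, means)
        ]
        params[label] = (count / len(rows), means, variances)
    return params


def log_gaussian(x, mean, var):
    """Log-density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_proba(params, row):
    """Posterior probability of each class for one sample."""
    # Naive assumption: log P(y) + sum over features of log P(x_i | y)
    log_joint = {
        label: math.log(prior)
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in params.items()
    }
    # Normalise in log space for numerical stability
    top = max(log_joint.values())
    total = sum(math.exp(value - top) for value in log_joint.values())
    return {label: math.exp(value - top) / total for label, value in log_joint.items()}


# (Glucose, BMI) and Outcome for the five rows shown by df.head()
toy_X = [(148, 33.6), (85, 26.6), (183, 23.3), (89, 28.1), (137, 43.1)]
toy_y = [1, 0, 1, 0, 1]

params = fit_gaussian_nb(toy_X, toy_y)

proba_high = predict_proba(params, (150, 32.0))  # high glucose
proba_low = predict_proba(params, (88, 27.0))    # low glucose
predicted_high = max(proba_high, key=proba_high.get)
predicted_low = max(proba_low, key=proba_low.get)
print(predicted_high, predicted_low)
```

On these five rows the sketch assigns the high-glucose sample to class 1 and the low-glucose sample to class 0.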

In [15]:
# Import the Gaussian Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB

# Fit the model; ravel() flattens the one-column target into the 1-D shape scikit-learn expects
model = GaussianNB()
model.fit(X_train, y_train.values.ravel())


In [17]:
#make predictions
y_pred = model.predict(X_test)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0,
       1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0])

## Evaluation

In [22]:
from sklearn import metrics

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Confusion Matrix:")
print(metrics.confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(metrics.classification_report(y_test, y_pred))

Accuracy: 0.7835497835497836
Confusion Matrix:
[[128  18]
 [ 32  53]]
Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.88      0.84       146
           1       0.75      0.62      0.68        85

    accuracy                           0.78       231
   macro avg       0.77      0.75      0.76       231
weighted avg       0.78      0.78      0.78       231



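The per-class metrics in the report are defined as follows, where TP, FP, and FN are the numbers of true positives, false positives, and false negatives:

* Precision = TP / (TP + FP)
* Recall = TP / (TP + FN)
* F1-score = (2 × Recall × Precision) / (Recall + Precision)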
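As a quick check, the formulas above can be applied by hand to the confusion matrix printed earlier. The sketch below treats class 1 (diabetic) as the positive class and relies on scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are chosen for illustration only.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
confusion = [[128, 18],
             [32, 53]]

tn, fp = confusion[0]  # true class 0: predicted 0, predicted 1
fn, tp = confusion[1]  # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * recall * precision / (recall + precision)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.78 precision=0.75 recall=0.62 f1=0.68
```

These values match the accuracy and the row for class 1 in the classification report.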