# Gaussian Naive Bayes 

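* Naive Bayes is a probabilistic classification technique. It uses Bayes' theorem to estimate the probability that a sample belongs to each class, under the "naive" assumption that the features are conditionally independent given the class. The Gaussian variant additionally models each feature as normally distributed within each class.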
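
To make the idea concrete, here is a minimal from-scratch sketch of the calculation, using only the standard library. The toy data, the helper names `fit_gaussian_nb` and `predict_proba_one`, and the fixed variance floor of `1e-9` are assumptions made for illustration; this is not how scikit-learn implements `GaussianNB` internally.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    params = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive (illustrative choice)
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def predict_proba_one(params, x):
    """Return P(class | x) for one sample via Bayes' theorem with Gaussian likelihoods."""
    log_joint = {}
    for label, (prior, means, variances) in params.items():
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the normal density; features are treated as independent
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = total

    # Subtract the maximum before exponentiating for numerical stability
    top = max(log_joint.values())
    unnormalised = {k: math.exp(v - top) for k, v in log_joint.items()}
    norm = sum(unnormalised.values())
    return {k: v / norm for k, v in unnormalised.items()}


# Hypothetical toy data: two features, two classes
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
y_toy = [0, 0, 0, 1, 1, 1]

params = fit_gaussian_nb(X_toy, y_toy)
proba = predict_proba_one(params, [1.1, 2.0])
print(proba)
```

The predicted class is simply the one with the largest posterior probability.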

## 1-)Model

In [3]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [4]:
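# Load the data and work on a copy so the original frame stays untouched
diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()  # drop rows with missing values
# The target is the "Outcome" column; every other column is a feature
y = df["Outcome"]
X = df.drop(['Outcome'], axis=1)
# Hold out 30% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.30,
                                                    random_state=42)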



In [5]:
from sklearn.naive_bayes import GaussianNB

In [6]:
nb = GaussianNB()
nb_model = nb.fit(X_train, y_train)
nb_model

GaussianNB()

## 2-)Prediction

In [7]:
nb_model.predict(X_test)[0:10]

array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1], dtype=int64)

In [8]:
nb_model.predict_proba(X_test)[0:10]

array([[0.73815858, 0.26184142],
       [0.94027894, 0.05972106],
       [0.97242831, 0.02757169],
       [0.82840069, 0.17159931],
       [0.47153473, 0.52846527],
       [0.47274458, 0.52725542],
       [0.99607705, 0.00392295],
       [0.69925055, 0.30074945],
       [0.53838117, 0.46161883],
       [0.25004536, 0.74995464]])
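
The class labels returned by `predict` correspond to the column with the highest probability in `predict_proba`. The sketch below reproduces that step by hand on the ten rows printed above; the variable names are illustrative, and the column order `[0, 1]` is assumed to match the model's `classes_` attribute.

```python
# First ten rows of the predict_proba output, copied from the cell above
proba = [
    [0.73815858, 0.26184142],
    [0.94027894, 0.05972106],
    [0.97242831, 0.02757169],
    [0.82840069, 0.17159931],
    [0.47153473, 0.52846527],
    [0.47274458, 0.52725542],
    [0.99607705, 0.00392295],
    [0.69925055, 0.30074945],
    [0.53838117, 0.46161883],
    [0.25004536, 0.74995464],
]

classes = [0, 1]  # assumed column order (nb_model.classes_)

# Pick the class whose column holds the largest probability in each row
preds = [classes[row.index(max(row))] for row in proba]
print(preds)
```

The result matches the labels returned by `nb_model.predict(X_test)[0:10]`.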

In [9]:
y_pred = nb_model.predict(X_test)
y_pred[0:10]

array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1], dtype=int64)

In [12]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [13]:
confusion_matrix(y_test, y_pred)

array([[119,  32],
       [ 27,  53]], dtype=int64)

In [14]:
accuracy_score(y_test, y_pred)

0.7445887445887446

In [15]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.82      0.79      0.80       151
           1       0.62      0.66      0.64        80

    accuracy                           0.74       231
   macro avg       0.72      0.73      0.72       231
weighted avg       0.75      0.74      0.75       231
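
The figures in the report can be recovered directly from the confusion matrix. The sketch below does the arithmetic by hand, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix reported above: rows = true class, columns = predicted class
cm = [[119, 32],
      [27, 53]]

tn, fp = cm[0]  # true class 0: correctly predicted 0, wrongly predicted 1
fn, tp = cm[1]  # true class 1: wrongly predicted 0, correctly predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total

# Class 1 metrics
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 metrics
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"accuracy          = {accuracy:.4f}")
print(f"class 0 precision = {precision_0:.2f}, recall = {recall_0:.2f}")
print(f"class 1 precision = {precision_1:.2f}, recall = {recall_1:.2f}, f1 = {f1_1:.2f}")
```

Class 1 scores lower than class 0 on both precision and recall, which the single accuracy figure hides.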



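* As with the logistic regression model, no hyperparameter tuning is performed here. `GaussianNB` exposes only two parameters, `priors` and `var_smoothing`, and both are left at their defaults, so we go straight to cross-validation.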

In [17]:
from sklearn.model_selection import cross_val_score

In [18]:
cross_val_score(nb_model, X_test, y_test, cv = 10).mean()

0.775

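* This value is the mean cross-validated accuracy over 10 folds. The closer it is to 1, the stronger the model's predictive accuracy.

* Note that the folds here are drawn from the test split (`X_test`, `y_test`). Cross-validation is usually run on the training split, so that the test set remains an untouched final check.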
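
As a rough sketch of a more conventional workflow, the example below runs 10-fold cross-validation on the training split and then tries a small grid of `var_smoothing` values. It uses a synthetic dataset from `make_classification` as a stand-in, since `diabetes.csv` is not bundled here; the dataset shape, the grid values, and the variable names are all assumptions for illustration, and no particular score should be expected.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (hypothetical; not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     n_informative=4, n_redundant=0,
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.30, random_state=42)

# 10-fold cross-validated accuracy, computed on the training split only
scores = cross_val_score(GaussianNB(), X_tr, y_tr, cv=10)
print("mean CV accuracy:", scores.mean())

# Optional: search over var_smoothing, the main tunable parameter of GaussianNB
search = GridSearchCV(GaussianNB(),
                      {"var_smoothing": np.logspace(-12, -6, 7)},
                      cv=10, scoring="accuracy")
search.fit(X_tr, y_tr)
print("best var_smoothing:", search.best_params_["var_smoothing"])

# The held-out test split is used once, for the final check
print("test accuracy:", search.score(X_te, y_te))
```

Keeping the test split out of cross-validation means the final accuracy is measured on rows the model selection step never saw.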