# Naive Bayes Classification

**Basic Description**

- Naive Bayes models are a group of extremely fast and simple classification algorithms that are often suitable for high-dimensional datasets
- Because they are so fast and have so few tunable parameters, they are useful as a quick-and-dirty baseline for a classification problem
- In Bayesian classification, we want the probability of a label given some observed features. Naive Bayes is a generative model: for each label, it specifies a hypothetical random process that generates the data. The "naive" part is the simplifying assumption built into that generative model, namely that the features are conditionally independent of one another given the label
- Here I choose a Gaussian Naive Bayes classifier because our model features are continuous; it models each feature within a class as normally distributed
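
To make the generative idea above concrete, here is a minimal from-scratch sketch of Gaussian Naive Bayes using only the standard library. It is not how `sklearn.naive_bayes.GaussianNB` is implemented; the function names (`fit_gaussian_nb`, `predict_proba`), the fixed `var_floor`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each label."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_label[label].append(row)

    model = {}
    for label, rows in rows_by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor is a hypothetical simplification to avoid division by zero
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, x):
    """Return P(label | x) for one sample, treating features as independent."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        # the "naive" step: per-feature Gaussian log-densities are simply summed
        for value, mean, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        log_joint[label] = total

    # normalize in log space (log-sum-exp) for numerical stability
    peak = max(log_joint.values())
    norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {label: math.exp(v - norm) for label, v in log_joint.items()}


# synthetic toy data, not the notebook's dataset
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.2], [3.2, 0.1], [2.9, 0.4]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_model = fit_gaussian_nb(X_toy, y_toy)
toy_probs = predict_proba(toy_model, [1.1, 2.0])
toy_label = max(toy_probs, key=toy_probs.get)
print(toy_label, toy_probs)
```

Fitting only requires one pass to compute means and variances per class, which is why training is so cheap and why there is so little to tune.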

**Bias-Variance Tradeoff**
- The model's simplicity gives it high bias and low variance

**Upsides**
- Fast for training and prediction
- Straightforward probabilistic prediction
- Easily interpretable
- Few, if any, tuning parameters

**Downsides**
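- Its strong assumptions (conditionally independent features and, for the Gaussian variant, normally distributed features within each class) are often not met in real data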

**Other Notes**


## Load Packages and Prep Data

In [1]:
# custom utils
from utils import custom
from utils.cf_matrix import make_confusion_matrix

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.feature_selection import RFECV
from sklearn.linear_model import Lasso
from sklearn.naive_bayes import GaussianNB

In [2]:
# load data
X_train, y_train, X_test, y_test = custom.load_data()

X_train (62889, 42)
y_train (62889,)
X_test (15723, 42)
y_test (15723,)


## Model 1

In [3]:
# fit model
classifier = GaussianNB()
nb_1 = classifier.fit(X_train, y_train)

In [4]:
# cross-validation scoring
nb_1_scores = custom.cv_metrics(nb_1, X_train, y_train)
nb_1_scores

accuracy     0.916
precision    0.225
recall       0.101
f1           0.139
dtype: float64
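
The four scores above can disagree sharply, so it may help to see how they are derived from the same predictions. The internals of `custom.cv_metrics` are not shown in this notebook; the sketch below is a hypothetical stand-in (`binary_metrics` is an invented name) that computes the same four metrics for a single set of binary predictions, without cross-validation. The labels are synthetic and chosen only to show that high accuracy can coexist with low precision and recall when the positive class is rare. Whether that is the situation in this dataset is an assumption the notebook does not confirm.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from paired binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # guard against empty denominators when nothing is predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# synthetic example: 10 positives in 100 samples; the model flags 4 samples
# as positive and only 1 of those flags is correct
y_true_toy = [1] * 10 + [0] * 90
y_pred_toy = [1] * 1 + [0] * 9 + [1] * 3 + [0] * 87

toy_metrics = binary_metrics(y_true_toy, y_pred_toy)
print(toy_metrics)
```

In this toy case most of the accuracy comes from correctly labelling the majority class, which is why precision, recall, and F1 are the more informative numbers to compare across models.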

## Conclusion
- Although accuracy is high (0.916), precision (0.225), recall (0.101), and F1 (0.139) are very poor in this initial exploratory fit, so I'm devoting resources to tuning other models before returning here.