# Calculating Naive Bayes With Features

In this first demo, we will build a Naive Bayes classifier against the iris data set.  We will use the `naivebayes` package to train a model against a subset of the data and then test it against a holdout set.

The `caret` package here will be used to give us a nice-looking confusion matrix at the end.  Otherwise, we don't need it for this demo.

In [None]:
if(!require(naivebayes)) {
  install.packages("naivebayes")
  library(naivebayes)
}
if(!require(caret)) {
  install.packages("caret")
  library(caret)
}

First, load the iris data set.

In [None]:
data(iris)
head(iris)

Next up, we will take a slice of records out and hold it as a test data set.  We will avoid using it at all for training the model; that way, we have as good a view as possible of how the model behaves on data points it was not trained on.

Note that the rows of `iris` are grouped by Species, in alphabetical order: 50 setosa, then 50 versicolor, then 50 virginica.  If we simply took the last 30 rows, the test set would contain only virginica.  We want to shuffle the rows first so that the test set is a representative slice of the three species of iris.

You can use the `caret` package to shuffle and split the input data set, and that's a good idea for larger data sets.  Because this is small, I'm taking a more casual approach.

Also note that I'm saving 20% of the data for testing.  I could bump these numbers up and down as needed, but this seems like a good starting point.

In [None]:
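# Fix the seed so that the shuffle, and therefore the split, is reproducible.
set.seed(1773)
# Shuffle the row order.  The second shuffle is redundant, but removing it
# would change which rows land in the test set for this seed.
irisr <- iris[sample(nrow(iris)),]
irisr <- irisr[sample(nrow(irisr)),]

# First 120 shuffled rows (80%) for training, remaining 30 (20%) for testing.
iris.train <- irisr[1:120,]
iris.test <- irisr[121:150,]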
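
For comparison, here is a minimal sketch of the `caret` approach mentioned above.  It assumes `createDataPartition` with `p = 0.8`, which samples within each species so that the split is stratified.  The names `train_idx`, `iris_train_strat`, and `iris_test_strat` are illustrative and are not used anywhere else in this demo.

```r
library(caret)

data(iris)
set.seed(1773)

# Sample 80% of the row indices within each species (a stratified split).
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

iris_train_strat <- iris[train_idx, ]
iris_test_strat  <- iris[-train_idx, ]

# Check how many rows of each species ended up in the test set.
table(iris_test_strat$Species)
```

The casual shuffle-and-slice gives no guarantee that the species are evenly represented in the test set; a stratified split like this one should hold out the same share of each species.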

Generating a Naive Bayes classifier is a one-liner once I have the data set up appropriately.

In [None]:
nb <- naivebayes::naive_bayes(Species ~ ., data = iris.train)
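
To make that one-liner less of a black box, here is a rough base-R sketch of the calculation a Gaussian Naive Bayes classifier performs: a class prior multiplied by one normal density per feature, done in log space.  This is an illustration under the assumption that every feature is modeled with a normal distribution; it is not the `naivebayes` implementation, and it fits on the full `iris` data set rather than on `iris.train` so that it runs on its own.  The names `priors`, `means`, `sds`, and `score_row` are made up for this sketch.

```r
data(iris)
features <- names(iris)[1:4]
classes  <- levels(iris$Species)

# Class priors: the share of rows belonging to each species.
priors <- table(iris$Species) / nrow(iris)

# Per-species mean and standard deviation of each feature
# (rows are species, columns are features).
means <- sapply(features, function(f) tapply(iris[[f]], iris$Species, mean))
sds   <- sapply(features, function(f) tapply(iris[[f]], iris$Species, sd))

# Score one observation against every species:
# log prior + sum of the per-feature Gaussian log densities.
score_row <- function(x) {
  values <- unlist(x[features])
  sapply(classes, function(cl) {
    log(priors[[cl]]) +
      sum(dnorm(values, mean = means[cl, ], sd = sds[cl, ], log = TRUE))
  })
}

scores <- score_row(iris[1, ])
manual_prediction <- names(which.max(scores))
manual_prediction
```

The species with the highest score is the prediction.  Summing log densities instead of multiplying raw densities avoids numerical underflow when there are many features.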

The `naivebayes` package provides its own `plot` method for fitted models, which draws the per-species distribution of each variable.  That way you can eyeball the plots and get a good feel for which variables do the most to separate the species: the less the curves for the three species overlap, the more useful the variable.

In [None]:
plot(nb)

Once we have a trained model, let's run the `predict` function against our test data set.  We'll use the `cbind` function to combine the output with our initial test data set and call it `iris.output`.

In [None]:
iris.output <- cbind(iris.test, prediction = predict(nb, iris.test))
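
If you want to see how confident the model is rather than just the winning class, `predict` also accepts `type = "prob"`.  The sketch below is self-contained, so it builds its own small split instead of reusing `iris.train` and `iris.test`; the names `idx`, `train`, `test`, `nb_sketch`, and `probs` are illustrative only.  It also passes just the four feature columns to `predict`, on the assumption that this keeps the package from complaining about the extra `Species` column in the new data.

```r
library(naivebayes)

data(iris)
set.seed(1773)

# A quick 120/30 split, just for this sketch.
idx   <- sample(nrow(iris), 120)
train <- iris[idx, ]
test  <- iris[-idx, ]

nb_sketch <- naive_bayes(Species ~ ., data = train)

# One row per test record, one column per species; each row sums to 1.
probs <- predict(nb_sketch, test[, 1:4], type = "prob")
head(round(probs, 3))
```

Rows where the top probability is well below 1 are the borderline cases, and those are usually the first place to look when the confusion matrix shows a miss.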

Once we have our output, we can quickly generate a confusion matrix using `caret`.  I like using this a lot more than building my own with e.g. `table(iris.output$Species, iris.output$prediction)`.  The reason I prefer what `caret` has to offer is that it also includes statistics like positive predictive value and negative predictive value.  These tend to be at least as important as accuracy when performing classification, especially for scenarios where one class is extremely likely and the other extremely unlikely.

**Positive predictive value** for a category is:  if my model predicts that a particular set of inputs matches a particular class, what is the probability that this judgement is correct?  For example, we have 12 versicolor predictions (read the "versicolor" Prediction row across and sum up values).  11 of those 12 were actually versicolor, so our positive predictive value is 11/12 = 0.9167.

**Negative predictive value** for a category is:  if my model predicts that a particular set of inputs does *not* match a particular class, what is the probability that this judgement is correct?  For example, we have 18 predictions which were *not* versicolor (sum up all of the values across the rows *except for the versicolor row*).  Of those 18, 1 was actually versicolor (read down the *versicolor* Reference column and ignore the cell where the prediction was also versicolor).  Therefore, 17 of our 18 negative predictions for versicolor were correct, so our negative predictive value is 17/18 = 0.9444.

In [None]:
caret::confusionMatrix(iris.output$prediction, iris.output$Species)
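
As a sanity check on the arithmetic, the versicolor figures can be reproduced by hand.  The four counts below are taken directly from the walkthrough above, treating versicolor as the "positive" class and the other two species together as "negative".  The variable names are illustrative only.

```r
# One-vs-rest counts for versicolor, taken from the walkthrough above.
true_pos  <- 11  # predicted versicolor, actually versicolor
false_pos <- 1   # predicted versicolor, actually another species
false_neg <- 1   # predicted another species, actually versicolor
true_neg  <- 17  # predicted another species, actually another species

# Positive predictive value: share of versicolor predictions that were right.
ppv <- true_pos / (true_pos + false_pos)

# Negative predictive value: share of not-versicolor predictions that were right.
npv <- true_neg / (true_neg + false_neg)

round(c(ppv = ppv, npv = npv), 4)
# ppv = 0.9167, npv = 0.9444
```

These should line up with the "Pos Pred Value" and "Neg Pred Value" entries that `caret` reports for the versicolor class.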