# Model-Based Prediction - Naive Bayes
- Assume the data follow a probabilistic model
- Use Bayes' theorem to identify optimal classifiers
- Can be reasonably accurate and computationally convenient, but relies on assumptions about the data; accuracy may suffer when those assumptions do not hold
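In standard notation (the symbols are introduced here for illustration and are not from the notes), let $\pi_k$ be the prior probability of class $k$ and $f_k(x)$ the density of the features within class $k$. Bayes' theorem then gives the posterior probability of each class. Because the denominator is the same for every class, the classifier only needs to compare the numerators:

$$
P(Y = k \mid X = x) = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)},
\qquad
\hat{y}(x) = \underset{k}{\arg\max} \; \pi_k \, f_k(x)
$$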

## Classifying Using the Model
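- Linear Discriminant Analysis (LDA): assumes each class follows a multivariate Gaussian distribution with a covariance matrix shared by all classes, which yields linear decision boundaries
- Naive Bayes: assumes the features are independent of one another within each class (conditional independence given the class)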
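Written out in standard notation (an illustrative sketch with symbols introduced here: $\pi_k$ is the prior for class $k$, $\mu_k$ its mean vector, $\Sigma$ the shared covariance, and $f_{kj}$ the density of feature $j$ within class $k$): when every class is $N(\mu_k, \Sigma)$, the quadratic term $x^\top \Sigma^{-1} x$ is identical for all classes and drops out, leaving a discriminant that is linear in $x$. Naive Bayes instead factors the class density over the $p$ features $x_1, \dots, x_p$. In both cases an observation is assigned to the class with the largest score.

$$
\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2} \, \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k
$$

$$
P(Y = k \mid X = x) \;\propto\; \pi_k \prod_{j=1}^{p} f_{kj}(x_j)
$$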

## Naive Bayes
- Commonly used for text classification (e.g., spam filtering, categorizing news articles)
- Example: the probability that a card is a Queen, given that it is a face card
    - P(Queen | Face) = ( P(Face | Queen) * P(Queen) ) / P(Face)
        - P(Face | Queen) = 1
        - P(Queen) = 1/13
        - P(Face) = 3/13 (Jack, Queen, King)
    - P(Queen | Face) = (1 x 1/13) / (3/13) = 1/13 x 13/3 = 1/3
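The arithmetic can be checked by enumerating a standard 52-card deck. This is a small illustrative sketch (the variable names are my own): it computes P(Queen | Face) with Bayes' theorem and, for comparison, by directly counting Queens among the face cards.

```r
# Build a standard 52-card deck: 13 ranks x 4 suits
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("clubs", "diamonds", "hearts", "spades")
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

is_face  <- deck$rank %in% c("J", "Q", "K")
is_queen <- deck$rank == "Q"

p_queen <- mean(is_queen)                      # 4/52 = 1/13
p_face  <- mean(is_face)                       # 12/52 = 3/13
p_face_given_queen <- mean(is_face[is_queen])  # every Queen is a face card: 1

# Bayes' theorem
p_queen_given_face <- p_face_given_queen * p_queen / p_face

# Direct count: share of face cards that are Queens (4/12)
direct <- mean(is_queen[is_face])

print(c(bayes = p_queen_given_face, direct = direct))
```

Both values come out to 1/3, matching the hand calculation.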

## Example
- Using the iris dataset
- Compare Linear Discriminant Analysis and Naive Bayes

```r
library(ggplot2)
library(caret)

# Split the iris dataset: 70% training, 30% testing (sampled within each Species)
set.seed(021818)
inTrain <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
```
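`createDataPartition()` samples separately within each level of a factor outcome, so every species keeps about the same share of rows in the training and test sets. The base-R sketch below illustrates that idea; `stratified_split` is a hypothetical helper, not caret's implementation, and its rounding may differ slightly from caret's.

```r
# Hypothetical helper: draw a fraction p of the row indices within each class
stratified_split <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(1)
in_train  <- stratified_split(iris$Species, p = 0.7)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

table(train_set$Species)  # 35 rows per species
table(test_set$Species)   # 15 rows per species
```

Since iris has 50 rows per species, a 70/30 split leaves 15 test rows per class, consistent with the column totals of the confusion matrices in this notebook.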

```r
# Build the LDA model and evaluate it on the held-out test set
model_lda <- train(Species ~ ., data = training, method = "lda")
pred <- predict(model_lda, newdata = testing)
confusionMatrix(pred, testing$Species)
```


```
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         14         0
  virginica       0          1        15

Overall Statistics

               Accuracy : 0.9778
                 95% CI : (0.8823, 0.9994)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9667
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9333           1.0000
Specificity                 1.0000            1.0000           0.9667
Pos Pred Value              1.0000            1.0000           0.9375
Neg Pred Value              1.0000            0.9677           1.0000
Prevalence                  0.3333
```
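Several of the summary statistics in this output follow directly from the confusion matrix. The sketch below recomputes them in base R (the matrix is typed in from the LDA output; variable names are illustrative): accuracy is the share of counts on the diagonal, Cohen's kappa corrects it for the agreement expected by chance from the row and column totals, and the reported 95% CI appears to match an exact binomial interval, computed here with `binom.test()`.

```r
# LDA confusion matrix from the output: rows = predicted, columns = reference
lvl <- c("setosa", "versicolor", "virginica")
cm_lda <- matrix(c(15,  0,  0,
                    0, 14,  0,
                    0,  1, 15),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = lvl, Reference = lvl))

n <- sum(cm_lda)
correct <- sum(diag(cm_lda))
accuracy <- correct / n                                  # 44/45
chance <- sum(rowSums(cm_lda) * colSums(cm_lda)) / n^2   # agreement expected by chance: 1/3
kappa <- (accuracy - chance) / (1 - chance)              # 29/30

# Exact (Clopper-Pearson) binomial confidence interval for the accuracy
ci <- binom.test(correct, n)$conf.int

round(c(accuracy = accuracy, kappa = kappa, ci_lower = ci[1], ci_upper = ci[2]), 4)
```

Accuracy and kappa round to the 0.9778 and 0.9667 shown in the output.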

## Compare to Naive Bayes

```r
# Build the Naive Bayes model on the same split and evaluate it on the test set
model_nb <- train(Species ~ ., data = training, method = "nb")
pred <- predict(model_nb, newdata = testing)
confusionMatrix(pred, testing$Species)
```

```
“Numerical 0 probability for all classes with observation 17”

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         13         0
  virginica       0          2        15

Overall Statistics

               Accuracy : 0.9556
                 95% CI : (0.8485, 0.9946)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9333
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.8667           1.0000
Specificity                 1.0000            1.0000           0.9333
Pos Pred Value              1.0000            1.0000           0.8824
Neg Pred Value              1.0000            0.9375           1.0000
Prevalence                  0.3333
```
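To make the two modelling assumptions concrete, here is a from-scratch sketch of both classifiers in base R. `fit_lda`, `predict_lda`, `fit_nb` and `predict_nb` are hypothetical helpers: LDA estimates one pooled covariance matrix shared by all classes, while Gaussian naive Bayes fits an independent normal density to each feature within each class. This is a simplification of what caret runs: `method = "lda"` relies on `MASS::lda()`, and `method = "nb"` relies on `klaR::NaiveBayes()`, whose tuning can also select kernel density estimates instead of normal densities. The sketch uses its own random split, so its accuracies will not match the outputs above.

```r
fit_lda <- function(X, y) {
  X <- as.matrix(X)
  classes <- levels(y)
  means <- t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE])))
  # One covariance matrix pooled across classes (divisor n - K)
  centered <- X - means[as.integer(y), ]
  sigma <- crossprod(centered) / (nrow(X) - length(classes))
  list(classes = classes, means = means, sigma_inv = solve(sigma),
       log_prior = log(as.numeric(table(y)) / length(y)))
}

predict_lda <- function(fit, X) {
  X <- as.matrix(X)
  # delta_k(x) = x' S^-1 mu_k - 1/2 mu_k' S^-1 mu_k + log(pi_k)
  linear <- X %*% fit$sigma_inv %*% t(fit$means)
  const <- -0.5 * rowSums((fit$means %*% fit$sigma_inv) * fit$means) + fit$log_prior
  scores <- sweep(linear, 2, const, "+")
  factor(fit$classes[max.col(scores, ties.method = "first")], levels = fit$classes)
}

fit_nb <- function(X, y) {
  X <- as.matrix(X)
  classes <- levels(y)
  # Per-class, per-feature mean and standard deviation (features treated independently)
  list(classes = classes,
       mu = t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE]))),
       sd = t(sapply(classes, function(k) apply(X[y == k, , drop = FALSE], 2, sd))),
       log_prior = log(as.numeric(table(y)) / length(y)))
}

predict_nb <- function(fit, X) {
  X <- as.matrix(X)
  # log(pi_k) + sum_j log f_kj(x_j): the product of densities becomes a sum of logs
  scores <- sapply(seq_along(fit$classes), function(k) {
    ll <- matrix(dnorm(X, mean = rep(fit$mu[k, ], each = nrow(X)),
                       sd = rep(fit$sd[k, ], each = nrow(X)), log = TRUE),
                 nrow = nrow(X))
    fit$log_prior[k] + rowSums(ll)
  })
  factor(fit$classes[max.col(scores, ties.method = "first")], levels = fit$classes)
}

# Usage: a simple (unstratified) 70/30 split of iris
X <- as.matrix(iris[, 1:4])
y <- iris$Species
set.seed(2018)
idx <- sample(nrow(X), 105)

pred_lda <- predict_lda(fit_lda(X[idx, ], y[idx]), X[-idx, ])
pred_nb  <- predict_nb(fit_nb(X[idx, ], y[idx]), X[-idx, ])

table(Predicted = pred_nb, Reference = y[-idx])
c(lda = mean(pred_lda == y[-idx]), nb = mean(pred_nb == y[-idx]))
```

Summing log-densities instead of multiplying raw densities is a common way to keep very small class probabilities from underflowing to zero, which is the kind of situation klaR's "Numerical 0 probability" warning appears to flag.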