# Iris dataset in R

### Load data

In [2]:
# Load caret (training and evaluation framework) and randomForest (backend for method="rf")
library(caret)
library(randomForest)

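# Read the dataset; keep species as a factor, which confusionMatrix() requires
# (read.csv stopped converting strings to factors by default in R 4.0)
dataset <- read.csv("data/iris.csv", stringsAsFactors=TRUE)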

# Set the random seed first so that the split and the resampling are reproducible
set.seed(7)

# Split into 80% training and 20% validation, stratified by species
train_index <- createDataPartition(dataset$species, p=0.80, list=FALSE)
training <- dataset[train_index,]
validation <- dataset[-train_index,]

# Estimate model accuracy with 10-fold cross-validation on the training data
control <- trainControl(method="cv", number=10)
metric <- "Accuracy"
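
The stratified split can be sanity-checked by counting rows and classes in each subset. This is a minimal sketch, assuming R's built-in `iris` data frame stands in for `data/iris.csv`; note that its label column is `Species` (capitalised) rather than `species`.

```r
library(caret)

# Assumption: the built-in iris data replaces data/iris.csv
data(iris)

set.seed(7)
train_index <- createDataPartition(iris$Species, p = 0.80, list = FALSE)
training   <- iris[train_index, ]
validation <- iris[-train_index, ]

# Row counts of the two subsets
nrow(training)
nrow(validation)

# Class counts per subset; createDataPartition samples within each class,
# so the three species should stay balanced
table(training$Species)
table(validation$Species)
```

With 50 rows per species, an 80/20 split leaves 10 rows of each species for validation, which matches the confusion matrix printed further down.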

### Train classifiers 

In [3]:
# Classification and Regression Trees (CART)
fit.cart <- train(species~., data=training, method="rpart", metric=metric, trControl=control)
# k-Nearest Neighbors (kNN)
fit.knn <- train(species~., data=training, method="knn", metric=metric, trControl=control)
# Random Forest
fit.rf <- train(species~., data=training, method="rf", metric=metric, trControl=control)
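
Because the seed is set only once, each `train()` call above may draw different cross-validation folds. One way to compare models on identical folds is to build the folds once and pass them through the `index` argument of `trainControl`. The sketch below assumes the built-in `iris` data and fits only two of the models to keep it short; `folds` and `shared_control` are names introduced here for illustration.

```r
library(caret)

# Assumption: the built-in iris data stands in for the training set
data(iris)
set.seed(7)

# Build the 10 folds once; returnTrain = TRUE returns the row indices used for fitting
folds <- createFolds(iris$Species, k = 10, returnTrain = TRUE)
shared_control <- trainControl(method = "cv", index = folds)

fit.cart <- train(Species ~ ., data = iris, method = "rpart",
                  metric = "Accuracy", trControl = shared_control)
fit.knn  <- train(Species ~ ., data = iris, method = "knn",
                  metric = "Accuracy", trControl = shared_control)

# Check that both fits were resampled on the same folds
same_folds <- identical(fit.cart$control$index, fit.knn$control$index)
same_folds
```

Calling `set.seed()` with the same value immediately before each `train()` call is a simpler alternative that should have the same effect.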

### Evaluate results

In [4]:
# Collect the cross-validation results of all three models
# and summarize accuracy and kappa per model
results <- resamples(list(cart=fit.cart, knn=fit.knn, rf=fit.rf))
summary(results)


Call:
summary.resamples(object = results)

Models: cart, knn, rf 
Number of resamples: 10 

Accuracy 
          Min.   1st Qu.    Median      Mean   3rd Qu. Max. NA's
cart 0.9166667 0.9166667 0.9166667 0.9416667 0.9791667    1    0
knn  0.9166667 0.9166667 1.0000000 0.9666667 1.0000000    1    0
rf   0.8333333 0.9166667 1.0000000 0.9583333 1.0000000    1    0

Kappa 
      Min. 1st Qu. Median   Mean 3rd Qu. Max. NA's
cart 0.875   0.875  0.875 0.9125 0.96875    1    0
knn  0.875   0.875  1.000 0.9500 1.00000    1    0
rf   0.750   0.875  1.000 0.9375 1.00000    1    0
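
Rather than reading the best model off the printed table, the mean accuracy can be pulled out of the summary object. A rough sketch, assuming the built-in `iris` data and only two models; `mean_acc` and `best_name` are illustrative names, not part of the original notebook.

```r
library(caret)

# Assumption: the built-in iris data replaces data/iris.csv
data(iris)
control <- trainControl(method = "cv", number = 10)

# Same seed before each fit so both models see the same folds
set.seed(7)
fit.cart <- train(Species ~ ., data = iris, method = "rpart", trControl = control)
set.seed(7)
fit.knn  <- train(Species ~ ., data = iris, method = "knn", trControl = control)

results <- resamples(list(cart = fit.cart, knn = fit.knn))

# summary() returns one matrix per metric; take the "Mean" column for accuracy
mean_acc  <- summary(results)$statistics$Accuracy[, "Mean"]
best_name <- names(which.max(mean_acc))

mean_acc
best_name
```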


### Evaluate best model on validation set

In [6]:
# kNN had the highest mean cross-validated accuracy (0.967),
# so evaluate it on the held-out validation set
predictions <- predict(fit.knn, validation)
confusionMatrix(predictions, validation$species)

Confusion Matrix and Statistics

                 Reference
Prediction        Iris-setosa Iris-versicolor Iris-virginica
  Iris-setosa              10               0              0
  Iris-versicolor           0               9              1
  Iris-virginica            0               1              9

Overall Statistics
                                          
               Accuracy : 0.9333          
                 95% CI : (0.7793, 0.9918)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : 8.747e-12       
                                          
                  Kappa : 0.9             
 Mcnemar's Test P-Value : NA              
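
The accuracy and kappa values reported above can be reproduced by hand from the confusion matrix. The sketch below uses base R only and copies the counts from the printed matrix; `cm`, `expected` and `kappa_stat` are names introduced here for illustration.

```r
# Confusion matrix copied from the output above
# (rows = predicted class, columns = reference class)
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")
cm <- matrix(c(10, 0, 0,
                0, 9, 1,
                0, 1, 9),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)

# Accuracy: correctly classified samples over all samples (28 / 30)
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals (300 / 900)
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: accuracy corrected for chance agreement
kappa_stat <- (accuracy - expected) / (1 - expected)

round(accuracy, 4)    # 0.9333
round(kappa_stat, 4)  # 0.9
```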

Statistics by Class:

                     Class: Iris-setosa Class: Iris-versicolor
Sensitivity                      1.0000                 0.9000
Specificity                      1.0000                 0.9500
Pos Pred Value                   1.0000                 0.9000
Neg Pred Value                   1.0000                 0