# RADI603: R Programming Assignment

**ROMEN SAMUEL WABINA** | Student No. 6536435<br> 
PhD student, Data Science for Healthcare and Clinical Informatics <br>
Clinical Epidemiology and Biostatistics, Faculty of Medicine - Ramathibodi Hospital <br>
Mahidol University

## A0. Import the relevant libraries

```r
library(ggplot2)
library(GGally)
library(dplyr)
library(naniar)
library(e1071)
library(MASS)
library(caret)
library(class)
library(lpSolve)
library(magick)
library(tensorflow)
library(neuralnet)
library(spatstat)
library(OpenImageR)
library(raster)
library(ROSE)
library(SpatialPack)
library(devtools)
library(tibble)

setwd('../data')
```

##  A1. Autism Data
From Autism-Adolescent-Data.csv, we need to complete the following tasks and determine which model is most appropriate for classifying the patients. The autism dataset has 16 feature variables: 10 binary scoring items (values 0 or 1), age, gender (M or F), jaundice (Yes or No), autism (Yes or No), used app (Yes or No), and result (the sum of the 10 binary scores). It also has one target variable, class (YES or NO). Overall, the dataset has $104 \times 17$ dimensions, i.e., 104 samples and 17 columns.

### A1.1. Data Management (Preparing, Cleansing, Imputation, etc.)

In this section, we first preprocessed the Autism data by encoding the categorical features (gender, jaundice, autism, and used app) and the class label. We then replaced ambiguous values with NA. These were entries recorded as '?' and entries in which the letter 'O' was typed in place of a zero. We imputed these NAs with the column median rather than the mean, because the affected score columns are discrete (binary) rather than normally distributed. Finally, we standardized the numeric columns to a common scale (zero mean and unit variance, using `scale()`), which preserves the relative differences between values within each feature.
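The imputation and scaling steps can be illustrated on a tiny example. This is a minimal base-R sketch, not the notebook's own code. The data frame `toy`, the helper `impute_median()`, and all values are hypothetical.

```r
# Hypothetical toy data standing in for two score columns and age
toy <- data.frame(
  A3.Score = c(1, NA, 0, 1, 1),
  A6.Score = c(0, 1, NA, 1, 1),
  age      = c(12, 14, 15, 13, 16)
)

# Median imputation: replace each NA with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
toy[] <- lapply(toy, impute_median)

# z-score standardization, as scale() does: (x - mean) / sd for each column
toy_scaled <- as.data.frame(scale(toy))
print(round(colMeans(toy_scaled), 10))  # column means are (numerically) 0
print(apply(toy_scaled, 2, sd))         # column standard deviations are 1
```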

```r
df_autism <- read.csv('../data/Autism-Adolescent-Data.csv')

# While it is not required, the columns are renamed to make them more readable
colnames(df_autism)[13] <- 'jaundice'
colnames(df_autism)[14] <- 'autism'
colnames(df_autism)[15] <- 'used'
colnames(df_autism)[17] <- 'class'

# Summarize each column: missing values, storage type, class, and unique values.
# The checks are returned together in a list; a function body made only of
# separate sapply() calls would return just the last one.
check_data <- function(df)
{
    list(
        n_missing  = sapply(df, function(x) sum(is.na(x))),
        type       = sapply(df, typeof),
        is_numeric = sapply(df, is.numeric),
        class      = sapply(df, class),
        unique     = lapply(df, unique)
    )
}
```

```r
# Encoding the categorical data
df_autism$gender = factor(df_autism$gender,
                          levels = c('m', 'f'),
                          labels = c(1, 0))

df_autism$jaundice = factor(df_autism$jaundice,
                            levels = c('yes', 'no'),
                            labels = c(1, 0))

df_autism$autism = factor(df_autism$autism,
                          levels = c('yes', 'no'),
                          labels = c(1, 0))

df_autism$used = factor(df_autism$used,
                        levels = c('yes', 'no'),
                        labels = c(1, 0))

df_autism$class = factor(df_autism$class,
                         levels = c('YES', 'NO'),
                         labels = c(1, 0))
```

```r
# Replace ambiguous values with NA
df_autism <- df_autism %>% 
  replace_with_na(replace = list(A3.Score  = "?")) %>%
  replace_with_na(replace = list(A6.Score  = "?")) %>%
  replace_with_na(replace = list(A10.Score = "O"))

# Convert the cleaned character columns to integer
df_autism$A3.Score  <- as.integer(df_autism$A3.Score)
df_autism$A6.Score  <- as.integer(df_autism$A6.Score)
df_autism$A10.Score <- as.integer(df_autism$A10.Score)

# Median imputation
df_autism$A3.Score[is.na(df_autism$A3.Score)]   <- median(df_autism$A3.Score,  na.rm = TRUE)
df_autism$A6.Score[is.na(df_autism$A6.Score)]   <- median(df_autism$A6.Score,  na.rm = TRUE)
df_autism$A10.Score[is.na(df_autism$A10.Score)] <- median(df_autism$A10.Score, na.rm = TRUE)

# Convert the 16 feature columns to numeric (factors become their integer codes);
# standardization with scale() is applied in the next cell
for (idx in 1:16)
  {
    df_autism[, idx] <- as.numeric(df_autism[, idx])
  }
```

### A1.2. Support Vector Machines

```r
# I added manual seeding in data splitting to provide the same results when running the model
# Users may opt to delete the seeding
set.seed(224599) 
# Standardize the 16 feature columns (z-scores)
df_autism[1:16] <- scale(df_autism[1:16])
idx <- sample(2, nrow(df_autism), replace = TRUE, prob=c(0.75, 0.25))
autism_train <- as.data.frame(df_autism[idx==1,])
autism_test  <- as.data.frame(df_autism[idx==2,])
# 10-fold cross-validated grid search over gamma and cost (uses svm's default radial kernel)
tune  <- tune.svm(class~., data = autism_train, gamma = 10^(-6:-1), cost = 10^(1:4), tunecontrol = tune.control(cross = 10))

set.seed(224599)
# Building the SVM classifier; e1071 selects the formulation through `type`
model <- svm(class~., data = autism_train, 
             type = 'C-classification', 
             kernel = 'polynomial', degree = 3, 
             probability = T, 
             gamma = 1e-02, 
             cost = 1000)

prediction <- predict(model, autism_test, probability = T)
# table(autism_test$class, prediction)
confusionMatrix(autism_test$class, prediction)
```

```
Confusion Matrix and Statistics

          Reference
Prediction  1  0
         1 12  1
         0  0  8
                                          
               Accuracy : 0.9524          
                 95% CI : (0.7618, 0.9988)
    No Information Rate : 0.5714          
    P-Value [Acc > NIR] : 0.0001319       
                                          
                  Kappa : 0.9014          
                                          
 Mcnemar's Test P-Value : 1.0000000       
                                          
            Sensitivity : 1.0000          
            Specificity : 0.8889          
         Pos Pred Value : 0.9231          
         Neg Pred Value : 1.0000          
             Prevalence : 0.5714          
         Detection Rate : 0.5714          
   Detection Prevalence : 0.6190          
      Balanced Accuracy : 0.9444          
                                          
       'Positive' Class : 1               
```

The SVM achieved a high accuracy of 95.24%, as shown in the confusion matrix and statistics above. The one-sided test [Acc > NIR] gives p-value = 0.0001319 < 0.05, which indicates that the accuracy is significantly higher than the no-information rate (57.14%). Because the autism dataset is not severely imbalanced, accuracy is a reasonable primary performance indicator. However, in disease detection it is vital to minimize false negatives (type II errors). A false negative is a case in which the test indicates that a person does not have autism although they actually do. We therefore also report sensitivity, which guards against such misdiagnoses. The reported sensitivity is 1.0000, which suggests that the SVM generalizes well on this test set. However, the test set contains only 21 cases, so this estimate is uncertain.

### A1.3 K-Nearest Neighbors

```r
set.seed(10000)
idx <- sample(2, nrow(df_autism), replace = TRUE, prob=c(0.75, 0.35))
autism_train <- as.data.frame(df_autism[idx == 1,])
autism_test  <- as.data.frame(df_autism[idx == 2,])

# Exclude the result column (column 16) from the inputs
train_input  <- as.matrix(autism_train[, -16])
train_output <- as.matrix(autism_train[,  17])
test_input   <- as.matrix(autism_test[,  -16])

# Building the KNN classifier
set.seed(10000)
prediction <- knn(train_input, test_input, train_output, k = 3)
xtab <- table(factor(prediction, levels = c(0, 1)), factor(autism_test$class, levels = c(0, 1)))
results <- confusionMatrix(xtab)
results

# You may uncomment the following scripts to see other performance indicators
# as.matrix(results, what = "overall")
# as.matrix(results, what = "classes")
```

```
Confusion Matrix and Statistics

   
     0  1
  0  9  0
  1  4 20
                                        
               Accuracy : 0.8788        
                 95% CI : (0.718, 0.966)
    No Information Rate : 0.6061        
    P-Value [Acc > NIR] : 0.000602      
                                        
                  Kappa : 0.7317        
                                        
 Mcnemar's Test P-Value : 0.133614      
                                        
            Sensitivity : 0.6923        
            Specificity : 1.0000        
         Pos Pred Value : 1.0000        
         Neg Pred Value : 0.8333        
             Prevalence : 0.3939        
         Detection Rate : 0.2727        
   Detection Prevalence : 0.2727        
      Balanced Accuracy : 0.8462        
                                        
       'Positive' Class : 0             
```

The K-Nearest Neighbors model performed worse than the SVM, with an accuracy of only 87.88%. Nonetheless, its accuracy is still significantly higher than the no-information rate at the standard significance level (p-value = 0.000602 < 0.05). Note that caret treats class 0 as the positive class here. The reported sensitivity of 69.23% therefore means that 4 of the 13 class-0 adolescents in the test set were misclassified as class 1.

### A1.4 Artificial Neural Networks 

```r
set.seed(152)
idx <- sample(2, nrow(df_autism), replace = TRUE, prob=c(0.7, 0.3))
autism_train <- as.data.frame(df_autism[idx == 1,])
autism_test  <- as.data.frame(df_autism[idx == 2,])
nnet_autismtrain <- autism_train

nnet_autismtrain <- cbind(nnet_autismtrain, autism_train$class == 0)
nnet_autismtrain <- cbind(nnet_autismtrain, autism_train$class == 1)
names(nnet_autismtrain)[18:19] <- c('NO', 'YES')

# The script cor(df) performs correlation matrix in the given dataset
# cor(df_autism[, 1:16], as.numeric(df_autism[, 17]), method=c('pearson', 'kendall', 'spearman'))

# Building the ANN classifier
nn <- neuralnet(NO + YES ~ 
                A3.Score + A4.Score + A5.Score + A6.Score  + 
                A7.Score + A8.Score + A9.Score + A10.Score + used,
                data = nnet_autismtrain, hidden=c(4))

# You may uncomment the following code to view neural net plot
# plot(nn)

mypredict <- compute(nn, autism_test[-16])$net.result
maxidx <- function(arr){ 
  return(which(arr == max(arr)))
}

idx <- apply(mypredict, c(1), maxidx)
prediction <- c(0, 1)[idx]
u <- union(prediction, autism_test$class)
xtab <- table(factor(prediction, u), factor(autism_test$class, u))
results <- confusionMatrix(xtab)
results

# You may uncomment the following scripts to see other performance indicators
# as.matrix(results, what = 'overall')
# as.matrix(results, what = 'classes')

# You may uncomment the following script to see the ROC curve
# roc.curve(autism_test$class, prediction, plotit = TRUE)
```

```
Confusion Matrix and Statistics

   
     0  1
  0  5  1
  1  0 16
                                          
               Accuracy : 0.9545          
                 95% CI : (0.7716, 0.9988)
    No Information Rate : 0.7727          
    P-Value [Acc > NIR] : 0.0257          
                                          
                  Kappa : 0.8791          
                                          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.9412          
         Pos Pred Value : 0.8333          
         Neg Pred Value : 1.0000          
             Prevalence : 0.2273          
         Detection Rate : 0.2273          
   Detection Prevalence : 0.2727          
      Balanced Accuracy : 0.9706          
                                          
       'Positive' Class : 0               
```


### A1.5 Results and Model Performance Discussion

We strongly discourage adopting K-Nearest Neighbors (KNN) for the autism data. KNN can be computationally expensive as the number of samples and features grows. Every prediction requires three steps: computing the distance from the query point to every training point, sorting these distances, and taking the majority class among the k nearest neighbors. In addition, the autism data consist largely of categorical and binary features (e.g., gender and the item scores). Distance measures such as the Euclidean distance are less meaningful for such encoded features than for continuous ones. This can increase the number of misclassifications, including false negatives.
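The following base-R sketch implements the KNN prediction rule by hand to make the cost argument concrete. For every query it computes one distance per training point, sorts the distances, and takes a majority vote. The function `knn_predict()` and the toy points are hypothetical. The sketch is not how `class::knn()` is implemented internally. For instance, `class::knn()` breaks voting ties at random, whereas this sketch takes the first label in sorted order.

```r
# Minimal k-nearest-neighbour classifier (illustrative only)
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  apply(test_x, 1, function(q) {
    # Euclidean distance from the query to EVERY training point: O(n * d) per query
    q_mat <- matrix(q, nrow(train_x), ncol(train_x), byrow = TRUE)
    d <- sqrt(rowSums((train_x - q_mat)^2))
    # Sort the distances and keep the labels of the k closest points
    nn_labels <- train_y[order(d)[seq_len(k)]]
    # Majority vote; ties go to the first label in sorted order
    votes <- table(nn_labels)
    names(votes)[which.max(votes)]
  })
}

train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("0", "0", "0", "1", "1", "1")
test_x  <- rbind(c(0.5, 0.5), c(5.5, 5.5))
pred <- knn_predict(train_x, train_y, test_x, k = 3)
print(pred)
```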

The autism data appear to follow a non-linear pattern. Therefore, **Support Vector Machines (SVM) work well with the autism data, since they use non-linear kernels (here, a polynomial kernel) to construct non-linear decision boundaries.** However, their computational cost can be high. The penalty parameter C (the cost of the error term) must be tuned, as must kernel parameters such as gamma. This is often done by a time-consuming cross-validated grid search.
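The polynomial kernel used above can be written out explicitly. The e1071 documentation gives it as $(\gamma u^\top v + c_0)^{d}$, where `coef0` ($c_0$) defaults to 0. Because `coef0` is not set, the SVM above uses $(0.01\,u^\top v)^3$. The sketch below evaluates this kernel and a small Gram matrix on made-up points. The helper `poly_kernel()` is hypothetical.

```r
# Polynomial kernel in e1071's parameterization: (gamma * <u, v> + coef0)^degree
poly_kernel <- function(u, v, gamma = 0.01, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

# Kernel value with the settings used in the autism SVM (gamma = 0.01, degree = 3)
k_uv <- poly_kernel(c(1, 2), c(3, 4))

# Gram matrix for three points with gamma = 1, coef0 = 1, degree = 2
X <- rbind(c(1, 2), c(3, 4), c(-1, 0))
K <- outer(seq_len(nrow(X)), seq_len(nrow(X)),
           Vectorize(function(i, j) poly_kernel(X[i, ], X[j, ],
                                                gamma = 1, coef0 = 1, degree = 2)))
print(k_uv)
print(K)
```

The SVM never needs the explicit high-dimensional feature map. It only needs these pairwise kernel values, which is what allows a non-linear boundary at modest cost.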

**Artificial Neural Networks (ANNs), like SVMs, can also work well with the autism data, since they introduce non-linearity through non-linear activation functions. The logistic function is the default in `neuralnet`, and ReLU is another common choice. ANNs and SVMs classify the autism data with comparable accuracy because both are fitted by optimizing an objective function. SVMs use quadratic programming, whereas ANNs use gradient-based training; the default in `neuralnet` is resilient backpropagation (`rprop+`).** However, ANNs generally require more computation than SVMs, which leads to longer training times. The two models also differ in their guarantees. The SVM decision boundary is the maximum-margin hyperplane between the support vectors of the different classes. An ANN, on the other hand, starts from randomly initialized weights and offers no such guarantee. Different runs can converge to different solutions, which may produce additional false negatives.
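The following sketch shows where an ANN's non-linearity comes from. It performs one forward pass through a single hidden layer with the logistic activation, which is `neuralnet`'s default `act.fct`. The helper names, layer sizes, and random weights are illustrative assumptions. The output layer also applies the logistic function, as `neuralnet` does with `linear.output = FALSE`. Because the starting weights are random, a different seed gives training a different starting point.

```r
# Logistic (sigmoid) activation
logistic <- function(z) 1 / (1 + exp(-z))

# Forward pass through one hidden layer.
# W1: (n_inputs + 1) x n_hidden, W2: (n_hidden + 1) x n_outputs; row 1 holds the biases
forward <- function(x, W1, W2) {
  h <- logistic(cbind(1, x) %*% W1)   # hidden-layer activations (non-linear)
  logistic(cbind(1, h) %*% W2)        # output-layer activations in (0, 1)
}

set.seed(1)
n_in <- 3; n_hidden <- 4; n_out <- 2
# Random initial weights: training would start from this point
W1 <- matrix(rnorm((n_in + 1) * n_hidden), n_in + 1, n_hidden)
W2 <- matrix(rnorm((n_hidden + 1) * n_out), n_hidden + 1, n_out)

x   <- matrix(c(0.2, -1.0, 0.5), nrow = 1)
out <- forward(x, W1, W2)
print(out)
```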

Model training showed that the ANN's accuracy (95.45%) is marginally higher than the SVM's (95.24%). Both ANN and SVM reported a sensitivity of 100% on the autism data. Sensitivity is the relevant metric for disease detection tasks, where false negatives must be minimized. However, **SVMs are preferable to ANNs for low-dimensional datasets such as the autism data. At this scale the curse of dimensionality is not a concern for SVMs, and they require less computation. Therefore, in summary, we recommend SVMs for the autism data. ANNs, on the other hand, are preferable to SVMs for high-dimensional data, despite the comparable accuracy of the two models.**

##  A2. Leukemia Data
From leukemiaData.csv, we need to complete the following tasks and determine which model is most appropriate for classifying the patients. The leukemia dataset has 14 continuous feature variables and one target variable with three classes: acute lymphoblastic leukemia (ALL), mixed-lineage leukemia (MLL), and acute myeloblastic leukemia (AML). Overall, the dataset has $57 \times 15$ dimensions, i.e., 57 samples and 15 columns. Because the dataset contains only 57 samples, training the machine learning models may be difficult.

### A2.1 Data Management (Preparing, Cleansing, Imputation, etc.)

In this section, we first preprocessed the Leukemia data by encoding the class variable, denoting ALL, MLL, and AML as 0, 1, and 2, respectively. We then converted all variables except the class to numeric. Finally, we standardized the features to a common scale (zero mean and unit variance), which preserves the relative differences between values within each feature.

We also evaluated the correlations among features, since highly correlated features generally do not improve models. In some models, highly correlated features cause multicollinearity. Multicollinearity reduces the precision of the estimated weights, which weakens the statistical power of the classification model. Therefore, we removed highly correlated features from the leukemia dataset and kept a subset of six genes (g4, g5, g7, g8, g10, and g12) for modeling.
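The notebook does not show exactly how the six genes were selected, so the following is only a generic sketch of a correlation filter on simulated data. The feature names `g_a`, `g_b`, and `g_c` and the 0.9 cutoff are hypothetical. It flags feature pairs whose absolute Pearson correlation exceeds the cutoff and drops one feature from each pair. `caret::findCorrelation()` offers a packaged version of this kind of filtering.

```r
set.seed(42)
n   <- 50
g_a <- rnorm(n)
g_b <- g_a + rnorm(n, sd = 0.1)   # nearly a copy of g_a, so highly correlated
g_c <- rnorm(n)                   # independent feature
X   <- data.frame(g_a, g_b, g_c)

# Pairwise Pearson correlations among the features
cm <- cor(X, method = "pearson")

# Flag upper-triangle pairs whose absolute correlation exceeds the cutoff
cutoff <- 0.9
high  <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(feature_1 = rownames(cm)[high[, "row"]],
                    feature_2 = colnames(cm)[high[, "col"]],
                    r         = cm[high])
print(pairs)

# Keep the first feature of each flagged pair and drop the second
X_reduced <- X[, setdiff(names(X), unique(pairs$feature_2)), drop = FALSE]
print(names(X_reduced))
```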

```r
df_leukem <- read.csv('../data/leukemiaData.csv')

# You may uncomment the following script to check data
# check_data(df_leukem)

# Encode the class: ALL = 0, MLL = 1, AML = 2
df_leukem$Class = factor(df_leukem$Class, levels = c('ALL', 'MLL', 'AML'), labels = c(0, 1, 2))

# Convert the features to numeric and standardize them (zero mean, unit variance)
for (idx in 1:(ncol(df_leukem)-1))
  {df_leukem[, idx] <- as.numeric(df_leukem[, idx])}
df_leukem[, 1:(ncol(df_leukem)-1)] <- scale(df_leukem[, 1:(ncol(df_leukem)-1)], center = TRUE, scale  = TRUE)

# Pearson correlation between each feature and the integer-coded target
correlation <- cor(df_leukem[, 1:14], as.numeric(df_leukem[, 15]), method = 'pearson')
```

### A2.2 Support Vector Machines

```r
set.seed(3407)
idx <- sample(2, nrow(df_leukem), replace = TRUE, prob=c(0.7, 0.3))

# cor(df_leukem[, 1:14], as.numeric(df_leukem[, 15]), method = 'pearson')

# Keep the selected genes plus the class
df_leukem_svm <- df_leukem[c('g4', 'g5', 'g7', 'g8', 'g10', 'g12', 'Class')]
leukem_train <- as.data.frame(df_leukem_svm[idx==1,])
leukem_test  <- as.data.frame(df_leukem_svm[idx==2,])
tune  <- tune.svm(Class~., data = leukem_train, 
                  gamma = 10^(-10:-1), 
                  cost  = 10^(1:5), 
                  tunecontrol = tune.control(cross = 10))

# Uncomment the following script to see the parameter tuning results
# summary(tune)

set.seed(3407)
# e1071 selects the formulation through `type`
model <- svm(Class~., data = leukem_train, 
             type = 'C-classification', scale = FALSE,
             kernel = 'radial',
             probability = T, 
             gamma = 1e-03, 
             cost  = 1e+01)

prediction <- predict(model, leukem_test, probability = T)
xtab <- table(leukem_test$Class, prediction)
confusionMatrix(xtab)
```

```
Confusion Matrix and Statistics

   prediction
    0 1 2
  0 2 5 0
  1 1 4 1
  2 0 0 4

Overall Statistics
                                          
               Accuracy : 0.5882          
                 95% CI : (0.3292, 0.8156)
    No Information Rate : 0.5294          
    P-Value [Acc > NIR] : 0.4063          
                                          
                  Kappa : 0.3866          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 0 Class: 1 Class: 2
Sensitivity            0.6667   0.4444   0.8000
Specificity            0.6429   0.7500   1.0000
Pos Pred Value         0.2857   0.6667   1.0000
Neg Pred Value         0.9000   0.5455   0.9231
Prevalence             0.1765   0.5294   0.2941
Detection Rate         0.1176   0.2353   0.2353
Detection Prevalence   0.4118   0.3529   0.2353
Balanced Accuracy      0.6548   0.5972   0.9000
```

The leukemia SVM achieved only 58.82% accuracy, as shown in the confusion matrix and statistics above. We expected the SVM to perform well because the leukemia dataset is small. However, its p-value is 0.4063 > 0.05, so we fail to reject the null hypothesis that its accuracy is no better than the no-information rate (52.94%). In other words, the evaluation does not provide sufficient evidence that the SVM classifies the leukemia subtypes better than always predicting the most frequent class. Therefore, SVM is a poor model for the leukemia dataset.
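The "P-Value [Acc > NIR]" reported by caret appears to be a one-sided exact binomial test of the number of correct predictions against the no-information rate. Under that assumption, the leukemia SVM's p-value can be reproduced in base R from the counts in its confusion matrix. In that matrix, 10 of 17 test cases are on the diagonal, and the largest column total is 9. The variable names below are illustrative.

```r
# Counts read off the leukemia SVM confusion matrix above
n_correct <- 2 + 4 + 4      # diagonal (correct predictions)
n_total   <- 17             # number of test cases
nir       <- 9 / 17         # no-information rate: largest column share

# H0: accuracy <= NIR, H1: accuracy > NIR
test <- binom.test(n_correct, n_total, p = nir, alternative = "greater")
print(round(n_correct / n_total, 4))  # accuracy
print(round(test$p.value, 4))
```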

Suppose, however, that the p-value were statistically significant. We could then use sensitivity to determine whether the SVM correctly identifies each leukemia subtype. The SVM's sensitivity for Class 0 (ALL: acute lymphoblastic leukemia) is only 66.67%, meaning that about one third of the ALL cases were assigned to another subtype. The same pattern appears for Class 1 (MLL), whose sensitivity is only 44.44%, while Class 2 (AML) reaches 80%. In general, the SVM's predictions are poor and produce many false negatives.

The SVM could likely be improved if more samples were available in the leukemia dataset.

### A2.3 K-Nearest Neighbors

```r
set.seed(3407)
idx <- sample(2, nrow(df_leukem), replace = TRUE, prob = c(0.70, 0.30))

df_leukem_knn <- df_leukem[c('g4', 'g5', 'g7', 'g8', 'g10', 'g12', 'Class')]

leukem_train <- as.data.frame(df_leukem_knn[idx == 1,])
leukem_test  <- as.data.frame(df_leukem_knn[idx == 2,])
train_input  <- as.matrix(leukem_train[, -7])
train_output <- as.matrix(leukem_train[,  7])
test_input   <- as.matrix(leukem_test[,  -7])

set.seed(3407)
prediction <- knn(train_input, test_input, train_output, k = 29)
xtab <- table(prediction, leukem_test$Class)
results <- confusionMatrix(xtab)
results

# Uncomment if you want to evaluate other performance indicators
# as.matrix(results, what = 'overall')
# as.matrix(results, what = 'classes')

# roc.curve(factor(leukem_test$Class, levels = c(0, 1)), 
#           factor(prediction, levels = c(0, 1)), plotit = TRUE)
```

```
Confusion Matrix and Statistics

          
prediction 0 1 2
         0 6 2 0
         1 1 4 0
         2 0 0 4

Overall Statistics
                                         
               Accuracy : 0.8235         
                 95% CI : (0.5657, 0.962)
    No Information Rate : 0.4118         
    P-Value [Acc > NIR] : 0.0006427      
                                         
                  Kappa : 0.7273         
                                         
 Mcnemar's Test P-Value : NA             

Statistics by Class:

                     Class: 0 Class: 1 Class: 2
Sensitivity            0.8571   0.6667   1.0000
Specificity            0.8000   0.9091   1.0000
Pos Pred Value         0.7500   0.8000   1.0000
Neg Pred Value         0.8889   0.8333   1.0000
Prevalence             0.4118   0.3529   0.2353
Detection Rate         0.3529   0.2353   0.2353
Detection Prevalence   0.4706   0.2941   0.2353
Balanced Accuracy      0.8286   0.7879   1.0000
```

The leukemia KNN produced higher performance metrics than the SVM, with an accuracy of 82.35%. Unlike the SVM, the KNN's accuracy is significantly higher than the no-information rate (p-value = 0.0006427 < 0.05). This provides sufficient evidence that the KNN can classify the leukemia subtypes. Moreover, classes 0 and 2 achieved good sensitivity (85.71% and 100%, respectively). Class 1, however, had a lower sensitivity of 66.67%, because two of its six test cases were predicted as Class 0. This is possibly due to the class distribution in the data.

### A2.4 Artificial Neural Networks

```r
set.seed(1000)
idx <- sample(2, nrow(df_leukem), replace = TRUE, prob = c(0.7, 0.3))
leukem_train <- as.data.frame(df_leukem[idx==1,])
leukem_test  <- as.data.frame(df_leukem[idx==2,])
nnet_leukemtrain <- leukem_train

nnet_leukemtrain <- cbind(nnet_leukemtrain, leukem_train$Class == 0)
nnet_leukemtrain <- cbind(nnet_leukemtrain, leukem_train$Class == 1)
nnet_leukemtrain <- cbind(nnet_leukemtrain, leukem_train$Class == 2)

names(nnet_leukemtrain)[16:18] <- c('ALL', 'MLL', 'AML')

# cor(df_leukem[, 1:14], as.numeric(df_leukem[, 15]), method=c('pearson', 'kendall', 'spearman'))

set.seed(1000)
nn <- neuralnet(ALL + MLL + AML ~ 
                g4  + g5  + g7 + g8 + g10 + g12,
                data = nnet_leukemtrain, hidden = c(3), algorithm = 'rprop+',
                act.fct  = 'logistic', linear.output = FALSE)

mypredict <- compute(nn, leukem_test[-14])$net.result
maxidx <- function(arr){ 
  return(which(arr == max(arr)))
}
idx <- apply(mypredict, c(1), maxidx)
prediction <- c(0, 1, 2)[idx]
u <- union(prediction, leukem_test$Class)
xtab <- table(factor(prediction, u), factor(leukem_test$Class, u))
results <- confusionMatrix(xtab)
results

# as.matrix(results, what = 'overall')
# as.matrix(results, what = 'classes')
```

```
Confusion Matrix and Statistics

   
    1 0 2
  1 3 2 0
  0 1 3 1
  2 0 0 6

Overall Statistics
                                          
               Accuracy : 0.75            
                 95% CI : (0.4762, 0.9273)
    No Information Rate : 0.4375          
    P-Value [Acc > NIR] : 0.0115          
                                          
                  Kappa : 0.6213          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: 1 Class: 0 Class: 2
Sensitivity            0.7500   0.6000   0.8571
Specificity            0.8333   0.8182   1.0000
Pos Pred Value         0.6000   0.6000   1.0000
Neg Pred Value         0.9091   0.8182   0.9000
Prevalence             0.2500   0.3125   0.4375
Detection Rate         0.1875   0.1875   0.3750
Detection Prevalence   0.3125   0.3125   0.3750
Balanced Accuracy      0.7917   0.7091   0.9286
```

The Artificial Neural Network achieved the **second-highest accuracy** among the three models (75%). Its accuracy is significantly higher than the no-information rate (p-value = 0.0115 < 0.05). This provides sufficient evidence that the ANN may classify the leukemia subtypes correctly.


### A2.5 Results and Model Performance Discussion
**We recommend KNN for the leukemia dataset.** We expected KNN to outperform the other models because the dataset is low-dimensional, and KNN works well with a small number of input variables. In addition, KNN handles multiclass classification natively, without decomposing the problem into binary subproblems. Finally, the leukemia dataset consists of continuous features, for which distance-based methods such as KNN are well suited. This can help minimize false negatives.

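The SVM, on the other hand, produced poor predictions on the leukemia dataset. In its most basic form, the SVM is a binary classifier and does not directly support multiclass classification. `e1071` (through libsvm) handles the three classes with a one-against-one scheme, which combines pairwise classifiers by voting. This may partly explain why the SVM performed worst on this dataset.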
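The one-against-one scheme mentioned above can be sketched in a few lines. With $k$ classes, $k(k-1)/2$ binary classifiers are trained, and each casts one vote. The pairwise winners below are made-up values for a single hypothetical patient. The code only illustrates the voting step, not libsvm's implementation.

```r
classes <- c("0", "1", "2")             # ALL, MLL, AML
pairs   <- combn(classes, 2)            # one binary problem per pair of classes
print(ncol(pairs))                      # k(k - 1) / 2 = 3 for k = 3

# Hypothetical winners of the three pairwise classifiers for one patient
pair_winner <- c("0_vs_1" = "1", "0_vs_2" = "2", "1_vs_2" = "1")

# Each pairwise classifier casts one vote; the class with the most votes wins
votes     <- table(factor(pair_winner, levels = classes))
predicted <- names(votes)[which.max(votes)]
print(votes)
print(predicted)
```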

**KNN achieved the highest sensitivity for Class 0 (ALL) and Class 2 (AML), while the ANN achieved a slightly higher sensitivity for Class 1 (MLL) (75% vs. 66.67%).** Hypothesis testing shows that the KNN's accuracy is significantly higher than the no-information rate, which implies that KNN is a good model for the leukemia dataset. The ANN's accuracy is also significant, but KNN achieved higher overall performance. It is also interesting that Class 1 (MLL) had the lowest sensitivity among the leukemia classes for both KNN and SVM. One possible reason is the class imbalance in the dataset. The imbalance is not severe enough to warrant upsampling or downsampling, but it may still explain the low sensitivity for Class 1.


## B. Model Performance Measurements

### B1. Please write a formula and explain the meaning, pros, and cons of each measurement.

In the confusion matrix below, the rows hold the predicted classifications and the columns hold the actual classifications. Each has one entry for each of the classes we want to classify into (i.e., *setosa, versicolor,* and *virginica*). We can use this confusion matrix to calculate the fundamental classification performance metrics.

### <code>as.table(results)</code>

#### Confusion Matrix
A confusion matrix can be computed on data for which the actual target values are known. It summarizes the predicted results in a table layout that visualizes a machine learning model's performance on a binary or multi-class classification problem. From the confusion matrix we can derive various performance metrics, such as precision, recall, and F1 score. **The confusion matrix, precision, recall, and F1 score provide better insight into the predictions than accuracy alone.**

| Predicted (rows) / Actual (columns) | Setosa        | Versicolor      | Virginica       |
|:---------:|---------------|-----------------|-----------------|
| Setosa    | $P_{SS} = 10$ | $P_{SVe}  = 0 $ | $P_{SVi}  = 0 $ |
| Versicolor| $P_{VeS} = 0$ | $P_{VeVe} = 10$ | $P_{VeVi} = 1 $ |
| Virginica | $P_{ViS} = 0$ | $P_{ViVe} = 0 $ | $P_{ViVi} = 13$ |

#### Setosa ($S$) Class
In each variable, the first subscript denotes the predicted class and the second denotes the actual class. The variable $P_{SS}$ represents the number of setosa samples correctly classified as setosa; this is the true positive count for the Setosa class. Meanwhile, $P_{SVe}$ and $P_{SVi}$ represent the number of versicolor and virginica samples, respectively, that were **incorrectly** classified as setosa.

#### Versicolor ($Ve$) Class
The variable $P_{VeVe}$ represents the number of versicolor samples correctly classified as versicolor; this is the true positive count for the Versicolor class. Meanwhile, $P_{VeS}$ and $P_{VeVi}$ represent the number of setosa and virginica samples, respectively, that were **incorrectly** classified as versicolor.

#### Virginica ($Vi$) Class
The variable $P_{ViVi}$ represents the number of virginica samples correctly classified as virginica; this is the true positive count for the Virginica class. Meanwhile, $P_{ViS}$ and $P_{ViVe}$ represent the number of setosa and versicolor samples, respectively, that were **incorrectly** classified as virginica.

True Positive (TP)
- $TP_{S} = P_{SS} = 10$
- $TP_{Ve} = P_{VeVe} = 10$
- $TP_{Vi} = P_{ViVi} = 13$

True Negative (TN)
- $TN_{S} = P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi} = 10 + 1 + 0 + 13 = 24$
- $TN_{Ve} = P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi} = 10 + 0 + 0 + 13 = 23$
- $TN_{Vi} = P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe} = 10 + 0 + 0 + 10 = 20$

False Positive (FP)
- $FP_{S} = P_{SVe} + P_{SVi} = 0 + 0 = 0$
- $FP_{Ve} = P_{VeS} + P_{VeVi} = 0 + 1 = 1$
- $FP_{Vi} = P_{ViS} + P_{ViVe} = 0 + 0 = 0$

False Negative (FN)
- $FN_{S} = P_{VeS} + P_{ViS} = 0 + 0 = 0$
- $FN_{Ve} = P_{SVe} + P_{ViVe} = 0 + 0 = 0$
- $FN_{Vi} = P_{SVi} + P_{VeVi} = 0 + 1 = 1$
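These counts can be derived mechanically from the confusion matrix. The following base-R sketch lays out the matrix with predicted classes in rows and actual classes in columns, as in the table above. The names `cm`, `tp`, `fp`, `fn`, and `tn` are illustrative.

```r
classes <- c("setosa", "versicolor", "virginica")
# Rows = predicted class, columns = actual class
cm <- matrix(c(10,  0,  0,
                0, 10,  1,
                0,  0, 13),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = classes, Actual = classes))

tp <- diag(cm)                 # correctly classified samples
fp <- rowSums(cm) - tp         # predicted as the class, but actually another class
fn <- colSums(cm) - tp         # actually the class, but predicted as another class
tn <- sum(cm) - tp - fp - fn   # everything outside the class's row and column
print(rbind(TP = tp, TN = tn, FP = fp, FN = fn))
```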

### <code>as.matrix(results, what = 'overall')</code>
#### Accuracy
Accuracy is the ratio of the number of correct classifications to the total number of classifications. In our confusion matrix, the correct classifications are the True Positives of each class (the diagonal), and the total is the sum of all entries in the confusion matrix. **Accuracy is best suited as a performance indicator for balanced data, especially when False Positives and False Negatives have similar costs. Its major disadvantage is that it is unsuitable for imbalanced datasets. If a model simply predicts the majority class for every point, its accuracy will be high but misleading. Real-world data are usually imbalanced.**

\begin{equation*}
\begin{aligned}
    \text{Accuracy} &= \frac{P_{SS} + P_{VeVe} + P_{ViVi}}{P_{SS} + P_{SVe} + P_{SVi} + P_{VeS} + P_{VeVe} + P_{VeVi} + P_{ViS} + P_{ViVe} + P_{ViVi}} \\
    \text{Accuracy} &= \frac{10 + 10 + 13}{10 + 0 + 0 + 0 + 10 + 1 + 0 + 0 + 13} \\
    \text{Accuracy} &= \frac{33}{34} \\
    \text{Accuracy} &= 0.9705882353
\end{aligned}
\end{equation*}
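The calculation above is a one-liner in base R. The second part of the sketch illustrates the imbalance caveat with a hypothetical test set of 95 negatives and 5 positives. A "model" that always predicts the majority class reaches 95% accuracy while detecting none of the positives. All names and data in the second part are made up.

```r
# Iris confusion matrix from the table above (rows = predicted, columns = actual)
cm <- matrix(c(10,  0,  0,
                0, 10,  1,
                0,  0, 13), nrow = 3, byrow = TRUE)
accuracy <- sum(diag(cm)) / sum(cm)    # correct predictions / all predictions
print(accuracy)

# Hypothetical imbalanced test set and a majority-class "model"
actual    <- c(rep(0, 95), rep(1, 5))
predicted <- rep(0, 100)
maj_acc   <- mean(predicted == actual)                               # accuracy
maj_sens  <- sum(predicted == 1 & actual == 1) / sum(actual == 1)    # sensitivity
print(c(accuracy = maj_acc, sensitivity = maj_sens))
```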

#### Kappa
The Kappa coefficient can be used to assess classification accuracy. It compares how well the classification performs with the agreement expected if all labels were assigned at random. Kappa ranges from -1 to 1. A value of $k=0$ indicates that the classification is no better than random, $k>0$ indicates that it is better than random, and $k<0$ indicates that it is worse than random. **However, many articles have questioned the use of Kappa, and it is therefore not recommended (Pontius Jr and Millones, 2011).**
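Cohen's Kappa is usually written as $\kappa = (p_o - p_e)/(1 - p_e)$. Here $p_o$ is the observed agreement (the accuracy), and $p_e$ is the agreement expected by chance, computed from the row and column totals. The sketch below uses a hypothetical helper `cohen_kappa()`. It applies the helper to the iris matrix and to the autism SVM table from section A1.2, whose printed Kappa is 0.9014.

```r
cohen_kappa <- function(cm) {
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                     # observed agreement (= accuracy)
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

iris_cm <- matrix(c(10, 0, 0,
                     0, 10, 1,
                     0, 0, 13), nrow = 3, byrow = TRUE)
svm_cm  <- matrix(c(12, 1,
                     0, 8), nrow = 2, byrow = TRUE)   # autism SVM table as printed

k_iris <- cohen_kappa(iris_cm)
k_svm  <- cohen_kappa(svm_cm)
print(round(c(iris = k_iris, autism_svm = k_svm), 4))
```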

#### Other Accuracy Metrics
**AccuracyLower** and **AccuracyUpper** are the lower and upper bounds of the 95% confidence interval for the model's accuracy. They are supplementary descriptive statistics that quantify the uncertainty of the accuracy estimate rather than additional measures of performance. The interval is wide for small test sets. On their own, these bounds are of limited use for model evaluation.
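The interval appears to be an exact (Clopper-Pearson) binomial interval for the proportion of correct predictions. The sketch below assumes this and uses base R's `binom.test()` through a hypothetical helper, `accuracy_ci()`. For the autism SVM (20 of 21 correct), it reproduces the printed interval of (0.7618, 0.9988). The interval for the iris example illustrates how wide such intervals are for small test sets.

```r
# Exact binomial 95% confidence interval for accuracy
accuracy_ci <- function(n_correct, n_total, conf = 0.95) {
  binom.test(n_correct, n_total, conf.level = conf)$conf.int
}

ci_svm  <- accuracy_ci(20, 21)   # autism SVM, section A1.2
ci_iris <- accuracy_ci(33, 34)   # iris example above
print(round(ci_svm, 4))
print(round(ci_iris, 4))
```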

The null accuracy, denoted **AccuracyNull** (printed as the No Information Rate in caret's output), is the accuracy achieved by always predicting the most frequent class. Accuracy alone does not reveal the underlying distribution of the response values. Comparing it with the null accuracy shows whether the model does better than this trivial baseline. However, the null accuracy does not tell you what types of errors your classifier is making.
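With actual classes in the columns, the null accuracy is the largest column total divided by the grand total. The sketch below uses a hypothetical helper `null_accuracy()`. It applies the helper to the iris matrix and to the autism KNN table from section A1.3, whose printed No Information Rate is 0.6061.

```r
# Null accuracy: share of the most frequent actual class (columns = actual)
null_accuracy <- function(cm) max(colSums(cm)) / sum(cm)

iris_cm <- matrix(c(10, 0, 0,
                     0, 10, 1,
                     0, 0, 13), nrow = 3, byrow = TRUE)
knn_cm  <- matrix(c(9, 0,
                    4, 20), nrow = 2, byrow = TRUE)   # autism KNN table

na_iris <- null_accuracy(iris_cm)   # 14 / 34
na_knn  <- null_accuracy(knn_cm)    # 20 / 33
print(round(c(iris = na_iris, autism_knn = na_knn), 4))
```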

Lastly, McNemar's test examines the errors made by the model. Applied to a single confusion matrix, as caret does, it tests whether the off-diagonal counts are symmetric. This is equivalent to asking whether the predicted class proportions match the actual class proportions (e.g., Setosa, Versicolor, Virginica). It rejects the null hypothesis when the errors are unevenly distributed between mirrored pairs of classes and fails to reject it when they are balanced. **In general, McNemar's test tells us about the marginal homogeneity of the model's predictions.** However, it is often disregarded in practice. With small or imbalanced test sets, many off-diagonal cells are empty, and the statistic is undefined when both cells of a mirrored pair are zero. This is why caret reports NA for the three-class leukemia tables above.
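For a 2 x 2 table, R's `mcnemar.test()` uses only the two off-diagonal cells and applies a continuity correction by default. Assuming caret calls it on the table as printed, the sketch below reproduces the McNemar p-value of the autism KNN model from section A1.3 (0.133614). That table's off-diagonal counts are 0 and 4. The variable names are illustrative.

```r
# Autism KNN confusion matrix (rows = predicted, columns = actual)
knn_tab <- matrix(c(9, 0,
                    4, 20), nrow = 2, byrow = TRUE)

# Continuity-corrected statistic: (|0 - 4| - 1)^2 / (0 + 4) = 2.25 on 1 df
res <- mcnemar.test(knn_tab)
print(res$statistic)
print(res$p.value)
```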

### <code>as.matrix(results, what = 'classes')</code>

#### Sensitivity
Sensitivity is the ratio of the true positives of a specific class to the sum of its true positives and false negatives. The equation below calculates sensitivity for the Setosa class. **Sensitivity is well suited to tasks in which the impact of false negatives must be minimized, because it measures the proportion of actual positive samples that the model predicts correctly. A major example is disease detection, where we want as few patients who have the disease as possible to be classified as healthy. Its main disadvantage is that it ignores false positives. It is therefore not appropriate on its own for tasks in which false positives are costly (e.g., drug tests or alcohol tests).**

\begin{equation*}
\begin{aligned}
    \text{Sensitivity(Setosa)} &= \frac{P_{SS}}{P_{SS} + P_{VeS} + P_{ViS}} \\
    \text{Sensitivity(Setosa)} &= \frac{TP_{S}}{TP_{S} + FN_{S}} \\
    \text{Sensitivity(Setosa)} &= \frac{10}{10 + 0} \\
    \text{Sensitivity(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Sensitivity for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{Sensitivity(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + P_{SVe} + P_{ViVe}} \\
    \text{Sensitivity(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + FN_{Ve}} \\
    \text{Sensitivity(Versicolor)} &= \frac{10}{10 + 0} \\
    \text{Sensitivity(Versicolor)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Sensitivity for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{Sensitivity(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + P_{SVi} + P_{VeVi}} \\
    \text{Sensitivity(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + FN_{Vi}} \\
    \text{Sensitivity(Virginica)} &= \frac{13}{13 + 1} \\
    \text{Sensitivity(Virginica)} &= 0.9285714286 \\
\end{aligned}
\end{equation*}
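
All three sensitivities can be computed in one vectorised step by dividing each diagonal entry by its column (reference) total. The confusion matrix is again rebuilt from the counts above.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

tp   <- diag(cm)              # true positives per class
fn   <- colSums(cm) - tp      # reference class k predicted as something else
sens <- tp / (tp + fn)        # = diag(cm) / colSums(cm)
print(sens)
```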

#### Specificity
Specificity is the ratio of the true negatives of a specific class to the sum of its true negatives and false positives. The equation below calculates specificity for the Setosa class. **In contrast to sensitivity, specificity is a good measure when we want to capture all true negatives and avoid false positives. It is best suited to tasks concerning a test's ability to correctly identify patients without the condition as negative. A test with perfect specificity returns a negative result for every patient without the disease, so a positive result rules in the disease. Drug tests and alcohol tests are other examples where specificity is applied.**

**Specificity, however, does not account for false negatives. A negative result from a highly specific test is not necessarily useful for ruling out disease. For instance, a test that always returns a negative result has a specificity of 100%, because specificity ignores false negatives. Such a test also returns negative for patients who have the disease, which makes it useless for ruling out the disease.**

\begin{equation*}
\begin{aligned}
    \text{Specificity(Setosa)} &= \frac{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi}}{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi} + P_{SVe} + P_{SVi}} \\
    \text{Specificity(Setosa)} &= \frac{TN_{S}}{TN_{S} + FP_{S}} \\
    \text{Specificity(Setosa)} &= \frac{24}{24 + 0} \\
    \text{Specificity(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Specificity for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{Specificity(Versicolor)} &= \frac{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi}}{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi} + P_{VeS} + P_{VeVi}} \\
    \text{Specificity(Versicolor)} &= \frac{TN_{Ve}}{TN_{Ve} + FP_{Ve}} \\
    \text{Specificity(Versicolor)} &= \frac{23}{23 + 1} \\
    \text{Specificity(Versicolor)} &= 0.9583333333 \\
\end{aligned}
\end{equation*}

Specificity for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{Specificity(Virginica)} &= \frac{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe}}{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe} + P_{ViS} + P_{ViVe}} \\
    \text{Specificity(Virginica)} &= \frac{TN_{Vi}}{TN_{Vi} + FP_{Vi}} \\
    \text{Specificity(Virginica)} &= \frac{20}{20 + 0} \\
    \text{Specificity(Virginica)} &= 1.0000 \\
\end{aligned}
\end{equation*}
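
The four one-vs-rest counts for every class follow from the row totals, column totals, and diagonal. This base-R sketch on the reconstructed matrix confirms that the true negatives are 24, 23, and 20 for Setosa, Versicolor, and Virginica.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n  <- sum(cm)
tp <- diag(cm)
fp <- rowSums(cm) - tp    # predicted as class k but belonging to another class
fn <- colSums(cm) - tp    # belonging to class k but predicted as another class
tn <- n - tp - fp - fn    # everything else

spec <- tn / (tn + fp)
print(spec)
```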

#### Positive Predictive Value

In terms of disease diagnosis, the positive predictive value (PPV) is the ratio of patients correctly diagnosed as positive to all patients with a positive test result, including healthy subjects who were incorrectly diagnosed as patients. **PPV therefore indicates how likely it is that someone with a positive test result truly has the specified disease. Unlike sensitivity and specificity, it depends on how common the condition is in the tested population. In addition, a disease may have more than one cause, and any single potential cause may not always result in the overt disease seen in a patient.**

**There is also a risk of confusing related target conditions, for example interpreting a test's PPV or NPV as referring to having a disease when it actually refers to a predisposition to that disease.**

\begin{equation*}
\begin{aligned}
    \text{PPV(Setosa)} &= \frac{P_{SS}}{P_{SS} + P_{SVe} + P_{SVi}} \\
    \text{PPV(Setosa)} &= \frac{TP_{S}}{TP_{S} + FP_{S}} \\
    \text{PPV(Setosa)} &= \frac{10}{10 + 0} \\
    \text{PPV(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{PPV(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + P_{VeS} + P_{VeVi}} \\
    \text{PPV(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + FP_{Ve}} \\
    \text{PPV(Versicolor)} &= \frac{10}{10 + 1} \\
    \text{PPV(Versicolor)} &= 0.9090909091 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{PPV(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + P_{ViS} + P_{ViVe}} \\
    \text{PPV(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + FP_{Vi}} \\
    \text{PPV(Virginica)} &= \frac{13}{13 + 0} \\
    \text{PPV(Virginica)} &= 1.0000 \\
\end{aligned}
\end{equation*}
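
PPV can be computed directly as TP / (TP + FP). It can also be computed via Bayes' theorem from sensitivity, specificity, and prevalence, which makes its dependence on prevalence explicit. On the reconstructed matrix, both routes agree. As we understand it, caret's `confusionMatrix()` accepts a `prevalence` argument for this reason.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n  <- sum(cm)
tp <- diag(cm)
fp <- rowSums(cm) - tp
fn <- colSums(cm) - tp
tn <- n - tp - fp - fn

ppv <- tp / (tp + fp)                       # direct definition

# Same quantity via Bayes' theorem
sens <- tp / (tp + fn)
spec <- tn / (tn + fp)
prev <- colSums(cm) / n                     # prevalence observed in the test set
ppv_bayes <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))
print(rbind(ppv, ppv_bayes))
```

Replacing `prev` with a smaller value shows how the same sensitivity and specificity give a lower PPV when the class is rarer.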

#### Negative Predictive Value

The negative predictive value (NPV) is the proportion of negative test results that are true negatives. **NPV indicates how confident you can be in a negative result, that is, the likelihood that a person who tests negative does not have the disease. As with PPV, there is a risk of confusing related target conditions, such as interpreting a test's NPV as referring to a disease when it actually refers to a predisposition to that disease.**

\begin{equation*}
\begin{aligned}
    \text{NPV(Setosa)} &= \frac{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi}}{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi} + P_{VeS} + P_{ViS}} \\
    \text{NPV(Setosa)} &= \frac{TN_{S}}{TN_{S} + FN_{S}} \\
    \text{NPV(Setosa)} &= \frac{24}{24 + 0} \\
    \text{NPV(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{NPV(Versicolor)} &= \frac{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi}}{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi} + P_{SVe} + P_{ViVe}} \\
    \text{NPV(Versicolor)} &= \frac{TN_{Ve}}{TN_{Ve} + FN_{Ve}} \\
    \text{NPV(Versicolor)} &= \frac{23}{23 + 0} \\
    \text{NPV(Versicolor)} &= 1.0000 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{NPV(Virginica)} &= \frac{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe}}{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe} + P_{SVi} + P_{VeVi}} \\
    \text{NPV(Virginica)} &= \frac{TN_{Vi}}{TN_{Vi} + FN_{Vi}} \\
    \text{NPV(Virginica)} &= \frac{20}{20 + 1} \\
    \text{NPV(Virginica)} &= 0.9523809524 \\
\end{aligned}
\end{equation*}
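
NPV is TN / (TN + FN) for each class. This sketch on the reconstructed matrix reproduces the values above, including 20/21 for Virginica.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n  <- sum(cm)
tp <- diag(cm)
fp <- rowSums(cm) - tp
fn <- colSums(cm) - tp
tn <- n - tp - fp - fn

npv <- tn / (tn + fn)
print(npv)
```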

#### Precision
In a multi-class confusion matrix, precision measures how accurate the model is when it predicts a specific class. It is the ratio of the true positives of that class to the sum of its true positives and false positives, which makes it the same quantity as the PPV above. The equation below calculates precision for the Setosa class.

**Advantage: Precision is a good measure when the impact of false positives must be minimized. It shows whether the model returns more relevant results than irrelevant ones: of the samples predicted as a class, what fraction truly belong to it.** <br>
**Disadvantage: Precision ignores false negatives, so it is not suitable on its own for tasks in which the impact of false negatives must be minimized.**
 

\begin{equation*}
\begin{aligned}
    \text{Precision(Setosa)} &= \frac{P_{SS}}{P_{SS} + P_{SVe} + P_{SVi} } \\
    \text{Precision(Setosa)} &= \frac{TP_{S}}{TP_{S} + FP_{S}} \\
    \text{Precision(Setosa)} &= \frac{10}{10 + 0} \\
    \text{Precision(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Precision for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{Precision(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + P_{VeS} + P_{VeVi}} \\
    \text{Precision(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + FP_{Ve}} \\
    \text{Precision(Versicolor)} &= \frac{10}{10 + 1} \\
    \text{Precision(Versicolor)} &= 0.9090909091 \\
\end{aligned}
\end{equation*}

Precision for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{Precision(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + P_{ViS} + P_{ViVe}} \\
    \text{Precision(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + FP_{Vi}} \\
    \text{Precision(Virginica)} &= \frac{13}{13 + 0} \\
    \text{Precision(Virginica)} &= 1.0000 \\
\end{aligned}
\end{equation*}

#### Recall
For a specific class, recall is the ratio of its true positives to the sum of its true positives and false negatives. Recall is therefore also referred to as the true positive rate or sensitivity. When a single summary is needed for an imbalanced problem with more than two classes, micro-averaged recall sums the true positives across all classes and divides by the sum of true positives and false negatives across all classes.

**Advantage: Recall measures the ability of a classification model to identify all data points in a relevant class.** <br>
**Disadvantage: Recall ignores false positives, so it cannot be used on its own for tasks that must capture all true negatives and avoid false positives (e.g., drug tests or alcohol tests).**
 
\begin{equation*}
\begin{aligned}
    \text{Recall(Setosa)} &= \frac{P_{SS}}{P_{SS} + P_{VeS} + P_{ViS}} \\
    \text{Recall(Setosa)} &= \frac{TP_{S}}{TP_{S} + FN_{S}} \\
    \text{Recall(Setosa)} &= \frac{10}{10 + 0} \\
    \text{Recall(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Recall for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{Recall(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + P_{SVe} + P_{ViVe}} \\
    \text{Recall(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + FN_{Ve}} \\
    \text{Recall(Versicolor)} &= \frac{10}{10 + 0} \\
    \text{Recall(Versicolor)} &= 1.0000 \\
\end{aligned}
\end{equation*}

Recall for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{Recall(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + P_{SVi} + P_{VeVi}} \\
    \text{Recall(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + FN_{Vi}} \\
    \text{Recall(Virginica)} &= \frac{13}{13 + 1} \\
    \text{Recall(Virginica)} &= 0.9285714286 \\
\end{aligned}
\end{equation*}
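
The per-class recall used in the equations and the pooled (micro-averaged) version described above can be compared on the reconstructed matrix. In a single-label multi-class problem, the pooled TP + FN equals the total number of samples. Micro-averaged recall therefore coincides with overall accuracy (33/34).

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

tp <- diag(cm)
fn <- colSums(cm) - tp

recall       <- tp / (tp + fn)            # per class (= sensitivity)
micro_recall <- sum(tp) / sum(tp + fn)    # pooled over all classes
print(recall)
print(micro_recall)
```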

#### F1-Score
The F1-score combines the precision and recall of a model into a single measure, defined as their harmonic mean. It was designed to evaluate binary classification systems, which classify examples as ‘positive’ or ‘negative’. For a multi-class problem, it is computed per class by treating that class as positive and all other classes as negative (one-vs-rest), as in the equations below.

**Advantage: Because it combines precision and recall, the F1-score reflects how the classes are distributed. For highly imbalanced data, it can therefore give a better assessment of model performance than accuracy.**<br>
**Disadvantage: It can be hard to interpret because it blends precision and recall. Moreover, accuracy may be more suitable than the F1-score for balanced datasets.**


\begin{equation*}
\begin{aligned}
    \text{F1(Setosa)} &= \frac{P_{SS}}{P_{SS} + \frac{1}{2} (P_{SVe} + P_{SVi} + P_{VeS} + P_{ViS})} \\
    \text{F1(Setosa)} &= \frac{TP_{S}}{TP_{S} + \frac{1}{2} (FP_{S} + FN_{S})} \\
    \text{F1(Setosa)} &= \frac{10}{10 + \frac{1}{2}(0 + 0)} \\
    \text{F1(Setosa)} &= 1.000 \\
\end{aligned}
\end{equation*}

F1-Score for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{F1(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + \frac{1}{2} (P_{VeS} + P_{VeVi} + P_{SVe} + P_{ViVe})} \\
    \text{F1(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + \frac{1}{2} (FP_{Ve} + FN_{Ve})} \\
    \text{F1(Versicolor)} &= \frac{10}{10 + \frac{1}{2}(1 + 0)} \\
    \text{F1(Versicolor)} &= 0.9523809524 \\
\end{aligned}
\end{equation*}

F1-Score for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{F1(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + \frac{1}{2} (P_{ViS} + P_{ViVe} + P_{SVi} + P_{VeVi})} \\
    \text{F1(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + \frac{1}{2} (FP_{Vi} + FN_{Vi})} \\
    \text{F1(Virginica)} &= \frac{13}{13 + \frac{1}{2}(0 + 1)} \\
    \text{F1(Virginica)} &= 0.962962963 \\
\end{aligned}
\end{equation*}
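
The harmonic-mean definition $2PR/(P+R)$ and the count form $TP/(TP + \frac{1}{2}(FP + FN))$ used above are algebraically identical. This sketch on the reconstructed matrix checks both forms against the values above (20/21 and 26/27 for Versicolor and Virginica).

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

tp <- diag(cm)
fp <- rowSums(cm) - tp
fn <- colSums(cm) - tp

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

f1       <- 2 * precision * recall / (precision + recall)   # harmonic mean
f1_count <- tp / (tp + 0.5 * (fp + fn))                     # form used above
print(rbind(f1, f1_count))
```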


#### Prevalence
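Prevalence is the proportion of samples that actually belong to a class, that is, the reference (column) total of that class divided by the total number of samples. The equation below is the calculation of Prevalence for the Setosa class.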
\begin{equation*}
\begin{aligned}
    \text{Prevalence(Setosa)} &= \frac{P_{SS} + P_{VeS} + P_{ViS}}{\text{Total Samples}} \\
    \text{Prevalence(Setosa)} &= \frac{10}{34} \\
    \text{Prevalence(Setosa)} &= 0.2941176471 \\
\end{aligned}
\end{equation*}

Prevalence for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{Prevalence(Versicolor)} &= \frac{P_{SVe} + P_{VeVe} + P_{ViVe}}{\text{Total Samples}} \\
    \text{Prevalence(Versicolor)} &= \frac{10}{34} \\
    \text{Prevalence(Versicolor)} &= 0.2941176471 \\
\end{aligned}
\end{equation*}

Prevalence for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{Prevalence(Virginica)} &= \frac{P_{SVi} + P_{VeVi} + P_{ViVi}}{\text{Total Samples}} \\
    \text{Prevalence(Virginica)} &= \frac{0 + 1 + 13}{34} \\
    \text{Prevalence(Virginica)} &= 0.4117647059 \\
\end{aligned}
\end{equation*}
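
Prevalence uses only the reference labels, so it is the column totals divided by the sample size. This sketch uses the reconstructed matrix; the three values sum to 1.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

prev <- colSums(cm) / sum(cm)   # share of each reference class
print(prev)
```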

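#### Detection Rate
Detection rate is the number of true positives of a class divided by the total number of samples, i.e., the proportion of all samples that belong to the class and are correctly predicted as such. The equation below is the calculation of Detection Rate (DR) for the setosa class.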
\begin{equation*}
\begin{aligned}
    \text{DR(Setosa)} &= \frac{P_{SS}}{P_{SS} + P_{SVe} + P_{SVi} + P_{VeS} + P_{ViS} + P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi}} \\
    \text{DR(Setosa)} &= \frac{TP_{S}}{TP_{S} + FP_{S} + FN_{S} + TN_{S}} \\
    \text{DR(Setosa)} &= \frac{10}{10 + 0 + 0 + 24} \\
    \text{DR(Setosa)} &= 0.2941176471 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{DR(Versicolor)} &= \frac{P_{VeVe}}{P_{VeVe} + P_{VeS} + P_{VeVi} + P_{SVe} + P_{ViVe} + P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi}} \\
    \text{DR(Versicolor)} &= \frac{TP_{Ve}}{TP_{Ve} + FP_{Ve} + FN_{Ve} + TN_{Ve}} \\
    \text{DR(Versicolor)} &= \frac{10}{10 + 1 + 0 + 23} \\
    \text{DR(Versicolor)} &= 0.2941176471 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{DR(Virginica)} &= \frac{P_{ViVi}}{P_{ViVi} + P_{ViS} + P_{ViVe} + P_{SVi} + P_{VeVi} + P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe}} \\
    \text{DR(Virginica)} &= \frac{TP_{Vi}}{TP_{Vi} + FP_{Vi} + FN_{Vi} + TN_{Vi}} \\
    \text{DR(Virginica)} &= \frac{13}{13 + 0 + 1 + 20} \\
    \text{DR(Virginica)} &= 0.3823529412 \\
\end{aligned}
\end{equation*}
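
Because TP/N = (TP/(TP+FN)) × ((TP+FN)/N), detection rate equals sensitivity times prevalence. This sketch on the reconstructed matrix checks that identity alongside the direct definition.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n  <- sum(cm)
tp <- diag(cm)
dr <- tp / n                       # detection rate per class

sens <- tp / colSums(cm)           # sensitivity
prev <- colSums(cm) / n            # prevalence
print(rbind(dr, sens_x_prev = sens * prev))
```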

#### Detection Prevalence
Detection prevalence is defined as the number of predicted positive events (both true positive and false positive) divided by the total number of predictions. The equation below is the calculation of Detection Prevalence (DP) for the setosa class.
\begin{equation*}
\begin{aligned}
    \text{DP(Setosa)} &= \frac{P_{SS} + P_{SVe} + P_{SVi}}{\text{Total Samples}} \\
    \text{DP(Setosa)} &= \frac{10 + 0 + 0}{34} \\
    \text{DP(Setosa)} &= 0.2941176471 \\
\end{aligned}
\end{equation*}

Detection Prevalence for the Versicolor class
\begin{equation*}
\begin{aligned}
    \text{DP(Versicolor)} &= \frac{P_{VeVe} + P_{VeS} + P_{VeVi}}{\text{Total Samples}} \\
    \text{DP(Versicolor)} &= \frac{10 + 0 + 1}{34} \\
    \text{DP(Versicolor)} &= 0.3235294118 \\
\end{aligned}
\end{equation*}

Detection Prevalence for the Virginica class
\begin{equation*}
\begin{aligned}
    \text{DP(Virginica)} &= \frac{P_{ViVi} + P_{ViS} + P_{ViVe}}{\text{Total Samples}} \\
    \text{DP(Virginica)} &= \frac{13 + 0 + 0}{34} \\
    \text{DP(Virginica)} &= 0.3823529412 \\
\end{aligned}
\end{equation*}
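
Detection prevalence uses the predicted (row) totals, whereas prevalence uses the reference (column) totals. Comparing them on the reconstructed matrix isolates the single Virginica sample predicted as Versicolor: Versicolor is over-predicted by 1/34 and Virginica under-predicted by 1/34.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n    <- sum(cm)
dp   <- rowSums(cm) / n            # detection prevalence (predicted share)
prev <- colSums(cm) / n            # prevalence (true share)
print(rbind(dp, prev, difference = dp - prev))
```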

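#### Balanced Accuracy
Balanced accuracy is the average of a class's sensitivity and specificity, which makes it less affected by class imbalance than overall accuracy.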
\begin{equation*}
\begin{aligned}
    \text{BA(Setosa)} &= \frac{\frac{P_{SS}}{P_{SS} + P_{VeS} + P_{ViS}} + \frac{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi}}{P_{VeVe} + P_{VeVi} + P_{ViVe} + P_{ViVi} + P_{SVe} + P_{SVi}}}{2} \\
    \text{BA(Setosa)} &= \frac{\text{Sensitivity(Setosa)} + \text{Specificity(Setosa)}}{2} \\
    \text{BA(Setosa)} &= \frac{1.0000 + 1.0000}{2} \\
    \text{BA(Setosa)} &= 1.0000 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{BA(Versicolor)} &= \frac{\frac{P_{VeVe}}{P_{VeVe} + P_{SVe} + P_{ViVe}} + \frac{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi}}{P_{SS} + P_{SVi} + P_{ViS} + P_{ViVi} + P_{VeS} + P_{VeVi}}}{2} \\
    \text{BA(Versicolor)} &= \frac{\text{Sensitivity(Versicolor)} + \text{Specificity(Versicolor)}}{2} \\
    \text{BA(Versicolor)} &= \frac{1.0000 + 0.9583333333}{2} \\
    \text{BA(Versicolor)} &= 0.9791666667 \\
\end{aligned}
\end{equation*}

\begin{equation*}
\begin{aligned}
    \text{BA(Virginica)} &= \frac{\frac{P_{ViVi}}{P_{ViVi} + P_{SVi} + P_{VeVi}} + \frac{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe}}{P_{SS} + P_{SVe} + P_{VeS} + P_{VeVe} + P_{ViS} + P_{ViVe}}}{2} \\
    \text{BA(Virginica)} &= \frac{\text{Sensitivity(Virginica)} + \text{Specificity(Virginica)}}{2} \\
    \text{BA(Virginica)} &= \frac{0.9285714286 + 1.0000}{2} \\
    \text{BA(Virginica)} &= 0.9642857143 \\
\end{aligned}
\end{equation*}
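
Balanced accuracy per class is the mean of the sensitivity and specificity vectors computed earlier. On the reconstructed matrix, this gives 47/48 for Versicolor and 27/28 for Virginica, matching the values above.

```r
# Confusion matrix reconstructed from the counts in this section (rows = predicted)
lv <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(10, 0, 0,  0, 10, 1,  0, 0, 13), nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = lv, Reference = lv))

n  <- sum(cm)
tp <- diag(cm)
fp <- rowSums(cm) - tp
fn <- colSums(cm) - tp
tn <- n - tp - fp - fn

sens <- tp / (tp + fn)
spec <- tn / (tn + fp)
balanced_accuracy <- (sens + spec) / 2
print(balanced_accuracy)
```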

## C. Linear Programming

We denote $w, x, y,$ and $z$ as the decision variables in this LP problem, where each variable is the number of orders of Menu 1, 2, 3, and 4, respectively. Our primary aim is to minimize the cost of Ajarn Ake's diet, so we denote $C$ as the total cost of the diet, subject to various nutritional constraints.

According to his doctor, Ajarn Ake needs at least 500 kcal per day, so the menus chosen must together supply at least 500 kcal. The same type of constraint applies to protein (at least 6 grams), carbohydrates (at least 10 grams), and fat (at least 8 grams). We therefore minimize the following objective function subject to four nutritional constraints and four non-negativity constraints.

\begin{equation}
\begin{split}
&\text{min C} = 500w + 200x + 300y + 800z \\
&\text{subject to: } \\
\end{split}
\end{equation}

\begin{equation}
\begin{split}
    400w + 200x + 150y + 500z &\geq 500 \\
    3w + 2x &\geq 6 \\
    2w + 2x + 4y + 4z &\geq 10 \\
    2w + 4x +  y + 5z &\geq 8  \\
    w, x, y, z &\geq 0 \\
\end{split}
\end{equation}

In [13]:
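# Cost (Baht) per order of Menu 1-4: objective coefficients for w, x, y, z
f.objective   <- c(500, 200, 300, 800)

# Nutrient content per order (rows: calories, protein, carbohydrates, fat)
f.constraints <- matrix(c(400, 200, 150, 500,
                            3,   2,   0,   0,
                            2,   2,   4,   4,
                            2,   4,   1,   5), nrow = 4, byrow = TRUE)

# Minimum daily requirements
calories      <- 500   # kcal
protein       <- 6     # g
carbohydrates <- 10    # g
fat           <- 8     # g

# Right-hand side and direction of the LP constraints
f.rhs         <- c(calories, protein, carbohydrates, fat)
f.dir         <- c('>=', '>=', '>=', '>=')

# lpSolve::lp() treats all decision variables as non-negative by default
optimal       <- lp(direction = 'min', f.objective, f.constraints, f.dir, f.rhs)

# optimal$status is 0 when an optimal solution was found
optimal$solution
optimal$objval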

**Conclusion**: The linear program shows that Ajarn Ake should order Menu 2 three times and Menu 3 once, and should not order Menu 1 or Menu 4. This diet costs 900 Baht and provides 750 kcal, 6 grams of protein, 10 grams of carbohydrates, and 13 grams of fat, which meets every minimum requirement (at least 500 kcal, 6 grams of protein, 10 grams of carbohydrates, and 8 grams of fat).
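
The reported solution can be checked without lpSolve. This base-R sketch copies the coefficients from the model above and computes three things for $(w, x, y, z) = (0, 3, 1, 0)$: the nutrients supplied, the slack of each constraint, and the total cost. The vector `dual` is an assumption worked out by hand from complementary slackness; it is not output of the original notebook. If it is feasible for the dual problem and yields the same objective value, weak duality certifies that 900 Baht is the minimum.

```r
# Coefficients copied from the LP formulation above
A <- matrix(c(400, 200, 150, 500,   # calories (kcal)
                3,   2,   0,   0,   # protein (g)
                2,   2,   4,   4,   # carbohydrates (g)
                2,   4,   1,   5),  # fat (g)
            nrow = 4, byrow = TRUE)
rhs  <- c(500, 6, 10, 8)            # daily minimums
cost <- c(500, 200, 300, 800)       # Baht per order of Menu 1-4
sol  <- c(0, 3, 1, 0)               # (w, x, y, z) reported by lp()

intake <- drop(A %*% sol)           # nutrients supplied: 750, 6, 10, 13
slack  <- intake - rhs              # 0 marks a binding constraint
total  <- sum(cost * sol)           # 900 Baht

# Hand-derived dual prices (assumption): only protein and carbohydrates bind
dual <- c(0, 25, 75, 0)
dual_feasible <- all(drop(t(A) %*% dual) <= cost) && all(dual >= 0)
dual_value    <- sum(rhs * dual)    # equals total, so sol is optimal
print(list(intake = intake, slack = slack, total = total, dual_value = dual_value))
```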

## D. Data Augmentation

In [13]:
# Width and height (in pixels) of a magick image, read from image_info()
get_width <- function(image) {
  image_info(image)$width
}

get_height <- function(image) {
  image_info(image)$height
}

# Crop a random region (100-200 px wide and high) and save it as a padded PNG
crop_image <- function(image, location, idx) {
  randomA <- sample(100:200, 1)
  randomB <- sample(100:200, 1)
  image   <- image_crop(image, paste0(randomA, 'x', randomB))
  path    <- paste0(idx, 'crop.png')
  # paste0() avoids the space that paste() would insert between folder and file name
  png(filename = paste0(location, path),
      width  = get_width(image) + 50,
      height = get_height(image) + 50,
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}


# Resize to a random width of 200-500 px (aspect ratio preserved) and save as PNG
scale_image <- function(image, location, idx) {
  image <- image_scale(image, as.character(sample(200:500, 1)))
  path  <- paste0(idx, 'scale.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}


# Rotate by a random angle between 0 and 180 degrees and save as PNG
rotate_image <- function(image, location, idx) {
  image <- image_rotate(image, sample(0:180, 1))
  path  <- paste0(idx, 'rotate.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}


# Randomly change brightness, saturation and hue (percentages; 100 = unchanged)
modulate_image <- function(image, location, idx) {
  image <- image_modulate(image, brightness = sample(5:200, 1),
                          saturation = sample(1:100, 1),
                          hue = sample(1:100, 1))
  path  <- paste0(idx, 'modulate.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Overlay a random colour (blue, red, yellow or green) at 10-50% opacity
colorize_image <- function(image, location, idx) {
  image <- image_colorize(image, opacity = sample(10:50, 1),
                          color = sample(c('blue', 'red', 'yellow', 'green'), 1))
  path  <- paste0(idx, 'colorize.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Adjust contrast with a random sharpen value and save as PNG
contrast_image <- function(image, location, idx) {
  image <- image_contrast(image, sharpen = sample(1:200, 1))
  path  <- paste0(idx, 'contrast.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Gaussian blur (kernel 'Gaussian:0x5', i.e. sigma 5) with kernel scaling '70, 30%'
gaussianblur_image <- function(image, location, idx) {
  image <- image %>% image_convolve('Gaussian:0x5', scaling = '70, 30%')
  path  <- paste0(idx, 'gaussian.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Sobel edge detection followed by negation (dark edges on a light background)
edge_image <- function(image, location, idx) {
  image <- image %>% image_convolve('Sobel') %>% image_negate()
  path  <- paste0(idx, 'edge.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Sobel edge filter with random kernel scaling and bias
sobelscale_image <- function(image, location, idx) {
  image <- image %>% image_convolve('Sobel',
                                    scaling = sample(0:5, 1),
                                    bias = sample(0:5, 1))
  path  <- paste0(idx, 'sobelscale.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Median filter with a random radius of 1-10 px
medianblur_image <- function(image, location, idx) {
  image <- image_median(image, radius = sample(1:10, 1))
  path  <- paste0(idx, 'medianblur.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Gaussian blur followed by colour negation
negate_image <- function(image, location, idx) {
  image <- image %>% image_convolve('Gaussian:0x5') %>% image_negate()
  path  <- paste0(idx, 'negate.png')
  png(filename = paste0(location, path),
      width  = get_width(image),
      height = get_height(image),
      units  = 'px', bg = 'transparent')
  plot(image)
  dev.off()
}

# Write two randomly parameterised copies of every augmentation for one image
main <- function(image, location) {
  # png() cannot create folders, so make sure the output folder exists first
  dir.create(location, recursive = TRUE, showWarnings = FALSE)
  for (idx in 1:2) {
    scale_image(image, location, idx)
    rotate_image(image, location, idx)
    crop_image(image, location, idx)
    modulate_image(image, location, idx)
    colorize_image(image, location, idx)
    contrast_image(image, location, idx)
    gaussianblur_image(image, location, idx)
    edge_image(image, location, idx)
    sobelscale_image(image, location, idx)
    medianblur_image(image, location, idx)
    negate_image(image, location, idx)
  }
}

image <- image_read('../data/datatwo/x-ray01.jpg', density = NULL, depth = NULL, strip = FALSE)
location = '../data/xrays/xray1/'
main(image, location)

image <- image_read('../data/datatwo/x-ray02.jpg', density = NULL, depth = NULL, strip = FALSE)
location = '../data/xrays/xray2/'
main(image, location)

image <- image_read('../data/datatwo/x-ray03.jpg', density = NULL, depth = NULL, strip = FALSE)
location = '../data/xrays/xray3/'
main(image, location)