<h1 style="text-align: center;">Predictive Performance Assessment Activities</h1>

<p style="text-align: center;">July, 2024</p>

## Predictive Performance Assessment Activity 1 - Prediction Curve

Construct the ROC and precision-recall (P-R) curves for subarachnoid hemorrhage patients, predicting whether their clinical outcome was good or poor:

1. Load the `pROC` R package and load the `aSAH` dataset. The true labels are in `outcome` and the predictor is the continuous `s100b` measurement.

2. Use the `roc` function to plot the ROC curve.

3. Use the resulting `roc` object to compute precision and recall at different cutpoints.
    * Hint: Look at the `coords` function in `pROC`; use the argument `x="all"` and see the `ret` argument.

4. Plot the precision-recall curve using the base R `plot` function and the `coords` object.
    * Hint: Look through the examples in the `coords` function documentation. Use the base R `plot` function and set the `type` and `ylim` arguments.

5. Repeat step 3, but set the argument `x="best"` and obtain the sensitivity, specificity, NPV and PPV instead. Try again with a different `best.method` argument.
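
As a reminder of what these quantities mean, the sketch below computes sensitivity, specificity, PPV and NPV by hand at a single cutpoint using base R only. The labels, scores and the cutpoint of 0.5 are made-up toy values (not taken from `aSAH`), the helper name `metrics_at_cutpoint` is hypothetical, and the sketch assumes that higher scores indicate the positive class.

```r
# Toy data (hypothetical values, not from aSAH): 1 = positive class, 0 = negative class
labels <- c(1, 1, 1, 0, 0, 0, 0, 1, 0)
scores <- c(0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8, 0.05)

# Hypothetical helper: call an observation positive when score >= cutpoint,
# then derive the usual confusion-matrix summaries
metrics_at_cutpoint <- function(labels, scores, cutpoint) {
  predicted <- as.integer(scores >= cutpoint)
  tp <- sum(predicted == 1 & labels == 1)  # true positives
  fp <- sum(predicted == 1 & labels == 0)  # false positives
  tn <- sum(predicted == 0 & labels == 0)  # true negatives
  fn <- sum(predicted == 0 & labels == 1)  # false negatives
  c(
    sensitivity = tp / (tp + fn),  # also called recall
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),  # also called precision
    npv         = tn / (tn + fn)
  )
}

result <- metrics_at_cutpoint(labels, scores, cutpoint = 0.5)
print(result)
```

Sweeping the cutpoint over all observed scores and plotting the resulting pairs is what produces the ROC curve (sensitivity against specificity) and the precision-recall curve.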

## Predictive Performance Assessment Activity 2 - Performance Assessment
In this activity, you will assess the predictive performance of a linear regression model using 10-fold cross-validation on the LA ozone pollution data.

1. Load the `mlbench` R package and load the `Ozone` dataset. Remove any observations with missing data using `na.omit` and remove the first three columns. The outcome of interest is `V4`.

2. Install the `caret` R package and complete the following code:
    * `TC = trainControl(method=..., number=..., savePredictions=...)`

3. Use the `train` function to conduct cross-validation with a linear regression model on the data:
    * `Model = train(..., data=..., trControl=..., method=...)`

4. Now try using the holdout set approach to estimate RMSE. Randomly sample 80% of the data (using `sample` function) and use the other 20% as your holdout set. Use the `lm` function on your training set and the `predict` function to get predictions.
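
To make the mechanics of k-fold cross-validation concrete, here is a minimal base R sketch that assigns rows to folds by hand and averages the per-fold RMSE. It uses simulated toy data rather than the `Ozone` data, the seed and variable names are arbitrary choices for illustration, and averaging the fold RMSEs is just one common way to summarise the result.

```r
set.seed(1)  # fixed seed so the random fold assignment is reproducible

# Simulated toy data (hypothetical, not the Ozone data)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 2 + 3 * toy$x + rnorm(n)

# Assign each row to one of k folds at random (10 rows per fold here)
k <- 10
fold_id <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k - 1 folds, then predict the held-out fold
fold_rmse <- sapply(1:k, function(f) {
  fit <- lm(y ~ x, data = toy[fold_id != f, ])
  preds <- predict(fit, newdata = toy[fold_id == f, ])
  sqrt(mean((preds - toy$y[fold_id == f])^2))  # RMSE on the held-out fold
})

cv_rmse <- mean(fold_rmse)  # average RMSE across the k folds
print(cv_rmse)
```

The holdout approach in step 4 corresponds to doing this once, with a single 80/20 split instead of k rotating folds.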

## Predictive Performance Assessment Activity 3 - Hyperparameter Selection (BONUS/CHALLENGE)

Apply k-nearest neighbours (kNN) to a forensic glass dataset, containing different types of glasses with elemental characteristics.

1. Load the `Glass` dataset from the `mlbench` R package. The outcome is `Type`.


2. Divide the dataset into a training set (80%) and a test set (20%).



3. Load the `caret` package and complete the following code to perform 5-fold cross-validation using kNN to select k from \{2,3,4,5,6\}:
    * `TC = trainControl(method="cv", number=..., savePredictions=...)`
    * `Model = train(..., data=..., trControl=..., method=..., tuneGrid=...)`

4. Compute the test set predictions using the final model and the `predict` function.


5. Compute performance measures on the test set:
    * Hint: Consider using the `confusionMatrix` function.
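
For intuition about multi-class performance measures, the base R sketch below builds a confusion table by hand and derives overall accuracy and per-class recall from it. The class names and label vectors are made-up toy values, not taken from `Glass`.

```r
# Toy multi-class labels (hypothetical values, not from Glass)
classes   <- c("A", "B", "C")
reference <- factor(c("A", "A", "B", "B", "C", "C", "C", "A"), levels = classes)
predicted <- factor(c("A", "B", "B", "B", "C", "A", "C", "A"), levels = classes)

# Cross-tabulate predictions (rows) against true labels (columns)
conf_mat <- table(Predicted = predicted, Reference = reference)
print(conf_mat)

# Overall accuracy: proportion of observations on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions divided by the true count of each class
per_class_recall <- diag(conf_mat) / colSums(conf_mat)

print(accuracy)
print(per_class_recall)
```

With unbalanced classes, overall accuracy can hide poor performance on rare classes, which is why the per-class summaries are worth inspecting as well.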

# Predictive Performance Assessment Activity Solutions

## Predictive Performance Assessment Activity 1 Solution


<details>
  <summary>Click here for Solution Part 1</summary>
  
  `library(pROC)`

  `data(aSAH)`

  `ROC_Obj = roc(aSAH$outcome, aSAH$s100b)`
  
  `plot(ROC_Obj)`

  `PR_Coords = coords(ROC_Obj, x = "all", ret = c("precision", "recall"))`

  `plot(precision ~ recall, PR_Coords, type = "l", ylim = c(0, 1))`

  `Perf_Coords_1 = coords(ROC_Obj, x = "best", ret = c("sensitivity", "specificity", "npv", "ppv"))`

  `Perf_Coords_2 = coords(ROC_Obj, x = "best", best.method = "closest.topleft", ret = c("sensitivity", "specificity", "npv", "ppv"))`
</details>


## Predictive Performance Assessment Activity 2 Solution



<details>
  <summary>Click here for Solution Part 2</summary>
  
  `library(mlbench)`

  `library(caret)`

  `data(Ozone)`

  `Ozone_Clean = na.omit(Ozone[, 4:13])`

  `TC = trainControl(method = "cv", number = 10, savePredictions = TRUE)`

  `Model = train(V4 ~ ., data = Ozone_Clean, trControl = TC, method = "lm")`

  `Train_SI = sample(1:nrow(Ozone_Clean), size = 0.8 * nrow(Ozone_Clean))`
  
  `Test_SI = which(!(1:nrow(Ozone_Clean) %in% Train_SI))`

  `Train_Set = Ozone_Clean[Train_SI, ]`
  
  `Test_Set = Ozone_Clean[Test_SI, ]`

  `LM_Model = lm(V4 ~ ., Train_Set)`

  `Test_Preds = predict(LM_Model, Test_Set)`

  `RMSE = sqrt(sum((Test_Preds - Test_Set$V4)^2) / nrow(Test_Set))`
  
  `MAE = sum(abs((Test_Preds - Test_Set$V4))) / nrow(Test_Set)`
</details>


## Predictive Performance Assessment Activity 3 Solution



<details>
  <summary>Click here for Solution Part 3</summary>
  
  `library(mlbench)`

  `library(caret)`

  `data(Glass)`

  `Train_SI = sample(1:nrow(Glass), size = 0.8 * nrow(Glass))`
  
  `Test_SI = which(!(1:nrow(Glass) %in% Train_SI))`

  `Glass_Train = Glass[Train_SI, ]`
  
  `Glass_Test = Glass[Test_SI, ]`

  `TC = trainControl(method = "cv", number = 5, savePredictions = TRUE)`
  
  `Model = train(Type ~ ., data = Glass_Train, trControl = TC, method = "knn", tuneGrid = data.frame(k = c(2, 3, 4, 5, 6)))`

  `Preds = predict(Model, Glass_Test)`

  `confusionMatrix(data = Preds, reference = Glass_Test$Type)`
</details>

