# Algorithm Test Harnesses

It is difficult (almost impossible) to know beforehand which algorithm will best suit our problem. That's why it is a (_very_) good idea to implement a machine learning test harness that we can use repeatedly and consistently to measure the performance of a particular algorithm.

In this notebook we will create two algorithm test harnesses using:

   - Train-test split.
   - K-Fold Cross-Validation.
    
A test harness is composed of three key building blocks:

   1. A resampling method.
   2. The algorithm to test.
   3. The evaluation metric used to measure the performance of the algorithm.
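
The three building blocks above can be expressed as function types and composed into a single evaluation loop. The following is a minimal sketch under simplifying assumptions: rows are plain `Vector[Double]` values with the label in the last column, and the names `HarnessSketch`, `Resampler` and `evaluate` are hypothetical, not part of the notebook's code.

```scala
// Hypothetical, simplified types: each row is a Vector[Double] whose last column is the label.
object HarnessSketch {
  type Row = Vector[Double]
  type Dataset = Vector[Row]

  // 1. Resampling method: turns a dataset into one or more (train, test) pairs.
  type Resampler = Dataset => Vector[(Dataset, Dataset)]
  // 2. Algorithm: learns from the train set and predicts a label for every test row.
  type Algorithm = (Dataset, Dataset) => Vector[Double]
  // 3. Evaluation metric: compares the actual labels against the predictions.
  type Metric = (Vector[Double], Vector[Double]) => Double

  // Runs the algorithm on every (train, test) pair and scores each run.
  def evaluate(data: Dataset, resample: Resampler, algorithm: Algorithm, metric: Metric): Vector[Double] =
    resample(data).map { case (train, test) =>
      val predicted = algorithm(train, test)
      val actual = test.map(_.last)
      metric(actual, predicted)
    }
}
```

A train-test split is then a resampler that returns a single pair, while K-Fold Cross-Validation returns one pair per fold.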
    
Let's start our implementation by loading the code and libraries we'll need. We will build our solution on top of the code we implemented in the [previous notebook](https://github.com/jesus-a-martinez-v/toy-ml/blob/master/src/main/scala/notebooks/baseline_models.ipynb).

In [1]:
import $ivy.`com.github.tototoshi::scala-csv:1.3.5`
import $file.^.datasmarts.ml.toy.scripts.BaselineModels, BaselineModels._
import scala.util.Random

[32mimport [39m[36m$ivy.$                                      
[39m
[32mimport [39m[36m$file.$                                         , BaselineModels._
[39m
[32mimport [39m[36mscala.util.Random[39m

## Data

We'll use the [Pima Indians Diabetes](https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes) dataset to test our harnesses. Let's load it:

In [2]:
val BASE_DATA_PATH = "../../resources/data"
val pimaIndiansPath = s"$BASE_DATA_PATH/6/pima-indians-diabetes.csv"

val rawData = loadCsv(pimaIndiansPath)
val numberOfRows = rawData.length
val numberOfColumns = rawData.head.length
println(s"Number of rows in dataset: $numberOfRows")
println(s"Number of columns in dataset: $numberOfColumns")

// Convert every column from Text to Numeric, one column at a time
val data = (0 until numberOfColumns).toVector.foldLeft(rawData) { (d, i) => textColumnToNumeric(d, i) }

Number of rows in dataset: 768
Number of columns in dataset: 9


BASE_DATA_PATH: String = "../../resources/data"
pimaIndiansPath: String = "../../resources/data/6/pima-indians-diabetes.csv"
rawData: Vector[Vector[Data]] = Vector(
  Vector(
    Text(6),
    Text(148),
    Text(72),
    Text(35),
    Text(0),
    Text(33.6),
    Text(0.627),
    Text(50),
    Text(1)
  ),
...
numberOfRows: Int = 768
numberOfColumns: Int = 9
data: Vector[Vector[Data]] = Vector(
  Vector(
    Numeric(6.0),
    Numeric(148.0),
    Numeric(72.0),
    Numeric(35.0),
    Numeric(0.0),
    Numeric(33.6),
    Numeric(0.627),
    Numeric(50.0),
    Numeric(1.0)
  ),
...

## Harnesses Assumptions

Our test harnesses will receive an algorithm to evaluate and an evaluation metric to measure its performance. An algorithm is just a function that takes a __train__ and a __test__ set, as well as optional __parameters__. Let's create the proper types to represent an algorithm:

In [3]:
type Parameters = Map[String, Any]
type Output = Vector[Data]
type Algorithm = (Dataset, Dataset, Parameters) => Output

defined [32mtype[39m [36mParameters[39m
defined [32mtype[39m [36mOutput[39m
defined [32mtype[39m [36mAlgorithm[39m

Good. A few remarks before moving on:

   - We decided to represent the parameters as a map of Strings to Any, where the key is the name of the parameter in camelCase, and the value must be properly cast by the algorithm. If an algorithm does not require any additional parameters, passing `Map.empty` is enough.
   - `Output` is just an alias of `Vector[Data]` used to enhance readability. Each algorithm takes a train and a test set and outputs the predictions made on the latter.
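
Since values in a `Map[String, Any]` lose their static type, each algorithm has to recover it. A minimal sketch of one way to do that safely, using pattern matching with a default instead of `asInstanceOf`; the helper `intParameter` and the key `"numberOfNeighbors"` are hypothetical and only illustrate the idea.

```scala
object ParametersSketch {
  type Parameters = Map[String, Any]

  // Reads an Int parameter, falling back to a default when the key is
  // missing or holds a value of another type (instead of throwing a
  // ClassCastException, as asInstanceOf would).
  def intParameter(parameters: Parameters, name: String, default: Int): Int =
    parameters.get(name) match {
      case Some(value: Int) => value
      case _                => default
    }
}
```

For example, `ParametersSketch.intParameter(Map("numberOfNeighbors" -> 5), "numberOfNeighbors", 3)` returns `5`, while passing `Map.empty` returns the default `3`.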
    
Let's now create a type to represent an evaluation metric:
   

In [4]:
type EvaluationMetric[T <: Data] = (Vector[T], Vector[T]) => Double

defined [32mtype[39m [36mEvaluationMetric[39m

An evaluation metric is a function that takes a vector of actual values and a vector of predictions, and uses them to compute a score represented as a `Double`. Because an `EvaluationMetric` works on any type `T` that is a subtype of `Data`, it can represent both regression and classification metrics.
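
To make the shape of an evaluation metric concrete, here is a sketch of one classification metric and one regression metric with the same `(actual, predicted) => Double` signature. It assumes plain `Double` labels instead of the notebook's `Data` hierarchy; the notebook's own `accuracy` and `rootMeanSquaredError` come from the previous notebook and may be implemented differently.

```scala
object MetricsSketch {
  // Same shape as EvaluationMetric, specialised to plain Doubles (a simplifying assumption).
  type Metric = (Vector[Double], Vector[Double]) => Double

  // Classification: fraction of predictions that equal the actual label.
  val accuracy: Metric = (actual, predicted) => {
    require(actual.length == predicted.length && actual.nonEmpty)
    actual.zip(predicted).count { case (a, p) => a == p }.toDouble / actual.length
  }

  // Regression: square root of the mean squared difference.
  val rootMeanSquaredError: Metric = (actual, predicted) => {
    require(actual.length == predicted.length && actual.nonEmpty)
    val squaredErrors = actual.zip(predicted).map { case (a, p) => (a - p) * (a - p) }
    math.sqrt(squaredErrors.sum / actual.length)
  }
}
```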

Good. Let's now proceed to implement our first test harness.

## Train-Test Algorithm Test Harness

As its name suggests, this harness uses a train-test split under the hood.
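
The `trainTestSplit` function comes from the previous notebook. As an illustration of what such a split does, here is a minimal sketch, assuming a seeded shuffle followed by cutting the rows at `trainProportion`; the real implementation may select rows differently.

```scala
import scala.util.Random

object TrainTestSplitSketch {
  // Shuffles the rows with a seeded generator (so the split is reproducible)
  // and puts the first trainProportion of them into the train set.
  def trainTestSplit[A](rows: Vector[A], trainProportion: Double = 0.8, randomSeed: Int = 42): (Vector[A], Vector[A]) = {
    require(trainProportion >= 0.0 && trainProportion <= 1.0)
    val shuffled = new Random(randomSeed).shuffle(rows)
    val trainSize = (trainProportion * rows.length).toInt
    shuffled.splitAt(trainSize)
  }
}
```

Fixing the seed is what makes repeated runs of the harness comparable: every algorithm is trained and tested on exactly the same rows.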

In [5]:
def evaluateAlgorithmUsingTrainTestSplit[T <: Data](
    dataset: Dataset, 
    algorithm: Algorithm, 
    parameters: Parameters, 
    evaluationMetric: EvaluationMetric[T], 
    trainProportion: Double = 0.8, 
    randomSeed: Int = 42): Double = {
  val (train, test) = trainTestSplit(dataset, trainProportion, randomSeed)
  val predicted = algorithm(train, test, parameters).asInstanceOf[Vector[T]]
  val actual = selectColumn(test, test.head.length - 1).asInstanceOf[Vector[T]]
  
  evaluationMetric(actual, predicted)
}

defined [32mfunction[39m [36mevaluateAlgorithmUsingTrainTestSplit[39m

In [6]:
evaluateAlgorithmUsingTrainTestSplit(data, (train, test, parameters) => zeroRuleClassifier(train, test), Map.empty, accuracy)

res5: Double = 0.6168831168831169

In [7]:
evaluateAlgorithmUsingTrainTestSplit(data, (train, test, parameters) => zeroRuleRegressor(train, test), Map.empty, rootMeanSquaredError)

res6: Double = 0.4880203360256279

Great. It works in both cases (classification and regression).

Since we haven't implemented any algorithms yet, we are using the baseline models from the previous notebook. You might have noticed that we wrapped both `zeroRuleClassifier` and `zeroRuleRegressor` in an anonymous function. That's because these algorithms don't take parameters, but our `Algorithm` type does, so the wrapper receives the parameters and simply ignores them.
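
The wrapping lambda can be factored into a small reusable adapter. A sketch under simplified, hypothetical types (`Vector[Double]` rows with the label last); `ignoringParameters` is a hypothetical helper, and the `zeroRuleClassifier` below is a stand-in written for this example, not the one from the previous notebook.

```scala
object AlgorithmAdapterSketch {
  // Hypothetical simplified types standing in for the notebook's Dataset/Output.
  type Dataset = Vector[Vector[Double]]
  type Output = Vector[Double]
  type Parameters = Map[String, Any]
  type Algorithm = (Dataset, Dataset, Parameters) => Output

  // Lifts a parameterless (train, test) => predictions function into an
  // Algorithm by accepting and ignoring the parameters map.
  def ignoringParameters(f: (Dataset, Dataset) => Output): Algorithm =
    (train, test, _) => f(train, test)

  // Stand-in zero-rule classifier: predicts the most frequent label (last
  // column) of the train set for every test row. Ties are broken arbitrarily.
  def zeroRuleClassifier(train: Dataset, test: Dataset): Output = {
    val mostFrequent = train.map(_.last).groupBy(identity).maxBy(_._2.length)._1
    test.map(_ => mostFrequent)
  }
}
```

With an adapter like this, `ignoringParameters(zeroRuleClassifier)` would play the role of the inline `(train, test, parameters) => zeroRuleClassifier(train, test)`, assuming the types line up with the notebook's.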

Let's now implement an algorithm test harness using K-Fold Cross-Validation:

## K-Fold Cross-Validation Algorithm Test Harness

As its name suggests, this harness uses K-Fold Cross-Validation under the hood.
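
As with the train-test split, `crossValidationSplit` comes from the previous notebook. A minimal sketch of the idea, assuming a seeded shuffle dealt into `numberOfFolds` equal-size folds, with leftover rows dropped (one possible convention; the real function may handle them differently). `trainTestPairs` is a hypothetical helper showing how each fold becomes the test set once while the remaining folds form the train set.

```scala
import scala.util.Random

object CrossValidationSketch {
  // Shuffles rows with a seeded generator and splits them into k folds of
  // equal size. Rows left over when the length is not divisible by k are dropped.
  def crossValidationSplit[A](rows: Vector[A], numberOfFolds: Int = 3, randomSeed: Int = 42): Vector[Vector[A]] = {
    require(numberOfFolds > 0 && numberOfFolds <= rows.length)
    val foldSize = rows.length / numberOfFolds
    val shuffled = new Random(randomSeed).shuffle(rows)
    shuffled.grouped(foldSize).take(numberOfFolds).toVector
  }

  // Pairs each fold (as the test set) with the concatenation of all the
  // other folds (as the train set), removing the test fold by index.
  def trainTestPairs[A](folds: Vector[Vector[A]]): Vector[(Vector[A], Vector[A])] =
    folds.indices.toVector.map { i =>
      (folds.patch(i, Nil, 1).flatten, folds(i))
    }
}
```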

In [8]:
def evaluateAlgorithmUsingCrossValidation[T <: Data](
    dataset: Dataset, 
    algorithm: Algorithm, 
    parameters: Parameters, 
    evaluationMetric: EvaluationMetric[T],
    numberOfFolds: Int = 3, 
    randomSeed: Int = 42): Vector[Double] = {
  val folds = crossValidationSplit(dataset, numberOfFolds, randomSeed)

  for {
    (fold, foldIndex) <- folds.zipWithIndex
    // All but the current fold comprise the train set. Removing the fold by index (instead of
    // filtering by equality) keeps other folds even if they happen to contain identical rows.
    train = folds.patch(foldIndex, Nil, 1).flatten
    test = fold  // The current fold is the test set
  } yield {
    val predicted = algorithm(train, test, parameters).asInstanceOf[Vector[T]]
    val actual = selectColumn(test, test.head.length - 1).asInstanceOf[Vector[T]]
    evaluationMetric(actual, predicted)
  }
}

defined [32mfunction[39m [36mevaluateAlgorithmUsingCrossValidation[39m

In [9]:
val accuracies = evaluateAlgorithmUsingCrossValidation(data, (train, test, parameters) => zeroRuleClassifier(train, test), Map.empty, accuracy)

accuracies: Vector[Double] = Vector(0.640625, 0.64453125, 0.66796875)

In [10]:
val rmses = evaluateAlgorithmUsingCrossValidation(data, (train, test, parameters) => zeroRuleRegressor(train, test), Map.empty, rootMeanSquaredError)

rmses: Vector[Double] = Vector(0.480071609241788, 0.47875472342847764, 0.47162610494047946)

Nice. As we can see, this second test harness returns the value of the evaluation metric for each fold. To summarize them as a single number, we can average them:

In [11]:
// New names avoid shadowing the accuracy metric used in the cells above
val meanAccuracy = accuracies.sum / accuracies.length
val meanRmse = rmses.sum / rmses.length

meanAccuracy: Double = 0.6510416666666666
meanRmse: Double = 0.4768174792035817