---
title: "Resampling and Benchmarking"
author: "mlr-org team"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Resampling and Benchmarking}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r child = 'include/options.Rmd'}
```
This introduction is about resampling and benchmarking.
## Objects
Again, we consider the iris task and a simple classification tree here.
```{r}
library(mlr3)
task = mlr_tasks$get("iris")
learner = mlr_learners$get("classif.rpart")
```
Additionally, we need to define **how** we want to resample.
`mlr3` comes with the following resampling strategies implemented:
```{r}
mlr_resamplings$keys()
```
Additional resampling methods for special use cases will be available via extension packages, such as [mlr3spatiotemporal](https://github.com/mlr-org/mlr3spatiotemporal) for spatial data (still in development).
The experiment conducted in the introduction on train/predict/score is equivalent to a simple "holdout" resampling, so let's consider this strategy first.
```{r}
resampling = mlr_resamplings$get("holdout")
print(resampling)
print(resampling$param_set$values)
```
To change the ratio to $0.8$, we simply overwrite the slot:
```{r}
resampling$param_set$values = list(ratio = 0.8)
```
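Equivalently, the parameter can be set directly when retrieving the object via `param_vals` (the same pattern is used for the cross-validation below):
```{r}
resampling = mlr_resamplings$get("holdout", param_vals = list(ratio = 0.8))
print(resampling$param_set$values)
```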
## Resampling
Now, we can pass all created objects to the `resample()` function to get an object of class `ResampleResult`:
```{r}
rr = resample(task, learner, resampling)
print(rr)
```
Before we go into more detail, let's change the resampling to a 3-fold cross-validation to better illustrate what operations are possible with a resampling result.
```{r}
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 3L))
rr = resample(task, learner, resampling)
print(rr)
```
We can do different things with resampling results, e.g.:
* Extract the performance for the individual resampling iterations:
```{r}
rr$performance("classif.mmce")
```
* Extract and inspect the instantiated resampling:
```{r}
rr$resampling
rr$resampling$iters
rr$resampling$test_set(1)
rr$resampling$test_set(2)
rr$resampling$test_set(3)
```
* Retrieve the experiment of a specific iteration and inspect it:
```{r}
e = rr$experiment(iter = 1)
e$model
```
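* Aggregate the per-iteration scores into a single performance value (the `$aggregated` field reappears in the benchmarking section below):
```{r}
rr$aggregated
```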
## Manual instantiation
If you want to compare multiple learners, you should use the same resampling per task to reduce the variance of the performance estimation.
Until now, we have just passed a resampling strategy to `resample()`, without specifying the actual splits into training and test.
Here, we manually instantiate the resampling:
```{r}
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 3L))
resampling$instantiate(task)
resampling$iters
resampling$train_set(1)
```
If we now pass this instantiated object to `resample()`, the pre-calculated training and test splits will be used for both learners:
```{r}
learner1 = mlr_learners$get("classif.rpart") # simple classification tree
learner2 = mlr_learners$get("classif.featureless") # featureless learner, predicts the majority class
rr1 = resample(task, learner1, resampling)
rr2 = resample(task, learner2, resampling)
setequal(rr1$experiment(1)$train_set, rr2$experiment(1)$train_set)
```
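Since both learners were evaluated on identical splits, their per-iteration scores are directly comparable, e.g. via the `$performance()` accessor introduced above:
```{r}
rr1$performance("classif.mmce")
rr2$performance("classif.mmce")
```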
We can also combine the created result objects into a `BenchmarkResult` (see below for an introduction to simple benchmarking):
```{r}
bmr = rr1$combine(rr2)
bmr$aggregated(objects = FALSE)
```
## Custom resampling
Sometimes it is necessary to perform resampling with custom splits, e.g. to reproduce a study.
For this purpose, splits can be manually set for `ResamplingCustom`:
```{r}
resampling = mlr_resamplings$get("custom")
resampling$instantiate(task,
list(c(1:10, 51:60, 101:110)),
list(c(11:20, 61:70, 111:120))
)
resampling$iters
resampling$train_set(1)
resampling$test_set(1)
```
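Multiple custom iterations can be defined analogously; the following sketch (not part of the original example) assumes that each list element holds the row indices for one iteration:
```{r}
resampling = mlr_resamplings$get("custom")
resampling$instantiate(task,
  list(1:50, 51:100),    # training sets for iterations 1 and 2
  list(101:125, 126:150) # test sets for iterations 1 and 2
)
resampling$iters
resampling$test_set(2)
```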
## Benchmarking
Comparing the performance of different learners across multiple tasks is a recurring exercise.
`mlr3` offers the `benchmark()` function for convenience.
The `benchmark()` function accepts a design of tasks, learners, and resampling strategies in the form of a `data.frame`.
Here, we call `benchmark()` to perform a single holdout split on a single task and two learners:
```{r}
library(data.table)
design = data.table(
task = mlr_tasks$mget("iris"),
learner = mlr_learners$mget(c("classif.rpart", "classif.featureless")),
resampling = mlr_resamplings$mget("holdout")
)
print(design)
bmr = benchmark(design)
```
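The returned `BenchmarkResult` can be inspected with the same aggregation accessor shown above:
```{r}
bmr$aggregated(objects = FALSE)
```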
Note that the holdout splits have been automatically instantiated for each row of the design.
As a result, the `rpart` learner used a different training set than the featureless learner.
However, for comparison of learners you usually want the learners to see the same splits into train and test sets.
To overcome this issue, the resampling strategy needs to be manually instantiated before creating the design.
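A minimal sketch of this workflow, reusing the objects from the design above and assuming that `benchmark()` respects a resampling that is already instantiated:
```{r}
resampling = mlr_resamplings$get("holdout")
resampling$instantiate(mlr_tasks$get("iris"))
design = data.table(
  task = mlr_tasks$mget("iris"),
  learner = mlr_learners$mget(c("classif.rpart", "classif.featureless")),
  resampling = list(resampling)
)
bmr = benchmark(design)
```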
While the interface of `benchmark()` allows full flexibility, the creation of such design tables can be tedious.
Therefore, `mlr3` provides a helper function to quickly generate design tables and instantiate resampling strategies in an exhaustive grid fashion: `mlr3::expand_grid()`.
```{r}
# get some example tasks
tasks = mlr_tasks$mget(c("pima", "sonar", "spam"))
# set measures for all tasks: accuracy (acc) and area under the curve (auc)
measures = mlr_measures$mget(c("classif.acc", "classif.auc"))
tasks = lapply(tasks, function(task) { task$measures = measures; task })
# get a featureless learner and a classification tree
learners = mlr_learners$mget(c("classif.featureless", "classif.rpart"))
# let the learners predict probabilities instead of class labels (required for AUC measure)
learners$classif.featureless$predict_type = "prob"
learners$classif.rpart$predict_type = "prob"
# compare via 10-fold cross validation
resamplings = mlr_resamplings$mget("cv")
# create a BenchmarkResult object
design = expand_grid(tasks, learners, resamplings)
print(design)
bmr = benchmark(design)
```
The aggregated resampling results can be accessed with:
```{r}
bmr$aggregated(objects = FALSE)
```
We can aggregate the results further, e.g. if we are interested in which learner performed best across all tasks:
```{r}
bmr$aggregated(objects = FALSE)[, list(acc = mean(classif.acc), auc = mean(classif.auc)), by = "learner_id"]
```
Unsurprisingly, the classification tree outperformed the featureless learner.
### Converting specific benchmark objects to resample objects
As a `BenchmarkResult` object is basically a collection of multiple `ResampleResult` objects, we can extract specific `ResampleResult` objects using the stored hashes:
```{r}
tab = bmr$aggregated(objects = FALSE)[task_id == "spam" & learner_id == "classif.rpart"]
print(tab)
rr = bmr$resample_result(tab$hash)
print(rr)
```
We can now investigate this resampling and even single experiments using the previously introduced API:
```{r}
rr$aggregated
# get the iteration with worst AUC
worst = as.data.table(rr)[which.min(classif.auc), c("iteration", "classif.auc")]
print(worst)
# get the corresponding experiment
e = rr$experiment(worst$iteration)
print(e)
```