Permalink
Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
128 lines (104 sloc) 4.05 KB
---
title: "Handling model errors"
author: "Michel Lang"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Handling model errors}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE, cache = FALSE}
library(mlr3)
knitr::opts_knit$set(
datatable.print.keys = FALSE,
datatable.print.class = TRUE
)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
future::plan("sequential")
set.seed(123)
```
This vignettes demonstrates how to deal with learners which raise exceptions during train or predict.
# Setup
First, we need a simple learning task and a learner which raises exceptions.
For this purpose, `mlr3` ships with the learner `classif.debug`:
```{r}
task = mlr_tasks$get("spam")
learner = mlr_learners$get("classif.debug")
print(learner)
```
The hyperparameters let us control (a) what conditions should be signaled (message, warning, error), and (b) during which stage (train or predict).
Additionally, we can tell the learner to provoke a segfault which tears down the complete R session.
With its default settings, it will do nothing special: it learns a random label and which is used to create constant predictions.
# No error handling
In the defaults, `mlr3` does not handle errors.
Thus, the exception raised by the unittest learner stops the execution and can be tracebacked:
```{r, error = TRUE}
task = mlr_tasks$get("spam")
learner = mlr_learners$get("classif.debug")
learner$param_set$values = list(error_train = TRUE)
e = Experiment$new(task, learner)
e$train()
```
# Encapsulation
During parallelization, error messages (as well as normal output or warnings) are often not properly forwarded to the master R session, or they arrive in a confusing order.
The learner execution can be encapsulated, so its output is logged to the experiment instead of just printed to the console:
```{r}
task = mlr_tasks$get("spam")
learner = mlr_learners$get("classif.debug")
learner$param_set$values = list(warning_train = TRUE, error_train = TRUE)
ctrl = mlr_control(encapsulate_train = "evaluate")
e = Experiment$new(task, learner, ctrl = ctrl)
e$train()
e$has_errors # any errors recorded?
e$logs$train # print train log
e$logs$train$warnings # get all the warnings
e$logs$train$errors # get all the errors
```
You can also enable the encapsulation for the *predict* step of an experiment by setting `encapsulate_predict`.
Another possibility to encapsulate is execution via package the [`callr`](https://cran.r-project.org/package=callr).
[`callr`](https://cran.r-project.org/package=callr) spawns a new R process, and thus guards us from segfaults.
On the downside, starting new processes comes with a computational overhead.
```{r}
ctrl = mlr_control(encapsulate_train = "callr")
e = Experiment$new(task, learner)
e$train(ctrl = ctrl)
e$has_errors
e$logs$train
```
```{r}
ctrl = mlr_control(encapsulate_train = "callr")
task = mlr_tasks$get("spam")
learner = mlr_learners$get("classif.debug")
learner$param_set$values = list(segfault_train = TRUE)
e = Experiment$new(task, learner)
e$train(ctrl = ctrl)
e$has_errors
e$logs$train$errors
```
Note that, although no exception has been raised with encapsulation, it is impossible to perform the predict step without a model:
```{r, error = TRUE}
e$predict()
```
As a workaround, we define a learner in the next section which is used as a surrogate to create predictions.
# Fallback learners
Each learner can have a fallback learner, which is used if either the train or predict step fail.
Here, we simply fallback to the predictions of a featureless learner (predicting majority class):
```{r}
task = mlr_tasks$get("spam")
learner = mlr_learners$get("classif.debug")
learner$param_set$values = list(error_train = TRUE)
learner$fallback = mlr_learners$get("classif.featureless")
ctrl = mlr_control(encapsulate_train = "evaluate")
e = Experiment$new(task = task, learner = learner, ctrl = ctrl)
e$train()
e$has_errors
e$logs$train
e$predict()
e$score()
e$prediction
e$performance
```
Note that the logs and timings are tracked for the original learner (until it errored), not the fallback learner.