---
title: "Implemented Performance Measures"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Implemented Performance Measures}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r, echo = FALSE, message=FALSE}
library("mlr")
library("BBmisc")
library("ParamHelpers")
urlMlrFunctions = "https://mlr-org.github.io/mlr/reference/"
ext = ".html"
library("pander")
# show grouped code output instead of single lines
knitr::opts_chunk$set(collapse = TRUE)
```
This page lists, in alphabetical order, the performance measures available for the different types of learning problems, as well as general performance measures.
(See also the documentation of `measures()` and `makeMeasure()` for available measures and their properties.)
If a measure you need is missing, you can either [open an issue](https://github.com/mlr-org/mlr/issues) or try to [implement a measure yourself](create_measure.html){target="_blank"}.

Column **Minim.** indicates whether the measure is minimized during, e.g., tuning or feature selection.
**Best** and **Worst** show the best and worst values the performance measure can attain.
For *classification*, column **Multi** indicates whether a measure is suitable for multi-class problems.
If not, the measure can only be used for binary classification problems.
The next six columns indicate which information is required to calculate the performance measure:

* **Pred.**: The `Prediction()` object.
* **Truth**: The true values of the response variable(s) (for supervised learning).
* **Probs**: The predicted probabilities (might be needed for classification).
* **Model**: The `WrappedModel` (see `makeWrappedModel()`), e.g., for calculating the training time.
* **Task**: The `Task()` (relevant for cost-sensitive classification).
* **Feats**: The feature data (relevant for clustering).
**Aggr.** shows the default aggregation method (`aggregations()`) tied to the measure.
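All of this information can also be queried directly on a measure object. A quick sketch using the built-in mean misclassification error `mmce` (all fields shown are part of mlr's `Measure` objects):

```{r, eval = FALSE}
listMeasures("classif") # ids of all measures suitable for classification

# each measure object carries the properties summarized in the tables below
mmce$minimize   # TRUE: the mean misclassification error is minimized
mmce$best       # 0
mmce$worst      # 1
mmce$properties # includes "classif.multi", "req.pred" and "req.truth"
mmce$aggr$id    # "test.mean", the default aggregation
```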
```{r include=FALSE}
# urlMlrFunctions and ext are defined in the setup chunk above
linkFct = function(x, y) {
  # link each id in x to its reference page y on the mlr website
  collapse(sprintf("[%1$s](%3$s%2$s%4$s)", x, y, urlMlrFunctions, ext), sep = "<br />")
}
# collapse multi-line notes into a single line
cn = function(x) if (is.null(x)) NA else gsub("\\n", " ", x)
# turn plain URLs in the notes into markdown links; the regex is not ideal and can break
urls = function(x) if (is.na(x)) NA else gsub("(http)(\\S+)(\\.)", "[\\1\\2](\\1\\2)\\3", x)

getTab = function(type) {
  # general measures that apply to all types of learning problems
  m = list(featperc = featperc, timeboth = timeboth, timepredict = timepredict, timetrain = timetrain)
  if (type == "general") {
    meas = m
  } else {
    # all measures for this task type, sorted by id, with the general ones removed
    meas = listMeasures(type, create = TRUE)
    ord = order(names(meas))
    meas = meas[ord]
    keep = setdiff(names(meas), names(m))
    meas = meas[keep]
  }
  cols = c("ID / Name", "Minim.", "Best", "Worst", "Multi", "Pred.", "Truth", "Probs", "Model", "Task", "Feats", "Aggr.", "Note")
  df = makeDataFrame(nrow = length(meas), ncol = length(cols),
    col.types = c("character", "logical", "numeric", "numeric", "logical", "logical", "logical", "logical", "logical", "logical", "logical", "character", "character"))
  names(df) = cols
  # fill one row per measure from its slots and properties
  for (i in seq_along(meas)) {
    mea = meas[[i]]
    df[i, 1] = paste0("**", linkFct(mea$id, "measures"), "** <br />", mea$name)
    df[i, 2] = mea$minimize
    df[i, 3] = mea$best
    df[i, 4] = mea$worst
    df[i, 5] = "classif.multi" %in% mea$properties
    df[i, 6] = "req.pred" %in% mea$properties
    df[i, 7] = "req.truth" %in% mea$properties
    df[i, 8] = "req.prob" %in% mea$properties
    df[i, 9] = "req.model" %in% mea$properties
    df[i, 10] = "req.task" %in% mea$properties
    df[i, 11] = "req.feats" %in% mea$properties
    df[i, 12] = linkFct(mea$aggr$id, "aggregations")
    df[i, 13] = urls(cn(mea$note))
  }
  just = c("left", "center", "right", "right", "center", "center", "center", "center", "center", "center", "center", "left", "left")
  # the "Multi" column only makes sense for classification
  if (type != "classif") {
    ind = cols != "Multi"
    df = df[ind]
    just = just[ind]
  }
  # render logicals as "X" / empty and print the table in rmarkdown style
  logicals = vlapply(df, is.logical)
  df[logicals] = lapply(df[logicals], function(x) ifelse(x, "X", ""))
  pandoc.table(df, style = "rmarkdown", split.tables = Inf, split.cells = Inf,
    justify = just)
}
```
# Classification
```{r echo=FALSE,results="asis"}
getTab("classif")
```
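To actually compute any of the listed measures, pass them to `performance()`. A minimal sketch on the built-in `iris.task`, assuming the `rpart` package is installed; `multiclass.aunu` needs predicted probabilities, hence `predict.type = "prob"`:

```{r, eval = FALSE}
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, iris.task)
pred = predict(mod, task = iris.task)

# mmce only needs the prediction and the truth (columns Pred. and Truth),
# multiclass.aunu additionally needs the predicted probabilities (column Probs)
performance(pred, measures = list(mmce, multiclass.aunu))
```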
# Regression
```{r echo=FALSE,results="asis"}
getTab("regr")
```
# Survival analysis
```{r echo=FALSE,results="asis"}
getTab("surv")
```
# Cluster analysis
```{r echo=FALSE,results="asis"}
getTab("cluster")
```
# Cost-sensitive classification
```{r echo=FALSE,results="asis"}
getTab("costsens")
```
Note that in the case of *ordinary misclassification costs* you can also generate performance measures from cost matrices with the function `makeCostMeasure()`.
For details see the tutorial page on [cost-sensitive classification](cost_sensitive_classif.html){target="_blank"} and also the page on [custom performance measures](create_measure.html){target="_blank"}.
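As a brief sketch (the cost values below are made up purely for illustration), using the built-in `iris.task`:

```{r, eval = FALSE}
# misclassification costs: rows correspond to true classes, columns to predicted classes
costs = matrix(c(0, 1, 1, 5, 0, 1, 5, 5, 0), nrow = 3)
rownames(costs) = colnames(costs) = getTaskClassLevels(iris.task)

iris.costs = makeCostMeasure(id = "iris.costs", name = "Iris costs",
  costs = costs, best = 0, worst = 5)

mod = train("classif.rpart", iris.task)
pred = predict(mod, task = iris.task)
performance(pred, measures = list(iris.costs, mmce))
```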
# Multilabel classification
```{r echo=FALSE,results="asis"}
getTab("multilabel")
```
# General performance measures
```{r echo=FALSE,results="asis"}
getTab("general")
```
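As the **Model** column above indicates, these measures need the fitted model in addition to the prediction, so it has to be passed to `performance()` explicitly. A short sketch:

```{r, eval = FALSE}
mod = train("classif.rpart", iris.task)
pred = predict(mod, task = iris.task)

# timetrain and featperc require the WrappedModel (column Model)
performance(pred, measures = list(timetrain, featperc), model = mod)
```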