diff --git a/src/performance.Rmd b/src/performance.Rmd
index ed7c5794..b345be7e 100644
--- a/src/performance.Rmd
+++ b/src/performance.Rmd
@@ -134,7 +134,11 @@ See the help page of [Measure](&makeMeasure) for information on the individual s
 str(mmce)
 ```
 
-## Binary classification: Plot performance versus threshold
+## Binary classification
+
+For binary classification, specialized techniques exist to analyze performance.
+
+### Plot performance versus threshold
 
 As you may recall (see the previous section on [making predictions](predict.md)) in binary
 classification we can adjust the threshold used to map probabilities to class labels.
@@ -166,3 +170,22 @@ multiple measures, one of them is mapped to an interactive sidebar which selects
 ```{r, eval = FALSE}
 plotThreshVsPerfGGVIS(d)
 ```
+
+### ROC measures
+
+For binary classification a large number of specialized measures exist, which can be nicely formatted into
+one matrix; see, for example, the [receiver operating characteristic page on Wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
+
+We can generate a similar table with the [&calculateROCMeasures] function.
+
+```{r}
+r = calculateROCMeasures(pred)
+r
+```
+
+The top left $2 \times 2$ matrix is the [confusion matrix](predict.md#confusion-matrix), which shows
+the relative frequency of correctly and incorrectly classified observations. Below and to the right of it,
+a large number of performance measures that can be inferred from the confusion matrix are added. By default,
+some additional information about the measures is printed.
+You can turn this off using the `abbreviations` argument of the [print](&calculateROCMeasures) method:
+`print(r, abbreviations = FALSE)`.
diff --git a/src/predict.Rmd b/src/predict.Rmd
index 3c4a0f5a..233452c6 100644
--- a/src/predict.Rmd
+++ b/src/predict.Rmd
@@ -113,11 +113,6 @@ pred = predict(mod, task = iris.task)
 pred
 ```
 
-A confusion matrix can be obtained by calling [&getConfMatrix].
-```{r}
-getConfMatrix(pred)
-```
-
 In order to get predicted posterior probabilities we have to create a [Learner](&makeLearner)
 with the appropriate ``predict.type``.
 
@@ -138,6 +133,58 @@ As mentioned above, the predicted posterior probabilities can be accessed via th
 head(getPredictionProbabilities(pred))
 ```
 
+### Confusion matrix
+
+A confusion matrix can be obtained by calling [&calculateConfusionMatrix]. The columns represent
+the predicted and the rows the true class labels.
+
+```{r}
+calculateConfusionMatrix(pred)
+```
+
+You can see the number of correctly classified observations on the diagonal of the matrix, while the
+misclassified observations are on the off-diagonal. The total number of errors for each true class is
+shown in the `-err.-` column, and for each predicted class in the `-err.-` row.
+
+To get relative frequencies in addition to the absolute numbers, we can set `relative = TRUE`.
+
+```{r}
+conf.matrix = calculateConfusionMatrix(pred, relative = TRUE)
+conf.matrix
+```
+
+Normalization is possible by either row or column; therefore, every element of the above
+relative confusion matrix contains two values: the first is the relative frequency grouped by row
+(the true label) and the second the relative frequency grouped by column (the predicted label).
+
+If you want to access the relative values directly, you can do this through the `$relative.row`
+and `$relative.col` members of the returned object `conf.matrix`.
+For more details see the [&ConfusionMatrix] documentation page.
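+
+These relative values are simply the absolute counts normalized by the corresponding row or
+column total. As a minimal sketch (an editorial illustration, assuming only the standard mlr
+accessors [&getPredictionTruth] and [&getPredictionResponse] and base R), the row-relative
+frequencies, ignoring the `-err.-` margins, can be reproduced directly from the prediction:
+
+```{r, eval = FALSE}
+## Cross-tabulate true (rows) and predicted (columns) class labels
+abs.counts = table(getPredictionTruth(pred), getPredictionResponse(pred))
+## Divide each row by its total to obtain the row-relative frequencies
+abs.counts / rowSums(abs.counts)
+```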
+
+```{r}
+conf.matrix$relative.row
+```
+
+Finally, we can also add the absolute numbers of observations for each predicted
+and true class label to the matrix (in both its absolute and relative version) by setting `sums = TRUE`.
+
+```{r}
+calculateConfusionMatrix(pred, relative = TRUE, sums = TRUE)
+```
+
+
 ## Adjusting the threshold
 
 We can set the threshold value that is used to map the predicted posterior probabilities to class labels.
@@ -161,11 +208,13 @@ pred1$threshold
 ## Set the threshold value for the positive class
 pred2 = setThreshold(pred1, 0.9)
 pred2$threshold
+
 pred2
 
 ## We can also set the effect in the confusion matrix
-getConfMatrix(pred1)
-getConfMatrix(pred2)
+calculateConfusionMatrix(pred1)
+
+calculateConfusionMatrix(pred2)
 ```
 
 Note that in the binary case [&getPredictionProbabilities] by default extracts the posterior