Conf matrix and ROCMeasures #48

Merged 3 commits on Aug 31, 2016

25 changes: 24 additions & 1 deletion src/performance.Rmd
@@ -134,7 +134,11 @@ See the help page of [Measure](&makeMeasure) for information on the individual s
str(mmce)
```

## Binary classification: Plot performance versus threshold
## Binary classification

For binary classification, specialized techniques exist for analyzing performance.

### Plot performance versus threshold

As you may recall (see the previous section on [making predictions](predict.md)),
in binary classification we can adjust the threshold used to map probabilities to class labels.
@@ -166,3 +170,22 @@ multiple measures, one of them is mapped to an interactive sidebar which selects
```{r, eval = FALSE}
plotThreshVsPerfGGVIS(d)
```
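
For reference, `d` is a threshold-vs-performance object. A minimal sketch of how such an object might
be created and plotted with the static variant (assuming the `sonar.task` setup used earlier in the
tutorial and the [&generateThreshVsPerfData] function):

```{r, eval = FALSE}
## Hypothetical setup: train a probability learner and predict
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)

## Generate performance values across thresholds for several measures and plot them
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))
plotThreshVsPerf(d)
```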

### ROC measures

For binary classification a large number of specialized measures exist that can be nicely arranged in
one matrix; see, for example, the [receiver operating characteristic page on Wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).

We can generate a similar table with the [&calculateROCMeasures] function.

```{r}
r = calculateROCMeasures(pred)
r
```

The top left $2 \times 2$ matrix is the [confusion matrix](predict.md#confusion-matrix), which shows
the relative frequencies of correctly and incorrectly classified observations. Below and to the right
of it, a large number of performance measures that can be inferred from the confusion matrix are listed.
By default some additional information about the measures is printed.
You can turn this off using the `abbreviations` argument of the [print](&calculateROCMeasures) method:
`print(r, abbreviations = FALSE)`.
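
Many of the entries in this table correspond to standard mlr performance measures, so individual
values can also be computed directly with [&performance]; a small sketch, assuming `pred` from above:

```{r, eval = FALSE}
## Compute a few of the ROC-related measures individually (hypothetical selection)
performance(pred, measures = list(tpr, fpr, ppv))
```
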
51 changes: 44 additions & 7 deletions src/predict.Rmd
@@ -113,11 +113,6 @@ pred = predict(mod, task = iris.task)
pred
```

A confusion matrix can be obtained by calling [&getConfMatrix].
```{r}
getConfMatrix(pred)
```

In order to get predicted posterior probabilities, we have to create a [Learner](&makeLearner)
with the appropriate ``predict.type``.
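
A minimal sketch of what this looks like (a hypothetical setup, assuming the `iris.task` used above):

```{r, eval = FALSE}
## Create a learner that predicts posterior probabilities, train it, and predict
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task = iris.task)
pred = predict(mod, task = iris.task)
```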

@@ -138,6 +133,46 @@ As mentioned above, the predicted posterior probabilities can be accessed via th
head(getPredictionProbabilities(pred))
```

### Confusion matrix

A confusion matrix can be obtained by calling [&calculateConfusionMatrix]. The columns represent
the predicted and the rows the true class labels.

```{r}
calculateConfusionMatrix(pred)
```

The number of correctly classified observations is shown on the diagonal of the matrix; misclassified
observations appear off the diagonal. The total number of errors for each individual true and predicted
class is shown in the `-err.-` column and row, respectively.
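
The object returned by [&calculateConfusionMatrix] can also be inspected programmatically; a small
sketch, assuming the absolute counts (including the `-err.-` margins) are stored in its `result` element:

```{r, eval = FALSE}
## Access the matrix of absolute counts directly (assumed element name)
conf = calculateConfusionMatrix(pred)
conf$result
```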

To get relative frequencies in addition to the absolute numbers, we can set `relative = TRUE`.

```{r}
conf.matrix = calculateConfusionMatrix(pred, relative = TRUE)
conf.matrix
```

Since it is possible to normalize by either rows or columns, each element of the relative confusion
matrix above contains two values: the first is the relative frequency grouped by row (the true label)
and the second is the relative frequency grouped by column (the predicted label).

If you want to access the relative values directly, you can do so through the `$relative.row`
and `$relative.col` members of the returned object `conf.matrix`.
For more details see the [&ConfusionMatrix] documentation page.

```{r}
conf.matrix$relative.row
```

Finally, we can also add the absolute numbers of observations for each predicted and each true class
label to the matrices (both the absolute and the relative one) by setting `sums = TRUE`.

```{r}
calculateConfusionMatrix(pred, relative = TRUE, sums = TRUE)
```



## Adjusting the threshold
We can set the threshold value that is used to map the predicted posterior probabilities to class labels.
@@ -161,11 +196,13 @@ pred1$threshold
## Set the threshold value for the positive class
pred2 = setThreshold(pred1, 0.9)
pred2$threshold

pred2

## We can also see the effect in the confusion matrix
getConfMatrix(pred1)
getConfMatrix(pred2)
calculateConfusionMatrix(pred1)

calculateConfusionMatrix(pred2)
```
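
To quantify how the changed threshold affects performance, the predictions can be evaluated again
with [&performance]; a small sketch, assuming the objects from above:

```{r, eval = FALSE}
## Mean misclassification error before and after raising the threshold
performance(pred1, measures = mmce)
performance(pred2, measures = mmce)
```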

Note that in the binary case [&getPredictionProbabilities] by default extracts the posterior