
better confusion matrix #1071

Merged · 19 commits into master · Aug 11, 2016
Conversation

@ja-thomas (Contributor)

Improvements for getting the confusion matrix (and fixed a bug in there).

Closes #1048

@berndbischl force-pushed the fix_1048_better_conf_matrix_printer branch from 84d4011 to ba658ae on August 7, 2016 14:23
@berndbischl (Member)

Travis fails, please rebase on master.

@ja-thomas force-pushed the fix_1048_better_conf_matrix_printer branch from 4d06102 to 226796e on August 9, 2016 13:53
@ja-thomas (Contributor, Author)

Some more explanation.

getConfusionMatrix now returns an object that contains much more information. The "old" matrix now lives in the slot cm$result, and I adapted all internal usage so that the new object works and doesn't break anything.

We fixed a bug where the column marginals were strangely (or even wrongly) calculated for relative values.

There is also now the option to add row and column sums, which was requested in #1048.

I added a printer to see the information in a compact way:

> set.seed(1230)
> allinds = 1:150
> train = sample(allinds, 75)
> test = setdiff(allinds, train)
> mod = train("classif.rpart", iris.task, subset = train)
> pred = predict(mod, iris.task, subset = test)
> print(getConfMatrix(pred, relative = TRUE))
Relative confusion matrix (normalized by row/column):
            predicted
true         setosa    versicolor virginica -err.-   
  setosa     1.00/1.00 0.00/0.00  0.00/0.00 0.00     
  versicolor 0.00/0.00 0.83/0.95  0.17/0.18 0.17     
  virginica  0.00/0.00 0.05/0.05  0.95/0.82 0.05     
  -err.-          0.00      0.05       0.18 0.22/0.23


Absolute confusion matrix:
            predicted
true         setosa versicolor virginica -err.-
  setosa         32          0         0      0
  versicolor      0         20         4      4
  virginica       0          1        18      1
  -err.-          0          1         4      5
> print(getConfMatrix(pred, relative = TRUE, sums = TRUE))
Relative confusion matrix (normalized by row/column):
            predicted
true         setosa    versicolor virginica -err.-   
  setosa     1.00/1.00 0.00/0.00  0.00/0.00 0.00     
  versicolor 0.00/0.00 0.83/0.95  0.17/0.18 0.17     
  virginica  0.00/0.00 0.05/0.05  0.95/0.82 0.05     
  -err.-          0.00      0.05       0.18 0.22/0.23


Absolute confusion matrix:
           setosa versicolor virginica -err.- -n-
setosa         32          0         0      0  32
versicolor      0         20         4      4  24
virginica       0          1        18      1  19
-err.-          0          1         4      5  NA
-n-            32         21        22     NA 150

@berndbischl (Member)

  1. the last element in the bottom right corner = errs / n.

  2. confMatrix --> ConfMatrix. And the structure must be documented

  3. really explain that we create a list of mats, that are printed then. that the user knows this.

  4. check your asserts

  5. i would add "sums" / "-n-" to all matrices

  6. digits default should be 2

@ja-thomas (Contributor, Author)

> allinds = 1:150
> train = sample(allinds, 75)
> test = setdiff(allinds, train)
> mod = train("classif.lda", iris.task, subset = train)
> pred = predict(mod, iris.task, subset = test)
> print(getConfMatrix(pred))
            predicted
true         setosa versicolor virginica -err.-
  setosa         24          0         0      0
  versicolor      0         26         0      0
  virginica       0          1        24      1
  -err.-          0          1         0      1
> print(getConfMatrix(pred, sums = TRUE))
           setosa versicolor virginica -err.- -n-
setosa         24          0         0      0  24
versicolor      0         26         0      0  26
virginica       0          1        24      1  25
-err.-          0          1         0      1  NA
-n-            24         27        24     NA 150
> print(getConfMatrix(pred, relative = TRUE))
Relative confusion matrix (normalized by row/column):
            predicted
true         setosa    versicolor virginica -err.-   
  setosa     1.00/1.00 0.00/0.00  0.00/0.00 0.00     
  versicolor 0.00/0.00 1.00/0.96  0.00/0.00 0.00     
  virginica  0.00/0.00 0.04/0.04  0.96/1.00 0.04     
  -err.-          0.00      0.04       0.00   0.01   


Absolute confusion matrix:
            predicted
true         setosa versicolor virginica -err.-
  setosa         24          0         0      0
  versicolor      0         26         0      0
  virginica       0          1        24      1
  -err.-          0          1         0      1
> print(getConfMatrix(pred, relative = TRUE, sums = TRUE))
Relative confusion matrix (normalized by row/column):
            predicted
true         setosa    versicolor virginica -err.-    -n- 
  setosa     1.00/1.00 0.00/0.00  0.00/0.00 0.00      24  
  versicolor 0.00/0.00 1.00/0.96  0.00/0.00 0.00      27  
  virginica  0.00/0.00 0.04/0.04  0.96/1.00 0.04      24  
  -err.-          0.00      0.04       0.00   0.01    <NA>
  -n-        24        26         25        <NA>      150 


Absolute confusion matrix:
           setosa versicolor virginica -err.- -n-
setosa         24          0         0      0  24
versicolor      0         26         0      0  26
virginica       0          1        24      1  25
-err.-          0          1         0      1  NA
-n-            24         27        24     NA 150

@larskotthoff (Member)

What exactly does the relative confusion matrix show?

@ja-thomas (Contributor, Author)

For a/b, a is the relative frequency (and the error in the margins) normalized by row, and b is normalized by column, matching the "row/column" header in the printed output. The -n- entries are the group sizes.
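The a/b convention can be reproduced from the absolute matrix in base R. This is an illustrative sketch of the normalization only (not mlr's internal code), using the absolute counts from the classif.rpart example earlier in the thread:

```r
# Absolute confusion matrix from the classif.rpart example above
# (rows = true classes, columns = predicted classes).
abs_cm = matrix(c(32,  0,  0,
                   0, 20,  4,
                   0,  1, 18),
                nrow = 3, byrow = TRUE,
                dimnames = list(true = c("setosa", "versicolor", "virginica"),
                                predicted = c("setosa", "versicolor", "virginica")))

# a: normalize each row to sum to 1 (vector recycling divides row i by its row sum).
row_rel = abs_cm / rowSums(abs_cm)
# b: normalize each column to sum to 1.
col_rel = sweep(abs_cm, 2, colSums(abs_cm), "/")

round(row_rel["versicolor", "versicolor"], 2)  # 0.83, the "a" in 0.83/0.95
round(col_rel["versicolor", "versicolor"], 2)  # 0.95, the "b" in 0.83/0.95
```

So the printed cell 0.83/0.95 for true = versicolor, predicted = versicolor is 20/24 (row share) and 20/21 (column share).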

@ja-thomas (Contributor, Author)

1. Error: measures quickcheck (@test_base_measures.R#573)

I don't think that has anything to do with my changes.

@berndbischl (Member)

  1. Error: measures quickcheck (@test_base_measures.R#573)

Please rebase.

@ja-thomas (Contributor, Author)

As suggested in #1113, we renamed getConfMatrix to calculateConfusionMatrix and deprecated the old one.

Can you check whether the deprecation is done correctly? I've never done that before.

Also two mini fixes: the padding and the internal function name.
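For reference, the usual base-R deprecation pattern looks like the sketch below. The function names here are toy placeholders, not mlr's actual code: the old name becomes a thin wrapper that warns via base::.Deprecated and forwards to the new name.

```r
# New function carries the real implementation (toy example: squaring).
newName = function(x) x^2

# Old name kept for backward compatibility: warn, then forward.
oldName = function(x) {
  .Deprecated("newName")  # warning: 'oldName' is deprecated, use 'newName' instead
  newName(x)
}

res = suppressWarnings(oldName(3))  # still returns the result: 9
```

Calling oldName emits a deprecation warning once per call but keeps working, which is what users of getConfMatrix would see until the old name is removed.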

@jakobbossek (Contributor)

> Can you check whether the deprecation is done correctly? I've never done that before.

Deprecation looks good.

@berndbischl (Member)

  1. Please recheck the ConfMatrix docs.

  2. Add the family performance tag.

@ja-thomas (Contributor, Author)

The family tag is already there for calculateConfusionMatrix. But should it also be added for ConfMatrix?

@berndbischl reopened this Aug 11, 2016
@berndbischl merged commit 603c556 into master Aug 11, 2016
@berndbischl deleted the fix_1048_better_conf_matrix_printer branch August 15, 2016 08:15
Successfully merging this pull request may close these issues: getConfMatrix, slight improvement

5 participants