BUG need to ensure classification metrics are sane under (non-stratified) cross-validation #2029

jnothman opened this Issue Jun 4, 2013 · 1 comment


Issues Without PR in scikit-learn 0.19


jnothman commented Jun 4, 2013

When a dataset is split into folds and not evaluated all at once, some classes may be missing from a given evaluation. The metric implementations work around classes that appear in only one of y_true and y_pred by taking the union of their labels. However, this is insufficient when a label that was present in a fold's training set is absent from both the predicted and the true test targets.

This is at least a problem for the P/R/F family of metrics with average='macro' and labels unspecified, and it should be documented (though a user shouldn't be using 'macro' if there are infrequent labels). I haven't thought yet about whether it is an issue elsewhere, or whether it can be reasonably tested.
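To make the macro-averaging problem concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: `f1_per_label` and `macro_f1` are hypothetical helpers, and scoring a label with no true or predicted samples as 0 is an assumed convention. The fold below lacks class 2 in both y_true and y_pred, so the label set inferred from the union differs from the full label set known at training time.

```python
def f1_per_label(y_true, y_pred, label):
    """F1 for a single label, computed from raw counts (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Assumed convention: a label with no true or predicted samples scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-label F1 (hypothetical helper)."""
    if labels is None:
        # Infer the label set from the union of y_true and y_pred.
        labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_label(y_true, y_pred, l) for l in labels) / len(labels)


# One test fold: class 2 existed in training but is absent here.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

inferred = macro_f1(y_true, y_pred)                    # averages over {0, 1}
explicit = macro_f1(y_true, y_pred, labels=[0, 1, 2])  # averages over {0, 1, 2}

print(inferred, explicit)
```

The two scores differ only because the denominator of the macro average changes from two labels to three. Passing the full label set explicitly, as with the `labels` argument of the P/R/F functions, keeps the average comparable across folds.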

jnothman commented Jun 4, 2013

Where P/R/F handles the binary case specially, this is also a problem for other values of average. That is, if one or more missing classes reduce the problem from multiclass to binary, the expected result is completely different.
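A toy model of the special-casing described above, assuming a metric that reports only the positive class when it sees two labels and a macro average otherwise. The function `f1_with_binary_special_case` and its `pos_label` default are hypothetical and only illustrate the failure mode; the behaviour of any particular scikit-learn release may differ.

```python
def f1_for(y_true, y_pred, label):
    """F1 for a single label, computed from raw counts (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_with_binary_special_case(y_true, y_pred, pos_label=1):
    """Hypothetical metric: positive-class F1 if binary, macro F1 otherwise."""
    labels = sorted(set(y_true) | set(y_pred))
    if len(labels) <= 2:
        # The data looks binary, so only the positive class is scored.
        return f1_for(y_true, y_pred, pos_label)
    return sum(f1_for(y_true, y_pred, l) for l in labels) / len(labels)


# Fold A contains all three classes, so it is scored as multiclass.
fold_a = f1_with_binary_special_case([0, 1, 2, 1], [0, 1, 2, 0])

# Fold B happens to lack class 2, so it silently takes the binary path.
fold_b = f1_with_binary_special_case([0, 1, 0, 1], [0, 1, 0, 0])

print(fold_a, fold_b)
```

The two fold scores measure different quantities (a three-class macro average versus the F1 of class 1 alone), so averaging them across folds would mix incompatible numbers.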

@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014
@amueller amueller modified the milestone: 0.16, 0.17 Sep 11, 2015
@amueller amueller modified the milestone: 0.18, 0.17 Sep 20, 2015
@amueller amueller modified the milestone: 0.18, 0.19 Sep 22, 2016