sklearn.metrics.consensus_score potentially gives wrong results #2445

untom opened this Issue · 3 comments

3 participants



sklearn.metrics.consensus_score() gives wrong scores if the two results being compared contain different numbers of biclusters. This is because the function's final line is:

return np.trace(matrix[:, indices[:, 1]]) / max(n_a, n_b)

which uses np.trace under the assumption that matrix (the similarity matrix) is square, and thus has the most similar pairs on its diagonal.
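In the square case that assumption does hold: after the optimal assignment, permuting the columns puts each row's matched value on the diagonal, so np.trace recovers the matched sum. A minimal sketch of why (using scipy's linear_sum_assignment in place of the solver sklearn uses internally, with made-up similarity values):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical square 2x2 similarity matrix: both results have 2 biclusters.
matrix = np.array([[0.2, 0.9],
                   [0.7, 0.1]])

# Maximize similarity by minimizing the negated matrix.
row_ind, col_ind = linear_sum_assignment(-matrix)  # row_ind = [0, 1], col_ind = [1, 0]

# Column-permuting the square matrix puts each row's matched value on the
# diagonal, so np.trace and direct pair indexing agree here (0.9 + 0.7).
assert np.trace(matrix[:, col_ind]) == matrix[row_ind, col_ind].sum()
```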

However, when matrix is non-square (i.e., n_b != n_a in the code), this fails. I have an example dataset that shows such a case, deposited under: . Just use:

import numpy as np
import sklearn.metrics

a_rows = np.loadtxt("/home/tom/a_rows.txt")
a_cols = np.loadtxt("/home/tom/a_cols.txt")
b_rows = np.loadtxt("/home/tom/b_rows.txt")
b_cols = np.loadtxt("/home/tom/b_cols.txt")
print(sklearn.metrics.consensus_score((a_rows, a_cols), (b_rows, b_cols)))

This gives a consensus score of ~0.328; however, the correct score is ~0.529.

The bug can be fixed by changing the last line of the function to:

return matrix[indices[:, 0], indices[:, 1]].sum() / max(n_a, n_b)

(I can send a pull request if necessary, however since it's just a single-line fix I'm not sure it's worth it)
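To illustrate why the two expressions diverge on a non-square matrix, here is a minimal sketch under made-up similarity values, using scipy's linear_sum_assignment to stand in for the Hungarian solver sklearn uses internally:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 3x2 bicluster-similarity matrix: result A has 3 biclusters,
# result B has 2, so the matrix is non-square.
matrix = np.array([[0.9, 0.1],
                   [0.2, 0.3],
                   [0.1, 0.8]])
n_a, n_b = matrix.shape

# Best matching (maximize similarity = minimize the negated matrix).
# Only 2 of the 3 rows are assigned: row_ind = [0, 2], col_ind = [0, 1].
row_ind, col_ind = linear_sum_assignment(-matrix)

# Buggy variant: np.trace pairs row i with assigned column i, which is only
# valid when every row is matched in order. Here it wrongly sums
# matrix[1, 1] = 0.3 instead of the matched matrix[2, 1] = 0.8.
buggy = np.trace(matrix[:, col_ind]) / max(n_a, n_b)    # (0.9 + 0.3) / 3 = 0.4

# Fixed variant: index the actual matched (row, column) pairs.
fixed = matrix[row_ind, col_ind].sum() / max(n_a, n_b)  # (0.9 + 0.8) / 3 ≈ 0.567

print(buggy, fixed)
```

The fancy-indexed sum is also correct in the square case, so the one-line fix loses nothing there.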

@untom untom referenced this issue from a commit in untom/scikit-learn
@untom untom Fixes issue #2445.
When comparing bicluster results that contain different numbers of biclusters,
`sklearn.metrics.consensus_score` can sometimes give wrong results. This
is fixed here.

This will be fixed once PR #2452 is merged.

@amueller amueller added this to the 0.15.1 milestone

#2452 has stalled, so it might be worth superseding it or making a patch.


I've sent a new patch; feel free to review it.

@amueller amueller closed this in #3640