Overflow in matthews_corrcoef on 32-bit numpy #2806

Closed
aldanor opened this Issue Jan 31, 2014 · 3 comments

5 participants

aldanor commented Jan 31, 2014

Example:

import sklearn.metrics
import numpy

def matthews_corrcoef(y_true, y_predicted):
    conf_matrix = sklearn.metrics.confusion_matrix(y_true, y_predicted)
    true_pos = conf_matrix[1,1]
    # confusion_matrix rows are true labels, columns are predicted labels
    false_pos = conf_matrix[0,1]
    false_neg = conf_matrix[1,0]
    n_points = conf_matrix.sum()*1.0
    pos_rate = (true_pos + false_neg) / n_points
    activity = (true_pos + false_pos) / n_points
    mcc_numerator = true_pos / n_points - pos_rate * activity
    mcc_denominator = activity * pos_rate * (1 - activity) * (1 - pos_rate)
    return mcc_numerator / numpy.sqrt(mcc_denominator)

def random_ys(n_points):
    x_true = numpy.random.sample(n_points)
    x_pred = x_true + 0.2 * (numpy.random.sample(n_points) - 0.5)
    y_true = (x_true > 0.5) * 1.0
    y_pred = (x_pred > 0.5) * 1.0
    return y_true, y_pred

for n_points in [10, 100, 1000]:
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))

This runs fine on a 64-bit Unix box, but fails on a 32-bit Windows machine due to overflows (and, as far as I know, getting 64-bit NumPy to work on Windows is next to impossible).

Owner

larsmans commented Feb 17, 2014

Which NumPy version? The heavy lifting in matthews_corrcoef is done by np.corrcoef, so this might be an upstream bug. (Not that we shouldn't work around it.)
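
For reference, a minimal sketch of why np.corrcoef can do the heavy lifting: for binary labels, the Matthews correlation coefficient equals the Pearson correlation between the two label vectors. The function names below are hypothetical and this is not the actual scikit-learn implementation; the count-based version casts to float first so that no integer product can overflow.

```python
import numpy as np

def mcc_via_corrcoef(y_true, y_pred):
    # Hypothetical helper: Pearson correlation of the two binary label vectors.
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return np.corrcoef(y_true, y_pred)[0, 1]

def mcc_via_counts(y_true, y_pred):
    # Hypothetical helper: textbook formula, with counts converted to
    # Python floats before any multiplication takes place.
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = float(np.sum(y_true & y_pred))
    tn = float(np.sum(~y_true & ~y_pred))
    fp = float(np.sum(~y_true & y_pred))
    fn = float(np.sum(y_true & ~y_pred))
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(mcc_via_corrcoef(y_true, y_pred), mcc_via_counts(y_true, y_pred))
```

Both functions should agree up to floating-point rounding whenever both classes appear in y_true and y_pred.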

arjoly added the Bug label Apr 25, 2014

Owner

arjoly commented Apr 25, 2014

Does it still occur if y_true and y_pred are in np.float64?

amueller added this to the 0.15.1 milestone Jul 18, 2014

amueller modified the milestone: 0.16, 0.17 Sep 11, 2015

amueller modified the milestone: 0.18, 0.17 Sep 20, 2015

Member

lesteve commented Aug 31, 2016

I could reproduce this on 32-bit Linux with scikit-learn 0.14.1 (that version is consistent with when this issue was opened).

The overflow comes from the line: https://github.com/scikit-learn/scikit-learn/blob/0.14.1/sklearn/metrics/metrics.py#L464 and there is a warning for it:

/home/lesteve/miniconda3_32bit/envs/py27/lib/python2.7/site-packages/sklearn/metrics/metrics.py:464: RuntimeWarning: overflow encountered in long_scalars
  den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
Error: mcc_safe=0.883915998392, mcc_unsafe=4.76576382581, n_points=1000
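
To illustrate the mechanism, here is a minimal sketch that emulates 32-bit integer arithmetic on any platform by forcing np.int32. The counts are made-up values of roughly the size seen with n_points=1000, and the helper names are hypothetical. Each factor is about 500, so the true product (500**4 = 6.25e10) exceeds the int32 maximum (about 2.1e9) and wraps around, while the same product in float64 is exact.

```python
import numpy as np

def denominator_product_int32(tp, fp, fn, tn):
    # Emulates a platform whose default integer is 32 bits wide:
    # the product silently wraps around on overflow.
    factors = np.array([tp + fp, tp + fn, tn + fp, tn + fn], dtype=np.int32)
    return np.prod(factors, dtype=np.int32)

def denominator_product_float64(tp, fp, fn, tn):
    # Casting to float64 before multiplying avoids the integer overflow.
    factors = np.array([tp + fp, tp + fn, tn + fp, tn + fn], dtype=np.float64)
    return np.prod(factors)

# Made-up confusion-matrix counts for 1000 points (assumption, not from the issue).
tp, fp, fn, tn = 450, 50, 50, 450
print("int32 product:  ", denominator_product_int32(tp, fp, fn, tn))
print("float64 product:", denominator_product_float64(tp, fp, fn, tn))
```

The wrapped value fed to np.sqrt is what makes the resulting coefficient meaningless, which is consistent with an mcc_unsafe value outside [-1, 1].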

The code of matthews_corrcoef has since changed and now uses np.corrcoef. I cannot reproduce this problem with scikit-learn 0.15, so I am closing this issue.

lesteve closed this Aug 31, 2016
