Overflow in matthews_corrcoef on 32-bit numpy #2806

Closed
opened this Issue Jan 31, 2014 · 3 comments

5 participants

aldanor commented Jan 31, 2014

Example:

```
import numpy
import sklearn.metrics


def matthews_corrcoef(y_true, y_predicted):
    # confusion_matrix rows are true labels, columns are predicted labels
    conf_matrix = sklearn.metrics.confusion_matrix(y_true, y_predicted)
    true_pos = conf_matrix[1, 1]
    false_pos = conf_matrix[0, 1]
    false_neg = conf_matrix[1, 0]
    n_points = conf_matrix.sum() * 1.0
    pos_rate = (true_pos + false_neg) / n_points
    activity = (true_pos + false_pos) / n_points
    mcc_numerator = true_pos / n_points - pos_rate * activity
    mcc_denominator = activity * pos_rate * (1 - activity) * (1 - pos_rate)
    return mcc_numerator / numpy.sqrt(mcc_denominator)


def random_ys(n_points):
    x_true = numpy.random.sample(n_points)
    x_pred = x_true + 0.2 * (numpy.random.sample(n_points) - 0.5)
    y_true = (x_true > 0.5) * 1.0
    y_pred = (x_pred > 0.5) * 1.0
    return y_true, y_pred


for n_points in [10, 100, 1000]:
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))
```

This runs fine on a 64-bit Unix box, but not on a 32-bit Windows machine, due to overflows (and afaik getting 64-bit numpy to work on Windows is hard, next to impossible).
Owner

larsmans commented Feb 17, 2014

 Which NumPy version? The heavy lifting in `matthews_corrcoef` is done by `np.corrcoef`, so this might be an upstream bug. (Not that we shouldn't work around it.)
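
For context on why `np.corrcoef` can do the heavy lifting: for binary labels, the MCC equals the Pearson correlation between the two label vectors. Below is a minimal sketch of that equivalence, not the scikit-learn source; the function names `mcc_via_corrcoef` and `mcc_from_counts` are made up for illustration.

```python
import numpy as np


def mcc_via_corrcoef(y_true, y_pred):
    """Pearson correlation of two binary label vectors, computed in float64."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
    return np.corrcoef(y_true, y_pred)[0, 1]


def mcc_from_counts(y_true, y_pred):
    """Textbook MCC formula from the four confusion-matrix counts."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    # Convert counts to Python floats so the products below cannot wrap around
    tp = float(np.sum(y_true & y_pred))
    tn = float(np.sum(~y_true & ~y_pred))
    fp = float(np.sum(~y_true & y_pred))
    fn = float(np.sum(y_true & ~y_pred))
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(mcc_via_corrcoef(y_true, y_pred), mcc_from_counts(y_true, y_pred))
```

Both routes work on floats throughout, so neither depends on the platform's default integer width.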

Owner

arjoly commented Apr 25, 2014

 Does it still occur if `y_true` and `y_pred` are in `np.float64`?

Member

lesteve commented Aug 31, 2016

I could reproduce this on Linux 32bit with scikit-learn 0.14.1 (the scikit-learn version does fit with the time this issue was opened). The overflow comes from the line: https://github.com/scikit-learn/scikit-learn/blob/0.14.1/sklearn/metrics/metrics.py#L464 and there is a warning for it:

```
/home/lesteve/miniconda3_32bit/envs/py27/lib/python2.7/site-packages/sklearn/metrics/metrics.py:464: RuntimeWarning: overflow encountered in long_scalars
  den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
Error: mcc_safe=0.883915998392, mcc_unsafe=4.76576382581, n_points=1000
```

The code of `matthews_corrcoef` has since changed and uses `np.corrcoef`. I cannot reproduce this problem for scikit-learn 0.15 so I am closing this issue.
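
A minimal sketch of the failure mode, assuming the four counts are held as 32-bit integers (the default integer width on a 32-bit NumPy build). `np.int32` is forced explicitly here so the wraparound shows up on any platform. `denominator_product` and `mcc_float` are illustrative names, and the counts are made-up values for 1000 samples, not taken from the run above.

```python
import numpy as np


def denominator_product(tp, fp, fn, tn, dtype):
    """Product under the square root of the MCC denominator, accumulated in `dtype`."""
    factors = np.array([tp + fp, tp + fn, tn + fp, tn + fn], dtype=dtype)
    return np.prod(factors, dtype=dtype)


def mcc_float(tp, fp, fn, tn):
    """MCC from counts, converting to Python floats before multiplying."""
    tp, fp, fn, tn = (float(c) for c in (tp, fp, fn, tn))
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den


# Hypothetical counts for 1000 samples; each factor is about 500,
# so the true product is about 6e10, well beyond the int32 maximum of 2**31 - 1.
tp, fp, fn, tn = 480, 20, 30, 470

wrapped = denominator_product(tp, fp, fn, tn, np.int32)   # wraps around
exact = denominator_product(tp, fp, fn, tn, np.float64)   # keeps the magnitude

print("int32 product:  ", wrapped)
print("float64 product:", exact)
print("MCC with float arithmetic:", mcc_float(tp, fp, fn, tn))
```

With only 10 or 100 samples the product stays below the int32 limit, which is consistent with the error showing up only for `n_points=1000`.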