import numpy
import sklearn.metrics


def matthews_corrcoef(y_true, y_predicted):
    # Work with rates (floats) rather than raw counts to avoid integer overflow.
    conf_matrix = sklearn.metrics.confusion_matrix(y_true, y_predicted)
    # Rows are true labels, columns are predicted labels.
    true_pos = conf_matrix[1, 1]
    false_pos = conf_matrix[0, 1]
    false_neg = conf_matrix[1, 0]
    n_points = conf_matrix.sum() * 1.0
    pos_rate = (true_pos + false_neg) / n_points
    activity = (true_pos + false_pos) / n_points
    mcc_numerator = true_pos / n_points - pos_rate * activity
    mcc_denominator = activity * pos_rate * (1 - activity) * (1 - pos_rate)
    return mcc_numerator / numpy.sqrt(mcc_denominator)


def random_ys(n_points):
    # Noisy copy of a uniform sample, thresholded at 0.5 to get binary labels.
    x_true = numpy.random.sample(n_points)
    x_pred = x_true + 0.2 * (numpy.random.sample(n_points) - 0.5)
    y_true = (x_true > 0.5) * 1.0
    y_pred = (x_pred > 0.5) * 1.0
    return y_true, y_pred


for n_points in [10, 100, 1000]:
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))
This runs fine on a 64-bit Unix box, but not on a 32-bit Windows machine, due to overflows (and AFAIK getting 64-bit NumPy to work on Windows is next to impossible).
Which NumPy version? The heavy lifting in matthews_corrcoef is done by np.corrcoef, so this might be an upstream bug. (Not that we shouldn't work around it.)
Does it still occur if y_true and y_pred are in np.float64?
I could reproduce this on 32-bit Linux with scikit-learn 0.14.1 (a version that matches the time this issue was opened).
The overflow comes from the line: https://github.com/scikit-learn/scikit-learn/blob/0.14.1/sklearn/metrics/metrics.py#L464 and there is a warning for it:
/home/lesteve/miniconda3_32bit/envs/py27/lib/python2.7/site-packages/sklearn/metrics/metrics.py:464: RuntimeWarning: overflow encountered in long_scalars
den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
Error: mcc_safe=0.883915998392, mcc_unsafe=4.76576382581, n_points=1000
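To illustrate why the product in that denominator can overflow, here is a minimal sketch that forces 32-bit integer arithmetic with NumPy. The function name `mcc_denominator_products` and the confusion-matrix counts in the usage note are hypothetical, chosen only for illustration. They are not taken from the failing run above.

```python
import numpy as np


def mcc_denominator_products(tp, fp, fn, tn):
    """Return (tp+fp)*(tp+fn)*(tn+fp)*(tn+fn) computed three ways.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    factors = [tp + fp, tp + fn, tn + fp, tn + fn]

    # Plain Python integers have arbitrary precision, so this is exact.
    exact = factors[0] * factors[1] * factors[2] * factors[3]

    # Forcing int32 mimics the native integer width of a 32-bit platform:
    # the product silently wraps around modulo 2**32.
    wrapped = int(np.prod(np.array(factors, dtype=np.int32), dtype=np.int32))

    # Converting to float64 before multiplying avoids the wraparound.
    as_float = float(np.prod(np.array(factors, dtype=np.float64)))

    return exact, wrapped, as_float
```

With counts on the order of those produced by `n_points=1000`, for example `mcc_denominator_products(470, 30, 28, 472)`, the exact product is far above the int32 maximum of 2**31 - 1, so the int32 result no longer matches it while the float64 result does.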
The code of matthews_corrcoef has since changed and uses np.corrcoef. I cannot reproduce this problem with scikit-learn 0.15, so I am closing this issue.
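For reference, the reason a correlation-based computation sidesteps the overflow is that, for binary labels, the MCC equals the Pearson correlation of the two label vectors, which is computed in floating point. The sketch below is an illustration under that assumption, not scikit-learn's actual code. Both function names are hypothetical.

```python
import numpy as np


def mcc_via_corrcoef(y_true, y_pred):
    """Pearson correlation of two binary label vectors (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry.
    return np.corrcoef(y_true, y_pred)[0, 1]


def mcc_via_counts(y_true, y_pred):
    """Textbook confusion-matrix formula, with counts cast to float first."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = float(np.sum(y_true & y_pred))
    tn = float(np.sum(~y_true & ~y_pred))
    fp = float(np.sum(~y_true & y_pred))
    fn = float(np.sum(y_true & ~y_pred))
    # Float factors, so the four-way product cannot wrap around.
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator
```

Both functions should agree up to floating-point rounding on any input where each vector contains both classes. If either vector is constant, the denominator is zero and the result is undefined in both versions.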