I have a sample of 1M rows, and cohen_kappa_score returned -11.3 even though Cohen's kappa is bounded between -1 and +1. Further investigation showed that confusion_matrix returns a 32-bit result (on Windows), and the outer product in the cohen_kappa_score calculation overflows.
Here is an example with a small amount of data that shows the dtype problem. If I wrap the confusion matrix in np.int64, the calculation works.
import sys
import numpy as np
from sklearn.metrics import confusion_matrix

y1 = np.int64([1, 0, 0, 1])
y2 = np.int64([0, 0, 1, 1])
confusion = confusion_matrix(y1, y2)

# The inputs are int64, but the confusion matrix entries come back as int32.
sys.version, type(y1[0]), type(y2[0]), type(confusion[0, 0])
('3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]',
numpy.int64,
numpy.int64,
numpy.int32)
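To illustrate the scale of the overflow, here is a minimal sketch (the 2x2 counts below are made up for the demonstration, not taken from my data). With int32 marginal sums around 750,000, the outer product used for the expected-agreement term exceeds the int32 maximum of roughly 2.1e9 and silently wraps to negative values; promoting the confusion matrix to int64 first gives the correct result.

import numpy as np

# Illustrative 2x2 confusion matrix with counts on the order of a 1M-row sample.
confusion = np.array([[500000, 250000],
                      [250000, 500000]], dtype=np.int32)

# Force the marginal sums to stay int32, mimicking the Windows behaviour
# where the platform default integer is 32-bit.
sum0 = confusion.sum(axis=0, dtype=np.int32)  # column marginals: [750000, 750000]
sum1 = confusion.sum(axis=1, dtype=np.int32)  # row marginals:    [750000, 750000]

# 750000 * 750000 = 5.625e11 >> 2**31 - 1, so int32 wraps silently.
print(np.outer(sum0, sum1))   # negative garbage values

# Promoting to int64 before the outer product avoids the overflow.
confusion64 = np.int64(confusion)
print(np.outer(confusion64.sum(axis=0), confusion64.sum(axis=1)))  # correct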
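And here is a sketch of the workaround I described: compute kappa from an int64-promoted confusion matrix. The helper name kappa_from_confusion is my own, and it uses the standard unweighted kappa formula rather than scikit-learn's exact code path.

import numpy as np
from sklearn.metrics import confusion_matrix

def kappa_from_confusion(confusion):
    # Promote to int64 first so the outer product of the marginal
    # sums cannot overflow, then apply the standard unweighted
    # formula kappa = (p_o - p_e) / (1 - p_e).
    confusion = np.int64(confusion)
    n = confusion.sum()
    p_o = np.trace(confusion) / n                             # observed agreement
    marginals = np.outer(confusion.sum(axis=1), confusion.sum(axis=0))
    p_e = np.trace(marginals) / (n * n)                       # expected agreement
    return (p_o - p_e) / (1 - p_e)

y1 = np.int64([1, 0, 0, 1])
y2 = np.int64([0, 0, 1, 1])
print(kappa_from_confusion(confusion_matrix(y1, y2)))  # 0.0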