I can reproduce on Windows actually. It looks like we get an int32 array somewhere and a computation overflows, so you end up taking the log of a negative int32. More investigation is needed to pinpoint the source of the problem.
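A minimal sketch of how this can happen, assuming int32 count arrays (the names `row_sums`/`col_sums` and their values are hypothetical): NumPy does not check integer array arithmetic for overflow, so a product larger than 2**31 - 1 silently wraps to a negative number, and its log is nan.

```python
import numpy as np

# Hypothetical int32 counts, chosen so that each product exceeds 2**31 - 1.
row_sums = np.array([50000, 60000], dtype=np.int32)
col_sums = np.array([50000, 70000], dtype=np.int32)

# Integer array multiplication wraps around silently instead of raising.
outer = row_sums * col_sums  # 50000 * 50000 = 2.5e9 wraps to a negative int32

# The log of a negative number is nan (NumPy normally emits a RuntimeWarning).
with np.errstate(invalid="ignore"):
    log_outer = np.log(outer)

print(outer, log_outer)
```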
@lesteve I am experiencing this error on an Ubuntu system.
I am guessing this is because you are using a 32-bit Python; I can reproduce the problem with a 32-bit Python.
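A quick sketch for checking whether an interpreter is a 32-bit build and which integer widths NumPy uses on it; it relies only on standard `struct`, `sys`, and NumPy calls, and the variable names are illustrative.

```python
import struct
import sys

import numpy as np

# Pointer size in bits: 32 on a 32-bit Python build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32

# NumPy's pointer-sized integer (used for indexing); 32 bits on 32-bit builds.
intp_bits = np.dtype(np.intp).itemsize * 8

# Default integer dtype for new arrays; on Windows, NumPy < 2.0 used int32 here
# even on 64-bit Python.
default_int = np.array([1]).dtype

print(pointer_bits, is_64bit, intp_bits, default_int)
```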
and thus got [ 0.00012122]. Please verify.
This is what I get as well. Not sure why I had a different value in my previous post.
The problem lies at line 605 of sklearn/metrics/cluster/supervised.py.
It seems you figured out where the integer overflow happens, thanks! I am not sure what the best fix is; maybe cast `pi` and `pj` to int64, which also ensures that `pi.sum()` and `pj.sum()` do not overflow.
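A minimal sketch of the int64 cast, assuming `pi` and `pj` are int32 count arrays (the values here are made up). Casting before multiplying keeps the products positive, and the sums are then accumulated in int64 too, whereas `sum()` on an int32 array accumulates in the platform default integer, which can itself be 32-bit on the affected setups.

```python
import numpy as np

# Hypothetical int32 counts, as they would appear on a 32-bit / Windows setup.
pi = np.array([50000, 60000], dtype=np.int32)
pj = np.array([50000, 70000], dtype=np.int32)

# Cast to int64 before any multiplication or summation.
pi64 = pi.astype(np.int64, copy=False)
pj64 = pj.astype(np.int64, copy=False)

outer = pi64 * pj64        # products up to about 9.2e18 fit in int64
log_outer = np.log(outer)  # all values are positive, so no nan
total = pi64.sum() + pj64.sum()
```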
As for testing, you are more than welcome to add a test similar to the one in the first post (without the pandas dependency). This should fail on Windows without your fix.
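A minimal sketch of such a regression test, assuming the affected function is `sklearn.metrics.mutual_info_score` (the thread does not name it) and using made-up label arrays instead of the data from the first post. Two identical labelings with two equal clusters of 50,000 samples each give contingency row and column sums whose products (2.5e9) exceed the int32 range; their mutual information should be log(2).

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def test_mutual_info_score_int_overflow():
    # Hypothetical data: two clusters of n samples each, identical in both labelings,
    # so row_sum * col_sum = n * n exceeds 2**31 - 1.
    n = 50_000
    labels_a = np.repeat([0, 1], n)
    labels_b = np.repeat([0, 1], n)

    score = mutual_info_score(labels_a, labels_b)

    # Without the int64 cast this is expected to be nan on int32 platforms.
    assert np.isfinite(score)
    # Identical balanced two-cluster labelings: MI equals log(2) (natural log).
    assert np.isclose(score, np.log(2))
```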