Odd/inconsistent behavior in adjusted_rand_score #12940
Comments
Note that this came to my attention through this SO question.
@Engineero You can test the above code here: https://rdrr.io/snippets/. Scikit-learn's implementation is based on this (denoted by …). So I think that it is consistent. Maybe someone can explain the results.
@vivekk-ezdi Maybe I need to look into the math a bit more. I will close for now and reopen if I find something that shows inconsistency with what one would expect from the math.
Adjustment needs a probability distribution for agreement by chance given the true distribution. Since the true distribution is extremely peaked, I suspect the adjustment is not very suitable. But I also have not yet looked at the specific maths.
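To make the "agreement by chance" point concrete: the ARI is defined as (Index − Expected Index) / (Max Index − Expected Index), where the Index counts point pairs grouped together in both labelings. When the ground truth is a single cluster, the Expected Index coincides with the Index itself, so the numerator vanishes. A sketch of the arithmetic with hypothetical labels (one true cluster of 10 points, a prediction split 9/1):

```python
from math import comb

# Hypothetical contingency table: true labeling = one cluster of 10,
# predicted labeling = clusters of sizes 9 and 1.
n = 10
index = comb(9, 2) + comb(1, 2)        # sum of C(n_ij, 2) over cells = 36
sum_a = comb(10, 2)                    # pairs within true clusters = 45
sum_b = comb(9, 2) + comb(1, 2)        # pairs within predicted clusters = 36
expected = sum_a * sum_b / comb(n, 2)  # = 45 * 36 / 45 = 36
max_index = (sum_a + sum_b) / 2        # = 40.5
ari = (index - expected) / (max_index - expected)
print(ari)  # 0.0
```

Because the true distribution is fully concentrated in one cluster, expected agreement by chance already equals the observed agreement, which is why the adjusted score collapses to 0.0.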
Description
metrics.adjusted_rand_score seems to give inconsistent results. The example given below is extreme: two almost-identical inputs return an ARI of 0.0.

Steps/Code to Reproduce
Example:
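The original code snippet did not survive extraction. A minimal reproduction of my own (the exact label arrays are an assumption) that matches the described behavior, where two labelings differing in a single element score 0.0:

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels: the true labeling is a single cluster of 10 points;
# the prediction differs in exactly one element (a single 0 among 1s).
labels_true = [1] * 10
labels_pred = [1] * 9 + [0]

print(adjusted_rand_score(labels_true, labels_pred))  # 0.0

# Changing the single 0 in labels_pred to a 1 makes the inputs identical,
# and the score is 1.0 as expected.
labels_pred[-1] = 1
print(adjusted_rand_score(labels_true, labels_pred))  # 1.0
```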
If you change the single 0 in labels_pred to a 1, the result is, as expected, 1.0.

Expected Results
One would expect the ARI in the case shown above to be very close to 1.0 for two almost-identical inputs.
Actual Results
The actual result for the example given is 0.0, which seems to indicate unexpected behavior in the algorithm.
Versions
Linux-2.6.32-696.23.1.el6.x86_64-x86_64-with-redhat-6.9-Santiago
Python 3.6.2 (default, Nov 4 2017, 17:40:18)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)]
NumPy 1.14.3
SciPy 1.1.0
Scikit-Learn 0.19.1