[MRG] Error for cosine affinity when zero vectors present #7943
I think that this is not the right fix.
There are two problems here: first, cosine is not well defined when vectors are null (all zeros); second, hierarchical clustering is buggy.
We might consider doing bugware and implementing such a check, to save users' time (I stress the "might", as it would really be bugware).
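To make the first problem concrete, here is a minimal sketch in plain NumPy (not the scikit-learn or SciPy code path; `cosine_similarity` is a helper written only for this illustration). With an all-zero vector the denominator is 0, so the result is 0/0 and comes out as NaN:

```python
import numpy as np


def cosine_similarity(u, v):
    """Illustrative helper: dot(u, v) / (||u|| * ||v||), no zero-norm guard."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Silence the floating-point warning so the NaN can be inspected directly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Two non-zero vectors: the similarity is well defined.
sim_ok = cosine_similarity([1.0, 0.0], [1.0, 1.0])

# One zero vector: numerator and denominator are both 0, so the result is NaN.
sim_zero = cosine_similarity([0.0, 0.0], [1.0, 1.0])

print(sim_ok, sim_zero)
```

Any distance derived from that value (for example `1 - similarity`) inherits the NaN, which is why downstream clustering code has nothing sensible to work with.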
If we do this, we should:
Thanks for the feedback. I apologize if I am making things harder than they need to be. I still have a few questions:
Cosine is not well defined --
Hierarchical Clustering is buggy --
I made the additions (1) and (2) in your list, @GaelVaroquaux.
Regarding (3), as I mentioned in my previous post, I don't believe there is a remaining bug in SciPy's hierarchical clustering.
What I propose is that we put this merge on hold while we open an issue asking: what should Agglomerative Clustering do when using cosine distance with zero vectors? Once that is resolved, we can either go forward with this fix or proceed with whatever gets decided.
If this sounds reasonable, I can submit this new Issue.
@@            Coverage Diff             @@
##           master    #7943      +/-   ##
==========================================
+ Coverage   96.19%   96.26%   +0.06%
==========================================
  Files         348      401      +53
  Lines       64645    72877    +8232
  Branches        0     7895    +7895
==========================================
+ Hits        62187    70154    +7967
- Misses       2458     2699     +241
- Partials        0       24      +24
…rn#7943)
* cosine affinity cannot be used when X contains zero vectors
* fixed issue with tabs spaces
* changed to np.any and created a test for this new ValueError
* use assert_raise_message and flipped order of if conditions
* fixed 0 row calculation
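For readers skimming the commit list, the check it describes could look roughly like the sketch below. This is an assumption-laden illustration, not the merged diff: the function name `check_cosine_has_no_zero_rows` is hypothetical, and the error message simply reuses the wording of the first commit bullet. It tests the affinity first and only then scans for all-zero rows with `np.any`, in line with the "flipped order of if conditions" note.

```python
import numpy as np


def check_cosine_has_no_zero_rows(X, affinity):
    """Hypothetical helper: reject all-zero rows when affinity is 'cosine'."""
    X = np.asarray(X)
    # np.any(X, axis=1) is False exactly for rows whose entries are all zero;
    # the cheap string comparison runs first so other affinities skip the scan.
    if affinity == "cosine" and np.any(~np.any(X, axis=1)):
        raise ValueError(
            "Cosine affinity cannot be used when X contains zero vectors"
        )


X_bad = np.array([[0.0, 0.0], [1.0, 2.0]])   # first row is a zero vector
X_good = np.array([[0.0, 1.0], [1.0, 2.0]])  # no all-zero rows

check_cosine_has_no_zero_rows(X_good, "cosine")    # passes silently
check_cosine_has_no_zero_rows(X_bad, "euclidean")  # other affinities unaffected

try:
    check_cosine_has_no_zero_rows(X_bad, "cosine")
    raised = False
except ValueError:
    raised = True
```

Raising early gives the user an actionable message instead of NaN distances surfacing later inside the linkage computation.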