ZeroDivisionError with sparse input and metric='jaccard' #33
Comments
I suspect my Jaccard implementation doesn't like all-zero rows. That shouldn't be too hard to fix, but I'll have to take a look and see what's actually going on there. Thanks for the report!
That was my initial thought too. I forgot to mention that I'm pretty certain I had already tested for and cleaned out all-zero rows (not shown above).
Well, I certainly did have a bug in the handling of all-zero rows, which is now fixed. It seems that isn't enough, however, so now it's a matter of tracking down where the error actually is.
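For readers following along, here is a minimal sketch of what handling all-zero rows in a Jaccard distance can look like. It is an illustration, not the code from the library: the function name `jaccard_distance` and the choice to return 0.0 when both rows are empty are assumptions made for this example.

```python
import numpy as np


def jaccard_distance(x, y):
    """Jaccard distance between two binary vectors (illustrative sketch only)."""
    x_nonzero = np.asarray(x) != 0
    y_nonzero = np.asarray(y) != 0

    # Number of positions where at least one vector is non-zero.
    union = np.count_nonzero(x_nonzero | y_nonzero)
    if union == 0:
        # Both rows are all zero, so the usual ratio would be 0 / 0.
        # Assumption for this sketch: treat two empty rows as identical.
        return 0.0

    # Number of positions where both vectors are non-zero.
    intersection = np.count_nonzero(x_nonzero & y_nonzero)
    return (union - intersection) / union
```

Without the `union == 0` check, two all-zero rows would divide by zero, which is the kind of failure discussed in this thread.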
Ah, it looks like it is in the random projection trees using cosine splitting. Let me look into that further ...
It seems the problem occurs when we try to split two identical points: the hyperplane is then the zero vector, and we can't normalize it (all-zero vectors would also break this code, so I should fix that too). I will have to think a little about the "right" way to handle this.
Ah, that makes sense in light of my original data.
After some thought, standard defensive programming will, in fact, do the right thing. I should have been more careful and caught these the first time, but hopefully this will work better now. Edit: I tested this with your code above and it now works for me; I'll let you pull from master and reinstall to see if it fixes the problems on your end as well.
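As a rough illustration of the defensive approach described above, the sketch below guards every normalization in a cosine-style split. It is not the actual fix committed to the library: the function name `angular_hyperplane` and the tolerance `EPS` are hypothetical, and falling back to a norm of 1.0 is an assumption about what "doing the right thing" means here.

```python
import numpy as np

EPS = 1e-8  # hypothetical tolerance for treating a norm as zero


def angular_hyperplane(left, right):
    """Unit normal of the hyperplane separating two points by angle (sketch)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    left_norm = np.linalg.norm(left)
    right_norm = np.linalg.norm(right)

    # Guard against all-zero input vectors.
    if left_norm < EPS:
        left_norm = 1.0
    if right_norm < EPS:
        right_norm = 1.0

    hyperplane = left / left_norm - right / right_norm

    # Guard against identical (or parallel) points, where the difference
    # of the normalized vectors is the zero vector.
    hyperplane_norm = np.linalg.norm(hyperplane)
    if hyperplane_norm < EPS:
        hyperplane_norm = 1.0

    return hyperplane / hyperplane_norm
```

With these guards, two identical points give a zero hyperplane vector instead of an error, so every point sits exactly on the hyperplane and the caller needs some tie-breaking rule to assign sides.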
I suspect you intended for:
to instead be:
Yes, sorry. I was working on a separate copy that "worked for me". All the more reason I appreciate the double check.
No worries, happy to be able to help. Now to try the real data again!
Not seeing any errors, closing ticket.
Thank you!
Example to reproduce error:
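```python
import numpy as np
from sklearn import manifold
import umap

# Sparse binary data: each entry is 1 with probability 0.10
X = np.random.choice([0, 1], size=(1000, 50), p=[90./100, 10./100])

# t-SNE with the Jaccard metric, for comparison
tsne = manifold.TSNE(metric='jaccard')
y_tsne = tsne.fit_transform(X)

# UMAP with the Jaccard metric: this is the call that raises ZeroDivisionError
um = umap.UMAP(metric='jaccard')
y_umap = um.fit_transform(X)
```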
With p=[85./100, 15./100] the same code works.
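One plausible reason the denser setting behaves differently is that all-zero rows become much rarer. The sketch below estimates how many such rows to expect in the random matrix; the helper name `expected_all_zero_rows` is hypothetical. This is only a partial explanation, since the thread also traces the failure to identical points.

```python
def expected_all_zero_rows(n_rows, n_cols, p_zero):
    """Expected number of rows with no non-zero entry in an i.i.d. binary matrix."""
    # A row is all zero only if each of its n_cols entries is zero.
    return n_rows * p_zero ** n_cols


# Settings from the example: 1000 rows, 50 columns.
print(expected_all_zero_rows(1000, 50, 0.90))  # about 5 rows on average
print(expected_all_zero_rows(1000, 50, 0.85))  # about 0.3 rows on average
```

So at 10% density a typical draw contains several all-zero rows, while at 15% density most draws contain none.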