Fewer than k nearest neighbors are found #14
Comments
Hello @wangyuran - This is a great request - thanks for the feedback! The requested change is now landed. Please let me know if there is anything else! |
I'll leave this request open for another week, then close it if I don't hear back. |
@spencebeecher, for the index in the example, the doc_index covers the whole dataset; shouldn't it cover only the training dataset? |
@spencebeecher, thank you very much. The output is k neighbors now. However, the new issue is that the top-1 NN accuracy in my case drops by more than half (68% to 28%). Since the change is just to the stopping criterion, I am not sure why this happens. |
@wangyuran - Oh no! Can you increase the number of k_clusters you search as a quick patch? If you send me a notebook / script, I can try to debug it with you. I'll look at this more later tonight. You can always go back one revision on GitHub to get the old behavior. Let me know what you discover! |
Adding info - I just re-ran this example: https://github.com/facebookresearch/pysparnn/blob/master/examples/sparse_search_comparison.ipynb. I get very similar results to before (~60% recall for pysparnn).
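(For context, the recall figure above comes from comparing the approximate top-1 result against exact brute-force cosine search over the same queries. A minimal, stdlib-only sketch of that measurement - hypothetical helper names, not pysparnn code:)

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_top1(query, docs):
    # Exact nearest neighbor index under cosine similarity.
    return max(range(len(docs)), key=lambda i: cosine_sim(query, docs[i]))

def recall_at_1(queries, docs, approx_top1):
    # Fraction of queries whose approximate top-1 matches the exact top-1.
    exact = [brute_force_top1(q, docs) for q in queries]
    hits = sum(1 for e, a in zip(exact, approx_top1) if e == a)
    return hits / len(queries)
```

Feeding in the index returned by any approximate search for each query gives the recall number being discussed here.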
Hi Spence,
I tried to rerun with the earlier version (Mar 19), and it did give similar
results. Maybe on my system the results are just not very stable: over a few
runs the values fluctuate quite a bit (5% to 68%), but most of them are
about 30%.
Anyway, thanks for fixing the issue.
Thanks,
Yuran
|
Hi Yuran - to improve recall you might try MultiClusterIndex(num_indexes=2) or 3. You can also increase the k_clusters parameter for more recall. I would also check whether the k results you get back still have reasonable distances; it could be that the 2nd-best item still isn't so far off. Finally, if your space is very sparse, you can try bumping the matrix_size param when creating the indexes. Note: increasing this param makes shallower trees and eventually becomes brute-force search. |
Hi Spence,
Thanks a lot for the suggestions. I started with num_indexes=2. The results
only become comparable to a brute-force cosine method when num_indexes is
about 10.
Thanks,
Yuran
|
I use the MultiClusterIndex class. With the search method, I only changed the parameter k to 10, but fewer than 10 nearest neighbors are found: only 80% of the examples returned 10 NNs. What can I adjust to make sure I get 10 NNs in all cases?
Another question: in the examples, the doc_index covers the whole dataset. Shouldn't it cover only the training dataset?