Fewer than k nearest neighbors are found #14

wangyuran · 2017-05-20T13:45:02Z

I use the MultiClusterIndex class. In the search method I only changed the parameter k to 10, but fewer than 10 nearest neighbors are found: only 80% of the examples returned 10 NNs. What can I adjust to make sure that I get 10 NNs in all cases?
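
To put a number on the shortfall described above, a small helper can report how many queries came back with a full set of k neighbors. This is only a sketch: it assumes the search output is a list with one neighbor list per query, and the `results` data below is made up for illustration.

```python
def fraction_with_k(results, k):
    """Return the share of queries whose result list holds at least k neighbors."""
    if not results:
        return 0.0
    full = sum(1 for neighbors in results if len(neighbors) >= k)
    return full / len(results)


# Hypothetical output for 5 queries with k=3: one query came back short.
results = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h"],  # only 2 neighbors found
    ["i", "j", "k"],
    ["l", "m", "n"],
]

# Indexes of the queries that returned fewer than k neighbors.
short = [i for i, neighbors in enumerate(results) if len(neighbors) < 3]

print(fraction_with_k(results, 3))  # 0.8
print(short)  # [2]
```

Listing the short queries, not just the fraction, makes it easier to see whether they share something, such as very few non-zero features.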

Another question: in the examples, the doc_index covers the whole dataset. Shouldn't it cover only the training dataset?

spencebeecher · 2017-05-29T21:41:21Z

Hello @wangyuran - This is a great request - thanks for the feedback!

The requested change has now landed. Please let me know if there is anything else!
b726a22

spencebeecher · 2017-05-29T21:41:47Z

I'll leave this request open for another week, then close it if I don't hear back.

wangyuran · 2017-05-30T14:10:14Z

@spencebeecher, for the index in the example, the doc_index covers the whole dataset. Shouldn't it cover only the training dataset?

wangyuran · 2017-05-30T14:37:54Z

@spencebeecher, thank you very much. The output is k neighbors now. However, there is a new issue: the top-1 NN accuracy in my case drops by more than half (68% to 28%). Since the change is just the stopping criterion, I am not sure why this happens.

spencebeecher · 2017-05-30T14:51:08Z

@wangyuran - Oh no! Can you increase the number of k_clusters you search as a quick patch? If you send me a notebook or script I can try to debug it with you. I'll look at this more later tonight. You can always go back one revision on GitHub to get the old behavior. Let me know what you discover!

spencebeecher · 2017-05-30T15:20:54Z

Adding info - I just re-ran this example: https://github.com/facebookresearch/pysparnn/blob/master/examples/sparse_search_comparison.ipynb

I get results very similar to before (~60% recall for pysparnn).
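
For reference, recall against a brute-force baseline can be computed along these lines. This is a sketch of one common definition (the mean share of exact neighbors that the approximate search also returned); the notebook's own metric may differ, and the id lists below are invented.

```python
def recall_at_k(approx_ids, exact_ids):
    """Mean fraction of the exact neighbors that the approximate search also returned."""
    if len(approx_ids) != len(exact_ids):
        raise ValueError("both inputs need one neighbor list per query")
    per_query = []
    for approx, exact in zip(approx_ids, exact_ids):
        if not exact:
            continue  # no ground truth for this query, skip it
        hits = len(set(approx) & set(exact))
        per_query.append(hits / len(set(exact)))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical neighbor ids for three queries with k=2.
exact = [[1, 2], [3, 4], [5, 6]]
approx = [[1, 2], [3, 9], [7, 8]]

print(recall_at_k(approx, exact))  # 0.5
```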

wangyuran · 2017-05-30T19:53:42Z

Hi Spence, I reran with the earlier version (Mar 19), and it gave similar results. Maybe the results are just not very stable on my system: over a few runs the values fluctuate quite a bit (5% to 68%), but most of them are around 30%. Anyway, thanks for fixing the issue. Thanks, Yuran


spencebeecher · 2017-05-30T20:38:06Z

Hi Yuran - to improve recall you might try MultiClusterIndex(num_indexes=2) or 3. You can also increase the k_clusters parameter for more recall. I would also check whether the k results you get back still have reasonable distances. It could be that the 2nd best item still isn't so far off.
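
Why these two knobs help can be seen in a toy, single-level cluster index. The sketch below is not PySparNN's implementation: `build_clusters`, `search`, and `multi_search` are hypothetical helpers over small dense vectors, written only to show that scanning more clusters, or merging several independently seeded clusterings, can only widen the candidate set.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_clusters(vectors, num_clusters, seed):
    """Pick random vectors as cluster heads and assign every vector to its closest head."""
    rng = random.Random(seed)
    heads = rng.sample(range(len(vectors)), num_clusters)
    clusters = {h: [] for h in heads}
    for i, vec in enumerate(vectors):
        best = max(heads, key=lambda h: cosine(vec, vectors[h]))
        clusters[best].append(i)
    return clusters


def search(vectors, clusters, query, k, k_clusters):
    """Scan only the k_clusters closest clusters and return up to k ids."""
    heads = sorted(clusters, key=lambda h: cosine(query, vectors[h]), reverse=True)
    candidates = [i for h in heads[:k_clusters] for i in clusters[h]]
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:k]  # may be shorter than k if the scanned clusters are small


def multi_search(vectors, indexes, query, k, k_clusters):
    """Merge the results of several independently seeded clusterings."""
    merged = set()
    for clusters in indexes:
        merged.update(search(vectors, clusters, query, k, k_clusters))
    return sorted(merged, key=lambda i: cosine(query, vectors[i]), reverse=True)[:k]


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]
indexes = [build_clusters(vectors, num_clusters=10, seed=s) for s in (1, 2, 3)]
clusters = indexes[0]

# Scanning all 10 clusters is equivalent to brute force over every vector.
exact = search(vectors, clusters, query, k=5, k_clusters=10)

for k_clusters in (1, 2, 5, 10):
    found = search(vectors, clusters, query, k=5, k_clusters=k_clusters)
    print("k_clusters", k_clusters, "recall", len(set(found) & set(exact)) / 5)

merged = multi_search(vectors, indexes, query, k=5, k_clusters=1)
print("3 indexes, k_clusters 1, recall", len(set(merged) & set(exact)) / 5)
```

In this toy, recall never decreases as k_clusters grows, and a narrow search can return fewer than k items when the scanned clusters are small. The cost is that every extra cluster or index means more similarity computations per query.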

Finally, if your space is very sparse you can try bumping the matrix_size param when creating the indexes. Note that increasing this param makes the trees shallower, and at the limit the search becomes brute force.
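
The trade-off can be illustrated with a rough depth estimate. The model below is an assumption made for illustration, not code from the library: each level is taken to hold at most matrix_size entries, with the records split evenly among the clusters of the level below. `estimated_levels` is a hypothetical helper.

```python
def estimated_levels(num_records, matrix_size):
    """Estimate the tree depth under a simplified model (an assumption, see above).

    Each level holds at most matrix_size entries; records are assumed to split
    evenly among that level's clusters.
    """
    if matrix_size < 2:
        raise ValueError("matrix_size must be at least 2")
    levels = 1
    remaining = num_records
    while remaining > matrix_size:
        remaining = -(-remaining // matrix_size)  # integer ceiling division
        levels += 1
    return levels


print(estimated_levels(1_000_000, 100))        # 3
print(estimated_levels(1_000_000, 1_000))      # 2
print(estimated_levels(1_000_000, 1_000_000))  # 1, a single flat level: brute force
```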

wangyuran · 2017-05-30T21:34:17Z

Hi Spence, Thanks a lot for the suggestions. I started with num_indexes=2. The results only become comparable with a cosine brute force method when num_indexes is about 10. Thanks, Yuran
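
A brute-force cosine baseline of the kind mentioned above can be as small as the sketch below. It assumes sparse vectors stored as `{feature: weight}` dicts; `sparse_cosine` and `brute_force_knn` are illustrative names and the documents are made up. It is not the baseline used in this thread.

```python
import heapq
import math


def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def brute_force_knn(docs, query, k):
    """Score every document against the query and keep the k most similar ids."""
    scored = ((sparse_cosine(query, vec), doc_id) for doc_id, vec in docs.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


docs = {
    "d1": {"apple": 1.0, "pie": 1.0},
    "d2": {"apple": 1.0},
    "d3": {"banana": 1.0, "bread": 1.0},
}

print(brute_force_knn(docs, {"apple": 1.0, "pie": 1.0}, k=2))  # ['d1', 'd2']
```

Because it scores every document, this always returns k results when at least k documents exist, which makes it a useful ground truth for measuring the recall of an approximate index.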


spencebeecher self-assigned this May 29, 2017

spencebeecher closed this as completed Jun 3, 2017
