This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

Fewer than k nearest neighbors are found #14

Closed
wangyuran opened this issue May 20, 2017 · 9 comments

@wangyuran

I use the MultiClusterIndex class. In the search method I only changed the parameter k to 10, but fewer than 10 nearest neighbors are sometimes found: only 80% of the examples returned 10 NNs. What can I adjust to make sure that I get 10 NNs in all cases?
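To illustrate how a cluster-pruned search can come back with fewer than k results, here is a toy sketch in plain Python. It is not pysparnn's implementation: the 1-D points, the `toy_cluster_search` helper, and the cluster layout are all assumptions made up for illustration. The idea is only that when the search visits a limited number of clusters, the candidates it gathers may number fewer than k.

```python
def toy_cluster_search(query, clusters, k, k_clusters):
    """Search only the k_clusters clusters whose centroid is nearest the query."""
    # Rank clusters by the distance between the query and each centroid.
    ranked = sorted(clusters, key=lambda c: abs(c["centroid"] - query))
    # Gather candidates from the selected clusters only.
    candidates = [p for c in ranked[:k_clusters] for p in c["points"]]
    # Return up to k candidates, nearest first.
    return sorted(candidates, key=lambda p: abs(p - query))[:k]

# Hypothetical 1-D data: three clusters holding 3, 4 and 5 points.
clusters = [
    {"centroid": 1.0, "points": [0.5, 1.0, 1.5]},
    {"centroid": 5.0, "points": [4.0, 5.0, 6.0, 7.0]},
    {"centroid": 10.0, "points": [9.0, 10.0, 11.0, 12.0, 13.0]},
]

few = toy_cluster_search(1.2, clusters, k=10, k_clusters=2)
more = toy_cluster_search(1.2, clusters, k=10, k_clusters=3)
print(len(few), len(more))  # 7 10
```

In this toy setup, visiting two clusters yields only 7 candidates for k=10, while visiting all three yields the full 10.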

Another question: in the examples, the doc_index covers the whole dataset. Shouldn't it cover only the training dataset?

@spencebeecher
Contributor

Hello @wangyuran - This is a great request - thanks for the feedback!

The requested change has now landed. Please let me know if there is anything else!
b726a22

@spencebeecher
Contributor

I'll leave this request open for another week, then close it if I don't hear back.

@spencebeecher spencebeecher self-assigned this May 29, 2017
@wangyuran
Author

@spencebeecher, for the index in the example, the doc_index covers the whole dataset. Shouldn't it cover only the training dataset?
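For concreteness, a minimal sketch of what "index only the training dataset" could look like. The helper name `split_doc_ids`, the 80/20 split, and the placeholder documents are illustrative assumptions, not taken from the pysparnn examples. Only the training ids would be passed to the index as record data, and the held-out documents would serve as queries.

```python
import random

def split_doc_ids(n_docs, test_fraction=0.2, seed=0):
    """Shuffle document positions and split them into train and test ids."""
    rng = random.Random(seed)
    ids = list(range(n_docs))
    rng.shuffle(ids)
    n_test = int(n_docs * test_fraction)
    return ids[n_test:], ids[:n_test]

# Placeholder corpus; in practice this would be the real documents.
docs = ["doc %d" % i for i in range(10)]
train_ids, test_ids = split_doc_ids(len(docs))

# Only the training documents (and their ids) would be handed to the index;
# the held-out documents are used purely as queries.
train_docs = [docs[i] for i in train_ids]
query_docs = [docs[i] for i in test_ids]
print(len(train_docs), len(query_docs))  # 8 2
```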

@wangyuran
Author

wangyuran commented May 30, 2017

@spencebeecher, thank you very much. The output is k neighbors now. However, the new issue is that the top-1 NN accuracy in my case drops by more than half (68% to 28%). Since the change is just the stopping criterion, I am not sure why this happens.

@spencebeecher
Contributor

@wangyuran - Oh no! Can you increase the number of k_clusters you search as a quick patch? If you send me a notebook / script I can try to debug it with you. I'll look at this more later tonight. You can always go back one revision on GitHub to get the old behavior. Let me know what you discover!

@spencebeecher
Contributor

Adding info - I just re-ran this example - https://github.com/facebookresearch/pysparnn/blob/master/examples/sparse_search_comparison.ipynb

I get results very similar to before (~60% recall for pysparnn).
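For anyone comparing numbers, one common way to compute recall is to compare the approximate results against a brute-force baseline, query by query. The sketch below assumes that definition; the `recall_at_k` helper and the sample ids are hypothetical, and the linked notebook may measure recall differently.

```python
def recall_at_k(approx_results, exact_results):
    """Mean fraction of exact neighbors that also appear in the approximate results."""
    scores = []
    for approx, exact in zip(approx_results, exact_results):
        if not exact:
            continue  # skip queries with no ground-truth neighbors
        hits = len(set(approx) & set(exact))
        scores.append(hits / len(exact))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical record ids for two queries, four neighbors each.
exact = [[1, 2, 3, 4], [5, 6, 7, 8]]
approx = [[1, 2, 3, 9], [5, 10, 11, 12]]
print(recall_at_k(approx, exact))  # 0.5
```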

@wangyuran
Author

wangyuran commented May 30, 2017 via email

@spencebeecher
Contributor

Hi Yaran - to improve recall you might try MultiClusterIndex(num_indexes=2) or 3. You can also increase the k_clusters parameter for more recall. I would also check whether the k results you get back still have reasonable distances. It could be that the 2nd best item still isn't so far off.

Finally - if your space is very, very sparse you can try bumping the matrix_size param when creating the indexes. Note - increasing this param makes the trees shallower, and eventually the search becomes brute force.
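A rough sketch of the distance check suggested above: for each query, look at how far the 2nd best result is from the best one. It assumes each query's results arrive as a list of (distance, record) pairs, which is an assumption about the output shape when distances are requested. The `top_two_gap` helper and the sample values are hypothetical.

```python
def top_two_gap(results):
    """For each query, return distance(2nd best) - distance(best), or None if fewer than two hits."""
    gaps = []
    for hits in results:
        # float() also copes with distances that arrive as strings.
        dists = sorted(float(d) for d, _ in hits)
        gaps.append(dists[1] - dists[0] if len(dists) >= 2 else None)
    return gaps

# Hypothetical results for two queries.
results = [
    [(0.25, "doc a"), (0.5, "doc b"), (0.75, "doc c")],
    [(0.5, "doc d")],
]
print(top_two_gap(results))  # [0.25, None]
```

A small gap would suggest the 2nd best item is nearly as close as the best one, so a drop in top-1 accuracy may overstate how much worse the results actually are.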

@wangyuran
Author

wangyuran commented May 30, 2017 via email
