Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probelm with result = j.Search(q[t], k) #65

Open
Arctanxy opened this issue Jun 18, 2019 · 2 comments
Open

Probelm with result = j.Search(q[t], k) #65

Arctanxy opened this issue Jun 18, 2019 · 2 comments
Labels
bug Something isn't working

Comments

@Arctanxy
Copy link

Arctanxy commented Jun 18, 2019

I used SPTAG in face searching. There're more than a million faces in my database.

  1. When I change k value from 28000 to 30000, the searching time raised from 0.4s to several minutes.

  2. I got many duplicate indices from j.Search() when I set k to a value bigger than 10000.

Looking forward to your reply.

My environment:

Python 3.7
Ubuntu 18.04

@RioMichael
Copy link

RioMichael commented Jun 21, 2019

I also got the duplicate problem (but didn't get any search time problem). The duplicate problem occurred even when I tried to search with about 1000-NN (tested both BKT and KDT with data with about 100 dimension, and size of around 100,000 using Cosine as distance method), and it happened not only when using Python wrapper, but also when directly using IndexSearcher (so I guess the problem lies withing the source code).
I'm currently reading the source code, but since I'm not very familiar with c++, I haven't found any problem yet. Anyone has some ideas or solutions on this?

My environment:

Python 2.7
Windows 10/Ubuntu 16.04

@RioMichael
Copy link

So I was focusing on testing different parameters in the past two weeks, and didn't look too deep into the source code. But I do temporally solve the duplicate problem by adding a check when adding a node to the result query (QueryResultSet.h under SPTAGLib):

bool AddPoint(const int index, float dist)
    {
	for (int i = 0; i < m_results.Length(); i++) {
		if (index == m_results[i].VID) {
			return false;
		}
	}
        if (dist < m_results[0].Dist || (dist == m_results[0].Dist && index < m_results[0].VID))
        {
            m_results[0].VID = index;
            m_results[0].Dist = dist;
            Heapify(m_resultNum);
            return true;
        }
        return false;
    }

Although making this change can avoid the duplicates, I believe that this is not the alternative solution as the same node should not be able to reach AddPoint function more than once (maybe something wrong when checking the hash table, haven't looked at the hash table yet).
@Arctanxy Have you got a better solution? Also, did you get good result on your face dataset? I'm also testing with a face database (256 dimension, and using L2 distance), but did not get good result (have tried to tuning almost all the parameters.) But do get good results on all the datasets on https://github.com/erikbern/ann-benchmarks

@MaggieQi MaggieQi added the bug Something isn't working label Jul 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants