Probelm with result = j.Search(q[t], k) #65

Arctanxy · 2019-06-18T06:03:18Z

I used SPTAG in face searching. There're more than a million faces in my database.

When I change k value from 28000 to 30000, the searching time raised from 0.4s to several minutes.
I got many duplicate indices from j.Search() when I set k to a value bigger than 10000.

Looking forward to your reply.

My environment:

Python 3.7
Ubuntu 18.04

RioMichael · 2019-06-21T04:16:21Z

I also got the duplicate problem (but didn't get any search time problem). The duplicate problem occurred even when I tried to search with about 1000-NN (tested both BKT and KDT with data with about 100 dimension, and size of around 100,000 using Cosine as distance method), and it happened not only when using Python wrapper, but also when directly using IndexSearcher (so I guess the problem lies withing the source code).
I'm currently reading the source code, but since I'm not very familiar with c++, I haven't found any problem yet. Anyone has some ideas or solutions on this?

My environment:

Python 2.7
Windows 10/Ubuntu 16.04

RioMichael · 2019-07-03T10:31:35Z

So I was focusing on testing different parameters in the past two weeks, and didn't look too deep into the source code. But I do temporally solve the duplicate problem by adding a check when adding a node to the result query (QueryResultSet.h under SPTAGLib):

bool AddPoint(const int index, float dist)
    {
	for (int i = 0; i < m_results.Length(); i++) {
		if (index == m_results[i].VID) {
			return false;
		}
	}
        if (dist < m_results[0].Dist || (dist == m_results[0].Dist && index < m_results[0].VID))
        {
            m_results[0].VID = index;
            m_results[0].Dist = dist;
            Heapify(m_resultNum);
            return true;
        }
        return false;
    }

Although making this change can avoid the duplicates, I believe that this is not the alternative solution as the same node should not be able to reach AddPoint function more than once (maybe something wrong when checking the hash table, haven't looked at the hash table yet).
@Arctanxy Have you got a better solution? Also, did you get good result on your face dataset? I'm also testing with a face database (256 dimension, and using L2 distance), but did not get good result (have tried to tuning almost all the parameters.) But do get good results on all the datasets on https://github.com/erikbern/ann-benchmarks

MaggieQi added the bug Something isn't working label Jul 10, 2019

RioMichael mentioned this issue Jul 12, 2019

Hash table of visited nodes for NGQueue #79

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Probelm with result = j.Search(q[t], k) #65

Probelm with result = j.Search(q[t], k) #65

Arctanxy commented Jun 18, 2019 •

edited

Loading

RioMichael commented Jun 21, 2019 •

edited

Loading

RioMichael commented Jul 3, 2019

Probelm with result = j.Search(q[t], k) #65

Probelm with result = j.Search(q[t], k) #65

Comments

Arctanxy commented Jun 18, 2019 • edited Loading

RioMichael commented Jun 21, 2019 • edited Loading

RioMichael commented Jul 3, 2019

Arctanxy commented Jun 18, 2019 •

edited

Loading

RioMichael commented Jun 21, 2019 •

edited

Loading