Issue with index.add_items() when building large indexes? #48

Open
loisaidasam opened this issue Jan 9, 2024 · 3 comments

@loisaidasam

When building indexes of varying sizes, I ran into issues with some of the larger ones.

Here's what my index creation code looks like:

from voyager import Index, Space  # assuming the voyager Python package

# imagine `vectors` is an ndarray with multiple vectors of dimension 1728 ...
num_dimensions = vectors[0].shape[0]
index = Index(Space.Cosine, num_dimensions=num_dimensions)
index.add_items(vectors)  # all vectors added in a single call
index.save(filename)

And my test code looks like this:

# imagine `vector` is a sample query vector of (matching) dimension 1728
index = Index.load(filename)
index.query(vector, k=200)

This works fine when vectors contains 10k, 50k, 100k, 500k, or 1M vectors.

But when vectors contains 5M or 10M vectors, index creation runs fine and the error only shows up at query time:

     index.query(vector, k=200)
RuntimeError: Potential candidate (with label '4963853') had negative distance -17059.800781. This may indicate a corrupted index file. 
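
For context on why this value is suspicious: cosine distance is 1 minus cosine similarity, so for finite, non-zero vectors it should fall in [0, 2] (up to floating-point rounding), and -17059.8 cannot come from a valid pair. A minimal NumPy sketch of that bound follows; the helper name `cosine_distance` is hypothetical, and this is a reference calculation, not the library's internal implementation.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return 1 - cos(a, b). Hypothetical reference helper, not library code."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return float(1.0 - np.dot(a, b) / denom)

# Two random vectors of the same dimension as in the report (1728)
rng = np.random.default_rng(0)
x = rng.normal(size=1728)
y = rng.normal(size=1728)
d = cosine_distance(x, y)
```

Comparing a few query results against a brute-force calculation like this one may help tell a bad index apart from bad input vectors.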

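I then tried building the index by adding the same vectors array in slices of 1M vectors each:

# add the vectors in slices of 1M rows instead of in a single call
batch_size = 1_000_000
index = Index(Space.Cosine, num_dimensions=num_dimensions)
for start in range(0, vectors.shape[0], batch_size):
    index.add_items(vectors[start:start + batch_size])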

An index built this way can be queried just fine. Maybe add_items() has some sort of limitation on the number of vectors per call?

@markkohdev markkohdev added the bug Something isn't working label Jan 17, 2024
@markkohdev markkohdev self-assigned this Jan 17, 2024
@markkohdev
Contributor

Hey @loisaidasam, thank you for reporting. That result definitely shouldn't be happening! We've seen similar behavior when vectors with NaN values were accidentally added to the index, but I don't think that is the cause here. It's more likely due to a race condition somewhere, so we'll investigate and get back to you!
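
To rule out the NaN case mentioned above before building an index, the input array can be scanned for non-finite values. This is a minimal sketch assuming `vectors` is a 2-D NumPy array; the helper name `find_non_finite_rows` is hypothetical and not part of any library.

```python
import numpy as np

def find_non_finite_rows(vectors: np.ndarray) -> np.ndarray:
    """Return indices of rows that contain NaN or +/-inf (hypothetical helper)."""
    # isfinite is False for NaN and infinities; a row is bad if any entry is not finite
    return np.flatnonzero(~np.isfinite(vectors).all(axis=1))

# Small demonstration array with one NaN planted in row 2
vectors = np.ones((4, 3), dtype=np.float32)
vectors[2, 1] = np.nan
bad_rows = find_non_finite_rows(vectors)
```

If `bad_rows` is empty, NaN or infinite inputs are unlikely to explain a negative distance, which would be consistent with the race-condition theory.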

@Lorenzoncina

Any update on this topic?

@stephen29xie
Contributor

Hi all. This error was removed in #80, and that change is included in the 2.0.9 release.
