segfault 11 when using non sequential IDs #288

Closed
gregsadetsky opened this issue Apr 8, 2018 · 7 comments


gregsadetsky commented Apr 8, 2018

I just came upon a reproducible segfault that seems to be related to non-sequential IDs.

I am working with 10-dimensional vectors and, for this small reproducible example, 200 items. When inserting non-sequential IDs, if I call get_nns_by_vector with an n value of 100, I get a segfault. Calling it with an n of 10 does work. The specific vector that's passed to get_nns_by_vector also makes a difference (see comments in code below).

Please get the values from this gist.

from annoy import AnnoyIndex
import numpy as np

# path to values file
VALUES_PATH = 'annoy_segfault_values.txt'
# path to temporary annoy index file
ANNOY_INDEX_PATH = 'test.ann'

# dimensionality of each vector (first argument to AnnoyIndex)
NMB_VECTORS = 10
NMB_TREES = 20

t = AnnoyIndex(NMB_VECTORS, metric='euclidean')
with open(VALUES_PATH) as f:
  for line_idx, line in enumerate(f):
    idx, vector = line.strip().split(';')
    idx = int(idx)
    vector = np.fromstring(vector, dtype='float64', sep=' ')
    # using 'line_idx' in the line below instead of 'idx'
    # (i.e., using sequential IDs) fixes the issue
    t.add_item(idx, vector)

t.build(NMB_TREES)
t.save(ANNOY_INDEX_PATH)

# .....

u = AnnoyIndex(NMB_VECTORS, metric='euclidean')
u.load(ANNOY_INDEX_PATH)

# segfault only happens with certain vectors.
# for instance, np.zeros(10) does not segfault.
# the term at index 4 must be approximately >=0.9
# to reproduce the segfault
needle_vector = np.array([0., 0., 0., 0., 0.9, 0., 0., 0., 0., 0.])
needle_vector = needle_vector.astype('float64')

# segfault 11
out = u.get_nns_by_vector(needle_vector, 100)
print(out)
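
The comment in the loop above notes that sequential IDs avoid the crash. One possible workaround on an affected version, sketched here as an assumption rather than anything proposed in this thread, is to keep a mapping between the original IDs and dense sequential ones. The helper name `build_id_maps` is hypothetical and not part of annoy; the sample IDs are made up.

```python
def build_id_maps(external_ids):
    """Assign each distinct external ID a dense, sequential internal index."""
    to_internal = {}   # external ID -> sequential index
    to_external = []   # sequential index -> external ID
    for ext in external_ids:
        if ext not in to_internal:
            to_internal[ext] = len(to_external)
            to_external.append(ext)
    return to_internal, to_external


# Made-up, non-sequential IDs (with one repeat) for illustration only.
to_internal, to_external = build_id_maps([1042, 7, 99031, 7])
print(to_internal)
print(to_external)
```

With such a mapping, `t.add_item(to_internal[idx], vector)` would insert under a sequential ID, and each index `i` returned by `get_nns_by_vector` could be translated back with `to_external[i]`.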

As a semi-related but different issue: if I switch the metric to manhattan (which I should probably use in this case), one of the returned result indices is 0, which was not among the IDs I inserted.

switch

t = AnnoyIndex(NMB_VECTORS, metric='manhattan')

and

u = AnnoyIndex(NMB_VECTORS, metric='manhattan')

in the code above to see the issue
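
A quick way to spot this kind of stray result is to compare the returned indices against the set of IDs that were actually inserted. This is only a sketch: `unexpected_ids` is a hypothetical helper, and the values below are made up rather than taken from the gist.

```python
def unexpected_ids(returned_ids, inserted_ids):
    """Return the IDs in returned_ids that were never inserted, in order."""
    known = set(inserted_ids)
    return [i for i in returned_ids if i not in known]


# Illustrative values only: 0 was never inserted, so it is flagged.
inserted = [5, 12, 40, 77]
returned = [12, 0, 77, 5]
print(unexpected_ids(returned, inserted))
```

In the reproduction script, `inserted` would be the list of `idx` values passed to `add_item` and `returned` the output of `get_nns_by_vector`.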

Collaborator

erikbern commented Apr 8, 2018

weird, i can take a look

Collaborator

erikbern commented Apr 8, 2018

honestly the support for missing vectors is a bit flaky

Collaborator

erikbern commented Apr 9, 2018

I just tried running your code and it seems to work well (does not segfault)

did you use the latest version? i've fixed a number of issues related to index "holes"
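
For context, index "holes" presumably means IDs below the largest inserted ID that were never added; as far as I understand the annoy README, `add_item(i, v)` allocates memory for max(i)+1 items, so such gaps exist inside the index. A minimal sketch for listing them, with `find_holes` being a hypothetical helper and not an annoy function:

```python
def find_holes(item_ids):
    """Return the IDs in range(max(item_ids) + 1) that were never added."""
    present = set(item_ids)
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]


# Made-up IDs: 2, 3 and 5 are missing below the largest ID, 6.
print(find_holes([0, 1, 4, 6]))
```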

Collaborator

erikbern commented Apr 9, 2018

The latest master also doesn't return 0 as one of the neighbors although I don't think that's on pypi yet.

Can you please check if latest version of pypi or latest master still exhibits the issue you saw?

Author

gregsadetsky commented Apr 9, 2018

woah, that's absolutely awesome! just tested on the latest master and it does indeed cause neither the segfault nor the incorrect index value in the results. apologies for not testing it on the latest code..!

thank you x 1000000!

ok to close this I guess?

Author

gregsadetsky commented Apr 9, 2018

err, even the latest version on pypi does not have these issues. I should have definitely upgraded before filing this...!

thanks again. definitely closing. :-)


for the record, I was running annoy==1.11.4, which I just re-tested and does segfault/return 0.
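
For anyone wanting to check which version they have before filing a similar report, here is a small sketch. It assumes Python 3.8+ (for `importlib.metadata`) and that the package is installed under the distribution name `annoy`; `installed_version` and `version_tuple` are hypothetical helpers, and the version parsing is deliberately naive.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Naively parse the leading numeric parts of 'X.Y.Z' into a tuple of ints."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break  # stop at anything like 'post1' or 'rc2'
        parts.append(int(part))
    return tuple(parts)


found = installed_version("annoy")
if found is None:
    print("annoy is not installed")
elif version_tuple(found) <= version_tuple("1.11.4"):
    # 1.11.4 is the version reported in this thread to segfault.
    print("annoy", found, "is 1.11.4 or older; consider upgrading")
else:
    print("annoy", found, "is newer than 1.11.4")
```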

Collaborator

erikbern commented Apr 9, 2018

sweet, thanks for confirming. i might push the 0 fix to pypi and bump the version to 1.11.6
