OSError: Index size is not a multiple of vector size #423

Closed
dtMndas opened this issue Sep 25, 2019 · 28 comments · Fixed by #426

Comments

@dtMndas

dtMndas commented Sep 25, 2019

I am using Annoy to build an index over 5844240 words with a vector size of 200. It raises an error: "OSError: Index size is not a multiple of vector size". Please help me.

@erikbern
Collaborator

can you share code?

@dtMndas
Author

dtMndas commented Sep 26, 2019

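from annoy import AnnoyIndex
from gensim.models.keyedvectors import Word2VecKeyedVectors  # assumed import (gensim 3.x class name)

wv_model = Word2VecKeyedVectors.load_word2vec_format("w2v.bin", binary=True)
tc_index = AnnoyIndex(200, "angular")  # 200-dimensional vectors, angular metric
i = 1
for key in wv_model.vocab.keys():
    v = wv_model.get_vector(key)
    tc_index.add_item(i, v)
    i += 1
tc_index.build(100)  # build 100 trees
tc_index.save('annoy_build.index')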

This is my code. Please help me check it. Thanks.

@erikbern
Collaborator

this doesn't seem right – that error should only be thrown at the point of loading an index. is there more code to it?

@dtMndas dtMndas closed this as completed Sep 27, 2019
@dtMndas dtMndas reopened this Sep 27, 2019
@dtMndas
Author

dtMndas commented Sep 27, 2019

There is no other code. The detailed error is:
Warning: index size 18446744073709551615
Error: index size 18446744073709551615 is not a multiple of vector size 812
Traceback (most recent call last):
  File "annoy_handler.py", line 29, in
    tc_index.save('../resources/tencent/tencent_build.index')
OSError: Index size is not a multiple of vector size

@dtMndas
Author

dtMndas commented Sep 27, 2019

I wonder whether the vocab size is too large. Running with a vocab size of 100000 works fine.

@erikbern
Collaborator

That's super odd, but 18446744073709551615 = 0xffffffffffffffff = 2^64-1 which provides a clue.

looks like lseek might be returning -1: http://man7.org/linux/man-pages/man2/lseek.2.html

i guess we could catch this and inspect errno to find out what's really going on
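
The clue above can be checked with a little arithmetic: a return value of -1, reinterpreted as an unsigned 64-bit integer, is exactly the index size in the warning. The sketch below shows that, plus one way a failed seek could be reported together with its errno. The helper name `file_size_or_error` is hypothetical, and this is Python for illustration only; it is not how Annoy's C++ code handles the failure.

```python
import errno
import os

# -1 reinterpreted as an unsigned 64-bit integer.
as_unsigned = -1 & 0xFFFFFFFFFFFFFFFF
print(as_unsigned)               # 18446744073709551615
print(as_unsigned == 2**64 - 1)  # True


def file_size_or_error(fd):
    """Return the file size via lseek, or raise an OSError that names the errno.

    Hypothetical helper: Python's os.lseek raises instead of returning -1,
    so the failure is surfaced here rather than being used as a size.
    """
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        raise OSError(exc.errno, f"lseek failed ({name}): {exc.strerror}") from exc
```

Checking the return value before treating it as a size is what keeps a failure from showing up later as a confusing "not a multiple of vector size" message.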

@dtMndas
Author

dtMndas commented Sep 28, 2019

So the Annoy index has reached the maximum index size (2^64-1), which is not a multiple of 812, and that is why it throws the error.

@erikbern
Collaborator

there's no way your index is that big though

@dtMndas
Author

dtMndas commented Sep 28, 2019

So Annoy is not suitable for big word2vec models with a large vocabulary, such as more than 1000000 words.

@erikbern
Collaborator

@huanggengkeng it totally is. you're having some other issues. i'm making a fix so that we throw a proper error when lseek fails. that will hopefully reveal the true issue

@erikbern
Collaborator

are you on some weird platform like windows?

erikbern pushed a commit that referenced this issue Sep 29, 2019
Handle lseek failures – follow up from #423
@iraykhel

iraykhel commented Oct 3, 2019

I am having this issue. Index has about 2 million vectors of size 300. I am on "a weird platform like Windows" :D

@erikbern
Collaborator

erikbern commented Oct 3, 2019

can you install the latest version from github and try? I improved the error messages recently for this bug

@ambroserb3

ambroserb3 commented Nov 4, 2019

I'm also experiencing this error with the latest version.
I am running it on a linux system.
Error: index size 104785884 is not a multiple of vector size 412
OSError: Index size is not a multiple of vector size
It happens when loading the embedding.
What does this error really mean and how do I resolve it?

@erikbern
Collaborator

erikbern commented Nov 4, 2019

can you share code? typically happens when you open with the wrong metric

@illagrenan

I had the same error. I passed different (larger) vector dimensions when loading the index.

@abhaymise

> can you share code? typically happens when you open with the wrong metric

Yeah, this was the case when I was getting the error. We have to pass the same metric when indexing and when loading. Using the same metric solved the issue.

@ambroserb3

Thanks,

Switching to Euclidean fixed it.
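
Since several reports here come down to loading with a different dimension or metric than the index was built with, one possible safeguard is to record both next to the index file and read them back before loading. This is a minimal sketch, not part of Annoy: the sidecar file name and the helpers `write_meta`, `read_meta`, and `load_index` are hypothetical. Only `AnnoyIndex(f, metric)` and `load()` are Annoy's own calls.

```python
import json
from pathlib import Path


def meta_path(index_path):
    """Hypothetical sidecar file stored next to the index."""
    return Path(str(index_path) + ".meta.json")


def write_meta(index_path, f, metric):
    """Record the dimension and metric used at build time."""
    meta_path(index_path).write_text(json.dumps({"f": f, "metric": metric}))


def read_meta(index_path):
    """Return (f, metric) as recorded by write_meta."""
    meta = json.loads(meta_path(index_path).read_text())
    return meta["f"], meta["metric"]


def load_index(index_path):
    """Open the index with exactly the parameters it was built with."""
    from annoy import AnnoyIndex  # imported lazily so the helpers above work without annoy

    f, metric = read_meta(index_path)
    index = AnnoyIndex(f, metric)
    index.load(str(index_path))
    return index
```

Calling `write_meta('annoy_build.index', 200, 'angular')` right after `save()` would let `load_index('annoy_build.index')` reopen the file without anyone having to remember the metric.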

@IITPatnaProjectDeepLearning

I have the same issue.
The code is as follows:
from annoy import AnnoyIndex
# args comes from the poster's argument parsing, which is not shown
annoyIndex = AnnoyIndex(4096, metric='euclidean')
print(args.annoy_file_path)
annoyIndex.load(args.annoy_file_path)
And the error is:
Error: index size 1619001344 is not a multiple of vector size 16400
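
The "vector size" in these messages is a per-item size in bytes, not the dimension. The numbers reported in this thread (812 for 200 dimensions with angular, 16400 for 4096 dimensions with euclidean) are consistent with 4 bytes per float plus a small per-item header of 12 bytes for angular and 16 bytes for euclidean. Those header sizes are inferred from the messages above, not taken from Annoy's documentation, so treat the sketch below as an assumption. The function names are hypothetical.

```python
FLOAT_BYTES = 4

# Assumption: per-item header sizes inferred from the error messages in this
# thread (812 = 12 + 4 * 200, 16400 = 16 + 4 * 4096), not from Annoy's docs.
HEADER_BYTES = {"angular": 12, "euclidean": 16}


def expected_node_size(f, metric):
    """Per-item size in bytes that a loader would expect for (f, metric)."""
    return HEADER_BYTES[metric] + FLOAT_BYTES * f


def size_is_consistent(file_size, f, metric):
    """True if the file size is a whole number of items for (f, metric)."""
    return file_size % expected_node_size(f, metric) == 0


print(expected_node_size(200, "angular"))     # 812
print(expected_node_size(4096, "euclidean"))  # 16400
print(size_is_consistent(1619001344, 4096, "euclidean"))  # False
```

If the check fails for the parameters you are loading with, the file was probably built with a different dimension or metric, or it is incomplete.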

@paulelvers

I was having the issue "Index size is not a multiple of vector size: Unknown error: 316 (316)"

Downgrading annoy from 1.16.3 to 1.15.1 solved the issue for me.

@erikbern
Collaborator

That's super odd – nothing major should have changed between 1.15 and 1.16. What platform are you on?

@paulelvers

Ok. Figured out it was an incomplete download that caused the error. The error disappeared when downgrading, but the index was empty. With the complete download, it works fine in both versions.
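
An incomplete download like this can be caught before loading by comparing the file against a known size and, optionally, a checksum. The sketch below assumes the expected size and SHA-256 digest are published alongside the index. Both values and the helper names are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large indexes do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path, expected_size, expected_sha256=None):
    """Check the size first (cheap), then the checksum if one is provided."""
    if os.path.getsize(path) != expected_size:
        return False
    return expected_sha256 is None or sha256_of(path) == expected_sha256
```

Running `is_complete(...)` before `load()` turns a truncated file into an explicit failure instead of a size-mismatch error or an empty index.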

@erikbern
Collaborator

great!

@thvasilo
Contributor

thvasilo commented Apr 30, 2020

Just ran into this as well, perhaps a more informative error message is warranted, since this commonly happens when you open with the wrong metric?

@erikbern
Collaborator

erikbern commented May 1, 2020

@thvasilo – sure, mind sending a pull request with a more descriptive message?

@thvasilo
Contributor

thvasilo commented May 1, 2020 via email

@erikbern
Collaborator

erikbern commented May 1, 2020

should be in annoylib.h, i think

@erikbern
Collaborator

erikbern commented May 1, 2020

thvasilo added a commit to thvasilo/annoy that referenced this issue May 3, 2020
Users might commonly save with a non-default metric, then try to open using the default metric. This gives a more informative message.

@erikbern let me know if you like the language and if line length is a concern.
erikbern pushed a commit that referenced this issue May 3, 2020
A more informative error for #423