
First lookup for an ID is slow #376

Closed · scottbreyfogle opened this issue Mar 29, 2019 · 24 comments

@scottbreyfogle commented Mar 29, 2019

I'm seeing a problem when using the library where calling get_nns_by_vector is very slow on first execution for a given value (15s for an index with ~2m vectors of length 200), and then much faster in subsequent calls (subsecond). I've also seen similar behavior in get_item_vector, but I'm not sure if it's related. I've set prefault to true when loading the index from disk.

The strangest part is that it only occurs on some computers, and I'm not able to repro on my local machine to get details on what is going on. If I load the same index on my computer, the execution time is always subsecond. I'm continuing to look into this to see if I can find a reliable method of reproducing.

Has anyone seen performance patterns like this? Do you have thoughts on what the problem may be? I'm not sure that it's Annoy specifically, but would be good to know if it is.

My current thought is that it has to do with the mmapping: the index is not fully resident in RAM, and disk lookups are slow on some machines. I'm not very familiar with mmap and would appreciate outside thoughts on whether that's a reasonable explanation and/or how to verify whether it's the problem.

@erikbern (Collaborator)

yeah seems possible the page cache just isn't warm and that prefault for whatever reason isn't working on your machine

if you want to warm up the page cache – consider doing a sequential scan or a few thousand random lookups first
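
A minimal warm-up sketch along those lines, using the Python bindings; the file name, dimensionality, and metric below are illustrative assumptions:

```python
import random
from annoy import AnnoyIndex

f = 200  # dimensionality (assumed from the description above)
index = AnnoyIndex(f, 'angular')  # metric is an assumption
index.load('index.ann')  # example path

# Touch the index with a few thousand random lookups so the relevant
# pages get pulled into the OS page cache before real queries arrive.
n_items = index.get_n_items()
for _ in range(5000):
    index.get_nns_by_item(random.randrange(n_items), 10)
```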

@scottbreyfogle (Author) commented Mar 29, 2019

Looking at the code, it seems like prefault only triggers if MAP_POPULATE is defined. It does not seem to be set anywhere in the library. Is that something defined on the system in question? Part of compilation? Something I should have set in my code calling the library?

In short, is there a way for me to check on a particular computer/installation if that flag is enabled?

Edit: Thank you so much for the quick response, by the way! I really appreciate the library and documentation.
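
One indirect, illustrative check (not part of the Annoy API): on Linux with Python 3.10 or newer, the standard mmap module exposes the flag, which tells you whether the platform defines it at all. Whether the compiled Annoy extension saw MAP_POPULATE at build time is a separate question.

```python
import mmap

# MAP_POPULATE is Linux-only; Python's mmap module exposes it from 3.10 on.
# This checks whether the running platform defines the flag, not whether
# the Annoy extension was compiled with it.
print('MAP_POPULATE available:', hasattr(mmap, 'MAP_POPULATE'))
```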

@erikbern (Collaborator)

There's some doc here: http://man7.org/linux/man-pages/man2/mmap.2.html

We should probably throw a warning or similar if MAP_POPULATE isn't defined and prefault = true here: https://github.com/spotify/annoy/blob/master/src/annoylib.h#L925

@scottbreyfogle (Author)

I ended up moving to a different ANN library because of this. My best guess is that something about the cloud/Kubernetes configuration where the code was running caused MAP_POPULATE to be undefined, but I did not confirm it.

@erikbern (Collaborator) commented Apr 3, 2019

@scottbreyfogle why is this a problem though? you can just do some random lookups to warm up the page cache (or just loop through all vectors)

anyway closing for now

erikbern closed this as completed Apr 3, 2019
@scottbreyfogle (Author)

That is a possibility that I considered. It was really a cost-benefit. A 15 second latency hit was really not acceptable for us, and I decided that it would be more reliable to switch to something I understood more.

If I can't understand fully what is going wrong (only guess), then I'm not comfortable making guarantees about the comprehensiveness of a solution. Are there multiple data structures that need to be warmed? If I call a get_item_vector on all elements, will that fix the problem for get_nns_by_vector? Or do I need to loop on both, which would presumably take quite a long time? What about get_nns_by_vector on un-indexed vectors?

It was just simpler to move to a solution where I didn't have to test all that, especially when validation needed to be done mostly on a remote machine with a very large dataset, since the issue is hard to detect locally or with a small dataset.

Closing seems reasonable though.

@erikbern (Collaborator) commented Apr 3, 2019

seems fair. generally i don't think the linux kernel guarantees anything about mmap and swapping it out from primary memory, but i could be wrong. i'll look into the mmap flags again at some point

@shoegazerstella commented Apr 4, 2019

Hello @erikbern,
We are experiencing the same 'issue'.
At the moment we have about 60k items stored in an Annoy index. We noticed that the first query (queryByVector) takes quite a long time (44 seconds), while the second one already takes 0.3-0.4 seconds.
This runs inside an API, so we are thinking about running some random queries after loading the index. How many should we run? You previously suggested a few thousand: is this number somehow related to the size of the index?

@erikbern (Collaborator) commented Apr 4, 2019

yeah, basically you need to scan through the index and make sure every page is hit (i think the linux page size is 4kb?)

so scan through and hit maybe every 100 vectors, and that should be fine (it probably won't be much slower to hit every vector actually)
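
A sketch of that strided scan, with an assumed file name and dimensionality; the stride arithmetic is only an estimate (4 KB pages, 200-dim float vectors at roughly 800 bytes each, ignoring per-node headers), and a stride of 1 is always the safe fallback:

```python
from annoy import AnnoyIndex

f = 200  # assumed dimensionality
index = AnnoyIndex(f, 'angular')
index.load('index.ann')

# With roughly 800 bytes per item and 4 KB pages, every 5th item lands on
# a new page; touching every item is only marginally slower and always safe.
stride = 5
for i in range(0, index.get_n_items(), stride):
    index.get_item_vector(i)
```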

@loretoparisi

@erikbern what about the distribution of the hits across the index? Should we randomly select which items to hit (roughly one in every 100 vectors) so that the distribution of hits over the data is likely to be uniform?

@sonots commented Jun 14, 2019

A comment as an SRE: relying on the disk cache is unstable in terms of performance.
SOLUTION: Use tmpfs to hold the Annoy index files.
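
A sketch of that approach, assuming a Linux host where /dev/shm is a tmpfs mount; the file name and dimensionality are examples:

```python
import shutil
from annoy import AnnoyIndex

# Copy the index onto tmpfs so every read is served from RAM, not disk.
shutil.copy('index.ann', '/dev/shm/index.ann')

f = 200  # assumed dimensionality
index = AnnoyIndex(f, 'angular')
index.load('/dev/shm/index.ann')
```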

@chikubee commented Jan 27, 2020

> That is a possibility that I considered. It was really a cost-benefit. A 15 second latency hit was really not acceptable for us, and I decided that it would be more reliable to switch to something I understood more.
>
> If I can't understand fully what is going wrong (only guess), then I'm not comfortable making guarantees about the comprehensiveness of a solution. Are there multiple data structures that need to be warmed? If I call a get_item_vector on all elements, will that fix the problem for get_nns_by_vector? Or do I need to loop on both, which would presumably take quite a long time? What about get_nns_by_vector on un-indexed vectors?
>
> It was just simpler to move to a solution where I didn't have to test all that, especially when validation needed to be done mostly on a remote machine with a very large dataset, since the issue is hard to detect locally or with a small dataset.
>
> Closing seems reasonable though.

@scottbreyfogle Which solution did you move to in order to address this problem? I am facing similar issues.

@erikbern (Collaborator)

you can load it with prefault = True – i believe this should speed it up significantly
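
For example (the file name and dimensionality are illustrative; prefault only has an effect where the platform defines MAP_POPULATE, e.g. Linux):

```python
from annoy import AnnoyIndex

f = 200  # assumed dimensionality
index = AnnoyIndex(f, 'angular')
# prefault=True asks mmap to fault the whole file into the page cache up
# front (MAP_POPULATE); where the flag is unavailable, Annoy falls back
# to a plain mmap.
index.load('index.ann', prefault=True)
```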

@chikubee

@erikbern I get the following error (I am working on a Mac): prefault is set to true, but MAP_POPULATE is not defined on this platform.

@erikbern (Collaborator)

yeah i believe prefault doesn't work on os x unfortunately

@erikbern (Collaborator)

as a workaround, you could iterate over all indices and run get_vector just to warm up the page cache. or another way is you can just run cat index.ann > /dev/null on the command line
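
A rough Python equivalent of the cat trick (the file name is an example): reading the file sequentially lets the kernel pull its pages into the page cache.

```python
# Stream the index file once so the kernel caches its pages in memory.
with open('index.ann', 'rb') as fh:
    while fh.read(1 << 20):  # read in 1 MiB chunks
        pass
```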

@loretoparisi

@erikbern what happens when you do cat index.ann > /dev/null? thanks

@erikbern (Collaborator)

@loretoparisi typically the kernel will cache that file in memory, meaning subsequent random access to it will be very fast

@erikbern (Collaborator)

https://serverfault.com/a/43391 confirms this :)

@eddie-scio

I'm running into an interesting issue: when I query locally (1.5M items, 128-d, num_trees = 100, k = 1000, search_k = 500000), I always get sub-100 ms queries. This is on OS X, where prefault=True does nothing. When I deploy to Google App Engine, I'm getting queries on the order of 15 s. I've tried both setting prefault=True there and looping through the whole index calling .get_item_vector(i), and neither resolves the slow first-query problem. I suspect that the ephemeral disk used with GAE flex might be interfering with the page cache. Any thoughts?

@eddie-scio

I've confirmed with vmtouch that my local environment is correctly mmapping the index, whereas on GAE it is not (even with both prefault=True and the full item scan I mentioned above).

eddiezhou:~/workspace/vmtouch[11:29:52] (master) $ vmtouch /tmp/index.ann
           Files: 1
     Directories: 0
  Resident Pages: 734722/734722  2G/2G  100%
         Elapsed: 0.12743 seconds

root@cfa460207703:/home/vmagent/app/vmtouch# vmtouch /tmp/index.ann
           Files: 1
     Directories: 0
  Resident Pages: 216861/734722  847M/2G  29.5%
         Elapsed: 0.030743 seconds

@erikbern (Collaborator)

i don't think prefault is supported on all platforms

@eddie-scio

I believe prefault is working on GAE (there's no warning like the one I get when I run locally). For anyone trying to get this working on GAE: allocating more RAM to my instance solved the problem; I think the index was being evicted from the filesystem cache under memory pressure. Using tmpfs as suggested above is the way to go.

@akshaykarangale

> as a workaround, you could iterate over all indices and run get_vector just to warm up the page cache. or another way is you can just run cat index.ann > /dev/null on the command line

This worked like magic. Thanks!
