New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Which libraries support incremental item additions? #36

Open
alexklibisz opened this Issue Oct 16, 2017 · 9 comments

Comments

Projects
None yet
4 participants
@alexklibisz
Copy link

alexklibisz commented Oct 16, 2017

Thanks for putting this project together. Is there any way a table or small list could be included to describe which libraries support incremental item additions? I imagine for many this is an important criteria in real-time or near-real-time applications where new content is continuously added and needs to be indexed for look-ups.

@erikbern

This comment has been minimized.

Copy link
Owner

erikbern commented Oct 16, 2017

i don't think any of them do, unfortunately :(

@aaalgo

This comment has been minimized.

Copy link
Contributor

aaalgo commented Oct 16, 2017

Last time I checked (many years ago), FLANN's C++ API supported insertion. In theory it's also easy for all LSH-based methods to support insertion. The Python API might not have such functionality exposed though.

@alexklibisz

This comment has been minimized.

Copy link
Author

alexklibisz commented Oct 16, 2017

@aaalgo Yes from the FLANN website:

(14 December 2012) Version 1.8.0 is out bringing incremental addition/removal of points to/from indexes

Though I'm still trying to figure out how to leverage this via Python.

@alexklibisz

This comment has been minimized.

Copy link
Author

alexklibisz commented Oct 16, 2017

@erikbern Do you have any recommendations for "online" NN-search from your experience? One approach I'm considering is to maintain an indexed lookup (e.g. Annoy) for the majority of my items as well as a brute-force lookup (e.g. sklearn) for the newest items. Then at some threshold, rebuild the index to include the newest points.

@erikbern

This comment has been minimized.

Copy link
Owner

erikbern commented Oct 17, 2017

I don't have any great recommendations, but your idea makes sense though. You could even formalize it and have indexes that have size even powers of two, and any time you have two indexes of the same size, you merge them. That would give you log(n) amortized runtime, although worst case is linear.

@yurymalkov

This comment has been minimized.

Copy link
Contributor

yurymalkov commented Oct 17, 2017

@alexklibisz Both hnsw and sw-graph are built incrementally, and thus naturally support incremental item addition. Hovewer, nmslib does not have a python interface for that, only c++.

@erikbern @alexklibisz I think you are talking about Bentley-Saxe method. http://www.sciencedirect.com/science/article/pii/S030643791300104X

@erikbern

This comment has been minimized.

Copy link
Owner

erikbern commented Oct 17, 2017

@yurymalkov yeah i've never heard of it, but think that's what i meant. basically play 2048 with the indexes :)

@alexklibisz

This comment has been minimized.

Copy link
Author

alexklibisz commented May 18, 2018

For anyone interested, I built a project that supports the feature from my original question on top of Elasticsearch: https://github.com/alexklibisz/elastik-nearest-neighbors

@yurymalkov

This comment has been minimized.

Copy link
Contributor

yurymalkov commented May 18, 2018

@alexklibisz Cool
@erikbern There are now several algorithms that support updates: flann, hnswlib, faiss (batched updates). Probably, this should be noted in readme somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment