Removing a particular item from AnnoyIndex #191

Closed
AakashKumarNain opened this issue Mar 14, 2017 · 11 comments

Comments

@AakashKumarNain

Suppose I have an AnnoyIndex, say ann_index(100), and I want to remove a particular item from
ann_index, say the 10th item. Is there any way to do this?

@erikbern
Collaborator

sorry there's no way atm... we could probably support this by having a special "tombstone" vector (like all zeros or something) so lmk if you want to add support for this and i can give you the rough outline

@AakashKumarNain
Author

Sure.

@mazorigal

Hi,
is there any update on this issue?

Thanks,

@erikbern
Collaborator

erikbern commented Dec 7, 2017

No

I also don't see a huge use case for this. If it's before building the index, why don't you just prevent adding it in the first place. If it's after building the index, the dataset is immutable anyway, so it won't work.

@erikbern erikbern closed this as completed Dec 7, 2017
@mazorigal

mazorigal commented Dec 7, 2017

according to the docs:

from annoy import AnnoyIndex
import random

f = 40
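t = AnnoyIndex(f, 'angular')  # f is the length of each item vector; newer Annoy releases expect the metric to be passed explicitly
for i in range(1000):  # range replaces Python 2's xrange
    v = [random.gauss(0, 1) for z in range(f)]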
    t.add_item(i, v)

t.build(10) # 10 trees
t.save('test.ann')

I tried this and found that when I have new items to include in the index, or existing items to override, I can just call
t.add_item(i, v) for the new or overridden item, without adding all the other items again,
and then call t.build(10) to build the new trees.
The tree build step is quite fast, so it could fit streaming use cases, in which I would like to update and rebuild the index in real time as new items arrive.
Sometimes items are deleted and I don't want to recommend them, and that is where I see a big advantage in being able to remove items from the index.
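
For the deleted-items case described above, one possible workaround is to leave the index alone and hide deleted ids at query time. The sketch below is only an illustration, not something Annoy provides: `FilteredIndex`, `remove`, and `query` are hypothetical names, and the wrapped object is assumed to expose `get_nns_by_vector(vector, n)` and `get_n_items()` the way `AnnoyIndex` does. This is also not the in-index "tombstone" vector idea mentioned earlier in the thread; it is a plain filter kept outside the index.

```python
from typing import List, Sequence, Set


class FilteredIndex:
    """Hypothetical wrapper that hides deleted ids at query time.

    `index` is assumed to offer get_nns_by_vector(vector, n) and
    get_n_items(), as AnnoyIndex does. The index itself is never modified.
    """

    def __init__(self, index) -> None:
        self.index = index
        self.deleted: Set[int] = set()

    def remove(self, item_id: int) -> None:
        # Nothing is removed from the index; we only remember the id.
        self.deleted.add(item_id)

    def query(self, vector: Sequence[float], n: int) -> List[int]:
        # Over-fetch by the number of deleted ids so that filtering can
        # still leave up to n live results.
        fetch = min(self.index.get_n_items(), n + len(self.deleted))
        hits = self.index.get_nns_by_vector(vector, fetch)
        return [i for i in hits if i not in self.deleted][:n]
```

With Annoy this might be used as `FilteredIndex(t).query(v, 10)` after `t.build(10)`. The deleted vectors still occupy space in the index, and over-fetching gets more expensive as the deleted set grows, so this only postpones a full rebuild.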

@erikbern
Collaborator

erikbern commented Dec 7, 2017

It's possible you can run t.build multiple times, but that's not a "supported" feature and I would discourage you from relying on it. Every time you call t.build, it will allocate a lot of new memory for the tree structure.

@mazorigal

Could you please explain a bit more what "allocate a lot of new memory for the tree structure" means? Is the memory allocated for a completely new tree that is rebuilt each time t.build() is executed? Is the memory of the "old" tree released once t.build() is triggered?
Can I conclude from your answer that Annoy is not suitable for streaming-style updates, but rather for batch training, let's say once every hour?

@erikbern
Collaborator

erikbern commented Dec 8, 2017

yes, annoy is not suitable for streaming updates

any time t.build is invoked, it will allocate a lot of new memory
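
Given that answer, the batch approach raised in the question could look roughly like the sketch below: keep the vectors in an ordinary dict, apply inserts and deletes there, and periodically build a brand-new index from whatever is left. Everything here is an assumption for illustration: `RebuildingStore`, `upsert`, `delete`, `rebuild`, and the `index_factory` parameter are hypothetical names, and the index object only needs `add_item`, `build`, and `get_nns_by_vector`, which `AnnoyIndex` provides.

```python
from typing import Callable, Dict, Hashable, List, Sequence


class RebuildingStore:
    """Hypothetical helper: vectors live outside the index, which is rebuilt in batches."""

    def __init__(self, index_factory: Callable[[], object], n_trees: int = 10) -> None:
        # index_factory returns a fresh, empty index on each call,
        # e.g. lambda: AnnoyIndex(40, 'angular') (an assumed usage).
        self.index_factory = index_factory
        self.n_trees = n_trees
        self.vectors: Dict[Hashable, Sequence[float]] = {}
        self.index = None
        self.keys: List[Hashable] = []  # dense index id -> external key

    def upsert(self, key: Hashable, vector: Sequence[float]) -> None:
        self.vectors[key] = vector  # takes effect at the next rebuild

    def delete(self, key: Hashable) -> None:
        self.vectors.pop(key, None)  # takes effect at the next rebuild

    def rebuild(self) -> None:
        index = self.index_factory()
        keys = list(self.vectors)
        # Re-number items densely so deleted entries leave no gaps.
        for dense_id, key in enumerate(keys):
            index.add_item(dense_id, self.vectors[key])
        index.build(self.n_trees)
        # Swap only after the new index is fully built.
        self.index, self.keys = index, keys

    def query(self, vector: Sequence[float], n: int) -> list:
        if self.index is None:
            return []
        return [self.keys[i] for i in self.index.get_nns_by_vector(vector, n)]
```

Calling `rebuild()` on a timer (for example hourly) keeps each index immutable once built, at the cost of queries seeing stale data between rebuilds.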

@mazorigal

OK, thanks.
Is there any option at all for implementing streaming ANN? Any ideas?

@erikbern
Collaborator

sorry no – if I had infinite time available, I would implement it, but alas I don't :)

@piskvorky
Contributor

@mazorigal have a look at #96 for a previous discussion.
