Working with subsets #263

@FlorianWilhelm

First of all, thanks for providing such a useful piece of software; I find it especially helpful when working with embeddings. I was wondering whether I can somehow define, at query time, a subset of items that should be considered when calculating the kNN.

Let's assume I want to build some kind of search application that, besides some user-provided filters, also considers the preferences I have collected about the user in the form of a user embedding. I could, for instance, use Elasticsearch to retrieve a list of feasible item ids that fulfil the user's filter criteria. Now I want to find the kNN of the user's embedding in my index of all documents, but restricted to the subset of feasible items I retrieved before.
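
One workaround that needs no changes to Annoy, sketched here under the assumption that the feasible subset returned by the filter is small: skip the approximate index for this query and rank only the feasible items by exact distance. `exact_knn_in_subset` is a hypothetical helper, the Euclidean distance is an assumption (an index built with the `angular` metric would need a matching distance), and with Annoy the `get_vector` argument could be the index's `get_item_vector` method.

```python
import heapq
import math
from typing import Callable, Iterable, List, Sequence


def exact_knn_in_subset(get_vector: Callable[[int], Sequence[float]],
                        query: Sequence[float],
                        feasible_ids: Iterable[int],
                        k: int) -> List[int]:
    """Return the k feasible ids closest to `query` (Euclidean distance).

    Brute force over the subset only: one distance computation per
    feasible id, so this suits small filtered result sets.
    """
    # nsmallest keeps a bounded heap instead of sorting the whole subset;
    # results come back ordered from nearest to farthest.
    return heapq.nsmallest(k, feasible_ids,
                           key=lambda i: math.dist(query, get_vector(i)))


# Hypothetical data: item i has the embedding (i, 0).
vectors = {i: (float(i), 0.0) for i in range(10)}
user_embedding = (4.2, 0.0)
feasible = [1, 3, 5, 8]  # e.g. ids returned by an Elasticsearch filter query
print(exact_knn_in_subset(vectors.__getitem__, user_embedding, feasible, 2))
```

The cost grows linearly with the size of the subset, so this stays cheap only while the filter is selective.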

Another way to solve this would be to allow adding metadata when adding an item to the Annoy index. Given an additional filter clause, Annoy could then consider only the item vectors with the specified metadata when calculating the kNN.
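
For the metadata idea, a rough approximation without modifying Annoy is post-filtering: keep the metadata in a separate mapping, over-fetch neighbours from the index, and discard those that fail the filter clause. This is a sketch under assumptions: `filtered_nns`, `BruteForceIndex`, and the `category` mapping are hypothetical; only the method name `get_nns_by_vector(vector, n)` mirrors Annoy's Python API, and the exact `BruteForceIndex` stands in for a real index so the example runs on its own.

```python
import math
from typing import Callable, Dict, List, Sequence


def filtered_nns(index, query: Sequence[float], k: int, n_items: int,
                 keep: Callable[[int], bool], start: int = 100) -> List[int]:
    """Over-fetch neighbours from `index`, then keep only ids passing `keep`.

    `index` is assumed to expose `get_nns_by_vector(vector, n)`. The request
    size doubles until k ids survive the filter or all n_items were requested.
    """
    n = min(max(start, k), n_items)
    while True:
        # Drop candidates whose (external) metadata fails the filter clause.
        hits = [i for i in index.get_nns_by_vector(query, n) if keep(i)][:k]
        if len(hits) >= k or n >= n_items:
            return hits
        n = min(2 * n, n_items)


class BruteForceIndex:
    """Exact stand-in with the same method name, used only for this demo."""

    def __init__(self, vectors: List[Sequence[float]]):
        self.vectors = vectors

    def get_nns_by_vector(self, vector: Sequence[float], n: int) -> List[int]:
        order = sorted(range(len(self.vectors)),
                       key=lambda i: math.dist(vector, self.vectors[i]))
        return order[:n]


# Hypothetical metadata kept outside the index, keyed by item id.
category: Dict[int, str] = {i: "news" if i % 2 else "sports" for i in range(10)}
index = BruteForceIndex([(float(i), 0.0) for i in range(10)])
print(filtered_nns(index, (0.0, 0.0), k=2, n_items=10,
                   keep=lambda i: category[i] == "news", start=2))
```

The doubling loop trades extra index queries for recall: a very selective filter discards most candidates, so several rounds may be needed. With a real Annoy index, requesting `n_items` neighbours still depends on `search_k`, so the result remains approximate.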

How does Spotify solve this problem, anyway? Do you have an extended version of Annoy?
