Investigate our neighbors algorithm choice #4131

@ilan-gold

Right now, we rely on the underlying implementation in umap for our neighbors calculation because of (potentially outdated) maintainability/usability concerns about almost every other library. AFAICT, we currently use PyNNDescent:

if index := getattr(transformer, "index_", None):
    from pynndescent import NNDescent

    if isinstance(index, NNDescent):
        # very cautious here
        # TODO catch the correct exception
        with contextlib.suppress(Exception):
            self._rp_forest = _make_forest_dict(index)

in the absence of a transformer, but I think this should be reconsidered.

https://github.com/nmslib/hnswlib seems to be maintained, if not amazingly documented
https://github.com/meilisearch/arroy lacks Python bindings but looks to be maintained
https://github.com/facebookresearch/faiss seems to be maintained but, like the first option, seems to lack documentation
https://github.com/zilliztech/pyglass is a top performer on https://ann-benchmarks.com/index.html#algorithms

Overall, we need to make a reasonable judgement about the speed-accuracy-maintainability tradeoff and try to figure out if we should make this change for our users.
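
To make the speed-accuracy side of that tradeoff concrete, here is a minimal sketch of how a candidate backend could be scored against a brute-force baseline. Everything in it is an assumption for illustration: `exact_knn`, `recall_at_k`, `subsampled_knn`, and `compare` are hypothetical names, and `subsampled_knn` is only a crude stand-in for an approximate search, not PyNNDescent or any of the libraries listed above. A real comparison would swap it for the library under test and use representative data.

```python
import math
import random
import time


def exact_knn(points, k):
    """Brute-force k nearest neighbors (Euclidean), excluding the point itself."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def subsampled_knn(points, k, n_candidates, rng):
    """Stand-in approximate search: rank only a random subset of candidates.

    Hypothetical placeholder for a real approximate backend.
    """
    n = len(points)
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(n) if j != i]
        candidates = rng.sample(others, min(n_candidates, len(others)))
        dists = sorted((math.dist(p, points[j]), j) for j in candidates)
        result.append([j for _, j in dists[:k]])
    return result


def recall_at_k(approx, exact):
    """Fraction of true neighbors that the approximate result recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total


def compare(points, k, n_candidates, seed=0):
    """Time both searches and report recall of the approximate one."""
    start = time.perf_counter()
    exact = exact_knn(points, k)
    exact_seconds = time.perf_counter() - start

    start = time.perf_counter()
    approx = subsampled_knn(points, k, n_candidates, random.Random(seed))
    approx_seconds = time.perf_counter() - start

    return {
        "recall": recall_at_k(approx, exact),
        "exact_seconds": exact_seconds,
        "approx_seconds": approx_seconds,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    print(compare(data, k=15, n_candidates=50))
```

Recall against exact neighbors is the same accuracy measure ann-benchmarks plots against throughput, so numbers gathered this way on our own datasets would be directly comparable in spirit. Maintainability still has to be judged separately.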

See https://ann-benchmarks.com/index.html as well
