Mention brute-force approach, link to vector indexing issue

Refs #216. Closes #214
simonw · Sep 4, 2023 · f842fbe
1 parent 94f0a1a

Showing 2 changed files with 5 additions and 1 deletion.
diff --git a/docs/embeddings/cli.md b/docs/embeddings/cli.md
@@ -285,6 +285,8 @@ llm-docs/plugins/index.md
 
 The `llm similar` command searches a collection of embeddings for the items that are most similar to a given string or item ID.
 
+This currently uses a slow brute-force approach which does not scale well to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.
+
 To search the `quotations` collection for items that are semantically similar to `'computer science'`:
 
 ```bash

diff --git a/docs/embeddings/python-api.md b/docs/embeddings/python-api.md
@@ -116,7 +116,9 @@ if Collection.exists(db, "entries"):
 (embeddings-python-similar)=
 ## Retrieving similar items
 
-Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method:
+Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method.
+
+This method uses a brute force approach, calculating distance scores against every document. This is fine for small collections, but will not scale to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.
 
 ```python
 for entry in collection.similar("hound"):