Mention brute-force approach, link to vector indexing issue
Refs #216. Closes #214
simonw committed Sep 4, 2023
1 parent 94f0a1a commit f842fbe
Showing 2 changed files with 5 additions and 1 deletion.
2 changes: 2 additions & 0 deletions docs/embeddings/cli.md
@@ -285,6 +285,8 @@ llm-docs/plugins/index.md

The `llm similar` command searches a collection of embeddings for the items that are most similar to a given string or item ID.

+This currently uses a slow brute-force approach which does not scale well to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.
+
To search the `quotations` collection for items that are semantically similar to `'computer science'`:

```bash
4 changes: 3 additions & 1 deletion docs/embeddings/python-api.md
@@ -116,7 +116,9 @@ if Collection.exists(db, "entries"):
(embeddings-python-similar)=
## Retrieving similar items

-Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method:
+Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method.
+
+This method uses a brute force approach, calculating distance scores against every document. This is fine for small collections, but will not scale to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.

```python
for entry in collection.similar("hound"):

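For readers unfamiliar with the term, the brute-force approach mentioned in the added documentation can be illustrated with a minimal sketch. This is not the library's actual implementation: the function names `cosine_similarity` and `brute_force_similar`, the use of cosine similarity as the score, and the sample vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_similar(query, stored, number=10):
    """Score the query against every stored vector, then keep the best.

    `stored` maps a (hypothetical) item ID to its embedding vector.
    Every vector is visited on every call, so the cost grows linearly
    with the size of the collection.
    """
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in stored.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:number]


# Hypothetical three-dimensional embeddings, for illustration only.
stored = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
}
results = brute_force_similar([1.0, 0.1, 0.0], stored, number=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

Because each query touches every stored vector, this is workable for small collections but becomes slow as the collection grows, which is the limitation a vector index is meant to address.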