Initial CLI support and plugin hook for embeddings #186

Merged
merged 14 commits into from Aug 28, 2023

@simonw simonw commented Aug 26, 2023

@simonw simonw added the enhancement and plugins labels Aug 26, 2023
@simonw simonw changed the title from "Embedbing support" to "Embedding support" Aug 26, 2023

simonw commented Aug 26, 2023

Here's an example plugin that works with https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

import llm
from sentence_transformers import SentenceTransformer


@llm.hookimpl
def register_embedding_models(register):
    model_id = "sentence-transformers/all-MiniLM-L6-v2"
    register(SentenceTransformerModel(model_id, model_id, 384))


class SentenceTransformerModel(llm.EmbeddingModel):
    def __init__(self, model_id, model_name, embedding_size):
        self.model_id = model_id
        self.model_name = model_name
        self.embedding_size = embedding_size
        self._model = None

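    def embed(self, text):
        # Load the model lazily, on first use, so registering the plugin stays cheap
        if self._model is None:
            self._model = SentenceTransformer(self.model_name)
        # encode() takes a list of texts; return the single resulting vector as a list
        return list(self._model.encode([text])[0])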


simonw commented Aug 28, 2023

I introduced https://github.com/simonw/sqlite-migrate here because I need a mechanism for ensuring that any database passed to llm embed -d/--database has the necessary tables, independent of the migrations that create the responses and other tables in the main logs.db database.

I should absolutely port the other LLM migrations to this system too: having two migration systems is gnarly.
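
To illustrate the idea of making sure an arbitrary database has the necessary tables, here is a minimal sketch using only the standard library sqlite3 module. It is not the sqlite-migrate API, and the table names, column names and the ensure_tables helper are all hypothetical, chosen only for illustration. It records which named migrations have been applied and runs any that are missing, so it is safe to call every time a database is opened.

```python
import sqlite3

# Hypothetical migrations as (name, SQL) pairs. These table and column
# names are illustrative assumptions, not the schema used by llm.
MIGRATIONS = [
    (
        "m001_create_embeddings",
        "CREATE TABLE embeddings (id TEXT PRIMARY KEY, embedding BLOB)",
    ),
]


def ensure_tables(conn):
    """Apply any migrations that this database has not seen yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _applied_migrations (name TEXT PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT name FROM _applied_migrations")
    }
    for name, sql in MIGRATIONS:
        if name not in applied:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO _applied_migrations (name) VALUES (?)", (name,)
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_tables(conn)
ensure_tables(conn)  # Second call finds nothing left to apply
```

Because the applied migrations are tracked inside the database itself, each database passed in carries its own migration state, separate from any other database.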


simonw commented Aug 28, 2023

In order to write tests for this I'm going to build a plugin that only activates during tests and has a VERY simple embedding mechanism: it returns 16 floats, where each float is the length of a word in the input text. Input is truncated to 16 words, and missing words are filled in with 0.

I'll use this pattern: https://til.simonwillison.net/pytest/registering-plugins-in-tests
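
The word-length embedding described above could be sketched roughly like this. The function name embed_word_lengths and the choice to split on whitespace are assumptions made for illustration; the actual test plugin may differ.

```python
def embed_word_lengths(text, size=16):
    """Return `size` floats: one word length per word, zero-padded."""
    # Split on whitespace (an assumption) and keep at most `size` words
    words = text.split()[:size]
    lengths = [float(len(word)) for word in words]
    # Fill the remaining positions with 0 for the missing words
    return lengths + [0.0] * (size - len(lengths))


vector = embed_word_lengths("hello world")
print(len(vector), vector[:3])
```

A deterministic embedding like this makes it easy to assert on exact values in tests without downloading a real model.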


simonw commented Aug 28, 2023

I'm going to document this, land it, and then work on llm similar separately.

@simonw simonw marked this pull request as ready for review August 28, 2023 05:20
@simonw simonw changed the title from "Embedding support" to "Initial CLI support and plugin hook for embeddings" Aug 28, 2023
@simonw simonw merged commit 77cf56e into main Aug 28, 2023
20 checks passed
@simonw simonw deleted the embed branch August 28, 2023 05:24