Initial CLI support and plugin hook for embeddings #186
Here's an example plugin that works with https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2:

```python
import llm
from sentence_transformers import SentenceTransformer


@llm.hookimpl
def register_embedding_models(register):
    model_id = "sentence-transformers/all-MiniLM-L6-v2"
    # all-MiniLM-L6-v2 produces 384-dimensional vectors
    register(SentenceTransformerModel(model_id, model_id, 384))


class SentenceTransformerModel(llm.EmbeddingModel):
    def __init__(self, model_id, model_name, embedding_size):
        self.model_id = model_id
        self.model_name = model_name
        self.embedding_size = embedding_size
        self._model = None

    def embed(self, text):
        # Load the model lazily on first use, so registering the plugin stays cheap
        if self._model is None:
            self._model = SentenceTransformer(self.model_name)
        # encode() returns a NumPy array; tolist() converts it to plain Python floats
        return self._model.encode([text])[0].tolist()
```
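The plugin hands each model it provides to a `register` callback. A minimal sketch of the host side of that pattern, with no pluggy and no `llm` import: `DemoEmbeddingModel`, `demo_plugin_hook` and `collect_embedding_models` are hypothetical names for illustration only. Assuming a pluggy plugin manager named `pm`, the loop over hook implementations would collapse into a single `pm.hook.register_embedding_models(register=register)` call.

```python
from typing import Callable, List


class DemoEmbeddingModel:
    """Hypothetical stand-in for an llm.EmbeddingModel subclass."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def embed(self, text: str) -> List[float]:
        return [float(len(text))]


def demo_plugin_hook(register: Callable[[DemoEmbeddingModel], None]) -> None:
    # A plugin's register_embedding_models() implementation calls register()
    # once for each model it provides.
    register(DemoEmbeddingModel("demo-a"))
    register(DemoEmbeddingModel("demo-b"))


def collect_embedding_models(hook_impls) -> List[DemoEmbeddingModel]:
    """Call each hook implementation with a register() callback that collects models."""
    models: List[DemoEmbeddingModel] = []

    def register(model: DemoEmbeddingModel) -> None:
        models.append(model)

    for hook_impl in hook_impls:
        hook_impl(register=register)
    return models
```

For example, `[m.model_id for m in collect_embedding_models([demo_plugin_hook])]` evaluates to `["demo-a", "demo-b"]`. Passing a callback, rather than having plugins return lists, lets a plugin decide at runtime how many models to register.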
The code is a bit messy; it could do with refactoring to reduce duplication with `get_model()`.
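One possible shape for that refactor, sketched under assumptions: both lookups share a single "find by ID or alias" helper. `ModelWithAliases`, `UnknownModelError` and `find_by_id_or_alias` here are hypothetical stand-ins, not the project's actual classes or signatures.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ModelWithAliases:
    # Hypothetical record pairing a model ID with any extra aliases
    model_id: str
    aliases: List[str] = field(default_factory=list)


class UnknownModelError(KeyError):
    """Hypothetical error raised when no model matches the requested name."""


def find_by_id_or_alias(
    entries: Sequence[ModelWithAliases], name: str, kind: str
) -> ModelWithAliases:
    # Shared lookup logic, so get_model() and get_embedding_model() stay in sync
    for entry in entries:
        if name == entry.model_id or name in entry.aliases:
            return entry
    raise UnknownModelError(f"Unknown {kind}: {name}")


def get_model(chat_models: Sequence[ModelWithAliases], name: str) -> ModelWithAliases:
    return find_by_id_or_alias(chat_models, name, "model")


def get_embedding_model(
    embedding_models: Sequence[ModelWithAliases], name: str
) -> ModelWithAliases:
    return find_by_id_or_alias(embedding_models, name, "embedding model")
```

The two public functions differ only in which list they search and the wording of the error, which is the duplication the helper absorbs.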
I introduced https://github.com/simonw/sqlite-migrate here because I need a mechanism for ensuring any database passed to …

I should absolutely port the other LLM migrations to this system too; having two migration systems is gnarly.
In order to write tests for this, I'm going to build a plugin that only activates during tests and has a VERY simple embedding mechanism: it returns 16 floats, each float being the length of a word in the input text. It truncates to 16 words and fills in with …

I'll use this pattern: https://til.simonwillison.net/pytest/registering-plugins-in-tests
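A minimal sketch of that test-only embedding, kept free of the `llm` import so it runs standalone. Because the padding value is elided in the comment, zero is an assumption here; `word_length_embedding`, `EmbedDemo` and the `embed-demo` ID are hypothetical names, and a real test plugin would subclass `llm.EmbeddingModel` and register through the plugin hook.

```python
from typing import List

EMBEDDING_SIZE = 16
PAD_VALUE = 0.0  # assumption: the padding value is not given in the comment


def word_length_embedding(text: str, size: int = EMBEDDING_SIZE) -> List[float]:
    """Deterministic test 'embedding': the length of each word, as a float."""
    lengths = [float(len(word)) for word in text.split()[:size]]  # truncate to `size` words
    # Pad short inputs so every vector has exactly `size` entries
    return lengths + [PAD_VALUE] * (size - len(lengths))


class EmbedDemo:
    """Hypothetical test-only model; a real plugin would subclass llm.EmbeddingModel."""

    model_id = "embed-demo"
    embedding_size = EMBEDDING_SIZE

    def embed(self, text: str) -> List[float]:
        return word_length_embedding(text, self.embedding_size)
```

For instance, `word_length_embedding("hello world")[:3]` is `[5.0, 5.0, 0.0]`. Because the output depends only on word lengths, tests can assert exact vectors without downloading a real model.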
I'm going to document this, then land it, and then work on …
Refs: