Extension of embeddings draft implementation to support local models #3463
Hello!
Pull Request overview
Details
This PR is a simple extension of the embeddings API draft, with two intentions: 1) support local embedding models and 2) inform the PydanticAI maintainers about common usage patterns that the overall design might want to incorporate.
Local models
Part 1 is rather straightforward: this PR allows users to run any embedding model from https://huggingface.co/models?library=sentence-transformers, such as any Qwen, bge, sentence-transformers, EmbeddingGemma, Nomic, Jina, mixedbread, etc. model.
Here are some usage snippets (requires `pip install sentence-transformers`; torch can be installed with or without GPU support), followed by a sample with a cosine similarity call.
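Since the exact wrapper names in the draft API may differ, here is a minimal sketch using the sentence-transformers library directly; the model name is just an example:

```python
from sentence_transformers import SentenceTransformer

# Any model from https://huggingface.co/models?library=sentence-transformers works here.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["The weather is lovely today.", "It's so sunny outside!"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384) for this model
```

And the cosine similarity call, using the built-in similarity helper from recent sentence-transformers versions:

```python
# similarity() computes cosine similarity by default for most models.
similarities = model.similarity(embeddings, embeddings)
print(similarities)  # 2x2 tensor; the diagonal entries are 1.0
```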
Personal Recommendation
Part 2 of my goal doesn't involve this PR much at all. Instead, I want to inform you about how embedding models are commonly used. It wasn't always the case, but nowadays embedding models are used almost exclusively for retrieval. In this setting, users want to embed both queries and documents, and then use an efficient search system/vector database that computes dot product or cosine similarity to find the documents relevant to a given query (or rather, query embedding).
In practice, model authors have started separating the query and document paths. Two of the most recent big embedding model releases, for example, both use special prompts/prefixes for queries, such as `Instruct: Given a question, retrieve passages that answer the question\nQuery:`, while not using any prompt/prefix for documents. Some models use prompts for both queries and documents, and others even use entirely different underlying models for queries and documents, e.g. https://huggingface.co/MongoDB/mdbr-leaf-ir-asym or https://huggingface.co/jinaai/jina-embeddings-v4.

From what I've seen over the past 2 years, I can almost guarantee that the next OpenAI embedding model will also require some form of distinction between queries and documents. Cohere's and Voyage's models already do.
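To make the prompt/prefix mechanics concrete, here is a hedged sketch; the model name is just one example of a prompt-using model, and the exact instruction text varies per model:

```python
from sentence_transformers import SentenceTransformer

# Example of a model that expects an instruction prefix on queries.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# Queries get the instruction prepended...
query_embedding = model.encode(
    "Which planet is known as the Red Planet?",
    prompt="Instruct: Given a question, retrieve passages that answer the question\nQuery: ",
)
# ...while documents are embedded as-is, without any prompt.
document_embedding = model.encode("Mars is often called the Red Planet.")
```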
Due to this, I think that you should support separate methods for embedding queries and documents, e.g. `embed_query` and `embed_document`. The underlying implementation for each provider (OpenAI, Cohere, Voyage, Sentence Transformers) can then use whatever that provider uses to distinguish the input types: e.g. `input_type` for Cohere and Voyage, `encode_query` vs `encode_document` for Sentence Transformers, and the same endpoint twice for OpenAI until they add support for a distinction. A sketch of what this split could look like follows below.

cc @dmontagu @DouweM as you set up the original draft implementation, and cc @ggozad as you inspired the structure via https://github.com/ggozad/haiku.rag (P.S. I really like the "slim" approach).
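For illustration, a minimal sketch of the proposed split, assuming a hypothetical `EmbeddingModel` base class; none of these names are taken from the actual draft API:

```python
from abc import ABC, abstractmethod

from sentence_transformers import SentenceTransformer


class EmbeddingModel(ABC):
    """Hypothetical base class; the real draft API likely differs."""

    @abstractmethod
    def embed_query(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_document(self, texts: list[str]) -> list[list[float]]: ...


class SentenceTransformerEmbeddingModel(EmbeddingModel):
    """Maps the query/document split onto sentence-transformers' own methods."""

    def __init__(self, model_name: str):
        self.model = SentenceTransformer(model_name)

    def embed_query(self, texts: list[str]) -> list[list[float]]:
        # encode_query applies the model's query prompt/prefix, if any.
        return self.model.encode_query(texts).tolist()

    def embed_document(self, texts: list[str]) -> list[list[float]]:
        # encode_document applies the document path instead.
        return self.model.encode_document(texts).tolist()
```

A Cohere or Voyage implementation would pass the provider's `input_type` values in each method, and an OpenAI implementation would call the same embeddings endpoint from both methods until the API offers a distinction.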