
Conversation

@tomaarsen

Hello!

Pull Request overview


This PR is a simple extension of the embeddings API draft, with the intention to 1) support local embedding models and 2) inform the PydanticAI maintainers about common usage formats that they might want to incorporate into the overall design.

Local models

Part 1 is rather straightforward: this PR allows users to run any embedding model from https://huggingface.co/models?library=sentence-transformers, such as any Qwen, bge, sentence-transformers, EmbeddingGemma, Nomic, Jina, mixedbread, etc. model.
Here are some usage snippets (requires pip install sentence-transformers; torch can be installed with or without GPU support):

import asyncio

from pydantic_ai.embeddings import Embedder
from pydantic_ai.embeddings.sentence_transformers import SentenceTransformerEmbeddingModel

model = SentenceTransformerEmbeddingModel("sentence-transformers/all-MiniLM-L6-v2")
# Try any model from https://huggingface.co/models?library=sentence-transformers

embedder = Embedder(model)


async def main():
    result = await embedder.embed("Hello, world!")
    print(result[:10])

if __name__ == "__main__":
    asyncio.run(main())

And a sample with a cosine similarity call:

import asyncio

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

from pydantic_ai.embeddings import Embedder
from pydantic_ai.embeddings.sentence_transformers import SentenceTransformerEmbeddingModel

model = SentenceTransformerEmbeddingModel('sentence-transformers/all-MiniLM-L6-v2')
# Try any model from https://huggingface.co/models?library=sentence-transformers

embedder = Embedder(model)


async def main():
    # cosine similarity example
    embeddings = await embedder.embed(
        [
            'The weather is lovely today.',
            "It's so sunny outside!",
            'He drove to the stadium.',
        ]
    )
    embeddings_array = np.array(embeddings)
    similarity_matrix = cosine_similarity(embeddings_array)
    print('Cosine Similarity Matrix:')
    print(similarity_matrix)
    """
    Cosine Similarity Matrix:
    [[1.         0.66595534 0.10458399]
     [0.66595534 1.         0.14114465]
     [0.10458399 0.14114465 1.        ]]
    """


if __name__ == '__main__':
    asyncio.run(main())

Personal Recommendation

Part 2 of my goal here doesn't involve this PR much at all. Instead, I want to inform you about how embedding models are commonly used. It wasn't always the case, but nowadays embedding models are almost exclusively used for retrieval. In this setting, users want to embed both queries and documents, and then use an efficient search system/vector database that computes dot product or cosine similarity to find relevant documents given queries (or rather, query embeddings).

In practice, model authors started separating the query and document paths. For example, I'll use two of the most recent big embedding model releases:

These both use special prompts/prefixes for queries, such as Instruct: Given a question, retrieve passages that answer the question\nQuery: , while not using any prompt/prefix for documents. Some models use prompts for both queries and documents, and other models even use fully different underlying models for queries/documents, e.g. https://huggingface.co/MongoDB/mdbr-leaf-ir-asym or https://huggingface.co/jinaai/jina-embeddings-v4.
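To make that prefix behavior concrete, here is a minimal sketch in plain Python (the helper names format_query and format_document are hypothetical; the instruction string is the one quoted above, and a real model applies it internally before encoding):

```python
# Hypothetical helpers illustrating asymmetric query/document preprocessing.
QUERY_PROMPT = (
    "Instruct: Given a question, retrieve passages that answer the question"
    "\nQuery: "
)


def format_query(text: str) -> str:
    """Queries get the instruction prefix before being embedded."""
    return QUERY_PROMPT + text


def format_document(text: str) -> str:
    """Documents are embedded as-is, with no prefix."""
    return text


print(format_query("What is the capital of France?"))
```

The key point is that the same raw string produces a different model input depending on whether it is treated as a query or a document.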

From what I've seen over the past 2 years, I can almost guarantee you that the next OpenAI embedding model will also require some form of distinction between queries and documents. Cohere's and Voyage's models already do.

Due to this, I think that you should support separate methods for embedding queries and documents, e.g. embed_query and embed_document. The underlying implementation for each provider (OpenAI, Cohere, Voyage, Sentence Transformers) can then use whatever that provider uses to distinguish the input types (e.g. input_type for Cohere and Voyage, encode_query vs encode_document for Sentence Transformers, and twice the same endpoint for OpenAI until they add support for distinctions).
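As a rough sketch of that suggestion (the class names and the fake backend below are hypothetical; only embed_query/embed_document and the input_type values mirror the discussion above), each provider implementation maps the two methods onto whatever distinction its API offers:

```python
from abc import ABC, abstractmethod


class EmbeddingModel(ABC):
    """Hypothetical interface exposing the two proposed methods."""

    @abstractmethod
    def embed_query(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_document(self, texts: list[str]) -> list[list[float]]: ...


class FakeInputTypeModel(EmbeddingModel):
    """Stand-in for a Cohere/Voyage-style API that takes an input_type argument."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def _request(self, texts: list[str], input_type: str) -> list[list[float]]:
        # A real implementation would call the provider's API here;
        # this fake only records which input_type each call used.
        self.calls.append(input_type)
        return [[0.0, 0.0, 0.0] for _ in texts]

    def embed_query(self, texts: list[str]) -> list[list[float]]:
        return self._request(texts, input_type="search_query")

    def embed_document(self, texts: list[str]) -> list[list[float]]:
        return self._request(texts, input_type="search_document")


model = FakeInputTypeModel()
model.embed_query(["What is an embedding?"])
model.embed_document(["An embedding is a dense vector representation."])
print(model.calls)  # ['search_query', 'search_document']
```

An OpenAI-backed implementation could simply route both methods to the same endpoint until the API adds a distinction, while a Sentence Transformers one would call encode_query/encode_document.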

cc @dmontagu @DouweM as you set up the original draft implementation and cc @ggozad as you inspired the structure via https://github.com/ggozad/haiku.rag (P.s. I really like the "slim" approach)

  • Tom Aarsen

@ggozad

ggozad commented Nov 18, 2025

+1 on doing this. I imagine you don't need two separate methods, just an optional input_type defaulting to document.
If you wanna make a PR on haiku.rag as well @tomaarsen you are more than welcome :)
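A minimal sketch of that single-method shape (the prefix string and function body are purely illustrative; a real embedder would return vectors rather than strings):

```python
from typing import Literal

QUERY_PREFIX = "query: "  # illustrative; real prefixes are model-specific


def embed(
    texts: list[str],
    input_type: Literal["query", "document"] = "document",
) -> list[str]:
    # Only queries get the prefix; documents pass through unchanged,
    # matching the proposed default of input_type="document".
    if input_type == "query":
        texts = [QUERY_PREFIX + t for t in texts]
    return texts


print(embed(["hello"]))                      # ['hello']
print(embed(["hello"], input_type="query"))  # ['query: hello']
```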

@tomaarsen
Author

Agreed that two methods aren't strictly necessary, as long as there's a way to distinguish the formats. In Sentence Transformers I opted for two methods to help users via IntelliSense/autocomplete and to avoid having to specify a "default" format. I'm all good with e.g. an input_type as well.

  • Tom Aarsen

@DouweM
Collaborator

DouweM commented Nov 18, 2025

@tomaarsen Thanks a lot Tom! When @dmontagu and I were drafting this we actually had separate embed_query and embed_document(s) methods initially, but we moved that to a Cohere-specific setting as it was the only one of the APIs we looked at that made that distinction. The context you provide is very helpful, so I'll make a note to restore either the different methods or make it an arg as @ggozad suggested.

@DouweM DouweM merged commit d777138 into pydantic:embeddings-api Nov 18, 2025
25 of 29 checks passed
