
Conversation

@franciscojavierarceo (Collaborator) commented Aug 2, 2025

What does this PR do?

This PR adds support for Chunks in the new OpenAI Vector Stores API. In particular, it adds the following APIs:

  • @webmethod(route="/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/chunks", method="GET")
  • @webmethod(route="/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/chunks/{chunk_id}", method="GET")
  • @webmethod(route="/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/chunks/{chunk_id}", method="POST")
  • @webmethod(route="/openai/v1/vector_stores/{vector_store_id}/files/{file_id}/chunks/{chunk_id}", method="DELETE")

It's worth noting that these APIs aren't part of OpenAI's public API, but they are consistent with its conventions and are likely similar to what OpenAI exposes internally (that's just speculation on my part).

As mentioned in this issue, this is needed for supporting the ingestion of precomputed embeddings, similar to what's available with VectorIO today.

See this example here that is in use at Red Hat: https://github.com/opendatahub-io/rag/tree/main/demos/kfp/docling/pdf-conversion

I enabled the ingestion of precomputed embeddings in #2317, which has been used by a number of our customers via VectorIO.insert(). This would give us feature parity and be consistent with OpenAI's naming conventions.
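For context, a precomputed-embedding chunk in this workflow might look roughly like the following. The field names here are assumptions modeled loosely on what `VectorIO.insert()` accepts, not the exact llama-stack schema:

```python
# Hypothetical chunk payload with a precomputed embedding; field names are
# assumptions, not the authoritative llama-stack schema.
def make_chunk(chunk_id: str, content: str, embedding: list[float]) -> dict:
    """Build a chunk dict carrying an embedding computed offline
    (e.g. by a docling pipeline, as in the Red Hat example above)."""
    if not embedding:
        raise ValueError("precomputed embedding must be non-empty")
    return {
        "chunk_id": chunk_id,
        "content": content,
        "embedding": embedding,  # supplied by the user, not computed server-side
        "metadata": {"source": "precomputed"},
    }
```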

Other thoughts

I also have a PoC of how this can be exposed in the UI. A screenshot is available below:

Screenshot 2025-08-01 at 11 24 56 PM

Closes #3021

Test Plan

Unit tests added.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 2, 2025
franciscojavierarceo and others added 2 commits August 1, 2025 23:22
@mattf (Collaborator) commented Aug 2, 2025

discussion of this is happening on #2981

@franciscojavierarceo (Collaborator, Author) commented:
@mattf it sounds like we're open to this approach now? I'd need to update it a bit to handle the extras with the openai client and add some tests. If so, let me know and I'll do so.

@mattf (Collaborator) commented Sep 20, 2025

> @mattf it sounds like we're open to this approach now? I'd need to update it a bit to handle the extras with the openai client and add some tests. If so, let me know and I'll do so.

we should have api extensions. my concern is about the design of this extension.

there are two use cases here -

  1. (vector-stores) let llama stack be intelligent about how uploaded files are chunked, embedded and queried
  2. (vector-dbs + vector-io) use llama stack as a consistent, portable interface to a vector db, where the user chunks and embeds their files

(2) seems useful when -

  • llama stack does a poor job at (1)
  • the user is doing research on rag pipelines

the risk of error seems very high when a single vector store is used both ways at once, e.g. the user mixes type (1) processing with type (2) chunks & embeddings.

blending these two use cases under one endpoint increases that risk.

instead of merging vector-dbs and vector-io into vector-stores, what if vector-dbs & vector-io get merged and placed at an endpoint that differentiates its use case from the vector-stores case? in the process, the embedding_model/dimension fields on vector-dbs should be removed, and the embedding should be made required on insert.
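The proposal above can be sketched as an insert signature for the merged vector-dbs + vector-io endpoint: no `embedding_model`/`dimension` on the store, and an embedding required on every chunk. All names below are illustrative, not an actual llama-stack interface:

```python
# Sketch of the proposed merged endpoint's insert contract (names are
# illustrative): the store carries no embedding_model/dimension, and the
# embedding is required per chunk because the user embeds their own files.
from dataclasses import dataclass

@dataclass
class RawChunk:
    chunk_id: str
    content: str
    embedding: list[float]  # required: supplied by the caller

def insert(chunks: list[RawChunk]) -> int:
    """Validate that every chunk carries an embedding; return the count
    accepted. The actual write to the backing vector db is elided."""
    for c in chunks:
        if not c.embedding:
            raise ValueError(f"chunk {c.chunk_id} is missing its embedding")
    return len(chunks)
```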

Successfully merging this pull request may close these issues.

Create GET (+list), UPDATE, and DELETE APIs for Chunks in Vector Stores API