[AI Bundle][Store] Document Indexing Pipeline #429

@chr-hertel

A common scenario is to

  • load documents from a specific source (e.g. folder of PDFs, set of entities, URLs, ...)
  • transform them (e.g. split into chunks, normalize language, summarize, ...)
  • vectorize them with an embedding model
  • store them in a vector store
  • query them, keeping track of the original sources during retrieval
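
To make the five steps concrete, here is a minimal, self-contained sketch in plain PHP (8.1+). It is only an illustration under loud assumptions: `Doc`, `chunk()`, `embed()`, `cosine()` and `InMemoryStore` are hypothetical stand-ins and not the Symfony AI interfaces, the "embedding" is a toy letter-frequency vector rather than a real model call, and the source paths are made up.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration only; NOT the Symfony AI classes.
final class Doc
{
    /** @param array<string, mixed> $metadata */
    public function __construct(
        public readonly string $content,
        public readonly array $metadata = [],
    ) {
    }
}

/** Transform step: split into byte-based chunks, carrying the source metadata along. */
function chunk(iterable $docs, int $size): iterable
{
    foreach ($docs as $doc) {
        foreach (str_split($doc->content, $size) as $i => $part) {
            yield new Doc($part, $doc->metadata + ['chunk' => $i]);
        }
    }
}

/** Vectorize step: toy letter-frequency vector (a-z). A real pipeline would call an embedding model. */
function embed(string $text): array
{
    $vector = array_fill(0, 26, 0.0);
    foreach (str_split(strtolower($text)) as $char) {
        $index = ord($char) - ord('a');
        if ($index >= 0 && $index < 26) {
            $vector[$index] += 1.0;
        }
    }

    return $vector;
}

/** Cosine similarity between two vectors of equal length. */
function cosine(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return ($normA > 0.0 && $normB > 0.0) ? $dot / sqrt($normA * $normB) : 0.0;
}

/** Store step: keeps vectors in memory and returns the most similar document. */
final class InMemoryStore
{
    private array $entries = [];

    public function add(Doc $doc, array $vector): void
    {
        $this->entries[] = ['doc' => $doc, 'vector' => $vector];
    }

    public function query(array $vector): ?Doc
    {
        $best = null;
        $bestScore = -1.0;
        foreach ($this->entries as $entry) {
            $score = cosine($vector, $entry['vector']);
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $entry['doc'];
            }
        }

        return $best;
    }
}

// Load: two in-line documents with made-up source paths.
$loaded = [
    new Doc('Symfony is a PHP framework.', ['source' => 'docs/symfony.txt']),
    new Doc('Vector stores hold embeddings.', ['source' => 'docs/store.txt']),
];

// Transform, vectorize, store.
$store = new InMemoryStore();
foreach (chunk($loaded, 200) as $doc) {
    $store->add($doc, embed($doc->content));
}

// Query: the hit still knows which source it came from.
$hit = $store->query(embed('Symfony is a PHP framework.'));
echo $hit->metadata['source'], PHP_EOL; // prints "docs/symfony.txt"
```

The point of the sketch is that the source travels with each chunk as metadata from loading through to retrieval, so the query result can be traced back to its origin without a second lookup.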

A configuration for this could, for example, look like this:

ai:
    indexer:
        my_indexer:
            loader: 'service_id'
            source: 'string' # or ['array', 'of', 'string']
            transformer:
                - 'service_id'
                - 'service_id'
            embeddings:
                class: 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'
                name: 'text-embedding-ada-002'
            store: 'ai.store.chroma_db.my_store'

And we could also add a command here:

$ bin/console ai:store:index my_indexer

This should sit on top of, or rework, the following:

  • Symfony\AI\Store\Document\LoaderInterface
  • Symfony\AI\Store\Document\TransformerInterface
  • Symfony\AI\Store\Document\Vectorizer
  • Symfony\AI\Store\Indexer
