Feature Request: Proposing User-Customizable RAG Integration in llama.cpp: A Path to Enhanced Contextual Retrieval #12129

@gnusupport

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

After three months of research and experimentation, storing embeddings in a pgvector column in a PostgreSQL database has proven both effective and rewarding. Searching by embeddings alone is a game-changer for me; I find it indispensable even without the processing power of an LLM.
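
For reference, a plain similarity search over such a column needs nothing beyond pgvector itself. A minimal sketch (the hyperscope database name and the documents table with its embedding column are hypothetical stand-ins for my setup; <=> is pgvector's cosine-distance operator):

    # Assumed schema: documents(id, title, body, embedding vector(768)).
    # Return the 10 documents nearest to document 123 by cosine distance.
    psql -d hyperscope -At -c "
      SELECT id, title
      FROM documents
      ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 123)
      LIMIT 10;"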

As I understand it, RAG means retrieving results from some source (files, database rows, or links) and then generating answers in the context of those retrieved documents. That retrieve-then-generate loop is what constitutes RAG, and I have nearly achieved it in my setup.

What excites me now is the possibility of integrating this RAG functionality directly into the llama.cpp server. I envision it as an external function:

  • Users would have the option to enable the RAG feature on the command line or in the web UI.
  • Users could customize the RAG template through web UI options or command-line arguments.
  • The external function could be defined via a command line parameter, such as:
    llama-server --rag-function fetch-rag.sh
    
  • The external function fetch-rag.sh would receive the prompt on its standard input.
  • It would then return a list of RAG documents on its standard output (see the sketch after this list).
  • The exact details of fetching from the external source are not the point; the concept is. There are other ways of doing it; what matters is that the user can decide how.
  • With the RAG option enabled, the LLM running on the llama-server would receive a new context based on the prompt, built from the returned list of documents.
  • Subsequently, the llama-server would proceed with the final inference.
  • Users would receive a list of relevant documents and RAG results.
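
To make the stdin/stdout contract concrete, here is a minimal sketch of what a fetch-rag.sh could look like in my pgvector setup. Everything in it is an assumption for illustration, not part of the proposal: the embedding endpoint (a second llama-server instance started with --embedding, whose exact request and response shape varies by version), the database name, and the documents table schema.

    #!/bin/sh
    # fetch-rag.sh -- hypothetical external RAG function (sketch only).
    # Contract: read the user prompt on stdin, print matching documents
    # on stdout, one per line.

    # Assumed: a separate llama-server started with --embedding serves this
    # endpoint; the response field name may differ between versions.
    EMBED_URL=${EMBED_URL:-http://127.0.0.1:8081/embedding}
    # Assumed: PostgreSQL database with the pgvector extension and a table
    # documents(id, title, body, embedding vector(768)).
    DB=${DB:-hyperscope}

    prompt=$(cat)

    # Embed the prompt; jq -Rs wraps raw stdin as a single JSON string.
    embedding=$(printf '%s' "$prompt" \
      | jq -Rs '{content: .}' \
      | curl -s "$EMBED_URL" --data @- \
      | jq -c '.embedding')

    # Nearest-neighbour search; <=> is pgvector's cosine-distance operator.
    psql -d "$DB" -At -c "
      SELECT title || ': ' || left(body, 500)
      FROM documents
      ORDER BY embedding <=> '$embedding'
      LIMIT 5;"

With such a hook in place, llama-server would only need to pipe the prompt through the configured script, wrap its stdout in the user's RAG template, and run the final inference on the augmented context.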

In my specific use case, I rely on database entries and customizable URIs, such as <a href="hyperscope:123">My document</a>, which open documents in an external program. Therefore, the RAG input cannot be limited to just files—it must be entirely customizable by the user.
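
For example, in that setup the script's output (hypothetical) could look like:

    <a href="hyperscope:123">My document</a>: excerpt of the matching text ...
    <a href="hyperscope:456">Related note</a>: another matching excerpt ...

The web UI could then render these links as-is, so clicking a result opens the document in the external program.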

I am eager to contribute US $100 to develop this option, enabling llama.cpp to support user-customizable RAG input. This would ensure that the final output is enriched with context derived from the documents and information gathered before the actual inference.

Motivation

I am motivated to enhance llama.cpp with user-customizable RAG functionality because context-aware retrieval and generation can significantly improve the relevance and accuracy of its outputs.

Possible Implementation

See above
