Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
After three months of research and experimentation, I have reached a point where storing embeddings in a pgvector column in a PostgreSQL database has proven both effective and rewarding. The ability to search by embeddings alone is a game-changer for me; I find it indispensable even without an LLM in the loop.
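For context, the kind of lookup I have in mind is a plain pgvector similarity search. The sketch below is only illustrative; the database name, the documents table, its body and embedding columns, and the pre-computed query vector are placeholders rather than my actual schema.

```sh
# Illustrative pgvector similarity search (placeholder names throughout).
# QUERY_EMBEDDING is assumed to already hold the query vector as a pgvector
# literal, e.g. '[0.12,-0.03,...]'.
psql mydb -c "
  SELECT id, body
  FROM documents
  ORDER BY embedding <=> '$QUERY_EMBEDDING'  -- pgvector cosine distance
  LIMIT 5;
"
```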
As I understand it, RAG means retrieving results relevant to the prompt from a data source (files, database column entries, or links) and then having the LLM generate answers in the context of those retrieved documents. I believe I have nearly achieved this in my setup.
What excites me now is the possibility of integrating this RAG functionality directly into the llama.cpp server. I envision it as an external function:
- Users would have the option to enable the RAG feature on command line or in the web UI.
- Users could customize the RAG template through UI options or command line inputs.
- The external function could be defined via a command line parameter, such as: `llama-server --rag-function fetch-rag.sh`
- The external function `fetch-rag.sh` would receive the prompt via its standard input.
- It would then return a list of RAG documents through standard output (see the sketch after this list).
- The exact implementation of fetching from an external source is not the point; the concept is what matters. There are other ways to do it; what matters is that the user can decide how.
- With the RAG option enabled, the LLM running on the llama-server would receive a new context based on the prompt, built from the list of retrieved documents.
- Subsequently, the `llama-server` would proceed with the final inference.
- Users would receive the list of relevant documents together with the RAG results.
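To make the idea concrete, here is a minimal sketch of what such a `fetch-rag.sh` could look like, assuming a pgvector-backed PostgreSQL database and some user-supplied command (the placeholder `embed-prompt` below) that turns the prompt into a vector literal. None of the names are meant as a proposed interface; the only contract is prompt in on stdin, documents out on stdout.

```sh
#!/usr/bin/env sh
# fetch-rag.sh - illustrative sketch only; all names below are placeholders.
# Contract: read the prompt from standard input, print one retrieved
# document per line on standard output.

PROMPT=$(cat)   # the full prompt arrives on stdin

# Turn the prompt into a pgvector literal such as '[0.12,-0.03,...]'.
# How this is done is entirely up to the user; embed-prompt is a stand-in.
EMBEDDING=$(printf '%s' "$PROMPT" | embed-prompt)

# Fetch the closest documents and emit them on stdout so llama-server
# can splice them into the context before inference.
psql --tuples-only --no-align mydb -c "
  SELECT body
  FROM documents
  ORDER BY embedding <=> '$EMBEDDING'
  LIMIT 5;
"
```

llama-server would then run something like `printf '%s' "$PROMPT" | fetch-rag.sh` and append whatever comes back to the context before the final inference; the same mechanism would work just as well for a script that returns URIs such as hyperscope:123 instead of raw text.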
In my specific use case, I rely on database entries and customizable URIs, such as <a href="hyperscope:123">My document</a>, which open documents in an external program. Therefore, the RAG input cannot be limited to just files—it must be entirely customizable by the user.
I am eager to contribute US $100 to develop this option, enabling llama.cpp to support user-customizable RAG input. This would ensure that the final output is enriched with context derived from the documents and information gathered before the actual inference.
Motivation
I am motivated to enhance llama.cpp with user-customizable RAG functionality to leverage the full potential of context-aware retrieval and generation, thereby significantly improving the relevance and accuracy of its outputs.
Possible Implementation
See above