Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
After three months of research and experimentation, I have reached a point where storing embeddings in a pgvector column in a PostgreSQL database has proven both effective and rewarding. The ability to search by embeddings alone is a game-changer for me; I find it indispensable even without an LLM in the loop.
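For context, the kind of lookup I have in mind is a plain pgvector similarity search. The sketch below is only illustrative; the database name, the documents table, its body and embedding columns, and the pre-computed query vector are placeholders rather than my actual schema.

```sh
# Illustrative pgvector similarity search (placeholder names throughout).
# QUERY_EMBEDDING is assumed to already hold the query vector as a pgvector
# literal, e.g. '[0.12,-0.03,...]'.
psql mydb -c "
  SELECT id, body
  FROM documents
  ORDER BY embedding <=> '$QUERY_EMBEDDING'  -- pgvector cosine distance
  LIMIT 5;
"
```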
As I understand it, RAG means retrieving results relevant to the prompt from a data source (files, database column entries, or links) and then having the LLM generate answers in the context of those retrieved documents. I believe I have nearly achieved this in my setup.
What excites me now is the possibility of integrating this RAG functionality directly into the llama.cpp server. I envision it as an external function:
- Users would have the option to enable the RAG feature on command line or in the web UI.
- Users could customize the RAG template through UI options or command line inputs.
- The external function could be defined via a command line parameter, such as: `llama-server --rag-function fetch-rag.sh`
- The external function `fetch-rag.sh` would receive the prompt via its standard input.
- It would then return a list of RAG documents through standard output (see the sketch after this list).
- The exact implementation of fetching from an external source is not the point; the concept is what matters. There are other ways to do it; what matters is that the user can decide how.
- With the RAG option enabled, the LLM running on the llama-server would receive a new context based on the prompt, built from the list of retrieved documents.
- Subsequently, the `llama-server` would proceed with the final inference.
- Users would receive the list of relevant documents together with the RAG results.
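To make the idea concrete, here is a minimal sketch of what such a `fetch-rag.sh` could look like, assuming a pgvector-backed PostgreSQL database and some user-supplied command (the placeholder `embed-prompt` below) that turns the prompt into a vector literal. None of the names are meant as a proposed interface; the only contract is prompt in on stdin, documents out on stdout.

```sh
#!/usr/bin/env sh
# fetch-rag.sh - illustrative sketch only; all names below are placeholders.
# Contract: read the prompt from standard input, print one retrieved
# document per line on standard output.

PROMPT=$(cat)   # the full prompt arrives on stdin

# Turn the prompt into a pgvector literal such as '[0.12,-0.03,...]'.
# How this is done is entirely up to the user; embed-prompt is a stand-in.
EMBEDDING=$(printf '%s' "$PROMPT" | embed-prompt)

# Fetch the closest documents and emit them on stdout so llama-server
# can splice them into the context before inference.
psql --tuples-only --no-align mydb -c "
  SELECT body
  FROM documents
  ORDER BY embedding <=> '$EMBEDDING'
  LIMIT 5;
"
```

llama-server would then run something like `printf '%s' "$PROMPT" | fetch-rag.sh` and append whatever comes back to the context before the final inference; the same mechanism would work just as well for a script that returns URIs such as hyperscope:123 instead of raw text.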
In my specific use case, I rely on database entries and customizable URIs, such as <a href="hyperscope:123">My document</a>, which open documents in an external program. Therefore, the RAG input cannot be limited to just files—it must be entirely customizable by the user.
I am eager to contribute US $100 to develop this option, enabling llama.cpp to support user-customizable RAG input. This would ensure that the final output is enriched with context derived from the documents and information gathered before the actual inference.
Motivation
I am motivated to enhance llama.cpp with user-customizable RAG functionality to leverage the full potential of context-aware retrieval and generation, thereby significantly improving the relevance and accuracy of its outputs.
Possible Implementation
See above