A lightweight Retrieval-Augmented Generation (RAG) prototype that runs entirely on your local machine. It exposes three endpoints:
- `/{collection}/upload` – Upload a list of plain-text strings and store them in the vector store.
- `/{collection}/upload_pdf` – Upload a PDF; its text is extracted, chunked, and stored in the vector store.
- `/{collection}/query` – Query the vector store with a prompt and receive a response from an LLM.
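The request shapes match the curl examples further down (`texts`, `text`, and a multipart `file` field). For orientation only, and assuming the server is a FastAPI app (the actual rag-server may be structured differently), the three routes could look roughly like this:

```python
# Illustrative sketch only -- not the actual rag-server implementation.
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

class UploadRequest(BaseModel):
    texts: list[str]   # plain-text documents, as in the /upload curl example

class QueryRequest(BaseModel):
    text: str          # the user prompt, as in the /query curl example

@app.post("/{collection}/upload")
def upload(collection: str, body: UploadRequest):
    # embed body.texts and upsert them into the named Qdrant collection
    ...

@app.post("/{collection}/upload_pdf")
def upload_pdf(collection: str, file: UploadFile):
    # extract text from the PDF, chunk it, embed, upsert
    ...

@app.post("/{collection}/query")
def query(collection: str, body: QueryRequest):
    # embed body.text, retrieve the nearest chunks, ask the LLM for an answer
    ...
```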
The service uses Sentence-Transformers for embeddings (with the `BAAI/bge-m3` model) and Qdrant as the vector store.
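As a minimal sketch of what the ingestion side of this stack looks like when `sentence-transformers` and `qdrant-client` are used directly (collection setup, point IDs, and the `"text"` payload key are illustrative assumptions, not taken from rag-server):

```python
# Sketch of the ingestion path: embed texts with BAAI/bge-m3 and upsert into Qdrant.
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

model = SentenceTransformer("BAAI/bge-m3")          # dense vectors, 1024 dimensions
client = QdrantClient(url="http://localhost:6333")

def ingest(collection: str, texts: list[str]) -> None:
    # Create the collection on first use (real code would likely use UUID point IDs).
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
        )
    vectors = model.encode(texts)                   # one embedding per input text
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={"text": txt})
            for i, (txt, vec) in enumerate(zip(texts, vectors))
        ],
    )

ingest("my_collection", ["Hello world", "Another document"])
```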
Prerequisites:

- Python 3.11+ (a virtual environment is recommended)
- Qdrant running locally or accessible via `http://localhost:6333`
- An LLM service exposing a `/completion` endpoint (e.g., the bundled llama.cpp server on port 8080); the sketch after this list shows one way the query path can call it
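For orientation, here is a sketch of how the query path could tie these pieces together: embed the prompt, retrieve the closest chunks from Qdrant, and send a stuffed prompt to the LLM. The request and response fields (`prompt`, `n_predict`, `content`) follow llama.cpp's server API; other backends will differ, and none of this is lifted from rag-server:

```python
# Sketch of the query path: retrieve context from Qdrant, then ask the local LLM.
import requests
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

model = SentenceTransformer("BAAI/bge-m3")
client = QdrantClient(url="http://localhost:6333")

def answer(collection: str, question: str, top_k: int = 3) -> str:
    query_vec = model.encode(question).tolist()
    hits = client.search(collection_name=collection, query_vector=query_vec, limit=top_k)
    # Assumes the stored points carry their source text under the "text" payload key.
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": 256},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["content"]

print(answer("my_collection", "What is this document about?"))
```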
Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Build the image:
```bash
docker build -t rag-prototype .
```

Run the container (ports 8000 for the endpoints, 8080 for the local LLM):
```bash
docker run -d -p 8000:8000 -p 8080:8080 rag-prototype
```

The API will be available at http://localhost:8000 and the local LLM at http://localhost:8080.
To run the API locally instead of via Docker:

```bash
uvicorn rag-server:app --reload
```

Example requests:

```bash
# Upload text
curl -X POST http://localhost:8000/my_collection/upload \
-H "Content-Type: application/json" \
-d '{"texts": ["Hello world", "Another document"]}'
# Upload PDF
curl -X POST http://localhost:8000/my_collection/upload_pdf \
-F file=@sample.pdf
# Query
curl -X POST http://localhost:8000/my_collection/query \
-H "Content-Type: application/json" \
-d '{"text": "What is this document about?"}'MIT