A lightweight Retrieval-Augmented Generation (RAG) prototype that runs entirely on your local machine.

leomino/rag-prototype

RAG Prototype

A lightweight Retrieval-Augmented Generation (RAG) prototype that runs entirely on your local machine. It exposes three endpoints:

  • /{collection}/upload – Upload a list of plain-text strings and store them in the vector database.
  • /{collection}/upload_pdf – Upload a PDF, extract and chunk its text, and store the chunks in the vector database.
  • /{collection}/query – Query the vector store with a prompt and receive a response from an LLM.
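
The README does not describe how upload_pdf chunks the extracted text. As a rough illustration only, the sketch below assumes fixed-size character windows with a small overlap; the function name chunk_text and the default sizes are hypothetical and not taken from the repository.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Assumption: this is one common chunking strategy, not necessarily
    the one the prototype uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip windows that contain only whitespace
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a window boundary present in both neighboring chunks, which tends to help retrieval at the cost of some duplicated storage.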

Architecture

[Image: architecture diagram]

The service uses Sentence-Transformers with the BAAI/bge-m3 model for embeddings and Qdrant as the vector store.
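
To make the retrieval step concrete, here is a toy, dependency-free sketch of nearest-neighbor search over embeddings. It is a stand-in, not the prototype's code: the three-dimensional vectors are made up in place of real BAAI/bge-m3 embeddings, the in-memory list replaces Qdrant, and cosine similarity is assumed as the distance metric (one of the metrics Qdrant supports; the README does not say which one is configured).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy data: (text, embedding) pairs standing in for a collection.
store = [
    ("cats purr", [1.0, 0.0, 0.0]),
    ("dogs bark", [0.0, 1.0, 0.0]),
    ("kittens meow", [0.9, 0.1, 0.0]),
]
```

In the real service, the query text would first be embedded with the same model as the stored documents, and the closest texts would then be passed to the LLM as context.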

Prerequisites

  • Python 3.11+ (a virtual environment is recommended)
  • Qdrant reachable at http://localhost:6333 (for example, a local instance)
  • An LLM service exposing a /completion endpoint (e.g., the bundled llama.cpp server on port 8080).
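
For orientation, the sketch below shows what a call to such a /completion endpoint could look like. It assumes the llama.cpp server's JSON request fields prompt and n_predict; the prompt template and the helper names build_prompt and build_completion_request are hypothetical, since the README does not show how the prototype assembles its prompt. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved passages and the user question into one prompt.

    Assumption: a simple template; the prototype's actual wording is unknown.
    """
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8080",
    n_predict: int = 256,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the LLM server running, the request can be sent with urllib.request.urlopen; the llama.cpp server returns JSON in which the generated text is in the content field.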

Installation

pip install -r requirements.txt

Docker

Build the image:

docker build -t rag-prototype .

Run the container (port 8000 for the API endpoints, port 8080 for the local LLM):

docker run -d -p 8000:8000 -p 8080:8080 rag-prototype

The API will be available at http://localhost:8000 and the local LLM at http://localhost:8080.

Running the Server (local)

uvicorn rag-server:app --reload

Example Usage

# Upload text
curl -X POST http://localhost:8000/my_collection/upload \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello world", "Another document"]}'

# Upload PDF
curl -X POST http://localhost:8000/my_collection/upload_pdf \
  -F file=@sample.pdf

# Query
curl -X POST http://localhost:8000/my_collection/query \
  -H "Content-Type: application/json" \
  -d '{"text": "What is this document about?"}'
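
The text upload and query calls above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and parsing the response as JSON is an assumption, since the README does not document the response format. The PDF upload is left out because multipart form uploads are awkward with urllib alone.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address from the README


def json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the API (does not send it)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_request(collection: str, texts: list[str]) -> urllib.request.Request:
    """Request for /{collection}/upload with the same body as the curl example."""
    return json_post(f"/{collection}/upload", {"texts": texts})


def query_request(collection: str, text: str) -> urllib.request.Request:
    """Request for /{collection}/query with the same body as the curl example."""
    return json_post(f"/{collection}/query", {"text": text})


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send(upload_request("my_collection", ["Hello world"])) followed by send(query_request("my_collection", "What is this document about?")) mirrors the first and third curl commands.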

License

MIT
