Fully offline Retrieval-Augmented Generation (RAG) with text, image, and audio support. Runs locally with a simple UI (Gradio) and a FastAPI backend. Optional React frontend included.
- Offline-first: after setup, no internet needed
- Multimodal: PDF/DOCX/TXT/MD, images, audio
- Local models: sentence-transformers, CLIP, Whisper, and a local LLM (GGUF)
- Two ways to use: Web UI or REST API
Fork this repo on GitHub, then:

```bash
git clone https://github.com/<your-username>/rag-offline-chatbot.git
cd rag-offline-chatbot
```

Requirements:

- Python 3.11+
- Node.js 18+ (only if using the React frontend)
- Windows, macOS, or Linux
Windows PowerShell:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate
pip install -r backend\requirements.txt
```

macOS/Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
```

Optional: download models automatically (the first run needs internet):

```bash
python scripts/download_models.py
```

The default config is in `backend/config.yaml`:
```yaml
model_backend: llama_cpp
model_path: ./models/llm/mistral-7b-instruct-v0.2.Q4_K_M.gguf
top_k: 5
max_tokens: 512
temperature: 0.2
```

Place your GGUF LLM at `backend/models/llm/` or update `model_path`.
Gradio UI (recommended for first run):

```bash
python main.py --interface gradio --port 7860
```

Open http://localhost:7860.
FastAPI API server:

```bash
python main.py --interface fastapi --port 8000
```

Docs at http://localhost:8000/docs.
Frontend (optional React app):

```bash
cd frontend
npm install
npm run dev
# open http://localhost:5173
```

Using Compose (backend + Vite preview):

```bash
docker compose up --build
# Backend on 8000, frontend on 5173
```

Model folders on the host are mounted from `./backend/models`.
- Web UI: upload files, then ask a question
- REST API examples:
Upload a file:

```bash
curl -F "file=@document.pdf" http://localhost:8000/ingest
```

Ask a question:

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query":"What is the main topic?"}'
```

Rebuild index (if metadata present):

```bash
curl -X POST http://localhost:8000/index/rebuild
```

Status:

```bash
curl http://localhost:8000/status
```

Project layout:

```
backend/
  app/          # API, ingestion, embeddings, retriever, vector store
  models/       # llm/, embeddings/, clip/, whisper/
  storage/      # faiss index + metadata
  config.yaml
frontend/       # optional React UI (vite)
main.py         # entrypoint (gradio or fastapi)
cli.py          # interactive CLI
```
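The REST endpoints can also be called from Python. A minimal sketch using only the standard library; the endpoint path and the `{"query": ...}` body shape come from the curl examples above, while the response schema is not documented here, so the result is returned as raw parsed JSON:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default --port of the FastAPI server


def build_query_payload(question: str) -> bytes:
    """Serialize a question into the JSON body /query expects."""
    return json.dumps({"query": question}).encode("utf-8")


def ask(question: str) -> dict:
    """POST a question to /query and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/query",
        data=build_query_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`ask("What is the main topic?")` mirrors the curl call above. Uploading to `/ingest` is a multipart request, which is simpler with the third-party `requests` library.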
- "FAISS not found" on Windows: ensure `faiss-cpu` is installed from `requirements.txt` inside the venv
- Torch/CPU slow: try a smaller GGUF or enable CUDA if available
- Port already in use: change `--port` or stop the conflicting app
- No answers: check that you ingested files and that `backend/storage` is writable
- Model not found: verify `model_path` in `backend/config.yaml`
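For the import-related items above, a quick check can report which heavy dependencies are missing from the active venv. The import names are assumptions based on the features listed earlier (`faiss-cpu` imports as `faiss`, Whisper as `whisper`):

```python
import importlib.util


def missing_optional_deps() -> list[str]:
    """Return the names of heavy packages that are not importable.

    Import names assumed from the stack described in this README;
    adjust the list to match the actual requirements.txt.
    """
    candidates = ["faiss", "whisper", "sentence_transformers", "llama_cpp"]
    return [name for name in candidates if importlib.util.find_spec(name) is None]
```

Run it inside the venv (`python -c "..."`) before filing a "not found" issue; an empty list means all four import names resolve.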
- Fork → create branch → commit
- Run tests: `pytest -v` (inside venv)
- Open a PR
MIT — see LICENSE