| title | Rag Project Assistant |
|---|---|
| emoji | 🔥 |
| colorFrom | red |
| colorTo | red |
| sdk | docker |
| pinned | false |
| short_description | RAG (Retrieval-Augmented Generation) system that answers que |
Conceptual illustration — see Demo below for the actual terminal interface.
A local RAG (Retrieval-Augmented Generation) chatbot for querying personal project documentation. Built to replace hallucination-prone LLM responses with grounded answers extracted from actual project source code and documentation.
Current state: Fully deployed. FastAPI backend on Hugging Face Spaces (Docker), Groq inference (Llama 3.3 70B), IP-based rate limiting, and a vanilla JS chat widget embedded at emirhanyildirim.me. CLI mode still available locally.
- Replace Ollama with Groq API (Llama 3.3 70B) for cloud inference
- Expose
/chatPOST endpoint via FastAPI - Add IP-based rate limiting (
slowapi) - Deploy backend to Hugging Face Spaces (Docker SDK)
- Build vanilla JS chat widget for portfolio site integration
- Embed widget into emirhanyildirim.me (Jekyll / GitHub Pages)

documents/ ← prepared knowledge base
├── *_README.md ← project README files
├── *_code_context.txt ← AST-extracted code structure
└── *_highlights.txt ← curated project summaries & rationale
prepare_docs.py ← knowledge base builder
build_index.py ← builds ChromaDB index (run at Docker build time)
rag_chatbot.py ← retrieval + generation core (shared by CLI and API)
app.py ← FastAPI service (POST /query, rate limiting)
chroma_db/ ← persistent vector index
Dockerfile ← multi-stage build, pre-builds index, exposes :7860
Pipeline:
prepare_docs.pyextracts structured context from each projectbuild_index.pyembeds chunks withall-MiniLM-L6-v2and stores in ChromaDB (runs at Docker build time)- At query time: project keyword detection → metadata-filtered retrieval → Groq generation with strict grounding prompt
- Response includes
answer,sources(list of source files), andlow_confidenceflag
Uses Python ast module (not regex) to extract per-file:
- Module, class, and function docstrings
- Function signatures with type annotations and return types
ALL_CAPSmodule-level constants (configuration values)- Inline comments
README.mdandhighlights.txtcopied as-is
Regex-based extraction was discarded because it misattributed multi-line string literals as docstrings and could not reconstruct function signatures reliably.
Embedding model: sentence-transformers/all-MiniLM-L6-v2
Vector store: ChromaDB (persistent, SQLite-backed)
Chunk size: 200 words, 40-word overlap
Similarity threshold: L2 distance < 1.40 passes; above this a low-confidence warning is shown and the LLM is still invoked but the user is alerted. Threshold was calibrated empirically: all-MiniLM-L6-v2 L2 distances in the 1.0–1.3 range correspond to topically related but not directly answering chunks; distances above 1.5 are typically off-topic.
Project-scoped metadata filtering: Each chunk is indexed with a project metadata field derived from its filename prefix. When a query mentions a known project by name or keyword, ChromaDB's $eq filter restricts retrieval to that project's chunks only. Without this, semantically similar chunks from other projects contaminate the context — e.g. a question about Listing Pilot's Telegram integration would retrieve WBC Analyzer's GPT-4o API discussion because both involve external API calls.
Fallback: if the filtered query returns no results (project has few chunks), the filter is dropped and a full-corpus search runs.
You are an assistant that answers questions about the developer's own projects.
The context below comes from that project's documentation and source code.
Rules:
1. Answer using information present in the context, including reasonable direct
inferences (e.g. if a table lists architectures tested, you can state which
ones were used).
2. If the context genuinely contains no relevant information, say exactly:
'I don't have this information.' and stop.
3. Do not invent facts not supported by the context.
4. Mention which project or file the information comes from.
5. Be concise and precise. Answer in the same language as the question.
Two source types with different roles:
| Source | Content | Answers |
|---|---|---|
_code_context.txt |
AST-extracted signatures, docstrings, constants | Implementation questions ("how does X work") |
_highlights.txt |
Curated summaries, design rationale, known limitations | Motivation questions ("why was X chosen") |
_highlights.txt is hand-written per project. This is intentional: motivation and architectural decisions are rarely in source code comments. The system is a hybrid — automatic extraction for structure, curated content for rationale. This distinction matters: the system does not "understand" code; it retrieves the most relevant pre-extracted or pre-written text and grounds the LLM's response to it.
Source attribution errors at chunk boundaries: The LLM sometimes names the wrong file as the source when the relevant information spans a chunk boundary. The chunk metadata records the source file, but a function defined in utils.py may appear in a chunk whose surrounding text is from a runner script. Not critical for Q&A accuracy but worth noting if precise attribution is required.
Keyword-based project detection: Project filtering relies on a static keyword list. Misspelled project names or paraphrased references are not caught. A more robust approach would use semantic similarity against project name embeddings, but the current approach is sufficient for the intended use case.
Cloud inference dependency: Generation requires a valid GROQ_API_KEY. If the Groq API is unreachable, the endpoint returns 503.
Rate limiting: The public API is limited to 3 requests per hour per IP via slowapi. Intended for the portfolio widget use case — not suitable for bulk querying.
pip install -r requirements.txt
# Set up API key
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
# Build knowledge base (run once, or when projects change)
python prepare_docs.py
# Build ChromaDB index
python build_index.py
# Start CLI chatbot
python rag_chatbot.py
# Or start FastAPI server
fastapi dev app.pyLLM: Groq API with llama-3.3-70b-versatile. Requires GROQ_API_KEY in .env.
Live endpoint: https://frissonitte-rag-project-assistant.hf.space/query
curl -X POST https://frissonitte-rag-project-assistant.hf.space/query \
-H "Content-Type: application/json" \
-d '{"query": "What is WBC Analyzer?"}'Response:
{
"answer": "...",
"sources": ["wbc-analyzer_README.md"],
"low_confidence": false
}Rate limit: 3 requests/hour per IP.
| Command | Description |
|---|---|
/reindex |
Rebuild ChromaDB index from current documents/ contents |
/info |
Show active model, index size |
/help |
List commands |
/exit |
Quit |
- WBC Analyzer — DenseNet121-based WBC classification with OOD adaptation pipeline
- Scalable Kinematic Action Recognition for Industry 5.0 — End-to-end action recognition on 10GB motion-capture data with streaming drift detection
- Listing Pilot — Appium automation suite for C2C marketplace listing management
- Popcorn Wagon — Hybrid movie recommender (SVD + Annoy + TMDB)
- Portal Cleaner Ultimate — RPA desktop suite for ERP workflow automation
