# CodeBase RAG

CodeBase RAG is a FastAPI service that lets you chat with any public GitHub repository.
It works in two phases:
- Ingest a repository: clone -> parse code with Tree-sitter -> embed chunks -> build/persist a FAISS index.
- Query the indexed repository: retrieve relevant chunks and stream a grounded LLM response over SSE.
## Features

- Clones a public GitHub repository on demand.
- Extracts semantic code chunks (functions, classes, methods) using Tree-sitter.
- Builds a local FAISS vector index for semantic retrieval.
- Supports two chat modes:
  - `ask`: retrieval-augmented Q&A over relevant chunks.
  - `summary`: map-reduce architecture summary of the repository.
- Streams responses as Server-Sent Events (SSE) for token-by-token UI rendering.
## Tech Stack

- API: FastAPI, Uvicorn
- LLM orchestration: LangChain
- LLM provider: Google Gemini (`langchain-google-genai`)
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector store: FAISS (CPU)
- Parsing: Tree-sitter (Python, JS/TS, Go, Java, Rust, C/C++)
- Git access: GitPython
## Architecture

### Ingestion flow

```text
Client
  |
  | POST /ingest (repo URL)
  v
FastAPI (app/main.py)
  -> Ingestor (app/ingestor.py)
     -> Clone repo
     -> Collect source files
     -> Parse into CodeChunk objects (app/tree_sitter_parser.py)
     -> Build + persist FAISS index (app/vector_store.py)
```
### Chat flow

```text
Client
  |
  | POST /chat (session_id, query, mode)
  v
FastAPI SSE endpoint (app/main.py)
  -> RAG chain (app/rag_chain.py)
     -> Load FAISS index
     -> Retrieve docs (ask) or all docs (summary)
     -> Call Gemini and stream tokens
```
## Project Structure

```text
Dockerfile
docker-compose.yaml
requirements.txt
app/
  config.py             # environment-based settings
  schemas.py            # request/response + chunk models
  tree_sitter_parser.py # semantic code chunk extraction
  vector_store.py       # embeddings + FAISS persistence/search
  ingestor.py           # end-to-end ingestion pipeline
  rag_chain.py          # ask + summary chain logic
  main.py               # FastAPI app + routes + SSE
```
## How It Works

### Ingestion

Implemented in `app/ingestor.py`:

- Computes a deterministic `session_id` from the repo URL.
- Clones the repo shallowly (`depth=1`) into `REPO_CLONE_BASE/<session_id>`.
- Filters indexable source files by extension and max file size.
- Parses each file via `parse_file(...)` from `app/tree_sitter_parser.py`.
- Builds the FAISS index via `build_index(...)` in `app/vector_store.py`.
- Deletes the cloned repository directory after the index is persisted.

Output: `IngestResponse` with `session_id`, `files_processed`, and `chunks_indexed`.
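The deterministic session id can be sketched as the repo name plus a short hash of the URL. This is an assumption about the scheme (the real derivation lives in `app/ingestor.py`), but it matches the shape of the example response below:

```python
import hashlib
from urllib.parse import urlparse

def make_session_id(repo_url: str) -> str:
    # Repo name from the URL path, minus a trailing ".git" if present.
    name = urlparse(repo_url).path.rstrip("/").split("/")[-1].removesuffix(".git")
    # A short, stable hash of the full URL keeps distinct repos apart.
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return f"{name}-{digest}"
```

Because the id depends only on the URL, re-ingesting the same repository maps to the same index directory.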
### Parsing and chunking

Implemented in `app/tree_sitter_parser.py`:

- Uses an extension -> grammar registry for supported languages.
- Extracts top-level definitions and methods where possible.
- Adds metadata per chunk:
  - file path
  - language
  - symbol name
  - kind (`function`, `class`, `method`, `module`)
  - start/end lines
- Splits oversized chunks with overlap to stay within token limits.
- Falls back to line-based splitting when a grammar is missing or no definitions are extracted.
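The overlap splitting can be sketched as fixed-size windows that share a tail with the next window. This is a simplified character-based version (the real splitter may work on lines or tokens), using the fallback defaults documented under Configuration:

```python
def split_with_overlap(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    # Defaults mirror FALLBACK_CHUNK_CHARS / FALLBACK_CHUNK_OVERLAP.
    if len(text) <= chunk_chars:
        return [text]
    step = chunk_chars - overlap  # each window advances by size minus overlap
    return [text[i:i + chunk_chars] for i in range(0, len(text), step)]
```

The shared tail means a definition cut at a window boundary still appears whole in at least one chunk.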
### Vector store

Implemented in `app/vector_store.py`:

- Lazily loads a singleton HuggingFace embedding model.
- Converts each `CodeChunk` to a LangChain `Document`.
- Persists the FAISS index under `FAISS_STORE_BASE/<session_id>`.
- Loads the existing index for chat retrieval.
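The `CodeChunk` -> `Document` mapping can be sketched like this. Field names are assumptions, and a plain dict stands in for LangChain's `Document` to keep the sketch dependency-free:

```python
from dataclasses import dataclass

@dataclass
class CodeChunk:
    content: str
    file_path: str
    language: str
    symbol: str
    kind: str
    start_line: int
    end_line: int

def chunk_to_document(chunk: CodeChunk) -> dict:
    # The chunk body becomes the embedded text; everything else rides
    # along as metadata so retrieval results can cite their source.
    return {
        "page_content": chunk.content,
        "metadata": {
            "file_path": chunk.file_path,
            "language": chunk.language,
            "symbol": chunk.symbol,
            "kind": chunk.kind,
            "lines": f"{chunk.start_line}-{chunk.end_line}",
        },
    }
```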
### Chat

Implemented in `app/main.py` and `app/rag_chain.py`.

`ask` mode:

- loads the index
- runs similarity search (`top_k`)
- builds a grounded context block
- streams the Gemini answer token-by-token
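Building the grounded context block can be sketched as concatenating each retrieved chunk under a source header. The format here is hypothetical; the actual prompt template lives in `app/rag_chain.py`:

```python
def build_context(docs: list[dict], query: str) -> str:
    # Each retrieved document contributes a labelled section so the model
    # can attribute its answer to specific files and symbols.
    sections = [
        f"--- {d['metadata']['file_path']} :: {d['metadata']['symbol']} ---\n"
        f"{d['page_content']}"
        for d in docs
    ]
    return "Context:\n" + "\n\n".join(sections) + f"\n\nQuestion: {query}"
```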
`summary` mode:

- retrieves a broad set of chunks from the index
- map phase: per-chunk summaries
- reduce phase: single architecture overview
- streams the final output
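The map-reduce summary boils down to two passes over the same LLM call; a minimal sketch, where `summarize` stands in for the Gemini invocation:

```python
def summarize_map_reduce(chunks: list[str], summarize) -> str:
    # Map phase: summarize each chunk independently (parallelizable).
    partials = [summarize(chunk) for chunk in chunks]
    # Reduce phase: one final pass distills the partial summaries
    # into a single architecture overview.
    return summarize("\n".join(partials))
```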
SSE events are sent as:

- token: `data: {"token": "..."}`
- final: `data: {"done": true}`
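A client can reassemble the streamed answer by parsing `data:` frames until the done marker. A minimal sketch using only the event shapes documented above:

```python
import json

def collect_sse_tokens(lines) -> str:
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alives and non-data lines
        payload = json.loads(line[len("data: "):])
        if payload.get("done"):
            break  # final event carries no token
        tokens.append(payload["token"])
    return "".join(tokens)
```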
## Prerequisites

- Python 3.12+
- Git installed and available on PATH
- Google Gemini API key

Optional:

- Docker + Docker Compose
## Run Locally

1. Create and activate a virtual environment.

   ```shell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   ```

2. Install dependencies.

   ```shell
   pip install -r requirements.txt
   ```

3. Create `.env` from `.env.example` and set your API key.

   ```text
   GOOGLE_API_KEY=your-API-key
   ```

4. Start the API server.

   ```shell
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
   ```

5. Verify service health.

   ```shell
   curl http://localhost:8000/health
   ```

   Expected response: `{"status":"ok"}`

## Run with Docker

1. Create `.env` from `.env.example` and set `GOOGLE_API_KEY`.

2. Build and run.

   ```shell
   docker compose up --build
   ```

3. Check the health endpoint.

   ```shell
   curl http://localhost:8000/health
   ```

Notes:

- FAISS data is persisted in the Docker volume `faiss_data`.
- Default in-container paths: `REPO_CLONE_BASE=/tmp/codebase_rag_repos`, `FAISS_STORE_BASE=/tmp/codebase_rag_stores`
## API Documentation

Interactive docs:

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
## Usage Examples

### Ingest a repository

```shell
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"repo_url":"https://github.com/psf/requests"}'
```

Example response:

```json
{
  "session_id": "requests-1a2b3c4d5e6f7890",
  "files_processed": 123,
  "chunks_indexed": 487
}
```

### Ask a question (`ask` mode)

```shell
curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"How does retry logic work?",
    "mode":"ask",
    "top_k":6
  }'
```

### Summarize the repository (`summary` mode)

`query` is required by the request schema, even in summary mode. Use a placeholder value.
```shell
curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"summary",
    "mode":"summary",
    "top_k":6
  }'
```

## Configuration

All settings are in `app/config.py` and can be overridden with environment variables.
Important variables:

- `GOOGLE_API_KEY`: Gemini API key (required)
- `LLM_MODEL_NAME`: default `gemini-2.0-flash`
- `LLM_TEMPERATURE`: default `0.1`
- `LLM_MAX_OUTPUT_TOKENS`: default `4096`
- `EMBEDDING_MODEL_NAME`: default `sentence-transformers/all-MiniLM-L6-v2`
- `DEFAULT_TOP_K`: default `6`
- `REPO_CLONE_BASE`: local temp clone dir
- `FAISS_STORE_BASE`: persistent FAISS index dir
- `MAX_FILE_SIZE_BYTES`: per-file indexing limit (default `512000`)
- `MAX_CHUNK_CHARS`: semantic chunk size cap (default `8000`)
- `FALLBACK_CHUNK_CHARS`: fallback splitter size (default `2000`)
- `FALLBACK_CHUNK_OVERLAP`: fallback overlap (default `200`)
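Overriding works through ordinary environment variables. A sketch of the pattern (names and defaults match the table above, though `app/config.py` may read them via a settings library rather than raw `os.getenv`):

```python
import os

# Each setting falls back to its documented default when the
# corresponding environment variable is unset.
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "gemini-2.0-flash")
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.1"))
DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "6"))
MAX_FILE_SIZE_BYTES = int(os.getenv("MAX_FILE_SIZE_BYTES", "512000"))
```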
## Supported Languages

Configured via `indexable_extensions` in `app/config.py`:

- `.py`, `.js`, `.jsx`, `.ts`, `.tsx`
- `.go`, `.java`, `.rs`
- `.c`, `.cpp`, `.h`, `.hpp`
- `.cs`, `.rb`, `.php`, `.swift`, `.kt`, `.scala`

Tree-sitter grammars are currently wired for:

- Python, JavaScript, TypeScript, TSX, Go, Java, Rust, C, C++

If an indexed file extension has no grammar, fallback splitting is used.
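That routing decision boils down to a dictionary lookup with a fallback; a sketch (registry contents mirror the lists above, and the function name is an illustration, not the project's API):

```python
# Extensions with a wired Tree-sitter grammar get semantic parsing;
# everything else that is still indexable falls back to line splitting.
GRAMMARS = {
    ".py": "python", ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "tsx", ".go": "go", ".java": "java",
    ".rs": "rust", ".c": "c", ".h": "c", ".cpp": "cpp", ".hpp": "cpp",
}

def parsing_strategy(path: str) -> str:
    ext = "." + path.rsplit(".", 1)[-1] if "." in path else ""
    return GRAMMARS.get(ext, "fallback-split")
```

So a `.rb` file, for example, is still indexed but takes the fallback path.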
## Operational Notes

- Session IDs are deterministic per repo URL. Re-ingesting the same URL reuses the same session directory and rebuilds the index.
- The clone directory is deleted after ingestion completes.
- The FAISS index directory is retained for later chat requests.
- The embedding model is pre-warmed on app startup to reduce first-ingest latency.
- CORS is currently open (`allow_origins=["*"]`) for easier local integration.
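For anything beyond local use, you would tighten that open policy. A sketch of a restricted CORS setup using FastAPI's standard middleware (the origin list here is an example, not the project's configuration):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # replace "*" with known origins
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)
```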
## Troubleshooting

Ingestion fails or indexes zero chunks:

- Confirm the repository URL is public and reachable.
- Verify Git is installed and accessible in the runtime environment.
- The repository may not contain any configured extensions.
- File sizes may exceed `MAX_FILE_SIZE_BYTES`.

Chat cannot find the session:

- Ensure `/ingest` completed successfully.
- Confirm you are using the exact returned `session_id`.

LLM errors:

- Ensure `GOOGLE_API_KEY` is set correctly.
- Check model name and quota limits for your Google AI account.

Slow first run:

- The first run may download embedding model weights.
- Keep the container running so the model cache is reused.

## Tips

- Start with a smaller repository to validate end-to-end behavior quickly.
- Use `/docs` to experiment with request payloads.
- Keep `top_k` moderate (4-8) for a good quality/latency tradeoff.