# CodeBase RAG

CodeBase RAG is a FastAPI service that lets you chat with any public GitHub repository.
It works in two phases:
- Ingest a repository: clone -> parse code with Tree-sitter -> embed chunks -> build/persist a FAISS index.
- Query the indexed repository: retrieve relevant chunks and stream a grounded LLM response over SSE.
## Features

- Clones a public GitHub repository on demand.
- Extracts semantic code chunks (functions, classes, methods) using Tree-sitter.
- Builds a local FAISS vector index for semantic retrieval.
- Supports two chat modes:
  - `ask`: retrieval-augmented Q&A over relevant chunks.
  - `summary`: map-reduce architecture summary of the repository.
- Streams responses as Server-Sent Events (SSE) for token-by-token UI rendering.
## Tech Stack

- API: FastAPI, Uvicorn
- LLM orchestration: LangChain
- LLM provider: Google Gemini (`langchain-google-genai`)
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector store: FAISS (CPU)
- Parsing: Tree-sitter (Python, JS/TS, Go, Java, Rust, C/C++)
- Git access: GitPython
## Architecture

### Ingestion flow

```text
Client
  |
  | POST /ingest (repo URL)
  v
FastAPI (app/main.py)
  -> Ingestor (app/ingestor.py)
     -> Clone repo
     -> Collect source files
     -> Parse into CodeChunk objects (app/tree_sitter_parser.py)
     -> Build + persist FAISS index (app/vector_store.py)
```
### Chat flow

```text
Client
  |
  | POST /chat (session_id, query, mode)
  v
FastAPI SSE endpoint (app/main.py)
  -> RAG chain (app/rag_chain.py)
     -> Load FAISS index
     -> Retrieve docs (ask) or all docs (summary)
     -> Call Gemini and stream tokens
```
## Project Structure

```text
Dockerfile
docker-compose.yaml
requirements.txt
app/
  config.py             # environment-based settings
  schemas.py            # request/response + chunk models
  tree_sitter_parser.py # semantic code chunk extraction
  vector_store.py       # embeddings + FAISS persistence/search
  ingestor.py           # end-to-end ingestion pipeline
  rag_chain.py          # ask + summary chain logic
  main.py               # FastAPI app + routes + SSE
```
## How It Works

### Ingestion

Implemented in `app/ingestor.py`:

- Computes a deterministic `session_id` from the repo URL.
- Clones the repo shallowly (`depth=1`) into `REPO_CLONE_BASE/<session_id>`.
- Filters indexable source files by extension and max file size.
- Parses each file via `parse_file(...)` from `app/tree_sitter_parser.py`.
- Builds the FAISS index via `build_index(...)` in `app/vector_store.py`.
- Deletes the cloned repository directory after the index is persisted.

Output: `IngestResponse` with `session_id`, `files_processed`, and `chunks_indexed`.
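The deterministic session id can be sketched as the repo name plus a short hash of the URL. This is an assumption about the scheme (the real derivation lives in `app/ingestor.py`), but it matches the shape of the example response below:

```python
import hashlib
from urllib.parse import urlparse

def make_session_id(repo_url: str) -> str:
    # Repo name from the URL path, minus a trailing ".git" if present.
    name = urlparse(repo_url).path.rstrip("/").split("/")[-1].removesuffix(".git")
    # A short, stable hash of the full URL keeps distinct repos apart.
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return f"{name}-{digest}"
```

Because the id depends only on the URL, re-ingesting the same repository maps to the same index directory.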
### Parsing and chunking

Implemented in `app/tree_sitter_parser.py`:

- Uses an extension -> grammar registry for supported languages.
- Extracts top-level definitions and methods where possible.
- Adds metadata per chunk:
  - file path
  - language
  - symbol name
  - kind (`function`, `class`, `method`, `module`)
  - start/end lines
- Splits oversized chunks with overlap to stay within token limits.
- Falls back to line-based splitting when a grammar is missing or no definitions are extracted.
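The overlap splitting can be sketched as fixed-size windows that share a tail with the next window. This is a simplified character-based version (the real splitter may work on lines or tokens), using the fallback defaults documented under Configuration:

```python
def split_with_overlap(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    # Defaults mirror FALLBACK_CHUNK_CHARS / FALLBACK_CHUNK_OVERLAP.
    if len(text) <= chunk_chars:
        return [text]
    step = chunk_chars - overlap  # each window advances by size minus overlap
    return [text[i:i + chunk_chars] for i in range(0, len(text), step)]
```

The shared tail means a definition cut at a window boundary still appears whole in at least one chunk.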
### Vector store

Implemented in `app/vector_store.py`:

- Lazily loads a singleton HuggingFace embedding model.
- Converts each `CodeChunk` to a LangChain `Document`.
- Persists the FAISS index under `FAISS_STORE_BASE/<session_id>`.
- Loads the existing index for chat retrieval.
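The `CodeChunk` -> `Document` mapping can be sketched like this. Field names are assumptions, and a plain dict stands in for LangChain's `Document` to keep the sketch dependency-free:

```python
from dataclasses import dataclass

@dataclass
class CodeChunk:
    content: str
    file_path: str
    language: str
    symbol: str
    kind: str
    start_line: int
    end_line: int

def chunk_to_document(chunk: CodeChunk) -> dict:
    # The chunk body becomes the embedded text; everything else rides
    # along as metadata so retrieval results can cite their source.
    return {
        "page_content": chunk.content,
        "metadata": {
            "file_path": chunk.file_path,
            "language": chunk.language,
            "symbol": chunk.symbol,
            "kind": chunk.kind,
            "lines": f"{chunk.start_line}-{chunk.end_line}",
        },
    }
```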
### Chat

Implemented in `app/main.py` and `app/rag_chain.py`.

`ask` mode:

- loads the index
- runs similarity search (`top_k`)
- builds a grounded context block
- streams the Gemini answer token-by-token
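Building the grounded context block can be sketched as concatenating each retrieved chunk under a source header. The format here is hypothetical; the actual prompt template lives in `app/rag_chain.py`:

```python
def build_context(docs: list[dict], query: str) -> str:
    # Each retrieved document contributes a labelled section so the model
    # can attribute its answer to specific files and symbols.
    sections = [
        f"--- {d['metadata']['file_path']} :: {d['metadata']['symbol']} ---\n"
        f"{d['page_content']}"
        for d in docs
    ]
    return "Context:\n" + "\n\n".join(sections) + f"\n\nQuestion: {query}"
```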
`summary` mode:

- retrieves a broad set of chunks from the index
- map phase: per-chunk summaries
- reduce phase: single architecture overview
- streams the final output
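The map-reduce summary boils down to two passes over the same LLM call; a minimal sketch, where `summarize` stands in for the Gemini invocation:

```python
def summarize_map_reduce(chunks: list[str], summarize) -> str:
    # Map phase: summarize each chunk independently (parallelizable).
    partials = [summarize(chunk) for chunk in chunks]
    # Reduce phase: one final pass distills the partial summaries
    # into a single architecture overview.
    return summarize("\n".join(partials))
```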
SSE events are sent as:

- token: `data: {"token": "..."}`
- final: `data: {"done": true}`
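A client can reassemble the streamed answer by parsing `data:` frames until the done marker. A minimal sketch using only the event shapes documented above:

```python
import json

def collect_sse_tokens(lines) -> str:
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alives and non-data lines
        payload = json.loads(line[len("data: "):])
        if payload.get("done"):
            break  # final event carries no token
        tokens.append(payload["token"])
    return "".join(tokens)
```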
## Prerequisites

- Python 3.12+
- Git installed and available on PATH
- Google Gemini API key

Optional:

- Docker + Docker Compose
## Run Locally

1. Create and activate a virtual environment.

   ```shell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   ```

2. Install dependencies.

   ```shell
   pip install -r requirements.txt
   ```

3. Create `.env` from `.env.example` and set your API key.

   ```text
   GOOGLE_API_KEY=your-API-key
   ```

4. Start the API server.

   ```shell
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
   ```

5. Verify service health.

   ```shell
   curl http://localhost:8000/health
   ```

   Expected response: `{"status":"ok"}`

## Run with Docker

1. Create `.env` from `.env.example` and set `GOOGLE_API_KEY`.

2. Build and run.

   ```shell
   docker compose up --build
   ```

3. Check the health endpoint.

   ```shell
   curl http://localhost:8000/health
   ```

Notes:

- FAISS data is persisted in the Docker volume `faiss_data`.
- Default in-container paths: `REPO_CLONE_BASE=/tmp/codebase_rag_repos`, `FAISS_STORE_BASE=/tmp/codebase_rag_stores`
## API Documentation

Interactive docs:

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
## Usage Examples

### Ingest a repository

```shell
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"repo_url":"https://github.com/psf/requests"}'
```

Example response:

```json
{
  "session_id": "requests-1a2b3c4d5e6f7890",
  "files_processed": 123,
  "chunks_indexed": 487
}
```

### Ask a question (`ask` mode)

```shell
curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"How does retry logic work?",
    "mode":"ask",
    "top_k":6
  }'
```

### Summarize the repository (`summary` mode)

`query` is required by the request schema, even in summary mode. Use a placeholder value.
```shell
curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"summary",
    "mode":"summary",
    "top_k":6
  }'
```

## Configuration

All settings are in `app/config.py` and can be overridden with environment variables.
Important variables:

- `GOOGLE_API_KEY`: Gemini API key (required)
- `LLM_MODEL_NAME`: default `gemini-2.0-flash`
- `LLM_TEMPERATURE`: default `0.1`
- `LLM_MAX_OUTPUT_TOKENS`: default `4096`
- `EMBEDDING_MODEL_NAME`: default `sentence-transformers/all-MiniLM-L6-v2`
- `DEFAULT_TOP_K`: default `6`
- `REPO_CLONE_BASE`: local temp clone dir
- `FAISS_STORE_BASE`: persistent FAISS index dir
- `MAX_FILE_SIZE_BYTES`: per-file indexing limit (default `512000`)
- `MAX_CHUNK_CHARS`: semantic chunk size cap (default `8000`)
- `FALLBACK_CHUNK_CHARS`: fallback splitter size (default `2000`)
- `FALLBACK_CHUNK_OVERLAP`: fallback overlap (default `200`)
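Overriding works through ordinary environment variables. A sketch of the pattern (names and defaults match the table above, though `app/config.py` may read them via a settings library rather than raw `os.getenv`):

```python
import os

# Each setting falls back to its documented default when the
# corresponding environment variable is unset.
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "gemini-2.0-flash")
LLM_TEMPERATURE = float(os.getenv("LLM_TEMPERATURE", "0.1"))
DEFAULT_TOP_K = int(os.getenv("DEFAULT_TOP_K", "6"))
MAX_FILE_SIZE_BYTES = int(os.getenv("MAX_FILE_SIZE_BYTES", "512000"))
```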
## Supported Languages

Configured via `indexable_extensions` in `app/config.py`:

- `.py`, `.js`, `.jsx`, `.ts`, `.tsx`
- `.go`, `.java`, `.rs`
- `.c`, `.cpp`, `.h`, `.hpp`
- `.cs`, `.rb`, `.php`, `.swift`, `.kt`, `.scala`

Tree-sitter grammars are currently wired for:

- Python, JavaScript, TypeScript, TSX, Go, Java, Rust, C, C++

If an indexed file extension has no grammar, fallback splitting is used.
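That routing decision boils down to a dictionary lookup with a fallback; a sketch (registry contents mirror the lists above, and the function name is an illustration, not the project's API):

```python
# Extensions with a wired Tree-sitter grammar get semantic parsing;
# everything else that is still indexable falls back to line splitting.
GRAMMARS = {
    ".py": "python", ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "tsx", ".go": "go", ".java": "java",
    ".rs": "rust", ".c": "c", ".h": "c", ".cpp": "cpp", ".hpp": "cpp",
}

def parsing_strategy(path: str) -> str:
    ext = "." + path.rsplit(".", 1)[-1] if "." in path else ""
    return GRAMMARS.get(ext, "fallback-split")
```

So a `.rb` file, for example, is still indexed but takes the fallback path.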
## Operational Notes

- Session IDs are deterministic per repo URL. Re-ingesting the same URL reuses the same session directory and rebuilds the index.
- The clone directory is deleted after ingestion completes.
- The FAISS index directory is retained for later chat requests.
- The embedding model is pre-warmed on app startup to reduce first-ingest latency.
- CORS is currently open (`allow_origins=["*"]`) for easier local integration.
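For anything beyond local use, you would tighten that open policy. A sketch of a restricted CORS setup using FastAPI's standard middleware (the origin list here is an example, not the project's configuration):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # replace "*" with known origins
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)
```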
## Troubleshooting

Ingestion fails or indexes zero chunks:

- Confirm the repository URL is public and reachable.
- Verify Git is installed and accessible in the runtime environment.
- The repository may not contain any configured extensions.
- File sizes may exceed `MAX_FILE_SIZE_BYTES`.

Chat cannot find the session:

- Ensure `/ingest` completed successfully.
- Confirm you are using the exact returned `session_id`.

LLM errors:

- Ensure `GOOGLE_API_KEY` is set correctly.
- Check model name and quota limits for your Google AI account.

Slow first run:

- The first run may download embedding model weights.
- Keep the container running so the model cache is reused.

## Tips

- Start with a smaller repository to validate end-to-end behavior quickly.
- Use `/docs` to experiment with request payloads.
- Keep `top_k` moderate (4-8) for a good quality/latency tradeoff.