CodeBase RAG

CodeBase RAG is a FastAPI service that lets you chat with any public GitHub repository.

It works in two phases:

  1. Ingest a repository: clone -> parse code with Tree-sitter -> embed chunks -> build/persist a FAISS index.
  2. Query the indexed repository: retrieve relevant chunks and stream a grounded LLM response over SSE.

What This Project Does

  • Clones a public GitHub repository on demand.
  • Extracts semantic code chunks (functions, classes, methods) using Tree-sitter.
  • Builds a local FAISS vector index for semantic retrieval.
  • Supports two chat modes:
    • ask: retrieval-augmented Q&A over relevant chunks.
    • summary: map-reduce architecture summary of the repository.
  • Streams responses as Server-Sent Events (SSE) for token-by-token UI rendering.

Tech Stack

  • API: FastAPI, Uvicorn
  • LLM orchestration: LangChain
  • LLM provider: Google Gemini (langchain-google-genai)
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • Vector store: FAISS (CPU)
  • Parsing: Tree-sitter (Python, JS/TS, Go, Java, Rust, C/C++)
  • Git access: GitPython

High-Level Architecture

Client
  |
  | POST /ingest (repo URL)
  v
FastAPI (app/main.py)
  -> Ingestor (app/ingestor.py)
     -> Clone repo
     -> Collect source files
     -> Parse into CodeChunk objects (app/tree_sitter_parser.py)
     -> Build + persist FAISS index (app/vector_store.py)

Client
  |
  | POST /chat (session_id, query, mode)
  v
FastAPI SSE endpoint (app/main.py)
  -> RAG chain (app/rag_chain.py)
     -> Load FAISS index
     -> Retrieve docs (ask) or all docs (summary)
     -> Call Gemini and stream tokens

Project Structure

Dockerfile
docker-compose.yaml
requirements.txt
app/
  config.py              # environment-based settings
  schemas.py             # request/response + chunk models
  tree_sitter_parser.py  # semantic code chunk extraction
  vector_store.py        # embeddings + FAISS persistence/search
  ingestor.py            # end-to-end ingestion pipeline
  rag_chain.py           # ask + summary chain logic
  main.py                # FastAPI app + routes + SSE

How It Works Internally

1. Ingestion (POST /ingest)

Implemented in app/ingestor.py:

  • Computes deterministic session_id from repo URL.
  • Clones repo shallowly (depth=1) into REPO_CLONE_BASE/<session_id>.
  • Filters indexable source files by extension and max file size.
  • Parses each file via parse_file(...) from app/tree_sitter_parser.py.
  • Builds FAISS index via build_index(...) in app/vector_store.py.
  • Deletes cloned repository directory after index persistence.

Output: IngestResponse with:

  • session_id
  • files_processed
  • chunks_indexed
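The deterministic session ID can be pictured with a small sketch. This is a hypothetical reconstruction, not the exact scheme in app/ingestor.py: assume the ID combines the repository name with a short hash of the URL, matching the `requests-1a2b3c4d5e6f7890` shape shown later in the API examples.

```python
import hashlib
from urllib.parse import urlparse

def session_id_for(repo_url: str) -> str:
    """Hypothetical sketch of a deterministic session ID.

    The real scheme lives in app/ingestor.py; this only illustrates
    why re-ingesting the same URL maps to the same session directory.
    """
    # Repository name: last path segment, minus a trailing ".git".
    name = urlparse(repo_url).path.rstrip("/").split("/")[-1].removesuffix(".git")
    # Short, stable hash of the full URL.
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return f"{name}-{digest}"
```

Because the ID depends only on the URL, ingesting the same repository twice targets the same session directory and simply rebuilds the index.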

2. Parsing and Chunking

Implemented in app/tree_sitter_parser.py:

  • Uses extension -> grammar registry for supported languages.
  • Extracts top-level definitions and methods where possible.
  • Adds metadata per chunk:
    • file path
    • language
    • symbol name
    • kind (function, class, method, module)
    • start/end lines
  • Splits oversized chunks with overlap to stay within token limits.
  • Falls back to line-based splitting when grammar is missing or no defs are extracted.
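The fallback path can be sketched as a fixed-size splitter with overlap. This is an illustrative stand-in for the real splitter in app/tree_sitter_parser.py, using the FALLBACK_CHUNK_CHARS / FALLBACK_CHUNK_OVERLAP defaults from the configuration section:

```python
def split_with_overlap(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Sketch of fallback chunking with overlap.

    Consecutive chunks share `overlap` characters so a definition cut
    at a chunk boundary still appears whole in at least one chunk.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap window
    return chunks
```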

3. Embedding and Index Persistence

Implemented in app/vector_store.py:

  • Lazily loads a singleton HuggingFace embedding model.
  • Converts each CodeChunk to a LangChain Document.
  • Persists FAISS index under FAISS_STORE_BASE/<session_id>.
  • Loads existing index for chat retrieval.
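What the FAISS index provides at query time can be illustrated with a tiny in-memory cosine-similarity search. This is a toy stand-in, not the project's actual FAISS code; it only shows the retrieval contract: embed the query, rank stored vectors, return the top-k documents.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], index: list[tuple[list[float], str]], k: int = 6) -> list[str]:
    """index: list of (embedding, document) pairs; returns the k best documents."""
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]
```

FAISS does the same ranking over the MiniLM embeddings, just with an approximate index that scales to thousands of chunks.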

4. Querying (POST /chat) via SSE

Implemented in app/main.py and app/rag_chain.py:

  • ask mode:
    • loads index
    • runs similarity search (top_k)
    • builds grounded context block
    • streams Gemini answer token-by-token
  • summary mode:
    • retrieves broad set of chunks from index
    • map phase: per-chunk summaries
    • reduce phase: single architecture overview
    • streams final output
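The summary mode's control flow can be sketched independently of the LLM. The `summarize` callable below is a placeholder for the Gemini calls that app/rag_chain.py actually makes:

```python
from typing import Callable

def map_reduce_summary(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined partials."""
    partials = [summarize(chunk) for chunk in chunks]  # map phase
    return summarize("\n\n".join(partials))            # reduce phase
```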

SSE events are sent as:

  • token events: data: {"token": "..."}
  • final event: data: {"done": true}
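A minimal client-side consumer of this stream can be written against just those two payload shapes. The helper below parses SSE lines and assembles the answer; wiring it to a real HTTP response (e.g. iterating lines of a streamed POST) is left out:

```python
import json
from typing import Iterable

def assemble_sse_answer(lines: Iterable[str]) -> str:
    """Collect token events from an SSE line stream until {"done": true}."""
    tokens = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = json.loads(line[len("data: "):])
        if payload.get("done"):
            break
        if "token" in payload:
            tokens.append(payload["token"])
    return "".join(tokens)
```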

Quick Start

Prerequisites

  • Python 3.12+
  • Git installed and available on PATH
  • Google Gemini API key

Optional:

  • Docker + Docker Compose

Option A: Run Locally (Python)

  1. Create and activate a virtual environment.
python -m venv .venv
.\.venv\Scripts\Activate.ps1    # Windows PowerShell; on macOS/Linux: source .venv/bin/activate
  2. Install dependencies.
pip install -r requirements.txt
  3. Create .env from .env.example and set your API key.
GOOGLE_API_KEY=your-API-key
  4. Start the API server.
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 1
  5. Verify service health.
curl http://localhost:8000/health

Expected response:

{"status":"ok"}

Option B: Run with Docker Compose

  1. Create .env from .env.example and set GOOGLE_API_KEY.
  2. Build and run.
docker compose up --build
  3. Check the health endpoint.
curl http://localhost:8000/health

Notes:

  • FAISS data is persisted in Docker volume faiss_data.
  • Default in-container paths:
    • REPO_CLONE_BASE=/tmp/codebase_rag_repos
    • FAISS_STORE_BASE=/tmp/codebase_rag_stores

API Usage

Interactive docs:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

1. Ingest a repository

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"repo_url":"https://github.com/psf/requests"}'

Example response:

{
  "session_id": "requests-1a2b3c4d5e6f7890",
  "files_processed": 123,
  "chunks_indexed": 487
}

2. Ask questions about code (ask mode)

curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"How does retry logic work?",
    "mode":"ask",
    "top_k":6
  }'

3. Generate architecture summary (summary mode)

query is required by the request schema, even in summary mode. Use a placeholder value.

curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"requests-1a2b3c4d5e6f7890",
    "query":"summary",
    "mode":"summary",
    "top_k":6
  }'

Configuration

All settings are in app/config.py and can be overridden with environment variables.

Important variables:

  • GOOGLE_API_KEY: Gemini API key (required)
  • LLM_MODEL_NAME: default gemini-2.0-flash
  • LLM_TEMPERATURE: default 0.1
  • LLM_MAX_OUTPUT_TOKENS: default 4096
  • EMBEDDING_MODEL_NAME: default sentence-transformers/all-MiniLM-L6-v2
  • DEFAULT_TOP_K: default 6
  • REPO_CLONE_BASE: local temp clone dir
  • FAISS_STORE_BASE: persistent FAISS index dir
  • MAX_FILE_SIZE_BYTES: per-file indexing limit (default 512000)
  • MAX_CHUNK_CHARS: semantic chunk size cap (default 8000)
  • FALLBACK_CHUNK_CHARS: fallback splitter size (default 2000)
  • FALLBACK_CHUNK_OVERLAP: fallback overlap (default 200)
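A typical .env for local development might combine these, keeping the documented defaults for everything else. The API key value is a placeholder:

```
GOOGLE_API_KEY=your-API-key
LLM_MODEL_NAME=gemini-2.0-flash
LLM_TEMPERATURE=0.1
DEFAULT_TOP_K=6
REPO_CLONE_BASE=/tmp/codebase_rag_repos
FAISS_STORE_BASE=/tmp/codebase_rag_stores
```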

Supported Source Extensions

Configured via indexable_extensions in app/config.py:

  • .py, .js, .jsx, .ts, .tsx
  • .go, .java, .rs
  • .c, .cpp, .h, .hpp
  • .cs, .rb, .php, .swift, .kt, .scala

Tree-sitter grammars are currently wired for:

  • Python, JavaScript, TypeScript, TSX, Go, Java, Rust, C, C++

If an indexed file extension has no grammar, fallback splitting is used.

Operational Notes

  • Session IDs are deterministic per repo URL. Re-ingesting the same URL reuses the same session directory and rebuilds the index.
  • The clone directory is deleted after ingestion completes.
  • FAISS index directory is retained for later chat requests.
  • Embedding model is pre-warmed on app startup to reduce first-ingest latency.
  • CORS is currently open (allow_origins=["*"]) for easier local integration.

Troubleshooting

500 Ingestion failed or clone errors

  • Confirm the repository URL is public and reachable.
  • Verify Git is installed and accessible in the runtime environment.

No indexable source files found

  • The repository may not contain any files with the configured extensions.
  • Individual files may exceed MAX_FILE_SIZE_BYTES.

No FAISS index found for session

  • Ensure /ingest completed successfully.
  • Confirm you are using the exact returned session_id.

LLM errors / empty responses

  • Ensure GOOGLE_API_KEY is set correctly.
  • Check model name and quota limits for your Google AI account.

Slow startup or first request latency

  • First run may download embedding model weights.
  • Keep container running so model cache is reused.

Development Tips

  • Start with a smaller repository to validate end-to-end behavior quickly.
  • Use /docs to experiment with request payloads.
  • Keep top_k moderate (4-8) for a good quality/latency tradeoff.

About

FastAPI service for semantic code search and Q&A over any public GitHub repository. Ingests repos via Tree-sitter AST parsing, embeds semantic code chunks into a FAISS vector index, and streams grounded LLM responses (Google Gemini) over SSE — supporting both retrieval-augmented Q&A and map-reduce architecture summarization.
