gauravvij/GithubRepoAgent

Made by NEO

NEO — Your autonomous AI Engineering Agent. Try NEO in your VS Code IDE for your AI/ML tasks.


Query Any GitHub Repo

A terminal-styled AI agent that analyzes any GitHub repository and answers questions about its codebase. Point it at any public GitHub URL and get a comprehensive architectural breakdown — then ask follow-up questions in a conversational interface.

Demo video: query-any-git-repo-neo.1.mp4

What It Does

  • Clones any GitHub repository into a temporary directory
  • Scans the entire codebase recursively, ignoring binaries, .git, node_modules, etc.
  • Generates a structured analysis report covering:
    • Directory hierarchy and file inventory
    • Key components, modules, and their interactions
    • Dependency and import relationships
    • Architectural patterns and entry points
    • Data flow across the project
  • Answers follow-up questions about the codebase in a multi-turn conversational interface
  • Streams progress in real time, showing map/reduce pipeline stages

Tech Stack

  • LLM: Multi-provider support — OpenRouter (default), MiniMax, or any OpenAI-compatible endpoint
  • Backend: Python + Flask with Server-Sent Events (SSE) for streaming
  • Analysis: Parallel map-reduce chunking with token-aware hierarchical reduction
  • Frontend: Terminal-styled dark web UI
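To show what the SSE streaming looks like on the wire: each event is an optional "event:" line, a "data:" line, and a blank line. A minimal stdlib sketch (the event name and payload shape are assumptions, not the app's actual protocol):

```python
import json

def sse_event(event, data):
    """Format one Server-Sent Event frame: an `event:` line,
    a `data:` line with a JSON payload, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def progress_stream(stages):
    """Generator yielding one SSE frame per pipeline stage.
    In Flask this would be served as:
    Response(progress_stream(...), mimetype="text/event-stream")"""
    for stage in stages:
        yield sse_event("progress", {"stage": stage})
```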

Getting Started

1. Clone the repo

git clone https://github.com/gauravvij/query-any-repo.git
cd query-any-repo

2. Install dependencies

pip install -r requirements.txt

3. Configure environment

cp .env.example .env
# Edit .env and add your API key (see "Supported Providers" below)

4. Run the server

python app.py

Open your browser at http://localhost:5000

Usage

  1. Paste any public GitHub repository URL into the input field
  2. Click Analyze — the agent clones the repo and runs the map-reduce analysis pipeline
  3. View the structured report in the terminal-styled interface
  4. Ask follow-up questions about the codebase in the chat input below

Configuration

All configuration is done via .env (see .env.example):

| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | Provider preset: openrouter, minimax, or custom | openrouter |
| OPENROUTER_API_KEY | OpenRouter API key (when using OpenRouter) | (required for openrouter) |
| MINIMAX_API_KEY | MiniMax API key (when using MiniMax) | (required for minimax) |
| MODEL_NAME | LLM model to use | Provider-dependent |
| LLM_BASE_URL | Override base URL for any provider | Provider-dependent |
| LLM_API_KEY | Override API key for any provider | Provider-dependent |
| FLASK_HOST | Server bind host | 0.0.0.0 |
| FLASK_PORT | Server port | 5000 |
| GITHUB_CLONE_BASE | Temp dir for cloned repos | /tmp/codebase_agent_repos |
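These settings can be read with plain os.getenv, falling back to the documented defaults. The helper below is an illustrative sketch, not the app's actual loader (variable names and defaults are taken from the table above):

```python
import os

def load_config():
    """Read the documented settings, using the table's defaults as fallbacks."""
    return {
        "provider": os.getenv("LLM_PROVIDER", "openrouter"),
        "host": os.getenv("FLASK_HOST", "0.0.0.0"),
        "port": int(os.getenv("FLASK_PORT", "5000")),
        "clone_base": os.getenv("GITHUB_CLONE_BASE", "/tmp/codebase_agent_repos"),
    }
```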

Supported Providers

OpenRouter (default)

LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=your_key_here
MODEL_NAME=google/gemini-2.5-flash-lite   # or any OpenRouter model

MiniMax

MiniMax offers high-performance models with a 204K token context window — ideal for analyzing large codebases in fewer chunks.

LLM_PROVIDER=minimax
MINIMAX_API_KEY=your_key_here
# MODEL_NAME=MiniMax-M2.7              # default — latest flagship, enhanced reasoning
# MODEL_NAME=MiniMax-M2.7-highspeed    # high-speed version for low-latency scenarios
# MODEL_NAME=MiniMax-M2.5              # previous generation
# MODEL_NAME=MiniMax-M2.5-highspeed    # previous generation, high-speed
| Model | Context Window | Input Price | Output Price |
|---|---|---|---|
| MiniMax-M2.7 | 204,800 tokens | $0.3/M tokens | $1.2/M tokens |
| MiniMax-M2.7-highspeed | 204,800 tokens | $0.6/M tokens | $2.4/M tokens |
| MiniMax-M2.5 | 204,800 tokens | $0.3/M tokens | $1.2/M tokens |
| MiniMax-M2.5-highspeed | 204,800 tokens | $0.6/M tokens | $2.4/M tokens |

Get your API key at platform.minimax.io.

Custom Provider

Any OpenAI-compatible endpoint can be used:

LLM_PROVIDER=custom
LLM_API_KEY=your_key_here
LLM_BASE_URL=https://your-endpoint/v1
MODEL_NAME=your-model-name
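An OpenAI-compatible endpoint accepts POST requests at {base_url}/chat/completions. A stdlib sketch of building (not sending) such a request; the URL and model here are placeholders:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a Chat Completions request for any
    OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a call to urllib.request.urlopen (or any HTTP client) against the built request.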

Architecture

app.py              # Flask server, SSE streaming endpoints
agent.py            # Parallel map-reduce LLM analysis pipeline
scanner.py          # Recursive directory scanner and file parser
github_utils.py     # GitHub repo cloning utility
templates/
  index.html        # Terminal-styled web UI

Analysis Pipeline

  1. Scan — recursively traverse the cloned repo, collect all text-based source files
  2. Chunk — split large codebases into token-safe chunks (well under 128k tokens each)
  3. Map — summarize each chunk in parallel using ThreadPoolExecutor
  4. Reduce — hierarchically merge summaries (token-aware, capped at 100k tokens per call)
  5. Report — produce final structured analysis
  6. Q&A — answer follow-up questions with full report context preserved
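The map and reduce stages above can be sketched as follows. The summarize and merge callables are stand-ins for the LLM calls made in agent.py, and the worker count and fan-in are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce(chunks, summarize, merge, fan_in=4):
    """Map: summarize chunks in parallel. Reduce: hierarchically merge
    summaries fan_in at a time until a single summary remains."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map preserves input order, so summaries line up with chunks.
        summaries = list(pool.map(summarize, chunks))
    while len(summaries) > 1:
        summaries = [
            merge(summaries[i:i + fan_in])
            for i in range(0, len(summaries), fan_in)
        ]
    return summaries[0]
```

The hierarchical reduce keeps each merge call within a token budget: instead of concatenating every summary into one oversized prompt, it folds a few at a time.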

License

MIT

About

An LLM agent to query any GitHub repo: ask questions about features, gaps, and improvements.
