A terminal-styled AI agent that analyzes any GitHub repository and answers questions about its codebase. Point it at any public GitHub URL and get a comprehensive architectural breakdown — then ask follow-up questions in a conversational interface.
Demo video: query-any-git-repo-neo.1.mp4
- Clones any GitHub repository into a temporary directory
- Scans the entire codebase recursively, ignoring binaries, .git, node_modules, etc. (see the sketch after this list)
- Generates a structured analysis report covering:
  - Directory hierarchy and file inventory
  - Key components, modules, and their interactions
  - Dependency and import relationships
  - Architectural patterns and entry points
  - Data flow across the project
- Answers follow-up questions about the codebase in a multi-turn conversational interface
- Streams progress in real-time showing map/reduce pipeline stages
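As a rough illustration of the scan step (the actual logic lives in scanner.py; the ignore list and extensions below are assumptions):

```python
import os

# Assumed ignore rules -- scanner.py's real lists may differ.
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
TEXT_EXTENSIONS = {".py", ".js", ".ts", ".md", ".json", ".yml", ".toml", ".html", ".css"}

def collect_source_files(root: str) -> list[str]:
    """Recursively collect text-based source files, skipping ignored dirs and binaries."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                files.append(os.path.join(dirpath, name))
    return files
```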
- LLM: Multi-provider support — OpenRouter (default), MiniMax, or any OpenAI-compatible endpoint
- Backend: Python + Flask with Server-Sent Events (SSE) for streaming
- Analysis: Parallel map-reduce chunking with token-aware hierarchical reduction
- Frontend: Terminal-styled dark web UI
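For readers unfamiliar with SSE, here is a minimal sketch of the streaming pattern the backend relies on (the route name and payloads are illustrative, not the project's actual endpoints in app.py):

```python
import json
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route("/progress-demo")  # illustrative route, not one of app.py's real endpoints
def progress_demo():
    def events():
        for stage in ("clone", "scan", "map", "reduce", "report"):
            # Each SSE frame is "data: <payload>\n\n"; the browser sees it as a message event.
            yield f"data: {json.dumps({'stage': stage})}\n\n"
            time.sleep(0.2)
    return Response(events(), mimetype="text/event-stream")
```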
git clone https://github.com/gauravvij/query-any-repo.git
cd query-any-repo
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your API key (see "Supported Providers" below)
python app.py

Open your browser at http://localhost:5000
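A minimal sketch of how the startup configuration could be loaded, assuming python-dotenv is used to read .env (check app.py and requirements.txt for the project's actual approach):

```python
import os
from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # copies variables from .env into the process environment

HOST = os.getenv("FLASK_HOST", "0.0.0.0")
PORT = int(os.getenv("FLASK_PORT", "5000"))
PROVIDER = os.getenv("LLM_PROVIDER", "openrouter")
```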
- Paste any public GitHub repository URL into the input field
- Click Analyze — the agent clones the repo and runs the map-reduce analysis pipeline
- View the structured report in the terminal-styled interface
- Ask follow-up questions about the codebase in the chat input below
All configuration is done via .env (see .env.example):
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | Provider preset: openrouter, minimax, or custom | openrouter |
| OPENROUTER_API_KEY | OpenRouter API key (when using OpenRouter) | (required for openrouter) |
| MINIMAX_API_KEY | MiniMax API key (when using MiniMax) | (required for minimax) |
| MODEL_NAME | LLM model to use | Provider-dependent |
| LLM_BASE_URL | Override base URL for any provider | Provider-dependent |
| LLM_API_KEY | Override API key for any provider | Provider-dependent |
| FLASK_HOST | Server bind host | 0.0.0.0 |
| FLASK_PORT | Server port | 5000 |
| GITHUB_CLONE_BASE | Temp dir for cloned repos | /tmp/codebase_agent_repos |
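One plausible way the presets and overrides above resolve into a single OpenAI-compatible client (a sketch only; the base URLs and names are assumptions, and the project's real resolution logic may differ):

```python
import os
from openai import OpenAI  # all presets speak the OpenAI-compatible API

# Hypothetical preset table; base URLs shown for illustration only.
PRESETS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_var": "OPENROUTER_API_KEY"},
    "minimax":    {"base_url": "https://api.minimax.io/v1",    "key_var": "MINIMAX_API_KEY"},
    "custom":     {"base_url": None,                           "key_var": "LLM_API_KEY"},
}

def build_client() -> OpenAI:
    preset = PRESETS[os.getenv("LLM_PROVIDER", "openrouter")]
    # LLM_BASE_URL / LLM_API_KEY override whatever the preset would choose.
    base_url = os.getenv("LLM_BASE_URL") or preset["base_url"]
    api_key = os.getenv("LLM_API_KEY") or os.getenv(preset["key_var"])
    return OpenAI(base_url=base_url, api_key=api_key)
```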
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=your_key_here
MODEL_NAME=google/gemini-2.5-flash-lite  # or any OpenRouter model

MiniMax offers high-performance models with a 204K token context window — ideal for analyzing large codebases in fewer chunks.
LLM_PROVIDER=minimax
MINIMAX_API_KEY=your_key_here
# MODEL_NAME=MiniMax-M2.7 # default — latest flagship, enhanced reasoning
# MODEL_NAME=MiniMax-M2.7-highspeed # high-speed version for low-latency scenarios
# MODEL_NAME=MiniMax-M2.5 # previous generation
# MODEL_NAME=MiniMax-M2.5-highspeed  # previous generation, high-speed

| Model | Context Window | Input Price | Output Price |
|---|---|---|---|
| MiniMax-M2.7 | 204,800 tokens | $0.3/M tokens | $1.2/M tokens |
| MiniMax-M2.7-highspeed | 204,800 tokens | $0.6/M tokens | $2.4/M tokens |
| MiniMax-M2.5 | 204,800 tokens | $0.3/M tokens | $1.2/M tokens |
| MiniMax-M2.5-highspeed | 204,800 tokens | $0.6/M tokens | $2.4/M tokens |
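For a rough sense of scale: a map pass over ~2M input tokens of source on MiniMax-M2.7 costs about 2 × $0.3 = $0.60 for input, before output tokens and the reduce passes are counted.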
Get your API key at platform.minimax.io.
Any OpenAI-compatible endpoint can be used:
LLM_PROVIDER=custom
LLM_API_KEY=your_key_here
LLM_BASE_URL=https://your-endpoint/v1
MODEL_NAME=your-model-name

app.py           # Flask server, SSE streaming endpoints
agent.py # Parallel map-reduce LLM analysis pipeline
scanner.py # Recursive directory scanner and file parser
github_utils.py # GitHub repo cloning utility
templates/
index.html # Terminal-styled web UI
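For orientation, cloning into GITHUB_CLONE_BASE can be as simple as the sketch below (illustrative only; the real implementation is in github_utils.py):

```python
import os
import subprocess
import uuid

def clone_repo(repo_url: str, base_dir: str = "/tmp/codebase_agent_repos") -> str:
    """Clone a public repo into a unique temp directory and return its path (sketch only)."""
    dest = os.path.join(base_dir, uuid.uuid4().hex)
    os.makedirs(dest, exist_ok=True)
    # --depth 1 keeps the clone small; the project itself may do a full clone.
    subprocess.run(["git", "clone", "--depth", "1", repo_url, dest], check=True)
    return dest
```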
- Scan — recursively traverse the cloned repo, collect all text-based source files
- Chunk — split large codebases into token-safe chunks (well under 128k tokens each)
- Map — summarize each chunk in parallel using ThreadPoolExecutor (sketched after this list)
- Reduce — hierarchically merge summaries (token-aware, capped at 100k tokens per call)
- Report — produce final structured analysis
- Q&A — answer follow-up questions with full report context preserved
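A condensed sketch of the map and reduce stages (function names, prompts, and the grouping factor are illustrative; the real pipeline in agent.py adds token counting, chunk sizing, and progress events):

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(text: str, instruction: str) -> str:
    """Stand-in for one LLM call through the configured OpenAI-compatible provider."""
    # The real pipeline sends `instruction` plus `text` to the model; this stub just truncates.
    return text[:200]

def analyze(chunks: list[str], group_size: int = 8) -> str:
    if not chunks:
        return ""
    # Map: summarize every chunk in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        summaries = list(pool.map(lambda c: summarize(c, "Summarize this code chunk."), chunks))
    # Reduce: merge summaries in groups until one report remains,
    # so each merge call stays within the per-call token budget.
    while len(summaries) > 1:
        groups = [summaries[i:i + group_size] for i in range(0, len(summaries), group_size)]
        summaries = [summarize("\n\n".join(g), "Merge these partial summaries.") for g in groups]
    return summaries[0]
```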
MIT