AI-powered semantic search for your codebase in GitHub Copilot, Kiro, and other MCP-compatible editors
A Model Context Protocol (MCP) server that enables AI editors to search and understand your codebase using Google's Gemini embeddings and Qdrant vector storage.
Supported Editors:
- VS Code with GitHub Copilot
- VS Code with Roo Cline
- GitHub Copilot CLI
- Google Gemini CLI
- Kiro AI Editor
- Any MCP-compatible editor
Documentation:
- Full Documentation - Complete documentation
- Setup Guide - VS Code - Installation for VS Code Copilot
- Setup Guide - CLI - Installation for GitHub Copilot CLI
- Setup Guide - Gemini CLI - Installation for Google Gemini CLI
- Setup Guide - Kiro - Installation for Kiro AI Editor
- Setup Guide - Roo Cline - Installation for Roo Cline (VS Code)
- Quick Reference - Command cheat sheet
- Navigation Guide - Find any doc quickly
- Source Code Structure - Code organization
- MCP Server Guide - Build your own MCP server
- Roadmap - Future plans
- Qdrant Setup - Get Qdrant credentials
- Testing Guide - Test search functionality
- Prompt Enhancement Guide - Use prompt enhancement effectively
- Vector Visualization Guide - Visualize your codebase
- Changelog - Version history
Features:
- Semantic Search - Find code by meaning, not just keywords
- Smart Chunking - Automatically splits code into logical functions/classes
- Incremental Indexing - Only re-indexes changed files (90%+ time savings)
- Auto-save Checkpoints - Saves progress every 10 files, resume anytime
- Real-time Progress - Track indexing with ETA and performance metrics
- Parallel Processing - 25x faster indexing with batch execution
- Real-time Watch - Auto-updates index on file changes
- Multi-language - Supports 15+ programming languages
- Vector Storage - Uses Qdrant for persistent storage
- Prompt Enhancement - AI-powered query improvement (optional)
- Vector Visualization - 2D/3D UMAP visualization of your codebase
- Modular Architecture - Clean handler separation for maintainability
- Simple Setup - Just 4 environment variables
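Incremental indexing and auto-save checkpoints can be pictured with the minimal TypeScript sketch below. It assumes, hypothetically, that each run stores a SHA-256 hash per file path: files whose hash changed (or that are new) are re-indexed, deleted files are reported, and progress is checkpointed every 10 files. The names diffFiles and indexWithCheckpoints are illustrative only, not this package's API, and the real indexer may detect changes differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch, not the package's actual indexer.
// Assumes the previous run stored: file path -> SHA-256 of its contents.
type HashMap = Map<string, string>;

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compare current file contents with the stored hashes and report which
// files need (re-)indexing and which disappeared since the last run.
function diffFiles(
  current: Map<string, string>, // path -> file contents
  previous: HashMap,
): { changed: string[]; deleted: string[]; hashes: HashMap } {
  const hashes: HashMap = new Map();
  const changed: string[] = [];
  for (const [path, content] of current) {
    const h = sha256(content);
    hashes.set(path, h);
    if (previous.get(path) !== h) changed.push(path); // new or modified
  }
  const deleted = [...previous.keys()].filter((p) => !current.has(p));
  return { changed, deleted, hashes };
}

// Index files one by one, calling saveCheckpoint every `every` files so an
// interrupted run could resume from the last saved list.
function indexWithCheckpoints(
  files: string[],
  indexFile: (path: string) => void,
  saveCheckpoint: (done: string[]) => void,
  every = 10,
): void {
  const done: string[] = [];
  for (const f of files) {
    indexFile(f);
    done.push(f);
    if (done.length % every === 0) saveCheckpoint([...done]);
  }
  // Save the final partial batch, if any.
  if (done.length % every !== 0) saveCheckpoint([...done]);
}
```

Only the files in `changed` are passed to the indexer on later runs, which is where the time savings of incremental indexing come from.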
Prerequisites:
- Gemini API Key - Get free at Google AI Studio
- Qdrant Cloud Account - Sign up free at cloud.qdrant.io
Choose your environment:
- VS Code Users: Follow steps below or see Roo Cline Setup
- Copilot CLI Users: See Copilot CLI Setup Guide
- Gemini CLI Users: See Gemini CLI Setup Guide
- Kiro Users: See Kiro Setup Guide
Step 1: Open MCP Configuration in VS Code
- Open GitHub Copilot Chat (Ctrl+Alt+I / Cmd+Alt+I)
- Click the Settings icon → MCP Servers → MCP Configuration (JSON)
Step 2: Add this configuration to mcp.json:
```json
{
  "servers": {
    "codebase": {
      "command": "npx",
      "args": ["-y", "@ngotaico/mcp-codebase-index"],
      "env": {
        "REPO_PATH": "/absolute/path/to/your/project",
        "GEMINI_API_KEY": "AIzaSyC...",
        "QDRANT_URL": "https://your-cluster.gcp.cloud.qdrant.io:6333",
        "QDRANT_API_KEY": "eyJhbGci..."
      },
      "type": "stdio"
    }
  }
}
```
Step 3: Restart VS Code
The server will automatically:
- Connect to Qdrant Cloud
- Index your codebase
- Watch for file changes
Detailed instructions: Setup Guide - VS Code
Search your code - ask GitHub Copilot:
"Find the authentication logic"
"Show me how database connections are handled"
"Where is error logging implemented?"
Visualize your codebase - ask GitHub Copilot:
"Visualize my codebase"
"Show me how my code is organized"
"Visualize authentication code"
Complete guide: Vector Visualization Guide
Monitor indexing - ask GitHub Copilot:
"Check indexing status"
"Show me detailed indexing progress"
More examples: Testing Guide
See your codebase in 2D/3D space - Understand semantic relationships and code organization visually.
Vector visualization transforms your codebase's 768-dimensional embeddings into interactive 2D or 3D visualizations using UMAP dimensionality reduction. This allows you to:
- Explore semantic relationships - Similar code clusters together
- Understand architecture - See your codebase structure at a glance
- Debug search results - Visualize why certain code was retrieved
- Track code organization - Identify modules, patterns, and outliers
Visualize entire codebase:
User: "Visualize my codebase"
Result: Interactive clusters showing:
- API Controllers & Routes (28%)
- Database Models (23%)
- Authentication (19%)
- Business Logic (18%)
- Test Suites (12%)
Export as HTML:
User: "Export visualization as HTML"
Result: Standalone HTML file with:
- Interactive hover, zoom, pan
- Click clusters to highlight
- Modern gradient UI
- Works offline
Colors and Clusters:
- Each color represents a semantic cluster (module/functionality)
- Points close together = similar in meaning
- Distance reflects semantic similarity
- Outliers indicate unique/specialized code
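To make "distance reflects semantic similarity" concrete, here is a sketch of cosine similarity, a common way to compare embedding vectors, plus a brute-force top-k ranking. Whether this server's Qdrant collection uses cosine or another distance metric is an assumption here, and Qdrant itself ranks with an approximate nearest-neighbour index (HNSW) rather than a linear scan like topK below.

```typescript
// Cosine similarity between two embeddings: 1 = same direction,
// 0 = orthogonal (unrelated), -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; treat them as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding (brute force).
function topK(
  query: number[],
  chunks: { id: string; vector: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Points that score close to 1 against each other are the ones that end up near each other in the 2D/3D projection.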
Common Cluster Patterns:
- Blue: Frontend/UI components
- Orange: API endpoints and routes
- Green: Database models and queries
- Red: Authentication and security
- Purple: Tests and validation
- Gray: Utilities and helpers
Use Cases:
- Architecture Understanding
  - Visualize to see module boundaries
  - Identify tightly coupled code
  - Find opportunities for refactoring
- Code Discovery
  - Locate related functionality visually
  - Find all code touching a feature
  - Discover cross-cutting concerns
- Search Debugging
  - Understand why results were retrieved
  - See semantic relationships
  - Refine queries based on visualization
- Team Onboarding
  - Export HTML for new developers
  - Visual guide to codebase structure
  - Interactive exploration tool
- Refactoring Validation
  - Visualize before/after refactoring
  - Verify improved code organization
  - Track architecture evolution
| Collection Size | Processing Time | Recommended maxVectors |
|---|---|---|
| Small (<500 vectors) | ~1s | 500 |
| Medium (500-2K) | ~4s | 1000 |
| Large (2K-10K) | ~15s | 2000 |
| Very Large (>10K) | ~30s | 3000 |
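The table can be read as a simple lookup. The sketch below encodes it as a hypothetical recommendedMaxVectors helper (the behaviour exactly at the 2K and 10K boundaries is an assumption) and shows one way a collection could be down-sampled to that limit by taking evenly spaced points. Neither function is part of the server's API, and the tool's real sampling strategy may differ.

```typescript
// Hypothetical lookup mirroring the table above.
function recommendedMaxVectors(collectionSize: number): number {
  if (collectionSize < 500) return 500;    // Small
  if (collectionSize <= 2000) return 1000; // Medium (500-2K)
  if (collectionSize <= 10000) return 2000; // Large (2K-10K)
  return 3000;                             // Very Large (>10K)
}

// Keep at most `max` items by taking evenly spaced entries
// (one simple down-sampling strategy, assumed for illustration).
function sampleEvenly<T>(items: T[], max: number): T[] {
  if (items.length <= max) return items.slice();
  const step = items.length / max;
  return Array.from({ length: max }, (_, i) => items[Math.floor(i * step)]);
}
```

For a 5,000-vector collection this suggests visualizing 2,000 points, which keeps UMAP processing time closer to the "Large" row.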
Tips:
- Use 2D for faster processing (40% faster than 3D)
- Limit maxVectors for large codebases
- Export HTML for offline exploration
For detailed documentation including:
- Complete tool reference
- Interpretation guide
- Technical details (UMAP, clustering)
- Troubleshooting
- Best practices
- Advanced use cases
See: Vector Visualization Guide
TL;DR: Prompt enhancement is a transparent background tool that automatically improves search quality. Just ask naturally - no need to mention "enhance" in your prompts.
When enabled (PROMPT_ENHANCEMENT=true), the AI automatically:
- Enhances your search query with codebase context
- Searches with the improved query
- Continues with your original request (implement, fix, explain, etc.)
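The three steps above can be sketched as a small pipeline. The enhance and search functions are injected, hypothetical stand-ins for the server's enhancement and search tools (their real names and signatures are not documented here). The point of the sketch is that the original request travels through unchanged so the assistant can continue with it, and that with enhancement disabled the raw query is searched as-is.

```typescript
// Hypothetical signatures; the real tools may look different.
type Enhancer = (query: string, codebaseContext: string) => string;
type Searcher = (query: string) => string[];

interface EnhancedSearchResult {
  originalRequest: string; // e.g. "Find authentication logic and add 2FA support"
  usedQuery: string;       // the query actually sent to search
  results: string[];       // matching chunks or file locations
}

function enhancedSearch(
  originalRequest: string,
  searchQuery: string,
  codebaseContext: string,
  enabled: boolean, // mirrors PROMPT_ENHANCEMENT=true/false
  enhance: Enhancer,
  search: Searcher,
): EnhancedSearchResult {
  // Step 1: enhance the query with codebase context (only when enabled).
  const usedQuery = enabled ? enhance(searchQuery, codebaseContext) : searchQuery;
  // Step 2: search with the (possibly) improved query.
  const results = search(usedQuery);
  // Step 3: return the results next to the untouched original request so the
  // assistant can continue with it (implement, fix, explain, ...).
  return { originalRequest, usedQuery, results };
}
```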
Prompts that work:
- "Find authentication logic and add 2FA support"
- "Locate payment flow and fix the timeout issue"
- "Search for profile feature and add bio field"
Why these work: Clear goal (find + action) → AI knows what to do
Prompts that don't work:
- "Enhance and search for authentication"
- "Use prompt enhancement to find profile"
Why these fail: No clear action → AI stops after search
Prompt enhancement is invisible infrastructure.
Just tell the AI what you want to accomplish. It will automatically use enhancement to improve search quality behind the scenes.
Think of it like autocomplete: You don't say "use autocomplete" - you just type and it helps automatically.
For the detailed guide, including:
- Technical details and architecture
- Configuration options
- Real-world examples (TypeScript, Python, Dart, etc.)
- Performance tips and optimization
- Troubleshooting and FAQ
- Advanced use cases
See: Prompt Enhancement Guide
Required environment variables:
```json
{
  "env": {
    "REPO_PATH": "/Users/you/Projects/myapp",
    "GEMINI_API_KEY": "AIzaSyC...",
    "QDRANT_URL": "https://xxx.gcp.cloud.qdrant.io:6333",
    "QDRANT_API_KEY": "eyJhbGci..."
  }
}
```
Optional environment variables:
```json
{
  "env": {
    "QDRANT_COLLECTION": "my_project",
    "WATCH_MODE": "true",
    "BATCH_SIZE": "50",
    "EMBEDDING_MODEL": "text-embedding-004",
    "PROMPT_ENHANCEMENT": "true"
  }
}
```
Full configuration guide: Setup Guide
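Environment values in mcp.json are always strings, so "true" and "50" have to be parsed. Below is a minimal sketch of parsing the optional variables, assuming strict "true"/"false" booleans and a positive-integer BATCH_SIZE. parseOptionalConfig is a hypothetical name, and unset variables are deliberately left undefined rather than guessing the server's defaults.

```typescript
// Hypothetical parsed shape of the optional variables above.
interface OptionalConfig {
  qdrantCollection?: string;
  watchMode?: boolean;
  batchSize?: number;
  embeddingModel?: string;
  promptEnhancement?: boolean;
}

// Accept only "true"/"false" (an assumption; the server may be more lenient).
function parseBool(name: string, value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`${name} must be "true" or "false", got "${value}"`);
}

function parseOptionalConfig(env: Record<string, string | undefined>): OptionalConfig {
  let batchSize: number | undefined;
  if (env.BATCH_SIZE !== undefined) {
    batchSize = Number(env.BATCH_SIZE);
    if (!Number.isInteger(batchSize) || batchSize <= 0) {
      throw new Error(`BATCH_SIZE must be a positive integer, got "${env.BATCH_SIZE}"`);
    }
  }
  return {
    qdrantCollection: env.QDRANT_COLLECTION,
    watchMode: parseBool("WATCH_MODE", env.WATCH_MODE),
    batchSize,
    embeddingModel: env.EMBEDDING_MODEL,
    promptEnhancement: parseBool("PROMPT_ENHANCEMENT", env.PROMPT_ENHANCEMENT),
  };
}
```

In a Node process this would be called as `parseOptionalConfig(process.env)`.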
Supported languages: Python • TypeScript • JavaScript • Dart • Go • Rust • Java • Kotlin • Swift • Ruby • PHP • C • C++ • C# • Shell • SQL • HTML • CSS
| Metric | Value |
|---|---|
| Indexing Speed | ~25 files/min |
| Search Latency | <100ms |
| Incremental Savings | 90%+ time reduction |
| Parallel Processing | 25 chunks/sec |
Performance details: Main Documentation
Troubleshooting:
- Check Copilot Chat → Settings → MCP Servers → Show Output
- Verify all 4 env variables are set
- Ensure REPO_PATH is an absolute path
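The last two checks above can be automated before starting the server. This sketch, using a hypothetical checkRequiredEnv helper, reports missing required variables, a relative REPO_PATH, and a QDRANT_URL that does not parse as a URL. It relies only on Node's built-in path module and URL class and does not contact Qdrant.

```typescript
import { isAbsolute } from "node:path";

// The four required variables from the configuration above.
const REQUIRED = ["REPO_PATH", "GEMINI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"] as const;

// Returns human-readable problems; an empty array means the config looks OK.
function checkRequiredEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED) {
    if (!env[name]?.trim()) problems.push(`${name} is not set`);
  }
  const repo = env.REPO_PATH;
  if (repo && !isAbsolute(repo)) {
    problems.push(`REPO_PATH must be an absolute path, got "${repo}"`);
  }
  const url = env.QDRANT_URL;
  if (url) {
    try {
      new URL(url); // throws TypeError for strings that are not valid URLs
    } catch {
      problems.push(`QDRANT_URL is not a valid URL: "${url}"`);
    }
  }
  return problems;
}
```

For example, `checkRequiredEnv(process.env)` returns an empty array when the required configuration looks valid.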
Check the Qdrant connection:
```bash
curl -H "api-key: YOUR_KEY" \
  https://YOUR_CLUSTER.gcp.cloud.qdrant.io:6333/collections
```
Initial indexing is slow:
- Large repos take 5-10 minutes initially
- Subsequent runs only index changed files (90%+ faster)
More troubleshooting: Main Documentation
```
mcp-codebase-index/
├── docs/                    # All documentation
│   ├── README.md            # Main documentation
│   ├── SETUP.md             # Setup guide
│   ├── CHANGELOG.md         # Version history
│   ├── NAVIGATION.md        # Navigation guide
│   ├── guides/              # Detailed guides
│   └── planning/            # Development planning
│
├── src/                     # Source code
│   ├── core/                # Core business logic
│   ├── storage/             # Data persistence
│   ├── enhancement/         # Prompt enhancement
│   ├── visualization/       # Vector visualization
│   ├── mcp/                 # MCP server
│   │   ├── server.ts        # Server orchestration (1237 lines)
│   │   ├── handlers/        # Modular handlers (1045 lines)
│   │   ├── templates/       # HTML templates
│   │   └── types/           # Handler types
│   ├── types/               # Type definitions
│   └── index.ts             # Entry point
│
├── config/                  # Configuration files
├── .data/                   # Runtime data (gitignored)
├── package.json
└── README.md                # This file
```
Detailed structure: Project Structure | Source Code Structure
Development:
```bash
npm run build
npm run dev
npm test
```
Development guide: Source Code Structure
Contributions welcome! Check out:
- Improvement Plan - Roadmap
- Issues - Detailed feature docs
- Source Code - Code structure
MIT © NgoTaiCo
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: ngotaico.flutter@gmail.com
If you find this useful, please star the repo!