A Retrieval-Augmented Generation (RAG) application for intelligent codebase analysis. Point it at a GitHub repository to get AI-powered insights, summaries, and context-aware Q&A about the code, backed by semantic search over vector embeddings.
- Repository Analysis: Automatically analyze GitHub repositories and extract meaningful insights
- Project Summarization: Generate AI-powered summaries including tech stack, architecture patterns, and code statistics
- RAG-Powered Chat: Ask natural language questions and receive context-aware answers by retrieving relevant code snippets and augmenting LLM prompts
- Vector Embeddings: Uses the Pinecone vector database for semantic search and efficient retrieval of code context
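To make the retrieve-then-augment loop concrete, here is a minimal, dependency-free TypeScript sketch. It assumes the code chunks have already been embedded and are held in memory; in this project the similarity search runs in Pinecone and the embeddings come from OpenAI, so `CodeChunk`, `cosineSimilarity`, `topK`, and `buildPrompt` are hypothetical illustrations rather than the app's actual code.

```typescript
// Hypothetical shape of an embedded code chunk (not the project's real type).
interface CodeChunk {
  id: string;
  path: string; // hypothetical metadata field: source file path
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
// A vector database does this with an index instead of a linear scan.
function topK(query: number[], chunks: CodeChunk[], k: number): CodeChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Augment the user's question with the retrieved snippets before calling the LLM.
function buildPrompt(question: string, context: CodeChunk[]): string {
  const snippets = context
    .map((c) => `File: ${c.path}\n${c.text}`)
    .join("\n\n---\n\n");
  return `Answer the question using only the code context below.\n\n${snippets}\n\nQuestion: ${question}`;
}

// Usage with toy 2-dimensional embeddings and hypothetical file paths.
const demoChunks: CodeChunk[] = [
  { id: "1", path: "src/chat.ts", text: "export async function handleChat() {}", embedding: [0.9, 0.1] },
  { id: "2", path: "README.md", text: "# Docs", embedding: [0.1, 0.9] },
];
const best = topK([1, 0], demoChunks, 1);
console.log(buildPrompt("Where is the chat handler?", best));
```

Real embeddings have hundreds or thousands of dimensions, but the ranking logic is the same; only the retrieved snippets, not the whole repository, are placed in the prompt.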
- Next.js 14 - React framework with App Router and API routes
- React 19 - UI framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- Zod - Runtime type validation
- LangChain - Document loading, chunking, and LLM orchestration
- Pinecone - Vector database for semantic search and retrieval
- OpenAI - GPT-4o for generation, text-embedding-3-small for the embeddings used in retrieval
- Node.js 18+
- Git
- OpenAI API key
- Pinecone API key
- GitHub Personal Access Token
- Clone the repository:
git clone <repository-url>
cd codebase-intelligence
- Install dependencies:
npm install
- Set up environment variables (OpenAI API key, Pinecone API key, and GitHub Personal Access Token):
cp .env.local
- Start the Next.js development server:
npm run dev

The application will be available at http://localhost:3000.
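The README does not list the exact variable names the app reads, so this sketch of a fail-fast startup check uses assumed names: `OPENAI_API_KEY` and `PINECONE_API_KEY` are the variables the official OpenAI and Pinecone Node SDKs read by default, while `GITHUB_TOKEN` and the helper `missingEnvVars` are hypothetical.

```typescript
// Assumed variable names; check the project's .env file for the real ones.
// GITHUB_TOKEN in particular is a hypothetical name.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "GITHUB_TOKEN"] as const;

// Return the names of required variables that are missing or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((key) => !env[key]?.trim());
}

// Usage: here PINECONE_API_KEY is empty and GITHUB_TOKEN is absent,
// so both names end up in `missing`.
const missing = missingEnvVars({ OPENAI_API_KEY: "sk-test", PINECONE_API_KEY: "" });
console.log(missing);
```

In a Next.js server module you would typically call `missingEnvVars(process.env)` and throw if the result is non-empty, so a misconfigured deployment fails at startup instead of on the first API call.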
The application implements a three-stage RAG workflow:
- Ingestion (/api/ingest): Loads GitHub repository files, chunks them into semantic segments, generates vector embeddings, and stores them in Pinecone with metadata.
- Summarization (/api/summarize): Retrieves code context vectors and augments GPT-4o prompts to generate architecture summaries and tech stack analysis.
- Conversation (/api/chat): For each user query, retrieves relevant code context from Pinecone and augments the LLM prompt to provide accurate, code-informed responses.
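The chunking step of the ingestion stage can be sketched as a fixed-size character splitter with overlap. This is a simplified stand-in: the project uses LangChain for chunking, and LangChain's `RecursiveCharacterTextSplitter`, for example, prefers to split on natural separators such as blank lines and newlines. `chunkFile`, `ChunkRecord`, and the metadata fields are hypothetical, and embedding each record and upserting it into Pinecone are omitted.

```typescript
// Hypothetical record shape, ready to be embedded and stored with metadata.
interface ChunkRecord {
  id: string;
  text: string;
  metadata: { path: string; chunkIndex: number };
}

// Split `content` into windows of `size` characters, each overlapping the
// previous window by `overlap` characters.
function chunkFile(path: string, content: string, size = 1000, overlap = 200): ChunkRecord[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const records: ChunkRecord[] = [];
  const step = size - overlap;
  for (let start = 0, index = 0; start < content.length; start += step, index++) {
    records.push({
      id: `${path}#${index}`,
      text: content.slice(start, start + size),
      metadata: { path, chunkIndex: index },
    });
    if (start + size >= content.length) break; // this window reached the end of the file
  }
  return records;
}

// Usage: a 2,500-character file with the defaults yields windows starting at 0, 800, and 1600.
const sampleChunks = chunkFile("src/index.ts", "x".repeat(2500));
console.log(sampleChunks.length); // 3
```

The overlap keeps code that straddles a chunk boundary retrievable from either side, and storing `path` and `chunkIndex` as metadata lets an answer point back to the file a snippet came from.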
To create a production build:
npm run build

Type safety and validation:
- Zod schemas for request validation across all endpoints
- Type-safe API responses with TypeScript inference
- Client-side validation before API calls
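The project uses Zod for these checks. To keep the example dependency-free, the sketch below hand-writes a comparable runtime check for a hypothetical `/api/chat` request body with `question` and `repoUrl` fields; both field names are assumptions. Its result loosely mirrors Zod's `safeParse`, returning `{ success: true, data }` or `{ success: false, error }`, though with a plain string error rather than a `ZodError`.

```typescript
// Hypothetical request shape for /api/chat; the real schema is not shown in this README.
interface ChatRequest {
  question: string;
  repoUrl: string;
}

// Discriminated union in the style of Zod's safeParse result (string error for simplicity).
type ValidationResult<T> = { success: true; data: T } | { success: false; error: string };

// Validate an untrusted request body at runtime and narrow it to ChatRequest.
function parseChatRequest(body: unknown): ValidationResult<ChatRequest> {
  if (typeof body !== "object" || body === null) {
    return { success: false, error: "body must be an object" };
  }
  const { question, repoUrl } = body as Record<string, unknown>;
  if (typeof question !== "string" || question.trim() === "") {
    return { success: false, error: "question must be a non-empty string" };
  }
  if (typeof repoUrl !== "string" || !repoUrl.startsWith("https://github.com/")) {
    return { success: false, error: "repoUrl must be a GitHub URL" };
  }
  return { success: true, data: { question, repoUrl } };
}

// Usage: the same check can run in the browser before fetch() and in the API route.
const result = parseChatRequest({ question: "How is auth handled?", repoUrl: "https://github.com/owner/repo" });
if (result.success) console.log(result.data.question);
else console.error(result.error);
```

With Zod, the same contract would typically be one schema whose inferred type replaces the hand-written `ChatRequest` interface, so client and server share a single definition.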
This project is open source and contributions are welcome! Feel free to open issues, submit pull requests, or suggest improvements.