nancy-kataria/codebase-intelligence

Codebase Intelligence

A Retrieval-Augmented Generation (RAG) application for intelligent codebase analysis. Upload any GitHub repository and get instant AI-powered insights, summaries, and context-aware Q&A about your code through semantic search and vector embeddings.

Built with Next.js · React · TypeScript · OpenAI · Pinecone · LangChain

Features

  • Repository Analysis: Automatically analyze GitHub repositories and extract meaningful insights
  • Project Summarization: Generate AI-powered summaries including tech stack, architecture patterns, and code statistics
  • RAG-Powered Chat: Ask natural language questions and receive context-aware answers by retrieving relevant code snippets and augmenting LLM prompts
  • Vector Embeddings: Uses Pinecone vector database for semantic search and efficient code context retrieval

Tech Stack

Frontend & Backend

  • Next.js 14 - React framework with App Router and API routes
  • React 19 - UI framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • Zod - Runtime type validation

AI & Vector Database (RAG Stack)

  • LangChain - Document loading, chunking, and LLM orchestration
  • Pinecone - Vector database for semantic search and retrieval
  • OpenAI - GPT-4o for generation, text-embedding-3-small for retrieval
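Retrieval in this stack rests on embedding similarity: the query is embedded with text-embedding-3-small and compared against stored code-chunk embeddings. As a rough illustration of the ranking Pinecone performs server-side, here is a minimal dependency-free TypeScript sketch of cosine-similarity top-k retrieval (the vectors, chunk shape, and function names are illustrative, not taken from this codebase):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding and return the
// ids of the k closest matches (the essence of semantic top-k retrieval).
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.id);
}
```

In practice the application delegates this ranking to Pinecone's query API; the sketch only shows the underlying similarity measure.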

Getting Started

Prerequisites

  • Node.js 18+
  • Git
  • OpenAI API key
  • Pinecone API key
  • GitHub Personal Access Token

Installation

  1. Clone the repository:

     git clone <repository-url>
     cd codebase-intelligence

  2. Install dependencies:

     npm install

  3. Set up environment variables: create a `.env.local` file in the project root containing your OpenAI, Pinecone, and GitHub credentials.
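The `.env.local` file typically holds the keys listed under Prerequisites. The variable names below are assumptions for illustration; check the source for the names the application actually reads:

```
# Hypothetical variable names - verify against the codebase
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
GITHUB_TOKEN=ghp_...
```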

Development

Start the Next.js development server:

npm run dev

The application will be available at http://localhost:3000

RAG Pipeline

The application implements a three-stage RAG workflow:

  1. Ingestion (/api/ingest): Loads GitHub repository files, chunks them into semantic segments, generates vector embeddings, and stores them in Pinecone with metadata
  2. Summarization (/api/summarize): Retrieves code context vectors and augments GPT-4o prompts to generate architecture summaries and tech stack analysis
  3. Conversation (/api/chat): For each user query, retrieves relevant code context from Pinecone and augments the LLM prompt to provide accurate, code-informed responses
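The chunking step in the ingestion stage can be pictured as an overlapping splitter: each chunk shares some text with its neighbor so that context spanning a boundary survives in at least one chunk. LangChain's text splitters do this with more sophistication (token counts, separator awareness); the TypeScript sketch below is a simplified stand-in, and its parameter values are illustrative rather than the project's actual settings:

```typescript
// Split text into fixed-size chunks with overlap, so content near a chunk
// boundary appears in both neighboring chunks and is never lost to retrieval.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each resulting chunk would then be embedded and upserted into Pinecone along with metadata (e.g. file path) for later retrieval.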

Build

npm run build

Validation

  • Zod Schemas for request validation across all endpoints
  • Type-safe API responses with TypeScript inference
  • Client-side validation before API calls
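The Zod schemas themselves live in the codebase. To illustrate the kind of request-shape check they enforce, here is a dependency-free TypeScript type guard for a hypothetical ingest request (the field name `repoUrl` and the GitHub-hostname rule are assumptions, not the project's actual schema; Zod's `z.object(...).safeParse` plays this role in the project):

```typescript
interface IngestRequest {
  repoUrl: string;
}

// Narrow an unknown request body to IngestRequest, rejecting malformed
// input before it reaches the RAG pipeline.
function isIngestRequest(body: unknown): body is IngestRequest {
  if (typeof body !== "object" || body === null) return false;
  const repoUrl = (body as Record<string, unknown>).repoUrl;
  if (typeof repoUrl !== "string") return false;
  try {
    // Accept only well-formed URLs pointing at github.com.
    return new URL(repoUrl).hostname === "github.com";
  } catch {
    return false; // new URL throws on malformed input
  }
}
```

A schema-based validator like Zod additionally produces structured error messages and infers the static `IngestRequest` type from the schema, which is why the project uses it instead of hand-written guards.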

Contributing

This project is open source and contributions are welcome! Feel free to open issues, submit pull requests, or suggest improvements.
