nancy-kataria/codebase-intelligence

Codebase Intelligence

A Retrieval-Augmented Generation (RAG) application for intelligent codebase analysis. Upload any GitHub repository and get instant AI-powered insights, summaries, and context-aware Q&A about your code through semantic search and vector embeddings.

Built with Next.js · React · TypeScript · OpenAI · Pinecone · LangChain

Features

  • Repository Analysis: Automatically analyze GitHub repositories and extract meaningful insights
  • Project Summarization: Generate AI-powered summaries including tech stack, architecture patterns, and code statistics
  • RAG-Powered Chat: Ask natural language questions and receive context-aware answers by retrieving relevant code snippets and augmenting LLM prompts
  • Vector Embeddings: Uses Pinecone vector database for semantic search and efficient code context retrieval

Tech Stack

Frontend & Backend

  • Next.js 14 - React framework with App Router and API routes
  • React 19 - UI framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • Zod - Runtime type validation

AI & Vector Database (RAG Stack)

  • LangChain - Document loading, chunking, and LLM orchestration
  • Pinecone - Vector database for semantic search and retrieval
  • OpenAI - GPT-4o for generation, text-embedding-3-small for retrieval
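Retrieval in this stack rests on embedding similarity: the query is embedded with text-embedding-3-small and compared against stored code-chunk embeddings. As a rough illustration of the ranking Pinecone performs server-side, here is a minimal dependency-free TypeScript sketch of cosine-similarity top-k retrieval (the vectors, chunk shape, and function names are illustrative, not taken from this codebase):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding and return the
// ids of the k closest matches (the essence of semantic top-k retrieval).
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.id);
}
```

In practice the application delegates this ranking to Pinecone's query API; the sketch only shows the underlying similarity measure.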

Getting Started

Prerequisites

  • Node.js 18+
  • Git
  • OpenAI API key
  • Pinecone API key
  • GitHub Personal Access Token

Installation

  1. Clone the repository:

     git clone <repository-url>
     cd codebase-intelligence

  2. Install dependencies:

     npm install

  3. Set up environment variables: create a `.env.local` file in the project root containing your OpenAI, Pinecone, and GitHub credentials.
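The `.env.local` file typically holds the keys listed under Prerequisites. The variable names below are assumptions for illustration; check the source for the names the application actually reads:

```
# Hypothetical variable names - verify against the codebase
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
GITHUB_TOKEN=ghp_...
```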

Development

Start the Next.js development server:

npm run dev

The application will be available at http://localhost:3000

RAG Pipeline

The application implements a three-stage RAG workflow:

  1. Ingestion (/api/ingest): Loads GitHub repository files, chunks them into semantic segments, generates vector embeddings, and stores them in Pinecone with metadata
  2. Summarization (/api/summarize): Retrieves code context vectors and augments GPT-4o prompts to generate architecture summaries and tech stack analysis
  3. Conversation (/api/chat): For each user query, retrieves relevant code context from Pinecone and augments the LLM prompt to provide accurate, code-informed responses
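The chunking step in the ingestion stage can be pictured as an overlapping splitter: each chunk shares some text with its neighbor so that context spanning a boundary survives in at least one chunk. LangChain's text splitters do this with more sophistication (token counts, separator awareness); the TypeScript sketch below is a simplified stand-in, and its parameter values are illustrative rather than the project's actual settings:

```typescript
// Split text into fixed-size chunks with overlap, so content near a chunk
// boundary appears in both neighboring chunks and is never lost to retrieval.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each resulting chunk would then be embedded and upserted into Pinecone along with metadata (e.g. file path) for later retrieval.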

Build

npm run build

Validation

  • Zod Schemas for request validation across all endpoints
  • Type-safe API responses with TypeScript inference
  • Client-side validation before API calls
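The Zod schemas themselves live in the codebase. To illustrate the kind of request-shape check they enforce, here is a dependency-free TypeScript type guard for a hypothetical ingest request (the field name `repoUrl` and the GitHub-hostname rule are assumptions, not the project's actual schema; Zod's `z.object(...).safeParse` plays this role in the project):

```typescript
interface IngestRequest {
  repoUrl: string;
}

// Narrow an unknown request body to IngestRequest, rejecting malformed
// input before it reaches the RAG pipeline.
function isIngestRequest(body: unknown): body is IngestRequest {
  if (typeof body !== "object" || body === null) return false;
  const repoUrl = (body as Record<string, unknown>).repoUrl;
  if (typeof repoUrl !== "string") return false;
  try {
    // Accept only well-formed URLs pointing at github.com.
    return new URL(repoUrl).hostname === "github.com";
  } catch {
    return false; // new URL throws on malformed input
  }
}
```

A schema-based validator like Zod additionally produces structured error messages and infers the static `IngestRequest` type from the schema, which is why the project uses it instead of hand-written guards.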

Contributing

This project is open source and contributions are welcome! Feel free to open issues, submit pull requests, or suggest improvements.
