mbayue/gitSdm Wiki

Version: 1

Overview

Introduction & Quick Start


gitSdm (Git Software Dependency Map) is an AI-powered repository intelligence platform that provides interactive architecture visualizations of GitHub codebases. It replaces manual dependency tracing and directory reading with a graph-first view, so developers can orient themselves in an unfamiliar codebase in seconds rather than days. Sources: README.md:1-25, server/ai/prompts.ts:47-50

The pipeline ingests a GitHub URL, parses manifest files (such as package.json), resolves imports, and generates an interactive dependency map that is rendered with React Flow and laid out with Dagre. The visualization is further enriched by AI-driven insights, including architectural summaries, code explanations in ELI5 (Explain It Like I'm 5) mode, and health audits. Sources: README.md:139-160, src/components/home/HowItWorks.tsx:3-12

🛠 Project Architecture & Structure

The project is structured as a modular full-stack application with a clear separation between the Express-based backend services and the React-based frontend visualization layer. Sources: README.md:33-72

High-Level Directory Overview

Directory	Purpose
`api/`	Vercel serverless functions for deployment.
`server/`	Core backend logic, including AI handlers, graph builders, and GitHub clients.
`src/`	Frontend application containing UI components, visualization stores, and hooks.
`public/`	Static assets and background workers for layout calculations.

Sources: README.md:33-72

Analysis Pipeline

The following diagram illustrates the data flow from initial user input to the final interactive visualization.

graph TD
    A[GitHub URL Input] --> B[Fetch Tree & Metadata]
    B --> C[Parse Manifests]
    C --> D[Resolve Imports]
    D --> E[Build Dependency Graph]
    E --> F[Generate Visual Map]
    F --> G[Enrich with AI Insights]
    
    style A fill:#161b22,stroke:#30363d
    style G fill:#238636,stroke:#30363d

The pipeline processes repository data in six stages, from URL ingestion through AI enrichment; the diagram shows graph construction and visual map generation as separate nodes. Sources: src/components/home/HowItWorks.tsx:3-12, README.md:209-216

🚀 Quick Start Guide

Prerequisites

Before setting up the environment, ensure the following tools are installed:

Bun >= 1.1 (Recommended for runtime and package management)
Node.js >= 22 (Alternative backend support)
pnpm >= 9 (Recommended if not using Bun)
GitHub Personal Access Token (Optional, but recommended to increase API rate limits)

Sources: README.md:75-81, CONTRIBUTING.md:12-14

Installation Steps

Clone the Repository:
```
git clone https://github.com/mbayue/gitSdm.git
cd gitSdm
```
Install Dependencies:
```
bun install
# OR
pnpm install
```
Environment Setup: Copy the example environment file and configure your keys:
```
cp .env.example .env
```

Sources: README.md:83-96, CONTRIBUTING.md:17-26

Environment Configuration

The platform supports multiple AI providers. Configuration is handled via the .env file.

Variable	Description	Default / Options
`GITHUB_TOKEN`	GitHub API access token	Optional (increases rate limits)
`AI_PROVIDER`	Active AI service	`mock`, `gemini`, `openai`, `anthropic`
`GEMINI_API_KEY`	Key for Google Gemini	Required if provider is `gemini`
`OPENAI_API_KEY`	Key for OpenAI	Required if provider is `openai`
`ANTHROPIC_API_KEY`	Key for Anthropic Claude	Required if provider is `anthropic`

Sources: README.md:98-110, server/ai/provider.ts:24-58

💻 Development Workflow

Starting the Servers

gitSdm runs the Express backend and the Vite frontend concurrently during development. Sources: CONTRIBUTING.md:36-47

sequenceDiagram
    participant Dev as Developer
    participant Bun as Bun/pnpm Run Dev
    participant BE as Express Backend (Port 3001)
    participant FE as Vite Frontend (Port 5173)
    
    Dev->>Bun: Execute 'bun dev' or 'pnpm dev'
    Bun->>BE: Start Backend Service
    Bun->>FE: Start Frontend UI
    FE-->>Dev: Accessible at localhost:5173
    BE-->>FE: API Endpoints available

Sources: CONTRIBUTING.md:36-47, README.md:112-115

Core Development Commands

Command	Action
`bun dev`	Starts frontend and backend concurrently.
`bun run build`	Generates a production-ready build in the `dist/` directory.
`bun test`	Runs the full test suite (25 suites).
`pnpm exec graphify update .`	Updates the interactive directory-topology mapping.

Sources: README.md:112-124, CONTRIBUTING.md:65-70

🤖 AI Integration & Provider Logic

The AIProvider system is designed to be pluggable. It selects the provider from environment variables or explicit overrides. Sources: server/ai/provider.ts:24-58

Provider Selection Logic

Explicit Precedence: If AI_PROVIDER is set in the environment, it overrides all other detections.
Key-based Auto-detection: The system scans for GEMINI_API_KEY, OPENAI_API_KEY, or ANTHROPIC_API_KEY in that specific order.
Mock Fallback: If no keys are provided, the system defaults to a mock provider, which returns predefined architectural summaries and "roasts" for demonstration purposes.

Sources: server/ai/provider.ts:39-65, server/ai/provider.ts:168-240
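
The three rules above could be expressed roughly as follows. This is a minimal sketch, not the code in server/ai/provider.ts: the function name `selectProvider`, the `Env` type, and the choice to ignore unrecognized `AI_PROVIDER` values are assumptions made for illustration.

```typescript
// Hypothetical sketch of the selection order described above.
type ProviderName = "mock" | "gemini" | "openai" | "anthropic";
type Env = Record<string, string | undefined>;

const PROVIDER_NAMES: readonly ProviderName[] = ["mock", "gemini", "openai", "anthropic"];

// Auto-detection order taken from the text: Gemini, then OpenAI, then Anthropic.
const KEY_ORDER: ReadonlyArray<[string, ProviderName]> = [
  ["GEMINI_API_KEY", "gemini"],
  ["OPENAI_API_KEY", "openai"],
  ["ANTHROPIC_API_KEY", "anthropic"],
];

function selectProvider(env: Env): ProviderName {
  // 1. An explicit, recognized AI_PROVIDER value wins over key detection.
  const explicit = (env["AI_PROVIDER"] ?? "").trim().toLowerCase();
  const match = PROVIDER_NAMES.find((name) => name === explicit);
  if (match) return match;

  // 2. Otherwise pick the first provider whose key is present.
  for (const [keyName, provider] of KEY_ORDER) {
    if (env[keyName]) return provider;
  }

  // 3. No keys at all: fall back to the mock provider.
  return "mock";
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the function easy to exercise with different configurations.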

AI Prompting Strategy

The system uses a SYSTEM_PROMPT that instructs the AI to act as a "principal software architect." It focuses on four core principles:

Specificity: Referencing real file names and structures.
Senior Engineering Perspective: Identifying architectural tradeoffs.
Developer Empathy: Addressing the specific needs of someone onboarding.
Technical Language: Using terms like "request lifecycle" and "dependency injection."

Sources: server/ai/prompts.ts:47-75

📦 Core Feature Highlights

Interactive Codebase Mapping

The platform uses @xyflow/react (React Flow) and d3-force to render force-directed graphs. Files are classified by type (e.g., component, utility, config) and visually differentiated. Sources: README.md:139-146, src/components/viz/OverviewTab.tsx:180-210

Architecture Insights

Users can request natural language explanations of the codebase. The explainRepoELI5 task provides a "5-minute tour" of the project, covering the "Big Picture" and "Key Areas to Know" using friendly, conversational language. Sources: server/ai/tasks/onboarding.ts:116-146, server/ai/tasks/explain.ts:85-110

Health & Refactoring

The AI performs rigorous health assessments across five dimensions:

Maintainability
Modularity
Readability
Architecture
Complexity (scored inversely, so a higher score indicates lower complexity)

Sources: server/ai/tasks/refactor.ts:80-110, src/components/viz/ai-sidebar/AiCenterTab.tsx:140-165

In summary, gitSdm serves as an intelligence layer for GitHub repositories, combining static analysis with LLM reasoning to bridge the gap between code and documentation. Sources: README.md:1-25, server/ai/prompts.ts:47-52

Repository Intelligence & Analytics


Repository Intelligence & Analytics in gitSdm comprises a suite of tools designed to provide developers with instant, deep insights into codebase architecture, health, and evolution. By combining metadata analysis with AI-powered diagnostics, the system transforms raw repository data into actionable intelligence, including dependency mapping, risk assessment, and contributor activity tracking.

The system operates by ingesting repository structures via the GitHub API and processing this data through a pipeline of parsers, graph builders, and Large Language Models (LLMs). This allows for features ranging from high-level architecture overviews to granular file-level explanations and automated refactoring suggestions. Sources: README.md:12-23, server/ai/prompts.ts:37-56

Architectural Intelligence

The intelligence layer utilizes a "Principal Software Architect" persona to evaluate codebases. It constructs a rich context from the repository's metadata, root-level structure, important entry files, and dependencies to provide senior-level architectural reviews.

Repository Context Construction

The system builds a comprehensive context string that includes:

Metadata: Name, description, primary language, stars, forks, and license.
Structural Data: Root-level directory listing and a flat list of up to 120 detected files for deep context.
Dependency Map: A list of detected dependencies including ecosystem and type.
Activity Metrics: Recent commit history and top contributors.

Sources: server/ai/prompts.ts:3-35, server/ai/prompts.ts:37-41

flowchart TD
    A[RepoAnalysis Object] --> B{Context Builder}
    B --> C[Meta: Stats & License]
    B --> D[Structure: Top Dirs & Files]
    B --> E[Deps: Ecosystem & Versions]
    B --> F[Timeline: Recent Activity]
    C & D & E & F --> G[Markdown Context Prompt]
    G --> H[AI Provider Tasks]

The diagram shows how the RepoAnalysis data is aggregated into a structured prompt for AI tasks. Sources: server/ai/prompts.ts:3-35

Codebase Health & Diagnostics

gitSdm performs rigorous health assessments and identifies architectural risks through dedicated AI tasks. These assessments are visualized in the AiCenterTab and OverviewTab components.

Health Metrics

The health report scores five key dimensions on a scale of 0-100:

Metric	Description	Source Criteria
Maintainability	Ease of changing the codebase	Module size, separation of concerns
Modularity	Decomposition quality	Directory structure, coupling indicators
Readability	Ease of understanding	Naming conventions, organization
Architecture	Overall design soundness	Layer separation, entry points
Complexity	Inverse complexity score	File count, nesting depth, deps

Sources: server/ai/tasks/refactor.ts:85-98, src/components/viz/ai-sidebar/AiCenterTab.tsx:32-36
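
Because the scores come from an LLM, a consumer may want to coerce them into the 0-100 range before display. The sketch below shows one way to do that. The dimension names come from the table above; the helper names and the choice to treat missing or non-numeric values as 0 are assumptions, not behavior confirmed by the source.

```typescript
// Dimension names follow the table above; helper names are hypothetical.
const HEALTH_DIMENSIONS = [
  "maintainability",
  "modularity",
  "readability",
  "architecture",
  "complexity",
] as const;

type HealthDimension = (typeof HEALTH_DIMENSIONS)[number];
type HealthScores = Record<HealthDimension, number>;

// Coerce an untrusted value (e.g. parsed LLM JSON) into an integer in [0, 100].
// Assumption: anything that is not a finite number becomes 0.
function clampScore(value: unknown): number {
  const n = typeof value === "number" && Number.isFinite(value) ? value : 0;
  return Math.min(100, Math.max(0, Math.round(n)));
}

function normalizeHealthScores(raw: Record<string, unknown>): HealthScores {
  const scores = {} as HealthScores;
  for (const dimension of HEALTH_DIMENSIONS) {
    scores[dimension] = clampScore(raw[dimension]);
  }
  return scores;
}
```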

Refactoring & Risk Identification

The system identifies impactful refactoring opportunities, categorizing them by risk level (High, Medium, Low) and domain (e.g., Performance, DRY, Coupling). Sources: server/ai/tasks/refactor.ts:18-28, src/components/viz/ai-sidebar/AiCenterTab.tsx:64-70

Analytics & Visualization

The frontend provides real-time analytics dashboards that visualize the statistical properties of the repository.

Component Stats and Coupling

The OverviewTab calculates and displays core repository statistics:

Node Counts: Total files and folders.
Dependency Count: External packages detected in manifests.
High Coupling: Identifies the top 5 files or folders by degree, that is, the combined number of incoming and outgoing edges in the graph.
Entry Points: Surfaces files classified as 'entry' by the parser (e.g., src/main.tsx).

Sources: src/components/viz/OverviewTab.tsx:43-61, server/github/mock-data.ts:25-60
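
The "High Coupling" statistic can be illustrated with a small degree count over the edge list. This is a sketch under assumptions: the `Edge` shape and the function names are hypothetical, and ties are broken alphabetically only to make the output deterministic.

```typescript
// Hypothetical edge shape: one entry per import or dependency link.
interface Edge {
  source: string;
  target: string;
}

// Count incoming plus outgoing edges for every node id that appears in the list.
function degreeByNode(edges: Edge[]): Map<string, number> {
  const degree = new Map<string, number>();
  for (const { source, target } of edges) {
    degree.set(source, (degree.get(source) ?? 0) + 1);
    degree.set(target, (degree.get(target) ?? 0) + 1);
  }
  return degree;
}

// Return the `limit` most connected nodes, highest degree first.
function topCoupled(edges: Edge[], limit = 5): Array<{ id: string; degree: number }> {
  return Array.from(degreeByNode(edges), ([id, degree]) => ({ id, degree }))
    .sort((a, b) => b.degree - a.degree || a.id.localeCompare(b.id))
    .slice(0, limit);
}
```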

Activity & Density Tracking

The system visualizes commit density over a 24-week period using a bar chart format. This data is derived from the repository timeline and displays the relative intensity of development activity. Sources: src/components/viz/OverviewTab.tsx:280-302, server/github/mock-data.ts:266-285

sequenceDiagram
    participant UI as "OverviewTab"
    participant Store as "VizStore"
    participant RF as "ReactFlow"
    
    UI->>UI: Calculate Degree Centrality
    UI->>UI: Filter High Coupling Nodes
    UI->>Store: Set Selected Node ID
    UI->>RF: setCenter(x, y, zoom)
    Note over RF: Smooth transition to node

This sequence illustrates the interaction between the analytics UI and the visualization engine when a developer interacts with a "High Coupling" node. Sources: src/components/viz/OverviewTab.tsx:24-41

Implementation Details

Data Structures

The analysis is based on the RepoAnalysis type, which encapsulates the entire state of the repository's intelligence data.

// server/ai/prompts.ts:3
import type { RepoAnalysis } from '../../src/types';

// server/github/mock-data.ts:266-271
export async function fetchMockTimeline(): Promise<TimelineWeek[]> {
  // ...
  // week: string (ISO date)
  // count: number (commit count)
  // commits: Commit[]
}

Sources: server/ai/prompts.ts:3, server/github/mock-data.ts:266-271
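
The field comments in the excerpt suggest a `TimelineWeek` shape like the one below, and the commit-density chart needs each week expressed relative to the busiest week. This is a sketch only: the `Commit` placeholder (with a single `sha` field) and the `relativeIntensity` helper are assumptions, not code from the repository.

```typescript
// Placeholder: the real Commit type is not shown in the excerpt.
interface Commit {
  sha: string;
}

// Field names follow the comments in the excerpt above.
interface TimelineWeek {
  week: string; // ISO date
  count: number; // commit count
  commits: Commit[];
}

// Scale each week's commit count against the busiest week,
// giving bar heights in the range [0, 1].
function relativeIntensity(timeline: TimelineWeek[]): number[] {
  const max = Math.max(0, ...timeline.map((w) => w.count));
  return timeline.map((w) => (max === 0 ? 0 : w.count / max));
}
```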

Refactor Task Logic

The executeAiTask for refactoring uses the SYSTEM_PROMPT to enforce senior-level engineering standards, ensuring that suggestions are grounded in real file paths and structural decisions rather than generic patterns. Sources: server/ai/tasks/refactor.ts:16-33, server/ai/prompts.ts:37-56

Repository Intelligence & Analytics serves as the analytical core of gitSdm, moving developers from simple file browsing to architectural understanding through automated analysis and interactive visualization.

System Architecture

High-Level Architecture


gitSdm is a graph-first repository analysis platform designed to transform raw GitHub source code into interactive architectural visualizations. The system employs a decoupled architecture consisting of a React-based frontend for visualization and a Node.js backend for ingestion, dependency parsing, and AI-driven intelligence. The core purpose is to provide instant insights into module boundaries, dependency flows, and architectural health that would otherwise require days of manual code review.

Sources: README.md:15-18, README.md:105-110

System Overview

The project follows a pipeline-oriented architecture. Data flows from a user-provided GitHub URL through a series of server-side analysis stages, eventually being rendered as a force-directed graph on a React Flow canvas.

Analysis Pipeline

The analysis process is divided into six distinct stages:

GitHub URL Ingestion: Users provide a public repository URL.
Tree Fetching: The system retrieves the file structure and metadata via the GitHub API.
Manifest Parsing: Workspaces and dependencies are identified from files like package.json.
Import Resolution: Connections are traced across the codebase.
Graph Construction: An interactive dependency map is generated using layout engines.
AI Enrichment: Intelligence layers generate summaries, onboarding steps, and refactoring risks.

Sources: src/components/home/HowItWorks.tsx:3-10, README.md:143-150
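
The first stage, turning a user-supplied URL into an owner and repository name, might look like the sketch below. The function name `parseGitHubUrl`, the `RepoRef` type, and the accepted URL forms are assumptions; the source does not show how gitSdm parses its input.

```typescript
// Hypothetical result type for the ingestion stage.
interface RepoRef {
  owner: string;
  repo: string;
}

// Accepts forms such as "https://github.com/owner/repo",
// "github.com/owner/repo.git", or a deeper path like ".../owner/repo/tree/main".
function parseGitHubUrl(input: string): RepoRef | null {
  const pattern = /^(?:https?:\/\/)?(?:www\.)?github\.com\/([^\/\s]+)\/([^\/\s#?]+)/i;
  const match = pattern.exec(input.trim());
  if (!match) return null;
  // Strip a trailing ".git" so clone URLs and browser URLs normalize to the same name.
  return { owner: match[1], repo: match[2].replace(/\.git$/, "") };
}
```

Returning `null` for anything that is not a github.com repository URL lets the caller reject bad input before any API request is made.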

High-Level Component Interaction

The following diagram illustrates the relationship between the browser, the API router, and the underlying services.

graph TD
  User[Developer Browser] -->|Requests| Router[Vite/Express API Router]
  Router -->|Parses GitHub| GitHubService[GitHub Tree Fetcher]
  Router -->|Orchestrates AI| AIService[AI Provider Manager]
  
  GitHubService -->|Manifest Contents| DepParser[Dependency Analyzer]
  GitHubService -->|File Tree| GraphBuilder[Graph Builder Engine]
  
  GraphBuilder -->|Positions Nodes| Layout[Dagre Layout Engine]
  Layout -->|Graph Data| UI[React Flow Canvas]
  AIService -->|Markdown/JSON| UI
  
  class User entry;
  class Router router;
  class AIService service;
  class Layout util;

Sources: server/ai/provider.ts:167-177, server/ai/tasks/diagram.ts:60-61

Backend Architecture

The backend is structured as a modular Node.js application that can be deployed either as Vercel serverless functions or as a standalone Express server. It handles GitHub API communication, dependency resolution, and AI task orchestration.

Sources: README.md:58-69, CONTRIBUTING.md:29-30

AI Intelligence Layer

The AI system is provider-agnostic, supporting Google Gemini, OpenAI, and Anthropic. It uses a factory pattern to instantiate providers based on environment variables or user-provided keys.

Component	Responsibility	Relevant Files
AI Provider	Normalizes requests to different LLM SDKs (Gemini, OpenAI, Claude).	`server/ai/provider.ts`
Task Handlers	Specific logic for explaining code, refactoring, or generating diagrams.	`server/ai/tasks/`
Prompt Builder	Constructs context-rich prompts using repository metadata and file trees.	`server/ai/prompts.ts`
API Router	Exposes AI capabilities via REST endpoints.	`server/router/ai-routes.ts`

Sources: server/ai/provider.ts:41-65, server/router/ai-routes.ts:25-132

AI Request Lifecycle

When a user requests an architectural summary or "ELI5" explanation, the following sequence occurs:

sequenceDiagram
    participant UI as "Frontend UI"
    participant API as "AI Router"
    participant Task as "Task Handler"
    participant Prov as "AI Provider"
    
    UI->>API: POST /api/ai/explain
    API->>Task: explainRepo(params)
    Task->>Task: Build context from analysis
    Task->>Prov: complete(messages)
    Prov-->>Task: LLM String/JSON Response
    Task-->>API: Result + Cache Status
    API-->>UI: AIResponse Object

Sources: server/router/ai-routes.ts:33-41, server/ai/tasks/explain.ts:22-40

Frontend Architecture

The frontend is a React 19 Single Page Application (SPA) powered by Vite. It focuses on rendering the complex graph data and providing interactive tools for exploration.

Key Visualization Modules

Graph Canvas: Uses @xyflow/react (React Flow) and d3-force for rendering the dependency graph.
State Management: Zustand is used for global state, such as vizStore, which tracks selected nodes and UI panel visibility.
Architecture View: An interactive block diagram component that allows users to toggle between a static "Code Graph" (AST-based) and an "AI Enhanced" view (LLM-based).

Sources: README.md:120-125, src/components/viz/ArchitectureView.tsx:32-47, README.md:78-80

UI Layout Structure

The application layout is organized into functional zones:

Explorer: A dock for file inspection and code viewing.
Viz Sidebar: Contains AI chat tabs, health audits, and learning paths.
Top Nav: Handles repository searching, branch switching, and statistics.

Sources: README.md:73-82, src/components/viz/OverviewTab.tsx:145-165

Data Models and API

The system relies on a central RepoAnalysis data structure that encapsulates the repository's metadata, file tree, dependencies, and graph nodes.

Core Data Fields

Field	Type	Description
`meta`	Object	GitHub metadata (stars, forks, owner, repo name).
`tree`	Array	Hierarchical file structure.
`graph`	Object	Nodes and edges representing file relationships.
`dependencies`	Array	List of packages found in manifests.
`timeline`	Array	Commit activity patterns over time.

Sources: server/ai/prompts.ts:3-15, src/components/viz/OverviewTab.tsx:40-44

API Endpoints (AI Module)

Endpoint	Method	Purpose
`/api/ai/explain`	POST	Analyzes a specific file, node, or the whole repo.
`/api/ai/architecture`	POST	Returns a JSON-structured layer analysis.
`/api/ai/health`	POST	Generates scores for maintainability and modularity.
`/api/ai/mermaid`	POST	Generates a Mermaid.js flowchart string.
`/api/ai/roast`	POST	Generates a humorous technical critique of the repo.

Sources: server/router/ai-routes.ts:33-132

Conclusion

The high-level architecture of gitSdm emphasizes a separation between data ingestion and visual presentation. By leveraging a robust analysis pipeline on the backend and a flexible React Flow interface on the frontend, the platform provides a comprehensive environment for codebase intelligence. The integration of a provider-agnostic AI layer ensures that the architectural insights stay relevant across different project types and scales.

Sources: README.md:20-25, server/ai/provider.ts:241-250

Backend API Routing


The backend API routing system in gitSdm is designed as a modular middleware architecture that processes incoming HTTP requests, validates payloads, and orchestrates interactions between the GitHub API and various AI providers. The system acts as the primary bridge between the React frontend and the backend service layer, handling tasks ranging from repository analysis to AI-driven code explanations.

Routing is centralized in server/api-router.ts, which delegates specific domain logic to sub-routers such as server/router/ai-routes.ts, repo-routes.ts, and search-routes.ts. This separation ensures that the request lifecycle—including authentication, rate limiting, and caching—is managed consistently across the platform.

Sources: README.md:31-45, server/router/ai-routes.ts:31-36

Routing Architecture

The routing layer follows a functional pattern where handlers receive the request context, including environment-specific tokens and configuration. The system differentiates between standard repository management and specialized AI tasks.

Request Flow and Validation

When a request hits an endpoint, it undergoes the following lifecycle:

Path Identification: The router matches the pathname against predefined API strings.
Schema Validation: Request bodies are parsed and validated using Zod schemas (e.g., aiExplainSchema, repoQuerySchema) to ensure type safety before execution.
Context Injection: Handlers receive a RequestContext and optional GitHub or AI provider tokens provided by the user or environment.
Task Delegation: The router calls specialized service functions (e.g., explainRepo, generateOnboarding) which contain the business logic.

Sources: server/router/ai-routes.ts:38-45, server/router/ai-routes.ts:133-138
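
The four lifecycle steps above can be sketched in a dependency-free form. Everything here is a stand-in: the real router validates with Zod schemas and throws `AppError`, whereas this sketch uses a hand-rolled validator and returns plain response objects. The `ExplainRequest` fields are illustrative, not the actual `aiExplainSchema`.

```typescript
// Illustrative request shape; the real aiExplainSchema is not shown in the source.
interface ExplainRequest {
  owner: string;
  repo: string;
  eli5: boolean;
}

type ParseResult<T> = { kind: "ok"; data: T } | { kind: "invalid"; error: string };

interface RouteResponse {
  status: number;
  body: unknown;
}

// Hand-rolled validator standing in for a Zod schema's safeParse.
function parseExplainRequest(body: unknown): ParseResult<ExplainRequest> {
  if (typeof body !== "object" || body === null) {
    return { kind: "invalid", error: "body must be an object" };
  }
  const { owner, repo, eli5 } = body as Record<string, unknown>;
  if (typeof owner !== "string" || owner === "" || typeof repo !== "string" || repo === "") {
    return { kind: "invalid", error: "owner and repo are required strings" };
  }
  return { kind: "ok", data: { owner, repo, eli5: eli5 === true } };
}

function handleRequest(pathname: string, body: unknown): RouteResponse {
  // 1. Path identification
  if (pathname !== "/api/ai/explain") {
    return { status: 404, body: { error: "Not found" } };
  }
  // 2. Schema validation
  const parsed = parseExplainRequest(body);
  if (parsed.kind === "invalid") {
    return { status: 400, body: { error: parsed.error } };
  }
  // 3-4. Context injection and task delegation are reduced to a stub response here.
  const { owner, repo } = parsed.data;
  return { status: 200, body: { summary: `Explained ${owner}/${repo}` } };
}
```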

API Request Lifecycle Diagram

The following diagram illustrates the sequence of operations from the moment the frontend initiates an API call to the final JSON response.

sequenceDiagram
    participant FE as Frontend (React)
    participant R as AI Router (handleAiRoutes)
    participant V as Zod Validator
    participant S as AI Task Service
    participant P as AI Provider (LLM)

    FE->>R: POST /api/ai/explain
    Note right of R: Extracts userKey & GitHubToken
    R->>V: safeParse(req.json())
    alt Invalid Input
        V-->>R: Parse Error
        R-->>FE: 400 Bad Request
    else Valid Input
        V-->>R: Validated Data
        R->>S: explainRepo(params)
        S->>P: complete(prompt)
        P-->>S: Markdown/JSON Response
        S-->>R: Result + Cache Status
        R-->>FE: 200 OK (JSON)
    end

Sources: server/router/ai-routes.ts:38-51, server/ai/provider.ts:46-65

AI Routing Module

The handleAiRoutes function in server/router/ai-routes.ts is the primary entry point for all intelligence-related features. It manages a wide array of endpoints that interact with large language models to provide repository insights.

Supported Endpoints

Endpoint	Function	Description
`/api/ai/explain`	`explainRepo`	Provides a detailed overview of a repo, file, or node. Supports "ELI5" mode.
`/api/ai/architecture`	`explainArchitecture`	Identifies 4-7 architectural layers (Presentation, API, Core, etc.).
`/api/ai/onboarding`	`generateOnboarding`	Creates a 6-step walkthrough for new developers joining a project.
`/api/ai/learning-path`	`generateLearningPath`	Generates a mental model and recommended file reading order.
`/api/ai/mermaid`	`generateMermaidDiagram`	Produces Mermaid.js code for system flow visualization.
`/api/ai/roast`	`generateRepoRoast`	Generates a sarcastic, witty critique of the codebase.

Sources: server/router/ai-routes.ts:40-145, server/ai/tasks/playground.ts:12-45, server/ai/tasks/explain.ts:10-30

Multi-Provider Orchestration

The routing system is provider-agnostic. Depending on the AI_PROVIDER environment variable or a user-supplied userKey, the router communicates with the AIProvider interface. This allows the backend to switch between Google Gemini, OpenAI, Anthropic, or a local mock provider for testing without changing the route definitions.

Sources: server/ai/provider.ts:25-44, server/router/ai-routes.ts:33-35

Implementation Details

Validation and Error Handling

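The router uses a custom AppError class to handle validation failures and service-level exceptions. For example, if required parameters such as owner or repo are missing from a query, the router throws an AppError with status 400 and the code INVALID_PARAMS. Request bodies that fail schema validation, as in the excerpt below, use the code VALIDATION_ERROR instead.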

// server/router/ai-routes.ts:41-45
if (pathname === '/api/ai/explain') {
  const body = await req.json().catch(() => ({}));
  const parsed = aiExplainSchema.safeParse(body);
  if (!parsed.success) {
    throw new AppError(400, 'Invalid request', 'VALIDATION_ERROR', false, parsed.error.flatten());
  }
  // ...
}

Sources: server/router/ai-routes.ts:41-45, server/router/ai-routes.ts:54-57
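
For readers unfamiliar with this pattern, an error class compatible with the constructor call in the excerpt could look like the sketch below. Only the argument order is taken from the excerpt. The field names, the `toErrorResponse` helper, and the `INTERNAL_ERROR` code are assumptions, and the meaning of the boolean fourth argument is not stated in the source, so it is kept as an opaque `flag`.

```typescript
// Sketch only: argument order mirrors the call in the excerpt above.
class AppError extends Error {
  constructor(
    public readonly status: number,
    message: string,
    public readonly code: string,
    public readonly flag: boolean = false, // meaning not documented in the excerpt
    public readonly details?: unknown,
  ) {
    super(message);
    this.name = "AppError";
    // Keeps `instanceof AppError` working when the class is compiled to ES5.
    Object.setPrototypeOf(this, AppError.prototype);
  }
}

interface ErrorResponse {
  status: number;
  body: { error: string; code: string; details?: unknown };
}

// Hypothetical helper: turn any thrown value into a JSON-friendly payload.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof AppError) {
    return { status: err.status, body: { error: err.message, code: err.code, details: err.details } };
  }
  // Unknown failures are reported generically so internals are not leaked.
  return { status: 500, body: { error: "Internal server error", code: "INTERNAL_ERROR" } };
}
```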

Mock Data and Fallbacks

To support development without active API keys, the routing logic integrates with a mock system. When the mock provider is active, routes return pre-defined responses for known repositories (like gitsdm or mock-todo-app) instead of querying an LLM.

Sources: server/ai/provider.ts:129-150, server/github/mock-data.ts:20-50

Conclusion

Backend API Routing in gitSdm serves as a structured gateway that translates high-level frontend requests into complex AI and GitHub operations. By utilizing modular routers and schema-based validation, the system maintains a high degree of reliability while remaining flexible enough to support multiple AI backends and diverse repository analysis tasks.

Sources: README.md:120-130, server/router/ai-routes.ts:31-40

Core Features

Interactive Graph Visualization


The Interactive Graph Visualization system in gitSdm is a core feature designed to transform flat repository structures into navigable, multi-dimensional maps. It enables developers to visualize file relationships, dependency chains, and module boundaries through a combination of force-directed physics and hierarchical layout algorithms.

The visualization engine utilizes @xyflow/react (React Flow) for the interactive canvas and d3-force for dynamic node positioning. This allows for real-time filtering, "Blast Radius" impact analysis, and deep-dive inspection of individual code modules. Sources: README.md:144-150, README.md:19-25

System Architecture and Data Flow

The visualization pipeline begins with repository analysis and ends with an interactive React component. Data flows from the server-side analysis (parsing imports and manifests) into a graph structure consisting of nodes (files, folders, repositories) and edges (imports, dependencies).

graph TD
    subgraph Server_Logic ["Server-Side Analysis"]
        A[GitHub Repository] --> B[AST Parser]
        B --> C[Dependency Analyzer]
        C --> D[Graph Builder]
    end

    subgraph Layout_Engine ["Layout & Orchestration"]
        D --> E[Dagre Layout Engine]
        D --> F[D3-Force Engine]
    end

    subgraph Frontend_UI ["Visualization Layer"]
        E --> G[React Flow Canvas]
        F --> G
        G --> H[Interactive Controls]
        G --> I[Inspector / AI Sidebar]
    end

    class A entry;
    class D service;
    class G router;

This diagram illustrates the progression from raw GitHub data to the final interactive UI components used for codebase exploration. Sources: README.md:168-185, server/ai/tasks/diagram.ts:45-55

Key Components

1. Graph Canvas (Primary Visualizer)

The GraphCanvas component serves as the host for the entire visualization workspace. It manages state for filtering, zooming, and export functionality. It integrates the NetworkCanvas (utilizing react-force-graph-2d) and provides a floating UI for legend and control panels. Sources: src/features/graph/canvas/GraphCanvas.tsx:32-60

2. Layout Algorithms

The system supports multiple layout strategies to represent different architectural perspectives:

Force-Directed: Utilizes d3-force for a dynamic, organic clustering of nodes based on connectivity. Sources: README.md:144
Hierarchical (Dagre): Used for structured layouts such as Top-to-Bottom (TB) or Left-to-Right (LR). This is particularly useful for visualizing execution flows and tree structures. Sources: server/graph/layout.test.ts:7-35

3. Architecture View (Mermaid Integration)

Separate from the main interactive graph, the ArchitectureView provides high-level block diagrams. It can generate "Code Graphs" based on static analysis or "AI Enhanced" diagrams that group components into logical subgraphs like "Services", "Controllers", and "Database". Sources: src/components/viz/ArchitectureView.tsx:43-65, server/ai/tasks/diagram.ts:15-35

Interactive Features

Feature	Implementation Detail	Source
Node Focusing	Centers the view on a specific file or folder with a transition duration of 480ms and 1.3x zoom.	src/components/viz/OverviewTab.tsx:23-45
Filtering	Users can toggle node types (file, folder), diff status, and content filters to prune the graph.	src/features/graph/canvas/GraphCanvas.tsx:47-65
Blast Radius	Visualizer showing transitive dependents to predict how changes to one file affect others.	README.md:200, src/features/graph/canvas/GraphCanvas.tsx:55
Exporting	Supports high-resolution exports to PNG, SVG, and PDF.	src/features/graph/canvas/GraphCanvas.tsx:112-118, src/components/viz/ArchitectureView.tsx:102-120
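
The "Blast Radius" idea, finding every file that directly or transitively depends on a changed file, amounts to a breadth-first search over reversed import edges. The sketch below is a generic illustration of that technique. The `ImportEdge` shape and the `blastRadius` name are assumptions, not the implementation in GraphCanvas.tsx.

```typescript
// Hypothetical edge shape: `source` imports `target`.
interface ImportEdge {
  source: string;
  target: string;
}

// Collect every file that directly or transitively imports `changed`.
function blastRadius(edges: ImportEdge[], changed: string): Set<string> {
  // Reverse adjacency: target -> files that import it.
  const dependents = new Map<string, string[]>();
  for (const { source, target } of edges) {
    const list = dependents.get(target) ?? [];
    list.push(source);
    dependents.set(target, list);
  }

  // Breadth-first walk; the `affected` set doubles as the visited set,
  // so import cycles cannot cause an infinite loop.
  const affected = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (dependent !== changed && !affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}
```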

Visual Styling and Classification

Nodes are visually classified to provide immediate context regarding their role in the codebase.

Color Coding: Assigned based on file extension or node type (e.g., #a78bfa for repositories, #fbbf24 for folders, and #3b82f6 for TypeScript files).
Sizing: Node radii vary by type (Repo: 14, Folder: 12, File: 8).
AI Indicators: AI-enhanced views add specific CSS classes such as entry, router, service, and util to nodes for semantic highlighting. Sources: server/graph/layout.test.ts:49-65, server/ai/tasks/diagram.ts:25-30
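
The classification rules above can be read as a small lookup. In this sketch the colors and radii are the values quoted in the list, while the `StyledNode` type, the fallback color, and the treatment of `.tsx` as a TypeScript file are assumptions added for illustration.

```typescript
type NodeKind = "repo" | "folder" | "file";

// Hypothetical node shape for this example.
interface StyledNode {
  kind: NodeKind;
  path: string;
}

// Values quoted in the text above.
const KIND_COLORS: Partial<Record<NodeKind, string>> = { repo: "#a78bfa", folder: "#fbbf24" };
const RADII: Record<NodeKind, number> = { repo: 14, folder: 12, file: 8 };

// Assumption: ".tsx" shares the TypeScript color; the source only says "TypeScript files".
const EXTENSION_COLORS = new Map<string, string>([
  ["ts", "#3b82f6"],
  ["tsx", "#3b82f6"],
]);

// Assumption: the source does not name a default color.
const FALLBACK_COLOR = "#8b949e";

function nodeStyle(node: StyledNode): { color: string; radius: number } {
  const dot = node.path.lastIndexOf(".");
  const extension = dot >= 0 ? node.path.slice(dot + 1).toLowerCase() : "";
  // Node type takes priority; file extension is only consulted for plain files.
  const color = KIND_COLORS[node.kind] ?? EXTENSION_COLORS.get(extension) ?? FALLBACK_COLOR;
  return { color, radius: RADII[node.kind] };
}
```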

sequenceDiagram
    participant User
    participant Store as "VizStore"
    participant RF as "ReactFlow Instance"

    User->>Store: Select Node (File/Folder)
    Store->>User: Update Inspector Panel
    User->>RF: Trigger focusOnNode(nodeId)
    RF->>RF: Calculate Node Center (x, y)
    RF->>User: Smooth zoom & Pan to Node

The sequence of focusing on a specific node within the interactive workspace. Sources: src/components/viz/OverviewTab.tsx:23-45
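
The focus step can be sketched as computing the node's center and handing it to the viewport. The `ViewportLike` interface below is a minimal stand-in that a React Flow instance's `setCenter(x, y, options)` should satisfy. The `PositionedNode` shape and the `focusOnNode` signature are assumptions, while the zoom and duration values are the ones quoted in the feature table.

```typescript
// Hypothetical node shape with explicit dimensions.
interface PositionedNode {
  id: string;
  position: { x: number; y: number }; // top-left corner
  width: number;
  height: number;
}

// Minimal slice of a viewport API used by this sketch.
interface ViewportLike {
  setCenter(x: number, y: number, options?: { zoom?: number; duration?: number }): void;
}

const FOCUS_ZOOM = 1.3; // from the feature table
const FOCUS_DURATION_MS = 480; // from the feature table

// Returns false when the node id is unknown, so the caller can ignore stale selections.
function focusOnNode(nodes: PositionedNode[], nodeId: string, viewport: ViewportLike): boolean {
  const node = nodes.find((n) => n.id === nodeId);
  if (!node) return false;
  const centerX = node.position.x + node.width / 2;
  const centerY = node.position.y + node.height / 2;
  viewport.setCenter(centerX, centerY, { zoom: FOCUS_ZOOM, duration: FOCUS_DURATION_MS });
  return true;
}
```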

Technical Implementation Details

The graph data structure consists of GraphNode and GraphEdge types. The layout engine calculates positions either on the fly (for force-directed) or via pre-calculated coordinates using Dagre.

// Positioning logic example from Dagre implementation
const laidOut = applyDagreLayout(nodes, edges, 'TB');
expect(laidOut[1].position.y).toBeGreaterThan(laidOut[0].position.y);

Sources: server/graph/layout.test.ts:18-21

In the ArchitectureView, SVG rendering is handled by the Mermaid engine, and users can copy either the raw Mermaid code or the generated SVG to their clipboard. Sources: src/components/viz/ArchitectureView.tsx:88-100

Interactive Graph Visualization is the primary interface for gitSdm, consolidating static analysis, layout physics, and AI insights into a single interactive canvas for deep repository intelligence.

AI Architecture & File Insights


The AI Architecture within gitSdm is a multi-layered system designed to provide deep, automated insights into software repositories. It leverages Large Language Models (LLMs) to transform raw file structures and dependency data into human-readable architectural summaries, health audits, and onboarding guides. The system is built on a provider-agnostic backend that supports multiple AI engines, including Google Gemini, OpenAI, and Anthropic Claude.

At a high level, the system ingests repository metadata and file trees to construct a rich context for the AI. This context is then processed through specialized task handlers—such as "Refactor," "Explain," and "Onboarding"—which apply specific system prompts and logical constraints to ensure technically accurate and developer-empathetic responses.

Sources: server/ai/provider.ts, server/ai/prompts.ts, README.md

AI Provider Infrastructure

The system employs a factory pattern to manage different AI service providers. The createProvider function detects the appropriate engine based on environment variables or explicit overrides (e.g., API keys starting with sk-ant- for Anthropic or sk- for OpenAI).
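
Prefix-based detection has one subtlety worth showing: every `sk-ant-` key also starts with `sk-`, so the longer prefix must be tested first. The sketch below illustrates that ordering. The function name and the fallback to Gemini for any other key are assumptions; the source names only the two prefixes.

```typescript
type KeyProvider = "anthropic" | "openai" | "gemini";

// Hypothetical helper illustrating prefix ordering for user-supplied keys.
function detectProviderFromKey(key: string): KeyProvider {
  const trimmed = key.trim();
  // Check the longer prefix first: "sk-ant-..." would otherwise match "sk-".
  if (trimmed.startsWith("sk-ant-")) return "anthropic";
  if (trimmed.startsWith("sk-")) return "openai";
  // Assumption: any other key is treated as a Gemini key.
  return "gemini";
}
```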

Provider Lifecycle and Selection

The backend supports a "Mock" provider for development and fallback scenarios, ensuring the UI remains functional without active API keys.

graph TD
    Start[Get AI Provider] --> CheckOverride{Override Key?}
    CheckOverride -- Yes --> Detect[Detect Provider Type]
    CheckOverride -- No --> EnvCheck{Check ENV Vars}
    
    Detect --> OpenAI[Create OpenAI Provider]
    Detect --> Anthropic[Create Anthropic Provider]
    Detect --> Gemini[Create Gemini Provider]
    
    EnvCheck -- AI_PROVIDER=openai --> OpenAI
    EnvCheck -- AI_PROVIDER=gemini --> Gemini
    EnvCheck -- AI_PROVIDER=anthropic --> Anthropic
    EnvCheck -- None --> Mock[Create Mock Provider]

The diagram shows the logic used to instantiate specific AI service implementations based on configuration.

Sources: server/ai/provider.ts:25-65
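
The selection logic in the diagram can be sketched roughly as follows. This is an illustration only, not the code in server/ai/provider.ts: the names detectProviderFromKey and selectProvider are hypothetical, and treating any key without a recognized prefix as a Gemini key is an assumption. The one detail that follows directly from the prefixes is ordering: sk-ant- has to be tested before sk-, because every Anthropic key also matches the shorter OpenAI prefix.

```typescript
// Hypothetical names; only the "sk-ant-" and "sk-" prefixes and the
// AI_PROVIDER variable come from the description above.
type ProviderName = "openai" | "anthropic" | "gemini";
type ProviderChoice = ProviderName | "mock";

function detectProviderFromKey(apiKey: string): ProviderName {
  const key = apiKey.trim();
  // Check the longer prefix first: "sk-ant-..." also starts with "sk-".
  if (key.startsWith("sk-ant-")) return "anthropic";
  if (key.startsWith("sk-")) return "openai";
  // Assumption: any other key is treated as a Gemini key.
  return "gemini";
}

function selectProvider(
  overrideKey: string | undefined,
  env: Record<string, string | undefined>
): ProviderChoice {
  // An explicit override key wins over environment configuration.
  if (overrideKey) return detectProviderFromKey(overrideKey);
  const configured = env["AI_PROVIDER"];
  if (configured === "openai" || configured === "anthropic" || configured === "gemini") {
    return configured;
  }
  // Nothing configured: fall back to the mock provider so the UI keeps working.
  return "mock";
}
```

Passing the environment in as a parameter, rather than reading process.env inside the function, keeps the sketch easy to exercise in tests.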

Supported AI Engines

Provider	Default Model	Key Env Variable	API Version
Gemini	`gemini-2.5-flash`	`GEMINI_API_KEY`	`v1alpha`
OpenAI	`gpt-4o-mini`	`OPENAI_API_KEY`	N/A
Anthropic	`claude-3-5-haiku-latest`	`ANTHROPIC_API_KEY`	N/A

Sources: server/ai/provider.ts:71-125

Context Building & Prompt Engineering

The accuracy of AI insights depends on the buildRepoContext utility, which flattens the repository's RepoAnalysis data into a structured prompt. This includes metadata, the top 20 root directories, the first 40 dependencies, and up to 120 detected files for "deep context".

Context Components

Repository Meta: Name, description, language, stars, forks, and license.
Activity Metrics: Recent commit counts and top contributors.
Structural Data: Flat file lists and dependency ecosystems (e.g., npm, PyPI).

Sources: server/ai/prompts.ts:3-40
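
To make the truncation limits concrete, here is a minimal sketch of how such a context string might be assembled. The RepoSnapshot shape and the buildContextSketch name are hypothetical simplifications (the real RepoAnalysis type carries more fields); only the limits of 20, 40, and 120 come from the description above.

```typescript
// Hypothetical, simplified input shape for illustration.
interface RepoSnapshot {
  name: string;
  description: string;
  rootDirs: string[];
  dependencies: string[];
  files: string[];
}

// Limits taken from the description above.
const CONTEXT_LIMITS = { rootDirs: 20, dependencies: 40, files: 120 };

function buildContextSketch(repo: RepoSnapshot): string {
  // Render one titled section, truncated to `limit` items.
  const section = (title: string, items: string[], limit: number): string =>
    `${title} (${Math.min(items.length, limit)} of ${items.length}):\n` +
    items.slice(0, limit).map((item) => `- ${item}`).join("\n");

  return [
    `Repository: ${repo.name}`,
    `Description: ${repo.description}`,
    section("Root directories", repo.rootDirs, CONTEXT_LIMITS.rootDirs),
    section("Dependencies", repo.dependencies, CONTEXT_LIMITS.dependencies),
    section("Files", repo.files, CONTEXT_LIMITS.files),
  ].join("\n\n");
}
```

Capping each list keeps the prompt size bounded regardless of repository size, at the cost of hiding files beyond the cutoff from the model.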

System Principles

The SYSTEM_PROMPT enforces a "Principal Software Architect" persona, requiring the AI to be specific rather than generic, to write with empathy for the developer, and never to fabricate files that are not present in the provided context.

Sources: server/ai/prompts.ts:42-63

Specialized AI Task Modules

The system divides AI operations into distinct "Tasks," each responsible for a specific type of repository intelligence.

1. Repository Explanation & ELI5

The explainRepo function handles scoped analysis for the entire repository, specific nodes in the graph, or individual files. It supports an "ELI5" (Explain Like I'm 5) mode for simplified onboarding.

sequenceDiagram
    participant UI as Client UI
    participant Route as AI Routes
    participant Task as explain.ts
    participant AI as AI Provider
    
    UI->>Route: POST /api/ai/explain (params)
    Route->>Task: explainRepo(params)
    Task->>AI: complete(SystemPrompt + Context)
    AI-->>Task: Markdown Content
    Task-->>Route: AIExplainResponse
    Route-->>UI: JSON { explanation, cached }

This flow illustrates how the system processes requests for code explanations, moving from the UI to the AI provider and back.

Sources: server/ai/tasks/explain.ts:10-50, server/router/ai-routes.ts:34-42

2. Codebase Health & Refactoring

The system performs "rigorous codebase health assessments" by scoring five dimensions: maintainability, modularity, readability, architecture, and complexity.

Metric	Evaluation Criteria
Maintainability	Ease of change, module size, and config clarity.
Modularity	Directory structure and coupling indicators.
Complexity	Inverse score: a higher score means lower nesting depth and a smaller file count.

Sources: server/ai/tasks/refactor.ts:88-115

3. Onboarding & Learning Paths

The generateLearningPath task creates a structured plan for new developers, including:

Mental Model: Architecture type and core flow tagline.
Recommended Path: Files to read ordered by importance (0-100).
Execution Flow: Step-by-step data transit between files.

Sources: server/ai/tasks/playground.ts:114-160, src/types/api.ts:88-112

Frontend AI Integration

The AiCenterTab serves as the primary interface for AI interactions. It organizes tools into "Core Analysis" (Explain, Health, Risks) and "Creative Tools" (Roast, Readme Enhancer).

UI Component Structure

IntelligenceCard: Displays the AI's markdown response or score visualizations.
RiskCard: Shows specific refactoring risks with associated file badges that allow users to jump directly to the relevant code node.
ToolSection: Categorized buttons that trigger backend AI routes.

Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:45-120

Score Visualization Logic

The UI dynamically colors scores to provide instant feedback on repository quality:

Emerald (>= 80%): High quality/Maintainable.
Amber (>= 60%): Moderate risk/Technical debt.
Rose (< 60%): Critical issues/Low maintainability.

Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:32-36
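
The thresholds above amount to a small mapping function. The sketch below uses a hypothetical scoreTone helper and returns a color name rather than the actual class strings used in AiCenterTab.tsx, which are not shown here.

```typescript
type ScoreTone = "emerald" | "amber" | "rose";

// Maps a 0-100 score to the color band listed above.
function scoreTone(score: number): ScoreTone {
  if (score >= 80) return "emerald";
  if (score >= 60) return "amber";
  return "rose";
}
```

Checking the highest threshold first means each score falls into exactly one band without needing explicit upper bounds.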

Summary

The AI Architecture & File Insights system provides a comprehensive suite of tools for repository analysis. By combining multi-provider LLM support with specialized prompt engineering and a dedicated UI sidebar, it allows developers to quickly grasp the architecture, health, and entry points of unfamiliar codebases. The modular task-based design ensures that insights—ranging from sarcastic "Roasts" to technical refactoring suggestions—are grounded in the actual file structure and dependencies of the repository.

Semantic Search & Q&A Engine

The Semantic Search & Q&A Engine is a core intelligence component of the gitSdm platform designed to provide developers with instant, context-aware understanding of a codebase. It bypasses traditional keyword matching by using vector-based embeddings to locate code snippets based on meaning and utility, and leverages Large Language Models (LLMs) to answer complex architectural questions with direct citations to the source code.

This system operates in two primary modes: Search Mode, which retrieves relevant code chunks based on semantic similarity, and Ask Mode, which synthesizes an explanatory answer using retrieved code as context. The engine relies on a pre-indexed repository state where code is broken into chunks and stored in a vector space.

Sources: README.md:162-162, src/pages/SearchPage.tsx:123-128

System Architecture & Data Flow

The engine follows a Retrieval-Augmented Generation (RAG) architecture. When a user submits a query, the system first retrieves the most relevant code sections from the indexed repository before passing them to an AI provider for final processing.

Q&A Processing Pipeline

The QAEngine orchestrates the flow from query to answer. It utilizes a searchEngine to fetch the top 5 most relevant code chunks that meet a minimum similarity score threshold. If no relevant chunks are found, it returns a standard "information not available" message.

sequenceDiagram
    participant U as User Interface
    participant API as API Client
    participant QA as QA Engine
    participant SE as Search Engine
    participant AI as AI Provider

    U->>API: POST /api/search/ask
    API->>QA: ask(options)
    QA->>SE: search(query, topK=5)
    SE-->>QA: Relevant Code Chunks
    alt Chunks Found
        QA->>QA: Build Context Prompt
        QA->>AI: complete(systemPrompt, userPrompt)
        AI-->>QA: Generated Answer
        QA-->>API: Answer + Citations
        API-->>U: Render QAAnswerView
    else No Chunks
        QA-->>API: Not Available Message
        API-->>U: Render Empty State
    end

The diagram shows the synchronous flow of data from the initial user request through the search retrieval and AI generation phases. Sources: server/search/qa-engine.ts:13-68

Core Components

Backend QA Engine

The QAEngine is implemented as a singleton service. Its primary responsibility is the transformation of retrieved code chunks into a structured prompt for the AI.

Context Building: It formats source chunks with file paths and line ranges to ensure the LLM has explicit references.
System Prompting: It enforces a strict response structure using Markdown headers: ### Summary, ### How it works, and ### Related files.
Constraint Enforcement: The AI is instructed to answer using ONLY the provided code context and to avoid referencing files not present in the search results.

Sources: server/search/qa-engine.ts:70-85, server/search/qa-engine.ts:87-92
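
The retrieval filter and context-building step can be sketched as two pure functions. Everything here is illustrative: the CodeChunk shape, the function names, and the exact text layout of the prompt are assumptions, and the similarity threshold is left as a parameter because its actual value is not stated above. Only the top-5 cutoff and the use of file paths with line ranges come from the description.

```typescript
// Hypothetical chunk shape for illustration.
interface CodeChunk {
  filePath: string;
  startLine: number;
  endLine: number;
  content: string;
  score: number; // similarity score from the vector search
}

// Keep chunks at or above the threshold, best first, capped at topK.
function pickChunks(results: CodeChunk[], minScore: number, topK = 5): CodeChunk[] {
  return results
    .filter((chunk) => chunk.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Returns null when there is nothing to ground an answer on, so the caller
// can send its "information not available" message instead of calling the LLM.
function buildContextPrompt(question: string, chunks: CodeChunk[]): string | null {
  if (chunks.length === 0) return null;
  const context = chunks
    .map((c) => `--- ${c.filePath} (lines ${c.startLine}-${c.endLine}) ---\n${c.content}`)
    .join("\n\n");
  return `Code context:\n${context}\n\nQuestion: ${question}`;
}
```

Labeling every chunk with its path and line range is what lets the model cite sources that the UI can later turn into clickable citations.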

Frontend Search Interface

The SearchPage manages the user interaction and state. It utilizes a mode toggle to switch between semantic search and Q&A.

Caching: Results are cached locally using searchCache and askCache to provide instantaneous responses for repeated queries.
Indexing Awareness: The UI prevents search actions unless the repository has been successfully indexed (indexingStatus.state === 'complete').
Navigation Integration: Users can click on citations or search results to navigate directly to the file in the repository visualizer.

Sources: src/pages/SearchPage.tsx:41-65, src/pages/SearchPage.tsx:94-103

Data Models and API

The system uses specific TypeScript interfaces to ensure type safety across the network boundary between the Express backend and the React frontend.

Search and Q&A Types

Interface	Field	Type	Description
`SearchResultCard`	`snippet`	`string`	The actual code content found.
	`score`	`number`	Similarity score from the vector search.
`QAResponse`	`answer`	`string`	The AI-generated explanation.
	`citations`	`Citation[]`	List of source files used for the answer.
`IndexingStatus`	`state`	`string`	Current status: `idle`, `indexing`, `complete`, or `failed`.

Sources: src/types/api.ts:140-172
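
Written out as TypeScript, the fields in the table look roughly like this. The sketch covers only the fields listed above; the real interfaces in src/types/api.ts likely have more. Two things are assumptions: the Citation shape (its fields are not described here), and the narrowing of state from string to a union of the four documented values. The canSearch helper is hypothetical and mirrors the indexingStatus.state === 'complete' check used by the search page.

```typescript
interface SearchResultCard {
  snippet: string; // the code content that matched
  score: number;   // similarity score from the vector search
}

// Assumed shape: the table does not describe Citation's fields.
interface Citation {
  filePath: string;
}

interface QAResponse {
  answer: string;
  citations: Citation[];
}

// The table types `state` as string; a union makes invalid states a compile error.
type IndexingState = "idle" | "indexing" | "complete" | "failed";

interface IndexingStatus {
  state: IndexingState;
}

// Search and Ask are only allowed once indexing has finished.
function canSearch(status: IndexingStatus): boolean {
  return status.state === "complete";
}
```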

API Endpoints

The apiClient provides the following methods for interacting with the Semantic Search & Q&A Engine:

semanticSearch(query, owner, repo, branch): Triggers a POST request to /api/search to find code snippets.
semanticAsk(question, owner, repo, branch): Triggers a POST request to /api/search/ask for architectural Q&A.
triggerIndexing(owner, repo, branch): Initiates the vectorization process for a repository.
fetchIndexingStatus(owner, repo): Polls for the current state of the repository index.

Sources: src/lib/apiClient.ts:182-225
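
As a rough illustration of what a call such as semanticAsk might send, the sketch below builds the request for /api/search/ask. The helper name buildAskRequest and the JSON body field names are assumptions; the source only confirms the endpoint, the POST method, and the four arguments.

```typescript
interface AskRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the request without sending it.
function buildAskRequest(question: string, owner: string, repo: string, branch: string): AskRequest {
  return {
    url: "/api/search/ask",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape: these field names are not confirmed by the source.
      body: JSON.stringify({ question, owner, repo, branch }),
    },
  };
}
```

The result can be handed to fetch as fetch(req.url, req.init); keeping construction separate from sending makes the request easy to inspect in tests.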

Result Visualization

The QAAnswerView component is responsible for rendering the AI's response. It includes a custom Markdown renderer that handles specialized formatting:

Code Highlighting: Renders inline code and code blocks within the AI's explanation.
Interactive Citations: Renders a list of sources at the bottom of the answer. Each source is a button that, when clicked, triggers the onSelectFile callback to open the file in the project's inspector.
Formatting: Handles bold text, lists, and hierarchical headers as defined in the engine's system prompt.

Sources: src/features/search/QAAnswerView.tsx:11-50, src/features/search/QAAnswerView.tsx:55-88

Conclusion

The Semantic Search & Q&A Engine provides a sophisticated layer of repository intelligence by combining vector search retrieval with LLM synthesis. By enforcing strict context boundaries and structured output formats, it ensures that technical explanations remain grounded in the actual codebase, providing developers with a reliable tool for navigating and understanding complex architectures.

Learning Paths Simulation

The Learning Paths Simulation is an AI-driven feature within gitSdm designed to accelerate developer onboarding by transforming complex codebase structures into digestible, step-by-step educational journeys. It synthesizes architectural "mental models," recommends critical files for initial reading, and maps out typical execution flows to provide instant understanding of unfamiliar projects.

This system leverages Large Language Models (LLMs) to analyze repository metadata, file trees, and dependency manifests, generating a structured onboarding intelligence report that is then rendered as an interactive "Guided Codebase Tour" in the frontend visualization workspace. Sources: README.md:28-34, server/ai/tasks/playground.ts:118-125

System Architecture and Data Flow

The Learning Paths Simulation operates as a pipeline that runs from repository ingestion, through AI synthesis, to the interactive UI tab.

Backend Orchestration

The core logic resides in the generateLearningPath function, which utilizes the executeAiTask service. This task sends a specialized prompt to the AI provider (Gemini, OpenAI, or Anthropic) containing the repository's context, including the file tree, top contributors, and dependency list.

The AI is instructed to return a JSON object with the following schema:

Mental Model: A high-level description of the architecture type (e.g., Modular Service, Layered MVC).
Recommended Path: A list of 5-8 critical files with importance scores and roles.
Execution Flow: A sequence of steps showing data transfer between components.
Insights: Architecture summaries, detected risks, and contribution suggestions.

Sources: server/ai/tasks/playground.ts:153-194, server/ai/prompts.ts:4-37

Implementation Flow Diagram

The following diagram illustrates the flow from a user submitting a URL to the generation of the simulation data.

graph TD
    User[User Input URL] --> Ingest[Repo Ingestion]
    Ingest --> Analyze[Repository Analysis Engine]
    Analyze --> Context[Context Builder]
    Context --> AI_Task[generateLearningPath Task]
    AI_Task --> LLM[AI Provider Service]
    LLM --> JSON[Structured Learning Data]
    JSON --> UI[LearningPathTab Rendering]
    UI --> Interactive[Guided Codebase Tour]

Sources: server/ai/tasks/playground.ts:228-234, src/components/home/HowItWorks.tsx:4-11

Key Components and Logic

LearningPathResult Data Structure

The backend ensures that the simulation data follows a strict type definition to maintain UI consistency.

Field	Type	Description
`mentalModel`	`Object`	Contains the `type`, `concept`, and `description` of the codebase.
`recommendedPath`	`Array`	List of objects containing `path`, `importance`, `reason`, and `role`.
`executionFlow`	`Object`	Contains `steps` (from/to/description) and `visualSteps` (file path array).
`insights`	`Object`	Contains `architecture` summary, `risks` array, and `suggestions` array.

Sources: server/ai/tasks/playground.ts:145-151

Frontend Interaction Layer

The LearningPathTab component provides the interface for developers to interact with the simulated path. It uses framer-motion for animations and lucide-react for visual cues.

Key features include:

Smart Focus Filters: Allows users to filter the learning path by "API/Routes," "UI/Components," "Core Services," or "Configuration".
Node Synchronization: Clicking a file path in the tour triggers setSelectedNodeId and setFocusedFilePath in the vizStore, centering the main graph canvas on the relevant file.
Contextual Actions: Each step in the path offers an "OPEN" action to view source code and an "EXPLAIN" action to trigger a specific AI analysis of that file.

Sources: src/components/viz/LearningPathTab.tsx:10-30, src/components/viz/LearningPathTab.tsx:125-155

Onboarding Walkthrough Logic

A parallel system, generateOnboarding, provides a 6-step walkthrough that builds progressively from high-level concepts to deployment setup.

sequenceDiagram
    participant D as Developer
    participant UI as LearningPathTab
    participant AI as AI Onboarding Task
    participant S as VizStore

    D->>UI: Select Learning Path
    UI->>AI: generateOnboarding(owner, repo)
    AI-->>UI: Return 6-step JSON
    D->>UI: Click Step 2 (Entry Point)
    UI->>S: setFocusedFilePath(path)
    S-->>D: Graph centers on file node
    D->>UI: Click "Explain"
    UI->>AI: explainRepo(filePath)
    AI-->>D: Show ELI5 / Standard Explanation

Sources: server/ai/tasks/onboarding.ts:52-87, src/components/viz/LearningPathTab.tsx:150-165

Onboarding Steps Progression

The simulation follows a specific logical order to build understanding:

Project Purpose: High-level mental model.
Entry Point: Startup sequence and initial execution.
Routing: Request lifecycle or navigation logic.
Business Logic: Core services or utility layers.
Data Layer: State management or database interactions.
Configuration: Testing, environment setup, or deployment.

Sources: server/ai/tasks/onboarding.ts:70-76

Integration with Repository Analysis

The simulation data is heavily dependent on the preliminary analysis performed by the server. This includes:

File Classification: Identifying files as "entry," "utility," "component," or "config" to help the AI prioritize the reading order.
High Coupling Detection: Files with a high degree (number of connecting edges) in the graph are often flagged as "risks" or "core logic" in the simulation.
Metadata Synthesis: Topics, primary languages, and contributor counts are used to provide context for the "Mental Model."

Sources: src/components/viz/OverviewTab.tsx:43-52, server/ai/prompts.ts:31-37

Example Recommended Path Configuration

{
  "path": "src/context/TodoContext.tsx",
  "importance": 88,
  "reason": "Core state logic — defines useTodo hook, handles add/remove/toggle operations.",
  "role": "State Context"
}

Sources: server/ai/tasks/playground.ts:205-208
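
Because these entries come from model output, a consumer may want to normalize them before rendering. The sketch below is one possible approach and not the project's code: the orderReadingList name is hypothetical, and dropping non-numeric scores and clamping to the 0-100 range are assumptions about how malformed output could be handled.

```typescript
// Field names match the example entry above.
interface RecommendedFile {
  path: string;
  importance: number;
  reason: string;
  role: string;
}

// Drop entries without a usable score, clamp to 0-100, and sort most important first.
function orderReadingList(files: RecommendedFile[]): RecommendedFile[] {
  return files
    .filter((file) => Number.isFinite(file.importance))
    .map((file) => ({ ...file, importance: Math.min(100, Math.max(0, file.importance)) }))
    .sort((a, b) => b.importance - a.importance);
}
```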

The Learning Paths Simulation effectively bridges the gap between raw code structure and human understanding, providing a roadmap that adapts to the specific architectural patterns of the repository being analyzed. Sources: README.md:12-18

Export & Diagram Generation

The Export & Diagram Generation system within gitSdm provides users with the ability to visualize a repository's structural architecture through two distinct modes: a programmatically generated "Code Graph" and an "AI Enhanced" architecture diagram. This module bridges the gap between raw codebase analysis and human-readable documentation by converting repository metadata into interactive Mermaid.js flowcharts that can be exported in multiple formats including PNG, SVG, and raw Mermaid code.

Sources: README.md:129-133, src/components/viz/ArchitectureView.tsx:64-90

System Architecture & Data Flow

The diagram generation process is orchestrated through a specialized React view (ArchitectureView) that manages the state transitions between static analysis and AI-driven insights.

Functional Components

Component	Responsibility
`ArchitectureView`	Main UI container for the visualization canvas and export controls.
`useArchitectureState`	Orchestrates the rendering lifecycle, switching between AI and Code modes.
`useArchitectureExport`	Handles the logic for generating image blobs and interacting with the clipboard.
`mermaid-generator`	Client-side logic for building Mermaid syntax from the local graph analysis.
`diagram.ts` (Server)	Backend task that prompts an LLM to synthesize a high-level architecture overview.

Sources: src/components/viz/ArchitectureView.tsx:1-30, src/components/viz/architecture/hooks/useArchitectureState.ts:8-16, server/ai/tasks/diagram.ts:7-14

Data Flow Diagram

The following diagram illustrates how the system transitions from a user request to a rendered SVG and eventually an exported file.

flowchart TD
    User[User Interface] -->|Toggle Mode| State[useArchitectureState]
    State -->|Mode: Code| Programmatic[Mermaid Generator]
    State -->|Mode: AI| AIService[AI Diagram Task]
    
    Programmatic -->|Raw String| Renderer[Mermaid.js Render]
    AIService -->|LLM Response| Renderer
    
    Renderer -->|SVG String| View[ArchitectureView Canvas]
    
    View -->|Export Request| Export[useArchitectureExport]
    Export -->|html-to-image| PNG[Download PNG]
    Export -->|XMLSerializer| SVG[Download SVG]
    Export -->|Clipboard API| Text[Copy Mermaid Code]

Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:24-60, src/components/viz/architecture/hooks/useArchitectureExport.ts:31-105

Diagram Generation Modes

Programmatic Code Graph

The programmatic generator focuses on structural connectivity. It scores nodes based on their "degree" (incoming and outgoing edges), file classification (e.g., entry points), and whether they are marked as "important files" by the analyzer. This ensures the resulting diagram focuses on the most significant architectural blocks.

Node Scoring: Entry points receive a +10 bonus, and important files receive a +5 bonus to their connectivity score.
Filtering: To maintain readability, the system slices the top 25 scored nodes.
Clustering: Files are grouped into subgraphs based on their directory paths.

Sources: src/components/viz/architecture/mermaid-generator.ts:29-55
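
The scoring and filtering rules above can be sketched as follows. The node and edge shapes and the selectTopNodes name are hypothetical; the +10 and +5 bonuses and the top-25 cutoff are taken from the list above. Directory clustering is omitted for brevity.

```typescript
// Hypothetical, trimmed-down node and edge shapes for illustration.
interface SketchNode {
  id: string;
  classification: string; // e.g. "entry", "source", "config"
  isImportant: boolean;
}

interface SketchEdge {
  source: string;
  target: string;
}

function selectTopNodes(nodes: SketchNode[], edges: SketchEdge[], limit = 25) {
  // Degree = incoming + outgoing edges per node.
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }

  return nodes
    .map((node) => ({
      node,
      score:
        (degree.get(node.id) ?? 0) +
        (node.classification === "entry" ? 10 : 0) +
        (node.isImportant ? 5 : 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With these weights an entry point outranks a non-entry file unless that file has at least ten more connections, so entry points stay visible even in sparsely connected graphs.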

AI Enhanced Architecture

The AI mode utilizes the executeAiTask function on the server to prompt an LLM (Gemini, OpenAI, or Anthropic) for a "beautiful, clean, and highly readable" architecture flowchart.

Layout Strategy: Specifically requests a Left-to-Right (graph LR) layout.
Logical Grouping: The AI is instructed to group components into subgraphs like "Entry Points", "Services", and "Database".
Styling: Applies custom CSS classes to Mermaid nodes (e.g., entry, router, service, db) for visual differentiation.

Sources: server/ai/tasks/diagram.ts:16-43

Export & Serialization

The system exports diagrams as PNG, SVG, or raw Mermaid code so they can be reused in external documentation or developer wikis.

Export Implementation Sequence

The useArchitectureExport hook manages the serialization of the DOM elements into various formats.

sequenceDiagram
    participant User as User
    participant Hook as useArchitectureExport
    participant HTI as html-to-image
    participant XMLS as XMLSerializer

    User->>Hook: handleDownloadPng()
    activate Hook
    Hook->>HTI: toPng(svgElement)
    HTI-->>Hook: dataUrl
    Hook->>User: Trigger Browser Download (.png)
    deactivate Hook

    User->>Hook: handleCopySvg()
    activate Hook
    Hook->>XMLS: serializeToString(svgElement)
    XMLS-->>Hook: svgString
    Hook->>User: Write to Clipboard
    deactivate Hook

Sources: src/components/viz/architecture/hooks/useArchitectureExport.ts:79-119

Technical Details of Export Formats

Format	Implementation Detail	Source File
PNG	Uses `html-to-image` with a custom background (`#09090b`) and a pixel ratio of 2 for high resolution.	useArchitectureExport.ts:107-118
SVG	Utilizes `XMLSerializer` to convert the live SVG DOM node into a string blob.	useArchitectureExport.ts:85-103
Mermaid Code	Extracts raw text and uses `stripMermaidFences` to clean the output for clipboard compatibility.	useArchitectureExport.ts:31-59
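
For the Mermaid Code row, the fence-stripping step might look something like the sketch below. This is a guess at the behavior, not the project's stripMermaidFences: the function name stripFencesSketch is hypothetical, and the assumption is that the input is either bare Mermaid text or Mermaid text wrapped in a single Markdown code fence.

```typescript
// Build the three-backtick marker programmatically to keep this example readable.
const FENCE = "`".repeat(3);

function stripFencesSketch(raw: string): string {
  const lines = raw.trim().split(/\r?\n/);
  const first = lines[0] ?? "";
  const last = lines[lines.length - 1] ?? "";
  // Drop an opening fence line, with or without a "mermaid" language tag.
  if (first.startsWith(FENCE)) lines.shift();
  // Drop a closing fence line if one remains.
  if (lines.length > 0 && last.trim() === FENCE) lines.pop();
  return lines.join("\n").trim();
}
```

Unfenced input passes through unchanged apart from trimming, so the helper is safe to apply to both programmatic and AI-generated diagrams.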

Rendering & View Management

Diagrams are rendered using the mermaid library. The useArchitectureState hook handles the asynchronous rendering process and implements a cleanup mechanism to prevent memory leaks or overlapping SVG IDs.

Scaling: Rendered SVGs are post-processed to replace fixed width/height attributes with width="100%" and height="100%" to support the pan-and-zoom interface.
Error Handling: If a diagram fails to render (common with complex circular dependencies), the system catches the error and displays a "Failed to layout flowchart" message to the user.
Pan & Zoom: Managed by the useArchitecturePanZoom hook, allowing users to interact with the rendered SVG using mouse wheels and dragging.

Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:40-70, src/components/viz/ArchitectureView.tsx:185-210
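
The scaling step can be illustrated with a small string transform. This is a sketch under the assumption that the post-processing works on the SVG markup as a string; the makeSvgResponsive name and the regex-based approach are hypothetical and may differ from what useArchitectureState does.

```typescript
// Replace fixed width/height on the root <svg> tag with 100% so the
// diagram fills its pan-and-zoom container.
function makeSvgResponsive(svg: string): string {
  // Only touch the opening <svg ...> tag so child elements keep their sizes.
  return svg.replace(/<svg\b[^>]*>/, (tag) => {
    const stripped = tag.replace(/\s(width|height)="[^"]*"/g, "");
    return stripped.replace(/^<svg/, '<svg width="100%" height="100%"');
  });
}
```

Restricting the rewrite to the root tag matters because shapes inside the diagram carry their own width and height attributes, which must stay intact.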

Conclusion

The Export & Diagram Generation module is a critical component of gitSdm's mission to provide "instant architecture visualization." By combining deterministic graph analysis with the creative synthesis of AI, it allows developers to generate professional-grade documentation and visual aids directly from their source code.

Smart File Explorer

The Smart File Explorer is a core navigation and analysis system within gitSdm designed to transform raw repository structures into actionable architectural insights. Unlike traditional file browsers, it integrates file classification, dependency analysis, and AI-powered context to help developers understand unfamiliar codebases quickly. It acts as the primary interface for exploring files, identifying entry points, and accessing deep-dive explanations.

The system encompasses the directory tree located in src/components/explorer/, the visual graph interaction in src/features/graph/, and the intelligence layer that ranks and explains files. It works by ingesting a repository URL, parsing its manifests, and then annotating the resulting tree with metadata that drives the "Smart" features, such as suggested reading and high-coupling detection.

Sources: README.md:16-25, src/components/home/HowItWorks.tsx:5-13, README.md:120-130

Architecture and Data Flow

The Smart File Explorer relies on a multi-stage pipeline that begins with GitHub ingestion and ends with an interactive, annotated UI.

The Analysis Pipeline

Ingestion: The user provides a GitHub URL; the repository's file tree is fetched and converted into a flat tree.
Classification: Files are categorized (e.g., entry, test, config) based on their paths and contents.
Graph Construction: Files are transformed into nodes and edges for the React Flow canvas.
UI Enrichment: The explorer uses these classifications to highlight important files and entry points.

graph TD
    A[GitHub URL] --> B[Flat Tree Fetcher]
    B --> C[File Classifier]
    C --> D[Manifest Parser]
    D --> E[Graph Builder]
    E --> F[Smart Explorer UI]
    F --> G[AI Explanation Layer]
    
    subgraph "Classification Logic"
        C1[Entry Points]
        C2[Config Files]
        C3[Test Suites]
    end
    C --- C1
    C --- C2
    C --- C3

This diagram illustrates how raw repository data flows through classification and parsing before being rendered in the Smart Explorer UI. Sources: src/components/home/HowItWorks.tsx:5-12, server/parser/file-classifier.test.ts:7-38

File Classification and Smart Ranking

The "Smart" aspect of the explorer is driven by the file-classifier, which assigns roles to files. These roles determine how files are prioritized in the "Suggested Reading" and "Entry Points" sections.

Key Classifications

Classification	Indicators	Purpose in Explorer
Entry	`index.ts`, `main.go`, `App.tsx`	Identified as start of execution.
Test	`.test.tsx`, `_spec.go`, `__tests__/`	Filtered from primary onboarding flows.
Config	`tsconfig.json`, `vite.config.ts`, `.env`	Grouped as project setup files.
Doc	`README.md`, `LICENSE`	Ranked highly for initial onboarding.
Source	Standard logic files	General exploration nodes.

Sources: server/parser/file-classifier.test.ts:7-38

Ranking Logic

The system uses the findImportantFiles function to rank manifest files (like package.json) and entry points higher than utility or asset files. A score is calculated based on file depth (proximity to root) and type, ensuring that the most impactful files are presented first to the developer. Sources: server/parser/file-classifier.test.ts:60-75, src/components/viz/OverviewTab.tsx:135-155
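
A depth-and-type score of this kind could be sketched as below. The weights, the manifest and entry file lists, and the rankFilesSketch name are all illustrative assumptions; the actual findImportantFiles scoring is not shown in this page.

```typescript
interface RankedFile {
  path: string;
  score: number;
}

// Illustrative lists; the real classifier recognizes more file names.
const MANIFEST_NAMES = ["package.json", "go.mod", "Cargo.toml", "pyproject.toml"];
const ENTRY_NAMES = ["index.ts", "main.go", "App.tsx"];

function rankFilesSketch(paths: string[]): RankedFile[] {
  return paths
    .map((path) => {
      const parts = path.split("/");
      const base = parts[parts.length - 1] ?? path;
      const depth = parts.length - 1; // 0 for files in the repository root
      // Assumed weights: shallower files score higher, manifests beat entry points.
      let score = Math.max(0, 10 - 2 * depth);
      if (MANIFEST_NAMES.includes(base)) score += 20;
      else if (ENTRY_NAMES.includes(base)) score += 15;
      return { path, score };
    })
    .sort((a, b) => b.score - a.score);
}
```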

AI-Augmented Exploration

When a user selects a file or node in the explorer, the system triggers the AI Intelligence layer to provide a high-level technical summary.

Interactive Components

Focusing Logic: When a file is selected in the explorer, the focusOnNode function calculates the node's position on the graph and animates the view to center on it with a specific zoom level (typically 1.3).
AI Explain Selection: Uses the explainRepo task to provide specific headings: "What this does", "Why it matters", and "Where to look next".
ELI5 Mode: A toggleable mode that simplifies complex technical file descriptions for junior developers.

sequenceDiagram
    participant U as User
    participant E as Explorer UI
    participant RF as React Flow Canvas
    participant AI as AI Task Service
    
    U->>E: Click File (e.g. src/App.tsx)
    E->>RF: focusOnNode(id, path)
    RF-->>U: Zoom & Center Node
    E->>AI: explainRepo(scope: 'file', path: 'src/App.tsx')
    AI-->>E: Return Markdown Explanation
    E-->>U: Display "Why it Matters" Section

The sequence above shows the interaction between the file explorer, the visual graph, and the AI backend when a user explores a specific file. Sources: src/components/viz/OverviewTab.tsx:28-45, server/ai/tasks/explain.ts:25-55, src/components/viz/ai-sidebar/AiCenterTab.tsx:220-235

Integrated Insights

The Smart File Explorer doesn't just list files; it provides metadata about the repository's health and coupling.

Architectural Metrics

High Coupling Detection: The explorer calculates the "degree" of files (number of incoming/outgoing edges) to identify "God Objects" or critical modules that many other files depend on.
Commit Density: Integrates commit history to show which files are actively being changed, providing a "heat map" of development activity.
Onboarding Steps: Generates a 6-step walkthrough that references real file paths to guide a new developer through the codebase's execution flow.

Sources: src/components/viz/OverviewTab.tsx:54-65, src/components/viz/OverviewTab.tsx:185-205, server/ai/tasks/onboarding.ts:60-75

Conclusion

The Smart File Explorer serves as the bridge between static code analysis and dynamic developer understanding. By combining automated file classification with AI-driven insights and interactive graph visualization, it allows developers to skip the manual "grep and trace" phase of onboarding. Its ability to surface entry points and high-coupling nodes ensures that exploration is prioritized by architectural significance rather than just alphabetical order.

Data Management & Backend Systems

Language Parsers & Dependency Analysis

The Language Parsers and Dependency Analysis system is the core intelligence engine of gitSdm. It is responsible for transforming a raw repository file tree into a structured map of interconnected components, third-party dependencies, and categorized modules. This system operates as part of the backend analysis pipeline, taking file names and contents as input to determine the architectural significance of each file within a codebase.

By identifying entry points, resolving manifests across multiple ecosystems (npm, Go, Python, Rust, etc.), and classifying file roles (e.g., config, test, source), the system enables high-level features such as the interactive dependency graph and AI-powered architecture summaries. Sources: README.md:143-149, src/components/home/HowItWorks.tsx:4-9

The Analysis Pipeline

The system follows a sequential pipeline to process repository data. This flow ensures that every file is evaluated for its specific role and its relationship to the broader project ecosystem.

flowchart TD
    A[Fetch File Tree] --> B[File Classification]
    B --> C[Manifest Parsing]
    C --> D[Dependency Extraction]
    D --> E[Import Resolution]
    E --> F[Graph Construction]
    
    subgraph AnalysisEngine [Analysis Engine]
    B
    C
    D
    E
    end

The analysis pipeline starts with the file structure and metadata, proceeding to identify dependencies and trace connections across all files. Sources: src/components/home/HowItWorks.tsx:4-9

File Classification

Files are categorized into specific classes based on their names and paths. This classification is critical for determining which files are "important" and how they should be represented in the visual graph (e.g., assigning specific colors or rankings).

Key File Classes

Class	Description	Examples
Entry	Primary execution starting points.	`src/index.ts`, `main.go`, `App.tsx`
Test	Files containing unit or integration tests.	`button.test.tsx`, `user_spec.go`, `__tests__/`
Config	Project settings and environment variables.	`tsconfig.json`, `vite.config.ts`, `.env`
Doc	Documentation and license information.	`README.md`, `LICENSE`
Asset	Static files like images and icons.	`logo.png`, `favicon.ico`
Source	General implementation and logic files.	`helpers.ts`, `db/conn.go`

Sources: server/parser/file-classifier.test.ts:7-40
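
As a rough illustration of name- and path-based classification, the sketch below reproduces the examples from the table. The function name `classifyFile` and every rule in it are assumptions inferred from those examples, not the project's actual classifier.

```typescript
// Hypothetical classifier: the rules are inferred from the table's examples only.
type FileClass = "entry" | "test" | "config" | "doc" | "asset" | "source";

function classifyFile(path: string): FileClass {
  const name = path.split("/").pop() ?? path;

  // Check tests first so that `index.test.ts` is not mistaken for an entry point.
  if (
    /(^|\/)__tests__\//.test(path) ||
    /\.(test|spec)\.[jt]sx?$/.test(name) ||
    /_(test|spec)\.go$/.test(name)
  ) {
    return "test";
  }
  if (/^(index|main|app)\.(ts|tsx|js|jsx|go)$/i.test(name)) return "entry";
  if (/^\.env/.test(name) || /^tsconfig.*\.json$/.test(name) || /\.config\.[jt]s$/.test(name)) {
    return "config";
  }
  if (/^readme(\.md)?$/i.test(name) || /^license/i.test(name)) return "doc";
  if (/\.(png|jpe?g|gif|svg|ico)$/i.test(name)) return "asset";
  return "source"; // default: general implementation files
}
```

Rule order matters in a classifier like this: the most specific patterns run first and `source` acts as the fallback.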

Importance Ranking

The system uses the classification to rank files. For instance, package.json receives a high importance score because it is a manifest, while files in the root or with shallow directory depth are prioritized over deeply nested files. Sources: server/parser/file-classifier.test.ts:63-75
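
One way to picture this ranking is a score that adds a manifest bonus to a depth-based component. The sketch below is only that: `importanceScore`, `rankFiles`, the manifest list, and the weights are all illustrative assumptions.

```typescript
// Hypothetical scoring: the weights are illustrative, not the project's values.
const MANIFEST_NAMES = new Set([
  "package.json", "go.mod", "Cargo.toml", "pyproject.toml", "requirements.txt", "pom.xml",
]);

function importanceScore(path: string): number {
  const segments = path.split("/");
  const name = segments[segments.length - 1] ?? path;
  const depth = segments.length - 1; // 0 for files in the repository root

  const manifestBonus = MANIFEST_NAMES.has(name) ? 50 : 0;
  const depthScore = Math.max(0, 10 - 2 * depth); // shallower paths score higher
  return manifestBonus + depthScore;
}

// Returns a new array sorted from most to least important.
function rankFiles(paths: string[]): string[] {
  return paths.slice().sort((a, b) => importanceScore(b) - importanceScore(a));
}
```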

Manifest Parser Registry

The project employs a modular parserRegistry to handle various ecosystem manifest files. Each parser is registered with a unique name and a file pattern (string or RegExp) to match relevant files in a repository.

Supported Ecosystems

The registry supports a wide range of modern programming languages and deployment tools:

Node.js/NPM: package.json (Prod, Dev, and Peer dependencies)
Go: go.mod (Require blocks)
Python: requirements.txt and pyproject.toml
Rust: Cargo.toml
Java: pom.xml
Infrastructure: Dockerfile (Base images)

Sources: server/parser/manifest-parsers/index.test.ts:14-118, server/parser/manifest-parsers/registry.ts

sequenceDiagram
    participant Analyzer as Dependency Analyzer
    participant Registry as Parser Registry
    participant Parser as Manifest Parser
    
    Analyzer->>Registry: getParserForFile(filename)
    Registry-->>Analyzer: parserInstance
    Analyzer->>Parser: parse(content)
    Parser-->>Analyzer: Array of Dependency objects

The interaction between the analyzer and the registry allows for dynamic selection of the correct parsing logic based on the file extension or specific manifest name. Sources: server/parser/manifest-parsers/index.test.ts:125-135, server/parser/manifest-parsers/registry.ts
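
The lookup described above can be sketched as follows. Only the name/pattern registration idea and the `getParserForFile` call come from the text; the `ManifestParser` and `Dependency` shapes, `registerParser`, and the npm parser body are assumptions for illustration.

```typescript
// Hypothetical shapes for a pattern-based parser registry.
interface Dependency {
  name: string;
  version: string;
  type: "prod" | "dev" | "peer";
  ecosystem: string;
}

interface ManifestParser {
  name: string;
  pattern: string | RegExp; // exact file name, or a RegExp tested against it
  parse(content: string): Dependency[];
}

const parserRegistry: ManifestParser[] = [];

function registerParser(parser: ManifestParser): void {
  parserRegistry.push(parser);
}

// Returns the first registered parser whose pattern matches the file name.
function getParserForFile(filePath: string): ManifestParser | undefined {
  const fileName = filePath.split("/").pop() ?? filePath;
  return parserRegistry.find((p) =>
    typeof p.pattern === "string" ? p.pattern === fileName : p.pattern.test(fileName),
  );
}

// Example parser for package.json covering prod, dev, and peer dependencies.
registerParser({
  name: "npm",
  pattern: "package.json",
  parse(content: string): Dependency[] {
    const pkg = JSON.parse(content) as Record<string, unknown>;
    const sections: Array<[string, Dependency["type"]]> = [
      ["dependencies", "prod"],
      ["devDependencies", "dev"],
      ["peerDependencies", "peer"],
    ];
    const deps: Dependency[] = [];
    for (const [section, type] of sections) {
      const entries = (pkg[section] ?? {}) as Record<string, string>;
      for (const name of Object.keys(entries)) {
        deps.push({ name, version: entries[name], type, ecosystem: "npm" });
      }
    }
    return deps;
  },
});
```

Matching on the file name rather than the full path lets the same parser handle a root manifest and one nested in a subproject.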

Dependency Analysis & Deduplication

The dependency-analyzer aggregates findings from all detected manifest files. It is capable of handling monorepos or projects with multiple sub-projects by merging dependencies found in different locations (e.g., a root package.json and a subproject/package.json).

Data Consolidation

Combination: Identifies dependencies across different ecosystems (e.g., finding both lodash from NPM and a Gin-gonic module from Go in the same repo).
Deduplication: Identifies identical dependencies to prevent redundant nodes in the visualization. If lodash version ^4.17.21 is found in two separate package.json files, it is treated as a single dependency entry.

Sources: server/parser/dependency-analyzer.test.ts:6-33
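
A minimal sketch of the consolidation step, assuming two entries count as identical when their ecosystem, name, and version all match. The `dedupeDependencies` name and that identity rule are assumptions; the real analyzer may key on different fields.

```typescript
interface Dependency {
  name: string;
  version: string;
  type: string;
  ecosystem: string;
}

// Assumption: identity is ecosystem + name + version; the first occurrence wins.
function dedupeDependencies(groups: Dependency[][]): Dependency[] {
  const seen = new Map<string, Dependency>();
  for (const group of groups) {
    for (const dep of group) {
      const key = `${dep.ecosystem}:${dep.name}@${dep.version}`;
      if (!seen.has(key)) seen.set(key, dep);
    }
  }
  return Array.from(seen.values());
}
```

Including the ecosystem in the key keeps same-named packages from different registries apart while still collapsing true duplicates.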

Dependency Data Structure

Each extracted dependency is stored with specific metadata:

{
  "name": "react",
  "version": "^19.0.0",
  "type": "prod",
  "ecosystem": "npm"
}

Sources: server/parser/manifest-parsers/index.test.ts:16-19

Conclusion

The Language Parsers & Dependency Analysis module provides the foundational data for gitSdm's architectural mapping. By combining broad ecosystem support via an extensible parser registry with intelligent file classification and importance ranking, it ensures that developers receive an accurate and prioritized view of any unfamiliar codebase. This structured data allows the system to identify "Hot Paths" and "High Coupling" within a repository, facilitating faster onboarding and more informed refactoring decisions. Sources: server/github/mock-data.ts:167-175, server/parser/file-classifier.test.ts:63-75

Graph Building Algorithms

The Graph Building Algorithms in gitSdm are responsible for transforming a flat or nested repository file tree into a multi-dimensional interactive graph. This system orchestrates file classification, relationship extraction (such as imports and dependencies), and spatial positioning to provide a visual mental model of software architecture.

The system combines deterministic static analysis, laid out with engines such as Dagre and D3-force, with heuristic AI-driven modeling to generate architectural diagrams. This dual approach maps individual file relationships precisely while also providing high-level conceptual overviews of module boundaries.

Graph Construction Logic

The core graph construction process follows a pipeline that starts with repository metadata and file tree ingestion. It generates a collection of nodes and edges that represent the physical and logical structure of the codebase.

Node Generation and Classification

Nodes are the primary entities in the graph, representing repositories, folders, and files. Every graph starts with a root repo node, followed by hierarchical folder and file nodes derived from the GitHub tree. Sources: server/graph/graph-builder.test.ts:4-22, src/components/viz/OverviewTab.tsx:44-48

Edge Resolution

Relationships between nodes are established through two primary mechanisms:

Hierarchy Edges: Represent the directory structure (e.g., Folder A contains File B).
Import Edges: Created by parsing fileContents to identify static import statements. The resolver maps local import paths to existing file nodes within the graph. Sources: server/graph/graph-builder.test.ts:40-66, README.md:144-148

flowchart TD
    Input[File Tree & Manifests] --> Build[buildGraph Engine]
    Build --> Nodes[Generate Nodes: Repo, Dir, File]
    Build --> Edges[Resolve Edges]
    Edges --> Hierarchy[Parent-Child Links]
    Edges --> Imports[Static Import Resolution]
    Nodes --> Styling[Apply Colors & Sizes]
    Styling --> Output[Graph Data Structure]

The diagram shows the sequential flow from raw data ingestion to the final graph data structure.
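
To make the import-edge step concrete, here is a minimal sketch that extracts static `import` statements with a regular expression and maps relative specifiers onto known file paths. The regex-based extraction, the extension list, and the names `resolveImportEdges` and `normalizePath` are assumptions; the actual resolver may work differently.

```typescript
interface ImportEdge {
  source: string;
  target: string;
}

// Candidate suffixes tried when a specifier omits its extension (assumed list).
const EXTENSIONS = ["", ".ts", ".tsx", ".js", ".jsx", "/index.ts", "/index.js"];

// Collapses "." and ".." segments without touching the file system.
function normalizePath(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return out.join("/");
}

function resolveImportEdges(
  fileContents: Record<string, string>,
  knownFiles: Set<string>,
): ImportEdge[] {
  const edges: ImportEdge[] = [];
  for (const source of Object.keys(fileContents)) {
    const dir = source.split("/").slice(0, -1).join("/");
    const importRe = /import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g;
    let match: RegExpExecArray | null;
    while ((match = importRe.exec(fileContents[source])) !== null) {
      const spec = match[1];
      if (spec.charAt(0) !== ".") continue; // third-party packages are not file nodes
      const base = normalizePath(`${dir}/${spec}`);
      const target = EXTENSIONS.map((ext) => base + ext).find((p) => knownFiles.has(p));
      if (target) edges.push({ source, target });
    }
  }
  return edges;
}
```

Imports that do not resolve to a known file are dropped, so every edge points at a node that exists in the graph.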

Layout and Spatial Positioning

Once the graph structure is defined, layout algorithms determine the x and y coordinates for every node. The system supports multiple layout strategies to accommodate different architectural views.

Dagre Hierarchical Layout

The Dagre engine is used to create structured, directed layouts. It is particularly effective for visualizing dependency chains and request lifecycles.

Top-to-Bottom (TB): Positions source nodes at the top and dependents below.
Left-to-Right (LR): Positions entry points on the left, flowing towards utilities and databases on the right. Sources: server/graph/layout.test.ts:6-37, README.md:145-147

Node Visual Mapping

Each node is assigned visual properties based on its type or file extension to aid in rapid identification. Sources: server/graph/layout.test.ts:47-65

Node Type / Extension	Visual Color	Size (Radius)
Repository	`#a78bfa` (Violet)	14
Folder	`#fbbf24` (Amber)	12
.ts / .tsx	`#3b82f6` (Blue)	8
.js / .jsx	`#facc15` (Yellow)	8
Unknown File	`#9ca3af` (Grey)	8
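
The table above translates directly into a lookup. In this sketch the colors and sizes are taken from the table, while the function name `styleForNode` and the `NodeKind` type are hypothetical.

```typescript
type NodeKind = "repo" | "folder" | "file";

interface NodeStyle {
  color: string;
  size: number;
}

// Extension colors from the table; anything else falls back to grey.
const EXTENSION_COLORS: Record<string, string> = {
  ts: "#3b82f6",
  tsx: "#3b82f6",
  js: "#facc15",
  jsx: "#facc15",
};

function styleForNode(kind: NodeKind, path: string = ""): NodeStyle {
  if (kind === "repo") return { color: "#a78bfa", size: 14 };
  if (kind === "folder") return { color: "#fbbf24", size: 12 };
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
  return { color: EXTENSION_COLORS[ext] ?? "#9ca3af", size: 8 };
}
```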

Mermaid Diagram Generation

gitSdm provides two distinct algorithms for generating Mermaid.js flowcharts: Programmatic Scoring and AI Synthesis.

Programmatic Scoring Algorithm

The generateProgrammaticMermaid function uses a connectivity-based scoring system to select the most relevant nodes for a compact diagram (limited to the top 25 nodes).

Connectivity Score: Sum of incoming and outgoing edges.
Class Bonus: Nodes identified as entry receive a +10 score boost; importantFiles receive +5.
Grouping: Nodes are clustered into subgraph blocks based on their directory paths. Sources: src/components/viz/architecture/mermaid-generator.ts:25-58
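
The scoring and grouping steps above can be sketched as below. This is not the body of `generateProgrammaticMermaid`; the helper names `selectTopNodes` and `groupByDirectory`, the node shape, and the choice to add both bonuses cumulatively are assumptions.

```typescript
interface GraphNode {
  id: string; // assumed to double as the file path
  fileClass?: string;
}

interface GraphEdge {
  source: string;
  target: string;
}

// Score = in-degree + out-degree, +10 for entry files, +5 for important files.
function selectTopNodes(
  nodes: GraphNode[],
  edges: GraphEdge[],
  importantFiles: Set<string>,
  limit: number = 25,
): GraphNode[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }
  const score = (node: GraphNode): number =>
    (degree.get(node.id) ?? 0) +
    (node.fileClass === "entry" ? 10 : 0) +
    (importantFiles.has(node.id) ? 5 : 0);
  return nodes.slice().sort((a, b) => score(b) - score(a)).slice(0, limit);
}

// Clusters nodes by parent directory; root-level files go under ".".
function groupByDirectory(nodes: GraphNode[]): Map<string, GraphNode[]> {
  const groups = new Map<string, GraphNode[]>();
  for (const node of nodes) {
    const slash = node.id.lastIndexOf("/");
    const dir = slash >= 0 ? node.id.slice(0, slash) : ".";
    const bucket = groups.get(dir) ?? [];
    bucket.push(node);
    groups.set(dir, bucket);
  }
  return groups;
}
```

Each directory group would then become one Mermaid `subgraph` block.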

AI-Driven Architecture Synthesis

The AI generator utilizes a large language model (LLM) to identify conceptual "subgraphs" and "classes" that static analysis might miss. It groups files into logical blocks like "Controllers", "Services", and "Database". Sources: server/ai/tasks/diagram.ts:16-43

sequenceDiagram
    participant UI as Architecture View
    participant GS as Mermaid Generator
    participant AI as AI Task Handler
    UI->>GS: Request Enhanced Diagram
    GS->>AI: executeAiTask (mermaid)
    Note over AI: Analyzes Repository Analysis Context
    AI-->>GS: Mermaid Code Block (graph LR)
    GS->>GS: Sanitize IDs & Clean Scripts
    GS-->>UI: Rendered SVG Flowchart

The sequence diagram illustrates the request flow for generating an AI-enhanced architecture diagram.

Repository Intelligence and Metrics

The graph algorithms also calculate metrics used for the "Overview" and "Health" dashboards.

Coupling Analysis

The system calculates the "degree" of each node by counting the edges in which it appears as either source or target. This identifies High Coupling points: files or modules with many dependencies or dependents, which may indicate architectural bottlenecks. Sources: src/components/viz/OverviewTab.tsx:50-59

File Classification

During graph building, files are tagged with specific classes used for both styling and AI prompting:

entry: Main application entry points (e.g., main.tsx, index.ts).
router: API handlers or UI route definitions.
service: Core business logic.
util: Helper functions and parsers. Sources: server/ai/tasks/diagram.ts:22-28, src/components/viz/architecture/mermaid-generator.ts:92-96

Summary

The Graph Building Algorithms in gitSdm provide the technical foundation for repository visualization. By combining deterministic hierarchy extraction with import resolution and sophisticated layout engines like Dagre, the system creates a spatial representation of code. This is further enhanced by scoring algorithms and AI synthesis that distill complex trees into readable architecture diagrams, highlighting high-coupling risks and core execution flows. Sources: README.md:144-150, server/graph/graph-builder.test.ts:4-66

GitHub API Integration

The GitHub API Integration serves as the primary data ingestion layer for the gitSdm platform. Its purpose is to interface with GitHub's REST and Git Data APIs to retrieve repository metadata, file structures, manifest contents, and activity metrics. This system enables the application to transform a standard GitHub URL into a structured dataset for visualization and AI analysis.

The integration is architected to handle both authenticated and unauthenticated requests, incorporating a mock data layer for development and testing environments. It acts as the foundation for downstream features such as the Interactive Visualization and AI-powered insights by providing the raw source code and structural data required for parsing.

Sources: README.md:143-149, server/services/analyze-repo.ts:18-50

Core Analysis Pipeline

The integration follows a linear pipeline to move from a user-provided URL to a fully analyzed repository object. The process begins with URL parsing, followed by metadata retrieval, and culminates in a deep tree traversal to build the project's file hierarchy.

Pipeline Flow

The following diagram illustrates the sequence of operations performed during a repository analysis request:

flowchart TD
    URL[GitHub URL Input] --> Parse[Parse URL: Owner/Repo/Branch]
    Parse --> Info[Fetch Repo Metadata & SHA]
    Info --> Tree[Fetch Flat File Tree]
    Tree --> Filter[Filter Manifests & Source Files]
    Filter --> Content[Fetch File Contents]
    Content --> Final[Aggregate Analysis Object]

    style URL fill:#238636,stroke:#fff
    style Final fill:#1f6feb,stroke:#fff

The analysis pipeline coordinates multiple asynchronous calls to ensure all necessary data (tree structure, contributors, and timeline) is available before passing the data to the graph builder and dependency analyzer. Sources: src/components/home/HowItWorks.tsx:4-11, server/services/analyze-repo.ts:32-60

GitHub Data Services

The project utilizes the Octokit library to interact with GitHub. The services are divided into metadata retrieval, tree fetching, and content extraction.

Repository Metadata and Tree Fetching

Metadata such as star counts, descriptions, and default branches are fetched via the repos.get endpoint. For structural analysis, the system uses git.getTree with the recursive parameter to obtain a flat list of all objects in the repository.

Function	Endpoint/Action	Purpose
`fetchRepoInfo`	`repos.get`	Retrieves `fullName`, `stars`, `forks`, and `license`.
`fetchFlatTree`	`git.getTree`	Fetches all file paths and SHAs recursively.
`fetchFileContents`	`repos.getContent`	Downloads raw content for specific files (e.g., `package.json`).
`fetchTimeline`	`repos.listCommits`	Aggregates commit activity over recent weeks.

Sources: server/github/fetch-tree.ts, server/services/analyze-repo.ts:34-40

Authentication and Rate Limiting

The system supports Personal Access Tokens (PAT) to increase API rate limits. Tokens are passed from the client via the X-GitHub-Token header and utilized by the server-side Octokit instance.

// Client-side header injection
function getGitHubTokenHeader(): Record<string, string> {
  try {
    const token = localStorage.getItem('gitsdm_github_pat');
    return token ? { 'X-GitHub-Token': token } : {};
  } catch {
    return {};
  }
}

Sources: src/lib/apiClient.ts:21-27, README.md:104-106

Development & Mocking

To facilitate development without hitting GitHub API limits, the integration includes a robust mocking layer. If the repository owner is identified as mock, the system redirects all calls to a local database of predefined repository structures.

sequenceDiagram
    participant S as Service Layer
    participant F as fetch-tree.ts
    participant M as mock-data.ts
    participant G as GitHub API

    S->>F: fetchRepoInfo("mock", "gitsdm")
    F->>F: isMockRepo("mock")?
    alt is Mock
        F->>M: fetchMockRepoInfo()
        M-->>F: Return static JSON
    else is Real
        F->>G: octokit.repos.get()
        G-->>F: Return GitHub Response
    end
    F-->>S: Return Unified RepoInfo

Sources: server/github/mock-data.ts:4-6, server/github/fetch-tree.test.ts:110-120
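
The branch in the diagram amounts to a small dispatcher. The sketch below assumes the mock check is a plain comparison against the owner name `mock`; the `RepoInfo` fields shown, the `Fetcher` type, and `createFetchRepoInfo` are hypothetical, and the two fetchers are injected so that no real GitHub call is implied.

```typescript
interface RepoInfo {
  fullName: string;
  stars: number;
}

type Fetcher = (owner: string, repo: string) => Promise<RepoInfo>;

// Assumption: a repository is treated as mock purely by its owner name.
const isMockRepo = (owner: string): boolean => owner === "mock";

// Routes each call to the mock or the real fetcher, returning one unified shape.
function createFetchRepoInfo(fetchMock: Fetcher, fetchReal: Fetcher): Fetcher {
  return (owner, repo) =>
    isMockRepo(owner) ? fetchMock(owner, repo) : fetchReal(owner, repo);
}
```

Because both branches return the same `RepoInfo` shape, callers in the service layer do not need to know which source answered.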

URL Parsing Logic

The parseGitHubUrl utility is responsible for decomposing various GitHub URL formats into their constituent parts: owner, repo, and branch. It supports standard browser URLs, deep-linked file paths, and shorthand owner/repo strings.

Input Type	Example	Extracted Owner	Extracted Repo
Full URL	`https://github.com/mbayue/gitSdm`	`mbayue`	`gitSdm`
Branch URL	`https://github.com/owner/repo/tree/dev`	`owner`	`repo` (branch: `dev`)
Shorthand	`owner/repo`	`owner`	`repo`

Sources: server/github/parse-url.ts, server/services/analyze-repo.ts:20-25
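
A minimal sketch of how the three input types in the table could be decomposed. The regular expressions and the `ParsedRepo` return shape are assumptions; this is not the code in `server/github/parse-url.ts`.

```typescript
interface ParsedRepo {
  owner: string;
  repo: string;
  branch?: string;
}

function parseGitHubUrl(input: string): ParsedRepo | null {
  // Drop trailing slashes, then a trailing ".git" suffix.
  const trimmed = input.trim().replace(/\/+$/, "").replace(/\.git$/, "");

  // Full, branch, or deep-linked URL: github.com/owner/repo[/tree|blob/branch/...]
  const full =
    /^(?:https?:\/\/)?(?:www\.)?github\.com\/([\w.-]+)\/([\w.-]+)(?:\/(?:tree|blob)\/([^/]+))?/.exec(
      trimmed,
    );
  if (full) {
    return full[3]
      ? { owner: full[1], repo: full[2], branch: full[3] }
      : { owner: full[1], repo: full[2] };
  }

  // Shorthand: "owner/repo"
  const short = /^([\w.-]+)\/([\w.-]+)$/.exec(trimmed);
  return short ? { owner: short[1], repo: short[2] } : null;
}
```

One limitation of this sketch: a branch name that itself contains a slash is cut at the first segment, because the URL alone does not say where the branch ends and the file path begins.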

Conclusion

The GitHub API Integration provides the essential data bridge between raw GitHub repositories and the gitSdm intelligence engine. By abstracting the complexities of tree traversal, authentication, and rate-limiting through a unified service layer, the platform ensures consistent data availability for both its visualization canvas and AI diagnostic tools. Sources: README.md:143-155, server/services/analyze-repo.ts:80-100

Caching Layer (LRU)

The Caching Layer in gitSdm is a centralized in-memory storage system designed to optimize performance by reducing redundant API calls to GitHub and AI providers. It utilizes the Least Recently Used (LRU) eviction strategy to manage memory efficiently, ensuring that frequently accessed data—such as repository analyses, AI-generated explanations, and search results—is readily available while older, less relevant data is purged.

This layer serves as a critical performance bridge between the Repository Analysis Service and external data sources. By caching expensive operations like full repository scans and semantic search results, the system significantly improves response times for end-users and helps mitigate rate-limiting issues from external providers.

Sources: server/cache/lru.ts:1-72, README.md: Architecture Section

Core Architecture and Buckets

The caching system is partitioned into specific "buckets," each managed by an independent LRUCache instance. This separation allows for granular control over Time-To-Live (TTL) values and maximum entry limits based on the specific data type.

Cache Bucket Configuration

Bucket Name	Target Data Type	Max Size	TTL (Time-To-Live)
`analyzeCache`	Full repository analysis results	200 entries	60 Minutes
`aiCache`	AI task results (summaries, roasts, etc.)	200 entries	30 Minutes
`searchCache`	Semantic search and QA results	500 entries	60 Minutes
`indexCache`	Vector store indices and metadata	50 entries	120 Minutes

Sources: server/cache/lru.ts:10-33
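
To show how a size limit and a TTL interact, here is a small self-contained cache built on `Map` insertion order. It is a hypothetical stand-in, not the `LRUCache` the project uses; only the bucket names, sizes, and TTLs at the bottom come from the table above.

```typescript
// Hypothetical stand-in for the per-bucket LRU cache.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly max: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(max: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.max) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const MINUTE_MS = 60 * 1000;
const buckets = {
  analyzeCache: new TtlLruCache<unknown>(200, 60 * MINUTE_MS),
  aiCache: new TtlLruCache<unknown>(200, 30 * MINUTE_MS),
  searchCache: new TtlLruCache<unknown>(500, 60 * MINUTE_MS),
  indexCache: new TtlLruCache<unknown>(50, 120 * MINUTE_MS),
};
```

The injectable `now` function exists only to make expiry testable without waiting for real time to pass.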

Data Access Flow

The cache object provides a unified CacheStore interface (get, set, has, delete). Internally, the getBucket utility function routes requests to the appropriate LRUCache instance by inspecting the string prefix of the cache key.

flowchart TD
    Req[Cache Request] --> Prefix{Key Prefix?}
    Prefix -- "ai:" --> AI[aiCache]
    Prefix -- "search:" --> SEARCH[searchCache]
    Prefix -- "index:" --> INDEX[indexCache]
    Prefix -- default --> ANALYZE[analyzeCache]
    
    AI --> Op[Execute Get/Set/Delete]
    SEARCH --> Op
    INDEX --> Op
    ANALYZE --> Op

Sources: server/cache/lru.ts:35-58
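
The routing in the flowchart can be sketched as a prefix check that falls through to the analysis bucket. The function name `getBucketName` and its return type are hypothetical; the prefixes and the default come from the diagram.

```typescript
type BucketName = "aiCache" | "searchCache" | "indexCache" | "analyzeCache";

// Picks a bucket from the key prefix; unprefixed keys default to analyzeCache.
function getBucketName(key: string): BucketName {
  if (key.indexOf("ai:") === 0) return "aiCache";
  if (key.indexOf("search:") === 0) return "searchCache";
  if (key.indexOf("index:") === 0) return "indexCache";
  return "analyzeCache";
}
```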

Key Generation and Hashing

Cache keys are constructed deterministically to ensure that identical requests map to the same cached value. The system includes specialized functions for generating keys for different domains.

Deterministic Key Structures

Analysis Keys: Combines owner, repository name, commit SHA, and optional branch name.
AI Keys: Combines the task kind, repository identifiers, and a unique contextHash.
Context Hashing: The hashContext function generates a base-36 string hash from input strings (like query parameters or code snippets) to ensure key length remains manageable and consistent.

Sources: server/cache/lru.ts:68-96, server/search/constants.test.ts:98-120
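
For intuition, a short string hash rendered in base 36 might look like the sketch below. The source only states that `hashContext` produces a base-36 string; the djb2-style rolling hash used here is an assumption and the project's function may differ.

```typescript
// Assumption: a djb2-style rolling hash, kept as an unsigned 32-bit integer.
function hashContext(input: string): string {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(36); // at most 7 characters for a 32-bit value
}
```

A hash like this keeps keys short and deterministic, but it is not collision-free, so two different contexts can in principle share a key.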

Key Generation Logic

// server/cache/lru.ts:68-80
export function analyzeCacheKey(owner: string, repo: string, sha: string, branch?: string): string {
  return branch
    ? `analyze:${owner}/${repo}@${sha}:${branch}`
    : `analyze:${owner}/${repo}@${sha}`;
}

export function aiCacheKey(
  kind: string,
  owner: string,
  repo: string,
  sha: string,
  contextHash: string,
  discriminator?: string,
): string {
  return discriminator
    ? `ai:${kind}:${owner}/${repo}@${sha}:${contextHash}:${discriminator}`
    : `ai:${kind}:${owner}/${repo}@${sha}:${contextHash}`;
}

Cache Invalidation and Management

The system supports both global and targeted cache invalidation.

Global Clear: The clearAllCaches() function resets all four buckets simultaneously.
Targeted Search Invalidation: The invalidateSearchCache(owner, repo) function iterates through the searchCache keys and removes entries that match the specific owner/repo prefix. This is used when a repository is re-indexed or updated.
TTL Expiry: Each bucket automatically expires entries once its configured TTL (specified in milliseconds) has elapsed.

Sources: server/cache/lru.ts:60-70, server/cache/lru.test.ts:33-60
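
Targeted invalidation can be pictured as a scan over the bucket's keys. This sketch uses a plain `Map` and assumes search keys have the form `search:owner/repo` followed by `:` or `@` and further segments; that key layout is an assumption, not something the source specifies.

```typescript
// Removes every search entry for one repository and returns how many were dropped.
function invalidateSearchCache(
  cache: Map<string, unknown>,
  owner: string,
  repo: string,
): number {
  const prefix = `search:${owner}/${repo}`;
  let removed = 0;
  // Copy the keys first so deletion does not interfere with iteration.
  for (const key of Array.from(cache.keys())) {
    const rest = key.slice(prefix.length);
    const matches =
      key.indexOf(prefix) === 0 && (rest === "" || rest[0] === ":" || rest[0] === "@");
    if (matches) {
      cache.delete(key);
      removed++;
    }
  }
  return removed;
}
```

Checking the character after the prefix prevents `owner/repo` from also clearing entries for `owner/repo2`.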

Integration in Service Layer

The caching layer is deeply integrated into the analyzeRepository service. The service checks for a cached result using a generated key before proceeding with expensive operations like fetching trees or parsing dependencies.

sequenceDiagram
    participant S as analyze-repo.ts
    participant C as lru.ts
    participant G as GitHub API
    
    S->>C: analyzeCacheKey(owner, repo, sha)
    C-->>S: return cacheKey
    S->>C: cache.get(cacheKey)
    alt Cache Hit
        C-->>S: return RepoAnalysis
    else Cache Miss
        S->>G: Fetch Flat Tree & Manifests
        G-->>S: return Data
        S->>S: buildGraph & analyzeDependencies
        S->>C: cache.set(cacheKey, analysis)
    end

Sources: server/services/analyze-repo.ts:28-36, server/ai/tasks/explain.ts:21-25
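
The hit/miss branch in the diagram is the cache-aside pattern. The sketch below is synchronous to stay short, whereas the real service awaits GitHub calls; `getOrCompute` and the reduced `CacheStore` shape are hypothetical.

```typescript
interface CacheStore<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// Returns the cached value on a hit; otherwise computes, stores, and returns it.
function getOrCompute<V>(cache: CacheStore<V>, key: string, compute: () => V): V {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute(); // the expensive path: fetch tree, build graph, analyze
  cache.set(key, value);
  return value;
}
```

Because the analysis key includes the commit SHA, a new commit produces a new key and naturally bypasses stale entries.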

Technical Limitations

As noted in the project documentation and roast mock-ups, the current LRU implementation resides in-memory. In serverless environments (like Vercel functions), this cache is subject to resets during "cold starts," meaning cache persistence is limited to the lifecycle of the active server instance.

Sources: README.md: 🧩 Core Features, server/ai/tasks/playground.ts:33-35

Vector Store & Embeddings

The Vector Store & Embeddings system provides the foundation for gitSdm's semantic search and AI-powered Question & Answering (QA) capabilities. This system enables the platform to perform context-aware searches across a repository's codebase by transforming raw source code into high-dimensional numerical vectors.

The primary purpose of this module is to support the "AI-powered semantic search & Q&A" feature, allowing users to locate entry points and ask technical questions about the project structure through a natural language interface. It integrates closely with the AI Provider layer to utilize Large Language Models (LLMs) for both generating embeddings and synthesizing answers based on retrieved code context.

Sources: README.md:162, server/search/types.ts:101-125

System Architecture

The search system is built on a decoupled architecture consisting of four main interfaces: Chunker, Embedding Provider, Vector Store, and QA/Search Engines. This modularity allows the project to swap out AI providers (such as Gemini, OpenAI, or Anthropic) while maintaining a consistent internal data flow for indexing and retrieval.

Core Components

classDiagram
    class Chunker {
        +chunkFile(content, filePath, language) Chunk[]
    }
    class EmbeddingProvider {
        +embed(text) EmbeddingResult
        +embedBatch(texts) EmbeddingResult[]
        +dimensions int
    }
    class VectorStore {
        +addChunks(chunks) void
        +search(queryVector, repoKey, topK, minScore) SearchResult[]
        +removeByRepo(repoKey) void
    }
    class SearchEngine {
        +search(options) SearchResponse
    }
    class QAEngine {
        +ask(options) QAResponse
    }

    QAEngine ..> SearchEngine : utilizes
    SearchEngine ..> VectorStore : queries
    SearchEngine ..> EmbeddingProvider : vectorizes query

The diagram shows the relationship between core search interfaces and the hierarchical flow from user query to vector retrieval.

Sources: server/search/types.ts:5-131

Indexing Pipeline

The indexing process transforms a repository's file tree into a searchable vector index. This involves traversing the repository, breaking files into manageable chunks, and generating embeddings for each chunk.

Data Flow for Repository Indexing

File Chunking: The Chunker processes file content into Chunk objects, retaining metadata like start/end lines and programming language.
Vectorization: The EmbeddingProvider converts text chunks into Float32Array vectors (typically normalized to unit length).
Storage: IndexedChunk objects containing the vector and original metadata are added to the VectorStore.

flowchart TD
    A[Source Code File] --> B[Chunker]
    B --> C[Text Chunks]
    C --> D[Embedding Provider]
    D --> E[Vector Generation]
    E --> F[Vector Store]
    F --> G[(Indexed Repository)]

This flowchart illustrates the transformation of source code into a searchable vector index.

Sources: server/search/types.ts:5-84, server/search/vector-store.test.ts:16-30
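
The chunking step can be sketched with fixed-size line windows. The `chunkFile` signature mirrors the class diagram, but the windowing strategy, the `maxLines` parameter, and the exact `Chunk` fields are assumptions; the real Chunker may split on syntax boundaries instead.

```typescript
interface Chunk {
  filePath: string;
  language: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  content: string;
}

// Assumption: split by a fixed number of lines per chunk.
function chunkFile(
  content: string,
  filePath: string,
  language: string,
  maxLines: number = 40,
): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += maxLines) {
    const slice = lines.slice(start, start + maxLines);
    chunks.push({
      filePath,
      language,
      startLine: start + 1,
      endLine: start + slice.length,
      content: slice.join("\n"),
    });
  }
  return chunks;
}
```

Keeping `startLine` and `endLine` on every chunk is what later allows answers to cite exact line ranges.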

Indexing Status States

The system tracks the lifecycle of an indexing operation through a defined state machine.

State	Description
`idle`	No indexing operation currently active for the repository.
`indexing`	Progressing through files; includes `progress`, `filesProcessed`, and `totalFiles`.
`complete`	Indexing finished; includes `chunkCount` and completion `timestamp`.
`failed`	Operation halted due to error; includes `error` details and `failedFiles` count.

Sources: server/search/types.ts:90-103
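
A state machine like this maps naturally onto a discriminated union. The field names below come from the table, but their types and the `describeStatus` helper are assumptions.

```typescript
// Assumed field types; the table only names the fields.
type IndexingStatus =
  | { state: "idle" }
  | { state: "indexing"; progress: number; filesProcessed: number; totalFiles: number }
  | { state: "complete"; chunkCount: number; timestamp: number }
  | { state: "failed"; error: string; failedFiles: number };

// The compiler narrows `status` in each branch, so only valid fields are reachable.
function describeStatus(status: IndexingStatus): string {
  switch (status.state) {
    case "idle":
      return "idle";
    case "indexing":
      return `indexing ${status.filesProcessed}/${status.totalFiles} files`;
    case "complete":
      return `complete (${status.chunkCount} chunks)`;
    case "failed":
      return `failed: ${status.error}`;
  }
}
```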

Vector Store Operations

The VectorStore serves as the retrieval engine. It is designed to isolate chunks by repoKey (formatted as "owner/repo") to ensure search results are scoped to specific projects.

Similarity Search

Retrieval is performed using cosine similarity. The store computes the similarity between a queryVector and each indexed chunk within a specific repository.

Scoring: Results are returned with a score between 0.0 and 1.0, representing the cosine similarity.
Filtering: Users can specify a minScore threshold to filter out low-relevance results.
Ranking: Results are sorted in descending order by similarity score.

Sources: server/search/vector-store.test.ts:98-125, server/search/types.ts:56-61
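
Scoring, filtering, and ranking fit in a few lines over an in-memory array. This is a brute-force sketch: `searchChunks` and the reduced `IndexedChunk` shape are hypothetical, and the real VectorStore may index or normalize vectors differently.

```typescript
interface IndexedChunk {
  repoKey: string;
  filePath: string;
  vector: Float32Array;
}

interface SearchResult {
  chunk: IndexedChunk;
  score: number;
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector has zero length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function searchChunks(
  chunks: IndexedChunk[],
  queryVector: Float32Array,
  repoKey: string,
  topK: number = 5,
  minScore: number = 0,
): SearchResult[] {
  return chunks
    .filter((chunk) => chunk.repoKey === repoKey) // scope to one repository
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .filter((result) => result.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

When vectors are already normalized to unit length, the denominator is 1 and the score reduces to a plain dot product.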

Data Models

Chunk Metadata: The system stores rich context alongside the vector to allow for precise citations and code snippet rendering in the UI.

Field	Type	Description
`filePath`	`string`	The path of the source file.
`startLine`	`number`	The starting line of the chunk within the file.
`endLine`	`number`	The ending line of the chunk.
`content`	`string`	The raw code content for UI display.
`repoKey`	`string`	Unique identifier (owner/repo).
`commitSha`	`string`	The SHA of the commit when indexed.

Sources: server/search/types.ts:31-40

QA Engine and Semantic Search

The QAEngine and SearchEngine provide the higher-level logic for interacting with the vector data.

Semantic Search: Converts a natural language query into a vector and retrieves the topK most relevant code chunks from the store.
QA Answer Synthesis: Uses the QAEngine to take a user's question, retrieve relevant chunks as context, and generate a markdown-formatted answer via the AI Provider.
Citations: The QAResponse includes a Citation array (file paths and line ranges) to link the AI's answer back to the actual source code.

Sources: server/search/types.ts:107-131, server/ai/tasks/explain.ts:122-135

AI Provider Integration

The embedding generation and answer synthesis rely on the AIProvider interface. The project supports multiple backends which can be configured via environment variables.

Provider	Model Default	Capability
Gemini	`gemini-2.5-flash`	Content generation and embedding logic.
OpenAI	`gpt-4o-mini`	Standard Chat Completion and JSON mode.
Anthropic	`claude-3-5-haiku-latest`	High-quality analysis and synthesis.
Mock	N/A	Local development and testing without API keys.

Sources: server/ai/provider.ts:40-75, README.md:95-103

Summary

The Vector Store & Embeddings module enables gitSdm to move beyond basic file exploration into deep code intelligence. By chunking, vectorizing, and indexing repositories using advanced LLM providers, the system allows developers to perform semantic searches and receive context-aware answers to complex architectural questions. The modular design ensures that as vector database technologies or embedding models evolve, the core repository analysis pipeline remains stable and extensible.

Sources: README.md:14-18, server/search/types.ts:1-131

Global State Management

Global state management in gitSdm is primarily handled through Zustand, which acts as a centralized store for managing the interactive dependency graph, UI panel states, and AI-driven analysis results. The architecture focuses on decoupling the heavy computational layout tasks from the UI, ensuring high-performance interactions within the React Flow canvas and the multi-tabbed AI sidebar.

Sources: README.md:92, README.md:164, server/ai/tasks/playground.ts:258-260

State Management Architecture

The system utilizes a central store, vizStore, to coordinate between the repository analysis engine and the frontend presentation layers. This store manages graph filters, node selections, and the visibility of inspection panels. By centralizing these states, gitSdm maintains synchronization across disparate UI components such as the OverviewTab, AiCenterTab, and the GraphCanvas.

Key Store Responsibilities

State Category	Description	Primary File/Component
Graph Interaction	Selection of nodes, focusing file paths, and branch comparison states.	`src/components/viz/OverviewTab.tsx`
AI Context	Tracking ELI5 mode, active playground tools, and current architectural explanations.	`src/components/viz/ai-sidebar/AiCenterTab.tsx`
UI Layout	Managing inspector visibility and sidebar tab navigation.	`src/components/viz/ai-sidebar/AiCenterTab.tsx`

Sources: src/components/viz/OverviewTab.tsx:28-32, src/components/viz/ai-sidebar/AiCenterTab.tsx:90-112

Data Flow and Synchronization

Data flows from the backend AI task handlers (such as explain, refactor, and playground) into the global state, which then hydrates the UI. The state management layer handles the transition between raw repository analysis and the interactive visualization.

flowchart TD
    API[Backend API Tasks] -->|AI JSON/Markdown| Store[Zustand vizStore]
    Store -->|selectedNodeId| RF[React Flow Canvas]
    Store -->|eli5Mode| AI[AiCenterTab]
    Store -->|inspectorOpen| UI[Inspector Panel]
    RF -->|onNodeClick| Store
    AI -->|toggleEli5| Store

The diagram shows how the vizStore acts as a central hub between backend AI task outputs and frontend interactive components. Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:90-112, server/ai/tasks/explain.ts:25-30

Node Selection and Navigation Logic

When a user interacts with the graph or the file list, the state is updated globally to trigger side effects such as camera centering in the React Flow viewport and updating the AI context for the AiCenterTab.

// Example of global state interaction for node focusing
const focusOnNode = useCallback((nodeId: string, filePath?: string | null) => {
  useVizStore.getState().setSelectedNodeId(nodeId);
  if (filePath !== undefined) {
    useVizStore.getState().setFocusedFilePath(filePath);
  }
  // React Flow centering logic follows...
}, [setCenter, getNode]);

Sources: src/components/viz/OverviewTab.tsx:28-33

AI & Playground State

The AiCenterTab manages a complex sub-state specifically for AI interactions. This includes toggleable modes like ELI5 (Explain It Like I'm 5) and specific playground tools like the Repo Roast or README Enhancer. These states are often cached or managed via hooks such as useAiCenterState to prevent redundant AI provider calls.

AI State Components

eli5Mode: A boolean flag that modifies the userPrompt sent to AI providers to request simplified explanations.
activePlayground: Tracks which creative tool (roast, readme) is currently active.
pendingToolRequests: A set of active request keys used to prevent duplicate concurrent AI tasks.

Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:94-112, server/ai/tasks/explain.ts:47-49, server/ai/tasks/playground.ts:27-30
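
The role of `pendingToolRequests` can be illustrated with a small guard around a `Set`. The helper names `tryBeginRequest` and `endRequest` are hypothetical, and the key format in the usage note is only an example.

```typescript
// Keys of AI tasks that are currently in flight.
const pendingToolRequests = new Set<string>();

// Returns false if an identical request is already running, true if this call claimed the key.
function tryBeginRequest(key: string): boolean {
  if (pendingToolRequests.has(key)) return false;
  pendingToolRequests.add(key);
  return true;
}

// Must be called when the task settles, whether it succeeded or failed.
function endRequest(key: string): void {
  pendingToolRequests.delete(key);
}
```

A caller would wrap the AI call in `if (tryBeginRequest("roast:owner/repo")) { try { ... } finally { endRequest("roast:owner/repo"); } }` so that a failed task does not block later retries.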

Branch Comparison State

gitSdm supports visual branch comparison, which is reflected in the global state through graphDiff. This state tracks sets of added, modified, and deleted node IDs. The OverviewTab consumes this state to render filtered file lists and stat summaries.

sequenceDiagram
    participant User
    participant Store as vizStore
    participant Tab as OverviewTab
    User->>Tab: Select Branch to Compare
    Tab->>Store: setCompareBranch(true)
    Store-->>Tab: Provide graphDiff (Set IDs)
    Tab->>Tab: Filter analysis.graph.nodes
    Tab-->>User: Render Added/Modified/Deleted lists

Sequence of events during a branch comparison operation managed via global state. Sources: src/components/viz/OverviewTab.tsx:50-55, src/components/viz/OverviewTab.tsx:78-100
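
The filtering step in the diagram can be sketched as three membership checks against the diff sets. The `GraphDiff` field names follow the description above; `partitionByDiff` and the minimal node shape are hypothetical.

```typescript
interface GraphDiff {
  added: Set<string>;
  modified: Set<string>;
  deleted: Set<string>;
}

interface NodeLike {
  id: string;
}

// Splits graph nodes into the three lists rendered by the comparison view.
function partitionByDiff<T extends NodeLike>(nodes: T[], diff: GraphDiff) {
  return {
    added: nodes.filter((node) => diff.added.has(node.id)),
    modified: nodes.filter((node) => diff.modified.has(node.id)),
    deleted: nodes.filter((node) => diff.deleted.has(node.id)),
  };
}
```

Using sets keeps each membership check constant-time, which matters when the graph holds thousands of nodes.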

Implementation Details

The following table summarizes the key state-related functions and their locations:

Function / Hook	Responsibility	File Path
`useVizStore`	Main global state hook for graph and UI control.	`src/stores/vizStore.ts`
`useAiCenterState`	Manages transient state for AI sidebar tabs and caching.	`src/components/viz/ai-sidebar/hooks/useAiCenterState.ts`
`setSelectedNodeId`	Updates the globally active node for inspector and AI context.	`src/components/viz/OverviewTab.tsx`
`toggleEli5Mode`	Switches the AI instruction set between Technical and ELI5.	`src/components/viz/ai-sidebar/AiCenterTab.tsx`

Sources: README.md:92, src/components/viz/OverviewTab.tsx:29, src/components/viz/ai-sidebar/AiCenterTab.tsx:94-100

Conclusion

Global state management in gitSdm is designed to facilitate a "graph-first" experience. By utilizing Zustand for core architectural states and specialized hooks for AI task management, the system ensures that user interactions on the visual canvas are immediately reflected in the analytical sidebars, providing a cohesive codebase exploration environment.

Model Integration

AI Provider Integration

The AI Provider Integration system in gitSdm is a multi-model abstraction layer that lets the platform generate repository insights, architectural summaries, and code explanations. It supports several Large Language Model (LLM) providers and can fall back to a mock provider during development or when no API keys are configured.

Sources: server/ai/provider.ts:1-7, README.md:144-149

Provider Architecture

The integration follows a factory pattern, where a central provider factory instantiates specific implementations based on environment variables or user-supplied API keys. All providers implement a unified AIProvider interface, ensuring that the rest of the application remains agnostic of the specific LLM being used.

AIProvider Interface

The core of the system is the AIProvider interface, which defines a single complete method for handling asynchronous message exchanges.

export interface AIProvider {
  complete(messages: Message[], options?: { json?: boolean }): Promise<string>;
}

Sources: server/ai/provider.ts:6-10
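
To show how calling code can stay provider-agnostic, here is a small usage sketch built on the interface above. The `Message` shape, the `completeJson` helper, and `stubProvider` are all assumptions for illustration; the source shows only the AIProvider interface itself.

```typescript
// Assumed message shape; the source only shows the AIProvider interface itself.
interface Message { role: "system" | "user" | "assistant"; content: string; }

interface AIProvider {
  complete(messages: Message[], options?: { json?: boolean }): Promise<string>;
}

// Hypothetical helper: asks any provider for JSON and parses the reply.
async function completeJson<T>(provider: AIProvider, messages: Message[]): Promise<T> {
  const raw = await provider.complete(messages, { json: true });
  return JSON.parse(raw) as T;
}

// Stand-in provider used only to exercise the helper; it echoes the last message.
const stubProvider: AIProvider = {
  async complete(messages, options) {
    const last = messages.length > 0 ? messages[messages.length - 1].content : "";
    return options?.json ? JSON.stringify({ echo: last }) : last;
  },
};
```

Because `completeJson` depends only on the interface, swapping the stub for a real provider would not change the caller.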

Provider Data Flow

The following diagram illustrates the lifecycle of an AI request, from the frontend API client through the server router to the specific AI provider.

flowchart TD
    Client[apiClient.ts] -->|POST /api/ai/*| Router[ai-routes.ts]
    Router -->|Calls Task Handler| Task[tasks/explain.ts]
    Task -->|executeAiTask| Service[service.ts]
    Service -->|getAIProvider| ProviderFactory[provider.ts]
    ProviderFactory -->|Returns| ProviderInstance[AIProvider Instance]
    ProviderInstance -->|API Request| LLM[Gemini / OpenAI / Anthropic]
    LLM -->|Response| Client

This flow ensures that authentication, caching, and task-specific logic are handled before interacting with the LLM. Sources: server/router/ai-routes.ts:38-42, server/ai/service.ts, src/lib/apiClient.ts:98-103

Supported Providers

The system identifies and initializes providers using three primary methods: explicit environment variable configuration (AI_PROVIDER), API key pattern detection, or a manual key override passed from the client.

Provider	Detection Pattern	Default Model	Env Variables
Google Gemini	Default / `gemini`	`gemini-2.5-flash`	`GEMINI_API_KEY`, `GEMINI_MODEL`
OpenAI	Starts with `sk-`	`gpt-4o-mini`	`OPENAI_API_KEY`, `OPENAI_MODEL`
Anthropic	Starts with `sk-ant-`	`claude-3-5-haiku-latest`	`ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL`
Mock	`mock`	N/A	`AI_PROVIDER=mock`

Sources: server/ai/provider.ts:12-23, server/ai/provider.ts:70-82, README.md:95-103
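
The detection rules in the table can be sketched as a single function. The name `detectProvider` and the exact precedence are assumptions. One detail follows from the patterns themselves: every `sk-ant-` key also starts with `sk-`, so the Anthropic check has to run first.

```typescript
type ProviderName = "gemini" | "openai" | "anthropic" | "mock";

const KNOWN_PROVIDERS: ProviderName[] = ["gemini", "openai", "anthropic", "mock"];

// Hypothetical helper; the precedence below is an assumption based on the table.
function detectProvider(apiKey?: string, envProvider?: string): ProviderName {
  // 1. An explicit AI_PROVIDER value wins when it names a known provider.
  const explicit = envProvider ? envProvider.toLowerCase() : undefined;
  const match = KNOWN_PROVIDERS.find((name) => name === explicit);
  if (match) return match;

  // 2. Without any key, fall back to the mock provider.
  if (!apiKey) return "mock";

  // 3. Key pattern detection: test "sk-ant-" before "sk-", because every
  //    Anthropic key also matches the OpenAI prefix.
  if (apiKey.startsWith("sk-ant-")) return "anthropic";
  if (apiKey.startsWith("sk-")) return "openai";

  // 4. Any other key is treated as a Gemini key (the documented default).
  return "gemini";
}
```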

Provider Initialization Logic

The getAIProvider function caches a singleton provider instance so that repeated calls do not re-create it. The cache is bypassed when a user supplies an overrideKey, in which case a fresh provider is created for that request.

flowchart TD
    Start([getAIProvider]) --> HasOverride{overrideKey?}
    HasOverride -->|Yes| Fresh[Create Fresh Provider]
    HasOverride -->|No| CheckCache{Instance Cached?}
    CheckCache -->|Yes| Match{Key Matches Env?}
    Match -->|Yes| Return[Return Cached Instance]
    Match -->|No| Create[Create New Instance]
    CheckCache -->|No| Create
    Create --> Return

Sources: server/ai/provider.ts:241-255
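
One possible reading of the flowchart above, reduced to its caching logic. `ProviderHandle`, `createProvider`, and `getProvider` are hypothetical stand-ins, and `envKey` represents whatever key is currently configured in the environment.

```typescript
// Minimal stand-in for a provider instance, reduced for this sketch.
interface ProviderHandle { readonly key: string; }

let cached: ProviderHandle | null = null;
let created = 0; // counts constructions so the caching is observable

function createProvider(key: string): ProviderHandle {
  created += 1;
  return { key };
}

// Hypothetical reading of the flowchart; not the project's actual signature.
function getProvider(envKey: string, overrideKey?: string): ProviderHandle {
  // A user-supplied key always gets a fresh provider and never touches the cache.
  if (overrideKey) return createProvider(overrideKey);
  // Reuse the cached instance only while it still matches the configured key.
  if (cached && cached.key === envKey) return cached;
  cached = createProvider(envKey);
  return cached;
}
```

Keeping override instances out of the cache avoids one user's key being reused for another request.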

Context Building & Prompting

To provide accurate architectural insights, the integration uses a specialized utility to convert raw repository analysis data into a structured context string for the LLM.

Repository Context Construction

The buildRepoContext function aggregates metadata, directory structures, important files, and recent activity into a formatted prompt section. This ensures the LLM has a "mental map" of the project before answering specific queries. Sources: server/ai/prompts.ts:3-36

System Prompts

The SYSTEM_PROMPT defines the AI's persona as a "principal software architect and expert code reviewer." Its core principles include:

Specificity: References real file names and directory structures.
Technical Depth: Uses terms like "request lifecycle" and "module boundary."
Veracity: Strictly forbids fabricating files not present in the provided context.

Sources: server/ai/prompts.ts:38-60

Task-Specific Implementations

The integration supports several specialized tasks, each with its own prompt logic and response formatting.

AI Explanation Tasks

The explainRepo function handles scoped queries (repository, node, or file) and supports "ELI5" (Explain Like I'm 5) mode for beginners.

Scope	Heading Structure	Purpose
Repo	Overview, Architectural Style, Execution Flow	High-level system understanding
Node	What this does, Why it matters, Where to look next	Module-specific analysis
File	Purpose, Role in data flow, Related files	Code-level inspection

Sources: server/ai/tasks/explain.ts:28-65

Mock Provider and Development

When no API keys are provided or AI_PROVIDER is set to mock, the system uses createMockProvider. This provider returns canned JSON responses and Markdown summaries based on keywords found in the user prompt (e.g., "architecture", "roast", "suggest"). This allows developers to test the UI and data flow without incurring LLM costs. Sources: server/ai/provider.ts:133-238
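
A keyword-driven mock of this kind could be as small as the sketch below. The canned replies and the name `createKeywordMock` are invented for illustration; only the keywords ("architecture", "roast", "suggest") come from the description above.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Canned replies are invented for illustration; they are not the project's templates.
const CANNED: Array<{ keyword: string; reply: string }> = [
  { keyword: "architecture", reply: JSON.stringify({ overview: "mock overview", layers: [] }) },
  { keyword: "roast", reply: "## Mock roast\nNo real model was called." },
  { keyword: "suggest", reply: JSON.stringify({ files: [] }) },
];

// Hypothetical mock: picks the first canned reply whose keyword appears in the user prompt.
function createKeywordMock() {
  return {
    async complete(messages: Message[]): Promise<string> {
      const prompt = messages
        .filter((m) => m.role === "user")
        .map((m) => m.content)
        .join("\n")
        .toLowerCase();
      const hit = CANNED.find((entry) => prompt.includes(entry.keyword));
      return hit ? hit.reply : "Mock response: no keyword matched.";
    },
  };
}
```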

Conclusion

The AI Provider Integration in gitSdm translates codebase structure into human-readable insights. By abstracting provider-specific details behind the AIProvider interface and centralizing prompt engineering, the system keeps task handlers independent of any single LLM vendor and leaves room for additional providers.

AI Task Handlers

AI Task Handlers constitute the core intelligence layer of the gitSdm platform, responsible for transforming raw repository analysis into actionable developer insights. These handlers interface between the backend services (like the GitHub client and repository analyzer) and various AI providers (Google Gemini, OpenAI, Anthropic) to perform specific analytical tasks such as architectural explanation, refactoring suggestions, and onboarding walkthroughs.

The system uses a standardized execution pattern via executeAiTask, which manages prompt construction, API communication, and response caching. This modular design allows the platform to offer features ranging from professional technical audits to creative "Repo Roasts," all while maintaining a consistent context provided by the repository's file structure and metadata.

Sources: server/ai/tasks/explain.ts, server/ai/provider.ts, README.md:92-105

Task Orchestration Architecture

The AI task system is organized into specialized modules within server/ai/tasks/, each focusing on a specific domain of repository intelligence. Requests are typically initiated via the API router and processed through a pipeline that builds repository context before invoking the AI provider.

Core Execution Flow

When a specific task is requested (e.g., explainRepo), the handler performs the following steps:

Context Building: Calls analyzeRepository to get the latest file tree, dependencies, and metadata.
Prompt Engineering: Combines a global SYSTEM_PROMPT with task-specific instructions and the repository context using buildRepoContext.
Task Execution: Passes the configuration to executeAiTask, which handles the actual LLM call and result caching.

flowchart TD
    Router[AI Routes Handler] -->|Call| TaskHandler[Specific Task Handler]
    TaskHandler -->|Fetch| Analysis[Repo Analysis Service]
    TaskHandler -->|Build| Prompt[Repo Context & User Prompt]
    TaskHandler -->|Execute| Service[AI Task Service]
    Service -->|Request| AIProvider[AI Provider - Gemini/OpenAI/Anthropic]
    AIProvider -->|Response| Service
    Service -->|Return| Router

The diagram shows the standard flow from an incoming API request to the AI provider response. Sources: server/router/ai-routes.ts:32-135, server/ai/tasks/explain.ts:18-40, server/ai/tasks/playground.ts:16-45
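
The caching part of the execution step can be illustrated with a short sketch. `runCachedTask` is a hypothetical stand-in for executeAiTask: its signature, the cache key format, and the in-memory `Map` are assumptions, since the source states only that the function handles the LLM call and result caching.

```typescript
// The provider call is passed in as a function so the sketch has no dependencies.
type Complete = (system: string, user: string) => Promise<string>;

const taskCache = new Map<string, string>();

// Hypothetical stand-in for executeAiTask: cache key and signature are assumptions.
async function runCachedTask(
  cacheKey: string,
  system: string,
  user: string,
  complete: Complete,
): Promise<string> {
  const hit = taskCache.get(cacheKey);
  if (hit !== undefined) return hit; // serve repeated requests without an LLM call
  const result = await complete(system, user);
  taskCache.set(cacheKey, result);
  return result;
}
```

A key such as `explain:owner/repo:file-path` (again hypothetical) would let different scopes of the same repository be cached independently.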

Primary Task Handlers

Explanation and Architecture Tasks

The explanation handlers provide different levels of granularity for codebase understanding. They support specific scopes such as the entire repository, a graph node, or a single file.

Function	Scope	Output Description
`explainRepo`	Repo/Node/File	Markdown explanation covering purpose, importance, and next steps.
`explainArchitecture`	Repository	JSON defining technical overview and specific architectural layers.
`explainRepoELI5`	Repository	Simplified, conversational explanation for beginners.

Sources: server/ai/tasks/explain.ts:8-70, server/ai/tasks/onboarding.ts:79-115

Onboarding and Learning Paths

These tasks are designed to reduce the time it takes for new developers to understand unfamiliar codebases.

suggestFiles: Identifies 8-10 critical files to read first, categorized by priority (high, medium, low).
generateOnboarding: Creates a 6-step walkthrough that builds a mental model from entry points to deployment.
generateLearningPath: Produces a deep JSON structure including a "Mental Model," "Recommended Path," and "Execution Flow" mapping specific file-to-file data transitions.

Sources: server/ai/tasks/onboarding.ts:10-77, server/ai/tasks/playground.ts:109-150

Code Quality and Health Tasks

Handlers in the refactor.ts module perform rigorous codebase assessments, scoring various dimensions of code quality.

graph TD
    subgraph HealthDimensions[Health Assessment Scores]
        M[Maintainability]
        MOD[Modularity]
        R[Readability]
        A[Architecture]
        C[Complexity]
    end
    
    RefactorTask[Refactor Handler] -->|Produces| Suggestions[Refactor Suggestions]
    HealthTask[Health Handler] -->|Produces| HealthDimensions
    Suggestions -->|Fields| Title[Title]
    Suggestions -->|Fields| Risk[Risk Level: High/Med/Low]
    Suggestions -->|Fields| Files[Affected Files]

Visualization of the data structures returned by quality-focused handlers. Sources: server/ai/tasks/refactor.ts:11-30, server/ai/tasks/refactor.ts:86-110
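
The structures in the diagram could be typed and validated roughly as follows. Field names mirror the diagram, but the exact types, the 0-100 score range, and the `parseHealthScores` helper are assumptions rather than the contents of refactor.ts.

```typescript
// Field names mirror the diagram; exact types and the 0-100 range are assumptions.
type RiskLevel = "high" | "medium" | "low";

interface RefactorSuggestion { title: string; risk: RiskLevel; files: string[]; }

interface HealthScores {
  maintainability: number;
  modularity: number;
  readability: number;
  architecture: number;
  complexity: number;
}

const HEALTH_KEYS: Array<keyof HealthScores> = [
  "maintainability", "modularity", "readability", "architecture", "complexity",
];

// Parses untrusted model output and rejects anything missing a dimension.
function parseHealthScores(raw: string): HealthScores {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Health response is not a JSON object");
  }
  const record = data as Record<string, unknown>;
  const scores: Partial<HealthScores> = {};
  for (const key of HEALTH_KEYS) {
    const value = record[key];
    if (typeof value !== "number" || value < 0 || value > 100) {
      throw new Error("Invalid or missing health score: " + key);
    }
    scores[key] = value;
  }
  return scores as HealthScores;
}
```

Validating at the boundary keeps malformed LLM output from reaching the UI as undefined scores.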

AI Provider Integration

The AIProvider interface abstracts different LLM backends. The system detects the provider based on the AI_PROVIDER environment variable or the format of the provided API key (e.g., sk-ant- for Anthropic).

Provider	Implementation Detail	Default Model
Gemini	Uses `@google/genai`	`gemini-2.5-flash`
OpenAI	Uses `openai` SDK	`gpt-4o-mini`
Anthropic	Uses `@anthropic-ai/sdk`	`claude-3-5-haiku-latest`
Mock	Local fallback with hardcoded templates	N/A

Sources: server/ai/provider.ts:25-58, server/ai/provider.ts:68-75, server/ai/provider.ts:106-115

Prompt Construction Utility

The buildRepoContext function (found in server/ai/prompts.ts) is critical for ensuring the AI has sufficient data to fulfill its "principal software architect" persona. It flattens the repository structure into a string containing:

Metadata: Language, stars, topics, and license.
Structure: Root-level directories and a list of up to 120 detected files.
Dependencies: Up to 40 dependencies with version and ecosystem info.
Activity: Recent commit counts and top contributors.

Sources: server/ai/prompts.ts:3-37

Example Context Structure

// server/ai/prompts.ts:3-10
export function buildRepoContext(analysis: RepoAnalysis, extra?: string): string {
  const topDirs = analysis.tree.map((n) => n.name).slice(0, 20).join(', ');
  const deps = analysis.dependencies
    .slice(0, 40)
    .map((d) => `${d.name}${d.version ? `@${d.version}` : ''} (${d.ecosystem}, ${d.type})`)
    .join('\n  ');
  // ... building flat file list and metadata string
}

Sources: server/ai/prompts.ts:3-10

Playground and Creative Tasks

Beyond technical analysis, the handlers support "Playground" features that offer creative outputs:

Repo Roast: A sarcastic, witty critique of the codebase referencing real files and structural decisions.
Readme Enhancer: Generates a professional README.md with badges, value propositions, and installation instructions based on detected package managers.

Sources: server/ai/tasks/playground.ts:12-70

Summary

AI Task Handlers serve as the bridge between repository data and developer-friendly insights. By combining standardized context building with specialized prompt engineering, they enable gitSdm to provide deep architectural understanding, quality audits, and educational paths. The architecture is provider-agnostic, supporting major LLM engines while providing a robust mock fallback for local development and testing.

Sources: README.md:92-105, server/ai/tasks/explain.ts, server/ai/provider.ts

System Prompts Configuration

Introduction

The System Prompts Configuration in gitSdm defines the AI's persona, operational boundaries, and technical expertise when interacting with developers. It is a multi-layered system that combines a global identity with specialized task-specific instructions to provide high-fidelity repository intelligence. The configuration ensures the AI acts as a "principal software architect," delivering specific, technical, and empathetic insights rather than generic summaries.

At its core, the system relies on the SYSTEM_PROMPT variable to establish a consistent voice, while dynamic functions like buildRepoContext inject structured data about the repository's tree, dependencies, and activity into the LLM's context window. Sources: server/ai/prompts.ts:47-51, server/ai/prompts.ts:5-45

Global Persona and Core Principles

The global SYSTEM_PROMPT serves as the foundation for almost all AI tasks in the platform. It explicitly defines the AI's mission: to provide "INSTANT, GENUINE understanding" of a codebase.

Behavioral Directives

The prompt enforces five core principles to maintain high output quality:

Specificity: References to actual filenames and directory structures are mandatory. Sources: server/ai/prompts.ts:56-58
Senior Engineering Mindset: Identifying architectural tradeoffs and implementation choices. Sources: server/ai/prompts.ts:60-62
Developer Empathy: Tailoring responses to onboarding, change management, or quality evaluation. Sources: server/ai/prompts.ts:64-67
Technical Precision: Using industry-standard terms like "request lifecycle" and "hot path." Sources: server/ai/prompts.ts:69-70
Factuality: A strict prohibition against fabricating files or dependencies. Sources: server/ai/prompts.ts:72-73

Repository Context Injection

To make the AI persona effective, the system must provide a structured view of the repository. This is handled by buildRepoContext, which transforms a RepoAnalysis object into a formatted string for the LLM.

Context Composition

Section	Content Description	Sources
Metadata	Full name, description, stars, license, and default branch.	server/ai/prompts.ts:25-32
Structure	Top-level directories (up to 20) and entry/important files (up to 60).	server/ai/prompts.ts:6-10
File List	A flat list of up to 120 detected files for deep path context.	server/ai/prompts.ts:13-20
Dependencies	Up to 40 dependencies with versioning, ecosystem, and type data.	server/ai/prompts.ts:7-9
Activity	Recent commit patterns and top contributors.	server/ai/prompts.ts:22-23

The following diagram illustrates how the SYSTEM_PROMPT and repository data are merged during an AI request:

graph TD
    subgraph InputData [Repository Data]
        A[RepoAnalysis Object]
        B[Metadata/Timeline]
        C[File Tree/Deps]
    end

    subgraph PromptEngine [Prompt Generation]
        D[buildRepoContext Function]
        E[Global SYSTEM_PROMPT]
        F[Task-Specific userPrompt]
    end

    A --> D
    B --> D
    C --> D
    
    D --> G[Final LLM Payload]
    E --> G
    F --> G
    
    G --> H[AI Provider Client]
    H --> I[OpenAI / Gemini / Anthropic]

The diagram shows the flow of raw analysis data through the context builder, where it is combined with static system prompts and task-specific user instructions before being dispatched to the AI provider.
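
The merge step in the diagram can be sketched as a pure function. The `buildMessages` name, the separator, and the choice to place repository context in the user message are assumptions; the source states only that the three inputs are combined into one payload.

```typescript
interface Message { role: "system" | "user"; content: string; }

// Hypothetical assembly step; the separator and ordering are assumptions.
function buildMessages(systemPrompt: string, repoContext: string, userPrompt: string): Message[] {
  return [
    // The global persona travels in the system role.
    { role: "system", content: systemPrompt },
    // Repository context comes first so the task instruction is read last.
    { role: "user", content: repoContext + "\n\n---\n\n" + userPrompt },
  ];
}
```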

Task-Specific Prompt Variants

While the global persona remains constant, individual modules define specialized userPrompt templates to achieve specific outcomes.

Architectural Visualization (Mermaid)

The system configures the AI to generate graph LR (Left-to-Right) flowcharts, grouping components into subgraphs such as "Entry Points," "Services," and "Utilities." It also requires specific CSS-like class definitions for node styling (e.g., class NodeId entry). Sources: server/ai/tasks/diagram.ts:16-46

Onboarding and Learning Paths

Prompts for onboarding focus on a 6-step progression:

Mental Model
Entry Point/Startup
Routing Lifecycle
Business Logic
Data/State Layer
Config/Deployment

Sources: server/ai/tasks/onboarding.ts:72-83

Specialized Q&A Engine

The Q&A engine in qa-engine.ts defines its own SYSTEM_PROMPT, which replaces the global architect persona with a strict "Codebase Analysis Assistant." This assistant answers only from the provided code chunks and must follow a fixed Markdown structure:

### Summary
### How it works
### Related files

Sources: server/search/qa-engine.ts:72-88

Provider Implementation Logic

The AIProvider interface handles how the system role is transmitted to different LLM services.

sequenceDiagram
    participant S as AI Task Service
    participant P as AI Provider Manager
    participant G as Gemini Provider
    participant A as Anthropic Provider

    S->>P: complete(messages, options)
    
    alt Provider is Gemini
        P->>G: extract system role
        Note right of G: Injected as systemInstruction
        G-->>P: Generate response
    else Provider is Anthropic
        P->>A: extract system role
        Note right of A: Passed via system parameter
        A-->>P: Generate response
    end
    P-->>S: Return formatted string

The diagram demonstrates that while the prompts are defined centrally, the providers handle the "system" message role according to their specific API requirements (e.g., systemInstruction in Gemini vs. a system parameter in Anthropic). Sources: server/ai/provider.ts:81-91, server/ai/provider.ts:143-153
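
The role handling described above can be sketched without calling any SDK. The request shapes below loosely follow each vendor's public request format (a `systemInstruction` field for Gemini, a top-level `system` field for Anthropic). The helper names are hypothetical, and other required fields such as the model name are omitted.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Separates system messages from the conversational turns.
function splitSystem(messages: Message[]): { system: string; rest: Message[] } {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const rest = messages.filter((m) => m.role !== "system");
  return { system, rest };
}

// Gemini-style shape: system text goes into config.systemInstruction,
// and the assistant role is renamed to "model". No SDK is called here.
function toGeminiRequest(messages: Message[]) {
  const { system, rest } = splitSystem(messages);
  return {
    contents: rest.map((m) => ({
      role: m.role === "assistant" ? "model" : "user",
      parts: [{ text: m.content }],
    })),
    config: { systemInstruction: system },
  };
}

// Anthropic-style shape: system text is a top-level `system` field.
function toAnthropicRequest(messages: Message[]) {
  const { system, rest } = splitSystem(messages);
  return {
    system,
    messages: rest.map((m) => ({ role: m.role, content: m.content })),
  };
}
```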

Summary of Configuration Elements

Feature	Key Logic / Constant	Role	Sources
Persona	`SYSTEM_PROMPT` (Global)	Establishes the "Principal Architect" identity and principles.	server/ai/prompts.ts:47-78
Context	`buildRepoContext`	Serializes the RepoAnalysis object (metadata, file tree, dependencies, activity) for LLM ingestion.	server/ai/prompts.ts:5-45
Strict Q&A	`SYSTEM_PROMPT` (QA)	Enforces scannability and prevents LLM hallucinations.	server/search/qa-engine.ts:72-88
Formatting	Task-specific JSON schemas	Ensures AI output matches internal TypeScript interfaces.	server/ai/tasks/refactor.ts:108-111
Fallback	`mockFallback`	Provides deterministic data when AI providers are unavailable.	server/ai/tasks/playground.ts:98-110

Conclusion

The System Prompts Configuration in gitSdm is a highly structured framework that balances a unified architectural persona with task-specific constraints. By combining deep repository context with strict engineering principles, the system ensures that AI-generated insights remain technically accurate, relevant to the specific codebase, and formatted for immediate developer utility.

Graphify Agent Integration

Graphify Agent Integration represents the core mechanism within gitSdm for transforming flat repository structures into interactive, graph-first architectural visualizations. This system utilizes AI agents and specialized command-line tools to analyze file dependencies, classify modules, and generate topological maps that facilitate instant codebase understanding for developers.

The integration serves as the bridge between raw source code and the visual workspace, providing automated updates to the repository's internal mapping whenever structural changes occur. It specifically leverages the graphify tool to maintain directory-topology consistency and informs AI-driven features like Mermaid diagram generation and architecture summaries. Sources: README.md:1-25, CONTRIBUTING.md:65-69

Core Integration Architecture

The Graphify integration operates as a multi-stage pipeline that ingests GitHub repository data and produces a visual model. The architecture is divided into a backend analysis layer and a frontend rendering layer.

System Data Flow

The following diagram illustrates the lifecycle of a repository analysis request, from the user input to the final graph visualization and AI enrichment.

flowchart TD
    User[User Input URL] --> Router[API Router]
    Router --> Ingest[GitHub Tree Fetcher]
    Ingest --> Parser[Dependency Analyzer]
    Parser --> Builder[Graph Builder Engine]
    Builder --> Layout[Dagre Layout Engine]
    Layout --> UI[React Flow Canvas]
    UI --> AI[AI Agent Insights]
    AI -.-> UI

The system leverages dagre for initial layout math and @xyflow/react (React Flow) for the interactive canvas. Sources: README.md:104-124, server/ai/tasks/diagram.ts:40-52

Component Roles and Responsibilities

Component	Responsibility	Relevant Files
Graphify CLI	Updates interactive directory-topology mapping and verifies AST parser compatibility.	`CONTRIBUTING.md`
Graph Builder	Constructs nodes and edges based on file paths and classified dependencies.	`README.md`, `src/components/viz/architecture/mermaid-generator.ts`
AI Task Handlers	Generates logical system architecture summaries and Mermaid diagrams via LLMs.	`server/ai/tasks/diagram.ts`, `server/ai/tasks/explain.ts`
Mermaid Generator	Programmatically generates Mermaid.js code by scoring file connectivity and importance.	`src/components/viz/architecture/mermaid-generator.ts`

Automated Topology Updates

To maintain the accuracy of the dependency map during development, the integration provides a specific workflow for contributors. When new files are added or exports are modified, the graphify agent must be invoked to synchronize the internal model.

pnpm exec graphify update .

Sources: CONTRIBUTING.md:65-69

Diagram Generation Logic

The integration includes two distinct modes for generating architecture diagrams: Programmatic and AI-Enhanced.

Programmatic Generation

The generateProgrammaticMermaid function calculates a connectivity score for every file in the analysis. This score is determined by the sum of incoming and outgoing edges, with bonuses applied to entry points and files marked as "important."

flowchart TD
    Start[Get File Nodes] --> Connectivity[Calculate In/Out Edges]
    Connectivity --> Scoring[Apply Entry & Importance Bonuses]
    Scoring --> Sort[Sort by Score]
    Sort --> Slice[Take Top 25 Nodes]
    Slice --> Subgraphs[Group by Folder Path]
    Subgraphs --> Output[Generate Mermaid Code]

Sources: src/components/viz/architecture/mermaid-generator.ts:20-60
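
The scoring and grouping steps in the flowchart can be approximated as follows. This is a sketch under stated assumptions: the node and edge shapes, the function names, and especially the bonus weights are placeholders, since the source does not give the actual values.

```typescript
interface FileNode { id: string; path: string; isEntry?: boolean; isImportant?: boolean; }
interface Edge { source: string; target: string; }

// Bonus weights are placeholders; the source does not state the real values.
const ENTRY_BONUS = 5;
const IMPORTANT_BONUS = 3;

// Scores each node by in-degree plus out-degree plus bonuses, then keeps the top `limit`.
function topNodesByConnectivity(nodes: FileNode[], edges: Edge[], limit = 25): FileNode[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }
  const score = (n: FileNode): number =>
    (degree.get(n.id) ?? 0) + (n.isEntry ? ENTRY_BONUS : 0) + (n.isImportant ? IMPORTANT_BONUS : 0);
  return nodes.slice().sort((a, b) => score(b) - score(a)).slice(0, limit);
}

// Groups the selected nodes by parent folder, one group per Mermaid subgraph.
function groupByFolder(nodes: FileNode[]): Map<string, FileNode[]> {
  const groups = new Map<string, FileNode[]>();
  for (const node of nodes) {
    const slash = node.path.lastIndexOf("/");
    const folder = slash === -1 ? "." : node.path.slice(0, slash);
    const bucket = groups.get(folder) ?? [];
    bucket.push(node);
    groups.set(folder, bucket);
  }
  return groups;
}
```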

AI Agent Tasks

The generateMermaidDiagram task utilizes the SYSTEM_PROMPT to instruct an AI provider to create a readable architecture flowchart. It groups components into logical subgraphs such as "Entry Points," "Services," and "Utilities."

Node Classification Styles

The integration applies specific CSS classes to nodes within Mermaid diagrams to provide visual context:

entry: Gateways or main entry points.
service: Business logic modules.
router: Controllers or request handlers.
util: Helpers and parsers.
db: Persistence or external API integrations.
config: Configuration files.

Sources: server/ai/tasks/diagram.ts:16-36, src/components/viz/architecture/mermaid-generator.ts:88-95

Conclusion

Graphify Agent Integration is the foundational technology that enables gitSdm to deliver "instant architecture overviews." By combining automated AST parsing via the graphify CLI with intelligent AI-driven summarization, the system creates a live, interactive map of complex software projects, significantly reducing developer onboarding time. Sources: README.md:15-30, CONTRIBUTING.md:65-69

Frontend Components

Workspace & Layout System

The Workspace & Layout System in gitSdm serves as the primary interactive interface for repository analysis. It transforms static GitHub repository data into a "graph-first" environment where developers can visualize file structures, module boundaries, and architectural dependencies. The goal is to surface in moments the kind of insight that would otherwise require extensive manual code review.

The workspace is divided into several specialized functional zones: a central visualization canvas powered by React Flow, a file explorer sidebar, an AI-driven intelligence panel, and specialized views for system architecture. This layout is managed through a combination of global state (via Zustand) and modular React components.

Sources: README.md:14-25, src/components/home/HeroSection.tsx:43-58

Core Workspace Architecture

The workspace follows a modular IDE-like structure. It is composed of four primary regions that coordinate to display repository data: the Header, the Explorer (Left Sidebar), the Visualization Canvas (Center), and the Analysis Panel (Right Sidebar).

Layout Structure

The interface utilizes a "Fake IDE" metaphor to provide a familiar environment for developers.

Header: Displays repository metadata (owner, repo name), the current active branch, and the parsing status.
Explorer Sidebar: A hierarchical tree view of the repository's directories and files, allowing for manual navigation.
Visualization Canvas: The main area where d3-force and dagre layout algorithms render the repository as an interactive graph of nodes (files/folders) and edges (dependencies).
Intelligence/Analysis Sidebar: Houses the AI Center, health audits, and contributor analytics.

Sources: src/components/home/HeroSection.tsx:59-125, README.md:54-85

Workspace Layout Diagram

The following diagram illustrates the spatial arrangement and component distribution of the gitSdm workspace.

flowchart TD
    subgraph UI_Workspace [Workspace Layout]
        direction TB
        TopNav[Top Navigation: Branch Switcher & Status]
        subgraph Main_Content [Main Interaction Area]
            direction LR
            Explorer[Explorer Panel: File Tree]
            Canvas[Graph Canvas: React Flow View]
            AnalysisPanel[Analysis Sidebar: AI & Stats]
        end
        StatusBar[Status Bar: File/Import Counts]
    end

    TopNav --- Main_Content
    Main_Content --- StatusBar
    Explorer --- Canvas
    Canvas --- AnalysisPanel

Explanation: This diagram shows the high-level layout of the workspace, highlighting the relationship between navigation, the primary interactive canvas, and the supporting sidebars. Sources: src/components/home/HeroSection.tsx:59-165

Component Systems

1. Visualization Canvas

The canvas is the core of the workspace, utilizing @xyflow/react (React Flow) for rendering. It handles the interactive mapping of nodes representing files and folders. Nodes are styled based on their file type and degree of coupling.

Node Interaction: Clicking a node triggers the focusOnNode helper, which centers the graph on the selection and updates the global vizStore with the focused file path.
Visual Classification: Nodes are color-coded (e.g., #3b82f6 for .ts files) and sized according to their role (Repo: 14, Folder: 12, File: 8).

Sources: src/components/viz/OverviewTab.tsx:23-42, server/graph/layout.test.ts:46-60
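
The visual classification rule can be expressed as a small lookup. The sizes and the `.ts` color come from the text above; the fallback color, the `styleForNode` name, and the lookup structure are assumptions.

```typescript
type NodeKind = "repo" | "folder" | "file";

// Sizes and the .ts color come from the text above; the fallback color is an assumption.
const NODE_SIZE: Record<NodeKind, number> = { repo: 14, folder: 12, file: 8 };
const EXTENSION_COLOR: Record<string, string> = { ".ts": "#3b82f6" };
const FALLBACK_COLOR = "#94a3b8";

// Returns the size for the node's role and the color for its file extension.
function styleForNode(kind: NodeKind, path: string): { size: number; color: string } {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  return { size: NODE_SIZE[kind], color: EXTENSION_COLOR[ext] ?? FALLBACK_COLOR };
}
```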

2. Architecture View

The ArchitectureView component provides a specialized mode for viewing system-level diagrams. It supports two distinct modes:

Code Graph: Programmatically built via static import analysis.
AI Enhanced: A logical system architecture summarized by AI using Mermaid-style block diagrams.

Sources: src/components/viz/ArchitectureView.tsx:48-68, src/components/viz/ArchitectureView.tsx:244-255

3. AI Center Sidebar

The AI Center acts as a context-aware toolset within the workspace. It includes:

Health Audit: Displays scores for maintainability, modularity, and readability.
Risk Identification: Lists specific files affected by high coupling or architectural debt.
Playground: Features tools like "Repo Roast" and "README Enhancer."

Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:143-200, src/components/viz/ai-sidebar/AiCenterTab.tsx:288-320

Interactivity and Data Flow

Interaction in one part of the workspace often triggers updates across other components. For example, selecting a file in the "Suggested Reading" list within the OverviewTab will re-center the React Flow canvas on that specific node.

Node Selection Sequence

The following diagram shows the data flow when a user selects a file or node.

sequenceDiagram
    participant User as "User Interface"
    participant Store as "Zustand (vizStore)"
    participant RF as "React Flow Instance"
    participant Sidebar as "Analysis Sidebar"

    User->>RF: Clicks Node / Selects File
    RF->>Store: setSelectedNodeId(nodeId)
    Store->>Store: setFocusedFilePath(path)
    Store-->>Sidebar: Update Metadata/AI Context
    RF->>RF: setCenter(x, y, zoom)
    Sidebar-->>User: Display File Details

Explanation: This sequence illustrates how global state coordinates between the interactive graph canvas and the information panels. Sources: src/components/viz/OverviewTab.tsx:23-42, src/components/viz/ai-sidebar/AiCenterTab.tsx:52-65
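
The coordination in the sequence above can be illustrated with a dependency-free store. This is a hand-rolled stand-in for the Zustand store, written so the sketch runs on its own. The state fields follow the diagram, while `createVizStore` and the `pathById` lookup are hypothetical.

```typescript
interface VizState { selectedNodeId: string | null; focusedFilePath: string | null; }
type Listener = (state: VizState) => void;

// Hand-rolled stand-in for the Zustand store, so the sketch runs without dependencies.
function createVizStore(pathById: Map<string, string>) {
  let state: VizState = { selectedNodeId: null, focusedFilePath: null };
  const listeners = new Set<Listener>();
  return {
    getState: (): VizState => state,
    // Registers a listener (for example, the analysis sidebar) and returns an unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => { listeners.delete(listener); };
    },
    // Selecting a node also updates the focused file path, then notifies subscribers.
    setSelectedNodeId(nodeId: string): void {
      state = { selectedNodeId: nodeId, focusedFilePath: pathById.get(nodeId) ?? null };
      listeners.forEach((listener) => listener(state));
    },
  };
}
```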

Summary of Key Features

Feature	Description	Implementation
Dagre Layout	Automatically positions nodes in Top-Bottom (TB) or Left-Right (LR) hierarchies.	`applyDagreLayout`
Branch Diffing	Visualizes added, modified, and deleted files when comparing branches.	`OverviewTab` (graphDiff)
Pan & Zoom	High-precision navigation of large codebases.	`useArchitecturePanZoom`
Export System	Allows downloading diagrams as SVG, PNG, or Mermaid code.	`useArchitectureExport`
Stats Integration	Displays file/folder counts and commit density timelines.	`OverviewTab`

Sources: server/graph/layout.test.ts:7-35, src/components/viz/OverviewTab.tsx:75-150, src/components/viz/ArchitectureView.tsx:180-200

Conclusion

The Workspace & Layout System provides the structural foundation for the gitSdm intelligence platform. By integrating complex graph layouts with modular analysis tabs, it allows developers to move from a high-level "Big Picture" understanding of a repository down to specific file-level details and AI-generated insights seamlessly.

Sources: README.md:158-166

D3-Force Interactive Canvas

Relevant source files

The following files were used as context for generating this wiki page:

src/features/graph/canvas/force/useD3Physics.ts
src/features/graph/canvas/force/ForceMinimap.tsx
src/features/graph/canvas/hooks/useForceCanvasState.ts
src/features/graph/canvas/GraphCanvas.tsx (Referenced via usage in HeroSection)
README.md

The D3-Force Interactive Canvas is a core visualization component of gitSdm used to render repository structures as dynamic, force-directed networks. It enables developers to explore file relationships, dependency clusters, and module boundaries through an interactive 2D environment powered by @xyflow/react (React Flow) and D3 physics engines.

This system transforms static repository data into a "network" layout where nodes represent files or folders and edges represent imports or dependencies. It provides advanced features such as blast radius calculation, real-time filtering, and a synchronized minimap for navigation. Sources: README.md:139-144, src/features/graph/canvas/hooks/useForceCanvasState.ts:40-55

Core Architecture and Data Flow

The canvas architecture relies on a unidirectional data flow where raw graph data is processed into force-directed simulation data, which is then rendered and synchronized with the global application state.

State Management and Synchronization

The useForceCanvasState hook acts as the primary orchestrator, bridging the gap between the useVizStore (Zustand) and the rendering engine. It handles node selection, hover states, and dynamic filtering based on file types or node categories. Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:21-39

Component Relationship Diagram

The following diagram illustrates how the different hooks and components interact to maintain the canvas state:

flowchart TD
    Store[Zustand vizStore] -->|Filters/Selection| Hook[useForceCanvasState]
    Hook -->|Simulation Data| D3[useD3Physics]
    Hook -->|Export Actions| Export[useGraphExport]
    Hook -->|Sync Viewport| Sync[useForceSync]
    D3 -->|Update Positions| Canvas[React Flow Canvas]
    Canvas -->|Viewport Data| Minimap[ForceMinimap]

The flow ensures that any change in global filters immediately updates the force simulation and the resulting visual layout. Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:153-176

Physics and Layout Engine

The canvas utilizes D3-force simulations to calculate node positions dynamically when the network layout is active.

D3 Physics Implementation

The useD3Physics hook configures the forces acting upon the nodes, including:

Charge Force: Applies a repulsive force between nodes so they spread apart instead of stacking on top of one another.
Link Force: Pulls connected nodes together based on dependency relationships.
Center Force: Keeps the entire graph centered within the viewport.

Layout Transitions

While the project supports multiple layout types (like Dagre), the network mode triggers the specialized useD3Physics logic. Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:147-151, src/features/graph/canvas/force/useD3Physics.ts:1-10
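
As a rough illustration of how the three forces listed above combine, here is a hand-rolled single simulation step. It is a simplified sketch, not the d3-force algorithm and not the actual useD3Physics code: the names simulationStep, SimNode, SimLink, and ForceOptions, and all numeric constants, are assumptions made for this example.

```typescript
// Hypothetical shapes for this sketch only.
interface SimNode { id: string; x: number; y: number; }
interface SimLink { source: string; target: string; }
interface ForceOptions { repulsion: number; linkDistance: number; linkStrength: number; }

// Applies one simplified step of charge, link, and center forces.
function simulationStep(
  nodes: SimNode[],
  links: SimLink[],
  centerPoint: { x: number; y: number },
  options: ForceOptions
): SimNode[] {
  const next = nodes.map((n) => ({ ...n }));
  if (next.length === 0) return next;
  const indexById = new Map<string, number>();
  nodes.forEach((n, i) => indexById.set(n.id, i));

  // Charge: every pair of nodes pushes apart, more strongly when close.
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const dx = nodes[j].x - nodes[i].x;
      const dy = nodes[j].y - nodes[i].y;
      const dist = Math.max(Math.sqrt(dx * dx + dy * dy), 0.01);
      const push = options.repulsion / (dist * dist);
      next[i].x -= (dx / dist) * push;
      next[i].y -= (dy / dist) * push;
      next[j].x += (dx / dist) * push;
      next[j].y += (dy / dist) * push;
    }
  }

  // Link: connected nodes are pulled toward the preferred link distance.
  for (const link of links) {
    const si = indexById.get(link.source);
    const ti = indexById.get(link.target);
    if (si === undefined || ti === undefined) continue;
    const dx = nodes[ti].x - nodes[si].x;
    const dy = nodes[ti].y - nodes[si].y;
    const dist = Math.max(Math.sqrt(dx * dx + dy * dy), 0.01);
    const pull = ((dist - options.linkDistance) * options.linkStrength) / 2;
    next[si].x += (dx / dist) * pull;
    next[si].y += (dy / dist) * pull;
    next[ti].x -= (dx / dist) * pull;
    next[ti].y -= (dy / dist) * pull;
  }

  // Center: shift all nodes so their mean position sits on centerPoint.
  const meanX = next.reduce((sum, n) => sum + n.x, 0) / next.length;
  const meanY = next.reduce((sum, n) => sum + n.y, 0) / next.length;
  for (const n of next) {
    n.x += centerPoint.x - meanX;
    n.y += centerPoint.y - meanY;
  }
  return next;
}

const stepped = simulationStep(
  [{ id: "a", x: 0, y: 0 }, { id: "b", x: 10, y: 0 }, { id: "c", x: 0, y: 100 }],
  [{ source: "a", target: "b" }],
  { x: 400, y: 300 },
  { repulsion: 100, linkDistance: 30, linkStrength: 0.1 }
);
```

In d3-force itself, these three roles are played by forceManyBody, forceLink, and forceCenter, and the simulation runs many such steps while gradually cooling down.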

Interactive Features

The D3-Force canvas supports several advanced interactive tools for codebase analysis:

Feature	Description	Implementation Details
Blast Radius	Visualizes the "impact zone" of a file change.	Calculated via `computeBlastRadius` based on transitive dependencies.
Minimap	Provides a high-level overview of the graph.	Uses a secondary 2D canvas to render node dots and a viewport bounding box.
Node Filtering	Hides/shows nodes based on type.	Uses `buildForceGraphData` with `nodeTypeFilters` and `fileTypeFilters`.
Auto-Centering	Focuses the view on selected files.	Orchestrated by `useForceSync` to transition the camera to specific node coordinates.

Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:65-75, src/features/graph/canvas/force/ForceMinimap.tsx:50-80, README.md:145-150
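
To make the Node Filtering row more concrete, the sketch below shows one plausible way to apply nodeTypeFilters and fileTypeFilters. The real buildForceGraphData signature is not shown in this page, so the function name buildForceGraphDataSketch, the CanvasNode and CanvasEdge shapes, and the assumption that each filter set lists the types to hide are all hypothetical.

```typescript
// Hypothetical shapes; the real store types may differ.
interface CanvasNode { id: string; type: string; extension?: string; }
interface CanvasEdge { source: string; target: string; }

function buildForceGraphDataSketch(
  nodes: CanvasNode[],
  edges: CanvasEdge[],
  nodeTypeFilters: Set<string>, // assumed: node types to hide
  fileTypeFilters: Set<string>  // assumed: file extensions to hide
): { nodes: CanvasNode[]; links: CanvasEdge[] } {
  const visibleNodes = nodes.filter((n) => {
    if (nodeTypeFilters.has(n.type)) return false;
    if (n.extension !== undefined && fileTypeFilters.has(n.extension)) return false;
    return true;
  });
  // Drop edges that would point at a hidden node.
  const visibleIds = new Set(visibleNodes.map((n) => n.id));
  const links = edges.filter((e) => visibleIds.has(e.source) && visibleIds.has(e.target));
  return { nodes: visibleNodes, links };
}

const filtered = buildForceGraphDataSketch(
  [
    { id: "src", type: "folder" },
    { id: "a.ts", type: "file", extension: "ts" },
    { id: "b.ts", type: "file", extension: "ts" },
    { id: "README.md", type: "file", extension: "md" },
  ],
  [
    { source: "src", target: "a.ts" },
    { source: "a.ts", target: "b.ts" },
    { source: "README.md", target: "a.ts" },
  ],
  new Set(["folder"]),
  new Set(["md"])
);
```

Removing dangling edges together with their nodes keeps the simulation input consistent, since a link whose endpoint is missing cannot be positioned.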

Blast Radius Calculation Flow

The blast radius feature allows users to see which files are affected if a specific node is modified.

sequenceDiagram
    participant User
    participant Store as vizStore
    participant Logic as useForceCanvasState
    participant Radius as forceGraphUtils
    
    User->>Store: Toggle Blast Radius
    Store->>Logic: blastRadiusActive = true
    Logic->>Radius: computeBlastRadius(selectedNodeId, edges)
    Radius-->>Logic: Set of affected node IDs
    Logic->>Store: setHighlightedNodeIds(affectedNodes)
    Store->>User: UI highlights impacted nodes

Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:81-85, src/features/graph/canvas/hooks/useForceCanvasState.ts:107-113
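
The computeBlastRadius(selectedNodeId, edges) call in the sequence above could be implemented as a breadth-first walk over reversed dependency edges. This is a minimal sketch, not the code in forceGraphUtils: the GraphEdge shape and the convention that source imports target are assumptions.

```typescript
// Assumed edge convention: `source` imports (depends on) `target`.
interface GraphEdge { source: string; target: string; }

function computeBlastRadius(selectedNodeId: string, edges: GraphEdge[]): Set<string> {
  // Reverse adjacency: for each file, the files that import it directly.
  const dependents = new Map<string, string[]>();
  for (const edge of edges) {
    const list = dependents.get(edge.target) ?? [];
    list.push(edge.source);
    dependents.set(edge.target, list);
  }

  // Breadth-first traversal collects direct and transitive dependents.
  const affected = new Set<string>();
  const queue: string[] = [selectedNodeId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (dependent !== selectedNodeId && !affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

const sampleEdges: GraphEdge[] = [
  { source: "app.ts", target: "service.ts" },
  { source: "service.ts", target: "db.ts" },
  { source: "cli.ts", target: "service.ts" },
  { source: "db.ts", target: "config.ts" },
];
// Changing db.ts affects service.ts, app.ts, and cli.ts, but not config.ts.
const impacted = computeBlastRadius("db.ts", sampleEdges);
```

The visited set also guards against import cycles, so the traversal terminates even when two files depend on each other.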

Navigation and Minimap

Navigation is supported by the ForceMinimap component, which renders a simplified version of the network on a separate HTML5 canvas element.

Minimap Logic

The minimap calculates the graph's bounding box to scale the entire network into a 200x150 preview window. It draws:

Nodes: Represented as small arcs/dots colored by their category.
Viewport Bounds: A translucent rectangle representing the area currently visible on the main canvas.

Sources: src/features/graph/canvas/force/ForceMinimap.tsx:15-45

Viewport Transformation

To calculate the viewport bounds, the minimap retrieves the current zoom and center coordinates from the force graph API:

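// fg is the force graph instance; width and height are the main canvas size in pixels.
const center = fg.centerAt();
const zoom = fg.zoom();
// Dividing by the zoom factor converts screen pixels into graph (D3) coordinates.
const halfWidthInD3 = (width / 2) / zoom;
const halfHeightInD3 = (height / 2) / zoom;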

Sources: src/features/graph/canvas/force/ForceMinimap.tsx:28-31, src/features/graph/canvas/force/ForceMinimap.tsx:71-74
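
The excerpt stops at the half extents in graph coordinates. One way to finish the calculation is to map that rectangle into the 200x150 preview, as sketched below. The function name viewportRectOnMinimap, the Bounds and Rect types, and the uniform-scale fitting rule are assumptions for illustration rather than the ForceMinimap implementation.

```typescript
interface Point { x: number; y: number; }
interface Bounds { minX: number; minY: number; maxX: number; maxY: number; }
interface Rect { x: number; y: number; width: number; height: number; }

const MINIMAP_WIDTH = 200;
const MINIMAP_HEIGHT = 150;

function viewportRectOnMinimap(
  center: Point,        // graph coordinates at the middle of the main canvas
  zoom: number,         // current zoom factor of the main canvas
  canvasWidth: number,  // main canvas width in pixels
  canvasHeight: number, // main canvas height in pixels
  graphBounds: Bounds   // bounding box of all nodes in graph coordinates
): Rect {
  // Same conversion as the excerpt: screen pixels to graph units.
  const halfWidthInD3 = canvasWidth / 2 / zoom;
  const halfHeightInD3 = canvasHeight / 2 / zoom;

  // Uniform scale that fits the whole graph inside the preview.
  const graphWidth = Math.max(graphBounds.maxX - graphBounds.minX, 1);
  const graphHeight = Math.max(graphBounds.maxY - graphBounds.minY, 1);
  const scale = Math.min(MINIMAP_WIDTH / graphWidth, MINIMAP_HEIGHT / graphHeight);

  return {
    x: (center.x - halfWidthInD3 - graphBounds.minX) * scale,
    y: (center.y - halfHeightInD3 - graphBounds.minY) * scale,
    width: 2 * halfWidthInD3 * scale,
    height: 2 * halfHeightInD3 * scale,
  };
}

// An 800x600 canvas at zoom 2, centered on a graph spanning 800x600 units.
const viewportRect = viewportRectOnMinimap(
  { x: 0, y: 0 }, 2, 800, 600,
  { minX: -400, minY: -300, maxX: 400, maxY: 300 }
);
// viewportRect is { x: 50, y: 37.5, width: 100, height: 75 }
```

At zoom 2 the visible area covers half the graph in each dimension, so the rectangle occupies half of the preview's width and height.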

Summary

The D3-Force Interactive Canvas is the primary interface for visual dependency exploration in gitSdm. By combining D3's physics-based layout engine with React Flow's canvas management, it provides a performant environment for analyzing complex codebases. Its tight integration with the global vizStore ensures that features like blast radius, filtering, and cross-branch comparisons are reflected instantly in the visual network. Sources: README.md:139-150, src/features/graph/canvas/hooks/useForceCanvasState.ts:1-20

Mermaid Architecture Generation

Mermaid Architecture Generation is a core feature within the gitSdm platform that provides users with two distinct methods for visualizing repository structure: Code Graph (programmatically generated from static analysis) and AI Enhanced (generated using Large Language Models). These visualizations are rendered as interactive Mermaid.js flowcharts, allowing developers to quickly grasp module boundaries, entry points, and system data flows.

The system orchestrates data from the repository analysis, applies layout logic, and utilizes a custom-themed Mermaid.js configuration to render high-quality SVGs. Users can interact with the resulting diagrams through pan/zoom controls and export them in multiple formats, including PNG, SVG, and raw Mermaid code.

Sources: src/components/viz/ArchitectureView.tsx:1-40, server/ai/tasks/diagram.ts:10-30

Core Components and Workflow

The generation process is managed through a combination of frontend hooks and backend AI tasks. The primary entry point is the ArchitectureView component, which toggles between the programmatic (Code Graph) mode and the AI-driven (AI Enhanced) mode.

Generation Modes

Mode	Generation Method	Logic Source	Use Case
Code Graph	Programmatic	`mermaid-generator.ts`	Fast, deterministic mapping of all significant files and folders.
AI Enhanced	LLM Task	`server/ai/tasks/diagram.ts`	High-level logical grouping and human-readable architecture summaries.

Sources: src/components/viz/ArchitectureView.tsx:64-100, src/components/viz/architecture/hooks/useArchitectureState.ts:25-40

Logic Flow Diagram

The following diagram illustrates the flow from repository analysis to the final rendered visualization in the UI.

flowchart TD
    Analysis[Repo Analysis Data] --> ModeSelect{Mode Selection}
    ModeSelect -- "Code Graph" --> Programmatic[generateProgrammaticMermaid]
    ModeSelect -- "AI Enhanced" --> AI[AI Task: generateMermaidDiagram]
    
    Programmatic --> Code[Raw Mermaid Code]
    AI --> Code
    
    Code --> Strip[stripMermaidFences]
    Strip --> Render[Mermaid.render]
    Render --> UI[ArchitectureView Canvas]
    
    subgraph Frontend
        ModeSelect
        Programmatic
        Strip
        Render
        UI
    end
    
    subgraph Backend
        AI
    end

Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:28-60, server/ai/tasks/diagram.ts:10-50

Programmatic Generation Logic

The generateProgrammaticMermaid function creates a flowchart based on the file connectivity found in the RepoAnalysis object. It prioritizes nodes based on their importance and connectivity to ensure the diagram remains readable.

Scoring and Selection

Connectivity Calculation: Maps all nodes and edges to determine the degree (incoming + outgoing) of each file.
Node Scoring:
- Base score = degree.
- +10 for entry point files.
- +5 for files marked as "important" in the analysis.
Filtering: Selects the top 25 scored nodes to avoid clutter.
Grouping: Organizes files into subgraphs based on their directory paths.

Sources: src/components/viz/architecture/mermaid-generator.ts:22-60
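
The scoring and filtering steps above can be sketched as follows. The weights (+10, +5) and the limit of 25 come from the description; the FileNode and ImportEdge shapes, the flag names isEntryPoint and isImportant, and the function name selectTopNodes are assumptions, and the directory grouping step is left out.

```typescript
// Hypothetical shapes standing in for the RepoAnalysis data.
interface FileNode { id: string; isEntryPoint?: boolean; isImportant?: boolean; }
interface ImportEdge { source: string; target: string; }

const MAX_DIAGRAM_NODES = 25;

function selectTopNodes(
  nodes: FileNode[],
  edges: ImportEdge[],
  limit: number = MAX_DIAGRAM_NODES
): FileNode[] {
  // Degree = incoming + outgoing edges per file.
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }

  // Base score is the degree, with bonuses for entry points and important files.
  const score = (node: FileNode): number =>
    (degree.get(node.id) ?? 0) +
    (node.isEntryPoint ? 10 : 0) +
    (node.isImportant ? 5 : 0);

  // Sort a copy by descending score and keep the top `limit` nodes.
  return nodes.slice().sort((a, b) => score(b) - score(a)).slice(0, limit);
}

const topTwo = selectTopNodes(
  [
    { id: "utils.ts" },
    { id: "main.ts", isEntryPoint: true },
    { id: "helper.ts" },
    { id: "config.ts", isImportant: true },
  ],
  [
    { source: "main.ts", target: "utils.ts" },
    { source: "config.ts", target: "utils.ts" },
    { source: "helper.ts", target: "utils.ts" },
  ],
  2
);
// Scores: main.ts 11, config.ts 6, utils.ts 3, helper.ts 1
```

Note how the bonuses let a lightly connected entry point outrank a heavily imported utility, which matches the stated goal of keeping the diagram focused on architecturally significant files.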

Styling Classes

The generator applies specific CSS classes to nodes based on their classification in the repository:

entry: For entrypoint files.
config: For configuration manifests.
test: For test suites.
service: Default for business logic files.

Sources: src/components/viz/architecture/mermaid-generator.ts:98-105

AI-Powered Diagram Generation

The AI-enhanced mode utilizes a specialized prompt to generate logical architectural summaries. This task is processed on the backend via the generateMermaidDiagram function.

Prompt Requirements

The LLM is instructed to:

Use a Left-to-Right layout (graph LR).
Group components into logical subgraphs such as "Entry Points", "Services", and "Database".
Limit the diagram to 15-20 nodes.
Classify nodes using specific class lines (e.g., class NodeId router;).

Sources: server/ai/tasks/diagram.ts:20-55

Sequence of AI Diagram Generation

sequenceDiagram
    participant U as User Interface
    participant H as useArchitectureState
    participant S as AI Service (Backend)
    participant M as Mermaid Engine

    U->>H: Toggle to AI Mode
    H->>S: Request: generateMermaidDiagram(owner, repo)
    S-->>H: Return Mermaid Code Block
    H->>H: stripMermaidFences()
    H->>M: render(id, code)
    M-->>H: Rendered SVG String
    H->>U: Update View with SVG

Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:21-70, server/ai/tasks/diagram.ts:10-20
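
The stripMermaidFences step exists because LLM output is often wrapped in a markdown code fence that Mermaid cannot parse. A minimal sketch of such a helper follows; the regular expressions are an assumption about what the real function handles, not a copy of it.

```typescript
// Removes a leading fence line (optionally tagged "mermaid") and a trailing
// fence line, leaving only the diagram source.
function stripMermaidFences(raw: string): string {
  return raw
    .trim()
    .replace(/^`{3}(?:mermaid)?[ \t]*\r?\n/i, "")
    .replace(/\r?\n`{3}[ \t]*$/, "")
    .trim();
}

// Build the fence marker programmatically to keep this example readable.
const fence = "`".repeat(3);
const llmOutput = fence + "mermaid\ngraph LR\n  A --> B\n" + fence;
const cleaned = stripMermaidFences(llmOutput);
// cleaned is "graph LR\n  A --> B"
```

Input that has no fences passes through unchanged apart from trimming, so the same helper can be applied to programmatic and AI-generated code alike.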

Rendering and Configuration

The mermaid-config.ts file defines a custom dark-themed appearance for all diagrams. It uses a "base" theme with overrides to match the gitSdm UI aesthetic.

Theme Variables

Background: #09090b
Primary Color: #238636 (GitHub-style green)
Node Border: #3f3f46
Text Color: #f4f4f5

Sources: src/components/viz/architecture/mermaid-config.ts:11-30
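
The values above could be wired into Mermaid roughly as shown below. The mapping of each listed color to a specific themeVariables key is an assumption, since the actual mermaid-config.ts is not reproduced here; the object would be passed to mermaid.initialize in an application that has Mermaid installed.

```typescript
// Plain configuration object; no Mermaid import is needed to define it.
const mermaidThemeConfig = {
  startOnLoad: false,
  // Mermaid applies custom themeVariables only on top of the "base" theme.
  theme: "base" as const,
  themeVariables: {
    background: "#09090b",         // Background
    primaryColor: "#238636",       // Primary Color (GitHub-style green)
    primaryBorderColor: "#3f3f46", // Node Border (assumed key)
    primaryTextColor: "#f4f4f5",   // Text Color (assumed key)
  },
};
```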

Interaction and Export

The ArchitectureView provides a canvas with pan and zoom capabilities via useArchitecturePanZoom. The useArchitectureExport hook manages the extraction of the diagram in various formats:

Copy Mermaid Code: Extracts the raw text and strips fences.
Copy SVG: Serializes the DOM SVG element to a string.
Download PNG: Uses html-to-image to convert the SVG to a high-quality raster image.

Sources: src/components/viz/ArchitectureView.tsx:100-150, src/components/viz/architecture/hooks/useArchitectureExport.ts:27-105

Summary

Mermaid Architecture Generation provides a multi-faceted view of a codebase by bridging static analysis and AI insights. By utilizing programmatic scoring for detail and LLM-based grouping for conceptual understanding, it allows developers to interact with a repository's structure visually. The system ensures high performance through node filtering and professional presentation through extensive custom Mermaid.js styling.

Sources: src/components/viz/ArchitectureView.tsx:320-335, server/ai/tasks/diagram.ts:10-15

Configuration & Extensibility

Application Configuration & Env vars

The gitSdm platform utilizes a robust configuration system primarily driven by environment variables to manage its multi-layered architecture. These configurations govern integration with external services, including the GitHub API and various Large Language Model (LLM) providers such as Google Gemini, OpenAI, and Anthropic.

The system is designed to be "zero-config" for basic development by defaulting to a mock provider when no API keys are present, while allowing granular control over model selection and API versions in production environments. Sources: README.md:106-119, server/ai/provider.ts:25-45

Server-Side Environment Variables

The backend services rely on several categories of environment variables defined in a .env file (copied from .env.example). These variables handle authentication, service selection, and model parameters. Sources: CONTRIBUTING.md:28-32

Global Service Configuration

Variable	Description	Default / Options
`GITHUB_TOKEN`	Optional. Increases GitHub API rate limits for public repositories.	N/A
`AI_PROVIDER`	Defines the active LLM service.	`mock`, `gemini`, `openai`, `anthropic`

Sources: README.md:108-111

AI Provider Specifics

Each provider has specific configuration variables for API keys, model identifiers, and API base URLs.

Provider	API Key Variable	Model Variable	Default Model
Gemini	`GEMINI_API_KEY`	`GEMINI_MODEL`	`gemini-2.5-flash`
OpenAI	`OPENAI_API_KEY`	`OPENAI_MODEL`	`gpt-4o-mini`
Anthropic	`ANTHROPIC_API_KEY`	`ANTHROPIC_MODEL`	`claude-3-5-haiku-latest`

Sources: README.md:112-119, server/ai/provider.ts:55-57, server/ai/provider.ts:88-89, server/ai/provider.ts:114-115

AI Provider Initialization Logic

The application uses an automated detection sequence to initialize the AIProvider. An override key, when supplied, takes precedence, and its prefix determines the provider. Otherwise, an explicit AI_PROVIDER variable is checked first, followed by the presence of specific API keys.

flowchart TD
    Start[Get AI Provider] --> CheckOverride{Override Key Provided?}
    CheckOverride -- Yes --> DetectType[Detect Type from Key Prefix]
    CheckOverride -- No --> CheckEnvVar{AI_PROVIDER Env Set?}
    
    DetectType --> CreateInstance[Create Specific Provider]
    
    EnvCheckValue{Value?}
    CheckEnvVar -- Yes --> EnvCheckValue
    EnvCheckValue -- "gemini/openai/anthropic" --> CreateInstance
    EnvCheckValue -- "mock" --> Mock[Create Mock Provider]
    
    CheckEnvVar -- No --> KeyAutoDetect{Check API Keys}
    KeyAutoDetect -- "GEMINI_API_KEY exists" --> G[Gemini]
    KeyAutoDetect -- "OPENAI_API_KEY exists" --> O[OpenAI]
    KeyAutoDetect -- "ANTHROPIC_API_KEY exists" --> A[Anthropic]
    KeyAutoDetect -- "No keys found" --> Mock
    
    G --> CreateInstance
    O --> CreateInstance
    A --> CreateInstance

This diagram shows the decision logic used to select the AI backend. Sources: server/ai/provider.ts:25-52

Provider Key Detection

The system can infer the provider type from the format of an overrideKey:

sk-ant-: Identified as anthropic.
sk-: Identified as openai.
Otherwise defaults to gemini.

Sources: server/ai/provider.ts:10-23
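
The prefix rules above translate into a short function. This is a sketch under the stated rules only; the name detectProviderFromKey and the KeyProvider type are hypothetical, and the sample keys are fake placeholders.

```typescript
type KeyProvider = "anthropic" | "openai" | "gemini";

function detectProviderFromKey(overrideKey: string): KeyProvider {
  const key = overrideKey.trim();
  // Order matters: "sk-ant-" must be tested before the broader "sk-" prefix.
  if (key.startsWith("sk-ant-")) return "anthropic";
  if (key.startsWith("sk-")) return "openai";
  // Anything else is treated as a Gemini key.
  return "gemini";
}

const detectedA = detectProviderFromKey("sk-ant-example-key"); // "anthropic"
const detectedB = detectProviderFromKey("sk-example-key");     // "openai"
const detectedC = detectProviderFromKey("example-other-key");  // "gemini"
```

Because the Gemini branch is a catch-all, a malformed or mistyped key is also classified as Gemini, and the error would only surface when the first API call fails.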

Client-Side Configuration

While most configuration resides on the server, the frontend manages user-interface preferences and synchronization with the server-side environment.

Theme Synchronization

The application performs an initial theme check in the HTML document's head, before the page renders, to prevent a "Flash of Unstyled Content" (FOUC). It reads the saved theme from localStorage and sets colorScheme to match, defaulting to dark.

// Logic inside index.html
var theme = localStorage.getItem('theme');
if (theme === 'light') {
  document.documentElement.classList.add('light');
  document.documentElement.style.colorScheme = 'light';
} else {
  document.documentElement.style.colorScheme = 'dark';
}

Sources: index.html:26-34

App Config Fetching

The frontend fetches application configuration from the /api/config endpoint (handled via fetchAppConfig) to maintain state awareness of active features and limits. Sources: src/components/home/HeroSection.tsx:18-23

Mock Mode Configuration

If AI_PROVIDER is set to mock, or if it is unset and no provider API key is present, the system uses a mock provider. This provider makes no network calls to LLMs; instead, it returns predefined responses based on the query content (e.g., "architecture", "suggest", "onboarding", "ELI5"). Sources: server/ai/provider.ts:133-219

The mock mode also utilizes local static data to simulate GitHub API responses for specific repositories like mbayue/gitSdm or a generic mock-todo-app. Sources: server/github/mock-data.ts:5-66
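
A mock provider of this kind can be pictured as a simple keyword lookup. The sketch below is illustrative only: the keywords come from the description above, while the function name mockComplete, the CannedReply shape, the matching strategy, and every reply string are placeholders rather than the project's actual responses.

```typescript
interface CannedReply { keyword: string; reply: string; }

// Placeholder replies; the real mock provider returns richer content.
const CANNED_REPLIES: CannedReply[] = [
  { keyword: "architecture", reply: "[mock] Architecture summary placeholder." },
  { keyword: "suggest", reply: "[mock] Suggestions placeholder." },
  { keyword: "onboarding", reply: "[mock] Onboarding guide placeholder." },
  { keyword: "eli5", reply: "[mock] ELI5 explanation placeholder." },
];

function mockComplete(prompt: string): string {
  const text = prompt.toLowerCase();
  // The first keyword found in the prompt decides the reply.
  for (const entry of CANNED_REPLIES) {
    if (text.indexOf(entry.keyword) !== -1) return entry.reply;
  }
  return "[mock] Generic placeholder response.";
}

const mockReply = mockComplete("Explain this file in ELI5 mode");
// mockReply is "[mock] ELI5 explanation placeholder."
```

Since nothing leaves the process, a provider like this keeps UI development and tests deterministic and free of API costs.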

Summary

The gitSdm configuration system prioritizes flexibility and ease of setup. By leveraging environment variables, the platform can seamlessly switch between different LLM ecosystems or operate in a fully mocked local environment for development and testing. Key variables like AI_PROVIDER and GITHUB_TOKEN ensure the application scales from local prototypes to production-grade repository analysis.

Deployment & Infrastructure

Vercel Serverless Deployment

Vercel Serverless Deployment serves as the primary hosting and execution infrastructure for the gitSdm platform. It utilizes Vercel's serverless functions to handle backend logic, including GitHub repository ingestion, AI-driven analysis, and dependency graph generation. This architecture allows the project to scale dynamically while maintaining a clear separation between the React-based frontend and the Node.js backend services.

The deployment infrastructure is defined through a combination of configuration files, such as vercel.json, and specialized handlers like server/vercel-handler.ts which bridge the application's internal API router with Vercel's execution environment. This setup supports the project's mission of providing instant repository intelligence without requiring persistent server management.

Sources: README.md:46-59, package.json:69-69, server/vercel-handler.ts

Infrastructure Architecture

The system follows a modular architecture where the frontend is a React SPA and the backend consists of serverless handlers. These handlers orchestrate tasks between the GitHub API and various AI providers.

flowchart TD
    User[Developer Browser] -->|Web Request| Vercel[Vercel Edge/Serverless]
    Vercel -->|Static Assets| Frontend[React + Vite SPA]
    Vercel -->|API Calls| ServerlessFunc[Serverless Functions]
    ServerlessFunc -->|Route Handling| APIRouter[API Router]
    APIRouter -->|Task Execution| AIService[AI Provider Manager]
    APIRouter -->|Data Fetching| GitHubService[GitHub Tree Fetcher]
    
    AIService -->|Queries| Gemini[Google Gemini]
    AIService -->|Queries| OpenAI[OpenAI]
    AIService -->|Queries| Anthropic[Anthropic Claude]

The diagram illustrates the request lifecycle from the browser through Vercel's serverless infrastructure to the internal service layers and external AI providers. Sources: README.md:46-60, server/ai/provider.ts:47-65

Serverless API Configuration

The backend is organized into modular serverless functions located in the api/ directory. These functions are mapped to specific routes via the Vercel deployment configuration.

Deployment Configuration (`vercel.json`)

The project utilizes vercel.json to define the runtime environment and routing rules. Key configurations include the use of the @vercel/node runtime for backend functions and the mapping of API paths.

Configuration Key	Value / Description
`runtime`	`@vercel/node` (defined in `package.json` devDependencies)
`functions`	Configures memory and execution limits for handlers in `api/`
`api/ai/*`	Handlers for AI tasks like architecture analysis and onboarding
`api/repo/*`	Handlers for repository ingestion and analysis

Sources: vercel.json, package.json:88-88, README.md:47-47

AI Task Handlers

Specific AI capabilities are exposed through dedicated serverless endpoints. For example, the api/ai/architecture.ts handler manages deep system architecture analysis by interfacing with the AI service layer.

sequenceDiagram
    participant Client as Client Browser
    participant Handler as api/ai/architecture.ts
    participant Service as server/ai/service.ts
    participant Provider as AI Provider (Gemini/OpenAI)

    Client->>Handler: POST /api/ai/architecture
    Handler->>Service: executeAiTask('architecture', ...)
    Service->>Provider: complete(messages, options)
    Provider-->>Service: JSON Architecture Data
    Service-->>Handler: Parsed Task Result
    Handler-->>Client: 200 OK (Architecture JSON)

This sequence shows how a specific AI architectural request is handled within the serverless environment. Sources: api/ai/architecture.ts, server/ai/provider.ts:74-92

AI Provider Integration

The serverless environment dynamically detects and initializes AI providers based on environment variables. This allows the deployment to support multiple LLMs seamlessly.

Provider Selection Logic

The createProvider function in server/ai/provider.ts prioritizes the AI_PROVIDER environment variable, falling back to auto-detection based on available API keys.

// server/ai/provider.ts:47-65
  if (process.env.AI_PROVIDER) {
    const envProvider = process.env.AI_PROVIDER.toLowerCase();
    if (envProvider === 'gemini' || envProvider === 'openai' || envProvider === 'anthropic' || envProvider === 'mock') {
      providerType = envProvider as 'gemini' | 'openai' | 'anthropic' | 'mock';
    }
  } else if (process.env.GEMINI_API_KEY && process.env.GEMINI_API_KEY.trim()) {
    providerType = 'gemini';
  } else if (process.env.OPENAI_API_KEY && process.env.OPENAI_API_KEY.trim()) {
    providerType = 'openai';
  }

Supported Environment Variables

Variable	Purpose
`AI_PROVIDER`	Explicitly sets the provider (`gemini`, `openai`, `anthropic`, or `mock`)
`GEMINI_API_KEY`	Authentication for Google Gemini API
`OPENAI_API_KEY`	Authentication for OpenAI API
`ANTHROPIC_API_KEY`	Authentication for Anthropic SDK

Sources: server/ai/provider.ts:25-65, README.md:121-131

Caching and Performance

The project implements an in-memory LRU cache (using lru-cache). Because Vercel serverless function instances are ephemeral, cached entries persist only within a single invocation or across warm starts of the same instance.

Memory Constraints: The README notes that the custom LRU cache lives in-memory, which means it may reset during "cold starts" in the Vercel environment.
Dependency Management: The project uses Bun as a recommended runtime but maintains compatibility with Node.js 22 to align with Vercel's standard environments.

Sources: README.md:162-162, package.json:49-49, server/ai/provider.ts:256-271
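
To show why cold starts reset the cache, the sketch below keeps a tiny LRU cache at module scope. It is a hand-rolled stand-in written for this example, not the lru-cache package the project uses, and the TinyLruCache class and the cache keys are hypothetical.

```typescript
// Minimal LRU cache built on Map, which preserves insertion order.
class TinyLruCache<K, V> {
  private entries = new Map<K, V>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

// Module scope: this object lives as long as the function instance stays
// warm and is created empty again on every cold start.
const analysisCache = new TinyLruCache<string, string>(2);
analysisCache.set("owner/repo-a", "graph-a");
analysisCache.set("owner/repo-b", "graph-b");
analysisCache.get("owner/repo-a");            // repo-a becomes most recent
analysisCache.set("owner/repo-c", "graph-c"); // evicts repo-b
```

If results must survive cold starts or be shared between instances, an external store would be needed; nothing in this page indicates the project uses one.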

Conclusion

The Vercel Serverless Deployment provides gitSdm with a scalable and cost-effective infrastructure for repository analysis. By leveraging @vercel/node and a modular routing system, the platform effectively bridges client-side React visualizations with intensive backend AI and GitHub API tasks, ensuring high availability and ease of deployment.

Docker & Google Cloud Run

The gitSdm platform utilizes containerization and serverless infrastructure to provide a consistent, scalable environment for repository analysis and visualization. Docker is used to bundle the Vite-based frontend assets and the Node.js backend services into a single deployable unit, ensuring that the application behaves identically in development, staging, and production environments.

Google Cloud Run serves as the primary deployment target, offering a managed, auto-scaling environment that handles the execution of the containerized application. This infrastructure supports the application's reliance on external APIs, including the GitHub API and various AI providers (Google Gemini, OpenAI, and Anthropic), by securely managing environment variables and secrets.

Sources: README.md:148-154, README.md:183-195

Docker Architecture & Configuration

The Docker implementation for gitSdm follows a pattern of bundling a small Node.js server to serve static files and handle API requests. The image builds the Vite application, stores the output in a dist/ directory, and utilizes a production-ready server entry point.

Containerization Strategy

The container encapsulates the following components:

Static Assets: Pre-built Vite frontend files served from the /dist directory.
API Router: A Node.js Express server that handles backend logic, including GitHub ingestion and AI task orchestration.
Production Server: Managed via server/prod-server.ts to coordinate static file serving and API routing.

Docker Build and Execution

The build process involves installing dependencies, transpiling TypeScript, and bundling the frontend.

# Build Docker image
docker build -t gitsdm .

# Run container with environment configuration
docker run -p 3000:3000 --env-file .env gitsdm

Sources: README.md:156-163, server/prod-server.ts:1-20

Dockerfile Structure

The project's own Dockerfile is not reproduced here. The example below is the sample Dockerfile found in the project's mock data, which reflects a development-style setup.

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 5173
CMD ["npm", "run", "dev"]

Note: The production Docker image uses npm run build and serves via server/prod-server.ts rather than the dev server. Sources: server/github/mock-data.ts:155-163, README.md:156-160

Google Cloud Run Deployment

Google Cloud Run is the recommended platform for hosting gitSdm due to its support for stateless containers and integration with Google Cloud's ecosystem.

Deployment Workflow

The deployment is performed directly from source code. With --source, Cloud Run builds the container image on the fly, using the repository's Dockerfile when one is present and Google Cloud's buildpacks otherwise.

gcloud run deploy gitsdm \
  --source . \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --env-vars-file .env

Sources: README.md:166-172

Deployment Configuration Summary

Parameter	Value	Description
Region	`asia-southeast1`	Region used in the example deploy command; choose the region closest to your users for lower latency.
Authentication	`--allow-unauthenticated`	Permitting public access to the visualization tool.
Runtime	Node.js 22	Optimized for Express and the AI SDKs.
Port Mapping	3000	The default internal port mapped to the service.

Sources: README.md:168-171, README.md:204

Execution Flow in Production

When deployed via Docker or Cloud Run, the application operates in a hybrid mode where a single entry point manages both the user interface and the backend processing pipeline.

Request Lifecycle

The following diagram illustrates how a request is handled within the deployed container:

flowchart TD
    User[Developer Browser] -->|HTTP Request| Container[Cloud Run Container]
    subgraph ContainerLogic [Internal Container Routing]
        Container -->|URL Path /api/*| APIRouter[Express API Router]
        Container -->|Root / Other Paths| StaticServer[Static File Server]
    end
    APIRouter -->|Analyze Repo| GitHub[GitHub API]
    APIRouter -->|Generate Insights| AI[AI Provider SDK]
    StaticServer -->|Serve Assets| Dist[dist/ Folder]
    
    class User entry;
    class Container service;
    class APIRouter router;
    class StaticServer util;

This diagram shows the dual-path routing logic where the production server distinguishes between frontend assets and backend API calls. Sources: server/prod-server.ts, server/ai/tasks/diagram.ts:32-40
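
The dual-path decision in the diagram reduces to a check on the URL path. The sketch below shows only that decision as a pure function; classifyRequest and RouteTarget are hypothetical names, and the real server/prod-server.ts may structure this differently (for example, through Express middleware ordering).

```typescript
type RouteTarget = "api" | "static";

function classifyRequest(pathname: string): RouteTarget {
  // Match "/api" exactly or anything under "/api/", but not "/apidocs".
  if (pathname === "/api" || pathname.startsWith("/api/")) return "api";
  // Everything else is served from the built frontend in dist/.
  return "static";
}

const targetForApi = classifyRequest("/api/ai/architecture"); // "api"
const targetForRoot = classifyRequest("/");                   // "static"
```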

Environment Configuration

Configuration in containerized environments is strictly managed via environment variables. This is critical for the AIProvider factory, which detects the available keys at runtime to determine which service (Gemini, OpenAI, or Anthropic) to instantiate.

Infrastructure Variables

Variable	Requirement	Description
`GITHUB_TOKEN`	Recommended	Increases API rate limits for public repository analysis.
`AI_PROVIDER`	Optional	Specifies `gemini`, `openai`, `anthropic`, or `mock`.
`PORT`	Optional	The port the container listens on (defaults to 3000 in prod).

Sources: README.md:126-145, server/ai/provider.ts:40-58

Summary of Infrastructure Stack

The deployment relies on a specific set of technologies to maintain the performance of the repository mapping and AI analysis features:

Runtime: Node.js 22 (via Docker Alpine images).
Deployment: Google Cloud Run for serverless execution.
Package Management: Bun or pnpm for efficient dependency resolution during image builds.
API Client: Octokit for GitHub communication.

Sources: README.md:191-209, CONTRIBUTING.md:15-18

Development & Testing

Local Development Setup

Setting up the gitSdm environment allows developers to contribute to the graph-first repository analysis platform. The project utilizes a full-stack TypeScript architecture, requiring a Node.js or Bun runtime to manage a React frontend and an Express-based backend.

Local development supports both live API integrations (GitHub and various AI providers) and a robust Mock Mode for developers who wish to work offline or without API keys. This setup ensures that all core features, including interactive visualizations and AI-driven insights, can be tested locally.

Sources: CONTRIBUTING.md:1-10, README.md:1-20

Environment Prerequisites

Before starting the development server, ensure the following tools are installed:

Node.js: Version ≥ 22.
Package Manager: pnpm ≥ 9 is recommended, though bun ≥ 1.1 is supported for faster execution.
Git: Required for forking and cloning the repository.

Sources: CONTRIBUTING.md:12-14, README.md:65-70

Initial Installation

The following sequence diagram outlines the initial steps to prepare the local environment:

sequenceDiagram
    participant Dev as Developer
    participant Git as GitHub
    participant Local as Local Machine
    Dev->>Git: Fork mbayue/gitSdm
    Git-->>Dev: Fork created
    Dev->>Local: git clone [fork-url]
    Local->>Local: cd gitSdm
    Dev->>Local: pnpm install
    Local->>Local: cp .env.example .env

Sources: CONTRIBUTING.md:16-30, README.md:73-82

Configuration and Secrets

The application relies on environment variables defined in a .env file. These variables control API access and the behavior of the AI provider system.

AI Provider Selection

The AI_PROVIDER variable determines which backend engine processes natural language tasks. If no provider is specified, the system defaults to mock.

Variable	Values	Description
`AI_PROVIDER`	`mock`, `gemini`, `openai`, `anthropic`	The active AI service.
`GITHUB_TOKEN`	String	Personal Access Token to increase API rate limits.
`GEMINI_API_KEY`	String	Required if provider is `gemini`.
`OPENAI_API_KEY`	String	Required if provider is `openai`.
`ANTHROPIC_API_KEY`	String	Required if provider is `anthropic`.

Sources: README.md:85-95, server/ai/provider.ts:25-50

AI Provider Logic

The system uses an auto-detection mechanism to instantiate the correct provider based on available keys if AI_PROVIDER is not explicitly set.

flowchart TD
    Start[Load .env] --> CheckExplicit{AI_PROVIDER set?}
    CheckExplicit -- Yes --> ReturnExplicit[Use specified provider]
    CheckExplicit -- No --> CheckGemini{GEMINI_API_KEY?}
    CheckGemini -- Yes --> UseGemini[Use Gemini]
    CheckGemini -- No --> CheckOpenAI{OPENAI_API_KEY?}
    CheckOpenAI -- Yes --> UseOpenAI[Use OpenAI]
    CheckOpenAI -- No --> CheckAnthropic{ANTHROPIC_API_KEY?}
    CheckAnthropic -- Yes --> UseAnthropic[Use Anthropic]
    CheckAnthropic -- No --> UseMock[Use Mock Provider]

Sources: server/ai/provider.ts:28-56
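
The flowchart above can be expressed as a small pure function, which is handy for checking a local .env setup. This is a sketch, not the code in server/ai/provider.ts: the names resolveProviderType and EnvLike are hypothetical, and returning mock for an unrecognized AI_PROVIDER value is an assumption.

```typescript
type ProviderType = "gemini" | "openai" | "anthropic" | "mock";
type EnvLike = Record<string, string | undefined>;

const KNOWN_PROVIDERS: ProviderType[] = ["gemini", "openai", "anthropic", "mock"];

function resolveProviderType(env: EnvLike): ProviderType {
  // 1. An explicit AI_PROVIDER wins over key detection.
  const explicit = (env.AI_PROVIDER ?? "").trim().toLowerCase();
  if (explicit !== "") {
    const match = KNOWN_PROVIDERS.find((p) => p === explicit);
    return match ?? "mock"; // assumption: unknown values fall back to mock
  }

  // 2. Otherwise, use the first provider whose key is present and non-blank.
  const hasKey = (name: string): boolean => (env[name] ?? "").trim() !== "";
  if (hasKey("GEMINI_API_KEY")) return "gemini";
  if (hasKey("OPENAI_API_KEY")) return "openai";
  if (hasKey("ANTHROPIC_API_KEY")) return "anthropic";

  // 3. No configuration at all: run in mock mode.
  return "mock";
}

const resolvedEmpty = resolveProviderType({});                               // "mock"
const resolvedOpenAi = resolveProviderType({ OPENAI_API_KEY: "fake-key" });  // "openai"
```

In a Node.js process the function would be called with process.env. Taking the environment as a parameter keeps the logic easy to unit test.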

Running Development Servers

The project utilizes Vite for the frontend and Express for the backend. These can be run concurrently or separately.

Concurrent Mode: pnpm dev or bun dev starts both the Vite UI (port 5173) and the Express backend (port 3001).
Separate Frontend: pnpm dev:frontend runs only the Vite server.
Separate Backend: pnpm dev:backend runs only the Express server.

Sources: CONTRIBUTING.md:36-44, package.json:5-15

Mock Mode Development

For developers without API access, gitSdm provides comprehensive mock data. When the owner of a repository is set to mock, the system fetches predefined structures and contents.

Mock Data Sources

GitHub Data: Located in server/github/mock-data.ts. It provides simulated file trees for gitsdm and a sample todo-app.
AI Responses: Handled via mockFallback functions in AI task files. For example, generateRepoRoast provides specific "roast" text for mock repositories to test UI rendering without hitting an LLM.

Sources: server/github/mock-data.ts:5-50, server/ai/tasks/playground.ts:32-45

Testing and Quality Control

The project uses bun test for its test suite. Test files are co-located with source files to maintain modularity.

# Run all tests once
pnpm test

# Run tests in watch mode for TDD
pnpm test:watch

# Check linting rules
pnpm lint

If modifications are made to the directory structure or file exports, the codebase graph should be updated:

pnpm exec graphify update .

Sources: CONTRIBUTING.md:47-66, package.json:11-14

Summary

Setting up the gitSdm local environment involves configuring a TypeScript-based stack with flexible AI provider support. By utilizing the provided pnpm or bun scripts and configuring the .env file, developers can toggle between live API analysis and a fully functional mock environment for rapid feature development and testing.

Sources: CONTRIBUTING.md:70-80, README.md:120-130

Testing Strategy & Setup

Introduction

The gitSdm testing strategy prioritizes fast feedback and broad coverage of the core repository analysis and AI-driven features. The project uses Bun as its test runner and framework, and co-locates test files with the source code they verify. The suite includes unit tests for parsers, integration tests for GitHub API interactions, and validation of the semantic search utilities.

The suite relies on a mocking layer that simulates GitHub repository structures and AI provider responses. This allows backend logic to be tested reliably without incurring API costs or hitting rate limits during development.

Sources: README.md:164-168, CONTRIBUTING.md:46-55

Testing Infrastructure

The project utilizes the Bun runtime's native testing capabilities, which support TypeScript out of the box. Tests are identified by the .test.ts extension and are generally located alongside the modules they verify.

Execution Commands

The following commands are defined in the project's configuration for managing the test lifecycle:

Command	Description
`bun test`	Executes all test suites once.
`bun run test:coverage`	Runs tests and generates a code coverage report.
`bun run test:watch`	Enters watch mode, re-running tests on file changes.

Sources: README.md:195-204, CONTRIBUTING.md:49-55

Test Suite Distribution

The project maintains over 25 test suites covering critical backend services. Key areas of focus include:

AI Services: provider.test.ts, service.test.ts.
GitHub Integration: client.test.ts, fetch-tree.test.ts, parse-url.test.ts.
Parsing & Analysis: dependency-analyzer.test.ts, file-classifier.test.ts, manifest-parsers/index.test.ts.
Search Engine: chunker.test.ts, qa-engine.test.ts, vector-store.test.ts.

Sources: README.md:206-231

Mocking Strategy & Data Simulation

To facilitate isolated testing, gitSdm implements a dual-layer mocking strategy: module-level mocking using Bun's mock utility and a dedicated mock data provider.

GitHub API Mocking

The server/github/mock-data.ts file provides a synthetic environment for testing repository ingestion. It includes predefined file lists and contents for two primary scenarios: a "gitSdm" mock repo and a "todo-app" mock repo.

graph TD
    subgraph TestExecution["Test Execution (fetch-tree.test.ts)"]
        TC[Test Case] -->|resolveOctokit| M[Mocked Octokit]
        TC -->|isMockRepo| MD[Mock Data Provider]
    end
    
    subgraph MockData["Mock Data Layer (mock-data.ts)"]
        MD -->|returns| INFO[Repo Info]
        MD -->|returns| TREE[Flat Tree Items]
        MD -->|returns| CONT[File Contents]
    end
    
    M -->|Intercepts API| MD

This diagram illustrates how tests intercept GitHub API calls and redirect them to the local mock data provider.

Key verification points in server/github/mock-data.test.ts include:

isMockRepo: Validates that repositories owned by "mock" are correctly identified.
fetchMockFileContents: Ensures the system returns specific content for files like package.json or placeholders for source files.
fetchMockTimeline: Verifies that a synthetic commit history is generated for visualizing activity patterns.

Sources: server/github/mock-data.test.ts:13-75, server/github/mock-data.ts:5-10
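
The first two verification points can be illustrated with a small stand-in for the mock data layer. This is a sketch under stated assumptions: the function names come from the text above, but their signatures, the exact-match owner check, the `MOCK_FILES` table, and the placeholder string are all hypothetical.

```typescript
// Hypothetical stand-in for server/github/mock-data.ts; shapes are assumed.
const MOCK_OWNER = "mock";

// Assumed layout: repo name -> file path -> file contents.
const MOCK_FILES: Record<string, Record<string, string>> = {
  "todo-app": {
    "package.json": JSON.stringify({ name: "todo-app", dependencies: { react: "^18.0.0" } }),
  },
};

function isMockRepo(owner: string): boolean {
  // Assumption: an exact, case-sensitive match on the owner name.
  return owner === MOCK_OWNER;
}

function fetchMockFileContents(repo: string, paths: string[]): Map<string, string> {
  const known = MOCK_FILES[repo] ?? {};
  const contents = new Map<string, string>();
  for (const path of paths) {
    // Known files return real content; everything else gets a placeholder.
    contents.set(path, known[path] ?? `// mock placeholder for ${path}`);
  }
  return contents;
}
```

A test against this sketch would assert that `package.json` parses as JSON, while an arbitrary source path such as `src/index.ts` yields the placeholder text.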

AI Task Fallbacks

AI task handlers (e.g., explain, roast, onboarding) implement a mockFallback function. When an AI provider is set to mock or fails, the system returns structured JSON data from these fallbacks, allowing UI components to be tested with realistic AI output without live API calls.

Sources: server/ai/tasks/playground.ts:31-42, server/ai/tasks/onboarding.ts:74-81
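
The fallback behavior can be sketched as follows. Only the names `generateRepoRoast` and `mockFallback` appear in the source; the `Provider` interface, the `RoastResult` shape, the prompt text, and the function signatures are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real task handlers may differ.
interface RoastResult {
  headline: string;
  burns: string[];
}

interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Deterministic structured output used when no live model is available.
function mockFallback(repo: string): RoastResult {
  return { headline: `${repo}: a mock roast`, burns: ["Placeholder burn for UI testing."] };
}

async function generateRepoRoast(provider: Provider, repo: string): Promise<RoastResult> {
  // The mock provider never reaches the network.
  if (provider.name === "mock") return mockFallback(repo);
  try {
    const raw = await provider.complete(`Roast the repository ${repo} as JSON.`);
    // Note: the parsed value is trusted here; a real handler would validate it.
    return JSON.parse(raw) as RoastResult;
  } catch {
    // Provider errors and malformed JSON both degrade to the fallback.
    return mockFallback(repo);
  }
}
```

Because the fallback returns the same shape as a live response, UI components can render either result without branching.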

Core Validation Modules

Manifest Parsers

Tests in server/parser/manifest-parsers/index.test.ts verify the extraction of dependencies across multiple ecosystems. The parser registry is tested for its ability to route files to the correct specialized parser based on filename or regex patterns.

Ecosystems Covered: npm (package.json), Go (go.mod), Python (requirements.txt, pyproject.toml), Rust (Cargo.toml), and Java (pom.xml).
Resilience: Tests verify that corrupted or invalid manifest files return empty arrays rather than throwing errors.

Sources: server/parser/manifest-parsers/index.test.ts:16-118
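
A registry of this kind might look like the sketch below, which covers only two of the listed ecosystems. The `Dependency` and `ManifestParser` types, the `parseManifest` entry point, and the simplified requirements.txt handling (only `name==version` pins) are assumptions, not the project's actual parsers.

```typescript
// Hypothetical registry; the real parsers handle more ecosystems and syntax.
interface Dependency {
  name: string;
  version: string;
  ecosystem: string;
}

interface ManifestParser {
  matches: (filename: string) => boolean;
  parse: (content: string) => Dependency[];
}

const parsers: ManifestParser[] = [
  {
    matches: (filename) => filename === "package.json",
    parse: (content) => {
      const pkg = JSON.parse(content) as { dependencies?: Record<string, string> };
      return Object.entries(pkg.dependencies ?? {}).map(([name, version]) => ({
        name,
        version,
        ecosystem: "npm",
      }));
    },
  },
  {
    matches: (filename) => filename === "requirements.txt",
    parse: (content) =>
      content
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line !== "" && !line.startsWith("#"))
        .map((line) => {
          // Simplification: only "name==version" pins are recognized.
          const [name = "", version = "*"] = line.split("==");
          return { name, version, ecosystem: "pypi" };
        }),
  },
];

function parseManifest(path: string, content: string): Dependency[] {
  const filename = path.split("/").pop() ?? path;
  const parser = parsers.find((p) => p.matches(filename));
  if (!parser) return [];
  try {
    return parser.parse(content);
  } catch {
    // Corrupted manifests yield an empty list instead of an exception.
    return [];
  }
}
```

Wrapping the parser call in a single try/catch at the registry level means individual parsers can stay simple while the resilience guarantee holds for all of them.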

File Classification Logic

The file classifier is tested to ensure accurate categorization of nodes for the visualization graph.

Category	Examples Verified in Tests
Entry	`src/index.ts`, `main.go`, `src/App.tsx`
Test	`.test.tsx`, `_spec.go`, `__tests__/*`
Config	`tsconfig.json`, `vite.config.ts`, `.env.*`
Doc	`README.md`, `LICENSE`
Asset	`assets/*.png`, `public/favicon.ico`

Sources: server/parser/file-classifier.test.ts:7-38
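
The categories in the table can be approximated with an ordered list of path patterns. The sketch below is illustrative only: the `classifyFile` name, the regular expressions, the rule order (tests checked first), and the `source` default category are assumptions rather than the project's actual rules.

```typescript
// Hypothetical classifier; patterns cover only the examples in the table.
type FileCategory = "entry" | "test" | "config" | "doc" | "asset" | "source";

// Rules are checked in order, so test files win over other matches.
const RULES: ReadonlyArray<[FileCategory, RegExp]> = [
  ["test", /(\.(test|spec)\.[jt]sx?$|_(test|spec)\.go$|(^|\/)__tests__\/)/],
  ["entry", /(^|\/)(src\/index\.[jt]sx?|src\/App\.[jt]sx?|main\.go)$/],
  ["config", /(^|\/)(tsconfig\.json|vite\.config\.[jt]s|\.env(\..+)?)$/],
  ["doc", /(^|\/)(README\.md|LICENSE)$/],
  ["asset", /\.(png|jpe?g|svg|ico)$/],
];

function classifyFile(path: string): FileCategory {
  for (const [category, pattern] of RULES) {
    if (pattern.test(path)) return category;
  }
  // Assumption: anything unmatched is treated as ordinary source code.
  return "source";
}
```

With these rules, `classifyFile("src/App.tsx")` yields `entry` and `classifyFile("src/__tests__/util.ts")` yields `test`.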

Search & Indexing Constants

The server/search/constants.test.ts file validates the foundational configuration for the semantic search engine, including:

Language Mapping: Ensures file extensions (like .tsx or .py) map to correct highlight/AST languages.
Cache Determinism: Verifies that searchCacheKey and indexCacheKey generate predictable strings for Vercel/LRU caching layers.
Thresholds: Confirms DEFAULT_MIN_SCORE and MAX_CHUNK_TOKENS are within valid operational ranges.

Sources: server/search/constants.test.ts:46-130
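
Cache determinism means that equivalent inputs always produce the same key string. The sketch below shows one way to achieve that. The function names match those mentioned above, but the parameters, the `:`-separated layout, and the query normalization (trimming, lowercasing, collapsing whitespace) are assumptions.

```typescript
// Hypothetical signatures; the real key functions may take different inputs.
function indexCacheKey(owner: string, repo: string, sha: string): string {
  // Owner and repo are lowercased so "Mock/Todo-App" and "mock/todo-app" share a key.
  return ["index", owner.toLowerCase(), repo.toLowerCase(), sha].join(":");
}

function searchCacheKey(owner: string, repo: string, sha: string, query: string): string {
  // Assumption: queries differing only in case or spacing should hit the same entry.
  const normalizedQuery = query.trim().toLowerCase().replace(/\s+/g, " ");
  return ["search", owner.toLowerCase(), repo.toLowerCase(), sha, normalizedQuery].join(":");
}
```

Including the commit SHA in the key lets cached results expire naturally when the repository changes, without an explicit invalidation step.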

Integration Testing Flow

Integration tests for repository analysis simulate the full lifecycle from URL parsing to tree building.

sequenceDiagram
    participant T as "fetch-tree.test.ts"
    participant S as "fetch-tree.ts"
    participant M as "mock-data.ts"
    
    T->>S: fetchRepoInfo("mock", "repo")
    S->>M: isMockRepo("mock")
    M-->>S: true
    S->>M: fetchMockRepoInfo(...)
    M-->>S: Mock Repo Data (sha, stars, etc.)
    S-->>T: RepoInfo Object

This sequence shows the redirection logic used during integration tests to avoid external network dependencies.

Sources: server/github/fetch-tree.test.ts:106-115
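
The redirection in the sequence diagram can be sketched with an injected live fetcher, which makes it easy to assert that the network path is never taken for mock repositories. The function names follow the diagram, but the `RepoInfo` fields beyond `sha` and `stars`, the `LiveFetcher` parameter, the placeholder values, and the owner value "mock" are assumptions.

```typescript
// Hypothetical shapes; the real fetch-tree.ts resolves an Octokit client instead.
interface RepoInfo {
  owner: string;
  repo: string;
  sha: string;
  stars: number;
}

type LiveFetcher = (owner: string, repo: string) => Promise<RepoInfo>;

function isMockRepo(owner: string): boolean {
  return owner === "mock";
}

function fetchMockRepoInfo(owner: string, repo: string): RepoInfo {
  // Fixed, made-up values so tests are deterministic.
  return { owner, repo, sha: "mock-sha", stars: 0 };
}

async function fetchRepoInfo(owner: string, repo: string, live: LiveFetcher): Promise<RepoInfo> {
  // Mock repositories short-circuit before any network call is made.
  if (isMockRepo(owner)) return fetchMockRepoInfo(owner, repo);
  return live(owner, repo);
}
```

In a test, `live` can be a counting stub: after `fetchRepoInfo("mock", "todo-app", live)` resolves, the stub's call count should still be zero.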

Conclusion

The gitSdm testing strategy is optimized for a serverless, AI-integrated environment. By combining strict unit testing of parsers with a comprehensive mocking layer for GitHub and AI providers, the project ensures that architectural insights and visualization components remain reliable. The use of Bun as a unified runner facilitates a fast, co-located testing workflow that supports continuous integration and development.