Home
Version: 1
Relevant source files
The following files were used as context for generating this wiki page:
gitSdm (Git Software Dependency Map) is an AI-powered repository intelligence platform designed to provide instant, interactive architecture visualizations of GitHub codebases. It transforms the traditional method of manually tracing dependencies and reading directory structures into a graph-first experience, allowing developers to understand unfamiliar codebases in seconds rather than days. Sources: README.md:1-25, server/ai/prompts.ts:47-50
The system utilizes a sophisticated pipeline that ingests a GitHub URL, parses manifest files (such as package.json), resolves imports, and generates an interactive dependency map using React Flow and Dagre layout engines. This visualization is further enriched by AI-driven insights, including architectural summaries, code explanations in ELI5 (Explain It Like I'm 5) mode, and health audits. Sources: README.md:139-160, src/components/home/HowItWorks.tsx:3-12
The project is structured as a modular full-stack application with a clear separation between the Express-based backend services and the React-based frontend visualization layer. Sources: README.md:33-72
| Directory | Purpose |
|---|---|
| `api/` | Vercel serverless functions for deployment. |
| `server/` | Core backend logic, including AI handlers, graph builders, and GitHub clients. |
| `src/` | Frontend application containing UI components, visualization stores, and hooks. |
| `public/` | Static assets and background workers for layout calculations. |
Sources: README.md:33-72
The following diagram illustrates the data flow from initial user input to the final interactive visualization.
```mermaid
graph TD
    A[GitHub URL Input] --> B[Fetch Tree & Metadata]
    B --> C[Parse Manifests]
    C --> D[Resolve Imports]
    D --> E[Build Dependency Graph]
    E --> F[Generate Visual Map]
    F --> G[Enrich with AI Insights]
    style A fill:#161b22,stroke:#30363d
    style G fill:#238636,stroke:#30363d
```
The pipeline processes repository data in six distinct stages to ensure accurate mapping and deep contextual understanding. Sources: src/components/home/HowItWorks.tsx:3-12, README.md:209-216
Before setting up the environment, ensure the following tools are installed:
- Bun >= 1.1 (Recommended for runtime and package management)
- Node.js >= 22 (Alternative backend support)
- pnpm >= 9 (Recommended if not using Bun)
- GitHub Personal Access Token (Optional, but recommended to increase API rate limits)
Sources: README.md:75-81, CONTRIBUTING.md:12-14
- Clone the Repository:

  ```bash
  git clone https://github.com/mbayue/gitSdm.git
  cd gitSdm
  ```

- Install Dependencies:

  ```bash
  bun install
  # OR
  pnpm install
  ```

- Environment Setup: Copy the example environment file and configure your keys:

  ```bash
  cp .env.example .env
  ```
Sources: README.md:83-96, CONTRIBUTING.md:17-26
The platform supports multiple AI providers. Configuration is handled via the .env file.
| Variable | Description | Default / Options |
|---|---|---|
| `GITHUB_TOKEN` | GitHub API access token | Optional (increases rate limits) |
| `AI_PROVIDER` | Active AI service | `mock`, `gemini`, `openai`, `anthropic` |
| `GEMINI_API_KEY` | Key for Google Gemini | Required if provider is `gemini` |
| `OPENAI_API_KEY` | Key for OpenAI | Required if provider is `openai` |
| `ANTHROPIC_API_KEY` | Key for Anthropic Claude | Required if provider is `anthropic` |
Sources: README.md:98-110, server/ai/provider.ts:24-58
gitSdm utilizes a concurrent development setup for both the Express backend and the Vite frontend. Sources: CONTRIBUTING.md:36-47
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Bun as Bun/pnpm Run Dev
    participant BE as Express Backend (Port 3001)
    participant FE as Vite Frontend (Port 5173)
    Dev->>Bun: Execute 'pnpm dev'
    Bun->>BE: Start Backend Service
    Bun->>FE: Start Frontend UI
    FE-->>Dev: Accessible at localhost:5173
    BE-->>FE: API Endpoints available
```
Sources: CONTRIBUTING.md:36-47, README.md:112-115
| Command | Action |
|---|---|
| `bun dev` | Starts frontend and backend concurrently. |
| `bun run build` | Generates a production-ready build in the `dist/` directory. |
| `bun test` | Runs the full test suite (25 test suites). |
| `pnpm exec graphify update .` | Updates the interactive directory-topology mapping. |
Sources: README.md:112-124, CONTRIBUTING.md:65-70
The AIProvider system is designed to be pluggable. It automatically detects the provider based on environment variables or explicit overrides. Sources: server/ai/provider.ts:24-58
- Explicit Precedence: If `AI_PROVIDER` is set in the environment, it overrides all other detection.
- Key-based Auto-detection: The system scans for `GEMINI_API_KEY`, `OPENAI_API_KEY`, or `ANTHROPIC_API_KEY`, in that order.
- Mock Fallback: If no keys are provided, the system defaults to a `mock` provider, which returns predefined architectural summaries and "roasts" for demonstration purposes.
Sources: server/ai/provider.ts:39-65, server/ai/provider.ts:168-240
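The three rules above can be sketched as a small pure function. This is a minimal illustration, not the code in server/ai/provider.ts: the name `resolveProviderName` is hypothetical, the environment is passed in as a plain object, and ignoring an unrecognized `AI_PROVIDER` value is an assumption.

```typescript
type ProviderName = "mock" | "gemini" | "openai" | "anthropic";

const KNOWN_PROVIDERS: ProviderName[] = ["mock", "gemini", "openai", "anthropic"];

// Hypothetical helper; the real detection logic may differ in detail.
function resolveProviderName(env: Record<string, string | undefined>): ProviderName {
  // 1. An explicit AI_PROVIDER value wins over any key-based detection.
  const explicit = (env["AI_PROVIDER"] ?? "").trim().toLowerCase();
  for (const name of KNOWN_PROVIDERS) {
    if (name === explicit) return name;
  }
  // (Assumption: an unrecognized AI_PROVIDER value is ignored.)

  // 2. Otherwise scan the API keys in the documented order.
  if (env["GEMINI_API_KEY"]) return "gemini";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";

  // 3. No keys at all: fall back to the mock provider.
  return "mock";
}
```

Keeping the resolution free of side effects makes the precedence rules easy to unit test without touching the real process environment.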
The system uses a SYSTEM_PROMPT that instructs the AI to act as a "principal software architect." It focuses on four core principles:
- Specificity: Referencing real file names and structures.
- Senior Engineering Perspective: Identifying architectural tradeoffs.
- Developer Empathy: Addressing the specific needs of someone onboarding.
- Technical Language: Using terms like "request lifecycle" and "dependency injection."
Sources: server/ai/prompts.ts:47-75
The platform uses @xyflow/react (React Flow) and d3-force to render force-directed graphs. Files are classified by type (e.g., component, utility, config) and visually differentiated. Sources: README.md:139-146, src/components/viz/OverviewTab.tsx:180-210
Users can request natural language explanations of the codebase. The explainRepoELI5 task provides a "5-minute tour" of the project, covering the "Big Picture" and "Key Areas to Know" using friendly, conversational language. Sources: server/ai/tasks/onboarding.ts:116-146, server/ai/tasks/explain.ts:85-110
The AI performs rigorous health assessments across five dimensions:
- Maintainability
- Modularity
- Readability
- Architecture
- Complexity (Inverse complexity)
Sources: server/ai/tasks/refactor.ts:80-110, src/components/viz/ai-sidebar/AiCenterTab.tsx:140-165
In summary, gitSdm serves as an intelligence layer for GitHub repositories, combining static analysis with LLM reasoning to bridge the gap between code and documentation. Sources: README.md:1-25, server/ai/prompts.ts:47-52
Repository Intelligence & Analytics in gitSdm comprises a suite of tools designed to provide developers with instant, deep insights into codebase architecture, health, and evolution. By combining metadata analysis with AI-powered diagnostics, the system transforms raw repository data into actionable intelligence, including dependency mapping, risk assessment, and contributor activity tracking.
The system operates by ingesting repository structures via the GitHub API and processing this data through a pipeline of parsers, graph builders, and Large Language Models (LLMs). This allows for features ranging from high-level architecture overviews to granular file-level explanations and automated refactoring suggestions. Sources: README.md:12-23, server/ai/prompts.ts:37-56
The intelligence layer utilizes a "Principal Software Architect" persona to evaluate codebases. It constructs a rich context from the repository's metadata, root-level structure, important entry files, and dependencies to provide senior-level architectural reviews.
The system builds a comprehensive context string that includes:
- Metadata: Name, description, primary language, stars, forks, and license.
- Structural Data: Root-level directory listing and a flat list of up to 120 detected files for deep context.
- Dependency Map: A list of detected dependencies including ecosystem and type.
- Activity Metrics: Recent commit history and top contributors.
Sources: server/ai/prompts.ts:3-35, server/ai/prompts.ts:37-41
```mermaid
flowchart TD
    A[RepoAnalysis Object] --> B{Context Builder}
    B --> C[Meta: Stats & License]
    B --> D[Structure: Top Dirs & Files]
    B --> E[Deps: Ecosystem & Versions]
    B --> F[Timeline: Recent Activity]
    C & D & E & F --> G[Markdown Context Prompt]
    G --> H[AI Provider Tasks]
```
The diagram shows how the RepoAnalysis data is aggregated into a structured prompt for AI tasks.
Sources: server/ai/prompts.ts:3-35
gitSdm performs rigorous health assessments and identifies architectural risks through dedicated AI tasks. These assessments are visualized in the AiCenterTab and OverviewTab components.
The health report scores five key dimensions on a scale of 0-100:
| Metric | Description | Source Criteria |
|---|---|---|
| Maintainability | Ease of changing the codebase | Module size, separation of concerns |
| Modularity | Decomposition quality | Directory structure, coupling indicators |
| Readability | Ease of understanding | Naming conventions, organization |
| Architecture | Overall design soundness | Layer separation, entry points |
| Complexity | Inverse complexity score | File count, nesting depth, deps |
Sources: server/ai/tasks/refactor.ts:85-98, src/components/viz/ai-sidebar/AiCenterTab.tsx:32-36
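Because the scores come from an LLM, a consumer may want to normalize them before rendering. The sketch below shows one way to force each of the five dimensions into the 0-100 range. `normalizeHealthScores` is a hypothetical helper, and treating missing or non-numeric values as 0 is an assumption rather than documented behavior.

```typescript
const HEALTH_DIMENSIONS = [
  "maintainability",
  "modularity",
  "readability",
  "architecture",
  "complexity",
] as const;

type HealthDimension = (typeof HEALTH_DIMENSIONS)[number];
type HealthScores = Record<HealthDimension, number>;

// Hypothetical helper: clamp every dimension to an integer in 0..100.
function normalizeHealthScores(raw: Record<string, unknown>): HealthScores {
  const scores: HealthScores = {
    maintainability: 0,
    modularity: 0,
    readability: 0,
    architecture: 0,
    complexity: 0,
  };
  for (const dimension of HEALTH_DIMENSIONS) {
    const value = raw[dimension];
    // Assumption: anything that is not a finite number counts as 0.
    const numeric = typeof value === "number" && isFinite(value) ? value : 0;
    scores[dimension] = Math.min(100, Math.max(0, Math.round(numeric)));
  }
  return scores;
}
```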
The system identifies impactful refactoring opportunities, categorizing them by risk level (High, Medium, Low) and domain (e.g., Performance, DRY, Coupling). Sources: server/ai/tasks/refactor.ts:18-28, src/components/viz/ai-sidebar/AiCenterTab.tsx:64-70
The frontend provides real-time analytics dashboards that visualize the statistical properties of the repository.
The OverviewTab calculates and displays core repository statistics:
- Node Counts: Total files and folders.
- Dependency Count: External packages detected in manifests.
- High Coupling: Identifies the top 5 files or folders with the highest degree of incoming/outgoing edges in the graph.
- Entry Points: Surfaces files classified as 'entry' by the parser (e.g., `src/main.tsx`).
Sources: src/components/viz/OverviewTab.tsx:43-61, server/github/mock-data.ts:25-60
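The "High Coupling" statistic amounts to ranking nodes by degree (incoming plus outgoing edges). The following is a minimal sketch of that idea. The `GraphEdgeSketch` shape, the `topCoupledNodes` name, and the alphabetical tie-break are assumptions for illustration, not the OverviewTab implementation.

```typescript
interface GraphEdgeSketch {
  source: string;
  target: string;
}

interface CoupledNode {
  id: string;
  degree: number;
}

// Hypothetical helper: rank nodes by total degree and keep the top `limit`.
function topCoupledNodes(edges: GraphEdgeSketch[], limit = 5): CoupledNode[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    // Every edge contributes one to both of its endpoints.
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }
  return Array.from(degree, ([id, count]) => ({ id, degree: count }))
    // Highest degree first; ties are broken alphabetically (an assumption).
    .sort((a, b) => b.degree - a.degree || (a.id < b.id ? -1 : 1))
    .slice(0, limit);
}
```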
The system visualizes commit density over a 24-week period using a bar chart format. This data is derived from the repository timeline and displays the relative intensity of development activity. Sources: src/components/viz/OverviewTab.tsx:280-302, server/github/mock-data.ts:266-285
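"Relative intensity" can be read as each week's commit count divided by the busiest week in the window. Here is a minimal sketch under that reading. `toBarHeights` is a hypothetical name, the `week` and `count` fields follow the timeline shape described on this page, and the input is assumed to be ordered oldest first.

```typescript
interface TimelineWeekSketch {
  week: string; // ISO date of the week start
  count: number; // commits in that week
}

// Hypothetical helper: scale the most recent `windowSize` weeks to 0..1.
function toBarHeights(weeks: TimelineWeekSketch[], windowSize = 24): number[] {
  const recent = weeks.slice(-windowSize);
  const busiest = Math.max(0, ...recent.map((w) => w.count));
  // With no commits at all, every bar has height 0 (avoids dividing by zero).
  return recent.map((w) => (busiest === 0 ? 0 : w.count / busiest));
}
```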
```mermaid
sequenceDiagram
    participant UI as "OverviewTab"
    participant Store as "VizStore"
    participant RF as "ReactFlow"
    UI->>UI: Calculate Degree Centrality
    UI->>UI: Filter High Coupling Nodes
    UI->>Store: Set Selected Node ID
    UI->>RF: setCenter(x, y, zoom)
    Note over RF: Smooth transition to node
```
This sequence illustrates the interaction between the analytics UI and the visualization engine when a developer interacts with a "High Coupling" node. Sources: src/components/viz/OverviewTab.tsx:24-41
The analysis is based on the RepoAnalysis type, which encapsulates the entire state of the repository's intelligence data.
```typescript
// server/ai/prompts.ts:3
import type { RepoAnalysis } from '../../src/types';

// server/github/mock-data.ts:266-271
export async function fetchMockTimeline(): Promise<TimelineWeek[]> {
  // ...
  // week: string (ISO date)
  // count: number (commit count)
  // commits: Commit[]
}
```
Sources: server/ai/prompts.ts:3, server/github/mock-data.ts:266-271
The executeAiTask for refactoring uses the SYSTEM_PROMPT to enforce senior-level engineering standards, ensuring that suggestions are grounded in real file paths and structural decisions rather than generic patterns.
Sources: server/ai/tasks/refactor.ts:16-33, server/ai/prompts.ts:37-56
Repository Intelligence & Analytics serves as the cognitive engine of gitSdm, providing a transition from simple file browsing to deep architectural understanding through automated analysis and interactive visualization.
gitSdm is a graph-first repository analysis platform designed to transform raw GitHub source code into interactive architectural visualizations. The system employs a decoupled architecture consisting of a React-based frontend for visualization and a Node.js backend for ingestion, dependency parsing, and AI-driven intelligence. The core purpose is to provide instant insights into module boundaries, dependency flows, and architectural health that would otherwise require days of manual code review.
Sources: README.md:15-18, README.md:105-110
The project follows a pipeline-oriented architecture. Data flows from a user-provided GitHub URL through a series of server-side analysis stages, eventually being rendered as a force-directed graph on a React Flow canvas.
The analysis process is divided into six distinct stages:
- GitHub URL Ingestion: Users provide a public repository URL.
- Tree Fetching: The system retrieves the file structure and metadata via the GitHub API.
- Manifest Parsing: Workspaces and dependencies are identified from files like `package.json`.
- Import Resolution: Connections are traced across the codebase.
- Graph Construction: An interactive dependency map is generated using layout engines.
- AI Enrichment: Intelligence layers generate summaries, onboarding steps, and refactoring risks.
Sources: src/components/home/HowItWorks.tsx:3-10, README.md:143-150
The following diagram illustrates the relationship between the browser, the API router, and the underlying services.
```mermaid
graph TD
    User[Developer Browser] -->|Requests| Router[Vite/Express API Router]
    Router -->|Parses GitHub| GitHubService[GitHub Tree Fetcher]
    Router -->|Orchestrates AI| AIService[AI Provider Manager]
    GitHubService -->|Manifest Contents| DepParser[Dependency Analyzer]
    GitHubService -->|File Tree| GraphBuilder[Graph Builder Engine]
    GraphBuilder -->|Positions Nodes| Layout[Dagre Layout Engine]
    Layout -->|Graph Data| UI[React Flow Canvas]
    AIService -->|Markdown/JSON| UI
    class User entry;
    class Router router;
    class AIService service;
    class Layout util;
```
Sources: server/ai/provider.ts:167-177, server/ai/tasks/diagram.ts:60-61
The backend is structured as a modular Node.js application, often deployed as Vercel serverless functions or as a standalone Express server. It handles heavy-duty tasks such as GitHub API communication, dependency resolution, and AI task orchestration.
Sources: README.md:58-69, CONTRIBUTING.md:29-30
The AI system is provider-agnostic, supporting Google Gemini, OpenAI, and Anthropic. It uses a factory pattern to instantiate providers based on environment variables or user-provided keys.
| Component | Responsibility | Relevant Files |
|---|---|---|
| AI Provider | Normalizes requests to different LLM SDKs (Gemini, OpenAI, Claude). | server/ai/provider.ts |
| Task Handlers | Specific logic for explaining code, refactoring, or generating diagrams. | server/ai/tasks/ |
| Prompt Builder | Constructs context-rich prompts using repository metadata and file trees. | server/ai/prompts.ts |
| API Router | Exposes AI capabilities via REST endpoints. | server/router/ai-routes.ts |
Sources: server/ai/provider.ts:41-65, server/router/ai-routes.ts:25-132
When a user requests an architectural summary or "ELI5" explanation, the following sequence occurs:
```mermaid
sequenceDiagram
    participant UI as "Frontend UI"
    participant API as "AI Router"
    participant Task as "Task Handler"
    participant Prov as "AI Provider"
    UI->>API: POST /api/ai/explain
    API->>Task: explainRepo(params)
    Task->>Task: Build context from analysis
    Task->>Prov: complete(messages)
    Prov-->>Task: LLM String/JSON Response
    Task-->>API: Result + Cache Status
    API-->>UI: AIResponse Object
```
Sources: server/router/ai-routes.ts:33-41, server/ai/tasks/explain.ts:22-40
The frontend is a React 19 Single Page Application (SPA) powered by Vite. It focuses on rendering the complex graph data and providing interactive tools for exploration.
- Graph Canvas: Uses `@xyflow/react` (React Flow) and `d3-force` for rendering the dependency graph.
- State Management: `Zustand` is used for global state, such as `vizStore`, which tracks selected nodes and UI panel visibility.
- Architecture View: An interactive block diagram component that allows users to toggle between a static "Code Graph" (AST-based) and an "AI Enhanced" view (LLM-based).
Sources: README.md:120-125, src/components/viz/ArchitectureView.tsx:32-47, README.md:78-80
The application layout is organized into functional zones:
- Explorer: A dock for file inspection and code viewing.
- Viz Sidebar: Contains AI chat tabs, health audits, and learning paths.
- Top Nav: Handles repository searching, branch switching, and statistics.
Sources: README.md:73-82, src/components/viz/OverviewTab.tsx:145-165
The system relies on a central RepoAnalysis data structure that encapsulates the repository's metadata, file tree, dependencies, and graph nodes.
| Field | Type | Description |
|---|---|---|
| `meta` | Object | GitHub metadata (stars, forks, owner, repo name). |
| `tree` | Array | Hierarchical file structure. |
| `graph` | Object | Nodes and edges representing file relationships. |
| `dependencies` | Array | List of packages found in manifests. |
| `timeline` | Array | Commit activity patterns over time. |
Sources: server/ai/prompts.ts:3-15, src/components/viz/OverviewTab.tsx:40-44
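As a rough illustration of how these fields fit together, the sketch below declares a simplified type and derives a few headline numbers from it. Only the five top-level field names come from the table. Every nested shape, the flat `tree` representation, and the `summarizeAnalysis` helper are assumptions and do not mirror the real `RepoAnalysis` definition in src/types.

```typescript
// Simplified stand-in for RepoAnalysis; nested shapes are assumptions.
interface RepoAnalysisSketch {
  meta: { owner: string; repo: string; stars: number; forks: number };
  tree: { path: string; type: "file" | "folder" }[];
  graph: {
    nodes: { id: string }[];
    edges: { source: string; target: string }[];
  };
  dependencies: { name: string; version: string }[];
  timeline: { week: string; count: number }[];
}

// Hypothetical helper: compute summary statistics from the structure.
function summarizeAnalysis(analysis: RepoAnalysisSketch) {
  return {
    fullName: `${analysis.meta.owner}/${analysis.meta.repo}`,
    fileCount: analysis.tree.filter((entry) => entry.type === "file").length,
    edgeCount: analysis.graph.edges.length,
    dependencyCount: analysis.dependencies.length,
    totalCommits: analysis.timeline.reduce((sum, week) => sum + week.count, 0),
  };
}
```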
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/ai/explain` | POST | Analyzes a specific file, node, or the whole repo. |
| `/api/ai/architecture` | POST | Returns a JSON-structured layer analysis. |
| `/api/ai/health` | POST | Generates scores for maintainability and modularity. |
| `/api/ai/mermaid` | POST | Generates a Mermaid.js flowchart string. |
| `/api/ai/roast` | POST | Generates a humorous technical critique of the repo. |
Sources: server/router/ai-routes.ts:33-132
The high-level architecture of gitSdm emphasizes a separation between data ingestion and visual presentation. By leveraging a robust analysis pipeline on the backend and a flexible React Flow interface on the frontend, the platform provides a comprehensive environment for codebase intelligence. The integration of a provider-agnostic AI layer ensures that the architectural insights stay relevant across different project types and scales.
Sources: README.md:20-25, server/ai/provider.ts:241-250
The backend API routing system in gitSdm is designed as a modular middleware architecture that processes incoming HTTP requests, validates payloads, and orchestrates interactions between the GitHub API and various AI providers. The system acts as the primary bridge between the React frontend and the backend service layer, handling tasks ranging from repository analysis to AI-driven code explanations.
Routing is centralized in server/api-router.ts, which delegates specific domain logic to sub-routers such as server/router/ai-routes.ts, repo-routes.ts, and search-routes.ts. This separation ensures that the request lifecycle—including authentication, rate limiting, and caching—is managed consistently across the platform.
Sources: README.md:31-45, server/router/ai-routes.ts:31-36
The routing layer follows a functional pattern where handlers receive the request context, including environment-specific tokens and configuration. The system differentiates between standard repository management and specialized AI tasks.
When a request hits an endpoint, it undergoes the following lifecycle:
- Path Identification: The router matches the `pathname` against predefined API strings.
- Schema Validation: Request bodies are parsed and validated using Zod schemas (e.g., `aiExplainSchema`, `repoQuerySchema`) to ensure type safety before execution.
- Context Injection: Handlers receive a `RequestContext` and optional GitHub or AI provider tokens provided by the user or environment.
- Task Delegation: The router calls specialized service functions (e.g., `explainRepo`, `generateOnboarding`) which contain the business logic.
Sources: server/router/ai-routes.ts:38-45, server/router/ai-routes.ts:133-138
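The lifecycle above can be sketched without any framework. To stay self-contained, this example swaps the Zod schemas for a hand-written validator and returns a plain result object instead of throwing. The names `dispatchSketch`, `validateExplainBody`, and `explainTaskStub`, and the body fields `owner`, `repo`, and `eli5`, are hypothetical. Context injection is left out.

```typescript
interface ExplainBody {
  owner: string;
  repo: string;
  eli5: boolean;
}

interface RouteResult {
  status: number;
  body: unknown;
}

// Hand-written stand-in for a Zod schema; returns null when the body is invalid.
function validateExplainBody(input: unknown): ExplainBody | null {
  if (typeof input !== "object" || input === null) return null;
  const record = input as Record<string, unknown>;
  const owner = record["owner"];
  const repo = record["repo"];
  if (typeof owner !== "string" || owner === "") return null;
  if (typeof repo !== "string" || repo === "") return null;
  return { owner, repo, eli5: record["eli5"] === true };
}

// Placeholder for a task such as explainRepo; the real task calls an AI provider.
function explainTaskStub(params: ExplainBody): string {
  const mode = params.eli5 ? "ELI5" : "standard";
  return `Explanation of ${params.owner}/${params.repo} (${mode})`;
}

function dispatchSketch(pathname: string, rawBody: unknown): RouteResult {
  // 1. Path identification: match the pathname against a known API string.
  if (pathname !== "/api/ai/explain") {
    return { status: 404, body: { code: "NOT_FOUND" } };
  }
  // 2. Schema validation: reject malformed bodies before any work is done.
  const params = validateExplainBody(rawBody);
  if (params === null) {
    return { status: 400, body: { code: "VALIDATION_ERROR" } };
  }
  // 3. Task delegation: hand the validated parameters to the task function.
  return { status: 200, body: { explanation: explainTaskStub(params), cached: false } };
}
```

Validating before delegating means the task functions can assume well-typed input and never need to re-check the request body.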
The following diagram illustrates the sequence of operations from the moment the frontend initiates an API call to the final JSON response.
```mermaid
sequenceDiagram
    participant FE as Frontend (React)
    participant R as AI Router (handleAiRoutes)
    participant V as Zod Validator
    participant S as AI Task Service
    participant P as AI Provider (LLM)
    FE->>R: POST /api/ai/explain
    Note right of R: Extracts userKey & GitHubToken
    R->>V: safeParse(req.json())
    alt Invalid Input
        V-->>R: Parse Error
        R-->>FE: 400 Bad Request
    else Valid Input
        V-->>R: Validated Data
        R->>S: explainRepo(params)
        S->>P: complete(prompt)
        P-->>S: Markdown/JSON Response
        S-->>R: Result + Cache Status
        R-->>FE: 200 OK (JSON)
    end
```
Sources: server/router/ai-routes.ts:38-51, server/ai/provider.ts:46-65
The handleAiRoutes function in server/router/ai-routes.ts is the primary entry point for all intelligence-related features. It manages a wide array of endpoints that interact with large language models to provide repository insights.
| Endpoint | Function | Description |
|---|---|---|
| `/api/ai/explain` | `explainRepo` | Provides a detailed overview of a repo, file, or node. Supports "ELI5" mode. |
| `/api/ai/architecture` | `explainArchitecture` | Identifies 4-7 architectural layers (Presentation, API, Core, etc.). |
| `/api/ai/onboarding` | `generateOnboarding` | Creates a 6-step walkthrough for new developers joining a project. |
| `/api/ai/learning-path` | `generateLearningPath` | Generates a mental model and recommended file reading order. |
| `/api/ai/mermaid` | `generateMermaidDiagram` | Produces Mermaid.js code for system flow visualization. |
| `/api/ai/roast` | `generateRepoRoast` | Generates a sarcastic, witty critique of the codebase. |
Sources: server/router/ai-routes.ts:40-145, server/ai/tasks/playground.ts:12-45, server/ai/tasks/explain.ts:10-30
The routing system is provider-agnostic. Depending on the AI_PROVIDER environment variable or a user-supplied userKey, the router communicates with the AIProvider interface. This allows the backend to switch between Google Gemini, OpenAI, Anthropic, or a local mock provider for testing without changing the route definitions.
Sources: server/ai/provider.ts:25-44, server/router/ai-routes.ts:33-35
The router uses a custom AppError class to handle validation failures and service-level exceptions. For example, if required parameters like owner or repo are missing from a query, the router throws a 400 status with an INVALID_PARAMS code.
```typescript
// server/router/ai-routes.ts:41-45
if (pathname === '/api/ai/explain') {
  const body = await req.json().catch(() => ({}));
  const parsed = aiExplainSchema.safeParse(body);
  if (!parsed.success) {
    throw new AppError(400, 'Invalid request', 'VALIDATION_ERROR', false, parsed.error.flatten());
  }
  // ...
}
```
Sources: server/router/ai-routes.ts:41-45, server/router/ai-routes.ts:54-57
To support development without active API keys, the routing logic integrates with a mock system. When the mock provider is active, routes return pre-defined responses for known repositories (like gitsdm or mock-todo-app) instead of querying an LLM.
Sources: server/ai/provider.ts:129-150, server/github/mock-data.ts:20-50
Backend API Routing in gitSdm serves as a structured gateway that translates high-level frontend requests into complex AI and GitHub operations. By utilizing modular routers and schema-based validation, the system maintains a high degree of reliability while remaining flexible enough to support multiple AI backends and diverse repository analysis tasks.
Sources: README.md:120-130, server/router/ai-routes.ts:31-40
The Interactive Graph Visualization system in gitSdm is a core feature designed to transform flat repository structures into navigable, multi-dimensional maps. It enables developers to visualize file relationships, dependency chains, and module boundaries through a combination of force-directed physics and hierarchical layout algorithms.
The visualization engine utilizes @xyflow/react (React Flow) for the interactive canvas and d3-force for dynamic node positioning. This allows for real-time filtering, "Blast Radius" impact analysis, and deep-dive inspection of individual code modules. Sources: README.md:144-150, README.md:19-25
The visualization pipeline begins with repository analysis and ends with an interactive React component. Data flows from the server-side analysis (parsing imports and manifests) into a graph structure consisting of nodes (files, folders, repositories) and edges (imports, dependencies).
```mermaid
graph TD
    subgraph Server_Logic ["Server-Side Analysis"]
        A[GitHub Repository] --> B[AST Parser]
        B --> C[Dependency Analyzer]
        C --> D[Graph Builder]
    end
    subgraph Layout_Engine ["Layout & Orchestration"]
        D --> E[Dagre Layout Engine]
        D --> F[D3-Force Engine]
    end
    subgraph Frontend_UI ["Visualization Layer"]
        E --> G[React Flow Canvas]
        F --> G
        G --> H[Interactive Controls]
        G --> I[Inspector / AI Sidebar]
    end
    class A entry;
    class D service;
    class G router;
```
This diagram illustrates the progression from raw GitHub data to the final interactive UI components used for codebase exploration. Sources: README.md:168-185, server/ai/tasks/diagram.ts:45-55
The GraphCanvas component serves as the host for the entire visualization workspace. It manages state for filtering, zooming, and export functionality. It integrates the NetworkCanvas (utilizing react-force-graph-2d) and provides a floating UI for legend and control panels.
Sources: src/features/graph/canvas/GraphCanvas.tsx:32-60
The system supports multiple layout strategies to represent different architectural perspectives:
- Force-Directed: Utilizes `d3-force` for a dynamic, organic clustering of nodes based on connectivity. Sources: README.md:144
- Hierarchical (Dagre): Used for structured layouts such as Top-to-Bottom (TB) or Left-to-Right (LR). This is particularly useful for visualizing execution flows and tree structures. Sources: server/graph/layout.test.ts:7-35
Separate from the main interactive graph, the ArchitectureView provides high-level block diagrams. It can generate "Code Graphs" based on static analysis or "AI Enhanced" diagrams that group components into logical subgraphs like "Services", "Controllers", and "Database".
Sources: src/components/viz/ArchitectureView.tsx:43-65, server/ai/tasks/diagram.ts:15-35
| Feature | Implementation Detail | Source |
|---|---|---|
| Node Focusing | Centers the view on a specific file or folder with a transition duration of 480ms and 1.3x zoom. | src/components/viz/OverviewTab.tsx:23-45 |
| Filtering | Users can toggle node types (file, folder), diff status, and content filters to prune the graph. | src/features/graph/canvas/GraphCanvas.tsx:47-65 |
| Blast Radius | Visualizer showing transitive dependents to predict how changes to one file affect others. | README.md:200, src/features/graph/canvas/GraphCanvas.tsx:55 |
| Exporting | Supports high-resolution exports to PNG, SVG, and PDF. | src/features/graph/canvas/GraphCanvas.tsx:112-118, src/components/viz/ArchitectureView.tsx:102-120 |
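The "Blast Radius" idea, collecting every file that depends on a changed file directly or transitively, can be sketched as a breadth-first search over reversed import edges. This is an illustrative sketch only. `blastRadius` and `ImportEdgeSketch` are hypothetical names, and an edge is assumed to mean "`source` imports `target`".

```typescript
interface ImportEdgeSketch {
  source: string; // the importing file
  target: string; // the imported file
}

// Hypothetical helper: list all transitive dependents of `changed`, sorted by name.
function blastRadius(edges: ImportEdgeSketch[], changed: string): string[] {
  // Reverse adjacency: for each file, the files that import it.
  const dependents = new Map<string, string[]>();
  for (const edge of edges) {
    const list = dependents.get(edge.target) ?? [];
    list.push(edge.source);
    dependents.set(edge.target, list);
  }

  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  const affected: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      // The seen set keeps import cycles from looping forever.
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
        affected.push(dependent);
      }
    }
  }
  return affected.sort();
}
```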
Nodes are visually classified to provide immediate context regarding their role in the codebase.
- Color Coding: Assigned based on file extension or node type (e.g., `#a78bfa` for repositories, `#fbbf24` for folders, and `#3b82f6` for TypeScript files).
- Sizing: Node radii vary by type (Repo: 14, Folder: 12, File: 8).
- AI Indicators: AI-enhanced views add specific CSS classes such as `entry`, `router`, `service`, and `util` to nodes for semantic highlighting.

Sources: server/graph/layout.test.ts:49-65, server/ai/tasks/diagram.ts:25-30
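The color and size rules can be expressed as a small lookup. The sketch below uses only the values listed above. The fallback color for non-TypeScript files and the `styleForNode` name are assumptions, since the source does not give the full palette.

```typescript
type NodeKind = "repo" | "folder" | "file";

interface NodeStyle {
  color: string;
  radius: number;
}

// Assumption: a neutral gray for files whose extension has no documented color.
const FALLBACK_FILE_COLOR = "#9ca3af";

// Hypothetical helper mapping a node to the documented color and radius.
function styleForNode(kind: NodeKind, path = ""): NodeStyle {
  if (kind === "repo") return { color: "#a78bfa", radius: 14 };
  if (kind === "folder") return { color: "#fbbf24", radius: 12 };
  // Files: .ts and .tsx get the TypeScript color; everything else falls back.
  const isTypeScript = /\.tsx?$/.test(path);
  return { color: isTypeScript ? "#3b82f6" : FALLBACK_FILE_COLOR, radius: 8 };
}
```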
```mermaid
sequenceDiagram
    participant User
    participant Store as "VizStore"
    participant RF as "ReactFlow Instance"
    User->>Store: Select Node (File/Folder)
    Store->>User: Update Inspector Panel
    User->>RF: Trigger focusOnNode(nodeId)
    RF->>RF: Calculate Node Center (x, y)
    RF->>User: Smooth zoom & Pan to Node
```
The sequence of focusing on a specific node within the interactive workspace. Sources: src/components/viz/OverviewTab.tsx:23-45
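A minimal sketch of the focus step, using the 480ms duration and 1.3x zoom mentioned earlier. To avoid depending on React, the viewport is modeled by a small interface whose `setCenter(x, y, options)` method is shaped like the React Flow instance method of the same name. `FocusableNode` and this version of `focusOnNode` are hypothetical, and computing the center from the node's position and measured size is an assumption.

```typescript
interface FocusableNode {
  position: { x: number; y: number };
  width?: number;
  height?: number;
}

// Minimal stand-in for the part of a React Flow instance this sketch needs.
interface CenterableViewport {
  setCenter(x: number, y: number, options?: { zoom?: number; duration?: number }): void;
}

const FOCUS_ZOOM = 1.3;
const FOCUS_DURATION_MS = 480;

// Hypothetical helper: center the viewport on the middle of the node.
function focusOnNode(viewport: CenterableViewport, node: FocusableNode): void {
  // Assumption: position is the node's top-left corner, so add half its size.
  const x = node.position.x + (node.width ?? 0) / 2;
  const y = node.position.y + (node.height ?? 0) / 2;
  viewport.setCenter(x, y, { zoom: FOCUS_ZOOM, duration: FOCUS_DURATION_MS });
}
```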
The graph data structure consists of GraphNode and GraphEdge types. The layout engine calculates positions either on the fly (for force-directed) or via pre-calculated coordinates using Dagre.
```typescript
// Positioning logic example from Dagre implementation
const laidOut = applyDagreLayout(nodes, edges, 'TB');
expect(laidOut[1].position.y).toBeGreaterThan(laidOut[0].position.y);
```
Sources: server/graph/layout.test.ts:18-21
In the ArchitectureView, SVG rendering is handled by the Mermaid engine, allowing users to copy the raw Mermaid code or export the generated SVG directly to their clipboard.
Sources: src/components/viz/ArchitectureView.tsx:88-100
Interactive Graph Visualization is the primary interface for gitSdm, consolidating static analysis, layout physics, and AI insights into a single interactive canvas for deep repository intelligence.
The AI Architecture within gitSdm is a multi-layered system designed to provide deep, automated insights into software repositories. It leverages Large Language Models (LLMs) to transform raw file structures and dependency data into human-readable architectural summaries, health audits, and onboarding guides. The system is built on a provider-agnostic backend that supports multiple AI engines, including Google Gemini, OpenAI, and Anthropic Claude.
At a high level, the system ingests repository metadata and file trees to construct a rich context for the AI. This context is then processed through specialized task handlers—such as "Refactor," "Explain," and "Onboarding"—which apply specific system prompts and logical constraints to ensure technically accurate and developer-empathetic responses.
Sources: server/ai/provider.ts, server/ai/prompts.ts, README.md
The system employs a factory pattern to manage different AI service providers. The createProvider function detects the appropriate engine based on environment variables or explicit overrides (e.g., API keys starting with sk-ant- for Anthropic or sk- for OpenAI).
The backend supports a "Mock" provider for development and fallback scenarios, ensuring the UI remains functional without active API keys.
```mermaid
graph TD
    Start[Get AI Provider] --> CheckOverride{Override Key?}
    CheckOverride -- Yes --> Detect[Detect Provider Type]
    CheckOverride -- No --> EnvCheck{Check ENV Vars}
    Detect --> OpenAI[Create OpenAI Provider]
    Detect --> Anthropic[Create Anthropic Provider]
    Detect --> Gemini[Create Gemini Provider]
    EnvCheck -- AI_PROVIDER=openai --> OpenAI
    EnvCheck -- AI_PROVIDER=gemini --> Gemini
    EnvCheck -- AI_PROVIDER=anthropic --> Anthropic
    EnvCheck -- None --> Mock[Create Mock Provider]
```
The diagram shows the logic used to instantiate specific AI service implementations based on configuration.
Sources: server/ai/provider.ts:25-65
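The prefix-based detection for override keys can be sketched as follows. The important detail is ordering: `sk-ant-` must be tested before `sk-`, because every Anthropic-style key also starts with `sk-`. The function name `detectProviderFromKey` is hypothetical, and treating any other key as a Gemini key is an assumption based on the three branches in the diagram.

```typescript
type KeyProvider = "anthropic" | "openai" | "gemini";

// Hypothetical helper: infer the provider from the shape of a user-supplied key.
function detectProviderFromKey(key: string): KeyProvider {
  const trimmed = key.trim();
  // Check the more specific prefix first: "sk-ant-..." also starts with "sk-".
  if (trimmed.startsWith("sk-ant-")) return "anthropic";
  if (trimmed.startsWith("sk-")) return "openai";
  // Assumption: any key without a recognized prefix is treated as a Gemini key.
  return "gemini";
}
```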
| Provider | Default Model | Key Env Variable | API Version |
|---|---|---|---|
| Gemini | gemini-2.5-flash | `GEMINI_API_KEY` | v1alpha |
| OpenAI | gpt-4o-mini | `OPENAI_API_KEY` | N/A |
| Anthropic | claude-3-5-haiku-latest | `ANTHROPIC_API_KEY` | N/A |
Sources: server/ai/provider.ts:71-125
The accuracy of AI insights depends on the buildRepoContext utility, which flattens the repository's RepoAnalysis data into a structured prompt. This includes metadata, the top 20 root directories, the first 40 dependencies, and up to 120 detected files for "deep context".
- Repository Meta: Name, description, language, stars, forks, and license.
- Activity Metrics: Recent commit counts and top contributors.
- Structural Data: Flat file lists and dependency ecosystems (e.g., npm, PyPI).
Sources: server/ai/prompts.ts:3-40
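To make the truncation limits concrete, here is a minimal sketch of a context builder that caps each section before joining everything into one Markdown string. The limits (20, 40, 120) come from the description above. The input shape, the section headings, and the name `buildContextSketch` are assumptions and do not reproduce the real `buildRepoContext` output.

```typescript
// Trimmed-down, hypothetical input; the real RepoAnalysis has more fields.
interface RepoContextInput {
  name: string;
  description: string;
  rootDirs: string[];
  dependencies: { name: string; ecosystem: string }[];
  files: string[];
}

const MAX_ROOT_DIRS = 20;
const MAX_DEPENDENCIES = 40;
const MAX_FILES = 120;

function buildContextSketch(input: RepoContextInput): string {
  // Cap each list so the prompt stays within a predictable size.
  const dirs = input.rootDirs.slice(0, MAX_ROOT_DIRS);
  const deps = input.dependencies.slice(0, MAX_DEPENDENCIES);
  const files = input.files.slice(0, MAX_FILES);

  return [
    `# ${input.name}`,
    input.description,
    "## Root directories",
    ...dirs.map((dir) => `- ${dir}`),
    "## Dependencies",
    ...deps.map((dep) => `- ${dep.name} (${dep.ecosystem})`),
    "## Files",
    ...files.map((file) => `- ${file}`),
  ].join("\n");
}
```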
The SYSTEM_PROMPT enforces a "Principal Software Architect" persona, requiring the AI to be specific rather than generic, address developer empathy, and strictly avoid fabricating files not present in the provided context.
Sources: server/ai/prompts.ts:42-63
The system divides AI operations into distinct "Tasks," each responsible for a specific type of repository intelligence.
The explainRepo function handles scoped analysis for the entire repository, specific nodes in the graph, or individual files. It supports an "ELI5" (Explain Like I'm 5) mode for simplified onboarding.
sequenceDiagram
participant UI as Client UI
participant Route as AI Routes
participant Task as explain.ts
participant AI as AI Provider
UI->>Route: POST /api/ai/explain (params)
Route->>Task: explainRepo(params)
Task->>AI: complete(SystemPrompt + Context)
AI-->>Task: Markdown Content
Task-->>Route: AIExplainResponse
Route-->>UI: JSON { explanation, cached }
This flow illustrates how the system processes requests for code explanations, moving from the UI to the AI provider and back.
Sources: server/ai/tasks/explain.ts:10-50, server/router/ai-routes.ts:34-42
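A rough sketch of this request flow is shown below. It is an illustration only: the `AiProvider` interface, the `ExplainParams` fields, the in-memory `explainCache`, and the prompt wording are assumptions, while the `{ explanation, cached }` response shape follows the diagram.

```typescript
interface AIExplainResponse {
  explanation: string;
  cached: boolean;
}

// Hypothetical parameter and provider shapes for this sketch.
interface ExplainParams {
  scope: "repo" | "node" | "file";
  path?: string;
  eli5?: boolean;
}

interface AiProvider {
  complete(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Assumption: a simple in-memory cache keyed by the request parameters.
const explainCache = new Map<string, string>();

async function explainRepoSketch(
  provider: AiProvider,
  context: string,
  params: ExplainParams,
): Promise<AIExplainResponse> {
  const key = JSON.stringify([params.scope, params.path ?? "", params.eli5 ?? false]);
  const hit = explainCache.get(key);
  if (hit !== undefined) return { explanation: hit, cached: true };

  const style = params.eli5 ? "Explain it like I'm 5." : "Explain for an experienced developer.";
  const target = params.path ? `Focus on ${params.path}.` : "Cover the whole repository.";
  const explanation = await provider.complete(`${style} ${target}`, context);
  explainCache.set(key, explanation);
  return { explanation, cached: false };
}
```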
The system performs "rigorous codebase health assessments" by scoring five dimensions: maintainability, modularity, readability, architecture, and complexity.
| Metric | Evaluation Criteria |
|---|---|
| Maintainability | Ease of change, module size, and config clarity. |
| Modularity | Directory structure and coupling indicators. |
| Complexity | Inverted scale: a higher score indicates shallower nesting and fewer files. |
Sources: server/ai/tasks/refactor.ts:88-115
The generateLearningPath task creates a structured plan for new developers, including:
- Mental Model: Architecture type and core flow tagline.
- Recommended Path: Files to read ordered by importance (0-100).
- Execution Flow: Step-by-step data transit between files.
Sources: server/ai/tasks/playground.ts:114-160, src/types/api.ts:88-112
The AiCenterTab serves as the primary interface for AI interactions. It organizes tools into "Core Analysis" (Explain, Health, Risks) and "Creative Tools" (Roast, Readme Enhancer).
- IntelligenceCard: Displays the AI's markdown response or score visualizations.
- RiskCard: Shows specific refactoring risks with associated file badges that allow users to jump directly to the relevant code node.
- ToolSection: Categorized buttons that trigger backend AI routes.
Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:45-120
The UI dynamically colors scores to provide instant feedback on repository quality:
- Emerald (>= 80%): High quality/Maintainable.
- Amber (>= 60%): Moderate risk/Technical debt.
- Rose (< 60%): Critical issues/Low maintainability.
Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:32-36
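The thresholds above map to a small lookup, sketched here. The function name and the returned color labels are illustrative; the actual component may return CSS class names instead.

```typescript
type ScoreColor = "emerald" | "amber" | "rose";

// Map a 0-100 score to the color bands listed above.
function scoreColor(score: number): ScoreColor {
  if (score >= 80) return "emerald";
  if (score >= 60) return "amber";
  return "rose";
}
```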
The AI Architecture & File Insights system provides a comprehensive suite of tools for repository analysis. By combining multi-provider LLM support with specialized prompt engineering and a dedicated UI sidebar, it allows developers to quickly grasp the architecture, health, and entry points of unfamiliar codebases. The modular task-based design ensures that insights—ranging from sarcastic "Roasts" to technical refactoring suggestions—are grounded in the actual file structure and dependencies of the repository.
The Semantic Search & Q&A Engine is a core intelligence component of the gitSdm platform designed to provide developers with instant, context-aware understanding of a codebase. It bypasses traditional keyword matching by using vector-based embeddings to locate code snippets based on meaning and utility, and leverages Large Language Models (LLMs) to answer complex architectural questions with direct citations to the source code.
This system operates in two primary modes: Search Mode, which retrieves relevant code chunks based on semantic similarity, and Ask Mode, which synthesizes an explanatory answer using retrieved code as context. The engine relies on a pre-indexed repository state where code is broken into chunks and stored in a vector space.
Sources: README.md:162-162, src/pages/SearchPage.tsx:123-128
The engine follows a Retrieval-Augmented Generation (RAG) architecture. When a user submits a query, the system first retrieves the most relevant code sections from the indexed repository before passing them to an AI provider for final processing.
The QAEngine orchestrates the flow from query to answer. It utilizes a searchEngine to fetch the top 5 most relevant code chunks that meet a minimum similarity score threshold. If no relevant chunks are found, it returns a standard "information not available" message.
sequenceDiagram
participant U as User Interface
participant API as API Client
participant QA as QA Engine
participant SE as Search Engine
participant AI as AI Provider
U->>API: POST /api/search/ask
API->>QA: ask(options)
QA->>SE: search(query, topK=5)
SE-->>QA: Relevant Code Chunks
alt Chunks Found
QA->>QA: Build Context Prompt
QA->>AI: complete(systemPrompt, userPrompt)
AI-->>QA: Generated Answer
QA-->>API: Answer + Citations
API-->>U: Render QAAnswerView
else No Chunks
QA-->>API: Not Available Message
API-->>U: Render Empty State
end
The diagram shows the synchronous flow of data from the initial user request through the search retrieval and AI generation phases. Sources: server/search/qa-engine.ts:13-68
The QAEngine is implemented as a singleton service. Its primary responsibility is the transformation of retrieved code chunks into a structured prompt for the AI.
- Context Building: It formats source chunks with file paths and line ranges to ensure the LLM has explicit references.
- System Prompting: It enforces a strict response structure using Markdown headers: `### Summary`, `### How it works`, and `### Related files`.
- Constraint Enforcement: The AI is instructed to answer using ONLY the provided code context and to avoid referencing files not present in the search results.
Sources: server/search/qa-engine.ts:70-85, server/search/qa-engine.ts:87-92
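The retrieval and prompting steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `CodeChunk` shape, the `MIN_SCORE` value, the fallback wording, and the `askSketch` signature are hypothetical, and the candidate chunks are passed in directly rather than fetched from a real vector index.

```typescript
interface CodeChunk {
  filePath: string;
  startLine: number;
  endLine: number;
  content: string;
  score: number; // similarity score from the vector search
}

interface QAResult {
  answer: string;
  citations: string[];
}

type Complete = (systemPrompt: string, userPrompt: string) => Promise<string>;

const TOP_K = 5;
const MIN_SCORE = 0.3; // assumption: the real threshold is not stated on this page
const NOT_AVAILABLE = "That information is not available in the indexed code."; // placeholder wording

// Keep the best-scoring chunks that clear the similarity threshold.
function selectChunks(candidates: CodeChunk[]): CodeChunk[] {
  return candidates
    .filter((chunk) => chunk.score >= MIN_SCORE)
    .sort((a, b) => b.score - a.score)
    .slice(0, TOP_K);
}

// Label each chunk with its file path and line range so the model can cite it.
function buildContext(chunks: CodeChunk[]): string {
  return chunks
    .map((c) => `File: ${c.filePath} (lines ${c.startLine}-${c.endLine})\n${c.content}`)
    .join("\n\n");
}

async function askSketch(
  question: string,
  candidates: CodeChunk[],
  complete: Complete,
): Promise<QAResult> {
  const chunks = selectChunks(candidates);
  if (chunks.length === 0) return { answer: NOT_AVAILABLE, citations: [] };
  const systemPrompt = "Answer using ONLY the provided code context.";
  const userPrompt = `${buildContext(chunks)}\n\nQuestion: ${question}`;
  const answer = await complete(systemPrompt, userPrompt);
  return { answer, citations: Array.from(new Set(chunks.map((c) => c.filePath))) };
}
```

Returning early when no chunk clears the threshold avoids calling the AI provider with an empty context.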
The SearchPage manages the user interaction and state. It utilizes a mode toggle to switch between semantic search and Q&A.
- Caching: Results are cached locally using `searchCache` and `askCache` to provide instantaneous responses for repeated queries.
- Indexing Awareness: The UI prevents search actions unless the repository has been successfully indexed (`indexingStatus.state === 'complete'`).
- Navigation Integration: Users can click on citations or search results to navigate directly to the file in the repository visualizer.
Sources: src/pages/SearchPage.tsx:41-65, src/pages/SearchPage.tsx:94-103
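A minimal sketch of the caching and indexing guard might look like this. The cache key format, the query normalisation, and the `searchWithCache` helper are assumptions; the page only states that `searchCache` exists and that searching requires a `complete` index. The search is modelled as a synchronous callback to keep the example short.

```typescript
type IndexingState = "idle" | "indexing" | "complete" | "failed";

// Hypothetical cache: maps a normalised query key to previously returned result paths.
const searchCache = new Map<string, string[]>();

function cacheKey(owner: string, repo: string, branch: string, query: string): string {
  return `${owner}/${repo}@${branch}::${query.trim().toLowerCase()}`;
}

function searchWithCache(
  state: IndexingState,
  key: string,
  runSearch: () => string[],
): { results: string[]; fromCache: boolean } | null {
  if (state !== "complete") return null; // index not ready: the UI would block the action
  const cached = searchCache.get(key);
  if (cached) return { results: cached, fromCache: true };
  const results = runSearch();
  searchCache.set(key, results);
  return { results, fromCache: false };
}
```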
The system uses specific TypeScript interfaces to ensure type safety across the network boundary between the Express backend and the React frontend.
| Interface | Field | Type | Description |
|---|---|---|---|
| `SearchResultCard` | `snippet` | `string` | The actual code content found. |
| `SearchResultCard` | `score` | `number` | Similarity score from the vector search. |
| `QAResponse` | `answer` | `string` | The AI-generated explanation. |
| `QAResponse` | `citations` | `Citation[]` | List of source files used for the answer. |
| `IndexingStatus` | `state` | `string` | Current status: idle, indexing, complete, or failed. |
Sources: src/types/api.ts:140-172
The apiClient provides the following methods for interacting with the Semantic Search & Q&A Engine:
- `semanticSearch(query, owner, repo, branch)`: Triggers a POST request to `/api/search` to find code snippets.
- `semanticAsk(question, owner, repo, branch)`: Triggers a POST request to `/api/search/ask` for architectural Q&A.
- `triggerIndexing(owner, repo, branch)`: Initiates the vectorization process for a repository.
- `fetchIndexingStatus(owner, repo)`: Polls for the current state of the repository index.
Sources: src/lib/apiClient.ts:182-225
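As an illustration of what a call such as `semanticAsk` might send, the sketch below builds the request without performing it. The `buildAskRequest` helper and the JSON body field names are assumptions; only the HTTP method and the `/api/search/ask` path come from the list above.

```typescript
interface AskRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assemble the POST request for an architectural question.
function buildAskRequest(
  question: string,
  owner: string,
  repo: string,
  branch: string,
): AskRequest {
  return {
    url: "/api/search/ask",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape; the real payload field names are not listed on this page.
      body: JSON.stringify({ question, owner, repo, branch }),
    },
  };
}
```

A caller could then pass the result to `fetch(request.url, request.init)` and parse the JSON response.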
The QAAnswerView component is responsible for rendering the AI's response. It includes a custom Markdown renderer that handles specialized formatting:
- Code Highlighting: Renders inline code and code blocks within the AI's explanation.
- Interactive Citations: Renders a list of sources at the bottom of the answer. Each source is a button that, when clicked, triggers the `onSelectFile` callback to open the file in the project's inspector.
- Formatting: Handles bold text, lists, and hierarchical headers as defined in the engine's system prompt.
Sources: src/features/search/QAAnswerView.tsx:11-50, src/features/search/QAAnswerView.tsx:55-88
The Semantic Search & Q&A Engine provides a sophisticated layer of repository intelligence by combining vector search retrieval with LLM synthesis. By enforcing strict context boundaries and structured output formats, it ensures that technical explanations remain grounded in the actual codebase, providing developers with a reliable tool for navigating and understanding complex architectures.
The Learning Paths Simulation is an AI-driven feature within gitSdm designed to accelerate developer onboarding by transforming complex codebase structures into digestible, step-by-step educational journeys. It synthesizes architectural "mental models," recommends critical files for initial reading, and maps out typical execution flows to provide instant understanding of unfamiliar projects.
This system leverages Large Language Models (LLMs) to analyze repository metadata, file trees, and dependency manifests, generating a structured onboarding intelligence report that is then rendered as an interactive "Guided Codebase Tour" in the frontend visualization workspace. Sources: README.md:28-34, server/ai/tasks/playground.ts:118-125
The Learning Path Simulation operates as a pipeline starting from repository ingestion to AI synthesis and ending at the interactive UI tab.
The core logic resides in the generateLearningPath function, which utilizes the executeAiTask service. This task sends a specialized prompt to the AI provider (Gemini, OpenAI, or Anthropic) containing the repository's context, including the file tree, top contributors, and dependency list.
The AI is instructed to return a JSON object with the following schema:
- Mental Model: A high-level description of the architecture type (e.g., Modular Service, Layered MVC).
- Recommended Path: A list of 5-8 critical files with importance scores and roles.
- Execution Flow: A sequence of steps showing data transfer between components.
- Insights: Architecture summaries, detected risks, and contribution suggestions.
Sources: server/ai/tasks/playground.ts:153-194, server/ai/prompts.ts:4-37
The following diagram illustrates the flow from a user submitting a URL to the generation of the simulation data.
graph TD
User[User Input URL] --> Ingest[Repo Ingestion]
Ingest --> Analyze[Repository Analysis Engine]
Analyze --> Context[Context Builder]
Context --> AI_Task[generateLearningPath Task]
AI_Task --> LLM[AI Provider Service]
LLM --> JSON[Structured Learning Data]
JSON --> UI[LearningPathTab Rendering]
UI --> Interactive[Guided Codebase Tour]
Sources: server/ai/tasks/playground.ts:228-234, src/components/home/HowItWorks.tsx:4-11
The backend ensures that the simulation data follows a strict type definition to maintain UI consistency.
| Field | Type | Description |
|---|---|---|
| `mentalModel` | Object | Contains the type, concept, and description of the codebase. |
| `recommendedPath` | Array | List of objects containing path, importance, reason, and role. |
| `executionFlow` | Object | Contains steps (from/to/description) and visualSteps (file path array). |
| `insights` | Object | Contains architecture summary, risks array, and suggestions array. |
Sources: server/ai/tasks/playground.ts:145-151
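The table above could be expressed as TypeScript types along these lines. The exact field names inside `insights` (for example `architecture`) and the `readingOrder` helper are assumptions for illustration; the helper clamps importance to the documented 0-100 range before sorting.

```typescript
interface RecommendedFile {
  path: string;
  importance: number; // expected range 0-100
  reason: string;
  role: string;
}

// Sketch of the learning path schema; nested field names are partly assumed.
interface LearningPath {
  mentalModel: { type: string; concept: string; description: string };
  recommendedPath: RecommendedFile[];
  executionFlow: {
    steps: { from: string; to: string; description: string }[];
    visualSteps: string[];
  };
  insights: { architecture: string; risks: string[]; suggestions: string[] };
}

// Hypothetical helper: clamp importance and order files from most to least important.
function readingOrder(data: LearningPath): RecommendedFile[] {
  return data.recommendedPath
    .map((file) => ({ ...file, importance: Math.min(100, Math.max(0, file.importance)) }))
    .sort((a, b) => b.importance - a.importance);
}
```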
The LearningPathTab component provides the interface for developers to interact with the simulated path. It uses framer-motion for animations and lucide-react for visual cues.
Key features include:
- Smart Focus Filters: Allows users to filter the learning path by "API/Routes," "UI/Components," "Core Services," or "Configuration".
- Node Synchronization: Clicking a file path in the tour triggers `setSelectedNodeId` and `setFocusedFilePath` in the `vizStore`, centering the main graph canvas on the relevant file.
- Contextual Actions: Each step in the path offers an "OPEN" action to view source code and an "EXPLAIN" action to trigger a specific AI analysis of that file.
Sources: src/components/viz/LearningPathTab.tsx:10-30, src/components/viz/LearningPathTab.tsx:125-155
A parallel system, generateOnboarding, provides a 6-step walkthrough that builds progressively from high-level concepts to deployment setup.
sequenceDiagram
participant D as Developer
participant UI as LearningPathTab
participant AI as AI Onboarding Task
participant S as VizStore
D->>UI: Select Learning Path
UI->>AI: generateOnboarding(owner, repo)
AI-->>UI: Return 6-step JSON
D->>UI: Click Step 2 (Entry Point)
UI->>S: setFocusedFilePath(path)
S-->>D: Graph centers on file node
D->>UI: Click "Explain"
UI->>AI: explainRepo(filePath)
AI-->>D: Show ELI5 / Standard Explanation
Sources: server/ai/tasks/onboarding.ts:52-87, src/components/viz/LearningPathTab.tsx:150-165
The simulation follows a specific logical order to build understanding:
- Project Purpose: High-level mental model.
- Entry Point: Startup sequence and initial execution.
- Routing: Request lifecycle or navigation logic.
- Business Logic: Core services or utility layers.
- Data Layer: State management or database interactions.
- Configuration: Testing, environment setup, or deployment.
Sources: server/ai/tasks/onboarding.ts:70-76
The simulation data is heavily dependent on the preliminary analysis performed by the server. This includes:
- File Classification: Identifying files as "entry," "utility," "component," or "config" to help the AI prioritize the reading order.
- High Coupling Detection: Files with a high degree of edges (connections) in the graph are often flagged as "risks" or "core logic" in the simulation.
- Metadata Synthesis: Topics, primary languages, and contributor counts are used to provide context for the "Mental Model."
Sources: src/components/viz/OverviewTab.tsx:43-52, server/ai/prompts.ts:31-37
{
"path": "src/context/TodoContext.tsx",
"importance": 88,
"reason": "Core state logic — defines useTodo hook, handles add/remove/toggle operations.",
"role": "State Context"
}
Sources: server/ai/tasks/playground.ts:205-208
The Learning Paths Simulation effectively bridges the gap between raw code structure and human understanding, providing a roadmap that adapts to the specific architectural patterns of the repository being analyzed. Sources: README.md:12-18
The Export & Diagram Generation system within gitSdm provides users with the ability to visualize a repository's structural architecture through two distinct modes: a programmatically generated "Code Graph" and an "AI Enhanced" architecture diagram. This module bridges the gap between raw codebase analysis and human-readable documentation by converting repository metadata into interactive Mermaid.js flowcharts that can be exported in multiple formats including PNG, SVG, and raw Mermaid code.
Sources: README.md:129-133, src/components/viz/ArchitectureView.tsx:64-90
The diagram generation process is orchestrated through a specialized React view (ArchitectureView) that manages the state transitions between static analysis and AI-driven insights.
| Component | Responsibility |
|---|---|
| `ArchitectureView` | Main UI container for the visualization canvas and export controls. |
| `useArchitectureState` | Orchestrates the rendering lifecycle, switching between AI and Code modes. |
| `useArchitectureExport` | Handles the logic for generating image blobs and interacting with the clipboard. |
| `mermaid-generator` | Client-side logic for building Mermaid syntax from the local graph analysis. |
| `diagram.ts` (Server) | Backend task that prompts an LLM to synthesize a high-level architecture overview. |
Sources: src/components/viz/ArchitectureView.tsx:1-30, src/components/viz/architecture/hooks/useArchitectureState.ts:8-16, server/ai/tasks/diagram.ts:7-14
The following diagram illustrates how the system transitions from a user request to a rendered SVG and eventually an exported file.
flowchart TD
User[User Interface] -->|Toggle Mode| State[useArchitectureState]
State -->|Mode: Code| Programmatic[Mermaid Generator]
State -->|Mode: AI| AIService[AI Diagram Task]
Programmatic -->|Raw String| Renderer[Mermaid.js Render]
AIService -->|LLM Response| Renderer
Renderer -->|SVG String| View[ArchitectureView Canvas]
View -->|Export Request| Export[useArchitectureExport]
Export -->|html-to-image| PNG[Download PNG]
Export -->|XMLSerializer| SVG[Download SVG]
Export -->|Clipboard API| Text[Copy Mermaid Code]
Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:24-60, src/components/viz/architecture/hooks/useArchitectureExport.ts:31-105
The programmatic generator focuses on structural connectivity. It scores nodes based on their "degree" (incoming and outgoing edges), file classification (e.g., entry points), and whether they are marked as "important files" by the analyzer. This ensures the resulting diagram focuses on the most significant architectural blocks.
- Node Scoring: Entry points receive a +10 bonus, and important files receive a +5 bonus to their connectivity score.
- Filtering: To maintain readability, the system slices the top 25 scored nodes.
- Clustering: Files are grouped into subgraphs based on their directory paths.
Sources: src/components/viz/architecture/mermaid-generator.ts:29-55
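The scoring, filtering, and clustering steps above can be sketched as follows. The `GraphNode` and `GraphEdge` shapes and the function names are assumptions; the bonuses (+10, +5) and the limit of 25 come from the list.

```typescript
interface GraphNode {
  id: string;
  path: string;
  isEntry: boolean;
  isImportant: boolean;
}

interface GraphEdge {
  source: string;
  target: string;
}

// Score = degree (incoming + outgoing edges) plus role bonuses.
function scoreNodes(nodes: GraphNode[], edges: GraphEdge[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const node of nodes) {
    scores.set(node.id, (node.isEntry ? 10 : 0) + (node.isImportant ? 5 : 0));
  }
  for (const edge of edges) {
    for (const id of [edge.source, edge.target]) {
      const current = scores.get(id);
      if (current !== undefined) scores.set(id, current + 1);
    }
  }
  return scores;
}

// Keep only the highest-scoring nodes so the diagram stays readable.
function topNodes(nodes: GraphNode[], edges: GraphEdge[], limit = 25): GraphNode[] {
  const scores = scoreNodes(nodes, edges);
  return [...nodes]
    .sort((a, b) => (scores.get(b.id) ?? 0) - (scores.get(a.id) ?? 0))
    .slice(0, limit);
}

// Group nodes by parent directory; each group would become a Mermaid subgraph.
function groupByDirectory(nodes: GraphNode[]): Map<string, GraphNode[]> {
  const groups = new Map<string, GraphNode[]>();
  for (const node of nodes) {
    const slash = node.path.lastIndexOf("/");
    const dir = slash === -1 ? "." : node.path.slice(0, slash);
    const bucket = groups.get(dir) ?? [];
    bucket.push(node);
    groups.set(dir, bucket);
  }
  return groups;
}
```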
The AI mode utilizes the executeAiTask function on the server to prompt an LLM (Gemini, OpenAI, or Anthropic) for a "beautiful, clean, and highly readable" architecture flowchart.
- Layout Strategy: Specifically requests a Left-to-Right (`graph LR`) layout.
- Logical Grouping: The AI is instructed to group components into subgraphs like "Entry Points", "Services", and "Database".
- Styling: Applies custom CSS classes to Mermaid nodes (e.g., `entry`, `router`, `service`, `db`) for visual differentiation.
Sources: server/ai/tasks/diagram.ts:16-43
The system provides robust export options to facilitate the use of diagrams in external documentation or developer wikis.
The useArchitectureExport hook manages the serialization of the DOM elements into various formats.
sequenceDiagram
participant User as User
participant Hook as useArchitectureExport
participant HTI as html-to-image
participant XMLS as XMLSerializer
User->>Hook: handleDownloadPng()
activate Hook
Hook->>HTI: toPng(svgElement)
HTI-->>Hook: dataUrl
Hook->>User: Trigger Browser Download (.png)
deactivate Hook
User->>Hook: handleCopySvg()
activate Hook
Hook->>XMLS: serializeToString(svgElement)
XMLS-->>Hook: svgString
Hook->>User: Write to Clipboard
deactivate Hook
Sources: src/components/viz/architecture/hooks/useArchitectureExport.ts:79-119
| Format | Implementation Detail | Source File |
|---|---|---|
| PNG | Uses html-to-image with a custom background (#09090b) and a pixel ratio of 2 for high resolution. | useArchitectureExport.ts:107-118 |
| SVG | Utilizes XMLSerializer to convert the live SVG DOM node into a string blob. | useArchitectureExport.ts:85-103 |
| Mermaid Code | Extracts raw text and uses stripMermaidFences to clean the output for clipboard compatibility. | useArchitectureExport.ts:31-59 |
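The fence-stripping step for the Mermaid Code export could work roughly like the sketch below. This is a guess at the behavior of `stripMermaidFences`, not its actual implementation: it removes an opening fence line (with or without a `mermaid` language tag) and a closing fence line when present, and leaves unfenced input untouched.

```typescript
// Three backticks, built indirectly so this snippet is easy to embed in Markdown.
const FENCE = "`".repeat(3);

// Hypothetical re-implementation for illustration only.
function stripMermaidFencesSketch(raw: string): string {
  const lines = raw.trim().split(/\r?\n/);
  const first = lines[0];
  if (first !== undefined && first.trim().startsWith(FENCE)) lines.shift();
  const last = lines[lines.length - 1];
  if (last !== undefined && last.trim() === FENCE) lines.pop();
  return lines.join("\n").trim();
}
```

Cleaning the fences matters mainly for AI-generated diagrams, since LLM responses often wrap Mermaid source in a fenced block.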
Diagrams are rendered using the mermaid library. The useArchitectureState hook handles the asynchronous rendering process and implements a cleanup mechanism to prevent memory leaks or overlapping SVG IDs.
- Scaling: Rendered SVGs are post-processed to replace fixed width/height attributes with `width="100%"` and `height="100%"` to support the pan-and-zoom interface.
- Error Handling: If a diagram fails to render (common with complex circular dependencies), the system catches the error and displays a "Failed to layout flowchart" message to the user.
- Pan & Zoom: Managed by the `useArchitecturePanZoom` hook, allowing users to interact with the rendered SVG using mouse wheels and dragging.
Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:40-70, src/components/viz/ArchitectureView.tsx:185-210
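The scaling step can be illustrated with a string-based sketch. The `makeSvgResponsive` name and the regular-expression approach are assumptions; the real hook may manipulate the DOM instead. Only the root `<svg>` tag is rewritten, so child elements keep their own dimensions.

```typescript
// Replace fixed width/height on the root <svg> tag with 100% so the
// diagram fills its container and can be panned and zoomed.
function makeSvgResponsive(svg: string): string {
  return svg.replace(/<svg\b[^>]*>/, (openTag) =>
    openTag
      .replace(/\swidth="[^"]*"/, ' width="100%"')
      .replace(/\sheight="[^"]*"/, ' height="100%"'),
  );
}
```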
The Export & Diagram Generation module is a critical component of gitSdm's mission to provide "instant architecture visualization." By combining deterministic graph analysis with the creative synthesis of AI, it allows developers to generate professional-grade documentation and visual aids directly from their source code.
The Smart File Explorer is a core navigation and analysis system within gitSdm designed to transform raw repository structures into actionable architectural insights. Unlike traditional file browsers, it integrates file classification, dependency analysis, and AI-powered context to help developers understand unfamiliar codebases quickly. It acts as the primary interface for exploring files, identifying entry points, and accessing deep-dive explanations.
The system encompasses the physical directory tree located in src/components/explorer/, the visual graph interaction in src/features/graph/, and the intelligence layer that ranks and explains files. It works by ingesting a repository URL, parsing its manifests, and then annotating the resulting tree with metadata that drives the "Smart" features such as suggested reading and high-coupling detection.
Sources: README.md:16-25, src/components/home/HowItWorks.tsx:5-13, README.md:120-130
The Smart File Explorer relies on a multi-stage pipeline that begins with GitHub ingestion and ends with an interactive, annotated UI.
- Ingestion: The user provides a GitHub URL which is fetched and converted into a flat tree.
- Classification: Files are categorized (e.g., `entry`, `test`, `config`) based on their paths and contents.
- Graph Construction: Files are transformed into nodes and edges for the React Flow canvas.
- UI Enrichment: The explorer uses these classifications to highlight important files and entry points.
graph TD
A[GitHub URL] --> B[Flat Tree Fetcher]
B --> C[File Classifier]
C --> D[Manifest Parser]
D --> E[Graph Builder]
E --> F[Smart Explorer UI]
F --> G[AI Explanation Layer]
subgraph "Classification Logic"
C1[Entry Points]
C2[Config Files]
C3[Test Suites]
end
C --- C1
C --- C2
C --- C3
This diagram illustrates how raw repository data flows through classification and parsing before being rendered in the Smart Explorer UI. Sources: src/components/home/HowItWorks.tsx:5-12, server/parser/file-classifier.test.ts:7-38
The "Smart" aspect of the explorer is driven by the file-classifier, which assigns roles to files. These roles determine how files are prioritized in the "Suggested Reading" and "Entry Points" sections.
| Classification | Indicators | Purpose in Explorer |
|---|---|---|
| Entry | `index.ts`, `main.go`, `App.tsx` | Identified as start of execution. |
| Test | `*.test.tsx`, `*_spec.go`, `__tests__/` | Filtered from primary onboarding flows. |
| Config | `tsconfig.json`, `vite.config.ts`, `.env` | Grouped as project setup files. |
| Doc | `README.md`, `LICENSE` | Ranked highly for initial onboarding. |
| Source | Standard logic files | General exploration nodes. |
Sources: server/parser/file-classifier.test.ts:7-38
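A classifier following the indicators in the table might be sketched like this. The `classifyFile` function, its rule order, and the exact patterns are assumptions based only on the example file names above; the real classifier likely covers many more cases.

```typescript
type FileClass = "entry" | "test" | "config" | "doc" | "source";

// Hypothetical rules derived from the table's example indicators.
function classifyFile(path: string): FileClass {
  const segments = path.split("/");
  const name = segments[segments.length - 1] ?? path;
  // Tests are checked first so that __tests__/index.ts is not treated as an entry point.
  if (/(\.test\.[jt]sx?$|_spec\.go$)/.test(name) || segments.includes("__tests__")) return "test";
  if (/^(README\.md|LICENSE)$/i.test(name)) return "doc";
  if (/^(tsconfig\.json|vite\.config\.ts|\.env)$/.test(name)) return "config";
  if (/^(index\.ts|main\.go|App\.tsx)$/.test(name)) return "entry";
  return "source";
}
```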
The system uses the findImportantFiles function to rank manifest files (like package.json) and entry points higher than utility or asset files. A score is calculated based on file depth (proximity to root) and type, ensuring that the most impactful files are presented first to the developer.
Sources: server/parser/file-classifier.test.ts:60-75, src/components/viz/OverviewTab.tsx:135-155
When a user selects a file or node in the explorer, the system triggers the AI Intelligence layer to provide a high-level technical summary.
- Focusing Logic: When a file is selected in the explorer, the `focusOnNode` function calculates the node's position on the graph and animates the view to center on it with a specific zoom level (typically 1.3).
- AI Explain Selection: Uses the `explainRepo` task to provide specific headings: "What this does", "Why it matters", and "Where to look next".
- ELI5 Mode: A toggleable mode that simplifies complex technical file descriptions for junior developers.
sequenceDiagram
participant U as User
participant E as Explorer UI
participant RF as React Flow Canvas
participant AI as AI Task Service
U->>E: Click File (e.g. src/App.tsx)
E->>RF: focusOnNode(id, path)
RF-->>U: Zoom & Center Node
E->>AI: explainRepo(scope: 'file', path: 'src/App.tsx')
AI-->>E: Return Markdown Explanation
E-->>U: Display "Why it Matters" Section
The sequence above shows the interaction between the file explorer, the visual graph, and the AI backend when a user explores a specific file. Sources: src/components/viz/OverviewTab.tsx:28-45, server/ai/tasks/explain.ts:25-55, src/components/viz/ai-sidebar/AiCenterTab.tsx:220-235
The Smart File Explorer doesn't just list files; it provides metadata about the repository's health and coupling.
- High Coupling Detection: The explorer calculates the "degree" of files (number of incoming/outgoing edges) to identify "God Objects" or critical modules that many other files depend on.
- Commit Density: Integrates commit history to show which files are actively being changed, providing a "heat map" of development activity.
- Onboarding Steps: Generates a 6-step walkthrough that references real file paths to guide a new developer through the codebase's execution flow.
Sources: src/components/viz/OverviewTab.tsx:54-65, src/components/viz/OverviewTab.tsx:185-205, server/ai/tasks/onboarding.ts:60-75
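High coupling detection from edge degree can be sketched as below. The `ImportEdge` shape, the helper names, and the `minDegree` threshold parameter are assumptions; the page does not state what cutoff the explorer uses.

```typescript
interface ImportEdge {
  source: string; // the importing file
  target: string; // the imported file
}

interface Coupling {
  incoming: number;
  outgoing: number;
}

// Count incoming and outgoing edges per file.
function couplingByFile(edges: ImportEdge[]): Map<string, Coupling> {
  const result = new Map<string, Coupling>();
  const entry = (file: string): Coupling => {
    let value = result.get(file);
    if (!value) {
      value = { incoming: 0, outgoing: 0 };
      result.set(file, value);
    }
    return value;
  };
  for (const edge of edges) {
    entry(edge.source).outgoing += 1;
    entry(edge.target).incoming += 1;
  }
  return result;
}

// Files whose total degree meets the (hypothetical) threshold, highest first.
function highCouplingFiles(edges: ImportEdge[], minDegree: number): string[] {
  return Array.from(couplingByFile(edges))
    .map(([file, c]) => ({ file, degree: c.incoming + c.outgoing }))
    .filter((item) => item.degree >= minDegree)
    .sort((a, b) => b.degree - a.degree)
    .map((item) => item.file);
}
```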
The Smart File Explorer serves as the bridge between static code analysis and dynamic developer understanding. By combining automated file classification with AI-driven insights and interactive graph visualization, it allows developers to skip the manual "grep and trace" phase of onboarding. Its ability to surface entry points and high-coupling nodes ensures that exploration is prioritized by architectural significance rather than just alphabetical order.
The Language Parsers and Dependency Analysis system is the core intelligence engine of gitSdm. It is responsible for transforming a raw repository file tree into a structured map of interconnected components, third-party dependencies, and categorized modules. This system operates as part of the backend analysis pipeline, taking file names and contents as input to determine the architectural significance of each file within a codebase.
By identifying entry points, resolving manifests across multiple ecosystems (NPM, Go, Python, Rust, etc.), and classifying file roles (e.g., config, test, source), the system enables high-level features such as the interactive dependency graph and AI-powered architecture summaries. Sources: README.md:143-149, src/components/home/HowItWorks.tsx:4-9
The system follows a sequential pipeline to process repository data. This flow ensures that every file is evaluated for its specific role and its relationship to the broader project ecosystem.
flowchart TD
A[Fetch File Tree] --> B[File Classification]
B --> C[Manifest Parsing]
C --> D[Dependency Extraction]
D --> E[Import Resolution]
E --> F[Graph Construction]
subgraph AnalysisEngine [Analysis Engine]
B
C
D
E
end
The analysis pipeline starts with the file structure and metadata, proceeding to identify dependencies and trace connections across all files. Sources: src/components/home/HowItWorks.tsx:4-9
Files are categorized into specific classes based on their names and paths. This classification is critical for determining which files are "important" and how they should be represented in the visual graph (e.g., assigning specific colors or rankings).
| Class | Description | Examples |
|---|---|---|
| Entry | Primary execution starting points. | `src/index.ts`, `main.go`, `App.tsx` |
| Test | Files containing unit or integration tests. | `button.test.tsx`, `user_spec.go`, `__tests__/` |
| Config | Project settings and environment variables. | `tsconfig.json`, `vite.config.ts`, `.env` |
| Doc | Documentation and license information. | `README.md`, `LICENSE` |
| Asset | Static files like images and icons. | `logo.png`, `favicon.ico` |
| Source | General implementation and logic files. | `helpers.ts`, `db/conn.go` |
Sources: server/parser/file-classifier.test.ts:7-40
The system uses the classification to rank files. For instance, package.json receives a high importance score because it is a manifest, while files in the root or with shallow directory depth are prioritized over deeply nested files.
Sources: server/parser/file-classifier.test.ts:63-75
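The ranking idea (manifests and entry points first, shallow files before deep ones) can be sketched as follows. The weights, the `RankedFile` shape, and the function names are assumptions chosen for illustration, not the scoring actually used by the analyzer.

```typescript
interface RankedFile {
  path: string;
  isManifest: boolean;
  isEntry: boolean;
}

// Hypothetical scoring: shallow files score higher; manifests and entry points get bonuses.
function importanceScore(file: RankedFile): number {
  const depth = file.path.split("/").length - 1; // 0 for files in the repository root
  let score = Math.max(0, 10 - 2 * depth);
  if (file.isManifest) score += 20;
  if (file.isEntry) score += 10;
  return score;
}

function rankFiles(files: RankedFile[]): RankedFile[] {
  return [...files].sort((a, b) => importanceScore(b) - importanceScore(a));
}
```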
The project employs a modular parserRegistry to handle various ecosystem manifest files. Each parser is registered with a unique name and a file pattern (string or RegExp) to match relevant files in a repository.
The registry supports a wide range of modern programming languages and deployment tools:
- Node.js/NPM: `package.json` (Prod, Dev, and Peer dependencies)
- Go: `go.mod` (Require blocks)
- Python: `requirements.txt` and `pyproject.toml`
- Rust: `Cargo.toml`
- Java: `pom.xml`
- Infrastructure: `Dockerfile` (Base images)
Sources: server/parser/manifest-parsers/index.test.ts:14-118, server/parser/manifest-parsers/registry.ts
sequenceDiagram
participant Analyzer as Dependency Analyzer
participant Registry as Parser Registry
participant Parser as Manifest Parser
Analyzer->>Registry: getParserForFile(filename)
Registry-->>Analyzer: parserInstance
Analyzer->>Parser: parse(content)
Parser-->>Analyzer: Array of Dependency objects
The interaction between the analyzer and the registry allows for dynamic selection of the correct parsing logic based on the file extension or specific manifest name. Sources: server/parser/manifest-parsers/index.test.ts:125-135, server/parser/manifest-parsers/registry.ts
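A registry of this kind could be sketched as shown below. The `ManifestParser` interface, the `ParserRegistry` class, and the `packageJsonParser` implementation are assumptions for illustration; only `getParserForFile`, the string-or-RegExp pattern, and the `parse(content)` call are taken from the description and diagram.

```typescript
interface Dependency {
  name: string;
  version: string;
  type: "prod" | "dev" | "peer";
  ecosystem: string;
}

interface ManifestParser {
  name: string;
  pattern: string | RegExp; // matched against the file's base name
  parse(content: string): Dependency[];
}

class ParserRegistry {
  private parsers: ManifestParser[] = [];

  register(parser: ManifestParser): void {
    this.parsers.push(parser);
  }

  getParserForFile(filename: string): ManifestParser | undefined {
    const base = filename.split("/").pop() ?? filename;
    return this.parsers.find((p) =>
      typeof p.pattern === "string" ? p.pattern === base : p.pattern.test(base),
    );
  }
}

// Hypothetical npm parser covering prod, dev, and peer dependencies.
const packageJsonParser: ManifestParser = {
  name: "npm",
  pattern: "package.json",
  parse(content: string): Dependency[] {
    const manifest = JSON.parse(content) as Record<string, unknown>;
    const sections: Array<[string, Dependency["type"]]> = [
      ["dependencies", "prod"],
      ["devDependencies", "dev"],
      ["peerDependencies", "peer"],
    ];
    const found: Dependency[] = [];
    for (const [section, type] of sections) {
      const entries = (manifest[section] ?? {}) as Record<string, string>;
      for (const name of Object.keys(entries)) {
        found.push({ name, version: entries[name] ?? "", type, ecosystem: "npm" });
      }
    }
    return found;
  },
};
```

Matching on the base name lets the same parser handle both a root `package.json` and one nested in a subproject.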
The dependency-analyzer aggregates findings from all detected manifest files. It is capable of handling monorepos or projects with multiple sub-projects by merging dependencies found in different locations (e.g., a root package.json and a subproject/package.json).
- Combination: Identifies dependencies across different ecosystems (e.g., finding both `lodash` from NPM and a Gin-gonic module from Go in the same repo).
- Deduplication: Identifies identical dependencies to prevent redundant nodes in the visualization. If `lodash` version `^4.17.21` is found in two separate `package.json` files, it is treated as a single dependency entry.
Sources: server/parser/dependency-analyzer.test.ts:6-33
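The merge and deduplication behavior could be sketched as follows. The `mergeDependencies` name and the choice of key (ecosystem, name, and version together) are assumptions; the page only confirms that identical entries collapse into one, not how differing versions of the same package are handled.

```typescript
interface Dependency {
  name: string;
  version: string;
  type: "prod" | "dev" | "peer";
  ecosystem: string;
}

// Merge dependency lists from several manifests, keeping the first copy of each identical entry.
function mergeDependencies(groups: Dependency[][]): Dependency[] {
  const seen = new Map<string, Dependency>();
  for (const group of groups) {
    for (const dep of group) {
      // Assumed identity: same ecosystem, name, and version string.
      const key = `${dep.ecosystem}:${dep.name}@${dep.version}`;
      if (!seen.has(key)) seen.set(key, dep);
    }
  }
  return Array.from(seen.values());
}
```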
Each extracted dependency is stored with specific metadata:
{
"name": "react",
"version": "^19.0.0",
"type": "prod",
"ecosystem": "npm"
}
Sources: server/parser/manifest-parsers/index.test.ts:16-19
The Language Parsers & Dependency Analysis module provides the foundational data for gitSdm's architectural mapping. By combining broad ecosystem support via an extensible parser registry with intelligent file classification and importance ranking, it ensures that developers receive an accurate and prioritized view of any unfamiliar codebase. This structured data allows the system to identify "Hot Paths" and "High Coupling" within a repository, facilitating faster onboarding and more informed refactoring decisions. Sources: server/github/mock-data.ts:167-175, server/parser/file-classifier.test.ts:63-75
The Graph Building Algorithms in gitSdm are responsible for transforming a flat or nested repository file tree into a multi-dimensional interactive graph. This system orchestrates file classification, relationship extraction (such as imports and dependencies), and spatial positioning to provide a visual mental model of software architecture.
The system combines deterministic static analysis and layout engines such as Dagre and D3-force with heuristic, AI-driven modeling to generate architectural diagrams. This dual approach allows for precise mapping of individual file relationships while providing high-level conceptual overviews of module boundaries.
The core graph construction process follows a pipeline that starts with repository metadata and file tree ingestion. It generates a collection of nodes and edges that represent the physical and logical structure of the codebase.
Nodes are the primary entities in the graph, representing repositories, folders, and files. Every graph starts with a root repo node, followed by hierarchical folder and file nodes derived from the GitHub tree.
Sources: server/graph/graph-builder.test.ts:4-22, src/components/viz/OverviewTab.tsx:44-48
Relationships between nodes are established through two primary mechanisms:
- Hierarchy Edges: Represent the directory structure (e.g., Folder A contains File B).
- Import Edges: Created by parsing `fileContents` to identify static import statements. The resolver maps local import paths to existing file nodes within the graph.
Sources: server/graph/graph-builder.test.ts:40-66, README.md:144-148
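The import-edge step can be illustrated with a minimal sketch. It assumes `fileContents` is a map from file path to source text and that only relative imports are resolved; the `ImportEdge` shape, the suffix list, and the regular expression are hypothetical and are not taken from the gitSdm source.

```typescript
// Hypothetical edge shape; the real graph types in gitSdm may differ.
interface ImportEdge {
  source: string;
  target: string;
  kind: "import";
}

// Suffixes tried when an import omits the extension (an assumption for this sketch).
const CANDIDATE_SUFFIXES = ["", ".ts", ".tsx", ".js", ".jsx", "/index.ts", "/index.js"];

// Matches `import ... from "x"` and bare `import "x"` statements.
const IMPORT_PATTERN = /import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g;

// Resolve a relative specifier against the directory of the importing file.
function resolveRelative(fromFile: string, specifier: string): string {
  const parts = fromFile.split("/").slice(0, -1);
  for (const segment of specifier.split("/")) {
    if (segment === "." || segment === "") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

function buildImportEdges(fileContents: Record<string, string>): ImportEdge[] {
  const knownFiles = new Set(Object.keys(fileContents));
  const edges: ImportEdge[] = [];
  for (const [filePath, source] of Object.entries(fileContents)) {
    for (const match of source.matchAll(IMPORT_PATTERN)) {
      const specifier = match[1] ?? "";
      if (!specifier.startsWith(".")) continue; // package imports are not file nodes
      const base = resolveRelative(filePath, specifier);
      const target = CANDIDATE_SUFFIXES.map((s) => base + s).find((p) => knownFiles.has(p));
      if (target) edges.push({ source: filePath, target, kind: "import" });
    }
  }
  return edges;
}
```

Imports that do not resolve to a known file are dropped rather than turned into dangling edges, which keeps every edge endpoint inside the graph.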
flowchart TD
Input[File Tree & Manifests] --> Build[buildGraph Engine]
Build --> Nodes[Generate Nodes: Repo, Dir, File]
Build --> Edges[Resolve Edges]
Edges --> Hierarchy[Parent-Child Links]
Edges --> Imports[Static Import Resolution]
Nodes --> Styling[Apply Colors & Sizes]
Styling --> Output[Graph Data Structure]
The diagram shows the sequential flow from raw data ingestion to the final graph data structure.
Once the graph structure is defined, layout algorithms determine the x and y coordinates for every node. The system supports multiple layout strategies to accommodate different architectural views.
The Dagre engine is used to create structured, directed layouts. It is particularly effective for visualizing dependency chains and request lifecycles.
- Top-to-Bottom (TB): Positions source nodes at the top and dependents below.
- Left-to-Right (LR): Positions entry points on the left, flowing towards utilities and databases on the right. Sources: server/graph/layout.test.ts:6-37, README.md:145-147
Each node is assigned visual properties based on its type or file extension to aid in rapid identification. Sources: server/graph/layout.test.ts:47-65
| Node Type / Extension | Visual Color | Size (Radius) |
|---|---|---|
| Repository | `#a78bfa` (Violet) | 14 |
| Folder | `#fbbf24` (Amber) | 12 |
| .ts / .tsx | `#3b82f6` (Blue) | 8 |
| .js / .jsx | `#facc15` (Yellow) | 8 |
| Unknown File | `#9ca3af` (Grey) | 8 |
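The styling table can be expressed as a small lookup. This is a sketch only: the colors and radii come from the table, while the `NodeKind` type and the `styleForNode` function name are assumptions made for illustration.

```typescript
interface NodeStyle {
  color: string;
  radius: number;
}

// Hypothetical node kinds for this sketch.
type NodeKind = "repo" | "folder" | "file";

const REPO_STYLE: NodeStyle = { color: "#a78bfa", radius: 14 };
const FOLDER_STYLE: NodeStyle = { color: "#fbbf24", radius: 12 };
const DEFAULT_FILE_STYLE: NodeStyle = { color: "#9ca3af", radius: 8 };

const EXTENSION_STYLES: Record<string, NodeStyle> = {
  ".ts": { color: "#3b82f6", radius: 8 },
  ".tsx": { color: "#3b82f6", radius: 8 },
  ".js": { color: "#facc15", radius: 8 },
  ".jsx": { color: "#facc15", radius: 8 },
};

// Pick a style by node kind first, then by file extension, falling back to grey.
function styleForNode(kind: NodeKind, path = ""): NodeStyle {
  if (kind === "repo") return REPO_STYLE;
  if (kind === "folder") return FOLDER_STYLE;
  const dot = path.lastIndexOf(".");
  const extension = dot >= 0 ? path.slice(dot).toLowerCase() : "";
  return EXTENSION_STYLES[extension] ?? DEFAULT_FILE_STYLE;
}
```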
gitSdm provides two distinct algorithms for generating Mermaid.js flowcharts: Programmatic Scoring and AI Synthesis.
The generateProgrammaticMermaid function uses a connectivity-based scoring system to select the most relevant nodes for a compact diagram (limited to the top 25 nodes).
- Connectivity Score: Sum of incoming and outgoing edges.
- Class Bonus: Nodes identified as `entry` receive a +10 score boost; `importantFiles` receive +5.
- Grouping: Nodes are clustered into `subgraph` blocks based on their directory paths.
Sources: src/components/viz/architecture/mermaid-generator.ts:25-58
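The scoring rules above can be sketched as follows. The node and edge shapes, the function names, and the choice to let the two bonuses stack are assumptions for illustration; only the +10/+5 bonuses and the limit of 25 come from the description.

```typescript
// Hypothetical graph shapes for this sketch.
interface DiagramNode {
  id: string;
  path: string;
  classes?: string[];
}
interface DiagramEdge {
  source: string;
  target: string;
}

const MAX_DIAGRAM_NODES = 25;
const ENTRY_BONUS = 10;
const IMPORTANT_FILE_BONUS = 5;

// Score = incoming + outgoing edges, plus class bonuses; keep the top `limit` nodes.
function selectTopNodes(
  nodes: DiagramNode[],
  edges: DiagramEdge[],
  importantFiles: Set<string>,
  limit: number = MAX_DIAGRAM_NODES,
): DiagramNode[] {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.source, (degree.get(edge.source) ?? 0) + 1);
    degree.set(edge.target, (degree.get(edge.target) ?? 0) + 1);
  }
  const scored = nodes.map((node) => {
    let score = degree.get(node.id) ?? 0;
    if (node.classes?.includes("entry")) score += ENTRY_BONUS;
    if (importantFiles.has(node.path)) score += IMPORTANT_FILE_BONUS;
    return { node, score };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, limit).map((entry) => entry.node);
}

// Cluster nodes by parent directory so each group can become one subgraph block.
function groupByDirectory(nodes: DiagramNode[]): Map<string, DiagramNode[]> {
  const groups = new Map<string, DiagramNode[]>();
  for (const node of nodes) {
    const slash = node.path.lastIndexOf("/");
    const directory = slash >= 0 ? node.path.slice(0, slash) : "(root)";
    const group = groups.get(directory) ?? [];
    group.push(node);
    groups.set(directory, group);
  }
  return groups;
}
```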
The AI generator utilizes a large language model (LLM) to identify conceptual "subgraphs" and "classes" that static analysis might miss. It groups files into logical blocks like "Controllers", "Services", and "Database". Sources: server/ai/tasks/diagram.ts:16-43
sequenceDiagram
participant UI as Architecture View
participant GS as Mermaid Generator
participant AI as AI Task Handler
UI->>GS: Request Enhanced Diagram
GS->>AI: executeAiTask (mermaid)
Note over AI: Analyzes Repository Analysis Context
AI-->>GS: Mermaid Code Block (graph LR)
GS->>GS: Sanitize IDs & Clean Scripts
GS-->>UI: Rendered SVG Flowchart
The sequence diagram illustrates the request flow for generating an AI-enhanced architecture diagram.
The graph algorithms also calculate metrics used for the "Overview" and "Health" dashboards.
The system calculates the "degree" of each node by summing its source and target edges. This is used to identify High Coupling points—files or modules that have a high number of dependencies or dependents, indicating potential architectural bottlenecks. Sources: src/components/viz/OverviewTab.tsx:50-59
During graph building, files are tagged with specific classes used for both styling and AI prompting:
- `entry`: Main application entry points (e.g., `main.tsx`, `index.ts`).
- `router`: API handlers or UI route definitions.
- `service`: Core business logic.
- `util`: Helper functions and parsers.
Sources: server/ai/tasks/diagram.ts:22-28, src/components/viz/architecture/mermaid-generator.ts:92-96
The Graph Building Algorithms in gitSdm provide the technical foundation for repository visualization. By combining deterministic hierarchy extraction with import resolution and sophisticated layout engines like Dagre, the system creates a spatial representation of code. This is further enhanced by scoring algorithms and AI synthesis that distill complex trees into readable architecture diagrams, highlighting high-coupling risks and core execution flows. Sources: README.md:144-150, server/graph/graph-builder.test.ts:4-66
The GitHub API Integration serves as the primary data ingestion layer for the gitSdm platform. Its purpose is to interface with GitHub's REST and Git Data APIs to retrieve repository metadata, file structures, manifest contents, and activity metrics. This system enables the application to transform a standard GitHub URL into a structured dataset for visualization and AI analysis.
The integration is architected to handle both authenticated and unauthenticated requests, incorporating a mock data layer for development and testing environments. It acts as the foundation for downstream features such as the Interactive Visualization and AI-powered insights by providing the raw source code and structural data required for parsing.
Sources: README.md:143-149, server/services/analyze-repo.ts:18-50
The integration follows a linear pipeline to move from a user-provided URL to a fully analyzed repository object. The process begins with URL parsing, followed by metadata retrieval, and culminates in a deep tree traversal to build the project's file hierarchy.
The following diagram illustrates the sequence of operations performed during a repository analysis request:
flowchart TD
URL[GitHub URL Input] --> Parse[Parse URL: Owner/Repo/Branch]
Parse --> Info[Fetch Repo Metadata & SHA]
Info --> Tree[Fetch Flat File Tree]
Tree --> Filter[Filter Manifests & Source Files]
Filter --> Content[Fetch File Contents]
Content --> Final[Aggregate Analysis Object]
style URL fill:#238636,stroke:#fff
style Final fill:#1f6feb,stroke:#fff
The analysis pipeline coordinates multiple asynchronous calls to ensure all necessary data (tree structure, contributors, and timeline) is available before passing the data to the graph builder and dependency analyzer. Sources: src/components/home/HowItWorks.tsx:4-11, server/services/analyze-repo.ts:32-60
The project utilizes the Octokit library to interact with GitHub. The services are divided into metadata retrieval, tree fetching, and content extraction.
Metadata such as star counts, descriptions, and default branches are fetched via the repos.get endpoint. For structural analysis, the system uses git.getTree with the recursive parameter to obtain a flat list of all objects in the repository.
| Function | Endpoint/Action | Purpose |
|---|---|---|
| `fetchRepoInfo` | `repos.get` | Retrieves fullName, stars, forks, and license. |
| `fetchFlatTree` | `git.getTree` | Fetches all file paths and SHAs recursively. |
| `fetchFileContents` | `repos.getContent` | Downloads raw content for specific files (e.g., package.json). |
| `fetchTimeline` | `repos.listCommits` | Aggregates commit activity over recent weeks. |
Sources: server/github/fetch-tree.ts, server/services/analyze-repo.ts:34-40
The system supports Personal Access Tokens (PAT) to increase API rate limits. Tokens are passed from the client via the X-GitHub-Token header and utilized by the server-side Octokit instance.
// Client-side header injection
function getGitHubTokenHeader(): Record<string, string> {
try {
const token = localStorage.getItem('gitsdm_github_pat');
return token ? { 'X-GitHub-Token': token } : {};
} catch {
return {};
}
}
Sources: src/lib/apiClient.ts:21-27, README.md:104-106
To facilitate development without hitting GitHub API limits, the integration includes a robust mocking layer. If the repository owner is identified as mock, the system redirects all calls to a local database of predefined repository structures.
sequenceDiagram
participant S as Service Layer
participant F as fetch-tree.ts
participant M as mock-data.ts
participant G as GitHub API
S->>F: fetchRepoInfo("mock", "gitsdm")
F->>F: isMockRepo("mock")?
alt is Mock
F->>M: fetchMockRepoInfo()
M-->>F: Return static JSON
else is Real
F->>G: octokit.repos.get()
G-->>F: Return GitHub Response
end
F-->>S: Return Unified RepoInfo
Sources: server/github/mock-data.ts:4-6, server/github/fetch-tree.test.ts:110-120
The parseGitHubUrl utility is responsible for decomposing various GitHub URL formats into their constituent parts: owner, repo, and branch. It supports standard browser URLs, deep-linked file paths, and shorthand owner/repo strings.
| Input Type | Example | Extracted Owner | Extracted Repo |
|---|---|---|---|
| Full URL | `https://github.com/mbayue/gitSdm` | mbayue | gitSdm |
| Branch URL | `https://github.com/owner/repo/tree/dev` | owner | repo (branch: dev) |
| Shorthand | `owner/repo` | owner | repo |
Sources: server/github/parse-url.ts, server/services/analyze-repo.ts:20-25
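A minimal sketch of this kind of URL decomposition is shown below. It is not the code in `server/github/parse-url.ts`; the function name `parseGitHubUrlSketch`, the `ParsedRepo` shape, and the validation pattern are assumptions. Branch names that contain slashes are ambiguous in a URL path, so this sketch takes only the first segment after `tree/` or `blob/`.

```typescript
interface ParsedRepo {
  owner: string;
  repo: string;
  branch?: string;
}

// Conservative character set for owner and repository names (an assumption).
const NAME_PATTERN = /^[A-Za-z0-9_.-]+$/;

function parseGitHubUrlSketch(input: string): ParsedRepo | null {
  const path = input
    .trim()
    .replace(/^(?:https?:\/\/)?(?:www\.)?github\.com\//i, "") // drop protocol and host
    .replace(/\/+$/, ""); // drop trailing slashes
  const [owner, rawRepo, kind, ref] = path.split("/");
  if (!owner || !rawRepo) return null;
  const repo = rawRepo.replace(/\.git$/, "");
  if (!NAME_PATTERN.test(owner) || !NAME_PATTERN.test(repo)) return null;
  // `/tree/<ref>` and `/blob/<ref>/...` both carry a branch or ref name.
  if ((kind === "tree" || kind === "blob") && ref) {
    return { owner, repo, branch: ref };
  }
  return { owner, repo };
}
```

Returning `null` for unrecognized input lets the caller decide how to report an invalid URL instead of throwing from the parser.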
The GitHub API Integration provides the essential data bridge between raw GitHub repositories and the gitSdm intelligence engine. By abstracting the complexities of tree traversal, authentication, and rate-limiting through a unified service layer, the platform ensures consistent data availability for both its visualization canvas and AI diagnostic tools. Sources: README.md:143-155, server/services/analyze-repo.ts:80-100
The Caching Layer in gitSdm is a centralized in-memory storage system designed to optimize performance by reducing redundant API calls to GitHub and AI providers. It utilizes the Least Recently Used (LRU) eviction strategy to manage memory efficiently, ensuring that frequently accessed data—such as repository analyses, AI-generated explanations, and search results—is readily available while older, less relevant data is purged.
This layer serves as a critical performance bridge between the Repository Analysis Service and external data sources. By caching expensive operations like full repository scans and semantic search results, the system significantly improves response times for end-users and helps mitigate rate-limiting issues from external providers.
Sources: server/cache/lru.ts:1-72, README.md: Architecture Section
The caching system is partitioned into specific "buckets," each managed by an independent LRUCache instance. This separation allows for granular control over Time-To-Live (TTL) values and maximum entry limits based on the specific data type.
| Bucket Name | Target Data Type | Max Size | TTL (Time-To-Live) |
|---|---|---|---|
| `analyzeCache` | Full repository analysis results | 200 entries | 60 Minutes |
| `aiCache` | AI task results (summaries, roasts, etc.) | 200 entries | 30 Minutes |
| `searchCache` | Semantic search and QA results | 500 entries | 60 Minutes |
| `indexCache` | Vector store indices and metadata | 50 entries | 120 Minutes |
Sources: server/cache/lru.ts:10-33
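To make the "max size plus TTL" behavior concrete, here is a dependency-free sketch of an LRU cache with expiry. It is not the `LRUCache` used in `server/cache/lru.ts`; the class name `TtlLruCache` and the injectable clock are assumptions made so the example is self-contained and testable.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class TtlLruCache<V> {
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    const entry = this.entries.get(key);
    return entry !== undefined && entry.expiresAt > this.now();
  }

  delete(key: string): boolean {
    return this.entries.delete(key);
  }
}
```

Under this sketch, a bucket such as `analyzeCache` would correspond to `new TtlLruCache(200, 60 * 60 * 1000)`.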
The cache object provides a unified CacheStore interface (get, set, has, delete). Internally, the getBucket utility function routes requests to the appropriate LRUCache instance by inspecting the string prefix of the cache key.
flowchart TD
Req[Cache Request] --> Prefix{Key Prefix?}
Prefix -- "ai:" --> AI[aiCache]
Prefix -- "search:" --> SEARCH[searchCache]
Prefix -- "index:" --> INDEX[indexCache]
Prefix -- default --> ANALYZE[analyzeCache]
AI --> Op[Execute Get/Set/Delete]
SEARCH --> Op
INDEX --> Op
ANALYZE --> Op
Sources: server/cache/lru.ts:35-58
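The prefix routing in the diagram can be sketched as a single function. The prefixes come from the flowchart; the `BucketName` type, the function name `bucketNameForKey`, and the use of plain `Map` objects in place of LRU instances are assumptions for illustration.

```typescript
type BucketName = "ai" | "search" | "index" | "analyze";

// Route a cache key to a bucket by its string prefix; unknown prefixes fall
// through to the analysis bucket, matching the "default" branch in the diagram.
function bucketNameForKey(key: string): BucketName {
  if (key.startsWith("ai:")) return "ai";
  if (key.startsWith("search:")) return "search";
  if (key.startsWith("index:")) return "index";
  return "analyze";
}

// Plain Maps stand in for the per-bucket LRU caches in this sketch.
const bucketMaps: Record<BucketName, Map<string, unknown>> = {
  ai: new Map(),
  search: new Map(),
  index: new Map(),
  analyze: new Map(),
};

// A unified store facade that hides the bucket selection from callers.
const cacheSketch = {
  get: (key: string): unknown => bucketMaps[bucketNameForKey(key)].get(key),
  set: (key: string, value: unknown): void => {
    bucketMaps[bucketNameForKey(key)].set(key, value);
  },
  has: (key: string): boolean => bucketMaps[bucketNameForKey(key)].has(key),
  delete: (key: string): boolean => bucketMaps[bucketNameForKey(key)].delete(key),
};
```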
Cache keys are constructed deterministically to ensure that identical requests map to the same cached value. The system includes specialized functions for generating keys for different domains.
- Analysis Keys: Combines owner, repository name, commit SHA, and optional branch name.
- AI Keys: Combines the task kind, repository identifiers, and a unique `contextHash`.
- Context Hashing: The `hashContext` function generates a base-36 string hash from input strings (like query parameters or code snippets) to ensure key length remains manageable and consistent.
Sources: server/cache/lru.ts:68-96, server/search/constants.test.ts:98-120
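The description says only that `hashContext` returns a base-36 string; the hash function itself is not shown here. The sketch below assumes a djb2-style 32-bit hash, which is a stand-in and may differ from the actual implementation. The name `hashContextSketch` and the separator character are hypothetical.

```typescript
// Deterministic, non-cryptographic hash of the joined inputs, rendered in base 36.
function hashContextSketch(...parts: string[]): string {
  // A NUL separator keeps ("ab") and ("a", "b") from hashing identically.
  const input = parts.join("\u0000");
  let hash = 5381;
  for (let i = 0; i < input.length; i += 1) {
    // Multiply by 33, mix in the next code unit, and clamp to unsigned 32 bits.
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash.toString(36);
}
```

A 32-bit hash keeps keys short but can collide, so it suits cache keys where a rare collision costs only a stale or mismatched cache entry, not correctness-critical identifiers.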
// server/cache/lru.ts:68-80
export function analyzeCacheKey(owner: string, repo: string, sha: string, branch?: string): string {
return branch
? `analyze:${owner}/${repo}@${sha}:${branch}`
: `analyze:${owner}/${repo}@${sha}`;
}
export function aiCacheKey(
kind: string,
owner: string,
repo: string,
sha: string,
contextHash: string,
discriminator?: string,
): string {
return discriminator
? `ai:${kind}:${owner}/${repo}@${sha}:${contextHash}:${discriminator}`
: `ai:${kind}:${owner}/${repo}@${sha}:${contextHash}`;
}
The system supports both global and targeted cache invalidation.
- Global Clear: The `clearAllCaches()` function resets all four buckets simultaneously.
- Targeted Search Invalidation: The `invalidateSearchCache(owner, repo)` function iterates through the `searchCache` keys and removes entries that match the specific `owner/repo` prefix. This is used when a repository is re-indexed or updated.
- TTL Expiry: Each bucket automatically expires entries based on its configured TTL (in milliseconds).
Sources: server/cache/lru.ts:60-70, server/cache/lru.test.ts:33-60
The caching layer is deeply integrated into the analyzeRepository service. The service checks for a cached result using a generated key before proceeding with expensive operations like fetching trees or parsing dependencies.
sequenceDiagram
participant S as analyze-repo.ts
participant C as lru.ts
participant G as GitHub API
S->>C: analyzeCacheKey(owner, repo, sha)
C-->>S: return cacheKey
S->>C: cache.get(cacheKey)
alt Cache Hit
C-->>S: return RepoAnalysis
else Cache Miss
S->>G: Fetch Flat Tree & Manifests
G-->>S: return Data
S->>S: buildGraph & analyzeDependencies
S->>C: cache.set(cacheKey, analysis)
end
Sources: server/services/analyze-repo.ts:28-36, server/ai/tasks/explain.ts:21-25
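The hit/miss flow in the sequence diagram is a cache-aside pattern, which can be sketched generically. The `CacheStoreLike` interface and the `getOrCompute` helper are hypothetical names; in the real service the compute step would be the tree fetch, graph build, and dependency analysis.

```typescript
// Minimal store contract needed for this sketch; a Map satisfies it.
interface CacheStoreLike<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// Return the cached value when present; otherwise compute, store, and return it.
async function getOrCompute<V>(
  cache: CacheStoreLike<V>,
  key: string,
  compute: () => Promise<V>,
): Promise<V> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit: skip the expensive work
  const fresh = await compute(); // cache miss: run the expensive pipeline
  cache.set(key, fresh);
  return fresh;
}
```

Note that this sketch does not deduplicate concurrent misses: two simultaneous requests for the same key would both run `compute`.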
As noted in the project documentation and roast mock-ups, the current LRU implementation resides in-memory. In serverless environments (like Vercel functions), this cache is subject to resets during "cold starts," meaning cache persistence is limited to the lifecycle of the active server instance.
Sources: README.md: 🧩 Core Features, server/ai/tasks/playground.ts:33-35
The Vector Store & Embeddings system provides the foundation for gitSdm's semantic search and AI-powered Question & Answering (QA) capabilities. This system enables the platform to perform context-aware searches across a repository's codebase by transforming raw source code into high-dimensional numerical vectors.
The primary purpose of this module is to support the "AI-powered semantic search & Q&A" feature, allowing users to locate entry points and ask technical questions about the project structure through a natural language interface. It integrates closely with the AI Provider layer to utilize Large Language Models (LLMs) for both generating embeddings and synthesizing answers based on retrieved code context.
Sources: README.md:162, server/search/types.ts:101-125
The search system is built on a decoupled architecture consisting of four main interfaces: Chunker, Embedding Provider, Vector Store, and QA/Search Engines. This modularity allows the project to swap out AI providers (such as Gemini, OpenAI, or Anthropic) while maintaining a consistent internal data flow for indexing and retrieval.
classDiagram
class Chunker {
+chunkFile(content, filePath, language) Chunk[]
}
class EmbeddingProvider {
+embed(text) EmbeddingResult
+embedBatch(texts) EmbeddingResult[]
+dimensions int
}
class VectorStore {
+addChunks(chunks) void
+search(queryVector, repoKey, topK, minScore) SearchResult[]
+removeByRepo(repoKey) void
}
class SearchEngine {
+search(options) SearchResponse
}
class QAEngine {
+ask(options) QAResponse
}
QAEngine ..> SearchEngine : utilizes
SearchEngine ..> VectorStore : queries
SearchEngine ..> EmbeddingProvider : vectorizes query
The diagram shows the relationship between core search interfaces and the hierarchical flow from user query to vector retrieval.
Sources: server/search/types.ts:5-131
The indexing process transforms a repository's file tree into a searchable vector index. This involves traversing the repository, breaking files into manageable chunks, and generating embeddings for each chunk.
- File Chunking: The `Chunker` processes file content into `Chunk` objects, retaining metadata like start/end lines and programming language.
- Vectorization: The `EmbeddingProvider` converts text chunks into `Float32Array` vectors (typically normalized to unit length).
- Storage: `IndexedChunk` objects containing the vector and original metadata are added to the `VectorStore`.
flowchart TD
A[Source Code File] --> B[Chunker]
B --> C[Text Chunks]
C --> D[Embedding Provider]
D --> E[Vector Generation]
E --> F[Vector Store]
F --> G[(Indexed Repository)]
This flowchart illustrates the transformation of source code into a searchable vector index.
Sources: server/search/types.ts:5-84, server/search/vector-store.test.ts:16-30
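The chunking step can be illustrated with a fixed-size, line-based splitter. This is a sketch under stated assumptions: the actual `Chunker` strategy is not described here, and the `maxLines` and `overlap` parameters, the `SketchChunk` shape, and the function name are hypothetical.

```typescript
// Hypothetical chunk shape carrying the metadata named in the text.
interface SketchChunk {
  filePath: string;
  language: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  content: string;
}

// Split a file into windows of at most `maxLines` lines, repeating `overlap`
// lines between neighbors so context at window boundaries is not lost.
function chunkByLines(
  content: string,
  filePath: string,
  language: string,
  maxLines = 40,
  overlap = 5,
): SketchChunk[] {
  if (maxLines <= overlap) throw new Error("maxLines must exceed overlap");
  const lines = content.split("\n");
  const step = maxLines - overlap;
  const chunks: SketchChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      filePath,
      language,
      startLine: start + 1,
      endLine: end,
      content: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return chunks;
}
```

Keeping `startLine` and `endLine` on every chunk is what later allows search results to cite an exact line range.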
The system tracks the lifecycle of an indexing operation through a defined state machine.
| State | Description |
|---|---|
| `idle` | No indexing operation currently active for the repository. |
| `indexing` | Progressing through files; includes progress, filesProcessed, and totalFiles. |
| `complete` | Indexing finished; includes chunkCount and completion timestamp. |
| `failed` | Operation halted due to error; includes error details and failedFiles count. |
Sources: server/search/types.ts:90-103
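A state machine like this maps naturally onto a discriminated union. The sketch below assumes the field names listed in the table; `completedAt` and the type of `error` are guesses, since the source names only a "completion timestamp" and "error details".

```typescript
// One variant per state; each carries only the fields valid in that state.
type IndexingStatus =
  | { state: "idle" }
  | { state: "indexing"; progress: number; filesProcessed: number; totalFiles: number }
  | { state: "complete"; chunkCount: number; completedAt: number }
  | { state: "failed"; error: string; failedFiles: number };

// The switch narrows `status`, so state-specific fields are type-checked.
function describeStatus(status: IndexingStatus): string {
  switch (status.state) {
    case "idle":
      return "No indexing in progress";
    case "indexing":
      return `Indexing ${status.filesProcessed}/${status.totalFiles} files`;
    case "complete":
      return `Indexed ${status.chunkCount} chunks`;
    case "failed":
      return `Failed: ${status.error} (${status.failedFiles} files)`;
  }
}
```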
The VectorStore serves as the retrieval engine. It is designed to isolate chunks by repoKey (formatted as "owner/repo") to ensure search results are scoped to specific projects.
Retrieval is performed using cosine similarity. The store calculates the similarity between a queryVector and each indexed chunk within a specific repository.
- Scoring: Results are returned with a score between 0.0 and 1.0, representing the cosine similarity.
- Filtering: Users can specify a `minScore` threshold to filter out low-relevance results.
- Ranking: Results are sorted in descending order by similarity score.
Sources: server/search/vector-store.test.ts:98-125, server/search/types.ts:56-61
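The retrieval steps can be sketched as a brute-force scan. This is an illustration rather than the actual `VectorStore`: the `SketchIndexedChunk` shape and function names are assumptions. In general cosine similarity can be negative for arbitrary vectors, so the `minScore` filter also removes such results.

```typescript
// Hypothetical stored record: a vector plus the metadata needed for scoping.
interface SketchIndexedChunk {
  repoKey: string;
  filePath: string;
  vector: Float32Array;
}
interface SketchSearchResult {
  chunk: SketchIndexedChunk;
  score: number;
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector is all zeros.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Scope to one repository, score, filter by threshold, rank, and take the top K.
function searchChunks(
  chunks: SketchIndexedChunk[],
  queryVector: Float32Array,
  repoKey: string,
  topK = 5,
  minScore = 0,
): SketchSearchResult[] {
  return chunks
    .filter((chunk) => chunk.repoKey === repoKey)
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .filter((result) => result.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

A linear scan is adequate for a single repository held in memory; larger indexes would typically need an approximate nearest-neighbor structure.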
Chunk Metadata: The system stores rich context alongside the vector to allow for precise citations and code snippet rendering in the UI.
| Field | Type | Description |
|---|---|---|
| `filePath` | `string` | The path of the source file. |
| `startLine` | `number` | The starting line of the chunk within the file. |
| `endLine` | `number` | The ending line of the chunk. |
| `content` | `string` | The raw code content for UI display. |
| `repoKey` | `string` | Unique identifier (owner/repo). |
| `commitSha` | `string` | The SHA of the commit when indexed. |
Sources: server/search/types.ts:31-40
The QAEngine and SearchEngine provide the higher-level logic for interacting with the vector data.
- Semantic Search: Converts a natural language query into a vector and retrieves the `topK` most relevant code chunks from the store.
- QA Answer Synthesis: Uses the `QAEngine` to take a user's question, retrieve relevant chunks as context, and generate a markdown-formatted answer via the AI Provider.
- Citations: The `QAResponse` includes a `Citation` array (file paths and line ranges) to link the AI's answer back to the actual source code.
Sources: server/search/types.ts:107-131, server/ai/tasks/explain.ts:122-135
The embedding generation and answer synthesis rely on the AIProvider interface. The project supports multiple backends which can be configured via environment variables.
| Provider | Model Default | Capability |
|---|---|---|
| Gemini | `gemini-2.5-flash` | Content generation and embedding logic. |
| OpenAI | `gpt-4o-mini` | Standard Chat Completion and JSON mode. |
| Anthropic | `claude-3-5-haiku-latest` | High-quality analysis and synthesis. |
| Mock | N/A | Local development and testing without API keys. |
Sources: server/ai/provider.ts:40-75, README.md:95-103
The Vector Store & Embeddings module enables gitSdm to move beyond basic file exploration into deep code intelligence. By chunking, vectorizing, and indexing repositories using advanced LLM providers, the system allows developers to perform semantic searches and receive context-aware answers to complex architectural questions. The modular design ensures that as vector database technologies or embedding models evolve, the core repository analysis pipeline remains stable and extensible.
Sources: README.md:14-18, server/search/types.ts:1-131
Global state management in gitSdm is primarily handled through Zustand, which acts as a centralized store for managing the interactive dependency graph, UI panel states, and AI-driven analysis results. The architecture focuses on decoupling the heavy computational layout tasks from the UI, ensuring high-performance interactions within the React Flow canvas and the multi-tabbed AI sidebar.
Sources: README.md:92, README.md:164, server/ai/tasks/playground.ts:258-260
The system utilizes a central store, vizStore, to coordinate between the repository analysis engine and the frontend presentation layers. This store manages graph filters, node selections, and the visibility of inspection panels. By centralizing these states, gitSdm maintains synchronization across disparate UI components such as the OverviewTab, AiCenterTab, and the GraphCanvas.
| State Category | Description | Primary File/Component |
|---|---|---|
| Graph Interaction | Selection of nodes, focusing file paths, and branch comparison states. | src/components/viz/OverviewTab.tsx |
| AI Context | Tracking ELI5 mode, active playground tools, and current architectural explanations. | src/components/viz/ai-sidebar/AiCenterTab.tsx |
| UI Layout | Managing inspector visibility and sidebar tab navigation. | src/components/viz/ai-sidebar/AiCenterTab.tsx |
Sources: src/components/viz/OverviewTab.tsx:28-32, src/components/viz/ai-sidebar/AiCenterTab.tsx:90-112
Data flows from the backend AI task handlers (such as explain, refactor, and playground) into the global state, which then hydrates the UI. The state management layer handles the transition between raw repository analysis and the interactive visualization.
flowchart TD
API[Backend API Tasks] -->|AI JSON/Markdown| Store[Zustand vizStore]
Store -->|selectedNodeId| RF[React Flow Canvas]
Store -->|eli5Mode| AI[AiCenterTab]
Store -->|inspectorOpen| UI[Inspector Panel]
RF -->|onNodeClick| Store
AI -->|toggleEli5| Store
The diagram shows how the vizStore acts as a central hub between backend AI task outputs and frontend interactive components. Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:90-112, server/ai/tasks/explain.ts:25-30
When a user interacts with the graph or the file list, the state is updated globally to trigger side effects such as camera centering in the React Flow viewport and updating the AI context for the AiCenterTab.
// Example of global state interaction for node focusing
const focusOnNode = useCallback((nodeId: string, filePath?: string | null) => {
useVizStore.getState().setSelectedNodeId(nodeId);
if (filePath !== undefined) {
useVizStore.getState().setFocusedFilePath(filePath);
}
// React Flow centering logic follows...
}, [setCenter, getNode]);
Sources: src/components/viz/OverviewTab.tsx:28-33
The AiCenterTab manages a complex sub-state specifically for AI interactions. This includes toggleable modes like ELI5 (Explain Like I'm 5) and specific playground tools like the Repo Roast or README Enhancer. These states are often cached or managed via hooks like useAiCenterState to prevent redundant AI provider calls.
- eli5Mode: A boolean flag that modifies the `userPrompt` sent to AI providers to request simplified explanations.
- activePlayground: Tracks which creative tool (`roast`, `readme`) is currently active.
- pendingToolRequests: A set of active request keys used to prevent duplicate concurrent AI tasks.
Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:94-112, server/ai/tasks/explain.ts:47-49, server/ai/tasks/playground.ts:27-30
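The duplicate-request guard described for `pendingToolRequests` can be sketched with a plain `Set`. The class name, the key format, and the `toolRequestKey` helper are hypothetical and are not taken from `useAiCenterState`.

```typescript
// Tracks in-flight request keys so the same AI task is not started twice.
class PendingRequests {
  private readonly keys = new Set<string>();

  // Returns true when the caller may start the request, false when it is a duplicate.
  begin(key: string): boolean {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }

  // Must be called when the request settles, whether it succeeded or failed.
  end(key: string): void {
    this.keys.delete(key);
  }
}

// Hypothetical key format: the ELI5 flag is part of the key because it changes the prompt.
function toolRequestKey(tool: string, repoKey: string, eli5Mode: boolean): string {
  return `${tool}:${repoKey}:${eli5Mode ? "eli5" : "technical"}`;
}
```

In a component, `end` would typically be called from a `finally` block so a failed request does not block later retries.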
gitSdm supports visual branch comparison, which is reflected in the global state through graphDiff. This state tracks sets of added, modified, and deleted node IDs. The OverviewTab consumes this state to render filtered file lists and stat summaries.
sequenceDiagram
participant User
participant Store as vizStore
participant Tab as OverviewTab
User->>Tab: Select Branch to Compare
Tab->>Store: setCompareBranch(true)
Store-->>Tab: Provide graphDiff (Set IDs)
Tab->>Tab: Filter analysis.graph.nodes
Tab-->>User: Render Added/Modified/Deleted lists
Sequence of events during a branch comparison operation managed via global state. Sources: src/components/viz/OverviewTab.tsx:50-55, src/components/viz/OverviewTab.tsx:78-100
The following table summarizes the key state-related functions and their locations:
| Function / Hook | Responsibility | File Path |
|---|---|---|
| `useVizStore` | Main global state hook for graph and UI control. | src/stores/vizStore.ts |
| `useAiCenterState` | Manages transient state for AI sidebar tabs and caching. | src/components/viz/ai-sidebar/hooks/useAiCenterState.ts |
| `setSelectedNodeId` | Updates the globally active node for inspector and AI context. | src/components/viz/OverviewTab.tsx |
| `toggleEli5Mode` | Switches the AI instruction set between Technical and ELI5. | src/components/viz/ai-sidebar/AiCenterTab.tsx |
Sources: README.md:92, src/components/viz/OverviewTab.tsx:29, src/components/viz/ai-sidebar/AiCenterTab.tsx:94-100
Global state management in gitSdm is designed to facilitate a "graph-first" experience. By utilizing Zustand for core architectural states and specialized hooks for AI task management, the system ensures that user interactions on the visual canvas are immediately reflected in the analytical sidebars, providing a cohesive codebase exploration environment.
The AI Provider Integration system in gitSdm serves as a multi-model abstraction layer that enables the platform to generate repository insights, architectural summaries, and code explanations. By supporting various Large Language Model (LLM) providers, the system stays flexible and can fall back to a mock provider for development or when no API keys are configured.
Sources: server/ai/provider.ts:1-7, README.md:144-149
The integration follows a factory pattern, where a central provider factory instantiates specific implementations based on environment variables or user-supplied API keys. All providers implement a unified AIProvider interface, ensuring that the rest of the application remains agnostic of the specific LLM being used.
The core of the system is the AIProvider interface, which defines a single complete method for handling asynchronous message exchanges.
export interface AIProvider {
complete(messages: Message[], options?: { json?: boolean }): Promise<string>;
}
Sources: server/ai/provider.ts:6-10
The following diagram illustrates the lifecycle of an AI request, from the frontend API client through the server router to the specific AI provider.
flowchart TD
Client[apiClient.ts] -->|POST /api/ai/*| Router[ai-routes.ts]
Router -->|Calls Task Handler| Task[tasks/explain.ts]
Task -->|executeAiTask| Service[service.ts]
Service -->|getAIProvider| ProviderFactory[provider.ts]
ProviderFactory -->|Returns| ProviderInstance[AIProvider Instance]
ProviderInstance -->|API Request| LLM[Gemini / OpenAI / Anthropic]
LLM -->|Response| Client
This flow ensures that authentication, caching, and task-specific logic are handled before interacting with the LLM. Sources: server/router/ai-routes.ts:38-42, server/ai/service.ts, src/lib/apiClient.ts:98-103
The system identifies and initializes providers using three primary methods: explicit environment variable configuration (AI_PROVIDER), API key pattern detection, or a manual key override passed from the client.
| Provider | Detection Pattern | Default Model | Env Variables |
|---|---|---|---|
| Google Gemini | Default / `gemini` | `gemini-2.5-flash` | `GEMINI_API_KEY`, `GEMINI_MODEL` |
| OpenAI | Starts with `sk-` | `gpt-4o-mini` | `OPENAI_API_KEY`, `OPENAI_MODEL` |
| Anthropic | Starts with `sk-ant-` | `claude-3-5-haiku-latest` | `ANTHROPIC_API_KEY`, `ANTHROPIC_MODEL` |
| Mock | `mock` | N/A | `AI_PROVIDER=mock` |
Sources: server/ai/provider.ts:12-23, server/ai/provider.ts:70-82, README.md:95-103
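The key-pattern detection in the table has one ordering subtlety: `sk-ant-` also starts with `sk-`, so the Anthropic check has to run first. The sketch below illustrates this; the function names and the rule that an explicit provider name takes precedence over key detection are assumptions, not confirmed behavior of `server/ai/provider.ts`.

```typescript
const PROVIDER_NAMES = ["gemini", "openai", "anthropic", "mock"] as const;
type ProviderName = (typeof PROVIDER_NAMES)[number];

function isProviderName(value: string): value is ProviderName {
  return (PROVIDER_NAMES as readonly string[]).includes(value);
}

// Guess the provider from the key prefix. The more specific prefix is tested
// first, because every "sk-ant-" key also matches "sk-".
function detectProviderFromKey(apiKey: string): ProviderName {
  if (apiKey.startsWith("sk-ant-")) return "anthropic";
  if (apiKey.startsWith("sk-")) return "openai";
  return "gemini"; // default when no known prefix matches
}

// Assumed precedence: explicit AI_PROVIDER value, then key detection, then mock.
function resolveProviderName(explicit: string | undefined, apiKey: string | undefined): ProviderName {
  if (explicit && isProviderName(explicit)) return explicit;
  if (!apiKey) return "mock";
  return detectProviderFromKey(apiKey);
}
```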
The getAIProvider function manages singleton instances of providers to avoid redundant creations, except when a user provides a specific overrideKey.
flowchart TD
Start([getAIProvider]) --> HasOverride{overrideKey?}
HasOverride -->|Yes| Fresh[Create Fresh Provider]
HasOverride -->|No| CheckCache{Instance Cached?}
CheckCache -->|Yes| Match{Key Matches Env?}
Match -->|Yes| Return[Return Cached Instance]
Match -->|No| Create[Create New Instance]
CheckCache -->|No| Create
Create --> Return
Sources: server/ai/provider.ts:241-255
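The decision flow in the diagram can be sketched as a closure that caches one provider instance per environment key. The simplified `SketchProvider` interface, the factory parameter, and the `readEnvKey` callback are assumptions introduced so the example is self-contained.

```typescript
// Simplified provider contract for this sketch only.
interface SketchProvider {
  complete(prompt: string): Promise<string>;
}
type ProviderFactory = (apiKey: string) => SketchProvider;

// Build a getter that follows the flow: override key -> fresh instance;
// otherwise reuse the cached instance while the environment key is unchanged.
function createProviderGetter(factory: ProviderFactory, readEnvKey: () => string) {
  let cached: { key: string; provider: SketchProvider } | null = null;

  return function getProvider(overrideKey?: string): SketchProvider {
    if (overrideKey) return factory(overrideKey); // user-supplied keys are never cached
    const envKey = readEnvKey();
    if (cached && cached.key === envKey) return cached.provider;
    const entry = { key: envKey, provider: factory(envKey) };
    cached = entry;
    return entry.provider;
  };
}
```

Skipping the cache for override keys avoids one user's key being reused for another user's request.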
To provide accurate architectural insights, the integration uses a specialized utility to convert raw repository analysis data into a structured context string for the LLM.
The buildRepoContext function aggregates metadata, directory structures, important files, and recent activity into a formatted prompt section. This ensures the LLM has a "mental map" of the project before answering specific queries.
Sources: server/ai/prompts.ts:3-36
The SYSTEM_PROMPT defines the AI's persona as a "principal software architect and expert code reviewer." It enforces core principles:
- Specificity: References real file names and directory structures.
- Technical Depth: Uses terms like "request lifecycle" and "module boundary."
- Veracity: Strictly forbids fabricating files not present in the provided context.
Sources: server/ai/prompts.ts:38-60
The integration supports several specialized tasks, each with its own prompt logic and response formatting.
The explainRepo function handles scoped queries (repository, node, or file) and supports "ELI5" (Explain Like I'm 5) mode for beginners.
| Scope | Heading Structure | Purpose |
|---|---|---|
| Repo | Overview, Architectural Style, Execution Flow | High-level system understanding |
| Node | What this does, Why it matters, Where to look next | Module-specific analysis |
| File | Purpose, Role in data flow, Related files | Code-level inspection |
Sources: server/ai/tasks/explain.ts:28-65
When no API keys are provided or AI_PROVIDER is set to mock, the system uses createMockProvider. This provider returns canned JSON responses and Markdown summaries based on keywords found in the user prompt (e.g., "architecture", "roast", "suggest"). This allows developers to test the UI and data flow without incurring LLM costs.
Sources: server/ai/provider.ts:133-238
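Keyword-based canned responses can be sketched in a few lines. The keywords come from the description above; the response text, the topic names, and the function name `mockCompleteSketch` are invented for illustration and do not reproduce `createMockProvider`.

```typescript
// Pick a canned response by scanning the prompt for known keywords.
function mockCompleteSketch(userPrompt: string, options: { json?: boolean } = {}): string {
  const prompt = userPrompt.toLowerCase();
  let topic = "general";
  if (prompt.includes("roast")) topic = "roast";
  else if (prompt.includes("suggest")) topic = "suggestions";
  else if (prompt.includes("architecture")) topic = "architecture";

  // JSON mode returns a parseable payload; otherwise return a Markdown summary.
  if (options.json) return JSON.stringify({ mock: true, topic });
  return `## Mock ${topic} response\n\nGenerated without calling an LLM.`;
}
```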
The AI Provider Integration in gitSdm provides a robust, extensible framework for translating complex codebase structures into human-readable insights. By abstracting provider-specific details and centralizing prompt engineering, the system maintains high technical accuracy while remaining flexible to new AI technologies.
AI Task Handlers constitute the core intelligence layer of the gitSdm platform, responsible for transforming raw repository analysis into actionable developer insights. These handlers interface between the backend services (like the GitHub client and repository analyzer) and various AI providers (Google Gemini, OpenAI, Anthropic) to perform specific analytical tasks such as architectural explanation, refactoring suggestions, and onboarding walkthroughs.
The system uses a standardized execution pattern via executeAiTask, which manages prompt construction, API communication, and response caching. This modular design allows the platform to offer features ranging from professional technical audits to creative "Repo Roasts," all while maintaining a consistent context provided by the repository's file structure and metadata.
Sources: server/ai/tasks/explain.ts, server/ai/provider.ts, README.md:92-105
The AI task system is organized into specialized modules within server/ai/tasks/, each focusing on a specific domain of repository intelligence. Requests are typically initiated via the API router and processed through a pipeline that builds repository context before invoking the AI provider.
When a specific task is requested (e.g., explainRepo), the handler performs the following steps:
- Context Building: Calls `analyzeRepository` to get the latest file tree, dependencies, and metadata.
- Prompt Engineering: Combines a global `SYSTEM_PROMPT` with task-specific instructions and the repository context using `buildRepoContext`.
- Task Execution: Passes the configuration to `executeAiTask`, which handles the actual LLM call and result caching.
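The three steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: `runTask`, `TaskDeps`, `callProvider`, the cache-key format, and the in-memory `Map` cache are hypothetical, and the `SYSTEM_PROMPT` text is a placeholder. The actual `executeAiTask` may cache and dispatch differently.

```typescript
// Hypothetical sketch of the three-step handler flow. Types, names, and the
// in-memory cache are assumptions for illustration only.
interface RepoAnalysis {
  fullName: string;
  files: string[];
}

interface TaskDeps {
  analyzeRepository(owner: string, repo: string): Promise<RepoAnalysis>;
  buildRepoContext(analysis: RepoAnalysis): string;
  callProvider(systemPrompt: string, userPrompt: string): Promise<string>;
}

const SYSTEM_PROMPT = 'You are a principal software architect.'; // placeholder text
const cache = new Map<string, string>();

async function runTask(
  deps: TaskDeps,
  task: string,
  owner: string,
  repo: string,
  instructions: string,
): Promise<string> {
  // 1. Context building: fetch the latest analysis for the repository.
  const analysis = await deps.analyzeRepository(owner, repo);

  // 2. Prompt engineering: task instructions plus the serialized repo context.
  const userPrompt = `${instructions}\n\n${deps.buildRepoContext(analysis)}`;

  // 3. Task execution: return a cached result when the same prompt was seen.
  const cacheKey = `${task}:${owner}/${repo}:${userPrompt}`;
  const cached = cache.get(cacheKey);
  if (cached !== undefined) return cached;

  const result = await deps.callProvider(SYSTEM_PROMPT, userPrompt);
  cache.set(cacheKey, result);
  return result;
}
```

Injecting the dependencies keeps the sketch testable without a network; including the prompt in the cache key means a changed repository context naturally misses the cache.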
flowchart TD
Router[AI Routes Handler] -->|Call| TaskHandler[Specific Task Handler]
TaskHandler -->|Fetch| Analysis[Repo Analysis Service]
TaskHandler -->|Build| Prompt[Repo Context & User Prompt]
TaskHandler -->|Execute| Service[AI Task Service]
Service -->|Request| AIProvider[AI Provider - Gemini/OpenAI/Anthropic]
AIProvider -->|Response| Service
Service -->|Return| Router
The diagram shows the standard flow from an incoming API request to the AI provider response. Sources: server/router/ai-routes.ts:32-135, server/ai/tasks/explain.ts:18-40, server/ai/tasks/playground.ts:16-45
The explanation handlers provide different levels of granularity for codebase understanding. They support specific scopes such as the entire repository, a graph node, or a single file.
| Function | Scope | Output Description |
|---|---|---|
| `explainRepo` | Repo/Node/File | Markdown explanation covering purpose, importance, and next steps. |
| `explainArchitecture` | Repository | JSON defining technical overview and specific architectural layers. |
| `explainRepoELI5` | Repository | Simplified, conversational explanation for beginners. |
Sources: server/ai/tasks/explain.ts:8-70, server/ai/tasks/onboarding.ts:79-115
These tasks are designed to reduce the time it takes for new developers to understand unfamiliar codebases.
- `suggestFiles`: Identifies 8-10 critical files to read first, categorized by priority (high, medium, low).
- `generateOnboarding`: Creates a 6-step walkthrough that builds a mental model from entry points to deployment.
- `generateLearningPath`: Produces a deep JSON structure including a "Mental Model," "Recommended Path," and "Execution Flow" mapping specific file-to-file data transitions.
Sources: server/ai/tasks/onboarding.ts:10-77, server/ai/tasks/playground.ts:109-150
Handlers in the refactor.ts module perform rigorous codebase assessments, scoring various dimensions of code quality.
graph TD
subgraph HealthDimensions[Health Assessment Scores]
M[Maintainability]
MOD[Modularity]
R[Readability]
A[Architecture]
C[Complexity]
end
RefactorTask[Refactor Handler] -->|Produces| Suggestions[Refactor Suggestions]
HealthTask[Health Handler] -->|Produces| HealthDimensions
Suggestions -->|Fields| Title[Title]
Suggestions -->|Fields| Risk[Risk Level: High/Med/Low]
Suggestions -->|Fields| Files[Affected Files]
Visualization of the data structures returned by quality-focused handlers. Sources: server/ai/tasks/refactor.ts:11-30, server/ai/tasks/refactor.ts:86-110
The AIProvider interface abstracts different LLM backends. The system detects the provider based on the AI_PROVIDER environment variable or the format of the provided API key (e.g., sk-ant- for Anthropic).
| Provider | Implementation Detail | Default Model |
|---|---|---|
| Gemini | Uses `@google/genai` | `gemini-2.5-flash` |
| OpenAI | Uses `openai` SDK | `gpt-4o-mini` |
| Anthropic | Uses `@anthropic-ai/sdk` | `claude-3-5-haiku-latest` |
| Mock | Local fallback with hardcoded templates | N/A |
Sources: server/ai/provider.ts:25-58, server/ai/provider.ts:68-75, server/ai/provider.ts:106-115
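Provider detection as described above might look like the sketch below. Only the `AI_PROVIDER` variable and the `sk-ant-` prefix come from the text; the `detectProvider` name, the `AI_API_KEY` variable name, the other prefix checks, and the fallback order are assumptions made for illustration.

```typescript
// Hypothetical provider detection. Only the `sk-ant-` prefix is taken from the
// text above; the other prefix checks and the fallback order are assumptions.
type ProviderName = 'gemini' | 'openai' | 'anthropic' | 'mock';

const KNOWN_PROVIDERS: ProviderName[] = ['gemini', 'openai', 'anthropic', 'mock'];

function detectProvider(env: Record<string, string | undefined>): ProviderName {
  // 1. An explicit AI_PROVIDER setting wins.
  const explicit = (env.AI_PROVIDER ?? '').toLowerCase();
  const match = KNOWN_PROVIDERS.find((p) => p === explicit);
  if (match) return match;

  // 2. Otherwise infer the provider from the key format.
  const key = env.AI_API_KEY ?? ''; // assumed variable name
  if (key.startsWith('sk-ant-')) return 'anthropic';
  if (key.startsWith('sk-')) return 'openai'; // assumption: OpenAI-style key
  if (key.startsWith('AIza')) return 'gemini'; // assumption: Google API key style

  // 3. No usable key: fall back to the mock provider.
  return 'mock';
}
```

The Anthropic check has to run before the generic `sk-` check, because every `sk-ant-` key also starts with `sk-`.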
The buildRepoContext function (found in server/ai/prompts.ts) is critical for ensuring the AI has sufficient data to fulfill its "senior engineer" persona. It flattens the repository structure into a string containing:
- Metadata: Language, stars, topics, and license.
- Structure: Root-level directories and a list of up to 120 detected files.
- Dependencies: Up to 40 dependencies with version and ecosystem info.
- Activity: Recent commit counts and top contributors.
Sources: server/ai/prompts.ts:3-37
// server/ai/prompts.ts:3-10
export function buildRepoContext(analysis: RepoAnalysis, extra?: string): string {
  const topDirs = analysis.tree.map((n) => n.name).slice(0, 20).join(', ');
  const deps = analysis.dependencies
    .slice(0, 40)
    .map((d) => `${d.name}${d.version ? `@${d.version}` : ''} (${d.ecosystem}, ${d.type})`)
    .join('\n ');
  // ... building flat file list and metadata string
}
Sources: server/ai/prompts.ts:3-10
Beyond technical analysis, the handlers support "Playground" features that offer creative outputs:
- Repo Roast: A sarcastic, witty critique of the codebase referencing real files and structural decisions.
- Readme Enhancer: Generates a professional `README.md` with badges, value propositions, and installation instructions based on detected package managers.
Sources: server/ai/tasks/playground.ts:12-70
AI Task Handlers serve as the bridge between repository data and developer-friendly insights. By combining standardized context building with specialized prompt engineering, they enable gitSdm to provide deep architectural understanding, quality audits, and educational paths. The architecture is provider-agnostic, supporting major LLM engines while providing a robust mock fallback for local development and testing.
Sources: README.md:92-105, server/ai/tasks/explain.ts, server/ai/provider.ts
The System Prompts Configuration in gitSdm defines the AI's persona, operational boundaries, and technical expertise when interacting with developers. It is a multi-layered system that combines a global identity with specialized task-specific instructions to provide high-fidelity repository intelligence. The configuration ensures the AI acts as a "principal software architect," delivering specific, technical, and empathetic insights rather than generic summaries.
At its core, the system relies on the SYSTEM_PROMPT variable to establish a consistent voice, while dynamic functions like buildRepoContext inject structured data about the repository's tree, dependencies, and activity into the LLM's context window. Sources: server/ai/prompts.ts:47-51, server/ai/prompts.ts:5-45
The global SYSTEM_PROMPT serves as the foundation for almost all AI tasks in the platform. It explicitly defines the AI's mission: to provide "INSTANT, GENUINE understanding" of a codebase.
The prompt enforces five core principles to maintain high output quality:
- Specificity: References to actual filenames and directory structures are mandatory. Sources: server/ai/prompts.ts:56-58
- Senior Engineering Mindset: Identifying architectural tradeoffs and implementation choices. Sources: server/ai/prompts.ts:60-62
- Developer Empathy: Tailoring responses to onboarding, change management, or quality evaluation. Sources: server/ai/prompts.ts:64-67
- Technical Precision: Using industry-standard terms like "request lifecycle" and "hot path." Sources: server/ai/prompts.ts:69-70
- Factuality: A strict prohibition against fabricating files or dependencies. Sources: server/ai/prompts.ts:72-73
To make the AI persona effective, the system must provide a structured view of the repository. This is handled by buildRepoContext, which transforms a RepoAnalysis object into a formatted string for the LLM.
| Section | Content Description | Sources |
|---|---|---|
| Metadata | Full name, description, stars, license, and default branch. | server/ai/prompts.ts:25-32 |
| Structure | Top-level directories (up to 20) and entry/important files (up to 60). | server/ai/prompts.ts:6-10 |
| File List | A flat list of up to 120 detected files for deep path context. | server/ai/prompts.ts:13-20 |
| Dependencies | Up to 40 dependencies with versioning, ecosystem, and type data. | server/ai/prompts.ts:7-9 |
| Activity | Recent commit patterns and top contributors. | server/ai/prompts.ts:22-23 |
The following diagram illustrates how the SYSTEM_PROMPT and repository data are merged during an AI request:
graph TD
subgraph InputData [Repository Data]
A[RepoAnalysis Object]
B[Metadata/Timeline]
C[File Tree/Deps]
end
subgraph PromptEngine [Prompt Generation]
D[buildRepoContext Function]
E[Global SYSTEM_PROMPT]
F[Task-Specific userPrompt]
end
A --> D
B --> D
C --> D
D --> G[Final LLM Payload]
E --> G
F --> G
G --> H[AI Provider Client]
H --> I[OpenAI / Gemini / Anthropic]
The diagram shows the flow of raw analysis data through the context builder, where it is combined with static system prompts and task-specific user instructions before being dispatched to the AI provider.
While the global persona remains constant, individual modules define specialized userPrompt templates to achieve specific outcomes.
The system configures the AI to generate graph LR (Left-to-Right) flowcharts, grouping components into subgraphs such as "Entry Points," "Services," and "Utilities." It also requires specific CSS-like class definitions for node styling (e.g., class NodeId entry). Sources: server/ai/tasks/diagram.ts:16-46
Prompts for onboarding focus on a 6-step progression:
- Mental Model
- Entry Point/Startup
- Routing Lifecycle
- Business Logic
- Data/State Layer
- Config/Deployment
Sources: server/ai/tasks/onboarding.ts:72-83
The qa-engine.ts uses a distinct SYSTEM_PROMPT that ignores the global architect persona in favor of a strict "Codebase Analysis Assistant." This assistant is restricted to answering only from provided code chunks and must follow a rigid Markdown structure:
- ### Summary
- ### How it works
- ### Related files
Sources: server/search/qa-engine.ts:72-88
The AIProvider interface handles how the system role is transmitted to different LLM services.
sequenceDiagram
participant S as AI Task Service
participant P as AI Provider Manager
participant G as Gemini Provider
participant A as Anthropic Provider
S->>P: complete(messages, options)
alt Provider is Gemini
P->>G: extract system role
Note right of G: Injected as systemInstruction
G-->>P: Generate response
else Provider is Anthropic
P->>A: extract system role
Note right of A: Passed via system parameter
A-->>P: Generate response
end
P-->>S: Return formatted string
The diagram demonstrates that while the prompts are defined centrally, the providers handle the "system" message role according to their specific API requirements (e.g., systemInstruction in Gemini vs. a system parameter in Anthropic). Sources: server/ai/provider.ts:81-91, server/ai/provider.ts:143-153
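The per-provider handling of the system role can be sketched as plain data shaping. The helper names below (`splitSystem`, `toGeminiRequest`, `toAnthropicRequest`) are hypothetical, and no SDK is called; the request objects only mirror the `systemInstruction` and `system` fields mentioned above, and their exact layout in `provider.ts` is not confirmed by the source.

```typescript
// Hypothetical sketch: separating the system message and shaping it per provider.
// The request objects are plain data; their layout is an assumption.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function splitSystem(messages: Message[]): { system: string; rest: Message[] } {
  // Merge all system messages into one instruction string.
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n\n');
  return { system, rest: messages.filter((m) => m.role !== 'system') };
}

function toGeminiRequest(messages: Message[]) {
  const { system, rest } = splitSystem(messages);
  return {
    // Gemini names the assistant role "model".
    contents: rest.map((m) => ({
      role: m.role === 'assistant' ? 'model' : 'user',
      parts: [{ text: m.content }],
    })),
    config: { systemInstruction: system },
  };
}

function toAnthropicRequest(messages: Message[]) {
  const { system, rest } = splitSystem(messages);
  return {
    // Anthropic takes the system text as a top-level field, not as a message.
    system,
    messages: rest.map((m) => ({ role: m.role, content: m.content })),
  };
}
```

Both adapters share `splitSystem`, so the centrally defined prompt only has to be expressed once as a `system` message.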
| Feature | Key Logic / Constant | Role | Sources |
|---|---|---|---|
| Persona | `SYSTEM_PROMPT` (Global) | Establishes the "Principal Architect" identity and principles. | server/ai/prompts.ts:47-78 |
| Context | `buildRepoContext` | Serializes the repository analysis (metadata, file tree, dependencies, activity) for LLM ingestion. | server/ai/prompts.ts:5-45 |
| Strict Q&A | `SYSTEM_PROMPT` (QA) | Enforces a scannable structure and restricts answers to the provided code chunks. | server/search/qa-engine.ts:72-88 |
| Formatting | Task-specific JSON schemas | Ensures AI output matches internal TypeScript interfaces. | server/ai/tasks/refactor.ts:108-111 |
| Fallback | `mockFallback` | Provides deterministic data when AI providers are unavailable. | server/ai/tasks/playground.ts:98-110 |
The System Prompts Configuration in gitSdm is a highly structured framework that balances a unified architectural persona with task-specific constraints. By combining deep repository context with strict engineering principles, the system ensures that AI-generated insights remain technically accurate, relevant to the specific codebase, and formatted for immediate developer utility.
Graphify Agent Integration represents the core mechanism within gitSdm for transforming flat repository structures into interactive, graph-first architectural visualizations. This system utilizes AI agents and specialized command-line tools to analyze file dependencies, classify modules, and generate topological maps that facilitate instant codebase understanding for developers.
The integration serves as the bridge between raw source code and the visual workspace, providing automated updates to the repository's internal mapping whenever structural changes occur. It specifically leverages the graphify tool to maintain directory-topology consistency and informs AI-driven features like Mermaid diagram generation and architecture summaries.
Sources: README.md:1-25, CONTRIBUTING.md:65-69
The Graphify integration operates as a multi-stage pipeline that ingests GitHub repository data and produces a visual model. The architecture is divided into a backend analysis layer and a frontend rendering layer.
The following diagram illustrates the lifecycle of a repository analysis request, from the user input to the final graph visualization and AI enrichment.
flowchart TD
User[User Input URL] --> Router[API Router]
Router --> Ingest[GitHub Tree Fetcher]
Ingest --> Parser[Dependency Analyzer]
Parser --> Builder[Graph Builder Engine]
Builder --> Layout[Dagre Layout Engine]
Layout --> UI[React Flow Canvas]
UI --> AI[AI Agent Insights]
AI -.-> UI
The system leverages dagre for initial layout math and @xyflow/react (React Flow) for the interactive canvas.
Sources: README.md:104-124, server/ai/tasks/diagram.ts:40-52
| Component | Responsibility | Relevant Files |
|---|---|---|
| Graphify CLI | Updates interactive directory-topology mapping and verifies AST parser compatibility. | CONTRIBUTING.md |
| Graph Builder | Constructs nodes and edges based on file paths and classified dependencies. | README.md, src/components/viz/architecture/mermaid-generator.ts |
| AI Task Handlers | Generates logical system architecture summaries and Mermaid diagrams via LLMs. | server/ai/tasks/diagram.ts, server/ai/tasks/explain.ts |
| Mermaid Generator | Programmatically generates Mermaid.js code by scoring file connectivity and importance. | src/components/viz/architecture/mermaid-generator.ts |
To maintain the accuracy of the dependency map during development, the integration provides a specific workflow for contributors. When new files are added or exports are modified, the graphify agent must be invoked to synchronize the internal model.
pnpm exec graphify update .
Sources: CONTRIBUTING.md:65-69
The integration includes two distinct modes for generating architecture diagrams: Programmatic and AI-Enhanced.
The generateProgrammaticMermaid function calculates a connectivity score for every file in the analysis. This score is determined by the sum of incoming and outgoing edges, with bonuses applied to entry points and files marked as "important."
flowchart TD
Start[Get File Nodes] --> Connectivity[Calculate In/Out Edges]
Connectivity --> Scoring[Apply Entry & Importance Bonuses]
Scoring --> Sort[Sort by Score]
Sort --> Slice[Take Top 25 Nodes]
Slice --> Subgraphs[Group by Folder Path]
Subgraphs --> Output[Generate Mermaid Code]
Sources: src/components/viz/architecture/mermaid-generator.ts:20-60
The generateMermaidDiagram task utilizes the SYSTEM_PROMPT to instruct an AI provider to create a readable architecture flowchart. It groups components into logical subgraphs such as "Entry Points," "Services," and "Utilities."
The integration applies specific CSS classes to nodes within Mermaid diagrams to provide visual context:
- `entry`: Gateways or main entry points.
- `service`: Business logic modules.
- `router`: Controllers or request handlers.
- `util`: Helpers and parsers.
- `db`: Persistence or external API integrations.
- `config`: Configuration files.
Sources: server/ai/tasks/diagram.ts:16-36, src/components/viz/architecture/mermaid-generator.ts:88-95
Graphify Agent Integration is the foundational technology that enables gitSdm to deliver "instant architecture overviews." By combining automated AST parsing via the graphify CLI with intelligent AI-driven summarization, the system creates a live, interactive map of complex software projects, significantly reducing developer onboarding time.
Sources: README.md:15-30, CONTRIBUTING.md:65-69
The Workspace & Layout System in gitSdm serves as the primary interactive interface for repository analysis. It transforms static GitHub repository data into a "graph-first" environment where developers can visualize file structures, module boundaries, and architectural dependencies. The system is designed to provide instant insight that typically requires extensive manual code review.
The workspace is divided into several specialized functional zones: a central visualization canvas powered by React Flow, a file explorer sidebar, an AI-driven intelligence panel, and specialized views for system architecture. This layout is managed through a combination of global state (via Zustand) and modular React components.
Sources: README.md:14-25, src/components/home/HeroSection.tsx:43-58
The workspace follows a modular IDE-like structure. It is composed of four primary regions that coordinate to display repository data: the Header, the Explorer (Left Sidebar), the Visualization Canvas (Center), and the Analysis Panel (Right Sidebar).
The interface utilizes a "Fake IDE" metaphor to provide a familiar environment for developers.
- Header: Displays repository metadata (owner, repo name), the current active branch, and the parsing status.
- Explorer Sidebar: A hierarchical tree view of the repository's directories and files, allowing for manual navigation.
- Visualization Canvas: The main area where `d3-force` and `dagre` layout algorithms render the repository as an interactive graph of nodes (files/folders) and edges (dependencies).
- Intelligence/Analysis Sidebar: Houses the AI Center, health audits, and contributor analytics.
Sources: src/components/home/HeroSection.tsx:59-125, README.md:54-85
The following diagram illustrates the spatial arrangement and component distribution of the gitSdm workspace.
flowchart TD
subgraph UI_Workspace [Workspace Layout]
direction TB
TopNav[Top Navigation: Branch Switcher & Status]
subgraph Main_Content [Main Interaction Area]
direction LR
Explorer[Explorer Panel: File Tree]
Canvas[Graph Canvas: React Flow View]
AnalysisPanel[Analysis Sidebar: AI & Stats]
end
StatusBar[Status Bar: File/Import Counts]
end
TopNav --- Main_Content
Main_Content --- StatusBar
Explorer --- Canvas
Canvas --- AnalysisPanel
Explanation: This diagram shows the high-level layout of the workspace, highlighting the relationship between navigation, the primary interactive canvas, and the supporting sidebars. Sources: src/components/home/HeroSection.tsx:59-165
The canvas is the core of the workspace, utilizing @xyflow/react (React Flow) for rendering. It handles the interactive mapping of nodes representing files and folders. Nodes are styled based on their file type and degree of coupling.
- Node Interaction: Clicking a node triggers the `focusOnNode` helper, which centers the graph on the selection and updates the global `vizStore` with the focused file path.
- Visual Classification: Nodes are color-coded (e.g., `#3b82f6` for `.ts` files) and sized according to their role (Repo: 14, Folder: 12, File: 8).
Sources: src/components/viz/OverviewTab.tsx:23-42, server/graph/layout.test.ts:46-60
The ArchitectureView component provides a specialized mode for viewing system-level diagrams. It supports two distinct modes:
- Code Graph: Programmatically built via static import analysis.
- AI Enhanced: A logical system architecture summarized by AI using Mermaid-style block diagrams.
Sources: src/components/viz/ArchitectureView.tsx:48-68, src/components/viz/ArchitectureView.tsx:244-255
The AI Center acts as a context-aware toolset within the workspace. It includes:
- Health Audit: Displays scores for maintainability, modularity, and readability.
- Risk Identification: Lists specific files affected by high coupling or architectural debt.
- Playground: Features tools like "Repo Roast" and "README Enhancer."
Sources: src/components/viz/ai-sidebar/AiCenterTab.tsx:143-200, src/components/viz/ai-sidebar/AiCenterTab.tsx:288-320
Interaction in one part of the workspace often triggers updates across other components. For example, selecting a file in the "Suggested Reading" list within the OverviewTab will re-center the React Flow canvas on that specific node.
The following diagram shows the data flow when a user selects a file or node.
sequenceDiagram
participant User as "User Interface"
participant Store as "Zustand (vizStore)"
participant RF as "React Flow Instance"
participant Sidebar as "Analysis Sidebar"
User->>RF: Clicks Node / Selects File
RF->>Store: setSelectedNodeId(nodeId)
Store->>Store: setFocusedFilePath(path)
Store-->>Sidebar: Update Metadata/AI Context
RF->>RF: setCenter(x, y, zoom)
Sidebar-->>User: Display File Details
Explanation: This sequence illustrates how global state coordinates between the interactive graph canvas and the information panels. Sources: src/components/viz/OverviewTab.tsx:23-42, src/components/viz/ai-sidebar/AiCenterTab.tsx:52-65
| Feature | Description | Implementation |
|---|---|---|
| Dagre Layout | Automatically positions nodes in Top-Bottom (TB) or Left-Right (LR) hierarchies. | applyDagreLayout |
| Branch Diffing | Visualizes added, modified, and deleted files when comparing branches. | OverviewTab (graphDiff) |
| Pan & Zoom | High-precision navigation of large codebases. | useArchitecturePanZoom |
| Export System | Allows downloading diagrams as SVG, PNG, or Mermaid code. | useArchitectureExport |
| Stats Integration | Displays file/folder counts and commit density timelines. | OverviewTab |
Sources: server/graph/layout.test.ts:7-35, src/components/viz/OverviewTab.tsx:75-150, src/components/viz/ArchitectureView.tsx:180-200
The Workspace & Layout System provides the structural foundation for the gitSdm intelligence platform. By integrating complex graph layouts with modular analysis tabs, it allows developers to move from a high-level "Big Picture" understanding of a repository down to specific file-level details and AI-generated insights seamlessly.
Sources: README.md:158-166
The D3-Force Interactive Canvas is a core visualization component of gitSdm used to render repository structures as dynamic, force-directed networks. It enables developers to explore file relationships, dependency clusters, and module boundaries through an interactive 2D environment powered by @xyflow/react (React Flow) and D3 physics engines.
This system transforms static repository data into a "network" layout where nodes represent files or folders and edges represent imports or dependencies. It provides advanced features such as blast radius calculation, real-time filtering, and a synchronized minimap for navigation. Sources: README.md:139-144, src/features/graph/canvas/hooks/useForceCanvasState.ts:40-55
The canvas architecture relies on a unidirectional data flow where raw graph data is processed into force-directed simulation data, which is then rendered and synchronized with the global application state.
The useForceCanvasState hook acts as the primary orchestrator, bridging the gap between the useVizStore (Zustand) and the rendering engine. It handles node selection, hover states, and dynamic filtering based on file types or node categories.
Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:21-39
The following diagram illustrates how the different hooks and components interact to maintain the canvas state:
flowchart TD
Store[Zustand vizStore] -->|Filters/Selection| Hook[useForceCanvasState]
Hook -->|Simulation Data| D3[useD3Physics]
Hook -->|Export Actions| Export[useGraphExport]
Hook -->|Sync Viewport| Sync[useForceSync]
D3 -->|Update Positions| Canvas[React Flow Canvas]
Canvas -->|Viewport Data| Minimap[ForceMinimap]
The flow ensures that any change in global filters immediately updates the force simulation and the resulting visual layout. Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:153-176
The canvas utilizes D3-force simulations to calculate node positions dynamically when the network layout is active.
The useD3Physics hook configures the forces acting upon the nodes, including:
- Charge Force: Prevents nodes from overlapping.
- Link Force: Pulls connected nodes together based on dependency relationships.
- Center Force: Keeps the entire graph centered within the viewport.
While the project supports multiple layout types (like Dagre), the network mode triggers the specialized useD3Physics logic.
Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:147-151, src/features/graph/canvas/force/useD3Physics.ts:1-10
The D3-Force canvas supports several advanced interactive tools for codebase analysis:
| Feature | Description | Implementation Details |
|---|---|---|
| Blast Radius | Visualizes the "impact zone" of a file change. | Calculated via computeBlastRadius based on transitive dependencies. |
| Minimap | Provides a high-level overview of the graph. | Uses a secondary 2D canvas to render node dots and a viewport bounding box. |
| Node Filtering | Hides/shows nodes based on type. | Uses buildForceGraphData with nodeTypeFilters and fileTypeFilters. |
| Auto-Centering | Focuses the view on selected files. | Orchestrated by useForceSync to transition the camera to specific node coordinates. |
Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:65-75, src/features/graph/canvas/force/ForceMinimap.tsx:50-80, README.md:145-150
The blast radius feature allows users to see which files are affected if a specific node is modified.
sequenceDiagram
participant User
participant Store as vizStore
participant Logic as useForceCanvasState
participant Radius as forceGraphUtils
User->>Store: Toggle Blast Radius
Store->>Logic: blastRadiusActive = true
Logic->>Radius: computeBlastRadius(selectedNodeId, edges)
Radius-->>Logic: Set of affected node IDs
Logic->>Store: setHighlightedNodeIds(affectedNodes)
Store->>User: UI highlights impacted nodes
Sources: src/features/graph/canvas/hooks/useForceCanvasState.ts:81-85, src/features/graph/canvas/hooks/useForceCanvasState.ts:107-113
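A transitive-dependency walk of the kind described above can be sketched as a breadth-first search. This is not the project's `computeBlastRadius`: the function name `computeBlastRadiusSketch`, the `ImportEdge` shape, and the assumption that edges point from the importing file to the imported file are all hypothetical.

```typescript
// Hypothetical sketch: edges point from importer to imported file, and the blast
// radius is every file that transitively imports the changed one. Both the edge
// direction and the signature are assumptions.
interface ImportEdge {
  source: string; // importing file
  target: string; // imported file
}

function computeBlastRadiusSketch(changedId: string, edges: ImportEdge[]): Set<string> {
  // Reverse adjacency: imported file -> files that import it.
  const dependents = new Map<string, string[]>();
  for (const e of edges) {
    const list = dependents.get(e.target) ?? [];
    list.push(e.source);
    dependents.set(e.target, list);
  }

  // Breadth-first walk over dependents; the visited set guards against cycles.
  const affected = new Set<string>();
  const queue: string[] = [changedId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dep of dependents.get(current) ?? []) {
      if (dep !== changedId && !affected.has(dep)) {
        affected.add(dep);
        queue.push(dep);
      }
    }
  }
  return affected;
}
```

The returned set excludes the changed file itself, so it can be handed directly to a highlight setter such as the `setHighlightedNodeIds` call shown in the diagram.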
Navigation is supported by the ForceMinimap component, which renders a simplified version of the network on a separate HTML5 canvas element.
The minimap calculates the graph's bounding box to scale the entire network into a 200x150 preview window. It draws:
- Nodes: Represented as small arcs/dots colored by their category.
- Viewport Bounds: A translucent rectangle representing the area currently visible on the main canvas.
Sources: src/features/graph/canvas/force/ForceMinimap.tsx:15-45
To calculate the viewport bounds, the minimap retrieves the current zoom and center coordinates from the force graph API:
const center = fg.centerAt();
const zoom = fg.zoom();
const halfWidthInD3 = (width / 2) / zoom;
const halfHeightInD3 = (height / 2) / zoom;
Sources: src/features/graph/canvas/force/ForceMinimap.tsx:28-31, src/features/graph/canvas/force/ForceMinimap.tsx:71-74
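To show where the half-extent values in the snippet above lead, the sketch below turns them into a viewport rectangle and scales it into the minimap. The function names and the uniform-scale fit are assumptions; only the 200x150 preview size and the half-extent formula come from the text. `boundingBox` assumes at least one node.

```typescript
// Hypothetical sketch of the minimap math; names and the scaling strategy are
// assumptions, not the ForceMinimap implementation.
interface Point { x: number; y: number }
interface Rect { x: number; y: number; width: number; height: number }

const MINIMAP_WIDTH = 200;
const MINIMAP_HEIGHT = 150;

// Visible area of the main canvas, expressed in graph coordinates.
function viewportInGraphSpace(center: Point, zoom: number, width: number, height: number): Rect {
  const halfWidth = width / 2 / zoom;
  const halfHeight = height / 2 / zoom;
  return {
    x: center.x - halfWidth,
    y: center.y - halfHeight,
    width: halfWidth * 2,
    height: halfHeight * 2,
  };
}

// Bounding box of all node positions (requires a non-empty array).
function boundingBox(nodes: Point[]): Rect {
  const xs = nodes.map((n) => n.x);
  const ys = nodes.map((n) => n.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return { x: minX, y: minY, width: Math.max(...xs) - minX, height: Math.max(...ys) - minY };
}

// Map a graph-space rectangle into minimap pixels with one uniform scale factor.
function toMinimap(rect: Rect, bounds: Rect): Rect {
  const scale = Math.min(
    MINIMAP_WIDTH / Math.max(bounds.width, 1),
    MINIMAP_HEIGHT / Math.max(bounds.height, 1),
  );
  return {
    x: (rect.x - bounds.x) * scale,
    y: (rect.y - bounds.y) * scale,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}
```

Using a single scale factor for both axes preserves the graph's aspect ratio inside the preview window.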
The D3-Force Interactive Canvas is the primary interface for visual dependency exploration in gitSdm. By combining D3's physics-based layout engine with React Flow's canvas management, it provides a performant environment for analyzing complex codebases. Its tight integration with the global vizStore ensures that features like blast radius, filtering, and cross-branch comparisons are reflected instantly in the visual network.
Sources: README.md:139-150, src/features/graph/canvas/hooks/useForceCanvasState.ts:1-20
Mermaid Architecture Generation is a core feature within the gitSdm platform that provides users with two distinct methods for visualizing repository structure: Code Graph (programmatically generated from static analysis) and AI Enhanced (generated using Large Language Models). These visualizations are rendered as interactive Mermaid.js flowcharts, allowing developers to quickly grasp module boundaries, entry points, and system data flows.
The system orchestrates data from the repository analysis, applies layout logic, and utilizes a custom-themed Mermaid.js configuration to render high-quality SVGs. Users can interact with the resulting diagrams through pan/zoom controls and export them in multiple formats, including PNG, SVG, and raw Mermaid code.
Sources: src/components/viz/ArchitectureView.tsx:1-40, server/ai/tasks/diagram.ts:10-30
The generation process is managed through a combination of frontend hooks and backend AI tasks. The primary entry point is the ArchitectureView component, which toggles between manual programmatic generation and AI-driven generation modes.
| Mode | Generation Method | Logic Source | Use Case |
|---|---|---|---|
| Code Graph | Programmatic | `mermaid-generator.ts` | Fast, deterministic mapping of all significant files and folders. |
| AI Enhanced | LLM Task | `server/ai/tasks/diagram.ts` | High-level logical grouping and human-readable architecture summaries. |
Sources: src/components/viz/ArchitectureView.tsx:64-100, src/components/viz/architecture/hooks/useArchitectureState.ts:25-40
The following diagram illustrates the flow from repository analysis to the final rendered visualization in the UI.
flowchart TD
Analysis[Repo Analysis Data] --> ModeSelect{Mode Selection}
ModeSelect -- "Code Graph" --> Programmatic[generateProgrammaticMermaid]
ModeSelect -- "AI Enhanced" --> AI[AI Task: generateMermaidDiagram]
Programmatic --> Code[Raw Mermaid Code]
AI --> Code
Code --> Strip[stripMermaidFences]
Strip --> Render[Mermaid.render]
Render --> UI[ArchitectureView Canvas]
subgraph Frontend
ModeSelect
Programmatic
Strip
Render
UI
end
subgraph Backend
AI
end
Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:28-60, server/ai/tasks/diagram.ts:10-50
The generateProgrammaticMermaid function creates a flowchart based on the file connectivity found in the RepoAnalysis object. It prioritizes nodes based on their importance and connectivity to ensure the diagram remains readable.
- Connectivity Calculation: Maps all nodes and edges to determine the degree (incoming + outgoing) of each file.
- Node Scoring:
  - Base score = degree.
  - `+10` for entry point files.
  - `+5` for files marked as "important" in the analysis.
- Filtering: Selects the top 25 scored nodes to avoid clutter.
- Grouping: Organizes files into subgraphs based on their directory paths.
Sources: src/components/viz/architecture/mermaid-generator.ts:22-60
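The scoring, filtering, and grouping steps above can be sketched as follows. The `FileNode` and `Edge` shapes and the function names `selectTopNodes` and `groupByFolder` are hypothetical; the `+10`/`+5` bonuses and the limit of 25 are taken from the list. The real `generateProgrammaticMermaid` may break ties or group paths differently.

```typescript
// Hypothetical sketch of the scoring steps; the node and edge shapes are assumed.
interface FileNode {
  id: string; // file path, e.g. "src/app.ts"
  isEntry?: boolean;
  isImportant?: boolean;
}

interface Edge {
  source: string;
  target: string;
}

function selectTopNodes(nodes: FileNode[], edges: Edge[], limit = 25): FileNode[] {
  // Degree = incoming + outgoing edges per file.
  const degree = new Map<string, number>();
  for (const e of edges) {
    degree.set(e.source, (degree.get(e.source) ?? 0) + 1);
    degree.set(e.target, (degree.get(e.target) ?? 0) + 1);
  }

  const score = (n: FileNode): number =>
    (degree.get(n.id) ?? 0) + (n.isEntry ? 10 : 0) + (n.isImportant ? 5 : 0);

  // Highest score first; copy before sorting to keep the input untouched.
  return [...nodes].sort((a, b) => score(b) - score(a)).slice(0, limit);
}

function groupByFolder(nodes: FileNode[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const n of nodes) {
    const slash = n.id.lastIndexOf('/');
    const folder = slash === -1 ? '(root)' : n.id.slice(0, slash);
    if (!groups[folder]) groups[folder] = [];
    groups[folder].push(n.id);
  }
  return groups;
}
```

With these bonuses, an entry point with no edges still outranks an ordinary file with up to nine connections, which keeps startup files visible in the trimmed diagram.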
The generator applies specific CSS classes to nodes based on their classification in the repository:
- `entry`: For entrypoint files.
- `config`: For configuration manifests.
- `test`: For test suites.
- `service`: Default for business logic files.
Sources: src/components/viz/architecture/mermaid-generator.ts:98-105
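As an illustration of how the four classes above might end up in Mermaid output, the sketch below classifies files and emits `class` lines. The classification rules (filename patterns), the `graph LR` direction, and the names `classify` and `toMermaid` are assumptions; the source only confirms the class names themselves.

```typescript
// Hypothetical sketch: assigning a Mermaid class to each file and emitting the
// matching `class` lines. The detection rules here are illustrative assumptions.
type NodeClass = 'entry' | 'config' | 'test' | 'service';

interface ClassifiedFile {
  path: string;
  isEntry?: boolean;
}

function classify(file: ClassifiedFile): NodeClass {
  if (file.isEntry) return 'entry';
  if (/\.(test|spec)\.[jt]sx?$/.test(file.path)) return 'test';
  if (/(^|\/)(package\.json|tsconfig\.json)$/.test(file.path)) return 'config';
  return 'service'; // default bucket for business logic files
}

function toMermaid(files: ClassifiedFile[]): string {
  const lines: string[] = ['graph LR'];
  files.forEach((file, index) => {
    // Synthetic ids avoid characters that Mermaid would reject in node names.
    const id = `n${index}`;
    lines.push(`  ${id}["${file.path}"]`);
    lines.push(`  class ${id} ${classify(file)};`);
  });
  return lines.join('\n');
}
```

The class names only take visual effect once matching `classDef` rules or theme styles are defined in the diagram configuration.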
The AI-enhanced mode utilizes a specialized prompt to generate logical architectural summaries. This task is processed on the backend via the generateMermaidDiagram function.
The LLM is instructed to:
- Use a Left-to-Right layout (`graph LR`).
- Group components into logical subgraphs such as "Entry Points", "Services", and "Database".
- Limit the diagram to 15-20 nodes.
- Classify nodes using specific class lines (e.g., `class NodeId router;`).
Sources: server/ai/tasks/diagram.ts:20-55
sequenceDiagram
participant U as User Interface
participant H as useArchitectureState
participant S as AI Service (Backend)
participant M as Mermaid Engine
U->>H: Toggle to AI Mode
H->>S: Request: generateMermaidDiagram(owner, repo)
S-->>H: Return Mermaid Code Block
H->>H: stripMermaidFences()
H->>M: render(id, code)
M-->>H: Rendered SVG String
H->>U: Update View with SVG
Sources: src/components/viz/architecture/hooks/useArchitectureState.ts:21-70, server/ai/tasks/diagram.ts:10-20
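Because the AI service returns a Mermaid code block, the fences have to be removed before the text reaches the renderer. The sketch below shows one way such a step could work; `stripFencesSketch` is a hypothetical stand-in and is not claimed to match the project's `stripMermaidFences`.

```typescript
// Built at runtime so the example does not embed a literal fence.
const FENCE = "`".repeat(3);

function stripFencesSketch(raw: string): string {
  let text = raw.trim();

  if (text.startsWith(FENCE)) {
    // Drop the whole opening line, including an optional language tag.
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
  }

  const trimmedEnd = text.trimEnd();
  if (trimmedEnd.endsWith(FENCE)) {
    // Drop the closing fence.
    text = trimmedEnd.slice(0, -FENCE.length);
  }

  return text.trim();
}
```

Input that carries no fences passes through unchanged apart from trimming, so the same function can be applied to programmatic and AI output alike.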
The mermaid-config.ts file defines a custom dark-themed appearance for all diagrams. It uses a "base" theme with overrides to match the gitSdm UI aesthetic.
- Background: `#09090b`
- Primary Color: `#238636` (GitHub-style green)
- Node Border: `#3f3f46`
- Text Color: `#f4f4f5`
Sources: src/components/viz/architecture/mermaid-config.ts:11-30
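As a rough sketch, the colours above could be expressed as a Mermaid configuration object like the one below. The `theme` and `themeVariables` keys are part of Mermaid's theming configuration, but mapping "Node Border" to `primaryBorderColor` and "Text Color" to `primaryTextColor` is an assumption; the actual contents of `mermaid-config.ts` may differ.

```typescript
// Assumed mapping of the documented colours onto Mermaid themeVariables.
const mermaidThemeConfig = {
  startOnLoad: false,
  theme: "base" as const, // "base" is the Mermaid theme intended for overrides
  themeVariables: {
    background: "#09090b",
    primaryColor: "#238636",
    primaryBorderColor: "#3f3f46",
    primaryTextColor: "#f4f4f5",
  },
};

// In the browser, this object would be passed to mermaid.initialize(...)
// once, before any diagram is rendered.
```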
The ArchitectureView provides a canvas with pan and zoom capabilities via useArchitecturePanZoom. The useArchitectureExport hook manages the extraction of the diagram in various formats:
- Copy Mermaid Code: Extracts the raw text and strips fences.
- Copy SVG: Serializes the DOM SVG element to a string.
- Download PNG: Uses `html-to-image` to convert the SVG to a high-quality raster image.
Sources: src/components/viz/ArchitectureView.tsx:100-150, src/components/viz/architecture/hooks/useArchitectureExport.ts:27-105
Mermaid Architecture Generation provides two complementary views of a codebase by bridging static analysis and AI insights. Programmatic scoring supplies file-level detail, while LLM-based grouping supplies a conceptual overview, so developers can explore a repository's structure visually. Node filtering keeps the programmatic diagram readable, and a custom Mermaid.js theme keeps its appearance consistent with the rest of the gitSdm UI.
Sources: src/components/viz/ArchitectureView.tsx:320-335, server/ai/tasks/diagram.ts:10-15
The gitSdm platform utilizes a robust configuration system primarily driven by environment variables to manage its multi-layered architecture. These configurations govern integration with external services, including the GitHub API and various Large Language Model (LLM) providers such as Google Gemini, OpenAI, and Anthropic.
The system is designed to be "zero-config" for basic development by defaulting to a mock provider when no API keys are present, while allowing granular control over model selection and API versions in production environments. Sources: README.md:106-119, server/ai/provider.ts:25-45
The backend services rely on several categories of environment variables defined in a .env file (copied from .env.example). These variables handle authentication, service selection, and model parameters. Sources: CONTRIBUTING.md:28-32
| Variable | Description | Default / Options |
|---|---|---|
| `GITHUB_TOKEN` | Optional. Increases GitHub API rate limits for public repositories. | N/A |
| `AI_PROVIDER` | Defines the active LLM service. | `mock`, `gemini`, `openai`, `anthropic` |
Sources: README.md:108-111
Each provider has specific configuration variables for API keys, model identifiers, and API base URLs.
| Provider | API Key Variable | Model Variable | Default Model |
|---|---|---|---|
| Gemini | `GEMINI_API_KEY` | `GEMINI_MODEL` | `gemini-2.5-flash` |
| OpenAI | `OPENAI_API_KEY` | `OPENAI_MODEL` | `gpt-4o-mini` |
| Anthropic | `ANTHROPIC_API_KEY` | `ANTHROPIC_MODEL` | `claude-3-5-haiku-latest` |
Sources: README.md:112-119, server/ai/provider.ts:55-57, server/ai/provider.ts:88-89, server/ai/provider.ts:114-115
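The model variables in the table can be read as optional overrides with a per-provider fallback. The following sketch illustrates that lookup; `resolveModel` and `MODEL_SETTINGS` are hypothetical names, and the environment is passed in as a plain record so the example does not depend on Node.js globals.

```typescript
type LiveProvider = "gemini" | "openai" | "anthropic";
type Env = Record<string, string | undefined>;

// Variable names and default models as listed in the table above.
const MODEL_SETTINGS: Record<LiveProvider, { envVar: string; fallback: string }> = {
  gemini: { envVar: "GEMINI_MODEL", fallback: "gemini-2.5-flash" },
  openai: { envVar: "OPENAI_MODEL", fallback: "gpt-4o-mini" },
  anthropic: { envVar: "ANTHROPIC_MODEL", fallback: "claude-3-5-haiku-latest" },
};

function resolveModel(provider: LiveProvider, env: Env): string {
  const { envVar, fallback } = MODEL_SETTINGS[provider];
  // Treat an empty or whitespace-only value as "not set" (an assumption).
  const override = env[envVar]?.trim();
  return override ? override : fallback;
}
```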
The application uses an automated detection sequence to initialize the AIProvider. An override key, when supplied, takes precedence; otherwise the priority is determined first by an explicit AI_PROVIDER variable, followed by the presence of specific API keys.
flowchart TD
Start[Get AI Provider] --> CheckOverride{Override Key Provided?}
CheckOverride -- Yes --> DetectType[Detect Type from Key Prefix]
CheckOverride -- No --> CheckEnvVar{AI_PROVIDER Env Set?}
DetectType --> CreateInstance[Create Specific Provider]
EnvCheckValue{Value?}
CheckEnvVar -- Yes --> EnvCheckValue
EnvCheckValue -- "gemini/openai/anthropic" --> CreateInstance
EnvCheckValue -- "mock" --> Mock[Create Mock Provider]
CheckEnvVar -- No --> KeyAutoDetect{Check API Keys}
KeyAutoDetect -- "GEMINI_API_KEY exists" --> G[Gemini]
KeyAutoDetect -- "OPENAI_API_KEY exists" --> O[OpenAI]
KeyAutoDetect -- "ANTHROPIC_API_KEY exists" --> A[Anthropic]
KeyAutoDetect -- "No keys found" --> Mock
G --> CreateInstance
O --> CreateInstance
A --> CreateInstance
This diagram shows the decision logic used to select the AI backend. Sources: server/ai/provider.ts:25-52
The system can infer the provider type from the format of an overrideKey:
- `sk-ant-`: Identified as `anthropic`.
- `sk-`: Identified as `openai`.
- Otherwise defaults to `gemini`.
Sources: server/ai/provider.ts:10-23
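The prefix rules above can be sketched as a small function. The name `detectProviderFromKey` is hypothetical; the one detail worth noting is the order of the checks, since every `sk-ant-` key also begins with `sk-`.

```typescript
type KeyProvider = "anthropic" | "openai" | "gemini";

function detectProviderFromKey(key: string): KeyProvider {
  // Test the longer prefix first; otherwise Anthropic keys would match "sk-".
  if (key.startsWith("sk-ant-")) return "anthropic";
  if (key.startsWith("sk-")) return "openai";
  // Any other format is treated as a Gemini key.
  return "gemini";
}
```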
While most configuration resides on the server, the frontend manages user-interface preferences and synchronization with the server-side environment.
The application performs an initial theme check during the HTML document's head execution to prevent "Flash of Unstyled Content" (FOUC). It synchronizes the colorScheme based on localStorage.
// Logic inside index.html
var theme = localStorage.getItem('theme');
if (theme === 'light') {
document.documentElement.classList.add('light');
document.documentElement.style.colorScheme = 'light';
} else {
document.documentElement.style.colorScheme = 'dark';
}
Sources: index.html:26-34
The frontend fetches application configuration from the /api/config endpoint (handled via fetchAppConfig) to maintain state awareness of active features and limits. Sources: src/components/home/HeroSection.tsx:18-23
If no environment variables are provided, the system defaults to a mock provider. This provider does not perform external network calls to LLMs but instead returns predefined responses based on the query content (e.g., "architecture", "suggest", "onboarding", "ELI5"). Sources: server/ai/provider.ts:133-219
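A keyword-driven mock of this kind might look like the sketch below. The reply texts and the `mockComplete` name are placeholders invented for illustration; only the keywords come from the description above.

```typescript
// Placeholder replies keyed by keyword; not the project's actual responses.
const CANNED_REPLIES: Array<[keyword: string, reply: string]> = [
  ["architecture", "Mock architecture summary."],
  ["suggest", "Mock suggestions."],
  ["onboarding", "Mock onboarding guide."],
  ["eli5", "Mock ELI5 explanation."],
];

function mockComplete(prompt: string): string {
  const lower = prompt.toLowerCase();
  // The first matching keyword wins; no network call is made.
  const hit = CANNED_REPLIES.find(([keyword]) => lower.includes(keyword));
  return hit ? hit[1] : "Mock response.";
}
```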
The mock mode also utilizes local static data to simulate GitHub API responses for specific repositories like mbayue/gitSdm or a generic mock-todo-app. Sources: server/github/mock-data.ts:5-66
The gitSdm configuration system prioritizes flexibility and ease of setup. By leveraging environment variables, the platform can seamlessly switch between different LLM ecosystems or operate in a fully mocked local environment for development and testing. Key variables like AI_PROVIDER and GITHUB_TOKEN ensure the application scales from local prototypes to production-grade repository analysis.
Vercel Serverless Deployment serves as the primary hosting and execution infrastructure for the gitSdm platform. It utilizes Vercel's serverless functions to handle backend logic, including GitHub repository ingestion, AI-driven analysis, and dependency graph generation. This architecture allows the project to scale dynamically while maintaining a clear separation between the React-based frontend and the Node.js backend services.
The deployment infrastructure is defined through a combination of configuration files, such as vercel.json, and specialized handlers like server/vercel-handler.ts which bridge the application's internal API router with Vercel's execution environment. This setup supports the project's mission of providing instant repository intelligence without requiring persistent server management.
Sources: README.md:46-59, package.json:69-69, server/vercel-handler.ts
The system follows a modular architecture where the frontend is a React SPA and the backend consists of serverless handlers. These handlers orchestrate tasks between the GitHub API and various AI providers.
flowchart TD
User[Developer Browser] -->|Web Request| Vercel[Vercel Edge/Serverless]
Vercel -->|Static Assets| Frontend[React + Vite SPA]
Vercel -->|API Calls| ServerlessFunc[Serverless Functions]
ServerlessFunc -->|Route Handling| APIRouter[API Router]
APIRouter -->|Task Execution| AIService[AI Provider Manager]
APIRouter -->|Data Fetching| GitHubService[GitHub Tree Fetcher]
AIService -->|Queries| Gemini[Google Gemini]
AIService -->|Queries| OpenAI[OpenAI]
AIService -->|Queries| Anthropic[Anthropic Claude]
The diagram illustrates the request lifecycle from the browser through Vercel's serverless infrastructure to the internal service layers and external AI providers. Sources: README.md:46-60, server/ai/provider.ts:47-65
The backend is organized into modular serverless functions located in the api/ directory. These functions are mapped to specific routes via the Vercel deployment configuration.
The project utilizes vercel.json to define the runtime environment and routing rules. Key configurations include the use of the @vercel/node runtime for backend functions and the mapping of API paths.
| Configuration Key | Value / Description |
|---|---|
| `runtime` | `@vercel/node` (defined in package.json devDependencies) |
| `functions` | Configures memory and execution limits for handlers in `api/` |
| `api/ai/*` | Handlers for AI tasks like architecture analysis and onboarding |
| `api/repo/*` | Handlers for repository ingestion and analysis |
Sources: vercel.json, package.json:88-88, README.md:47-47
Specific AI capabilities are exposed through dedicated serverless endpoints. For example, the api/ai/architecture.ts handler manages deep system architecture analysis by interfacing with the AI service layer.
sequenceDiagram
participant Client as Client Browser
participant Handler as api/ai/architecture.ts
participant Service as server/ai/service.ts
participant Provider as AI Provider (Gemini/OpenAI)
Client->>Handler: POST /api/ai/architecture
Handler->>Service: executeAiTask('architecture', ...)
Service->>Provider: complete(messages, options)
Provider-->>Service: JSON Architecture Data
Service-->>Handler: Parsed Task Result
Handler-->>Client: 200 OK (Architecture JSON)
This sequence shows how a specific AI architectural request is handled within the serverless environment. Sources: api/ai/architecture.ts, server/ai/provider.ts:74-92
The serverless environment dynamically detects and initializes AI providers based on environment variables. This allows the deployment to support multiple LLMs seamlessly.
The createProvider function in server/ai/provider.ts prioritizes the AI_PROVIDER environment variable, falling back to auto-detection based on available API keys.
// server/ai/provider.ts:47-65
if (process.env.AI_PROVIDER) {
const envProvider = process.env.AI_PROVIDER.toLowerCase();
if (envProvider === 'gemini' || envProvider === 'openai' || envProvider === 'anthropic' || envProvider === 'mock') {
providerType = envProvider as 'gemini' | 'openai' | 'anthropic' | 'mock';
}
} else if (process.env.GEMINI_API_KEY && process.env.GEMINI_API_KEY.trim()) {
providerType = 'gemini';
} else if (process.env.OPENAI_API_KEY && process.env.OPENAI_API_KEY.trim()) {
providerType = 'openai';
}
| Variable | Purpose |
|---|---|
| `AI_PROVIDER` | Explicitly sets the provider (gemini, openai, anthropic, or mock) |
| `GEMINI_API_KEY` | Authentication for Google Gemini API |
| `OPENAI_API_KEY` | Authentication for OpenAI API |
| `ANTHROPIC_API_KEY` | Authentication for Anthropic SDK |
Sources: server/ai/provider.ts:25-65, README.md:121-131
To optimize performance within the serverless lifecycle, the project implements a caching layer. However, since Vercel serverless functions are ephemeral, the LRU Cache implementation (using lru-cache) primarily serves to optimize performance during a single execution or across warm starts.
- Memory Constraints: The README notes that the custom LRU cache lives in-memory, which means it may reset during "cold starts" in the Vercel environment.
- Dependency Management: The project uses `Bun` as a recommended runtime but maintains compatibility with Node.js 22 to align with Vercel's standard environments.
Sources: README.md:162-162, package.json:49-49, server/ai/provider.ts:256-271
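To illustrate why an in-memory cache only helps within a warm instance, here is a minimal Map-based LRU. It is a stand-in written for this explanation, not the project's cache and not the `lru-cache` package API; `TinyLru` is a hypothetical name.

```typescript
class TinyLru<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Map preserves insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}
```

An instance held at module scope survives only as long as the serverless function stays warm; a cold start creates a fresh, empty cache.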
The Vercel Serverless Deployment provides gitSdm with a scalable and cost-effective infrastructure for repository analysis. By leveraging @vercel/node and a modular routing system, the platform effectively bridges client-side React visualizations with intensive backend AI and GitHub API tasks, ensuring high availability and ease of deployment.
The gitSdm platform utilizes containerization and serverless infrastructure to provide a consistent, scalable environment for repository analysis and visualization. Docker is used to bundle the Vite-based frontend assets and the Node.js backend services into a single deployable unit, ensuring that the application behaves identically in development, staging, and production environments.
Google Cloud Run serves as the primary deployment target, offering a managed, auto-scaling environment that handles the execution of the containerized application. This infrastructure supports the application's reliance on external APIs, including the GitHub API and various AI providers (Google Gemini, OpenAI, and Anthropic), by securely managing environment variables and secrets.
Sources: README.md:148-154, README.md:183-195
The Docker implementation for gitSdm follows a pattern of bundling a small Node.js server to serve static files and handle API requests. The image builds the Vite application, stores the output in a dist/ directory, and utilizes a production-ready server entry point.
The container encapsulates the following components:
- Static Assets: Pre-built Vite frontend files served from the `/dist` directory.
- API Router: A Node.js Express server that handles backend logic, including GitHub ingestion and AI task orchestration.
- Production Server: Managed via `server/prod-server.ts` to coordinate static file serving and API routing.
The build process involves installing dependencies, transpiling TypeScript, and bundling the frontend.
# Build Docker image
docker build -t gitsdm .
# Run container with environment configuration
docker run -p 3000:3000 --env-file .env gitsdm
Sources: README.md:156-163, server/prod-server.ts:1-20
The production Dockerfile itself is not reproduced here. The snippet below is the sample Dockerfile found in the project's mock data (server/github/mock-data.ts) and shows a development-style image only.
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 5173
CMD ["npm", "run", "dev"]
Note: The production Docker image uses npm run build and serves via server/prod-server.ts rather than the dev server.
Sources: server/github/mock-data.ts:155-163, README.md:156-160
Google Cloud Run is the recommended platform for hosting gitSdm due to its support for stateless containers and integration with Google Cloud's ecosystem.
The deployment is performed directly from the source code, leveraging Google Cloud's build packs to containerize the application on the fly or using the pre-defined Docker configuration.
gcloud run deploy gitsdm \
--source . \
--region asia-southeast1 \
--allow-unauthenticated \
--env-vars-file .env
Sources: README.md:166-172
| Parameter | Value | Description |
|---|---|---|
| Region | `asia-southeast1` | Default deployment region for low latency. |
| Authentication | `--allow-unauthenticated` | Permits public access to the visualization tool. |
| Runtime | Node.js 22 | Optimized for Express and the AI SDKs. |
| Port Mapping | 3000 | The default internal port mapped to the service. |
Sources: README.md:168-171, README.md:204
When deployed via Docker or Cloud Run, the application operates in a hybrid mode where a single entry point manages both the user interface and the backend processing pipeline.
The following diagram illustrates how a request is handled within the deployed container:
flowchart TD
User[Developer Browser] -->|HTTP Request| Container[Cloud Run Container]
subgraph ContainerLogic [Internal Container Routing]
Container -->|URL Path /api/*| APIRouter[Express API Router]
Container -->|Root / Other Paths| StaticServer[Static File Server]
end
APIRouter -->|Analyze Repo| GitHub[GitHub API]
APIRouter -->|Generate Insights| AI[AI Provider SDK]
StaticServer -->|Serve Assets| Dist[dist/ Folder]
class User entry;
class Container service;
class APIRouter router;
class StaticServer util;
This diagram shows the dual-path routing logic where the production server distinguishes between frontend assets and backend API calls. Sources: server/prod-server.ts, server/ai/tasks/diagram.ts:32-40
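The dual-path decision reduces to a check on the request path. The sketch below assumes the API is mounted under `/api`, as the diagram shows; `routeFor` and `RouteTarget` are hypothetical names, and the real `server/prod-server.ts` may organise this differently.

```typescript
type RouteTarget = "api" | "static";

function routeFor(pathname: string): RouteTarget {
  // Match "/api" itself and anything below it, but not look-alikes such as "/apiary".
  const isApi = pathname === "/api" || pathname.startsWith("/api/");
  return isApi ? "api" : "static";
}
```

Everything that is not an API call falls through to the static file server, which is what lets a single-page application handle its own client-side routes.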
Configuration in containerized environments is strictly managed via environment variables. This is critical for the AIProvider factory, which detects the available keys at runtime to determine which service (Gemini, OpenAI, or Anthropic) to instantiate.
| Variable | Requirement | Description |
|---|---|---|
| `GITHUB_TOKEN` | Recommended | Increases API rate limits for public repository analysis. |
| `AI_PROVIDER` | Optional | Specifies `gemini`, `openai`, `anthropic`, or `mock`. |
| `PORT` | Optional | The port the container listens on (defaults to 3000 in prod). |
Sources: README.md:126-145, server/ai/provider.ts:40-58
The deployment relies on a specific set of technologies to maintain the performance of the repository mapping and AI analysis features:
- Runtime: Node.js 22 (via Docker Alpine images).
- Deployment: Google Cloud Run for serverless execution.
- Package Management: Bun or pnpm for efficient dependency resolution during image builds.
- API Client: Octokit for GitHub communication.
Sources: README.md:191-209, CONTRIBUTING.md:15-18
Setting up the gitSdm environment allows developers to contribute to the graph-first repository analysis platform. The project utilizes a full-stack TypeScript architecture, requiring a Node.js or Bun runtime to manage a React frontend and an Express-based backend.
Local development supports both live API integrations (GitHub and various AI providers) and a robust Mock Mode for developers who wish to work offline or without API keys. This setup ensures that all core features, including interactive visualizations and AI-driven insights, can be tested locally.
Sources: CONTRIBUTING.md:1-10, README.md:1-20
Before starting the development server, ensure the following tools are installed:
- Node.js: Version ≥ 22.
- Package Manager: `pnpm` ≥ 9 is recommended, though `bun` ≥ 1.1 is supported for faster execution.
- Git: Required for forking and cloning the repository.
Sources: CONTRIBUTING.md:12-14, README.md:65-70
The following sequence diagram outlines the initial steps to prepare the local environment:
sequenceDiagram
participant Dev as Developer
participant Git as GitHub
participant Local as Local Machine
Dev->>Git: Fork mbayue/gitSdm
Git-->>Dev: Fork created
Dev->>Local: git clone [fork-url]
Local->>Local: cd gitSdm
Dev->>Local: pnpm install
Local->>Local: cp .env.example .env
Sources: CONTRIBUTING.md:16-30, README.md:73-82
The application relies on environment variables defined in a .env file. These variables control API access and the behavior of the AI provider system.
The AI_PROVIDER variable determines which backend engine processes natural language tasks. If no provider is specified, the system defaults to mock.
| Variable | Values | Description |
|---|---|---|
| `AI_PROVIDER` | `mock`, `gemini`, `openai`, `anthropic` | The active AI service. |
| `GITHUB_TOKEN` | String | Personal Access Token to increase API rate limits. |
| `GEMINI_API_KEY` | String | Required if provider is `gemini`. |
| `OPENAI_API_KEY` | String | Required if provider is `openai`. |
| `ANTHROPIC_API_KEY` | String | Required if provider is `anthropic`. |
Sources: README.md:85-95, server/ai/provider.ts:25-50
The system uses an auto-detection mechanism to instantiate the correct provider based on available keys if AI_PROVIDER is not explicitly set.
flowchart TD
Start[Load .env] --> CheckExplicit{AI_PROVIDER set?}
CheckExplicit -- Yes --> ReturnExplicit[Use specified provider]
CheckExplicit -- No --> CheckGemini{GEMINI_API_KEY?}
CheckGemini -- Yes --> UseGemini[Use Gemini]
CheckGemini -- No --> CheckOpenAI{OPENAI_API_KEY?}
CheckOpenAI -- Yes --> UseOpenAI[Use OpenAI]
CheckOpenAI -- No --> CheckAnthropic{ANTHROPIC_API_KEY?}
CheckAnthropic -- Yes --> UseAnthropic[Use Anthropic]
CheckAnthropic -- No --> UseMock[Use Mock Provider]
Sources: server/ai/provider.ts:28-56
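The full decision chain in the flowchart, including the Anthropic check and the mock fallback, can be sketched as a pure function over the environment. `resolveProviderType` is a hypothetical name, and falling back to `mock` for an unrecognised `AI_PROVIDER` value is an assumption rather than confirmed behaviour.

```typescript
type ProviderType = "gemini" | "openai" | "anthropic" | "mock";
type EnvMap = Record<string, string | undefined>;

const KNOWN_PROVIDERS: readonly ProviderType[] = ["gemini", "openai", "anthropic", "mock"];

function resolveProviderType(env: EnvMap): ProviderType {
  const explicit = env.AI_PROVIDER?.trim().toLowerCase();
  if (explicit) {
    // Assumption: an unrecognised explicit value falls back to mock.
    return KNOWN_PROVIDERS.find((p) => p === explicit) ?? "mock";
  }

  // A key counts as present only if it is non-empty after trimming.
  const hasKey = (name: string): boolean => Boolean(env[name]?.trim());
  if (hasKey("GEMINI_API_KEY")) return "gemini";
  if (hasKey("OPENAI_API_KEY")) return "openai";
  if (hasKey("ANTHROPIC_API_KEY")) return "anthropic";
  return "mock";
}
```

Passing the environment as an argument, instead of reading it globally, keeps the function easy to unit test with different key combinations.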
The project utilizes Vite for the frontend and Express for the backend. These can be run concurrently or separately.
- Concurrent Mode: `pnpm dev` or `bun dev` starts both the Vite UI (port 5173) and the Express backend (port 3001).
- Separate Frontend: `pnpm dev:frontend` runs only the Vite server.
- Separate Backend: `pnpm dev:backend` runs only the Express server.
Sources: CONTRIBUTING.md:36-44, package.json:5-15
For developers without API access, gitSdm provides comprehensive mock data. When the owner of a repository is set to mock, the system fetches predefined structures and contents.
- GitHub Data: Located in `server/github/mock-data.ts`. It provides simulated file trees for `gitsdm` and a sample `todo-app`.
- AI Responses: Handled via `mockFallback` functions in AI task files. For example, `generateRepoRoast` provides specific "roast" text for mock repositories to test UI rendering without hitting an LLM.
Sources: server/github/mock-data.ts:5-50, server/ai/tasks/playground.ts:32-45
The project uses bun test for its test suite. Test files are co-located with source files to maintain modularity.
# Run all tests once
pnpm test
# Run tests in watch mode for TDD
pnpm test:watch
# Check linting rules
pnpm lint
If modifications are made to the directory structure or file exports, the codebase graph should be updated:
pnpm exec graphify update .
Sources: CONTRIBUTING.md:47-66, package.json:11-14
Setting up the gitSdm local environment involves configuring a TypeScript-based stack with flexible AI provider support. By utilizing the provided pnpm or bun scripts and configuring the .env file, developers can toggle between live API analysis and a fully functional mock environment for rapid feature development and testing.
Sources: CONTRIBUTING.md:70-80, README.md:120-130
The testing strategy for gitSdm is built around high-velocity feedback and comprehensive coverage of its core repository analysis and AI-driven features. The project utilizes Bun as its primary test runner and framework, emphasizing co-location of test files with source code to ensure maintainability and clarity. The suite encompasses unit tests for parsers, integration tests for GitHub API interactions, and validation for semantic search utilities.
The testing architecture relies heavily on a robust mocking system to simulate GitHub repository structures and AI provider responses. This allows for reliable testing of the backend logic without incurring API costs or hitting rate limits during development.
Sources: README.md:164-168, CONTRIBUTING.md:46-55
The project utilizes the Bun runtime's native testing capabilities, which support TypeScript out of the box. Tests are identified by the .test.ts extension and are generally located alongside the modules they verify.
The following commands are defined in the project's configuration for managing the test lifecycle:
| Command | Description |
|---|---|
| `bun test` | Executes all test suites once. |
| `bun run test:coverage` | Runs tests and generates a code coverage report. |
| `bun run test:watch` | Enters watch mode, re-running tests on file changes. |
Sources: README.md:195-204, CONTRIBUTING.md:49-55
The project maintains over 25 test suites covering critical backend services. Key areas of focus include:
- AI Services: `provider.test.ts`, `service.test.ts`.
- GitHub Integration: `client.test.ts`, `fetch-tree.test.ts`, `parse-url.test.ts`.
- Parsing & Analysis: `dependency-analyzer.test.ts`, `file-classifier.test.ts`, `manifest-parsers/index.test.ts`.
- Search Engine: `chunker.test.ts`, `qa-engine.test.ts`, `vector-store.test.ts`.
Sources: README.md:206-231
To facilitate isolated testing, gitSdm implements a dual-layer mocking strategy: module-level mocking using Bun's mock utility and a dedicated mock data provider.
The server/github/mock-data.ts file provides a synthetic environment for testing repository ingestion. It includes predefined file lists and contents for two primary scenarios: a "gitSdm" mock repo and a "todo-app" mock repo.
graph TD
subgraph TestExecution["Test Execution (fetch-tree.test.ts)"]
TC[Test Case] -->|resolveOctokit| M[Mocked Octokit]
TC -->|isMockRepo| MD[Mock Data Provider]
end
subgraph MockData["Mock Data Layer (mock-data.ts)"]
MD -->|returns| INFO[Repo Info]
MD -->|returns| TREE[Flat Tree Items]
MD -->|returns| CONT[File Contents]
end
M -->|Intercepts API| MD
This diagram illustrates how tests intercept GitHub API calls and redirect them to the local mock data provider.
Key verification points in server/github/mock-data.test.ts include:
- isMockRepo: Validates that repositories owned by "mock" are correctly identified.
- fetchMockFileContents: Ensures the system returns specific content for files like `package.json` or placeholders for source files.
- fetchMockTimeline: Generates synthetic commit history for visualizing activity patterns.
Sources: server/github/mock-data.test.ts:13-75, server/github/mock-data.ts:5-10
AI task handlers (e.g., explain, roast, onboarding) implement a mockFallback function. When an AI provider is set to mock or fails, the system returns structured JSON data from these fallbacks, allowing UI components to be tested with realistic AI output without live API calls.
Sources: server/ai/tasks/playground.ts:31-42, server/ai/tasks/onboarding.ts:74-81
Tests in server/parser/manifest-parsers/index.test.ts verify the extraction of dependencies across multiple ecosystems. The parser registry is tested for its ability to route files to the correct specialized parser based on filename or regex patterns.
- Ecosystems Covered: npm (`package.json`), Go (`go.mod`), Python (`requirements.txt`, `pyproject.toml`), Rust (`Cargo.toml`), and Java (`pom.xml`).
- Resilience: Tests verify that corrupted or invalid manifest files return empty arrays rather than throwing errors.
Sources: server/parser/manifest-parsers/index.test.ts:16-118
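The resilience property described above (invalid manifests yield an empty array) can be illustrated with a small npm manifest parser. This is a sketch under assumptions: the `Dependency` shape and the `parsePackageJson` name are hypothetical and not taken from the project's parser registry.

```typescript
interface Dependency {
  name: string;
  version: string;
  dev: boolean;
}

function parsePackageJson(content: string): Dependency[] {
  let manifest: unknown;
  try {
    manifest = JSON.parse(content);
  } catch {
    // Corrupted manifest: report no dependencies instead of throwing.
    return [];
  }
  if (typeof manifest !== "object" || manifest === null) return [];
  const record = manifest as Record<string, unknown>;

  const collect = (field: string, dev: boolean): Dependency[] => {
    const section = record[field];
    if (typeof section !== "object" || section === null) return [];
    return Object.entries(section as Record<string, unknown>)
      // Keep only entries whose version is a string.
      .filter((entry): entry is [string, string] => typeof entry[1] === "string")
      .map(([name, version]) => ({ name, version, dev }));
  };

  return [...collect("dependencies", false), ...collect("devDependencies", true)];
}
```

Returning an empty array keeps one malformed file from aborting the analysis of an entire repository.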
The file classifier is tested to ensure accurate categorization of nodes for the visualization graph.
| Category | Examples Verified in Tests |
|---|---|
| Entry | `src/index.ts`, `main.go`, `src/App.tsx` |
| Test | `*.test.tsx`, `*_spec.go`, `__tests__/*` |
| Config | `tsconfig.json`, `vite.config.ts`, `.env.*` |
| Doc | `README.md`, `LICENSE` |
| Asset | `assets/*.png`, `public/favicon.ico` |
Sources: server/parser/file-classifier.test.ts:7-38
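A rule-table classifier is one plausible way to produce the categories in the table. The patterns below are assumptions reverse-engineered from the listed examples, and the `source` fallback category and `classifyPath` name are hypothetical; the project's classifier may use different rules and ordering.

```typescript
type Category = "test" | "config" | "doc" | "asset" | "entry" | "source";

// Rules are checked in order, so test files win over entry-like names.
const CLASSIFIER_RULES: Array<[Category, RegExp]> = [
  ["test", /(\.test\.[jt]sx?$|_spec\.go$|(^|\/)__tests__\/)/],
  ["config", /(^|\/)(tsconfig\.json|vite\.config\.ts|\.env(\..+)?)$/],
  ["doc", /(^|\/)(README\.md|LICENSE)$/],
  ["asset", /\.(png|ico|svg|jpe?g)$/],
  ["entry", /(^|\/)(index\.ts|main\.go|App\.tsx)$/],
];

function classifyPath(path: string): Category {
  for (const [category, pattern] of CLASSIFIER_RULES) {
    if (pattern.test(path)) return category;
  }
  // Hypothetical default for files that match no rule.
  return "source";
}
```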
The server/search/constants.test.ts file validates the foundational configuration for the semantic search engine, including:
- Language Mapping: Ensures file extensions (like `.tsx` or `.py`) map to correct highlight/AST languages.
- Cache Determinism: Verifies that `searchCacheKey` and `indexCacheKey` generate predictable strings for Vercel/LRU caching layers.
- Thresholds: Confirms `DEFAULT_MIN_SCORE` and `MAX_CHUNK_TOKENS` are within valid operational ranges.
Sources: server/search/constants.test.ts:46-130
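Cache determinism means that equivalent inputs always map to the same key. The sketch below shows the idea with a hypothetical key format; `searchCacheKeySketch` and the `search:owner:repo:query` layout are assumptions and do not reflect the project's real `searchCacheKey`.

```typescript
function searchCacheKeySketch(owner: string, repo: string, query: string): string {
  // Normalise casing and whitespace so equivalent queries share one cache entry.
  const normalizedQuery = query.trim().toLowerCase().replace(/\s+/g, " ");
  return ["search", owner.toLowerCase(), repo.toLowerCase(), normalizedQuery].join(":");
}
```

A test for determinism would call the function twice with cosmetically different inputs and check that both calls return the same string.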
Integration tests for repository analysis simulate the full lifecycle from URL parsing to tree building.
sequenceDiagram
participant T as "fetch-tree.test.ts"
participant S as "fetch-tree.ts"
participant M as "mock-data.ts"
T->>S: fetchRepoInfo("mock-owner", "repo")
S->>M: isMockRepo("mock-owner")
M-->>S: true
S->>M: fetchMockRepoInfo(...)
M-->>S: Mock Repo Data (sha, stars, etc.)
S-->>T: RepoInfo Object
This sequence shows the redirection logic used during integration tests to avoid external network dependencies.
Sources: server/github/fetch-tree.test.ts:106-115
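The redirection in the sequence above amounts to choosing between two data sources before any network call is made. The following sketch expresses that with injected functions; `makeRepoInfoFetcher`, the `RepoInfo` fields, and the predicate passed in are all hypothetical and are not the project's `fetchRepoInfo` or `isMockRepo`.

```typescript
// Hypothetical shape; the real RepoInfo carries more fields (sha, stars, etc.).
interface RepoInfo {
  owner: string;
  repo: string;
  stars: number;
}

type RepoInfoFetcher = (owner: string, repo: string) => Promise<RepoInfo>;

function makeRepoInfoFetcher(
  isMock: (owner: string) => boolean,
  fetchMock: RepoInfoFetcher,
  fetchLive: RepoInfoFetcher,
): RepoInfoFetcher {
  // Decide per call, so mock and live repositories can be mixed in one session.
  return (owner, repo) => (isMock(owner) ? fetchMock(owner, repo) : fetchLive(owner, repo));
}
```

Injecting both fetchers makes the branch testable without Octokit: a test supplies stubs and checks which one was called.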
The gitSdm testing strategy is optimized for a serverless, AI-integrated environment. By combining strict unit testing of parsers with a comprehensive mocking layer for GitHub and AI providers, the project ensures that architectural insights and visualization components remain reliable. The use of Bun as a unified runner facilitates a fast, co-located testing workflow that supports continuous integration and development.