
CodeGraph-Engine

A lightweight local code semantic graph builder and AI context optimization engine

Python 3.8+ | MIT License | Zero Dependencies | 6 Languages


English

🎉 About the Project

CodeGraph-Engine is a lightweight, locally run tool that builds a semantic graph of your code and uses it to optimize context for AI assistants. It parses your codebase, extracts the call relationships and dependencies between functions, classes, and modules, and assembles them into a code semantic graph. On top of that graph it provides search, context compression, and multi-format export.

Core Value

Have you ever run into these challenges in your day-to-day development?

  • Onboarding onto an unfamiliar large project with thousands of files, unsure where to start understanding the architecture?
  • Using AI-assisted coding tools where the LLM's context window is too limited to fit all the critical information from your project?
  • Debugging cross-module issues by manually tracing function call chains, which is slow and error-prone?
  • Reviewing code changes where it is difficult to quickly grasp the blast radius and dependency implications?

CodeGraph-Engine was built to solve exactly these pain points. By constructing a code semantic graph, it gives you a bird's-eye view of your project structure, enables precise code entity lookup, and provides LLMs with optimally curated context -- truly delivering on the promise of "making AI understand your code."

What Sets Us Apart

  • Zero external dependencies: Built entirely with the Python standard library. No third-party packages to install, no dependency conflicts with your project.
  • Fully local execution: All computation happens on your machine. Your code never leaves your computer, ensuring enterprise-grade security and privacy.
  • Hybrid search engine: Combines TF-IDF and BM25 scores, aiming for better ranking of code entities than either algorithm gives on its own.
  • Intelligent LLM context compression: Leverages graph topology for relevance ranking, automatically selecting the most relevant code snippets to maximize information density within a limited token budget.

✨ Key Features

  • 🌐 Multi-language support: Native support for Python, JavaScript, TypeScript, Go, Rust, and Java -- covering the vast majority of development scenarios.
  • 🔍 TF-IDF + BM25 hybrid search engine: Merges the strengths of two classic information retrieval algorithms for precise code entity search, with multi-dimensional filtering by name, type, and content.
  • 🧠 Intelligent LLM context compression: Ranks code by relevance using the graph topology and selects the most relevant context for the LLM, so that a limited token budget carries as much useful information as possible.
  • 📊 Interactive TUI dashboard: A terminal-based interface built with curses for real-time graph browsing, relationship exploration, and code structure navigation -- all without leaving your terminal.
  • 📤 Multi-format export: Export to JSON (structured data), Markdown (readable documentation), or DOT (Graphviz visualization) to suit different workflows.
  • Zero-dependency lightweight architecture: Entirely built on Python standard libraries (ast, re, json, math, collections, curses, etc.) for a small footprint and fast startup.
  • 🔒 Fully local execution: All parsing, indexing, searching, and compression happens locally. Your code never leaves your machine, ensuring enterprise-grade code security.
  • 📈 Code quality analysis: Automatically computes function complexity, module coupling, and other metrics to identify code hotspots and potential problem areas.
  • 🎯 Entity relationship visualization: Visualize call chains and dependency graphs for any code entity, helping you quickly understand how different parts of your codebase connect.

🚀 Quick Start

Prerequisites

  • Python 3.8 or later
  • No third-party dependencies required

Installation

# Install via pip (recommended)
pip install codegraph-engine

# Or install from source
git clone https://github.com/gitstq/CodeGraph-Engine.git
cd CodeGraph-Engine
pip install .

Up and Running

Once installed, the codegraph command is available:

# Check version
codegraph --version

# View help
codegraph --help

Quick Demo

# 1. Scan a project and build a semantic graph
codegraph scan /path/to/project

# 2. Search for code entities
codegraph search "UserService"

# 3. Compress context for LLM consumption
codegraph compress "authentication flow" --tokens 2000

# 4. Export the graph
codegraph export json -o graph.json

# 5. View statistics
codegraph stats

📖 Detailed Usage Guide

Complete CLI Reference

| Command | Description | Example |
| --- | --- | --- |
| codegraph scan <path> | Scan a codebase and build a semantic graph | codegraph scan ./my-project |
| codegraph search <query> | Search for code entities | codegraph search "UserService" |
| codegraph compress <query> --tokens <n> | Compress code context (optimized for LLMs) | codegraph compress "auth" --tokens 2000 |
| codegraph export <format> | Export the graph (json/markdown/dot) | codegraph export json -o graph.json |
| codegraph stats | Display statistics | codegraph stats --detail |
| codegraph tui | Launch the interactive dashboard | codegraph tui --theme dark |
| codegraph viz <name> | Visualize an entity's relationships | codegraph viz "UserService" -d 3 |

scan Command

scan is the core command: it scans a codebase and builds the semantic graph.

# Basic usage: scan the current directory
codegraph scan .

# Scan a specific path and save the result
codegraph scan /path/to/project -o result.json

# Exclude specific directories during scanning
codegraph scan /path/to/project --exclude vendor tests

# Show verbose output
codegraph scan /path/to/project --verbose

Parameter reference:

| Parameter | Description | Default |
| --- | --- | --- |
| path | Directory path to scan | . (current directory) |
| -o, --output | Output file path (JSON format) | None (no file saved) |
| --exclude | Additional directories to exclude | None |
| --verbose | Show detailed output | false |
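
The README does not show how the scanner extracts entities. For Python sources, the standard-library ast module listed under Technical Choices is enough to recover functions, classes, and call edges. The following is a minimal sketch under that assumption; the function name extract_entities and the (nodes, edges) output shape are hypothetical and are not taken from the project's code.

```python
import ast

def extract_entities(source: str, module: str = "<module>"):
    """Return (nodes, edges) for one Python source string.

    nodes: list of (kind, qualified_name); edges: list of (caller, callee_name).
    Hypothetical output shape, chosen for illustration only.
    """
    tree = ast.parse(source)
    nodes, edges = [], []

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "Class" if isinstance(child, ast.ClassDef) else "Function"
                name = f"{scope}.{child.name}"
                nodes.append((kind, name))
                visit(child, name)  # nested definitions get a longer scope
            else:
                if isinstance(child, ast.Call):
                    func = child.func
                    # Only plain names and attribute calls are recorded here.
                    if isinstance(func, ast.Name):
                        edges.append((scope, func.id))
                    elif isinstance(func, ast.Attribute):
                        edges.append((scope, func.attr))
                visit(child, scope)

    visit(tree, module)
    return nodes, edges

sample = """
class UserService:
    def login(self, name):
        return check(name)

def check(name):
    return name.strip()
"""
nodes, edges = extract_entities(sample, "app")
```

Callee names are recorded as written in the source; resolving them to definitions across files would need an extra symbol-table pass that this sketch leaves out.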

search Command

A code entity search engine powered by the TF-IDF + BM25 hybrid algorithm.

# Basic search
codegraph search "UserService"

# Limit the number of results
codegraph search "auth" -n 10

# Filter by node type (functions only)
codegraph search "login" -t Function

# Specify the search path
codegraph search "Database" -p /path/to/project

Parameter reference:

| Parameter | Description | Default |
| --- | --- | --- |
| query | Search keyword | Required |
| -p, --path | Code directory path | . |
| -n, --limit | Maximum number of results | 20 |
| -t, --type | Filter by node type | None (all types) |
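
The README names the two ranking functions but not how their scores are combined. One plausible approach, sketched below, is to compute both scores per entity, scale each to the range 0 to 1, and take a weighted sum. The tokenizer, the weight alpha, and the BM25 constants k1 and b are assumptions made for illustration, not values taken from the project.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Split identifiers such as "UserService" or "user_service" into lowercase words.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return [p.lower() for p in parts]

def hybrid_search(query, docs, alpha=0.5, k1=1.5, b=0.75):
    """Rank docs (name -> text) by a weighted mix of TF-IDF and BM25 scores."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / max(n, 1)
    df = Counter(term for t in tokens.values() for term in set(t))
    q_terms = tokenize(query)

    tfidf, bm25 = {}, {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        length = len(toks) or 1
        s_tfidf = s_bm25 = 0.0
        for term in q_terms:
            if tf[term] == 0:
                continue
            s_tfidf += (tf[term] / length) * math.log(1 + n / df[term])
            bm_idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * length / avg_len)
            s_bm25 += bm_idf * tf[term] * (k1 + 1) / norm
        tfidf[name], bm25[name] = s_tfidf, s_bm25

    def normalize(scores):
        # Scale so the best-scoring document gets 1.0.
        top = max(scores.values(), default=0.0)
        return {k: (v / top if top else 0.0) for k, v in scores.items()}

    tfidf, bm25 = normalize(tfidf), normalize(bm25)
    mixed = {k: alpha * tfidf[k] + (1 - alpha) * bm25[k] for k in tokens}
    return sorted(((s, k) for k, s in mixed.items() if s > 0), reverse=True)

docs = {
    "UserService.login": "def login(self, user, password): authenticate user session",
    "Database.connect": "def connect(self, url): open database connection",
    "AuthMiddleware": "class AuthMiddleware: check user token before request",
}
results = hybrid_search("user login", docs)
```

Normalizing before mixing keeps one scorer from dominating just because its raw values are larger; setting alpha to 1.0 or 0.0 falls back to pure TF-IDF or pure BM25.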

compress Command

Optimizes code context for LLMs by automatically selecting the most relevant code snippets.

# Basic usage: compress authentication-related context
codegraph compress "authentication flow"

# Limit the token count
codegraph compress "user registration" --tokens 2000

# Save compressed output to a file
codegraph compress "payment processing" --tokens 4000 -o context.txt

Parameter reference:

| Parameter | Description | Default |
| --- | --- | --- |
| query | Query keyword | Required |
| -p, --path | Code directory path | . |
| --tokens | Maximum token count | 4096 |
| -o, --output | Output file path | None (stdout) |
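
How the snippets are chosen is not documented here. A simple way to picture budgeted selection is a greedy pass: boost each snippet's search relevance by that of its graph neighbors, then add snippets in score order while they still fit. The sketch below assumes a rough four-characters-per-token estimate and a hypothetical neighbor_weight parameter; neither comes from the project.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)

def compress_context(snippets, relevance, edges, budget, neighbor_weight=0.5):
    """Pick snippets for an LLM prompt without exceeding `budget` tokens.

    snippets: name -> source text; relevance: name -> search score;
    edges: iterable of (caller, callee) pairs from the graph.
    """
    neighbors = {name: set() for name in snippets}
    for a, b in edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Boost each snippet by the relevance of its direct graph neighbors.
    score = {
        name: relevance.get(name, 0.0)
        + neighbor_weight * sum(relevance.get(n, 0.0) for n in neighbors[name])
        for name in snippets
    }

    chosen, used = [], 0
    for name in sorted(snippets, key=lambda n: score[n], reverse=True):
        cost = estimate_tokens(snippets[name])
        if score[name] > 0 and used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen, used

snippets = {"login": "x" * 400, "hash_password": "y" * 200, "render_home": "z" * 200}
relevance = {"login": 1.0}
edges = [("login", "hash_password")]
chosen, used = compress_context(snippets, relevance, edges, budget=150)
```

In this example hash_password has no direct match with the query but is pulled in because login calls it, while the unrelated render_home is left out.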

export Command

Export the constructed semantic graph in various formats.

# Export as JSON
codegraph export json -o graph.json

# Export as Markdown
codegraph export markdown -o graph.md

# Export as DOT (renderable with Graphviz)
codegraph export dot -o graph.dot
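
For readers unfamiliar with DOT, the sketch below shows what a call graph looks like in that format. The to_dot helper and its inputs are hypothetical; the exact output of codegraph export dot may differ.

```python
def to_dot(nodes, edges, graph_name="codegraph"):
    """Render nodes and (source, target) edges as Graphviz DOT text."""
    def quote(label):
        # Inside a quoted DOT ID, a double quote must be written as \".
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{", "  rankdir=LR;"]
    lines += [f"  {quote(n)};" for n in nodes]
    lines += [f"  {quote(a)} -> {quote(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(
    ["UserService.login", "Database.connect"],
    [("UserService.login", "Database.connect")],
)
```

A DOT file can be rendered with Graphviz, for example: dot -Tsvg graph.dot -o graph.svg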

stats Command

View statistical analysis of your codebase.

# Basic statistics
codegraph stats

# Show detailed info (including hotspots and coupling analysis)
codegraph stats --detail

# Specify the project path
codegraph stats -p /path/to/project
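
The README does not define how function complexity is measured. A common proxy is an approximation of McCabe cyclomatic complexity: one plus the number of decision points in the function. The sketch below computes that for Python code with ast; the set of node types counted is an assumption, and the project's metric may differ.

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)

def function_complexity(source):
    """Map each function name to 1 + the number of decision points inside it."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for inner in ast.walk(node):
                if isinstance(inner, BRANCH_NODES):
                    score += 1
                elif isinstance(inner, ast.BoolOp):
                    # Each extra operand of and/or adds one path.
                    score += len(inner.values) - 1
            result[node.name] = score
    return result

sample = """
def classify(n):
    if n < 0 and n != -1:
        return "neg"
    for i in range(n):
        if i % 2:
            return "odd"
    return "other"
"""
complexity = function_complexity(sample)
```

Branches inside nested functions are also counted toward the enclosing function in this sketch, which is a simplification.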

tui Interactive Dashboard

A terminal-based interactive graph browsing interface.

# Launch the dashboard (dark theme)
codegraph tui

# Use the light theme
codegraph tui --theme light

# Specify the project path
codegraph tui -p /path/to/project

viz Entity Relationship Visualization

Visualize the call chain and dependency graph for any code entity.

# Visualize UserService relationships (default depth: 2)
codegraph viz "UserService"

# Specify traversal depth
codegraph viz "UserService" -d 3

# Specify the project path
codegraph viz "Database" -p /path/to/project
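
The -d option suggests a depth-limited walk outward from the chosen entity. A breadth-first search is the usual way to do that; the sketch below is one possible reading, with a hypothetical neighborhood function that follows edges in both directions so that callers and callees both appear.

```python
from collections import deque

def neighborhood(edges, start, depth=2):
    """Return {entity: distance} for entities within `depth` hops of `start`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand beyond the depth limit
        for nxt in sorted(adjacency.get(current, ())):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return seen

edges = [("Controller", "UserService"), ("UserService", "Database"),
         ("Database", "Driver"), ("Driver", "Socket")]
reach = neighborhood(edges, "UserService", depth=2)
```

With depth 2, Socket is three hops away and is therefore excluded; raising the depth to 3 would include it.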

Configuration File

You can create a .codegraph.json configuration file in your project root to customize behavior:

{
  "root_path": ".",
  "exclude_dirs": [".git", "node_modules", "vendor"],
  "exclude_files": [],
  "max_file_size": 1048576,
  "max_context_tokens": 4096,
  "output_dir": ".codegraph",
  "verbose": false
}
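
A loader for this file could look like the sketch below. It assumes the values in the sample above are also the defaults (the README does not say so) and that unknown keys should be rejected; load_config is a hypothetical name, not the project's API.

```python
import json
from pathlib import Path

# Assumed defaults, copied from the sample configuration above.
DEFAULTS = {
    "root_path": ".",
    "exclude_dirs": [".git", "node_modules", "vendor"],
    "exclude_files": [],
    "max_file_size": 1048576,
    "max_context_tokens": 4096,
    "output_dir": ".codegraph",
    "verbose": False,
}

def load_config(project_root="."):
    """Merge .codegraph.json (if present) over the assumed defaults."""
    config = dict(DEFAULTS)
    path = Path(project_root) / ".codegraph.json"
    if path.is_file():
        user = json.loads(path.read_text(encoding="utf-8"))
        unknown = set(user) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        config.update(user)
    return config
```

Only the keys present in the file override the defaults, so a configuration that sets just max_context_tokens leaves everything else unchanged.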

Typical Use Cases

Use Case 1: Rapid Onboarding to a New Project

# Scan the project to understand the overall structure
codegraph scan /path/to/new-project --verbose

# View statistics to gauge codebase size
codegraph stats --detail

# Launch the TUI dashboard for interactive exploration
codegraph tui

Use Case 2: Enhancing AI-Assisted Coding

# Compress context for a specific feature, then feed it to an LLM
codegraph compress "user authentication" --tokens 2000 -o llm_context.txt

# Paste the compressed context into your AI conversation for more accurate suggestions

Use Case 3: Code Review and Architecture Analysis

# Export the graph to generate architecture documentation
codegraph export markdown -o architecture.md

# Check inter-module coupling
codegraph stats --detail

# Visualize dependencies of a core module
codegraph viz "CoreModule" -d 3

💡 Design Philosophy & Roadmap

Design Principles

CodeGraph-Engine is guided by the following core principles:

  1. Minimalism: Zero external dependencies. A single Python environment is all you need, keeping the barrier to entry as low as possible.
  2. Security first: Fully local execution. Your code never leaves your machine, making it safe for enterprise proprietary codebases.
  3. Pragmatism: Every feature is designed around real developer pain points. No feature bloat for the sake of it.
  4. Extensibility: A pluggable language parser architecture makes it straightforward to add support for additional languages in the future.
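
The pluggable parser idea in principle 4 can be pictured as a registry keyed by file extension. The sketch below is an illustration only: PARSERS, register_parser, and parse_file are hypothetical names, and the parser bodies are placeholders rather than real language parsers.

```python
import os

PARSERS = {}  # file extension -> parser callable (hypothetical registry)

def register_parser(*extensions):
    """Decorator that registers a parser function for the given extensions."""
    def decorator(func):
        for ext in extensions:
            PARSERS[ext.lower()] = func
        return func
    return decorator

@register_parser(".py")
def parse_python(source):
    # Placeholder: a real parser would return graph nodes and edges.
    return {"language": "python", "lines": source.count("\n") + 1}

@register_parser(".js", ".ts")
def parse_javascript(source):
    # Placeholder shared by JavaScript and TypeScript files.
    return {"language": "javascript/typescript", "lines": source.count("\n") + 1}

def parse_file(path, source):
    """Dispatch to the parser registered for the file's extension, if any."""
    ext = os.path.splitext(path)[1].lower()
    parser = PARSERS.get(ext)
    return parser(source) if parser else None
```

With this layout, supporting another language means adding one decorated function; files with no registered parser are skipped by returning None.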

Technical Choices

| Choice | Rationale |
| --- | --- |
| Python standard library ast | Native Python AST parsing with zero dependencies |
| TF-IDF + BM25 | Two complementary ranking functions: TF-IDF gives rare query terms more weight, while BM25 adds term-frequency saturation and document-length normalization |
| curses terminal UI | Part of the Python standard library on Unix-like systems (the standard Windows build of Python does not include curses); fits naturally into developer terminal workflows |
| JSON configuration | Universal, human-readable, excellent tooling support |

Roadmap

  • Incremental scanning: Git diff-based incremental graph updates for faster scans on large projects
  • Additional language support: Planned parsers for C/C++, Ruby, PHP, and more
  • Web UI: A browser-based visualization interface with richer interactive capabilities
  • LLM integration: Direct API integration with OpenAI, Claude, and others for conversational code querying
  • Plugin system: A plugin mechanism for custom analysis rules and export formats
  • CI/CD integration: A GitHub Action that automatically displays the impact scope of code changes in PRs

📦 Packaging & Deployment

Install from PyPI (Recommended)

pip install codegraph-engine

Install from Source

git clone https://github.com/gitstq/CodeGraph-Engine.git
cd CodeGraph-Engine
pip install .

Development Mode Installation

If you want to contribute to the project, install in editable mode:

git clone https://github.com/gitstq/CodeGraph-Engine.git
cd CodeGraph-Engine
pip install -e .

Verify Installation

codegraph --version
# Output: CodeGraph-Lite v1.0.0

🤝 Contributing

We welcome and appreciate contributions of all kinds -- whether that is filing bug reports, suggesting features, or submitting pull requests.

Filing Issues

  1. Before opening an issue, please search the existing issue list to avoid duplicates.
  2. Bug reports should include: steps to reproduce, expected behavior, actual behavior, and environment details.
  3. Feature requests should describe the use case and desired behavior in detail.

Submitting Pull Requests

  1. Fork this repository
  2. Create a feature branch: git checkout -b feature/your-feature-name
  3. Write your code and add corresponding test cases
  4. Ensure all tests pass: python -m pytest tests/
  5. Commit your changes: git commit -m "feat: describe your changes"
  6. Push the branch: git push origin feature/your-feature-name
  7. Open a Pull Request

Commit Convention

Please follow the Conventional Commits specification:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation update
  • refactor: Code refactoring
  • test: Test-related changes
  • chore: Build/tooling changes

📄 License

This project is released under the MIT License. You are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of this software.

The only requirement is that the copyright notice and license notice are included in all copies or substantial portions of the software.

See the LICENSE file for details.


   ____                  _       ____  _       _ _        ____            _           
  / ___| _ __  _ __ ___ (_) __ _| __ )(_) __ _(_| |_ ___ |  _ \ _____  _| |_ _ __ ___ 
  \___ \| '_ \| '__/ _ \| |/ _` |  _ \| |/ _` | | __/ _ \| |_) / _ \ \/ / __| '__/ _ \
   ___) | |_) | | | (_) | | (_| | |_) | | (_| | | || (_) |  _ <  __/>  <| |_| | | (_) |
  |____/| .__/|_|  \___// |\__,_|____/|_|\__, |_|\__\___/|_| \_\___/_/\_\\__|_|  \___/
        |_|            |__/               |___/                                     
