DocTreeAI is a Rust-based command-line tool that validates README.md files against your codebase using hierarchical, tree-based summarization with local large language models (LLMs). The tool scans your codebase, generates summaries for files and directories bottom-up, and checks that your README accurately reflects the current state of your code, suggesting updates when the documentation becomes outdated.
- 🌳 Hierarchical Summarization: Uses tree-based analysis starting from individual files up to the project root
- 📦 Cache System: Efficient SHA-256 based caching to avoid redundant API calls
- 🔄 Smart README Validation: Validates README content against current codebase and suggests updates when needed
- 🚫 .gitignore Integration: Respects your project's ignore patterns
- 🔌 Local LLM Support: Works with any OpenAI-compatible local model server
- ⚡ Fast Performance: Concurrent processing and intelligent caching for speed
- 📊 Progress Tracking: Detailed logging and cache statistics
- Rust (latest stable toolchain)
- A running local LLM server with an OpenAI-compatible API (we strongly recommend OpenAI's GPT-OSS-20B model)
```bash
git clone <repository-url>
cd doctreeai
cargo build --release
```

The binary will be available at `target/release/doctreeai`.
DocTreeAI uses environment variables for configuration. We highly recommend using OpenAI's GPT-OSS-20B model for optimal documentation generation:
```bash
# Required Configuration
export OPENAI_API_BASE="http://localhost:11434/v1"  # Your LLM endpoint (required)
export OPENAI_MODEL_NAME="gpt-oss-20b"              # Model name (required)

# Optional Configuration
export OPENAI_API_KEY="ollama"                      # API key (defaults to "ollama")
export DOCTREEAI_CACHE_DIR=".doctreeai_cache"       # Cache directory (defaults to ".doctreeai_cache")
export DOCTREEAI_LOG_LEVEL="info"                   # Logging level (defaults to "info")
```
Note: Both `OPENAI_API_BASE` and `OPENAI_MODEL_NAME` are required. The tool will not fall back to default values for these settings, ensuring you explicitly configure your LLM endpoint and model.
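As an illustration of this required/optional split, here is a minimal sketch of how such configuration could be read in Rust; the `Config` struct and `load_config` function are hypothetical, not DocTreeAI's actual internals:

```rust
use std::env;

/// Hypothetical configuration holder mirroring the variables above.
struct Config {
    api_base: String,
    model_name: String,
    api_key: String,
    cache_dir: String,
}

fn load_config() -> Result<Config, String> {
    // The two required variables fail loudly rather than silently defaulting.
    let api_base = env::var("OPENAI_API_BASE")
        .map_err(|_| "OPENAI_API_BASE must be set".to_string())?;
    let model_name = env::var("OPENAI_MODEL_NAME")
        .map_err(|_| "OPENAI_MODEL_NAME must be set".to_string())?;
    Ok(Config {
        api_base,
        model_name,
        // Optional variables fall back to their documented defaults.
        api_key: env::var("OPENAI_API_KEY").unwrap_or_else(|_| "ollama".into()),
        cache_dir: env::var("DOCTREEAI_CACHE_DIR")
            .unwrap_or_else(|_| ".doctreeai_cache".into()),
    })
}
```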
We strongly recommend OpenAI's GPT-OSS-20B model for DocTreeAI because:
- 🧠 Superior Code Analysis: Excels at understanding and explaining code across all programming languages
- 📝 Documentation Excellence: Specifically optimized for generating clear, comprehensive technical documentation
- 🔧 Advanced Reasoning: Provides full chain-of-thought reasoning for better documentation quality
- ⚡ Efficient Performance: Only 3.6B active parameters per token, runs smoothly on 16GB consumer GPUs
- 🛠 Tool Integration: Native support for structured outputs and function calling
- 🎯 Cost-Effective: Optimized for local deployment with minimal resource requirements
- 📊 Proven Results: Matches or exceeds larger models on coding and technical analysis benchmarks
- 🔓 Open Source: Available under Apache 2.0 license for commercial and personal use
```bash
# Initialize DocTreeAI in a project
doctreeai init

# Validate README and suggest updates
doctreeai run

# Force regeneration (ignore cache)
doctreeai run --force

# Dry run (preview without changes)
doctreeai run --dry-run

# Show project and cache information
doctreeai info

# Test LLM connection
doctreeai test

# Clean cache
doctreeai clean

# Enable verbose logging
doctreeai -v run
```
- Initialize: Run `doctreeai init` to set up the cache and update `.gitignore`
- Configure: Set your environment variables for the local LLM
- Validate: Run `doctreeai run` to validate your README.md and get update suggestions
- Iterate: The tool will use cached summaries for unchanged files on subsequent runs
DocTreeAI performs a bottom-up analysis of your codebase:
- File Level: Each source code file is analyzed and summarized
- Directory Level: Directory summaries are created from child summaries
- Project Level: The root summary becomes your project overview
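Conceptually, this is a post-order traversal of the file tree: leaves are summarized first, and each directory's summary is derived from its children. A minimal sketch, where `summarize_file` and `summarize_children` are hypothetical stand-ins for the LLM-backed summarizers:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-ins for the LLM-backed summarizers.
fn summarize_file(path: &Path) -> io::Result<String> {
    Ok(format!("summary of {}", path.display()))
}

fn summarize_children(path: &Path, child_summaries: &[String]) -> io::Result<String> {
    Ok(format!("{}: combined from {} children", path.display(), child_summaries.len()))
}

/// Post-order walk: files first, then each directory from its children,
/// so the project-root summary is produced last.
fn summarize(path: &Path) -> io::Result<String> {
    if path.is_file() {
        return summarize_file(path);
    }
    let mut child_summaries = Vec::new();
    for entry in fs::read_dir(path)? {
        child_summaries.push(summarize(&entry?.path())?);
    }
    summarize_children(path, &child_summaries)
}
```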
DocTreeAI uses a directory-mirrored cache structure for optimal performance:
- File-Level Caching: Each source file gets a corresponding `.summary` file in the cache
- Directory Summaries: Directories have `.dir_summary` files containing their aggregated summaries
- Structure Mirroring: The cache directory structure exactly matches your codebase structure
- SHA-256 Hashing: Files are hashed to detect changes and invalidate specific cache entries (sketched below)
- Incremental Updates: Only modified files trigger new LLM API calls
- Small Context Windows: Each cache file is independent, reducing memory usage
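In code, this change detection reduces to comparing content digests. A minimal sketch using the `sha2` crate (an assumed dependency; DocTreeAI's actual hashing code may differ):

```rust
use sha2::{Digest, Sha256};
use std::fs;
use std::io;
use std::path::Path;

/// Hash a file's contents; a changed digest invalidates that file's
/// cached .summary entry (and its ancestors' .dir_summary entries).
fn file_digest(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    let mut hasher = Sha256::new();
    hasher.update(&bytes);
    Ok(format!("{:x}", hasher.finalize()))
}
```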
Example cache structure:
```text
.doctreeai_cache/
├── src/
│   ├── main.rs.summary
│   ├── lib.rs.summary
│   ├── cache.rs.summary
│   └── .dir_summary
├── tests/
│   ├── integration_tests.rs.summary
│   └── .dir_summary
└── .dir_summary
```
- Line Mapping: Maps README content lines to relevant cached documentation
- Change Detection: Identifies when code changes affect specific README sections
- Smart Suggestions: Provides targeted update suggestions without modifying your files
- Mapping Persistence: Tracks line-to-cache mappings in `.doctreeai_cache/readme_mapping.json`
DocTreeAI uses a sophisticated mapping system to ensure every line in your README that describes code can be validated:
- Content Analysis: The tool analyzes each line in your README to identify references to code components
- Cache Mapping: Lines mentioning modules, functions, files, or directories are mapped to relevant cache entries
- Change Tracking: When cached documentation is invalidated (due to code changes), the tool identifies affected README lines
- Validation Process:
  - Compares current README content against the latest code summaries
  - Detects outdated or inaccurate descriptions
  - Generates specific suggestions for lines that need updating
- Non-Invasive: All suggestions are presented to the user without modifying the README file
This approach ensures your documentation stays accurate while giving you full control over what changes to accept.
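The exact schema of `readme_mapping.json` is internal to the tool and not documented here, but a plausible shape, modeled with hypothetical serde types (assuming the `serde` crate with the `derive` feature), might look like:

```rust
use serde::{Deserialize, Serialize};

/// Plausible shape for .doctreeai_cache/readme_mapping.json;
/// the real schema may differ.
#[derive(Serialize, Deserialize)]
struct ReadmeMapping {
    entries: Vec<LineMapping>,
}

#[derive(Serialize, Deserialize)]
struct LineMapping {
    /// 1-indexed line number in README.md.
    line: usize,
    /// Cache paths backing this line, e.g. "src/main.rs.summary"
    /// or "src/.dir_summary".
    cache_entries: Vec<String>,
}
```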
DocTreeAI analyzes the following file types:
- Languages: Rust, Python, JavaScript/TypeScript, Go, Java, C/C++, C#, PHP, Ruby, Swift, Kotlin, and more
- Web: HTML, CSS, SCSS, Vue, Svelte
- Config: JSON, YAML, TOML, XML
- Documentation: Markdown, LaTeX, reStructuredText
- Scripts: Shell scripts, PowerShell
- Other: SQL, GraphQL, Protocol Buffers, Dockerfiles, Makefiles
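As an illustration of how such a list might translate into a scanner check, here is a hedged sketch of extension-based detection; the helper name and the exact extension set are illustrative, not the tool's real scanner:

```rust
use std::path::Path;

/// Illustrative (deliberately incomplete) check in the spirit of the
/// supported-types list above.
fn is_analyzable(path: &Path) -> bool {
    match path.extension().and_then(|e| e.to_str()) {
        Some("rs" | "py" | "js" | "ts" | "go" | "java" | "c" | "cpp" | "cs") => true,
        Some("html" | "css" | "scss" | "vue" | "svelte") => true,
        Some("json" | "yaml" | "toml" | "xml" | "md" | "sql" | "sh") => true,
        // Extension-less files such as Dockerfile or Makefile are
        // matched by file name instead.
        _ => matches!(
            path.file_name().and_then(|n| n.to_str()),
            Some("Dockerfile" | "Makefile")
        ),
    }
}
```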
The tool consists of several key modules:
- Scanner: Gitignore-aware directory traversal and file discovery
- Hasher: SHA-256 file content hashing for change detection
- Cache: Directory-mirrored caching system with individual summary files
- LLM Client: OpenAI-compatible API integration with retry logic (a sketch of the retry pattern follows this list)
- Summarizer: Hierarchical tree-based summarization engine
- README Validator: Validates README against codebase and suggests updates
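The retry logic mentioned for the LLM client is not shown in this README; a minimal sketch of the general exponential-backoff pattern it likely resembles (all names hypothetical, not DocTreeAI's actual code):

```rust
use std::thread;
use std::time::Duration;

/// Generic retry-with-backoff wrapper in the spirit of the LLM client's
/// retry logic; the tool's actual policy may differ.
fn with_retries<T, E>(
    max_attempts: u32,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = Duration::from_millis(500);
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            // Give up once the attempt budget is exhausted.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay *= 2; // exponential backoff between attempts
                attempt += 1;
            }
        }
    }
}
```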
Run the tests and lints:

```bash
cargo test
cargo clippy
```
- Set up a local LLM server with GPT-OSS-20B:

  ```bash
  # Using Ollama (recommended)
  ollama pull gpt-oss:20b
  ollama serve

  # Or using LM Studio: download openai/gpt-oss-20b from the model library
  ```

- Configure environment variables (see the Configuration section)
- Run `cargo run -- test` to verify your setup
- Use `cargo run -- run --dry-run` to test without modifications
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Run `cargo test` and `cargo clippy`
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
LLM Connection Failed
- Ensure your local LLM server is running
- Verify the `OPENAI_API_BASE` URL is correct
- Check that the GPT-OSS-20B model is available: run `ollama list`, or check the LM Studio model library
- For first-time setup: `ollama pull gpt-oss:20b`
Permission Denied
- Ensure the tool has write permissions for the target directory
- Check that `.doctreeai_cache` is not read-only
Out of Memory
- For very large codebases, try processing subdirectories individually
- Increase your local LLM's context window if possible
- Use `doctreeai info` to check configuration and cache status
- Use `doctreeai test` to verify LLM connectivity
- Enable verbose logging with the `-v` flag for detailed output
Generated with DocTreeAI - AI-powered documentation that stays up-to-date 🤖