DocTreeAI is a Rust-based command-line tool that validates README.md files against your codebase using hierarchical, tree-based summarization with local large language models (LLMs). The tool scans your codebase, generates summaries for files and directories bottom-up, and checks that your README accurately reflects the current state of your code, suggesting updates when the documentation becomes outdated.
- 🌳 Hierarchical Summarization: Uses tree-based analysis starting from individual files up to the project root
- 📦 Cache System: Efficient SHA-256 based caching to avoid redundant API calls
- 🔄 Smart README Validation: Validates README content against current codebase and suggests updates when needed
- 🚫 .gitignore Integration: Respects your project's ignore patterns
- 🔌 Local LLM Support: Works with any OpenAI-compatible local model server
- ⚡ Fast Performance: Concurrent processing and intelligent caching for speed
- 📊 Progress Tracking: Detailed logging and cache statistics
- Rust (latest stable toolchain)
- A running local LLM server with an OpenAI-compatible API (we strongly recommend OpenAI's GPT-OSS-20B model)
```bash
git clone <repository-url>
cd doctreeai
cargo build --release
```

The binary will be available at `target/release/doctreeai`.
DocTreeAI uses environment variables for configuration. We highly recommend using OpenAI's GPT-OSS-20B model for optimal documentation generation:
```bash
# Required Configuration
export OPENAI_API_BASE="http://localhost:11434/v1"  # Your LLM endpoint (required)
export OPENAI_MODEL_NAME="gpt-oss-20b"              # Model name (required)

# Optional Configuration
export OPENAI_API_KEY="ollama"                      # API key (defaults to "ollama")
export DOCTREEAI_CACHE_DIR=".doctreeai_cache"       # Cache directory (defaults to ".doctreeai_cache")
export DOCTREEAI_LOG_LEVEL="info"                   # Logging level (defaults to "info")
```
Note: Both `OPENAI_API_BASE` and `OPENAI_MODEL_NAME` are required. The tool will not fall back to default values for these settings, ensuring you explicitly configure your LLM endpoint and model.
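As an illustration of this required/optional split, here is a minimal sketch of how such configuration could be read in Rust; the `Config` struct and `load_config` function are hypothetical, not DocTreeAI's actual internals:

```rust
use std::env;

/// Hypothetical configuration holder mirroring the variables above.
struct Config {
    api_base: String,
    model_name: String,
    api_key: String,
    cache_dir: String,
}

fn load_config() -> Result<Config, String> {
    // The two required variables fail loudly rather than silently defaulting.
    let api_base = env::var("OPENAI_API_BASE")
        .map_err(|_| "OPENAI_API_BASE must be set".to_string())?;
    let model_name = env::var("OPENAI_MODEL_NAME")
        .map_err(|_| "OPENAI_MODEL_NAME must be set".to_string())?;
    Ok(Config {
        api_base,
        model_name,
        // Optional variables fall back to their documented defaults.
        api_key: env::var("OPENAI_API_KEY").unwrap_or_else(|_| "ollama".into()),
        cache_dir: env::var("DOCTREEAI_CACHE_DIR")
            .unwrap_or_else(|_| ".doctreeai_cache".into()),
    })
}
```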
We strongly recommend OpenAI's GPT-OSS-20B model for DocTreeAI because:
- 🧠 Superior Code Analysis: Excels at understanding and explaining code across all programming languages
- 📝 Documentation Excellence: Specifically optimized for generating clear, comprehensive technical documentation
- 🔧 Advanced Reasoning: Provides full chain-of-thought reasoning for better documentation quality
- ⚡ Efficient Performance: Only 3.6B active parameters per token, runs smoothly on 16GB consumer GPUs
- 🛠 Tool Integration: Native support for structured outputs and function calling
- 🎯 Cost-Effective: Optimized for local deployment with minimal resource requirements
- 📊 Proven Results: Matches or exceeds larger models on coding and technical analysis benchmarks
- 🔓 Open Source: Available under Apache 2.0 license for commercial and personal use
```bash
# Initialize DocTreeAI in a project
doctreeai init

# Validate README and suggest updates
doctreeai run

# Force regeneration (ignore cache)
doctreeai run --force

# Dry run (preview without changes)
doctreeai run --dry-run

# Show project and cache information
doctreeai info

# Test LLM connection
doctreeai test

# Clean cache
doctreeai clean

# Enable verbose logging
doctreeai -v run
```
- Initialize: Run `doctreeai init` to set up the cache and update `.gitignore`
- Configure: Set your environment variables for the local LLM
- Validate: Run `doctreeai run` to validate your README.md and get update suggestions
- Iterate: The tool will use cached summaries for unchanged files on subsequent runs
DocTreeAI performs a bottom-up analysis of your codebase:
- File Level: Each source code file is analyzed and summarized
- Directory Level: Directory summaries are created from child summaries
- Project Level: The root summary becomes your project overview
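Conceptually, this is a post-order traversal of the file tree: leaves are summarized first, and each directory's summary is derived from its children. A minimal sketch, where `summarize_file` and `summarize_children` are hypothetical stand-ins for the LLM-backed summarizers:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-ins for the LLM-backed summarizers.
fn summarize_file(path: &Path) -> io::Result<String> {
    Ok(format!("summary of {}", path.display()))
}

fn summarize_children(path: &Path, child_summaries: &[String]) -> io::Result<String> {
    Ok(format!("{}: combined from {} children", path.display(), child_summaries.len()))
}

/// Post-order walk: files first, then each directory from its children,
/// so the project-root summary is produced last.
fn summarize(path: &Path) -> io::Result<String> {
    if path.is_file() {
        return summarize_file(path);
    }
    let mut child_summaries = Vec::new();
    for entry in fs::read_dir(path)? {
        child_summaries.push(summarize(&entry?.path())?);
    }
    summarize_children(path, &child_summaries)
}
```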
DocTreeAI uses a directory-mirrored cache structure for optimal performance:
- File-Level Caching: Each source file gets a corresponding `.summary` file in the cache
- Directory Summaries: Directories have `.dir_summary` files containing their aggregated summaries
- Structure Mirroring: The cache directory structure exactly matches your codebase structure
- SHA-256 Hashing: Files are hashed to detect changes and invalidate specific cache entries (sketched below)
- Incremental Updates: Only modified files trigger new LLM API calls
- Small Context Windows: Each cache file is independent, reducing memory usage
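In code, this change detection reduces to comparing content digests. A minimal sketch using the `sha2` crate (an assumed dependency; DocTreeAI's actual hashing code may differ):

```rust
use sha2::{Digest, Sha256};
use std::fs;
use std::io;
use std::path::Path;

/// Hash a file's contents; a changed digest invalidates that file's
/// cached .summary entry (and its ancestors' .dir_summary entries).
fn file_digest(path: &Path) -> io::Result<String> {
    let bytes = fs::read(path)?;
    let mut hasher = Sha256::new();
    hasher.update(&bytes);
    Ok(format!("{:x}", hasher.finalize()))
}
```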
Example cache structure:
```text
.doctreeai_cache/
├── src/
│   ├── main.rs.summary
│   ├── lib.rs.summary
│   ├── cache.rs.summary
│   └── .dir_summary
├── tests/
│   ├── integration_tests.rs.summary
│   └── .dir_summary
└── .dir_summary
```
- Line Mapping: Maps README content lines to relevant cached documentation
- Change Detection: Identifies when code changes affect specific README sections
- Smart Suggestions: Provides targeted update suggestions without modifying your files
- Mapping Persistence: Tracks line-to-cache mappings in `.doctreeai_cache/readme_mapping.json`
DocTreeAI uses a sophisticated mapping system to ensure every line in your README that describes code can be validated:
- Content Analysis: The tool analyzes each line in your README to identify references to code components
- Cache Mapping: Lines mentioning modules, functions, files, or directories are mapped to relevant cache entries
- Change Tracking: When cached documentation is invalidated (due to code changes), the tool identifies affected README lines
- Validation Process:
  - Compares current README content against the latest code summaries
  - Detects outdated or inaccurate descriptions
  - Generates specific suggestions for lines that need updating
- Non-Invasive: All suggestions are presented to the user without modifying the README file
This approach ensures your documentation stays accurate while giving you full control over what changes to accept.
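The exact schema of `readme_mapping.json` is internal to the tool and not documented here, but a plausible shape, modeled with hypothetical serde types (assuming the `serde` crate with the `derive` feature), might look like:

```rust
use serde::{Deserialize, Serialize};

/// Plausible shape for .doctreeai_cache/readme_mapping.json;
/// the real schema may differ.
#[derive(Serialize, Deserialize)]
struct ReadmeMapping {
    entries: Vec<LineMapping>,
}

#[derive(Serialize, Deserialize)]
struct LineMapping {
    /// 1-indexed line number in README.md.
    line: usize,
    /// Cache paths backing this line, e.g. "src/main.rs.summary"
    /// or "src/.dir_summary".
    cache_entries: Vec<String>,
}
```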
DocTreeAI analyzes the following file types:
- Languages: Rust, Python, JavaScript/TypeScript, Go, Java, C/C++, C#, PHP, Ruby, Swift, Kotlin, and more
- Web: HTML, CSS, SCSS, Vue, Svelte
- Config: JSON, YAML, TOML, XML
- Documentation: Markdown, LaTeX, reStructuredText
- Scripts: Shell scripts, PowerShell
- Other: SQL, GraphQL, Protocol Buffers, Dockerfiles, Makefiles
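As an illustration of how such a list might translate into a scanner check, here is a hedged sketch of extension-based detection; the helper name and the exact extension set are illustrative, not the tool's real scanner:

```rust
use std::path::Path;

/// Illustrative (deliberately incomplete) check in the spirit of the
/// supported-types list above.
fn is_analyzable(path: &Path) -> bool {
    match path.extension().and_then(|e| e.to_str()) {
        Some("rs" | "py" | "js" | "ts" | "go" | "java" | "c" | "cpp" | "cs") => true,
        Some("html" | "css" | "scss" | "vue" | "svelte") => true,
        Some("json" | "yaml" | "toml" | "xml" | "md" | "sql" | "sh") => true,
        // Extension-less files such as Dockerfile or Makefile are
        // matched by file name instead.
        _ => matches!(
            path.file_name().and_then(|n| n.to_str()),
            Some("Dockerfile" | "Makefile")
        ),
    }
}
```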
The tool consists of several key modules:
- Scanner: Gitignore-aware directory traversal and file discovery
- Hasher: SHA-256 file content hashing for change detection
- Cache: Directory-mirrored caching system with individual summary files
- LLM Client: OpenAI-compatible API integration with retry logic (a sketch of the retry pattern follows this list)
- Summarizer: Hierarchical tree-based summarization engine
- README Validator: Validates README against codebase and suggests updates
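The retry logic mentioned for the LLM client is not shown in this README; a minimal sketch of the general exponential-backoff pattern it likely resembles (all names hypothetical, not DocTreeAI's actual code):

```rust
use std::thread;
use std::time::Duration;

/// Generic retry-with-backoff wrapper in the spirit of the LLM client's
/// retry logic; the tool's actual policy may differ.
fn with_retries<T, E>(
    max_attempts: u32,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = Duration::from_millis(500);
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            // Give up once the attempt budget is exhausted.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay *= 2; // exponential backoff between attempts
                attempt += 1;
            }
        }
    }
}
```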
Run the tests and lints:

```bash
cargo test
cargo clippy
```
- Set up a local LLM server with GPT-OSS-20B:

  ```bash
  # Using Ollama (recommended)
  ollama pull gpt-oss:20b
  ollama serve

  # Or using LM Studio: download openai/gpt-oss-20b from the model library
  ```

- Configure environment variables (see the Configuration section)
- Run `cargo run -- test` to verify your setup
- Use `cargo run -- run --dry-run` to test without modifications
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Run `cargo test` and `cargo clippy`
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
LLM Connection Failed
- Ensure your local LLM server is running
- Verify the `OPENAI_API_BASE` URL is correct
- Check that the GPT-OSS-20B model is available: run `ollama list`, or check the LM Studio model library
- For first-time setup: `ollama pull gpt-oss:20b`
Permission Denied
- Ensure the tool has write permissions for the target directory
- Check that `.doctreeai_cache` is not read-only
Out of Memory
- For very large codebases, try processing subdirectories individually
- Increase your local LLM's context window if possible
- Use `doctreeai info` to check configuration and cache status
- Use `doctreeai test` to verify LLM connectivity
- Enable verbose logging with the `-v` flag for detailed output
Generated with DocTreeAI - AI-powered documentation that stays up-to-date 🤖