中文 | English
A close reading of Claude Code (~540K LOC TypeScript) and Codex (~467K LOC Rust). What can we learn about building AI agents by studying these two production systems side-by-side?
Status: Work in progress. Some articles are deep, some are stubs being filled in. Findings get published here as they're written.
Claude Code (Anthropic, ~540K LOC TypeScript) and Codex (OpenAI, ~467K LOC Rust) are two of the most ambitious open coding-agent codebases shipped to date.
This repo is a long-running effort to read both line by line and write down what's actually going on: the patterns, the trade-offs, and the design decisions that aren't obvious from using the tools.
A few things that became clear early:
- Agent ≠ LLM. A model that suggests is a fundamentally different product from a system that executes. Most of the interesting engineering lives in that gap.
- The bottleneck is the system, not the model. Tools, loops, context, and prompt assembly often matter more than which model you call.
- Intelligence amplification compounds; it doesn't add. Each mechanism (tools × loops × context × prompt × compaction) multiplies the others, so weakening any one drags the whole product down.
- The next big improvements are in the system layer, not the model layer.
These threads are explored in detail throughout the articles below.
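The "compounds, doesn't add" claim is easiest to see with numbers. Here is a toy model (the factor names match the five mechanisms, but the values are illustrative, not measurements from either codebase):

```typescript
// Toy model of multiplicative vs additive composition: in the
// multiplicative view, one weak mechanism collapses the whole product.
// Factor values are illustrative only.
type Factors = {
  tools: number;
  loops: number;
  context: number;
  prompt: number;
  compaction: number;
};

function multiplicative(f: Factors): number {
  return f.tools * f.loops * f.context * f.prompt * f.compaction;
}

function additive(f: Factors): number {
  return f.tools + f.loops + f.context + f.prompt + f.compaction;
}

const healthy: Factors = { tools: 2, loops: 2, context: 2, prompt: 2, compaction: 2 };
const weakTools: Factors = { ...healthy, tools: 0.1 };

console.log(multiplicative(healthy));   // 32
console.log(multiplicative(weakTools)); // ~1.6: one weak factor drags everything down
console.log(additive(weakTools));       // ~8.1: an additive model barely notices
```

Under the additive model, crippling one mechanism costs you a few points; under the multiplicative model it costs you 20x. That is the gap the articles keep returning to.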
Part 0: Cognitive Foundation (3 articles)
Builds a deep understanding of what makes an AI agent intelligent
- 00. What is Intelligence? - Complete loop definition
- 01. Essence of Agent Intelligence - Amplification formula
- 02. Deep Principles of Five Mechanisms - Information theory, control theory
Part 1: Failure Cases (5 articles)
Learn from failures to understand why each mechanism is necessary
- 03. Without Tools? - 10-20x efficiency drop
- 04. Without Loops? - Case 3272 compression failure
- 05. Without Context? - 50% rework rate
- 06. Without System Prompt? - 50% reliability
- 07. Without Compression? - 2-hour conversation limit
Part 2: Design Decisions (5 articles)
A deep dive into the reasoning behind key design decisions
- 08. Why 3 Retries? - Circuit breaker design
- 09. Why Tool Classification? - Concurrency safety
- 10. Why Segmented Caching? - 90% cost savings
- 11. Why 5 Permission Modes? - User distribution
- 12. Why 75% Threshold? - Compression timing
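To make one of these decisions concrete: the retry articles argue for a hard attempt cap with exponential backoff before giving up. A minimal sketch of that pattern (the cap of 3 and the base delay are the article's numbers used illustratively; this is not either product's actual implementation):

```typescript
// Retry an async operation at most `maxAttempts` times with exponential
// backoff, then give up by rethrowing the last error. maxAttempts = 3 and
// baseDelayMs = 500 echo the article's numbers for illustration only.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // 500ms, 1000ms, 2000ms, ... doubling per attempt
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

The design question article 08 digs into is why the cap sits at 3 rather than 5 or 10: beyond a point, retries stop recovering transient failures and start masking real ones.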
Part 3: Deep Comparison (10 articles)
Compares the implementation approaches of Claude Code and Codex
Five Mechanisms Comparison:
- 13. Tool System - 52 vs 30 tools
- 14. Loop Mechanism - AsyncGenerator vs Rust
- 15. Context Injection - CLAUDE.md vs config.toml
- 16. System Prompt - Segmented assembly
- 17. Auto Compression - Local vs remote
System Design Comparison:
- 18. Tool Orchestration - Amdahl's law, 90/10 rule
- 19. Permission System - Bayesian trust model
- 20. Retry & Fallback - Exponential backoff
- 21. Cost Control - Prompt Cache
- 22. Performance - TypeScript vs Rust
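One idea that recurs in the tool-orchestration comparison is classifying tools by whether they mutate state: read-only calls can run concurrently, while anything that writes must run alone and in order. A sketch of that scheduling policy (the `ToolCall` shape and names are ours, not taken from either codebase):

```typescript
// Schedule tool calls so read-only calls run concurrently while mutating
// calls run one at a time, in order. Types and names are illustrative.
interface ToolCall {
  name: string;
  readOnly: boolean;
  run: () => Promise<string>;
}

async function schedule(calls: ToolCall[]): Promise<string[]> {
  const results: string[] = [];
  let batch: ToolCall[] = [];

  const flush = async () => {
    // A batch of read-only calls is safe to run in parallel.
    results.push(...(await Promise.all(batch.map((c) => c.run()))));
    batch = [];
  };

  for (const call of calls) {
    if (call.readOnly) {
      batch.push(call);
    } else {
      await flush();                   // finish pending reads first
      results.push(await call.run());  // mutating call runs alone
    }
  }
  await flush();
  return results;
}
```

This is why, as noted below, tool concurrency is bounded by correctness rather than hardware: the scheduler's job is to know which calls are allowed to overlap at all.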
Part 4: Case Studies (3 articles)
Real-world cases: learn how to build specific types of agents
- 23. Code Review Agent - 81% accuracy
- 24. Test Generation Agent - 85%+ coverage
- 25. Refactoring Agent - 95% success rate
Part 5: Summary (3 articles)
Core insights distilled into practical guidance
- 26. Essence of Intelligence Amplification - Multiplication vs addition
- 27. Build Your Own Agent - From MVP to production
- 28. Future of AI Agents - Where is the next 10x
Read the first three articles to build the cognitive foundation:
Read the failure cases to understand why each mechanism is necessary:
- Without Tools - 10-20x efficiency drop
- Without Loops - Case 3272 failure
- Without Context - 50% rework rate
Follow the tutorial to build your first agent:
- Build Your Own Agent - 100-line MVP
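The MVP article builds an agent in about 100 lines, but the skeleton is just a loop: send the transcript to a model, execute whatever tool call it returns, append the result, repeat until there's a final answer. A compressed sketch of that loop (`callModel`, `ModelReply`, and the tool signatures are placeholders of our own, not either product's API):

```typescript
// Minimal agent loop: the model proposes either a tool call or a final
// answer; the loop executes tools and feeds results back until done.
// `callModel` stands in for a real LLM API call.
type ModelReply =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "answer"; text: string };

type Tools = Record<string, (input: string) => string>;

function runAgent(
  task: string,
  tools: Tools,
  callModel: (transcript: string[]) => ModelReply,
  maxSteps = 10,
): string {
  const transcript = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = callModel(transcript);
    if (reply.kind === "answer") return reply.text;
    // Execute the requested tool and feed the output back to the model.
    const output = tools[reply.tool]?.(reply.input) ?? `unknown tool: ${reply.tool}`;
    transcript.push(`tool ${reply.tool} -> ${output}`);
  }
  return "gave up: step limit reached";
}
```

Everything else in the two codebases (permissions, caching, compaction, retries) is machinery wrapped around this loop.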
A rough side-by-side. Numbers are based on the source trees we read; speed/feature characterizations are our impressions, not benchmarks.
| Project | Language | Lines of Code | Architecture | Tooling |
|---|---|---|---|---|
| Claude Code | TypeScript | ~540K | Layered (6 layers) | 50+ built-in tools |
| Codex | Rust | ~467K | Centralized core | ~30 tools, Skills-based |
Recurring themes we keep coming back to (each gets its own article):
- Prompt cache is the single biggest cost lever in both systems
- Auto-compaction is what makes long sessions actually work
- Tool concurrency is bounded by correctness, not by hardware
- Permission modeling is more product design than security engineering
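The auto-compaction theme reduces to a simple trigger: estimate how full the context window is and compact once usage crosses a threshold. A sketch using the 75% figure from the design-decision articles (the chars/4 token estimate is a crude heuristic of ours, not either product's actual tokenizer):

```typescript
// Decide when to auto-compact: estimate token usage and fire once usage
// crosses a fraction of the context window. The chars/4 estimate and the
// 0.75 default are illustrative, not either product's exact logic.
function estimateTokens(messages: string[]): number {
  // Rough heuristic: ~4 characters per token.
  return messages.reduce((sum, m) => sum + Math.ceil(m.length / 4), 0);
}

function shouldCompact(
  messages: string[],
  contextWindow: number,
  threshold = 0.75,
): boolean {
  return estimateTokens(messages) >= contextWindow * threshold;
}
```

The interesting design question (article 12) is the threshold itself: compact too early and you pay summarization cost and lose detail; too late and a long tool result can blow past the window before the trigger fires.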
- Cognitive Foundation (3 articles)
- Failure Cases (5 articles)
- Build MVP (1 article)
- Cognitive Foundation (3 articles)
- Failure Cases (5 articles)
- Design Decisions (5 articles)
- Deep Comparison (10 articles)
- Case Studies (3 articles)
- Summary (3 articles)
- Read all articles
- Build 3 real-world Agents
- Deep dive into source code
- Case 3272 Failure - Why circuit breaker is needed
- Mathematics of Trust - 10 successes vs 1 failure
- 90/10 Rule - 90% tasks use only 10% tools
- Optimal Concurrency Point - More isn't better; the sweet spot is 10
- System > Model - GPT-3.5 + system > GPT-4 alone
- Multiplication vs Addition - Intelligence amplification is mutual enhancement
- Completeness > Single Point - Barrel theory: the weakest mechanism caps the whole system
- Next 10x - In system, not in model
- Language: TypeScript
- Runtime: Bun
- Architecture: Layered (6 layers)
- Tools: 52+
- Features: Enterprise-grade, feature-rich
- Language: Rust
- Architecture: Centralized core
- Tools: 30+ (Skills-based)
- Features: Lightweight, high-performance
- Articles drafted: 28 across 6 sections; some are deep dives, some are stubs being filled in
- Runnable examples: 1 (TypeScript minimal agent), more on the roadmap
- Updates: roughly weekly as new findings get written up
- What's missing: most articles still need a second pass, more code citations, and runnable companion examples
Contributions welcome!
- 🐛 Report bugs
- 💡 Suggest ideas
- 📝 Improve docs
- 🎨 Share your Agent
MIT License - see LICENSE
- Claude Code by Anthropic
- Codex by OpenAI
- GitHub Issues: Ask questions
- Website: buildagent.dev
⭐ If this project helps you, please star the repo!
Remember: the future isn't something you wait for; it's something you build. Start building your agent now!