# Chain-of-Agents: Complete Tutorial Series

**Master Notebook** | Built with Karpathy's Teaching Philosophy

## 🎯 What You'll Build

Transform this:
```python
# Traditional Multi-Agent (Slow, Expensive)
agent1_response = call_llm("Planner: " + task)              # $0.01, 2s
agent2_response = call_llm("Coder: " + agent1_response)     # $0.01, 2s
agent3_response = call_llm("Reviewer: " + agent2_response)  # $0.01, 2s
# Total: $0.03, 6 seconds
```

Into this:
```python
# Chain-of-Agents AFM (Fast, Cheap)
response = afm_model(task)  # $0.01, 2s, 55.3% on GAIA
# Same quality from one call instead of three: 3x faster, 3x cheaper
```

## 📚 Tutorial Structure

**Part 1**: [Multi-Agent Trajectories](coa_notebook_1_trajectories.ipynb) (30 min)
- Build agents from scratch using dictionaries
- Record agent interactions (the key CoA insight!)
- Generate training trajectories
- See the 3x speedup from replacing three sequential calls with one
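
The ideas in Part 1 can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `stub_llm`, `AGENTS`, `run_chain`, and the trajectory schema are hypothetical names chosen for this example, and the stub stands in for a real LLM call so the sketch runs offline.

```python
from typing import Callable, Dict, List

def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); returns a canned reply."""
    return f"<reply to: {prompt[:40]}>"

# Each agent is just a dictionary: a name and a role prompt (hypothetical schema).
AGENTS: List[Dict[str, str]] = [
    {"name": "planner", "prompt": "Planner: "},
    {"name": "coder", "prompt": "Coder: "},
    {"name": "reviewer", "prompt": "Reviewer: "},
]

def run_chain(task: str, llm: Callable[[str], str] = stub_llm) -> Dict:
    """Run the agents in sequence and record every step as a trajectory."""
    steps = []
    message = task
    for agent in AGENTS:
        output = llm(agent["prompt"] + message)
        # Recording input and output per agent is what later becomes training data.
        steps.append({"agent": agent["name"], "input": message, "output": output})
        message = output  # the next agent consumes the previous agent's output
    return {"task": task, "steps": steps, "final": message}

trajectory = run_chain("Write a function that reverses a string")
for step in trajectory["steps"]:
    print(step["agent"], "->", step["output"])
```

Swapping `stub_llm` for a real client only requires passing a different callable as `llm`; the recorded trajectory keeps the same shape.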

**Part 2**: [Progressive Filtering](coa_notebook_2_filtering.ipynb) (45 min)
- Quality control for trajectory data
- Filter bad trajectories, keep gold ones
- Diversity and deduplication
- Why quality > quantity for AFM training
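
A rough sketch of the filtering idea, assuming each trajectory already carries a numeric quality `score`. The `score` field, the `min_score` threshold, and the normalization used for deduplication are all assumptions made for this example; the notebook may use different criteria.

```python
from typing import Dict, List

def filter_trajectories(trajectories: List[Dict], min_score: float = 0.7) -> List[Dict]:
    """Keep trajectories above a quality threshold and drop duplicates."""
    seen = set()
    kept = []
    for traj in trajectories:
        # Stage 1: quality gate (hypothetical score in [0, 1]).
        if traj["score"] < min_score:
            continue
        # Stage 2: deduplicate on normalized task and final answer.
        key = (traj["task"].strip().lower(), traj["final"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(traj)
    return kept

raw = [
    {"task": "Reverse a string", "final": "return s[::-1]", "score": 0.9},
    {"task": "reverse a string ", "final": "return s[::-1]", "score": 0.8},  # duplicate once normalized
    {"task": "Sum a list", "final": "return 0", "score": 0.2},               # fails the quality gate
    {"task": "Sum a list", "final": "return sum(xs)", "score": 0.95},
]
gold = filter_trajectories(raw)
print(len(gold), "of", len(raw), "trajectories kept")
```

Running the cheap quality gate before deduplication means low-scoring trajectories never occupy a slot in `seen`, so a later high-quality copy of the same task can still be kept.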

**Part 3**: [SFT - Distilling into AFM](coa_notebook_3_sft.ipynb) (45 min)
- Supervised fine-tuning from scratch
- Teach one model to simulate all agents
- Advanced techniques for raising the GAIA score from 45% to 55%
- Production deployment strategies
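
One way to picture "teach one model to simulate all agents" is to flatten each recorded trajectory into a single prompt/completion pair. The sketch below assumes a hypothetical role-tag format (`<planner>...</planner>`) and a hypothetical trajectory schema; neither is confirmed by the notebooks, and no actual fine-tuning is performed here.

```python
from typing import Dict, Tuple

def trajectory_to_sft_example(trajectory: Dict) -> Tuple[str, str]:
    """Flatten one multi-agent trajectory into a (prompt, completion) pair."""
    prompt = f"Task: {trajectory['task']}\n"
    # Wrap each agent's output in role tags (an assumed format) so a single
    # model can learn to emit the whole planner -> coder -> reviewer sequence.
    completion = "".join(
        f"<{step['agent']}>{step['output']}</{step['agent']}>\n"
        for step in trajectory["steps"]
    )
    return prompt, completion

example = {
    "task": "Reverse a string",
    "steps": [
        {"agent": "planner", "output": "Use slicing with a step of -1."},
        {"agent": "coder", "output": "def rev(s): return s[::-1]"},
        {"agent": "reviewer", "output": "Looks correct."},
    ],
}
prompt, completion = trajectory_to_sft_example(example)
print(prompt + completion)
```

During supervised fine-tuning, the loss would typically be computed on the completion tokens only, so the model learns to produce the agent sequence rather than to repeat the task.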

**Part 4**: [PPO for 18-20% Gains](coa_notebook_4_ppo.ipynb) (60 min)
- Reinforcement learning optimization
- Reward function design for AFM
- The breakthrough that gets 55.3% GAIA!
- Complete performance comparison
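
To make "reward function design" concrete, here is a toy reward that scores a finished rollout. It is only a sketch under stated assumptions: exact-match correctness, a linear step penalty, and the names `trajectory_reward`, `num_steps`, and `max_steps` are all hypothetical, and the PPO update itself is not shown.

```python
def trajectory_reward(prediction: str, reference: str,
                      num_steps: int, max_steps: int = 10) -> float:
    """Toy reward: 1.0 for a matching final answer, minus a small step penalty."""
    correct = 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0
    # Penalize long rollouts so the policy is nudged toward shorter chains.
    # The penalty is capped at 0.1, so a correct answer always beats a wrong one.
    step_penalty = 0.1 * min(num_steps, max_steps) / max_steps
    return correct - step_penalty

good = trajectory_reward("Paris", "paris", num_steps=3)
bad = trajectory_reward("Lyon", "paris", num_steps=3)
print("correct rollout scores higher:", good > bad)
```

Keeping the penalty small relative to the correctness term is a deliberate choice in this sketch: it avoids rewarding a short but wrong rollout over a long but correct one.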

## 🔥 Key Features

### Karpathy-Style Learning
- **First principles**: Start with dictionaries, build to transformers
- **Minimal code**: Core concepts in < 100 lines
- **Live experiments**: See improvements in real-time
- **Interactive exercises**: "Beat my score" challenges

### Complete Implementation
- **4 production-ready notebooks**
- **12 hands-on exercises**
- **Performance benchmarks**
- **Visualization tools**

### Preserved CoA Goals
- **55.3% GAIA performance** (vs 53.2% WebSailor)
- **47.9% LiveCodeBench** (vs 42.4% Reveal-32B)
- **18-20% RL improvement**
- **3x faster, 3x cheaper than multi-agent**

## 🚀 Quick Start

```bash
# Clone and run
git clone https://github.com/your-repo/coa-tutorial
cd coa-tutorial
jupyter notebook coa_notebook_1_trajectories.ipynb
```

## 📊 Learning Path

**Beginner** (Parts 1-2): Understand trajectories and filtering  
**Intermediate** (Part 3): Implement SFT from scratch  
**Advanced** (Part 4): Master PPO optimization  

**Total Time**: 3 hours  
**Prerequisites**: Basic Python, high school math  
**Hardware**: Runs on CPU (< 4GB RAM)  

## 🎓 What You'll Learn

1. **Why multi-agent distillation works**
2. **How to collect and filter training trajectories**
3. **SFT techniques for agent behaviors**
4. **PPO optimization for performance gains**
5. **Production deployment strategies**

## 🏆 Success Metrics

After completing this tutorial, you will be able to:
- ✅ Explain why CoA beats traditional multi-agent systems
- ✅ Implement trajectory generation and filtering
- ✅ Build AFM using supervised fine-tuning
- ✅ Apply PPO for 18-20% performance gains
- ✅ Deploy production CoA systems

## 📖 Additional Resources

- [CoA Paper](https://arxiv.org/abs/XXX) - Original research
- [Improvement Plan](karpathy_style_improvements.md) - Detailed roadmap
- [Key Concepts](coa_key_concepts_preserved.md) - Core principles

## 🤝 Contributing

Have an improvement? Submit a PR! Ideas:
- Add more exercises
- Improve visualizations
- Optimize performance
- Write better explanations

---

**Start Your Journey**: [Open Part 1 →](coa_notebook_1_trajectories.ipynb)

Built with 🤖 [Claude Code](https://claude.ai/code) in Karpathy style