# Chain-of-Agents Part 1: Multi-Agent Trajectories from Scratch

**Time**: 30 minutes | **Level**: Beginner | **Author**: Karpathy-style CoA Tutorial

## The Problem

You have a complex task. The traditional approach calls several specialized AI agents in sequence (the costs and latencies below are illustrative):
- Agent 1 plans → costs $0.01, takes 2 seconds
- Agent 2 codes → costs $0.01, takes 2 seconds
- Agent 3 reviews → costs $0.01, takes 2 seconds

**Total: $0.03, 6 seconds per request**

What if ONE model could do all three in a single call? If that call costs the same as one agent call, that's **$0.01, 2 seconds total.**

That's Chain-of-Agents. Let's build it from scratch.

## Step 1: The Simplest Possible Agent

No classes. No frameworks. Just a dictionary.

In [None]:
# An agent is just a dictionary with a role
def create_agent(name, role):
    return {
        "name": name,
        "role": role,
        "history": []  # We'll record what it does
    }

# Create our team
planner = create_agent("Planner", "Break down problems into steps")
coder = create_agent("Coder", "Write code to solve problems")
critic = create_agent("Critic", "Review and improve solutions")

print(f"Agent team created: {planner['name']}, {coder['name']}, {critic['name']}")
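
Because each agent is a plain dictionary, a whole team can be built from a list of (name, role) pairs. The sketch below assumes a hypothetical helper, `create_team`, that is not part of the tutorial code. It also checks that every agent gets its own `history` list rather than a shared one.

```python
def create_agent(name, role):
    # Same shape as the tutorial's agent: a plain dict with its own history
    return {"name": name, "role": role, "history": []}

def create_team(specs):
    # Hypothetical helper: build agents from (name, role) pairs, keyed by name
    return {name: create_agent(name, role) for name, role in specs}

team = create_team([
    ("Planner", "Break down problems into steps"),
    ("Coder", "Write code to solve problems"),
    ("Critic", "Review and improve solutions"),
])

# Each call to create_agent makes a fresh list, so histories are independent
team["Planner"]["history"].append("planned something")
print(len(team["Planner"]["history"]), len(team["Coder"]["history"]))
```

Keying the team by name is only one option; a plain list works just as well when the order of agents is what matters.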

## Step 2: Make Agents "Think" (Without LLMs)

We'll simulate agent responses. In production, these would be LLM calls.

In [None]:
def agent_think(agent, task, previous_output=None):
    """Simulate an agent processing a task"""
    
    # Build the prompt (what the agent "sees")
    prompt = f"Role: {agent['role']}\nTask: {task}"
    if previous_output:
        prompt += f"\nPrevious agent output: {previous_output}"
    
    # Simulate different agent behaviors (in reality, this would be an LLM call)
    if agent['name'] == "Planner":
        output = f"1. Understand {task}\n2. Design solution\n3. Implement\n4. Test"
    elif agent['name'] == "Coder":
        output = f"def solve():\n    # Implementation for: {task[:30]}...\n    return result"
    else:  # Critic
        output = f"Review: Code looks good. Consider edge cases for {task[:20]}..."
    
    # Record what happened
    agent['history'].append({
        "input": prompt,
        "output": output
    })
    
    return output

# Test single agent
result = agent_think(planner, "Build a todo app")
print("Planner output:")
print(result)
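
To move from simulation toward a real model, the hard-coded branches can be replaced by a callable. This is a minimal sketch, assuming a hypothetical `llm_fn(prompt) -> str` parameter and a hypothetical function name `agent_think_with`. `fake_llm` stands in for whatever client you use, and no real API is called.

```python
def create_agent(name, role):
    return {"name": name, "role": role, "history": []}

def agent_think_with(agent, task, llm_fn, previous_output=None):
    """Like agent_think, but delegates the response to llm_fn (an assumption)."""
    prompt = f"Role: {agent['role']}\nTask: {task}"
    if previous_output:
        prompt += f"\nPrevious agent output: {previous_output}"
    output = llm_fn(prompt)  # llm_fn is any callable that takes a prompt string
    agent["history"].append({"input": prompt, "output": output})
    return output

def fake_llm(prompt):
    # Stand-in for a real model call: report how many prompt lines were seen
    return f"(simulated reply to a {len(prompt.splitlines())}-line prompt)"

planner = create_agent("Planner", "Break down problems into steps")
print(agent_think_with(planner, "Build a todo app", fake_llm))
```

Passing the model in as a parameter keeps the prompt-building and history-recording logic identical for the simulated and the real case.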

## Step 3: Chain Agents Together

This is where the magic happens. Agents pass information to each other.

In [None]:
def run_agent_chain(agents, task):
    """Run a chain of agents, each building on the previous output"""
    
    trajectory = []  # This records the entire chain - KEY FOR COA!
    current_output = None
    
    print(f"🎯 Task: {task}\n")
    print("="*50)
    
    for agent in agents:
        # Agent processes the task
        output = agent_think(agent, task, current_output)
        
        # Record this step in the trajectory
        trajectory.append({
            "agent": agent['name'],
            "role": agent['role'],
            "input": current_output if current_output else task,
            "output": output
        })
        
        # Print what happened
        print(f"\n[{agent['name']}]")
        print(output[:100] + "..." if len(output) > 100 else output)
        
        # Pass output to next agent
        current_output = output
    
    print("\n" + "="*50)
    return trajectory

# Run the chain!
trajectory = run_agent_chain([planner, coder, critic], "Build a todo app")
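
A quick sanity check on a recorded chain is that each step's `input` equals the previous step's `output`, with the first step receiving the task itself. The sketch below assumes a hypothetical helper, `check_chain_links`, and a hand-written trajectory in the same shape that `run_agent_chain` records.

```python
def check_chain_links(trajectory, task):
    """Return True if every step's input is the previous step's output."""
    expected_input = task  # the first agent receives the task itself
    for step in trajectory:
        if step["input"] != expected_input:
            return False
        expected_input = step["output"]
    return True

# A hand-written trajectory in the same shape run_agent_chain records
example = [
    {"agent": "Planner", "role": "plan", "input": "Build a todo app", "output": "1. Design"},
    {"agent": "Coder", "role": "code", "input": "1. Design", "output": "def solve(): ..."},
    {"agent": "Critic", "role": "review", "input": "def solve(): ...", "output": "Looks good"},
]
print(check_chain_links(example, "Build a todo app"))
```

A check like this catches trajectories where a step was dropped or reordered before they reach the training set.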

## Step 4: Visualize the Trajectory

Let's see what we just recorded. Trajectories like this one are what we'll use to train our AFM (Agent Foundation Model)!

In [None]:
def visualize_trajectory(trajectory):
    """ASCII art visualization of agent chain"""
    
    print("\n🔗 CHAIN-OF-AGENTS TRAJECTORY:")
    print("="*60)
    
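    width = 52  # inner width shared by every line of the box
    for i, step in enumerate(trajectory):
        # Flatten newlines so multi-line outputs stay inside the box
        output = step['output'].replace("\n", " ")
        preview = output[:39] + "..." if len(output) > 42 else output
        header = f" Step {i+1}: {step['agent']} "
        # Box for each agent
        print(f"\n┌─{header}{'─'*(width - len(header) - 1)}┐")
        print(f"│ Role: {step['role'][:44]:<44} │")
        print(f"│ Output: {preview:<42} │")
        print(f"└{'─'*width}┘")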
        
        # Arrow to next step
        if i < len(trajectory) - 1:
            print(" "*25 + "↓")
    
    print("\n" + "="*60)
    print(f"Total steps: {len(trajectory)}")
    # dict.fromkeys removes duplicates while keeping the order agents appeared in
    print(f"Agents involved: {', '.join(dict.fromkeys(s['agent'] for s in trajectory))}")

visualize_trajectory(trajectory)

## Step 5: The Key Insight - Multiple Trajectories

To train an AFM, we need MANY trajectories. Let's generate them.

In [None]:
# Different tasks to solve
tasks = [
    "Build a todo app",
    "Create a REST API",
    "Design a database schema",
    "Implement user authentication",
    "Optimize a slow query"
]

# Generate trajectories for all tasks
all_trajectories = []

for task in tasks:
    print(f"\n📝 Generating trajectory for: {task}")
    trajectory = run_agent_chain([planner, coder, critic], task)
    all_trajectories.append({
        "task": task,
        "trajectory": trajectory,
        "num_steps": len(trajectory)
    })
    print(f"   ✓ Generated {len(trajectory)} steps")

print(f"\n✅ Total trajectories collected: {len(all_trajectories)}")
print(f"📊 Total agent steps recorded: {sum(t['num_steps'] for t in all_trajectories)}")
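
Five hand-written tasks will not get far when many trajectories are needed. One simple way to scale up is to combine task templates. This is a sketch under stated assumptions: the `verbs` and `targets` lists and the `make_tasks` helper are illustrative and not from the tutorial.

```python
from itertools import product

# Illustrative vocabularies; these lists are assumptions, not from the tutorial
verbs = ["Build", "Test", "Document", "Refactor"]
targets = ["a todo app", "a REST API", "a database schema",
           "a login flow", "a search index"]

def make_tasks(verbs, targets, limit=None):
    # Combine every verb with every target to get distinct task strings
    tasks = [f"{verb} {target}" for verb, target in product(verbs, targets)]
    return tasks[:limit] if limit is not None else tasks

generated_tasks = make_tasks(verbs, targets)
print(len(generated_tasks), generated_tasks[0])
```

Templated tasks are easy to generate but not very diverse, so treat them as a starting point rather than a finished dataset.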

## Step 6: Trajectory Statistics

Let's understand what we've collected. This helps us see patterns.

In [None]:
def analyze_trajectories(trajectories):
    """Analyze trajectory patterns - useful for understanding CoA"""
    
    # Count agent appearances
    agent_counts = {}
    total_steps = 0
    
    for traj_data in trajectories:
        for step in traj_data['trajectory']:
            agent = step['agent']
            agent_counts[agent] = agent_counts.get(agent, 0) + 1
            total_steps += 1
    
    # Print analysis
    print("📊 TRAJECTORY ANALYSIS")
    print("="*40)
    print(f"Total trajectories: {len(trajectories)}")
    print(f"Total steps: {total_steps}")
    print(f"Avg steps per task: {total_steps/len(trajectories):.1f}")
    print("\nAgent participation:")
    for agent, count in agent_counts.items():
        percentage = (count/total_steps)*100
        bar = '█' * int(percentage/5)
        print(f"  {agent:10} {bar:20} {percentage:.1f}%")
    
    return agent_counts

stats = analyze_trajectories(all_trajectories)
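
The same counting can be written more compactly with `collections.Counter`. The sketch below uses hypothetical helper names (`count_agents`, `mean_output_length`) and a small hand-written sample instead of `all_trajectories`, and it adds one extra statistic, the average output length.

```python
from collections import Counter

def count_agents(trajectories):
    # Count how many steps each agent contributed across all trajectories
    return Counter(step["agent"]
                   for traj_data in trajectories
                   for step in traj_data["trajectory"])

def mean_output_length(trajectories):
    # Average number of characters per recorded output (0.0 if there are none)
    lengths = [len(step["output"])
               for traj_data in trajectories
               for step in traj_data["trajectory"]]
    return sum(lengths) / len(lengths) if lengths else 0.0

sample = [
    {"task": "t1", "trajectory": [{"agent": "Planner", "output": "abcd"},
                                  {"agent": "Coder", "output": "ab"}]},
    {"task": "t2", "trajectory": [{"agent": "Planner", "output": "abc"}]},
]
print(count_agents(sample).most_common())
print(mean_output_length(sample))
```

With the simulated agents every chain has the same three steps, so these statistics only become informative once chains vary in length or agent mix.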

## Step 7: Simulate Traditional vs CoA Performance

Let's see why CoA is attractive. The latency and cost figures below are hard-coded assumptions, not measurements.

In [None]:
import time

def simulate_traditional_multi_agent(task, agents):
    """Simulate traditional approach: multiple LLM calls"""
    start = time.time()
    
    costs = []
    for agent in agents:
        # Simulate API call
        time.sleep(0.5)  # Simulate network latency
        costs.append(0.01)  # $0.01 per call
    
    total_time = time.time() - start
    total_cost = sum(costs)
    
    return {
        "approach": "Traditional Multi-Agent",
        "time": total_time,
        "cost": total_cost,
        "api_calls": len(agents)
    }

def simulate_coa_afm(task):
    """Simulate CoA approach: single AFM call"""
    start = time.time()
    
    # Single API call to AFM
    time.sleep(0.5)  # Simulate network latency
    cost = 0.01  # Single call cost
    
    total_time = time.time() - start
    
    return {
        "approach": "CoA (AFM)",
        "time": total_time,
        "cost": cost,
        "api_calls": 1
    }

# Compare approaches
print("⚡ PERFORMANCE COMPARISON\n" + "="*40)

traditional = simulate_traditional_multi_agent("Build app", [planner, coder, critic])
coa = simulate_coa_afm("Build app")

for approach in [traditional, coa]:
    print(f"\n{approach['approach']}:")
    print(f"  Time: {approach['time']:.2f}s")
    print(f"  Cost: ${approach['cost']:.2f}")
    print(f"  API calls: {approach['api_calls']}")

# Show improvement
speedup = traditional['time'] / coa['time']
cost_reduction = (1 - coa['cost']/traditional['cost']) * 100

print(f"\n🚀 CoA IMPROVEMENTS:")
print(f"  {speedup:.1f}x faster")
print(f"  {cost_reduction:.0f}% cheaper")
print(f"  {traditional['api_calls'] - coa['api_calls']} fewer API calls")
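
The comparison above reduces to simple arithmetic: with n agents, per-call latency t, and per-call cost c, the traditional chain takes n·t and costs n·c, while a single call takes t and costs c. The sketch below uses a hypothetical helper, `estimate_savings`, and makes the strong assumption that the single AFM call costs and takes the same as one agent call. In practice one call that emits all three outputs may be longer and more expensive.

```python
def estimate_savings(n_agents, latency_s, cost_per_call):
    """Closed-form version of the comparison, assuming every call is equal."""
    traditional_time = n_agents * latency_s
    traditional_cost = n_agents * cost_per_call
    return {
        "speedup": traditional_time / latency_s,  # equals n_agents
        "cost_reduction_pct": (1 - cost_per_call / traditional_cost) * 100,
        "calls_saved": n_agents - 1,
    }

print(estimate_savings(3, 2.0, 0.01))
```

For three agents this gives a 3x speedup and a cost reduction of roughly 67%, and both numbers grow with the number of agents the single call replaces.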

## Step 8: Prepare Trajectories for Training

Transform our trajectories into training data for the AFM.

In [None]:
def trajectory_to_training_data(trajectory_data):
    """Convert trajectory to format suitable for training"""
    
    # Build the full conversation
    conversation = f"Task: {trajectory_data['task']}\n\n"
    
    for step in trajectory_data['trajectory']:
        # Add agent marker (this teaches the model to simulate different agents)
        conversation += f"[{step['agent']}]: {step['output']}\n\n"
    
    # Create training example
    return {
        "input": trajectory_data['task'],
        "output": conversation,
        "metadata": {
            "num_agents": len(set(s['agent'] for s in trajectory_data['trajectory'])),
            "num_steps": len(trajectory_data['trajectory'])
        }
    }

# Convert all trajectories
training_data = [trajectory_to_training_data(t) for t in all_trajectories]

# Show example
print("📚 TRAINING DATA EXAMPLE")
print("="*60)
example = training_data[0]
print(f"Input: {example['input']}")
print(f"\nOutput (what AFM will learn to generate):")
print("-"*40)
print(example['output'][:300] + "...")
print("-"*40)
print(f"\nMetadata: {example['metadata']}")
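
Training examples usually need to be saved to disk before they can be used. One common choice is JSON Lines, with one JSON object per line. This is a minimal sketch, assuming hypothetical helpers `to_jsonl` and `from_jsonl` and a hand-written example in the same shape as `training_data`.

```python
import json

def to_jsonl(examples):
    # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)

def from_jsonl(text):
    # Parse the lines back into dictionaries, skipping blank lines
    return [json.loads(line) for line in text.splitlines() if line.strip()]

examples = [{
    "input": "Build a todo app",
    "output": "Task: Build a todo app\n\n[Planner]: 1. Understand\n\n",
    "metadata": {"num_agents": 1, "num_steps": 1},
}]
text = to_jsonl(examples)
print(text.count("\n") + 1, "line(s)")
```

`json.dumps` escapes the newlines inside each `output` string, so every example stays on a single line and the file can be read back one record at a time.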

## Exercise 1: Beat My Implementation 🏆

Can you create a better agent chain?

In [None]:
# TODO: Create your own agent chain that performs better
# Hints:
# 1. Add more specialized agents
# 2. Change the order of agents
# 3. Make agents more specific to certain tasks

def your_agent_chain():
    """Create your improved agent chain"""
    
    # Your code here
    # Example: Add a "Debugger" agent, or "Optimizer" agent
    
    pass

# Baseline to beat: 3 agents, 3 steps
print("Baseline: 3 agents, sequential execution")
print("Can you design a more efficient chain?")

## Exercise 2: Trajectory Quality Score 📊

Not all trajectories are good for training. Implement a quality scorer.

In [None]:
def score_trajectory(trajectory):
    """Score trajectory quality (0-100)"""
    
    score = 0
    
    # TODO: Implement scoring logic
    # Ideas:
    # - Longer outputs might be better (+points)
    # - All agents participating is good (+points)  
    # - Repetitive outputs are bad (-points)
    # - Clear task completion is good (+points)
    
    # Your implementation here
    
    return score

# Test your scorer
for traj in all_trajectories[:3]:
    score = score_trajectory(traj['trajectory'])
    print(f"Task: {traj['task'][:30]}... Score: {score}")

## Exercise 3: Minimal CoA Implementation 🎯

Implement the core CoA concept in under 50 lines.

In [None]:
# Challenge: Implement core CoA in < 50 lines
# Must include:
# 1. Agent creation
# 2. Trajectory recording
# 3. Training data generation

def minimal_coa(task):
    """Your minimal Chain-of-Agents implementation"""
    
    # Your code here (keep it under 50 lines!)
    
    pass

# Line count check
import inspect
# splitlines() avoids counting the empty string after the trailing newline
lines = len(inspect.getsource(minimal_coa).splitlines())
print(f"Your implementation: {lines} lines")
print(f"Target: < 50 lines")
print(f"Status: {'✅ PASS' if lines < 50 else '❌ TOO LONG'}")

## Key Takeaways 🎓

1. **Trajectories are key**: Recording agent interactions is what makes CoA possible
2. **Simple data structure**: Trajectories are just lists of steps, each recording agent, role, input, and output
3. **Large efficiency gains (in simulation)**: 3x faster and about 67% cheaper than the three-call baseline, assuming every call costs the same
4. **Training data**: Each trajectory becomes an example for the AFM to learn

## What's Next?

In Part 2, we'll implement **Progressive Filtering** to select only high-quality trajectories for training. This is crucial for reaching the 55.3% GAIA performance reported in the CoA paper!

## Your Homework 📝

1. Generate 100 trajectories with different tasks
2. Implement a trajectory visualizer using matplotlib (not just ASCII)
3. Try different agent configurations and measure the difference
4. Read the CoA paper section on trajectory generation

Remember: **We just turned 3 agent calls into training data for 1 model!** 🚀