# Debug: Agent-Model API Mismatch Bug

**INITIAL HYPOTHESIS** (superseded by the root-cause section at the end of this notebook): this is not just a prompt-assignment issue; the actual API calls appear to be going to the wrong model.

## Evidence from `strong_first/budget_20000/comp_1_0`

**experiment_results.json (agent_performance) shows CORRECT assignment:**
```json
"Agent_Alpha": {"model": "claude-opus-4-5-thinking-32k"}
"Agent_Beta": {"model": "gpt-5-nano"}
```

**But token_usage patterns in all_interactions.json show SWAPPED API calls:**
- `Agent_Alpha`: `{"total_tokens": 1153}` - NO reasoning_tokens → **GPT-5 pattern**
- `Agent_Beta`: `{"input_tokens": ..., "reasoning_tokens": 3904}` → **Claude pattern**
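
The comparison above can be automated. The following is a minimal sketch, assuming each interaction record carries `agent_id` and `token_usage` keys as in the samples quoted here; the helper name `reasoning_tokens_by_agent` is hypothetical and not part of the codebase.

```python
from collections import defaultdict


def reasoning_tokens_by_agent(interactions):
    """Sum reasoning_tokens per agent; agents that never report any map to 0."""
    totals = defaultdict(int)
    for interaction in interactions:
        agent_id = interaction.get("agent_id")
        token_usage = interaction.get("token_usage") or {}
        # A missing or null reasoning_tokens field counts as zero.
        totals[agent_id] += token_usage.get("reasoning_tokens") or 0
    return dict(totals)


# Illustrative records shaped like the entries quoted above (other fields omitted).
sample = [
    {"agent_id": "Agent_Alpha", "token_usage": {"total_tokens": 1153}},
    {"agent_id": "Agent_Beta", "token_usage": {"reasoning_tokens": 3904}},
]
print(reasoning_tokens_by_agent(sample))
```

Run against a full `all_interactions.json`, a non-zero total for the baseline agent would flag the pattern described here.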

## Bug Summary

1. The agent **objects** have the correct model names (verified via `agent.model_name`)
2. But the actual **API calls** are going to the wrong endpoints
3. This affects BOTH `weak_first` and `strong_first` experiments

## Key Files to Investigate
1. `strong_models_experiment/agents/agent_factory.py` - Creates agents (lines 55-76)
2. `strong_models_experiment/experiment.py` - Manages agent order (lines 131-147, 207-218)
3. `negotiation/llm_agents.py` - Actual API calls
4. `scripts/generate_ttc_configs.sh` - SLURM script model ordering (lines 436-440)

In [1]:
# Simulate the exact flow that happens in the experiment

# Step 1: Config generator (generate_ttc_experiments.py line 191)
# Always passes: '--models', reasoning_model, baseline_model
reasoning_model = "claude-opus-4-5-thinking-32k"  # The reasoning model
baseline_model = "gpt-5-nano"  # The baseline (non-reasoning) model

# Models list as passed to run_strong_models_experiment.py
models = [reasoning_model, baseline_model]
print(f"Models list from config generator: {models}")
print(f"  models[0] = {models[0]} (this becomes Agent_Alpha)")
print(f"  models[1] = {models[1]} (this becomes Agent_Beta)")

Models list from config generator: ['claude-opus-4-5-thinking-32k', 'gpt-5-nano']
  models[0] = claude-opus-4-5-thinking-32k (this becomes Agent_Alpha)
  models[1] = gpt-5-nano (this becomes Agent_Beta)


In [2]:
# Step 2: Agent Factory (agent_factory.py lines 55-67)
# Creates agents in order of the models list

agent_names = ["Alpha", "Beta"]
agent_mapping = {}

for i, model_name in enumerate(models):
    agent_id = f"Agent_{agent_names[i]}"
    agent_mapping[agent_id] = model_name
    
print("Agent Factory creates:")
for agent_id, model in agent_mapping.items():
    print(f"  {agent_id} -> {model}")

Agent Factory creates:
  Agent_Alpha -> claude-opus-4-5-thinking-32k
  Agent_Beta -> gpt-5-nano


In [3]:
# Step 3: Experiment code determines reasoning agent (experiment.py lines 207-218)
# This is where the bug occurs!

def determine_reasoning_agent_ids(model_order: str, reasoning_budget: int) -> list:
    """
    This is the BUGGY logic from experiment.py lines 207-218.
    
    The comments say:
    - weak_first: models = [baseline, reasoning] -> Agent_Beta is reasoning
    - strong_first: models = [reasoning, baseline] -> Agent_Alpha is reasoning
    
    BUT the config generator ALWAYS passes [reasoning, baseline]!
    """
    if not reasoning_budget:
        return []
    
    # This logic assumes the models list order changes based on model_order
    # But it doesn't! The config generator always passes [reasoning, baseline]
    if model_order == "weak_first":
        # Expects: models = [baseline, reasoning]
        # Actual:  models = [reasoning, baseline]
        reasoning_agent_ids = ["Agent_Beta"]  # WRONG! This is the baseline!
    else:  # strong_first
        # Expects: models = [reasoning, baseline]  
        # Actual:  models = [reasoning, baseline]
        reasoning_agent_ids = ["Agent_Alpha"]  # Correct!
    
    return reasoning_agent_ids

# Test with different model_order values
for model_order in ["weak_first", "strong_first"]:
    reasoning_ids = determine_reasoning_agent_ids(model_order, reasoning_budget=5000)
    
    print(f"\nmodel_order = '{model_order}':")
    print(f"  Reasoning prompt assigned to: {reasoning_ids}")
    for agent_id in reasoning_ids:
        actual_model = agent_mapping[agent_id]
        is_reasoning_model = actual_model == reasoning_model
        status = "✓ CORRECT" if is_reasoning_model else "✗ BUG! Assigned to baseline!"
        print(f"  {agent_id} = {actual_model}")
        print(f"  {status}")

In [None]:
# Step 4: Verify with actual experiment data
# Load a run_*_all_interactions.json file to confirm

import json
from pathlib import Path

# Find an experiment with weak_first order
experiment_dir = Path("../experiments/results/ttc_scaling_20260124_224426")
weak_first_dir = experiment_dir / "claude-opus-4-5-thinking-32k_vs_gpt-5-nano" / "weak_first" / "budget_20000" / "comp_1_0"

if weak_first_dir.exists():
    interactions_file = weak_first_dir / "run_1_all_interactions.json"
    if interactions_file.exists():
        with open(interactions_file) as f:
            interactions = json.load(f)
        
        print(f"Loaded {len(interactions)} interactions from {interactions_file.name}")
        print("\nSample interactions showing token_usage:")
        
        for interaction in interactions[:6]:
            agent_id = interaction.get("agent_id")
            phase = interaction.get("phase")
            token_usage = interaction.get("token_usage", {})
            # Coerce to a real boolean; a missing, null, or zero count all mean "no reasoning tokens"
            has_reasoning = bool(token_usage.get("reasoning_tokens"))
            
            print(f"\n  {agent_id} ({phase}):")
            print(f"    token_usage: {token_usage}")
            print(f"    has reasoning_tokens: {has_reasoning}")
else:
    print(f"Directory not found: {weak_first_dir}")

## The Fix

There are three ways to fix this bug:

### Option A: Fix in experiment.py (lines 207-218)
Change the logic to check the actual model, not assume based on position:

```python
# Instead of assuming position, check which agent has the reasoning model
if reasoning_config.get("budget"):
    # Find which agent is the reasoning model by checking model names
    reasoning_agent_ids = []
    for agent in agents:
        # Check if this agent's model is a reasoning model
        model_name = agent.config._actual_model_id  # or similar
        if is_reasoning_model(model_name):  # Need to implement this check
            reasoning_agent_ids.append(agent.agent_id)
```
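
One way to implement the check without guessing from model names is to pass the reasoning model's name explicitly and compare against it. The sketch below is only an assumption about how that could look: `find_reasoning_agent_ids` and the `agent_models` mapping are hypothetical, standing in for whatever attribute the real agent objects expose.

```python
def find_reasoning_agent_ids(agent_models, reasoning_model, reasoning_budget):
    """Return the IDs of agents whose model is the designated reasoning model.

    agent_models: dict of agent_id -> model name (a hypothetical stand-in for
    reading the model name off each agent object).
    """
    if not reasoning_budget:
        return []
    return [
        agent_id
        for agent_id, model_name in agent_models.items()
        if model_name == reasoning_model
    ]


reasoning = "claude-opus-4-5-thinking-32k"
# The result follows the model, not the slot, so both orderings work.
strong_first = {"Agent_Alpha": reasoning, "Agent_Beta": "gpt-5-nano"}
weak_first = {"Agent_Alpha": "gpt-5-nano", "Agent_Beta": reasoning}
print(find_reasoning_agent_ids(strong_first, reasoning, 5000))  # ['Agent_Alpha']
print(find_reasoning_agent_ids(weak_first, reasoning, 5000))    # ['Agent_Beta']
```

Because the lookup is keyed on the model name, it stays correct regardless of the order in which the config generator passes the models.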

### Option B: Fix in config generator (generate_ttc_experiments.py line 191)
Pass models in the correct order based on model_order:

```python
if order == "weak_first":
    cmd_models = [baseline_model, reasoning_model]  # baseline first
else:  # strong_first
    cmd_models = [reasoning_model, baseline_model]  # reasoning first

cmd = [
    'python', 'run_strong_models_experiment.py',
    '--models', *cmd_models,
    ...
]
```

### Option C: Fix in experiment.py (lines 131-147)
Actually reverse the models list when model_order is weak_first:

```python
if model_order == "weak_first":
    # Reverse so baseline is first (Agent_Alpha), reasoning is second (Agent_Beta)
    models = models[::-1]
```

In [None]:
# Demonstrate the fix (Option C - reverse models for weak_first)

def determine_reasoning_agent_ids_FIXED(model_order: str, models_input: list, reasoning_budget: int) -> tuple:
    """
    FIXED version: Properly handles model ordering.
    
    Returns (reasoning_agent_ids, actual_models_order)
    """
    if not reasoning_budget:
        return [], models_input
    
    # THE FIX: Actually reverse the models list for weak_first
    if model_order == "weak_first":
        # Config generator passes [reasoning, baseline]
        # We need [baseline, reasoning] for weak_first
        models = models_input[::-1]  # Reverse!
        reasoning_agent_ids = ["Agent_Beta"]  # Now correct: Beta = reasoning
    else:  # strong_first
        models = models_input  # Keep as-is
        reasoning_agent_ids = ["Agent_Alpha"]  # Correct: Alpha = reasoning
    
    return reasoning_agent_ids, models

# Test the fix
print("=" * 60)
print("FIXED VERSION")
print("=" * 60)

for model_order in ["weak_first", "strong_first"]:
    reasoning_ids, actual_models = determine_reasoning_agent_ids_FIXED(
        model_order, 
        [reasoning_model, baseline_model],  # Config generator always passes this
        reasoning_budget=5000
    )
    
    # Rebuild agent mapping with the (potentially reversed) models
    fixed_agent_mapping = {
        f"Agent_{agent_names[i]}": actual_models[i]
        for i in range(len(actual_models))
    }
    
    print(f"\nmodel_order = '{model_order}':")
    print(f"  Actual models order: {actual_models}")
    print(f"  Agent mapping:")
    for agent_id, model in fixed_agent_mapping.items():
        print(f"    {agent_id} -> {model}")
    print(f"  Reasoning prompt assigned to: {reasoning_ids}")
    
    for agent_id in reasoning_ids:
        actual_model = fixed_agent_mapping[agent_id]
        is_reasoning_model = actual_model == reasoning_model
        status = "✓ CORRECT" if is_reasoning_model else "✗ STILL WRONG!"
        print(f"    {agent_id} = {actual_model} {status}")

## Files to Modify

The minimal fix should be in **one** of these locations:

1. **`strong_models_experiment/experiment.py`** lines 131-147
   - Add model reversal for `weak_first`
   - This is the cleanest fix since it keeps the rest of the pipeline unchanged

2. **`scripts/generate_ttc_experiments.py`** line 191
   - Pass models in the correct order based on `model_order`
   - Would require regenerating all configs

**Recommended**: Fix in `experiment.py` by adding the model reversal logic.

## Minimal Test Case: Verify Agent-to-Model Mapping

The test below creates agents exactly as the experiment does, then checks which model each agent actually calls.

In [None]:
# Minimal test: Create agents and verify their actual model configuration
import sys
import os
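# Move to the project root only if we are not already there, so re-running
# this cell (or another cell that does the same) cannot overshoot the root.
if not os.path.isdir('strong_models_experiment'):
    os.chdir('..')
sys.path.insert(0, os.getcwd())  # Absolute path, unaffected by later chdir calls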

from strong_models_experiment.agents import StrongModelAgentFactory
from strong_models_experiment.configs import STRONG_MODELS_CONFIG
import asyncio

async def test_agent_creation():
    """Test agent creation to verify model mapping."""
    factory = StrongModelAgentFactory()
    
    # This is what SLURM passes for strong_first
    models = ['claude-opus-4-5-thinking-32k', 'gpt-5-nano']
    config = {
        'model_order': 'strong_first',
        'max_tokens_default': 1000,
    }
    
    print("=== INPUT ===")
    print(f"models list: {models}")
    print(f"model_order: {config['model_order']}")
    print()
    
    # Create agents
    agents = await factory.create_agents(models, config)
    
    print("=== CREATED AGENTS ===")
    for agent in agents:
        print(f"\n{agent.agent_id}:")
        print(f"  type: {type(agent).__name__}")
        print(f"  model_name: {getattr(agent, 'model_name', 'N/A')}")
        
        # Check the actual API configuration
        if hasattr(agent, 'config'):
            cfg = agent.config
            print(f"  config.model_type: {getattr(cfg, 'model_type', 'N/A')}")
            print(f"  config._actual_model_id: {getattr(cfg, '_actual_model_id', 'N/A')}")
        
        # Check API client
        if hasattr(agent, 'client'):
            print(f"  client type: {type(agent.client).__name__}")

# Run the test
# await test_agent_creation()  # Uncomment to run

print("Run the cell above with 'await test_agent_creation()' to test agent creation")

## ACTUAL ROOT CAUSE (Discovered 2026-01-25)

**The bug was NOT about API calls going to wrong endpoints!**

### What the token_usage patterns actually mean:

1. **Agent_Alpha**: `{"total_tokens": 851}` - Anthropic-style response (no nested "usage" dict)
2. **Agent_Beta**: `{"input_tokens": 557, ..., "reasoning_tokens": 2048}` - OpenAI-style response (has nested "usage" dict)

The **format difference** is due to how each agent type returns metadata:
- Anthropic agents return `input_tokens`, `output_tokens` directly in metadata (no nested "usage")
- OpenAI agents return a nested `"usage"` dict in metadata
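
Because the two shapes differ, comparing agents is easier once both are flattened to one layout. The following is a sketch only: `normalize_token_usage` is a hypothetical helper, and the assumption that the nested counts sit under a `"usage"` key is taken from the description above rather than from the agent code.

```python
def normalize_token_usage(metadata):
    """Return a flat token-count dict from either metadata shape."""
    # Nested shape: counts live under "usage". Flat shape: counts are top-level.
    usage = metadata.get("usage")
    source = usage if isinstance(usage, dict) else metadata
    keys = ("input_tokens", "output_tokens", "reasoning_tokens", "total_tokens")
    return {key: source[key] for key in keys if key in source}


# Illustrative inputs; the flat values are made up for the example.
flat = {"input_tokens": 300, "output_tokens": 120}
nested = {"usage": {"input_tokens": 557, "reasoning_tokens": 2048}}
print(normalize_token_usage(flat))
print(normalize_token_usage(nested))
```

After normalization, the presence or absence of `reasoning_tokens` can be compared directly across agent types, without the response format being mistaken for a model swap.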

### The REAL Bug

The `reasoning_token_budget` was being applied to **ALL agents** instead of just the reasoning agent!

In `agent_factory.py` lines 72-76 (BEFORE FIX):
```python
agent = self._create_agent_by_type(
    ...
    reasoning_token_budget=reasoning_token_budget  # SAME budget for ALL agents!
)
```

This caused:
- **Claude (Agent_Alpha)**: Extended thinking enabled, but `thinking_tokens=0` was reported (cause not yet identified)
- **GPT-5-nano (Agent_Beta)**: `reasoning_effort="low"` incorrectly enabled, returns `reasoning_tokens` in response!

### The Fix (Implemented)

In `agent_factory.py`, only apply `reasoning_token_budget` to the reasoning agent:

```python
# Determine which agent index should receive the reasoning budget
model_order = config.get("model_order", "weak_first")
if model_order == "weak_first":
    reasoning_agent_index = 1  # Agent_Beta is reasoning
else:  # strong_first
    reasoning_agent_index = 0  # Agent_Alpha is reasoning

# Only apply to reasoning agent
agent_reasoning_budget = reasoning_token_budget if i == reasoning_agent_index else None

agent = self._create_agent_by_type(
    ...
    reasoning_token_budget=agent_reasoning_budget
)
```
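
The per-agent assignment can be exercised in isolation. The sketch below mirrors the logic above in a hypothetical helper, `assign_reasoning_budgets`; it assumes exactly two agents and is not the factory's actual code.

```python
def assign_reasoning_budgets(models, model_order, reasoning_token_budget):
    """Return one budget per model: the budget for the reasoning slot, else None."""
    # Same positional rule as the fix: slot 1 for weak_first, slot 0 otherwise.
    reasoning_agent_index = 1 if model_order == "weak_first" else 0
    return [
        reasoning_token_budget if i == reasoning_agent_index else None
        for i in range(len(models))
    ]


print(assign_reasoning_budgets(
    ["claude-opus-4-5-thinking-32k", "gpt-5-nano"], "strong_first", 5000
))  # [5000, None]
print(assign_reasoning_budgets(
    ["gpt-5-nano", "claude-opus-4-5-thinking-32k"], "weak_first", 5000
))  # [None, 5000]
```

Note that this rule is still positional: it picks the right agent only if the caller passes the models in the order that `model_order` names.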

In [None]:
# Verify the fix: Test agent creation with reasoning budget
import sys
import os
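# Move to the project root only if we are not already there; an earlier cell
# may have changed directory, and a second os.chdir('..') would overshoot.
if not os.path.isdir('strong_models_experiment'):
    os.chdir('..')
sys.path.insert(0, os.getcwd())  # Absolute path, unaffected by later chdir calls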

from strong_models_experiment.agents import StrongModelAgentFactory
import asyncio

async def verify_reasoning_budget_fix():
    """Verify that reasoning_token_budget is only applied to the reasoning agent."""
    factory = StrongModelAgentFactory()
    
    test_cases = [
        ("strong_first", ['claude-opus-4-5-thinking-32k', 'gpt-5-nano']),
        ("weak_first", ['gpt-5-nano', 'claude-opus-4-5-thinking-32k']),
    ]
    
    for model_order, models in test_cases:
        print(f"\n{'='*60}")
        print(f"Testing: {model_order}")
        print(f"{'='*60}")
        
        config = {
            'model_order': model_order,
            'max_tokens_default': 1000,
            'reasoning_token_budget': 5000,
        }
        
        agents = await factory.create_agents(models, config)
        
        # Determine expected reasoning agent
        reasoning_agent = "Agent_Alpha" if model_order == "strong_first" else "Agent_Beta"
        
        print(f"\nExpected reasoning agent: {reasoning_agent}")
        print(f"\nAgent reasoning configuration:")
        
        all_correct = True
        for agent in agents:
            thinking_budget = agent.config.custom_parameters.get("thinking_budget_tokens", None)
            reasoning_effort = agent.config.custom_parameters.get("reasoning_effort", None)
            has_reasoning = thinking_budget is not None or reasoning_effort is not None
            should_have = agent.agent_id == reasoning_agent
            
            status = "✓" if has_reasoning == should_have else "✗ BUG!"
            if has_reasoning != should_have:
                all_correct = False
            
            print(f"  {agent.agent_id} ({type(agent).__name__}):")
            print(f"    thinking_budget: {thinking_budget}")
            print(f"    reasoning_effort: {reasoning_effort}")
            print(f"    has_reasoning={has_reasoning}, should_have={should_have} {status}")
        
        if all_correct:
            print(f"\n✓ Fix verified for {model_order}!")
        else:
            print(f"\n✗ Bug still present for {model_order}!")

# Uncomment to run:
# await verify_reasoning_budget_fix()
print("Uncomment 'await verify_reasoning_budget_fix()' to test the fix")