# Reasoning & Rationale Extraction

**Purpose:** Demonstrate thinking budget and rationale extraction features

**Date:** 2025-01-27  
**Author:** Leif Haven Martinson  

## Features

1. **Thinking Budget** - Allocate extra tokens for internal reasoning
2. **Rationale Extraction** - Get models to explain their reasoning
3. **Multi-Agent Reasoning** - See how each agent reasons through problems

## When to Use

- ✅ Complex problems requiring multi-step reasoning
- ✅ Strategic decisions with trade-offs
- ✅ Understanding why models give certain answers
- ✅ Debugging incorrect answers
- ✅ Building transparent AI systems

In [None]:
# 1. Add repository root AND parent directory to path
import sys
sys.path.append('../../../')  # Go up to repo root (for harness)
sys.path.append('../')          # Go up to multi-agent/ (for multi_agent module)

# 2. Import multi-agent specific functions
from multi_agent import (
    llm_call_with_rationale,    # LLM call that extracts reasoning
    ask_with_reasoning,          # Get both reasoning and answer
    run_strategy_with_rationale, # Run strategy with reasoning extraction
    run_strategy,                # Run any multi-agent strategy
    STRATEGIES                   # Available strategies
)

# 3. Import core harness functions
from harness import (
    llm_call,                    # Basic LLM call
    get_model_config             # Load pre-configured model settings
)
from harness.defaults import DEFAULT_MODEL, DEFAULT_PROVIDER

# 4. Show configuration and allow override
print("="*70)
print("🔧 REASONING NOTEBOOK CONFIGURATION")
print("="*70)
print(f"📍 Default Provider: {DEFAULT_PROVIDER}")
print(f"🤖 Default Model: {DEFAULT_MODEL or '(default for provider)'}")
print("="*70)

# 5. TO CHANGE: Uncomment and edit these lines:
# PROVIDER = "mlx"  # Options: "mlx", "ollama", "anthropic", "openai"
# MODEL = "your-model-name"

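# Use defaults for anything not overridden above
# (checked separately so setting only PROVIDER doesn't leave MODEL undefined)
try:
    PROVIDER
except NameError:
    PROVIDER = DEFAULT_PROVIDER

try:
    MODEL
except NameError:
    MODEL = DEFAULT_MODEL

print(f"\n✅ Using: {PROVIDER} / {MODEL or '(default for provider)'}")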
print("="*70 + "\n")

print("✓ Imports successful")


In [None]:
# ═══════════════════════════════════════════════════════════════════════
# ⚙️  CONFIGURATION - EDIT THIS CELL TO CHANGE SETTINGS
# (overrides any PROVIDER/MODEL chosen in the previous cell)
# ═══════════════════════════════════════════════════════════════════════
PROVIDER = 'ollama'  # Options: 'mlx', 'ollama', 'anthropic', 'openai'
MODEL = None         # Examples:
                     # MLX: 'mlx-community/Llama-3.2-3B-Instruct-4bit'
                     # Ollama: 'llama3.2:latest', 'qwen2.5:latest'
                     # Anthropic: 'claude-3-5-sonnet-20241022'
                     # OpenAI: 'gpt-4o'

# Use defaults if not specified.
# Note: DEFAULT_MODEL may be chosen for DEFAULT_PROVIDER; if PROVIDER
# differs from DEFAULT_PROVIDER, set MODEL explicitly above.
if MODEL is None:
    MODEL = DEFAULT_MODEL
# ═══════════════════════════════════════════════════════════════════════

print("Current Configuration:")
print("="*70)
print(f"📍 Provider: {PROVIDER}")
print(f"🤖 Model: {MODEL or '(default for provider)'}")
print("="*70)

## Example 1: Thinking Budget

Compare responses to the same problem with and without a thinking budget
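
The problem in the next cell is a GSM8K-style word problem with a single numeric answer, so the two responses can be checked automatically rather than by eye. A minimal sketch: compute the expected answer, then take the last number in a response. `last_number` and `is_correct` are hypothetical helpers, not part of `harness`, and "last number wins" is only a heuristic that can misfire if a model restates other quantities after its answer.

```python
import re

# Expected answer: 16 eggs laid, minus 3 eaten and 4 baked, sold at $2 each
eggs_laid, eaten, baked, price = 16, 3, 4, 2
expected = (eggs_laid - eaten - baked) * price  # 9 eggs * $2

def last_number(text):
    """Hypothetical helper: return the last number in `text` as a float, or None."""
    matches = re.findall(r"-?\d+(?:,\d{3})*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def is_correct(text, expected_value):
    """True if the last number in the response equals the expected value."""
    value = last_number(text)
    return value is not None and value == expected_value

print(expected)                                                  # 18
print(is_correct("She sells 9 eggs, so she makes $18.", expected))  # True
```

After running the next cell, `is_correct(response_basic.text, expected)` and `is_correct(response_thinking.text, expected)` give a quick, if rough, accuracy comparison.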

In [None]:
# 1. Define a math problem that benefits from reasoning
problem = """Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning 
and bakes muffins for her friends every day with four. She sells the remainder 
at the farmers' market daily for $2 per fresh duck egg. How much in dollars 
does she make every day at the farmers' market?"""

# 2. First, generate WITHOUT thinking budget
print("🔵 WITHOUT thinking budget:")
print("="*80)

response_basic = llm_call(
    problem,
    provider=PROVIDER,
    model=MODEL,
    temperature=0.7
)

# 3. Print basic response details
print(response_basic.text)
print(f"\nTokens: {response_basic.tokens_out}")
print(f"Latency: {response_basic.latency_s:.2f}s")

# 4. Now generate WITH thinking budget (extra reasoning tokens)
print("\n\n🟢 WITH thinking budget (2000 tokens):")
print("="*80)

response_thinking = llm_call(
    problem,
    provider=PROVIDER,
    model=MODEL,
    temperature=0.7,
    thinking_budget=2000  # Allocate 2000 extra tokens for reasoning
)

# 5. Print response with thinking budget
print(response_thinking.text)
print(f"\nTokens: {response_thinking.tokens_out}")
print(f"Latency: {response_thinking.latency_s:.2f}s")

# 6. Compare the overhead of thinking budget
print("\n" + "="*80)
print("💡 Key Insight: Thinking budget may improve accuracy on complex problems")
print(f"   Token overhead: {response_thinking.tokens_out - response_basic.tokens_out} tokens")
print(f"   Latency overhead: {response_thinking.latency_s - response_basic.latency_s:.2f}s")

## Example 2: Rationale Extraction

Get the model to explain its reasoning before giving the answer
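
The notebook does not show how `ask_with_reasoning` separates `rationale` from `answer`. One common approach, sketched here as an assumption rather than the harness's actual implementation, is to prompt for labeled sections and split the response on the answer label. `build_prompt`, `split_rationale`, and the `FINAL ANSWER:` marker are hypothetical.

```python
import re

ANSWER_MARKER = "FINAL ANSWER:"  # hypothetical label the prompt asks for

def build_prompt(question):
    """Hypothetical prompt asking for labeled reasoning and answer sections."""
    return (
        f"{question}\n\n"
        "First write REASONING: followed by your step-by-step reasoning.\n"
        f"Then write {ANSWER_MARKER} followed by your answer only."
    )

def split_rationale(text):
    """Split a response into (rationale, answer) on the last answer marker.

    If the marker is missing, treat the whole response as the answer.
    """
    idx = text.rfind(ANSWER_MARKER)
    if idx == -1:
        return "", text.strip()
    # Drop a leading "REASONING:" label if the model included one
    rationale = re.sub(r"^\s*REASONING:\s*", "", text[:idx])
    answer = text[idx + len(ANSWER_MARKER):]
    return rationale.strip(), answer.strip()

rationale, answer = split_rationale("REASONING: Revenue extends runway.\nFINAL ANSWER: B")
print(rationale)  # Revenue extends runway.
print(answer)     # B
```

Splitting on the *last* marker tolerates a model that mentions the label inside its reasoning; falling back to "whole text is the answer" keeps a malformed response usable instead of raising.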

In [None]:
# Strategic decision problem
decision = """Our startup has $500k runway for 6 months. Should we:
A) Hire 3 engineers and build faster
B) Hire 1 engineer + 1 sales person to get revenue
C) Keep current team and extend runway to 12 months
"""

print("🧠 Getting reasoning + answer...\n")
print("="*80)

result = ask_with_reasoning(
    decision,
    provider=PROVIDER,
    model=MODEL,
    thinking_budget=2000
)

print("REASONING:")
print("─"*80)
print(result.rationale)

print("\n\nFINAL ANSWER:")
print("─"*80)
print(result.answer)

print("\n\n" + "="*80)
print("💾 Metadata:")
print(f"   Tokens: {result.llm_response.tokens_out}")
print(f"   Latency: {result.llm_response.latency_s:.2f}s")
print(f"   Provider: {result.llm_response.provider}")
print(f"   Model: {result.llm_response.model}")

## Example 3: Multi-Agent Reasoning

See how each agent in a multi-agent strategy reasons through the problem
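
To see where the experts agree or disagree, their final picks can be tallied. The sketch below assumes each expert's recommendation has already been reduced to a short label such as `"React"` or `"Vue"`; the `expert_votes` format and `summarize_votes` function are hypothetical, since the notebook does not show how `run_strategy_with_rationale` exposes per-expert outputs.

```python
from collections import Counter

def summarize_votes(expert_votes):
    """Tally expert recommendations and report the majority and dissenters.

    expert_votes: dict mapping expert name -> recommendation label
    (hypothetical format). On a tie, Counter.most_common returns the
    label that was counted first.
    """
    if not expert_votes:
        raise ValueError("need at least one vote")
    counts = Counter(expert_votes.values())
    top_label, top_count = counts.most_common(1)[0]
    dissenters = sorted(
        name for name, label in expert_votes.items() if label != top_label
    )
    return {
        "majority": top_label,
        "agreement": top_count / len(expert_votes),
        "dissenters": dissenters,
    }

# Illustrative input only, not real model output
votes = {"frontend_architect": "React", "product_engineer": "React", "ux_lead": "Vue"}
print(summarize_votes(votes))
```

The `dissenters` list points at which experts' rationales are worth reading first when the team's recommendation looks off.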

In [None]:
# Technical decision
tech_decision = """Should we use React or Vue for building our new dashboard?

Context:
- Team: 2 frontend developers (both know React, 1 knows Vue)
- Timeline: 3 months to MVP
- Dashboard will have real-time data visualization, complex forms, and user management
- We prioritize development speed and maintainability
"""

print("🤝 Running adaptive team with reasoning...\n")
print("="*80)

result = run_strategy_with_rationale(
    "adaptive_team",
    tech_decision,
    n_experts=3,
    refinement_rounds=1,
    thinking_budget=1500,
    provider=PROVIDER,
    model=MODEL,
    verbose=True  # See each expert's analysis in real-time
)

print("\n\n" + "="*80)
print("📊 FINAL REASONING:")
print("="*80)
print(result.rationale)

print("\n\n" + "="*80)
print("✅ FINAL RECOMMENDATION:")
print("="*80)
print(result.answer)

## Example 4: Pre-configured Reasoning Models

Use pre-configured model setups optimized for reasoning
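
Example 4 reads `name`, `model`, `thinking_budget`, `temperature`, and `description` from the loaded config and passes `**reasoning_config.to_kwargs()` straight into `llm_call`. A minimal sketch of what such an object could look like, assuming `to_kwargs()` returns only the call arguments; `ReasoningConfig` and its `provider` field are hypothetical, not the harness's actual class.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReasoningConfig:
    """Hypothetical stand-in for the object returned by get_model_config()."""
    name: str
    provider: str
    model: str
    thinking_budget: Optional[int] = None
    temperature: float = 0.7
    description: str = ""

    def to_kwargs(self):
        """Return only call arguments; metadata (name, description) is dropped."""
        kwargs = {
            "provider": self.provider,
            "model": self.model,
            "temperature": self.temperature,
        }
        if self.thinking_budget is not None:
            kwargs["thinking_budget"] = self.thinking_budget
        return kwargs

cfg = ReasoningConfig(
    name="example-reasoning",   # hypothetical config name
    provider="ollama",
    model="llama3.2:latest",
    thinking_budget=2000,
)
print(cfg.to_kwargs())
```

Keeping metadata out of `to_kwargs()` means `name` and `description` never reach `llm_call` as unexpected keyword arguments. For checking the example's output, 15% of 240 is 36.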

In [None]:
# Load pre-configured reasoning model
reasoning_config = get_model_config("gpt-oss-20b-reasoning")

if reasoning_config:
    print(f"✅ Loaded config: {reasoning_config.name}")
    print(f"   Model: {reasoning_config.model}")
    print(f"   Thinking budget: {reasoning_config.thinking_budget}")
    print(f"   Temperature: {reasoning_config.temperature}")
    print(f"   Description: {reasoning_config.description}")
    
    # Use it
    print("\n" + "="*80)
    print("Using pre-configured reasoning model...\n")
    
    result = llm_call(
        "What is 15% of 240?",
        **reasoning_config.to_kwargs()
    )
    
    print(result.text)
    print(f"\nTokens: {result.tokens_out}")
else:
    print("⚠️  Config not found. Run this to create defaults:")
    print("   from harness import get_config_manager")
    print("   get_config_manager().create_default_configs()")

## Example 5: Comparing Strategies with Reasoning

Test how different strategies reason through the same problem
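
The problem in the next cell has a unique answer, which makes each strategy's output easy to check. With c chickens (2 legs) and r rabbits (4 legs), c + r = 12 and 2c + 4r = 38; subtracting twice the first equation from the second gives 2r = 14, so r = 7 and c = 5. The sketch below solves the general case and cross-checks by brute force; `solve_heads_legs` is a hypothetical helper.

```python
def solve_heads_legs(heads, legs):
    """Solve c + r = heads, 2c + 4r = legs for non-negative integers (c, r).

    Returns None when no whole-animal solution exists.
    """
    # Subtract 2 * (c + r = heads) from the legs equation: 2r = legs - 2*heads
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        return None
    rabbits = extra_legs // 2
    chickens = heads - rabbits
    if chickens < 0:
        return None
    return chickens, rabbits

# Brute-force cross-check over every possible split of 12 animals
brute = [(c, 12 - c) for c in range(13) if 2 * c + 4 * (12 - c) == 38]
print(solve_heads_legs(12, 38), brute)  # (5, 7) [(5, 7)]
```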

In [None]:
# run_strategy is already imported in cell 1
problem = """A farmer has 12 chickens and rabbits combined. The animals have 38 legs total.
How many chickens and how many rabbits does the farmer have?"""

strategies = [
    ("single", {}),
    ("self_consistency", {"n_samples": 3}),
    ("adaptive_team", {"n_experts": 2, "refinement_rounds": 0})
]

for strategy_name, kwargs in strategies:
    print(f"\n{'='*80}")
    print(f"🔄 Strategy: {strategy_name}")
    print(f"{'='*80}\n")
    
    result = run_strategy(
        strategy_name,
        problem,
        provider=PROVIDER,
        model=MODEL,
        thinking_budget=1000,  # Give all strategies thinking budget
        **kwargs
    )
    
    print(result.output)
    print(f"\n⏱️  Latency: {result.latency_s:.2f}s")
    print(f"💰 Tokens: {result.tokens_out}")

## Summary

### Key Takeaways

1. **Thinking Budget** (`thinking_budget=N`)
   - Allocates extra tokens for internal reasoning
   - Useful for complex, multi-step problems
   - Trade-off: more output tokens and latency (roughly 2-3x, depending on model and problem) in exchange for potentially better accuracy

2. **Rationale Extraction** (`ask_with_reasoning`, `llm_call_with_rationale`)
   - Get models to explain their reasoning explicitly
   - Separates reasoning from final answer
   - Useful for transparency and debugging

3. **Multi-Agent Reasoning** (`run_strategy_with_rationale`)
   - See how each agent reasons through problems
   - Understand consensus formation
   - Identify where agents agree/disagree

### When to Use

| Feature | Best For | Avoid For |
|---------|----------|----------|
| Thinking Budget | Math, logic, multi-step reasoning | Simple factoid questions |
| Rationale Extraction | Transparency, debugging, complex decisions | Speed-critical applications |
| Multi-Agent Reasoning | Strategic decisions, complex problems | Well-defined, single-domain questions |

### Cost Impact

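- **Thinking budget**: roughly 2-3x token cost
- **Rationale extraction**: roughly 1.5-2x token cost (from the explicit reasoning text)
- **Combined**: roughly 3-5x baseline cost

These are rough estimates; actual overhead varies by model, provider, and problem.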
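
To check these multipliers on your own workload, compare token counts from paired runs with and without a feature, for example `response_basic.tokens_out` and `response_thinking.tokens_out` from Example 1. A minimal sketch; `overhead_ratio` and `mean_overhead` are hypothetical helpers, and the numbers below are placeholders, not measurements. Depending on the provider, `tokens_out` may or may not include hidden thinking tokens, so confirm what it counts before trusting the ratio.

```python
def overhead_ratio(baseline_tokens, feature_tokens):
    """How many times more tokens the feature run used than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return feature_tokens / baseline_tokens

def mean_overhead(pairs):
    """Average ratio over (baseline, feature) token pairs from several problems."""
    ratios = [overhead_ratio(b, f) for b, f in pairs]
    return sum(ratios) / len(ratios)

# Placeholder numbers for illustration only
print(overhead_ratio(120, 300))                  # 2.5
print(mean_overhead([(100, 200), (100, 300)]))   # 2.5
```

Averaging per-problem ratios (rather than dividing total tokens) keeps one unusually long response from dominating the estimate.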

**Recommendation**: Use for high-value decisions where accuracy matters more than cost.

### Next Steps

1. Test on your specific problem domains
2. Tune `thinking_budget` (500-4000 tokens)
3. Measure accuracy improvement vs. cost increase
4. Use rationale to improve prompts and strategies
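
Steps 2 and 3 can be combined into a small sweep. The sketch assumes a callable `run_fn(problem, budget)` returning `(answer_text, tokens_out)`; in this notebook it could wrap `llm_call(problem, provider=PROVIDER, model=MODEL, thinking_budget=budget)` and return `(r.text, r.tokens_out)`. `sweep_budgets`, `answer_matches`, and the `fake_run` stub are hypothetical, and the whole-token match is a simple check you may want to tighten per domain.

```python
import re

def answer_matches(text, expected):
    """True if `expected` appears in `text` as a standalone token (so "18" != "118")."""
    return re.search(rf"\b{re.escape(expected)}\b", text) is not None

def sweep_budgets(run_fn, problems, budgets=(500, 1000, 2000, 4000)):
    """Run every problem at every budget; report accuracy and mean tokens.

    run_fn(problem, budget) -> (answer_text, tokens_out)   # assumed interface
    problems: list of (problem_text, expected_answer_string)
    """
    rows = []
    for budget in budgets:
        correct, total_tokens = 0, 0
        for problem, expected in problems:
            text, tokens = run_fn(problem, budget)
            correct += answer_matches(text, expected)
            total_tokens += tokens
        rows.append({
            "budget": budget,
            "accuracy": correct / len(problems),
            "mean_tokens": total_tokens / len(problems),
        })
    return rows

# Stub model for illustration: answers correctly only when budget >= 1000
def fake_run(problem, budget):
    text = "The answer is 18." if budget >= 1000 else "The answer is 20."
    return text, budget // 10

for row in sweep_budgets(fake_run, [("duck-egg problem", "18")]):
    print(row)
```

Plotting `accuracy` against `mean_tokens` from a real run shows where extra budget stops paying for itself.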