# Extended Thinking - Multi-Model Support

## Table of contents
- [Setup](#setup)
- [Model Configuration](#model-configuration)
- [Basic Examples](#basic-examples)
- [Multi-Model Comparison](#multi-model-comparison)
- [Streaming with Extended Thinking](#streaming-with-extended-thinking)
- [Token Management](#token-management)
- [Redacted Thinking](#redacted-thinking)
- [Agent Integration](#agent-integration)
- [Error Handling](#error-handling)
- [Key Takeaways](#key-takeaways)

This notebook demonstrates Claude's extended thinking feature across multiple models.

Extended thinking lets Claude work through a complex task step by step before answering, and returns that reasoning in `thinking` content blocks ahead of the final answer. It is available on Claude 3.7 Sonnet and newer models.

**Learn more:** [Extended Thinking Documentation](https://docs.claude.com/en/docs/build-with-claude/extended-thinking)

## Setup

Install dependencies and configure the client:

In [None]:
%pip install anthropic --quiet

In [None]:
import anthropic
import os
from typing import Dict, List, Optional

# Initialize client
client = anthropic.Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY")
)

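# Model configuration with extended thinking support.
# 1,024 is the API minimum for budget_tokens; 32,000 is the working ceiling
# this notebook uses. context_window is measured in tokens.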
MODELS = {
    "sonnet-4.5": {
        "id": "claude-sonnet-4-5",
        "thinking_support": True,
        "min_thinking_tokens": 1024,
        "max_thinking_tokens": 32000,
        "context_window": 200000,
        "description": "Claude Sonnet 4.5 with extended thinking"
    },
    "sonnet-3.7": {
        "id": "claude-3-7-sonnet-20250219",
        "thinking_support": True,
        "min_thinking_tokens": 1024,
        "max_thinking_tokens": 32000,
        "context_window": 200000,
        "description": "Claude 3.7 Sonnet with extended thinking"
    },
    "opus-4": {
        "id": "claude-opus-4-20250514",
        "thinking_support": True,
        "min_thinking_tokens": 1024,
        "max_thinking_tokens": 32000,
        "context_window": 200000,
        "description": "Opus 4 with extended thinking (if available)"
    }
}

def print_thinking_response(response, show_full: bool = False):
    """Pretty print message response with thinking blocks."""
    print("\n" + "="*60)
    print("RESPONSE BREAKDOWN")
    print("="*60)
    
    for i, block in enumerate(response.content, 1):
        if block.type == "thinking":
            print(f"\n🧠 THINKING BLOCK {i}")
            print("-" * 60)
            if show_full:
                print(block.thinking)
            else:
                preview = block.thinking[:300]
                print(f"{preview}..." if len(block.thinking) > 300 else preview)
                print(f"\n[Total length: {len(block.thinking)} chars]")
            
            if hasattr(block, 'signature') and block.signature:
                print(f"[Signature: {block.signature[:50]}...]")
                
        elif block.type == "redacted_thinking":
            print(f"\n🔒 REDACTED THINKING BLOCK {i}")
            print("-" * 60)
            print(f"[Encrypted data length: {len(block.data) if hasattr(block, 'data') else 'N/A'}]")
            
        elif block.type == "text":
            print(f"\n✓ FINAL ANSWER {i}")
            print("-" * 60)
            print(block.text)
    
    print("\n" + "="*60 + "\n")

def count_tokens(messages: List[Dict], model: str = "claude-sonnet-4-5") -> int:
    """Count input tokens for messages."""
    result = client.messages.count_tokens(
        model=model,
        messages=messages
    )
    return result.input_tokens

def get_model_info(model_key: str) -> Dict:
    """Get model configuration."""
    if model_key not in MODELS:
        raise ValueError(f"Unknown model: {model_key}. Available: {list(MODELS.keys())}")
    return MODELS[model_key]

print("✓ Extended thinking utilities loaded")
print(f"\nAvailable models: {list(MODELS.keys())}")
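
The `min_thinking_tokens` and `max_thinking_tokens` fields in `MODELS` are stored but never enforced by the helpers above. A minimal sketch of one way to enforce them follows; `clamp_budget` is a hypothetical helper for this notebook, not part of the SDK, and its defaults simply mirror the values in `MODELS`.

```python
def clamp_budget(requested: int, min_tokens: int = 1024, max_tokens: int = 32000) -> int:
    """Clamp a requested thinking budget into [min_tokens, max_tokens]."""
    if min_tokens > max_tokens:
        raise ValueError("min_tokens must not exceed max_tokens")
    # Raise too-small requests to the floor, cut too-large ones to the ceiling.
    return max(min_tokens, min(requested, max_tokens))


for requested in (500, 2000, 50000):
    print(f"{requested:>6,} -> {clamp_budget(requested):,}")
```

Clamping silently changes the caller's request, so raising an error instead may be the better choice in production code.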

## Model Configuration

Check which models support extended thinking:

In [None]:
def show_model_capabilities():
    """Display model capabilities for extended thinking."""
    print("\n" + "="*80)
    print("EXTENDED THINKING MODEL CAPABILITIES")
    print("="*80 + "\n")
    
    for key, config in MODELS.items():
        print(f"Model: {key}")
        print(f"  ID: {config['id']}")
        print(f"  Description: {config['description']}")
        print(f"  Extended Thinking: {'✓ Supported' if config['thinking_support'] else '✗ Not supported'}")
        
        if config['thinking_support']:
            print(f"  Token Range: {config['min_thinking_tokens']:,} - {config['max_thinking_tokens']:,}")
            print(f"  Context Window: {config['context_window']:,}")
        
        print()

show_model_capabilities()

## Basic Examples

### Simple Problem Solving

In [None]:
def basic_thinking_example(model_key: str = "sonnet-4.5"):
    """Basic extended thinking example."""
    model = get_model_info(model_key)
    
    if not model['thinking_support']:
        print(f"❌ Model {model_key} does not support extended thinking")
        return
    
    print(f"Using model: {model['id']}\n")
    
    response = client.messages.create(
        model=model['id'],
        max_tokens=4000,
        thinking={
            "type": "enabled",
            "budget_tokens": 2000
        },
        messages=[{
            "role": "user",
            "content": "Solve: Three people check into a hotel and pay $30. The manager finds the room costs $25, so gives $5 to a bellboy to return. The bellboy keeps $2 and gives $1 to each person. Each person paid $9 ($27 total), plus the bellboy's $2 = $29. Where's the missing $1?"
        }]
    )
    
    print_thinking_response(response)
    return response

# Run example
basic_thinking_example()
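
`print_thinking_response` only prints. When downstream code needs the answer itself, the content blocks can be grouped by type. The sketch below assumes blocks shaped like those in `response.content` (a `type` attribute plus `text` or `thinking`); `split_blocks`, `answer_text`, and the stand-in blocks are hypothetical names used for illustration only.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Group content blocks by type; unknown block types are ignored."""
    grouped = {"thinking": [], "redacted_thinking": [], "text": []}
    for block in content:
        if block.type in grouped:
            grouped[block.type].append(block)
    return grouped


def answer_text(content) -> str:
    """Concatenate the text blocks, which hold the final answer."""
    return "".join(block.text for block in split_blocks(content)["text"])


# Stand-in blocks shaped like response.content (no API call is made here).
fake_content = [
    SimpleNamespace(type="thinking", thinking="27 = 25 + 2, so ...", signature="sig"),
    SimpleNamespace(type="text", text="There is no missing dollar."),
]
print(answer_text(fake_content))
```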

## Multi-Model Comparison

Compare how different models approach the same problem:

In [None]:
def compare_models(prompt: str, thinking_budget: int = 2000):
    """Compare extended thinking across models."""
    results = {}
    
    for model_key, config in MODELS.items():
        if not config['thinking_support']:
            print(f"\n⊘ Skipping {model_key} (no thinking support)")
            continue
        
        print(f"\n{'='*80}")
        print(f"TESTING: {model_key} ({config['id']})")
        print(f"{'='*80}")
        
        try:
            response = client.messages.create(
                model=config['id'],
                max_tokens=thinking_budget + 1000,  # Must exceed budget_tokens; leaves 1,000 tokens for the answer
                thinking={
                    "type": "enabled",
                    "budget_tokens": thinking_budget
                },
                messages=[{"role": "user", "content": prompt}]
            )
            
            # Analyze response
            thinking_blocks = [b for b in response.content if b.type == "thinking"]
            text_blocks = [b for b in response.content if b.type == "text"]
            
            results[model_key] = {
                "response": response,
                "thinking_chars": sum(len(b.thinking) for b in thinking_blocks),
                "answer_chars": sum(len(b.text) for b in text_blocks),
                "num_thinking_blocks": len(thinking_blocks)
            }
            
            print(f"\n📊 Stats:")
            print(f"  Thinking blocks: {results[model_key]['num_thinking_blocks']}")
            print(f"  Thinking chars: {results[model_key]['thinking_chars']:,}")
            print(f"  Answer chars: {results[model_key]['answer_chars']:,}")
            
            print_thinking_response(response)
            
        except Exception as e:
            print(f"\n❌ Error with {model_key}: {e}")
            results[model_key] = {"error": str(e)}
    
    return results

# Example comparison
comparison_prompt = """Calculate the compound annual growth rate (CAGR) for an investment that:
- Started at $10,000
- Ended at $25,000
- Held for 7 years

Show your reasoning step by step."""

# Uncomment to run comparison
# results = compare_models(comparison_prompt)
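
The dictionary returned by `compare_models` mixes successful entries with `{"error": ...}` entries, so a small summary step helps when reading it. This is a sketch under that assumption about the result shape; `summarize_results` is a hypothetical helper, and the sample values are made-up placeholders, not measurements.

```python
def summarize_results(results: dict) -> list:
    """Return one line per model: successes sorted by thinking length, then failures."""
    ok = {k: v for k, v in results.items() if "error" not in v}
    failed = {k: v for k, v in results.items() if "error" in v}

    lines = []
    ranked = sorted(ok.items(), key=lambda kv: kv[1]["thinking_chars"], reverse=True)
    for key, stats in ranked:
        lines.append(
            f"{key:<12} thinking={stats['thinking_chars']:>7,}  "
            f"answer={stats['answer_chars']:>7,}"
        )
    for key, stats in failed.items():
        lines.append(f"{key:<12} ERROR: {stats['error']}")
    return lines


# Placeholder data in the same shape compare_models returns.
sample = {
    "model-a": {"thinking_chars": 1200, "answer_chars": 400, "num_thinking_blocks": 1},
    "model-b": {"error": "rate limited"},
    "model-c": {"thinking_chars": 3100, "answer_chars": 650, "num_thinking_blocks": 1},
}
for line in summarize_results(sample):
    print(line)
```

Character counts are only a rough proxy for effort; token counts from the response usage data would be a better basis for comparison.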

## Streaming with Extended Thinking

Handle streaming responses with thinking blocks:

In [None]:
def streaming_thinking(prompt: str, model_key: str = "sonnet-4.5"):
    """Stream extended thinking response."""
    model = get_model_info(model_key)
    
    print(f"Streaming from {model['id']}...\n")
    
    with client.messages.stream(
        model=model['id'],
        max_tokens=4000,
        thinking={
            "type": "enabled",
            "budget_tokens": 2000
        },
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        current_block = None
        current_content = ""
        
        for event in stream:
            if event.type == "content_block_start":
                current_block = event.content_block.type
                if current_block == "thinking":
                    print("\n🧠 THINKING:")
                    print("-" * 60)
                elif current_block == "text":
                    print("\n✓ ANSWER:")
                    print("-" * 60)
                current_content = ""
                
            elif event.type == "content_block_delta":
                if event.delta.type == "thinking_delta":
                    print(event.delta.thinking, end="", flush=True)
                    current_content += event.delta.thinking
                elif event.delta.type == "text_delta":
                    print(event.delta.text, end="", flush=True)
                    current_content += event.delta.text
                    
            elif event.type == "content_block_stop":
                if current_block == "thinking":
                    print(f"\n[{len(current_content)} chars]")
                print()
                current_block = None
                
            elif event.type == "message_stop":
                print("\n" + "="*60)
                print("Stream complete")

# Example
# streaming_thinking("Explain the Monte Carlo method for portfolio risk analysis")
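
`streaming_thinking` prints deltas as they arrive but does not return them. The sketch below separates accumulation from printing so the logic can be exercised without a network call. It assumes events with the same `type` and `delta` attributes used above; `collect_stream` and the stand-in events are hypothetical.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and text deltas from an iterable of stream events."""
    collected = {"thinking": [], "text": []}
    for event in events:
        if event.type != "content_block_delta":
            continue
        delta = event.delta
        if delta.type == "thinking_delta":
            collected["thinking"].append(delta.thinking)
        elif delta.type == "text_delta":
            collected["text"].append(delta.text)
        # Other delta types (for example signature_delta) are skipped.
    return {kind: "".join(parts) for kind, parts in collected.items()}


def _delta(kind, **fields):
    """Build a stand-in content_block_delta event."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="Step 1. "),
    _delta("thinking_delta", thinking="Step 2."),
    _delta("signature_delta", signature="sig"),
    _delta("text_delta", text="Done."),
    SimpleNamespace(type="message_stop"),
]
result = collect_stream(fake_events)
print(result)
```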

## Token Management

Understand token usage with extended thinking:

In [None]:
def analyze_token_usage(prompt: str, thinking_budgets: Optional[List[int]] = None):
    """Analyze token usage across different thinking budgets."""
    if thinking_budgets is None:
        thinking_budgets = [1024, 2000, 4000, 8000]
    
    messages = [{"role": "user", "content": prompt}]
    base_tokens = count_tokens(messages)
    
    print(f"\n📊 TOKEN ANALYSIS")
    print(f"{'='*80}\n")
    print(f"Base input tokens: {base_tokens:,}\n")
    
    model = get_model_info("sonnet-4.5")
    context_window = model['context_window']
    
    for budget in thinking_budgets:
        output_buffer = 2000  # Reserve for final answer
        total_needed = base_tokens + budget + output_buffer
        remaining = context_window - total_needed
        
        print(f"Thinking budget: {budget:,} tokens")
        print(f"  Input: {base_tokens:,}")
        print(f"  Thinking: {budget:,}")
        print(f"  Output buffer: {output_buffer:,}")
        print(f"  Total needed: {total_needed:,}")
        print(f"  Remaining: {remaining:,}")
        
        if remaining < 0:
            print(f"  ⚠️  EXCEEDS CONTEXT WINDOW!")
        elif remaining < 10000:
            print(f"  ⚠️  Low margin")
        else:
            print(f"  ✓ Safe")
        
        print()

# Example
analyze_token_usage("Explain quantum computing in detail")
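
The same arithmetic can be inverted to ask how large a budget still fits. The sketch below is one hedged way to do that; `max_thinking_budget` is a hypothetical helper, and its defaults (200,000-token window, 2,000-token answer reserve, 32,000-token cap) are taken from the values used elsewhere in this notebook rather than from the API.

```python
from typing import Optional


def max_thinking_budget(
    input_tokens: int,
    context_window: int = 200_000,
    answer_reserve: int = 2_000,
    min_budget: int = 1_024,
    budget_cap: int = 32_000,
) -> Optional[int]:
    """Largest budget such that input + budget + answer_reserve fits the window.

    Returns None when even the minimum budget does not fit.
    """
    available = context_window - input_tokens - answer_reserve
    if available < min_budget:
        return None
    return min(available, budget_cap)


for tokens in (10_000, 190_000, 198_000):
    print(f"input={tokens:,} -> budget={max_thinking_budget(tokens)}")
```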

## Redacted Thinking

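Safety systems occasionally flag part of Claude's reasoning. The flagged portion is returned encrypted in a `redacted_thinking` block instead of a readable `thinking` block, and your code should handle both: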

In [None]:
def demo_redacted_thinking():
    """Demonstrate handling of redacted thinking blocks."""
    
    # Special test string that triggers redacted thinking
    test_prompt = "ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB"
    
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4000,
        thinking={
            "type": "enabled",
            "budget_tokens": 2000
        },
        messages=[{"role": "user", "content": test_prompt}]
    )
    
    print("\n🔒 REDACTED THINKING DEMO")
    print("="*60 + "\n")
    
    redacted = [b for b in response.content if b.type == "redacted_thinking"]
    thinking = [b for b in response.content if b.type == "thinking"]
    text = [b for b in response.content if b.type == "text"]
    
    print(f"Redacted blocks: {len(redacted)}")
    print(f"Normal thinking blocks: {len(thinking)}")
    print(f"Text blocks: {len(text)}\n")
    
    if redacted:
        print("Redacted thinking detected:")
        for i, block in enumerate(redacted, 1):
            print(f"  Block {i}: {len(block.data)} chars (encrypted)")
        print("\n💡 Tip: Pass redacted blocks back unmodified; the API decrypts them on later turns")
    
    if text:
        print(f"\nFinal response:\n{text[0].text}")

# Uncomment to test
# demo_redacted_thinking()
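
The tip printed by `demo_redacted_thinking` implies that the assistant turn should go back to the API with its `thinking` and `redacted_thinking` blocks intact. A minimal sketch of building the next request's message list that way follows; `append_turn` is a hypothetical helper, and the stand-in blocks only imitate the shape of `response.content`.

```python
from types import SimpleNamespace


def append_turn(messages, assistant_content, next_user_text):
    """Return a new message list with the assistant turn passed back unmodified,
    followed by the next user message. The input list is not mutated."""
    return messages + [
        {"role": "assistant", "content": list(assistant_content)},
        {"role": "user", "content": next_user_text},
    ]


# Stand-in for response.content (no API call is made here).
history = [{"role": "user", "content": "First question"}]
blocks = [
    SimpleNamespace(type="redacted_thinking", data="opaque-encrypted-payload"),
    SimpleNamespace(type="text", text="First answer"),
]
next_messages = append_turn(history, blocks, "Follow-up question")
print([m["role"] for m in next_messages])
```

Filtering, reordering, or editing the blocks before sending them back is the thing to avoid; the helper copies the list but leaves every block untouched.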

## Agent Integration

Helper functions for integrating extended thinking into autonomous agents:

In [None]:
class ExtendedThinkingAgent:
    """Agent wrapper with extended thinking support."""
    
    def __init__(
        self,
        model_key: str = "sonnet-4.5",
        default_thinking_budget: int = 2000,
        max_tokens: int = 4000
    ):
        self.model = get_model_info(model_key)
        self.model_id = self.model['id']
        self.default_thinking_budget = default_thinking_budget
        self.max_tokens = max_tokens
        
        if not self.model['thinking_support']:
            raise ValueError(f"Model {model_key} does not support extended thinking")
    
    def think(self, prompt: str, thinking_budget: Optional[int] = None) -> Dict:
        """Execute thinking task and return structured result."""
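        budget = thinking_budget if thinking_budget is not None else self.default_thinking_budget
        # The thinking budget is spent out of max_tokens, so it must be smaller.
        if budget >= self.max_tokens:
            raise ValueError(
                f"thinking budget ({budget}) must be less than max_tokens ({self.max_tokens})"
            )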
        
        response = client.messages.create(
            model=self.model_id,
            max_tokens=self.max_tokens,
            thinking={
                "type": "enabled",
                "budget_tokens": budget
            },
            messages=[{"role": "user", "content": prompt}]
        )
        
        # Parse response
        thinking_blocks = [b for b in response.content if b.type == "thinking"]
        text_blocks = [b for b in response.content if b.type == "text"]
        
        return {
            "thinking": [b.thinking for b in thinking_blocks],
            "answer": " ".join(b.text for b in text_blocks),
            "raw_response": response,
            "thinking_chars": sum(len(b.thinking) for b in thinking_blocks),
            "answer_chars": sum(len(b.text) for b in text_blocks)
        }
    
    def multi_step_reasoning(self, steps: List[str]) -> List[Dict]:
        """Execute multi-step reasoning task."""
        results = []
        
        for i, step in enumerate(steps, 1):
            print(f"\n🔄 Step {i}/{len(steps)}")
            print("-" * 60)
            result = self.think(step)
            results.append(result)
            print(f"Thinking: {result['thinking_chars']} chars")
            print(f"Answer: {result['answer'][:200]}...")
        
        return results

# Example usage
agent = ExtendedThinkingAgent(model_key="sonnet-4.5")

# Single task
result = agent.think("Calculate IRR for: Initial: -$1M, Year 1: $300K, Year 2: $400K, Year 3: $500K")
print(f"\n✓ Answer: {result['answer']}")

# Multi-step
# steps = [
#     "Analyze the pros and cons of waterfall vs. deal-by-deal carry",
#     "Calculate scenarios for a $100M fund with 20% carry",
#     "Recommend optimal structure for 10-year fund lifecycle"
# ]
# results = agent.multi_step_reasoning(steps)
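
As written, `multi_step_reasoning` sends each step as an independent request, so later steps never see earlier answers. One hedged way to carry context forward is to prepend earlier answers to the next prompt. `with_prior_answers` and its `max_chars` truncation limit are hypothetical choices for illustration.

```python
from typing import List


def with_prior_answers(step: str, prior_answers: List[str], max_chars: int = 500) -> str:
    """Prefix a step prompt with truncated answers from earlier steps."""
    if not prior_answers:
        return step
    context = "\n".join(
        f"Step {i} result: {answer[:max_chars]}"
        for i, answer in enumerate(prior_answers, 1)
    )
    return f"{context}\n\nNext task: {step}"


print(with_prior_answers("Recommend a structure", ["Waterfall favors LPs", "Carry is $20M"]))
```

Inside the loop, this would mean calling `self.think(with_prior_answers(step, [r["answer"] for r in results]))` instead of `self.think(step)`.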

## Error Handling

Common errors and how to handle them:

In [None]:
def demonstrate_error_handling():
    """Show common errors and solutions."""
    
    print("\n⚠️  COMMON ERRORS AND SOLUTIONS")
    print("="*80 + "\n")
    
    # Error 1: Thinking budget too small
    print("1️⃣ Thinking budget too small (< 1024 tokens)\n")
    try:
        client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1000,
            thinking={"type": "enabled", "budget_tokens": 500},
            messages=[{"role": "user", "content": "Test"}]
        )
    except Exception as e:
        print(f"❌ Error: {e}")
        print("✓ Solution: Use minimum 1024 tokens\n")
    
    # Error 2: Incompatible parameters
    print("2️⃣ Using temperature with thinking\n")
    try:
        client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4000,  # Must exceed budget_tokens so only the temperature error is triggered
            temperature=0.7,  # Not allowed with thinking
            thinking={"type": "enabled", "budget_tokens": 2000},
            messages=[{"role": "user", "content": "Test"}]
        )
    except Exception as e:
        print(f"❌ Error: {e}")
        print("✓ Solution: Remove temperature and top_k; if top_p is set, keep it between 0.95 and 1\n")
    
    # Error 3: Context window exceeded
    print("3️⃣ Context window overflow\n")
    print("❌ Can happen when: input tokens + max_tokens > 200k (the thinking budget counts toward max_tokens)")
    print("✓ Solutions:")
    print("  - Reduce thinking budget")
    print("  - Reduce max_tokens")
    print("  - Shorten input prompt")
    print("  - Use progressive summarization\n")
    
    print("="*80)

demonstrate_error_handling()
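
The first two errors can be caught locally before any request is sent. The sketch below encodes the constraints as this notebook describes them, plus the assumption that `budget_tokens` must be smaller than `max_tokens`; `check_thinking_request` is a hypothetical helper and is no substitute for the API's own validation.

```python
from typing import List, Optional


def check_thinking_request(
    max_tokens: int,
    budget_tokens: int,
    temperature: Optional[float] = None,
    top_k: Optional[int] = None,
    min_budget: int = 1024,
) -> List[str]:
    """Return a list of problems found; an empty list means the request looks valid."""
    problems = []
    if budget_tokens < min_budget:
        problems.append(f"budget_tokens {budget_tokens} is below the minimum of {min_budget}")
    if budget_tokens >= max_tokens:
        problems.append(f"budget_tokens {budget_tokens} must be less than max_tokens {max_tokens}")
    if temperature is not None and temperature != 1:
        problems.append("temperature cannot be changed while thinking is enabled")
    if top_k is not None:
        problems.append("top_k cannot be set while thinking is enabled")
    return problems


for problem in check_thinking_request(max_tokens=1000, budget_tokens=2000, temperature=0.7):
    print(f"- {problem}")
```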

## Key Takeaways

### When to Use Extended Thinking
- Complex mathematical calculations
- Multi-step reasoning tasks
- Code analysis and debugging
- Strategic planning
- Financial modeling

### Best Practices
1. Start with minimum budget (1024) and increase as needed
2. Monitor token usage to stay within context window
3. Handle redacted thinking blocks gracefully
4. Stream for long-running tasks
5. Don't set temperature or top_k; if you set top_p, keep it between 0.95 and 1
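
The first practice, starting at the minimum and increasing as needed, can be sketched as a sequence of budgets to try in order. `budget_ladder`, the doubling factor, and the 32,000 ceiling are illustrative assumptions, not recommendations from the API.

```python
def budget_ladder(start: int = 1024, factor: int = 2, ceiling: int = 32000):
    """Yield thinking budgets from start, multiplying by factor, ending at ceiling."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor must be greater than 1")
    budget = start
    while budget < ceiling:
        yield budget
        budget *= factor
    # Always finish on the ceiling itself so the largest allowed budget is tried.
    yield ceiling


print(list(budget_ladder()))
```

A caller would try each budget in turn and stop at the first one that produces a satisfactory answer.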

### Cost Considerations
- Thinking tokens are billed as output tokens
- They also count toward your rate limits
- Budget appropriately for production use
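
Because thinking tokens are charged at the output rate, a rough per-request estimate needs only token counts and two prices. The sketch below takes the prices as parameters; `estimate_cost` is a hypothetical helper, and the prices in the usage line are placeholders to be replaced with the current figures from the pricing page.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate request cost in dollars.

    output_tokens should include thinking tokens, since they are billed as output.
    Prices are in dollars per million tokens.
    """
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices: 3.0 and 15.0 dollars per million tokens.
cost = estimate_cost(1_000, 3_000, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"${cost:.4f}")
```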

### Resources
- [Extended Thinking Docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking)
- [API Reference](https://docs.anthropic.com/en/api/messages)
- [Pricing](https://www.anthropic.com/pricing)