<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [None]:
# Start with imports

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [None]:
# Always remember to do this!
load_dotenv(override=True)

In [None]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

In [None]:
# Generate a challenging question

request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(f"Generated Question: {question}")

## Intelligent Orchestrator Pattern

This pattern combines:
1. **Orchestrator-Workers** - Breaking down complex tasks
2. **Intelligent Routing** - Matching models to their strengths
3. **Synthesis** - Combining specialized responses

In [None]:
# STEP 1: Orchestrator breaks down the question and assigns models based on their strengths

orchestrator_prompt = f"""You are an intelligent orchestrator AI. Analyze this complex question and:

1. Break it down into 3-4 simpler sub-questions
2. For each sub-question, recommend which type of AI model would be best suited

Available models and their strengths:
- gpt-5-nano: Excellent at reasoning, complex logic, and nuanced analysis
- claude-sonnet-4-5: Strong at creative writing, empathy, and ethical reasoning
- gemini-2.5-flash: Fast at factual retrieval, technical explanations, and structured data
- deepseek-chat: Great at code generation, mathematical problems, and technical documentation
- openai/gpt-oss-120b: Good general purpose, cost-effective for straightforward tasks
- llama3.2: Privacy-focused local model, good for sensitive data and general tasks

Original question: {question}

Respond with JSON only, in this format:
{{
    "sub_questions": [
        {{
            "question": "the sub-question text",
            "reasoning": "why this model is best for this sub-question",
            "recommended_model": "model_name"
        }},
        ...
    ]
}}"""

orchestrator_messages = [{"role": "user", "content": orchestrator_prompt}]

response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=orchestrator_messages,
)
orchestration_plan = json.loads(response.choices[0].message.content)

print("🎯 Orchestrator's Intelligent Routing Plan:\n")
for i, item in enumerate(orchestration_plan["sub_questions"], 1):
    print(f"{i}. SUB-QUESTION: {item['question']}")
    print(f"   📍 ASSIGNED TO: {item['recommended_model']}")
    print(f"   💡 REASONING: {item['reasoning']}\n")
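The cell above passes the model's reply straight to `json.loads`, which raises if the reply is wrapped in a Markdown code fence, and nothing checks that `recommended_model` is one of the names listed in the prompt. Here is a minimal sketch of a more forgiving parser. The helper name `parse_plan` and the choice of `gpt-5-nano` as the default are assumptions for illustration, not part of the original notebook.

```python
import json

# Model names copied from the orchestrator prompt above.
KNOWN_MODELS = {
    "gpt-5-nano",
    "claude-sonnet-4-5",
    "gemini-2.5-flash",
    "deepseek-chat",
    "openai/gpt-oss-120b",
    "llama3.2",
}

FENCE = "`" * 3  # three backticks, as used by Markdown code fences


def parse_plan(raw_text, known_models=KNOWN_MODELS, default_model="gpt-5-nano"):
    """Parse the orchestrator's reply into a plan dict (hypothetical helper)."""
    text = raw_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag.
        text = text.split("\n", 1)[1] if "\n" in text else ""
    text = text.rstrip()
    if text.endswith(FENCE):
        # Drop the closing fence.
        text = text[: -len(FENCE)]
    plan = json.loads(text)

    for item in plan["sub_questions"]:
        # Swap unknown model names for the default rather than failing later.
        if item.get("recommended_model") not in known_models:
            item["recommended_model"] = default_model
    return plan
```

With this in place, `orchestration_plan = parse_plan(response.choices[0].message.content)` could stand in for the bare `json.loads` call.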

## For Ollama setup

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+`) and run `ollama serve`.
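If you'd rather check from code than from the browser, here is a small sketch that requests the same URL and looks for the same message. The function name `ollama_is_running` is made up for illustration, and it uses only the standard library.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the Ollama root page answers with its status message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar problems.
        return False
    return "Ollama is running" in body
```

Calling `ollama_is_running()` before building the Ollama client gives an early, readable failure instead of a connection error in the middle of the worker loop.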

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [None]:
# STEP 2: Initialize all model clients

claude = Anthropic()
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# Map model names to their API clients
model_clients = {
    "gpt-5-nano": ("openai", openai),
    "claude-sonnet-4-5": ("claude", claude),
    "gemini-2.5-flash": ("gemini", gemini),
    "deepseek-chat": ("deepseek", deepseek),
    "openai/gpt-oss-120b": ("groq", groq),
    "llama3.2": ("ollama", ollama)
}

print("✅ All model clients initialized")

In [None]:
# STEP 3: Execute sub-questions with orchestrator's model recommendations

sub_answers = {}

for idx, item in enumerate(orchestration_plan["sub_questions"], 1):
    sub_q = item["question"]
    recommended_model = item["recommended_model"]
    
    print(f"\n🤖 Task {idx}: Using {recommended_model}")
    print(f"📝 Question: {sub_q[:80]}...")
    
    messages = [{"role": "user", "content": sub_q}]
    
    # Route to the appropriate client
    client_type, client = model_clients.get(recommended_model, ("openai", openai))
    
    try:
        if client_type == "claude":
            response = client.messages.create(
                model=recommended_model, 
                messages=messages, 
                max_tokens=800
            )
            answer = response.content[0].text
        else:
            response = client.chat.completions.create(
                model=recommended_model, 
                messages=messages
            )
            answer = response.choices[0].message.content
        
        sub_answers[sub_q] = {
            "model": recommended_model,
            "answer": answer,
            "reasoning": item["reasoning"]
        }
        print(f"✅ Completed successfully\n")
        
    except Exception as e:
        print(f"❌ Error with {recommended_model}: {str(e)}")
        # Fallback to GPT-5-mini
        response = openai.chat.completions.create(
            model="gpt-5-mini", 
            messages=messages
        )
        answer = response.choices[0].message.content
        sub_answers[sub_q] = {
            "model": "gpt-5-mini (fallback)",
            "answer": answer,
            "reasoning": "Fallback due to error"
        }
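In the loop above, the fallback call inside `except` is itself unguarded, so a second failure would stop the whole cell. One way to generalize the idea is a small "try each option in order" helper. This is a sketch only: `first_successful` is a hypothetical name, and the two stand-in functions replace real API calls so the example runs without any keys.

```python
def first_successful(callers):
    """Try each (label, zero-argument callable) in order; return (label, result).

    Raises RuntimeError listing every error if all of them fail.
    """
    errors = []
    for label, call in callers:
        try:
            return label, call()
        except Exception as exc:  # broad on purpose: each SDK raises its own types
            errors.append(f"{label}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


# Stand-ins for real API calls, so the sketch runs offline.
def broken():
    raise ConnectionError("service unavailable")


def working():
    return "stub answer"


label, answer = first_successful([("primary", broken), ("fallback", working)])
print(label, answer)  # prints: fallback stub answer
```

In the notebook, each callable could be a `lambda` wrapping one of the `create(...)` calls, with the recommended model first and `gpt-5-mini` last.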

In [None]:
# Display the sub-answers

for sub_q, data in sub_answers.items():
    display(Markdown(f"### Sub-Question: {sub_q}"))
    display(Markdown(f"**Model Used:** {data['model']}"))
    display(Markdown(f"**Answer:** {data['answer']}"))
    print("\n" + "="*80 + "\n")

In [None]:
# STEP 4: Synthesis - Combine all specialized responses

synthesis_prompt = f"""You are a synthesis AI combining specialized responses into a comprehensive answer.

ORIGINAL QUESTION: {question}

The orchestrator intelligently routed sub-questions to models based on their strengths:

"""

for sub_q, data in sub_answers.items():
    synthesis_prompt += f"\n{'='*60}\n"
    synthesis_prompt += f"SUB-QUESTION: {sub_q}\n"
    synthesis_prompt += f"ASSIGNED TO: {data['model']}\n"
    synthesis_prompt += f"SELECTION REASONING: {data['reasoning']}\n"
    synthesis_prompt += f"ANSWER: {data['answer']}\n"

synthesis_prompt += f"\n{'='*60}\n"
synthesis_prompt += "\nSynthesize these specialized responses into one coherent, comprehensive answer to the original question."
synthesis_prompt += "\nHighlight how different model strengths contributed to the final answer."

synthesis_messages = [{"role": "user", "content": synthesis_prompt}]
response = openai.chat.completions.create(
    model="gpt-5-nano",
    messages=synthesis_messages,
)
synthesized_answer = response.choices[0].message.content

display(Markdown("## 🎯 Intelligently Orchestrated & Synthesized Answer:"))
display(Markdown(synthesized_answer))

## Pattern Analysis

In [None]:
# Display pattern analysis

model_list = '\n'.join(f'- **{data["model"]}**: {data["reasoning"]}' for data in sub_answers.values())

analysis = f"""
## 📊 Pattern Analysis

### Patterns Used from Anthropic's Building Effective Agents:

1. **Orchestrator-Workers Pattern** ✅
   - One LLM coordinates the workflow
   - Breaks complex tasks into subtasks
   - Distributes work to specialized workers
   - Synthesizes results into coherent output

2. **Intelligent Routing Pattern** ✅
   - Matches models to their specific strengths
   - Dynamic model selection based on task requirements
   - Optimizes for quality by leveraging specialization

3. **Implicit Parallelization** ⚡
   - Sub-questions can be executed in parallel (this notebook runs them one at a time)
   - Independent tasks are distributed across models

### Key Innovations:

**Capability-Aware Orchestration**: This is more sophisticated than simple task distribution. 
The orchestrator:
- Understands each model's strengths and weaknesses
- Makes intelligent routing decisions
- Documents its reasoning for transparency
- Enables cost optimization (expensive models only where needed)

### Models Used in This Run:
{model_list}

### Total API Calls:
- 1 orchestrator call (question decomposition)
- {len(sub_answers)} worker calls (sub-question answering)
- 1 synthesizer call (final answer composition)
- **Total: {len(sub_answers) + 2} API calls**
"""

display(Markdown(analysis))
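The analysis notes that the sub-questions are independent and could run in parallel. Because each worker call mostly waits on the network, a thread pool is one simple way to try this. The sketch below uses a stand-in worker (`fake_answer`) rather than real API calls, and `run_in_parallel` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(sub_questions, answer_fn, max_workers=4):
    """Call answer_fn(item) for every plan item concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(answer_fn, sub_questions))


# Stand-in worker; a real one would call the client for item["recommended_model"].
def fake_answer(item):
    return {
        "question": item["question"],
        "model": item["recommended_model"],
        "answer": "stub",
    }


plan_items = [
    {"question": "Q1", "recommended_model": "gpt-5-nano"},
    {"question": "Q2", "recommended_model": "llama3.2"},
]
results = run_in_parallel(plan_items, fake_answer)
```

Provider rate limits may still cap how many calls can usefully run at once, so `max_workers` is worth keeping small.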

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Try modifying the orchestrator prompt to include cost considerations. Add a 'budget' field for each model and have the orchestrator balance quality vs. cost when making routing decisions.
            </span>
        </td>
    </tr>
</table>
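As one possible starting point for the exercise, the model list can live in a dictionary and the prompt text can be generated from it. The strengths below are shortened from the orchestrator prompt. The `budget` numbers are invented relative tiers for illustration only, not real prices, and the function names are hypothetical.

```python
# Strengths are shortened from the orchestrator prompt; the "budget" numbers are
# hypothetical relative cost tiers (1 = cheapest), NOT real prices.
MODEL_CATALOG = {
    "gpt-5-nano": {"strengths": "reasoning, complex logic, and nuanced analysis", "budget": 2},
    "claude-sonnet-4-5": {"strengths": "creative writing, empathy, and ethical reasoning", "budget": 3},
    "gemini-2.5-flash": {"strengths": "factual retrieval, technical explanations, and structured data", "budget": 2},
    "deepseek-chat": {"strengths": "code generation, mathematical problems, and technical documentation", "budget": 2},
    "openai/gpt-oss-120b": {"strengths": "general purpose, straightforward tasks", "budget": 1},
    "llama3.2": {"strengths": "local model, sensitive data and general tasks", "budget": 1},
}


def describe_models(catalog):
    """Render one prompt line per model, including its budget tier."""
    return "\n".join(
        f"- {name}: {info['strengths']} (budget tier {info['budget']})"
        for name, info in catalog.items()
    )


def cost_aware_instructions(catalog, max_total_budget):
    """Build the prompt fragment asking the orchestrator to weigh cost."""
    return (
        "Available models, their strengths, and a relative budget tier (1 = cheapest):\n"
        f"{describe_models(catalog)}\n\n"
        "Prefer the cheapest model that can handle each sub-question well, and keep "
        f"the sum of budget tiers across all sub-questions at or below {max_total_budget}."
    )


prompt_fragment = cost_aware_instructions(MODEL_CATALOG, max_total_budget=8)
print(prompt_fragment)
```

The fragment could replace the "Available models and their strengths" section of `orchestrator_prompt`. Whether the orchestrator actually respects the budget is something to verify by summing the tiers in the returned plan.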

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">The Intelligent Orchestrator pattern is critical for production systems where:
            <ul>
                <li><b>Cost optimization</b> matters - use expensive models only where their strengths are needed</li>
                <li><b>Quality is paramount</b> - leverage specialization for each aspect of complex tasks</li>
                <li><b>Scalability is required</b> - easily add new models and define their capabilities</li>
                <li><b>Transparency is valued</b> - document routing decisions and reasoning</li>
            </ul>
            This pattern mirrors how you'd assemble a team of specialists for a complex project, making it intuitive for business stakeholders to understand.
            </span>
        </td>
    </tr>
</table>