## Lab2 - Multi-Model Comparison (Executable Version)

This executable version of Lab2 adds error handling and runs whether or not API keys are configured: any provider without a key is skipped.

**What this lab demonstrates:**
- Multi-Model Comparison pattern
- Working with multiple AI provider APIs
- Using AI judges to evaluate responses
- Graceful error handling

**Requirements:**
- At least one API key in .env file (or use Ollama for free local models)
- Internet connection for API calls

In [11]:
# Start with imports
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
try:
    from anthropic import Anthropic
    anthropic_available = True
except ImportError:
    print("⚠️ Anthropic not installed (optional)")
    anthropic_available = False

print("✅ Imports successful")

✅ Imports successful


In [12]:
# Load environment variables
load_dotenv(override=True)
print("✅ Environment loaded")

✅ Environment loaded


In [13]:
# Check API keys and show status
def check_key(key_name, prefix_len=8):
    key = os.getenv(key_name)
    if key and key != f'your_{key_name.lower()}_here':
        print(f"✅ {key_name} exists and begins {key[:prefix_len]}...")
        return True
    else:
        print(f"❌ {key_name} not set or using placeholder")
        return False

print("🔑 API Key Status:")
openai_ok = check_key('OPENAI_API_KEY')
anthropic_ok = check_key('ANTHROPIC_API_KEY', 7)
google_ok = check_key('GOOGLE_API_KEY', 2)
deepseek_ok = check_key('DEEPSEEK_API_KEY', 3)
groq_ok = check_key('GROQ_API_KEY', 4)

if not any([openai_ok, anthropic_ok, google_ok, deepseek_ok, groq_ok]):
    print("\n⚠️ No API keys found. The notebook will fall back to a preset question and skip all model calls.")
    print("💡 For best results, add at least one API key to your .env file")

🔑 API Key Status:
✅ OPENAI_API_KEY exists and begins sk-proj-...
✅ ANTHROPIC_API_KEY exists and begins sk-ant-...
❌ GOOGLE_API_KEY not set or using placeholder
❌ DEEPSEEK_API_KEY not set or using placeholder
✅ GROQ_API_KEY exists and begins gsk_...
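
The per-provider checks above can also be driven by data, which makes it easier to add or remove providers. The following is a minimal sketch, not part of the original lab: `PROVIDER_KEYS`, `key_status`, and `available_providers` are hypothetical names, and the placeholder convention (`your_<key name>_here`) is taken from `check_key`. Passing a plain dict instead of `os.environ` keeps the helper easy to try without touching real keys.

```python
import os

# Key names checked in the cell above (hypothetical constant name).
PROVIDER_KEYS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROQ_API_KEY",
)

def key_status(env=None):
    """Return {key_name: bool} for each provider key without printing any secret."""
    env = os.environ if env is None else env
    status = {}
    for name in PROVIDER_KEYS:
        value = env.get(name, "").strip()
        placeholder = f"your_{name.lower()}_here"  # same convention as check_key
        status[name] = bool(value) and value != placeholder
    return status

def available_providers(env=None):
    """List the key names that are set to something other than the placeholder."""
    return [name for name, ok in key_status(env).items() if ok]

# Example with a fake environment: one usable key, one placeholder.
fake_env = {"OPENAI_API_KEY": "sk-test", "GROQ_API_KEY": "your_groq_api_key_here"}
print(available_providers(fake_env))  # prints ['OPENAI_API_KEY']
```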


In [14]:
# Generate a challenging question
# Fallback used when no OpenAI key is set or the API call fails
fallback_question = "Imagine you are tasked with explaining the concept of emergence to someone who has never encountered it before. How would you illustrate this concept using three different examples from completely different domains (biological, social, and technological), and what underlying principles connect these seemingly disparate phenomena?"

if openai_ok:
    try:
        request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
        request += "Answer only with the question, no explanation."
        messages = [{"role": "user", "content": request}]

        openai_client = OpenAI()
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        question = response.choices[0].message.content
        print(f"✅ Generated question: {question}")
    except Exception as e:
        print(f"❌ Error generating question: {e}")
        question = fallback_question
        print(f"📋 Using fallback question: {question}")
else:
    question = fallback_question
    print(f"📋 Using fallback question (no OpenAI key): {question}")

✅ Generated question: If you could redesign a societal institution (such as education, healthcare, or governance) to better address the complexities of human well-being in the 21st century, what foundational principles would you prioritize, and how would you ensure that these changes are equitable and sustainable across diverse populations?


In [15]:
# Initialize tracking lists
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

# Count only the providers this notebook actually queries (OpenAI, Anthropic, Groq)
active = [openai_ok, anthropic_ok and anthropic_available, groq_ok]
print(f"🎯 Testing {sum(active)} model(s) with the question...")

🎯 Testing 3 model(s) with the question...


In [16]:
# Test OpenAI GPT-4o-mini
if openai_ok:
    try:
        model_name = "gpt-4o-mini"
        print(f"🤖 Testing {model_name}...")

        response = openai_client.chat.completions.create(model=model_name, messages=messages)
        answer = response.choices[0].message.content

        print(f"✅ {model_name} responded successfully")
        print(f"📝 Preview: {answer[:100]}...")
        competitors.append(model_name)
        answers.append(answer)
    except Exception as e:
        print(f"❌ {model_name} failed: {e}")
else:
    print("⏭️ Skipping OpenAI (no API key)")

🤖 Testing gpt-4o-mini...
✅ gpt-4o-mini responded successfully
📝 Preview: Redesigning a societal institution to better address human well-being in the 21st century requires c...


In [17]:
# Test Anthropic Claude
if anthropic_ok and anthropic_available:
    try:
        model_name = "claude-3-5-sonnet-20241022"
        print(f"🤖 Testing {model_name}...")

        claude = Anthropic()
        response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
        answer = response.content[0].text

        print(f"✅ {model_name} responded successfully")
        print(f"📝 Preview: {answer[:100]}...")
        competitors.append(model_name)
        answers.append(answer)
    except Exception as e:
        print(f"❌ {model_name} failed: {e}")
else:
    print("⏭️ Skipping Anthropic (no API key or not installed)")

🤖 Testing claude-3-5-sonnet-20241022...
✅ claude-3-5-sonnet-20241022 responded successfully
📝 Preview: I aim to explore this complex question while remaining aware of my role and limitations as an AI. I'...


In [18]:
# Test Groq Llama
if groq_ok:
    try:
        model_name = "llama-3.3-70b-versatile"
        print(f"🤖 Testing {model_name}...")

        groq_api_key = os.getenv('GROQ_API_KEY')
        groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
        response = groq.chat.completions.create(model=model_name, messages=messages)
        answer = response.choices[0].message.content

        print(f"✅ {model_name} responded successfully")
        print(f"📝 Preview: {answer[:100]}...")
        competitors.append(model_name)
        answers.append(answer)
    except Exception as e:
        print(f"❌ {model_name} failed: {e}")
else:
    print("⏭️ Skipping Groq (no API key)")

🤖 Testing llama-3.3-70b-versatile...
✅ llama-3.3-70b-versatile responded successfully
📝 Preview: If I could redesign a societal institution to better address the complexities of human well-being in...
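
The three provider cells above repeat the same shape: call the model, record the answer on success, report and move on after a failure. A minimal sketch of that pattern as a single loop follows. It is an illustration, not the lab's code: `collect_answers`, `fake_model`, and `broken_model` are hypothetical names, and the stand-in callables replace real API calls so the block runs without keys or network access.

```python
def collect_answers(providers, question):
    """Ask every provider the same question and skip any provider that raises.

    providers is a list of (name, ask) pairs, where ask(question) returns a string.
    """
    competitors, answers, failures = [], [], {}
    for name, ask in providers:
        try:
            answer = ask(question)
        except Exception as exc:  # one failing provider must not stop the others
            failures[name] = str(exc)
            continue
        competitors.append(name)
        answers.append(answer)
    return competitors, answers, failures

# Stand-ins for real API calls (hypothetical).
def fake_model(question):
    return f"Answer to: {question}"

def broken_model(question):
    raise RuntimeError("simulated API outage")

competitors, answers, failures = collect_answers(
    [("model-a", fake_model), ("model-b", broken_model)],
    "What is emergence?",
)
print(competitors)  # prints ['model-a']
print(failures)     # prints {'model-b': 'simulated API outage'}
```

In the notebook, each `ask` callable could wrap one of the calls shown above, for example a function that sends `question` through `openai_client.chat.completions.create` and returns `response.choices[0].message.content`.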


In [19]:
# Show current results
print(f"\n📊 Results so far: {len(competitors)} models responded successfully")
for i, competitor in enumerate(competitors):
    print(f"   {i+1}. {competitor}")

if len(competitors) == 0:
    print("\n❌ No models responded. Check your .env file or install Ollama for local testing.")
elif len(competitors) == 1:
    print("\n⚠️ Only one model responded. Need at least 2 for judging.")
else:
    print("\n✅ Multiple models responded. Ready for judging!")


📊 Results so far: 3 models responded successfully
   1. gpt-4o-mini
   2. claude-3-5-sonnet-20241022
   3. llama-3.3-70b-versatile

✅ Multiple models responded. Ready for judging!


In [20]:
# Show full responses
if len(competitors) > 0:
    print("\n" + "="*60)
    print(" FULL RESPONSES")
    print("="*60)

    for competitor, answer in zip(competitors, answers):
        print(f"\n🤖 {competitor}:")
        print("-" * 40)
        print(answer)
        print("-" * 40)


 FULL RESPONSES

🤖 gpt-4o-mini:
----------------------------------------
Redesigning a societal institution to better address human well-being in the 21st century requires careful consideration of foundational principles that can promote equity, sustainability, and adaptability. Here’s a vision focusing particularly on the **education system**, which has wide-ranging implications for personal and societal development.

### Foundational Principles

1. **Holistic Well-Being**: Education should prioritize the overall well-being of individuals, encompassing physical, mental, social, and emotional health. Curriculum redesign should incorporate life skills, mental health awareness, and physical education as core components to foster resilience and well-rounded growth.

2. **Equity and Accessibility**: Education must be equitable, ensuring all students, regardless of socioeconomic status, geography, or background, have equal access to quality resources and opportunities. This can involve tie

In [21]:
# Judge the responses (if we have multiple models and OpenAI)
if len(competitors) >= 2 and openai_ok:
    try:
        print("\n⚖️ JUDGING PHASE")
        print("=================")

        # Prepare responses for judging
        together = ""
        for index, answer in enumerate(answers):
            together += f"# Response from competitor {index+1}\n\n"
            together += answer + "\n\n"

        judge_prompt = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""

        judge_messages = [{"role": "user", "content": judge_prompt}]

        print("🤖 Asking o3-mini to judge the responses...")
        response = openai_client.chat.completions.create(
            model="o3-mini",
            messages=judge_messages,
        )
        results = response.choices[0].message.content

        print(f"📊 Raw judgment: {results}")

        # Parse results
        results_dict = json.loads(results)
        ranks = results_dict["results"]

        print("\n🏆 FINAL RANKINGS:")
        for index, result in enumerate(ranks):
            competitor = competitors[int(result)-1]
            print(f"   Rank {index+1}: {competitor}")

    except Exception as e:
        print(f"❌ Error during judging: {e}")
        print("💡 Judgment failed, but you can still compare the responses manually above!")

elif len(competitors) < 2:
    print("\n⚠️ Need at least 2 model responses for judging.")
elif not openai_ok:
    print("\n⚠️ OpenAI API key required for judging. You can still compare responses manually above!")


⚖️ JUDGING PHASE
🤖 Asking o3-mini to judge the responses...
📊 Raw judgment: {"results": ["1", "3", "2"]}

🏆 FINAL RANKINGS:
   Rank 1: gpt-4o-mini
   Rank 2: llama-3.3-70b-versatile
   Rank 3: claude-3-5-sonnet-20241022
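
The judging cell passes the raw reply straight to `json.loads` and then indexes `competitors` with whatever numbers come back. Two things can go wrong: the judge may wrap its JSON in a Markdown code fence despite the instruction, and an out-of-range number such as 0 would silently select the last competitor through negative indexing. The sketch below shows one way to guard against both. `parse_ranking` and `FENCE` are hypothetical names, not part of the lab.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker (three backticks)

def parse_ranking(raw, n_competitors):
    """Turn the judge's reply into a list of 1-based competitor numbers."""
    text = raw.strip()
    # Unwrap the reply if the judge fenced it despite being told not to.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor must appear exactly once, otherwise the lookup is unsafe.
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"{ranks} is not a permutation of 1..{n_competitors}")
    return ranks

raw_reply = '{"results": ["1", "3", "2"]}'  # the raw judgment shown above
print(parse_ranking(raw_reply, 3))  # prints [1, 3, 2]
```

The returned integers can then replace `int(result)` in the ranking loop, with the `ValueError` caught by the existing `except` block.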


## 🎯 Lab2 Complete!

**What you just experienced:**
- **Multi-Model Comparison Pattern**: Multiple AI models tackling the same challenge
- **API Integration**: Working with different AI provider APIs
- **Response Evaluation**: Using an AI judge to rank outputs
- **Error Handling**: Graceful degradation when services aren't available

**Commercial Applications:**
- Quality assurance in AI systems
- Model selection for specific tasks
- Ensemble methods for better accuracy
- A/B testing AI models in production

**Next Steps:**
1. Try adding more API keys to test additional models
2. Experiment with different questions
3. Modify the judging criteria
4. Use this pattern in your own projects!
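
As a starting point for step 3, the judging prompt can be built by a function that takes the criteria as a parameter. The sketch below also shuffles the order in which responses are shown, since an LLM judge can be sensitive to presentation order; that shuffle is an addition of this sketch, not something the lab does. `build_judge_prompt` and `unshuffle` are hypothetical names, and the prompt wording follows the judging cell.

```python
import random

def build_judge_prompt(question, answers, criteria="clarity and strength of argument", seed=None):
    """Return (prompt, order); order[i] is the original index of the answer shown as competitor i+1."""
    order = list(range(len(answers)))
    random.Random(seed).shuffle(order)  # seed makes the shuffle reproducible
    together = ""
    for shown, original in enumerate(order, start=1):
        together += f"# Response from competitor {shown}\n\n{answers[original]}\n\n"
    prompt = (
        f"You are judging a competition between {len(answers)} competitors.\n"
        f"Each model has been given this question:\n\n{question}\n\n"
        f"Your job is to evaluate each response for {criteria}, "
        "and rank them in order of best to worst.\n"
        "Respond with JSON, and only JSON, with the following format:\n"
        '{"results": ["best competitor number", "second best competitor number", ...]}\n\n'
        f"Here are the responses from each competitor:\n\n{together}"
    )
    return prompt, order

def unshuffle(ranks, order, competitors):
    """Map the judge's shown-position ranks back to the original competitor names."""
    return [competitors[order[int(r) - 1]] for r in ranks]

prompt, order = build_judge_prompt(
    "Q?", ["a1", "a2", "a3"], criteria="factual accuracy", seed=0
)
print(unshuffle(["1", "2", "3"], order, ["A", "B", "C"]))
```

Running the judge several times with different seeds and comparing the unshuffled rankings gives a rough sense of how stable the verdict is.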

In [3]:
%pip install anthropic

Collecting anthropic
  Downloading anthropic-0.60.0-py3-none-any.whl.metadata (27 kB)
Downloading anthropic-0.60.0-py3-none-any.whl (293 kB)
Installing collected packages: anthropic
Successfully installed anthropic-0.60.0


In [7]:
import os

# Define the path for the .env file
dotenv_path = '/content/.env'

# Check if the file already exists to avoid overwriting
if not os.path.exists(dotenv_path):
    # Create the .env file
    with open(dotenv_path, 'w') as f:
        # You can add placeholder content or leave it empty
        f.write("# Add your API keys here, e.g.:\n")
        f.write("# OPENAI_API_KEY=your_openai_api_key_here\n")
        f.write("# ANTHROPIC_API_KEY=your_anthropic_api_key_here\n")
        f.write("# GOOGLE_API_KEY=your_google_api_key_here\n")
        f.write("# DEEPSEEK_API_KEY=your_deepseek_api_key_here\n")
        f.write("# GROQ_API_KEY=your_groq_api_key_here\n")

    print(f"✅ Created empty .env file at {dotenv_path}")
else:
    print(f"ℹ️ .env file already exists at {dotenv_path}")

✅ Created empty .env file at /content/.env


In [8]:
import os

dotenv_path = '/content/.env'

# Write the API keys to the .env file, overwriting existing content.
# Replace each placeholder with your own key, and never share or commit a notebook that contains real keys.
with open(dotenv_path, 'w') as f:
    f.write("OPENAI_API_KEY=your_openai_api_key_here\n")
    f.write("ANTHROPIC_API_KEY=your_anthropic_api_key_here\n")
    f.write("GROQ_API_KEY=your_groq_api_key_here\n")

print(f"✅ API keys written to {dotenv_path}")

✅ API keys written to /content/.env
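
Because the .env file holds secrets, it helps to keep key values out of the notebook source and to limit who can read the file. The sketch below writes `KEY=value` lines from a dict and restricts the file to its owner. `write_env_file` is a hypothetical helper, and the demo writes a placeholder to a temporary directory rather than to /content/.env. In practice the values could be collected at run time, for example with `getpass.getpass()` from the standard library.

```python
import os
import stat
import tempfile

def write_env_file(path, values):
    """Write KEY=value lines to path, then restrict it to owner read/write."""
    with open(path, "w") as f:
        for name, value in values.items():
            f.write(f"{name}={value}\n")
    # Equivalent to mode 0o600; on Windows this has only a limited effect.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

# Demo in a temporary directory so no real .env file is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".env")
    write_env_file(demo_path, {"OPENAI_API_KEY": "your_openai_api_key_here"})
    with open(demo_path) as f:
        demo_contents = f.read()

print(demo_contents, end="")  # prints OPENAI_API_KEY=your_openai_api_key_here
```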
