# Lab 3: Cloud LLM Integration and Testing

**Purpose:** Learn how to call the Azure-hosted LLM for more complex tasks. This lab verifies that the Azure AI Foundry deployment is reachable and can handle demanding queries, at the cost of higher latency than the local model. It shows how to send *heavy* tasks such as summarization to the cloud model.

## Overview

In this lab, we'll:
- Connect to Azure OpenAI service through Azure AI Foundry
- Test complex queries that require more sophisticated reasoning
- Compare cloud vs local performance characteristics
- Understand when to use cloud models in our hybrid architecture

## Step 3.1: Load Configuration and Connect to Azure OpenAI

In [None]:
import os
import time
from dotenv import load_dotenv
from openai import AzureOpenAI

# Load configuration from Lab 1
load_dotenv()

# Local model configuration
LOCAL_ENDPOINT = os.environ["LOCAL_MODEL_ENDPOINT"]
LOCAL_MODEL_ALIAS = os.environ["LOCAL_MODEL_NAME"]

# Azure OpenAI configuration
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_KEY = os.getenv('AZURE_OPENAI_KEY')
AZURE_OPENAI_DEPLOYMENT = os.getenv('AZURE_DEPLOYMENT_NAME')
AZURE_OPENAI_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION')

print(f"Azure endpoint: {AZURE_OPENAI_ENDPOINT}")
print(f"Azure deployment: {AZURE_OPENAI_DEPLOYMENT}")
print(f"API version: {AZURE_OPENAI_API_VERSION}")

# Verify configuration (the client in Step 3.2 also needs the API version)
if not all([AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_API_VERSION]):
    print("❌ Missing Azure OpenAI configuration. Please check your .env file.")
else:
    print("✅ Azure OpenAI configuration loaded")
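
The check above reports that a value is missing but not which one. A minimal sketch of a more specific check follows. `find_missing_settings` and `REQUIRED_AZURE_SETTINGS` are hypothetical names, not part of the lab code, and the helper takes any mapping so it can be tried without a real `.env` file.

```python
# Hypothetical helper (not part of the lab): list the required settings that
# are absent or blank in a mapping such as os.environ.
REQUIRED_AZURE_SETTINGS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
]

def find_missing_settings(env, required=REQUIRED_AZURE_SETTINGS):
    """Return the names in `required` whose value is missing or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]

# Sample mapping standing in for os.environ (values are placeholders)
sample_env = {
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
    "AZURE_OPENAI_KEY": "placeholder-key",
}
missing = find_missing_settings(sample_env)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All Azure OpenAI settings are present")
```

In the notebook, passing `os.environ` instead of `sample_env` would run the same check against the loaded configuration.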

## Step 3.2: Initialize Azure OpenAI Client

Configure the Azure OpenAI client for accessing the cloud-based model:

In [None]:
# Initialize Azure OpenAI client
azure_client = AzureOpenAI(
    api_key=AZURE_OPENAI_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)

print("✅ Azure OpenAI client initialized")

def query_azure_model(prompt, chat_history=None, max_tokens=500):
    """Send a query to the Azure model and return the response with timing."""
    if chat_history is None:
        chat_history = []
    
    # Add current prompt to history
    messages = chat_history + [{"role": "user", "content": prompt}]
    
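    try:
        # perf_counter() is a monotonic clock, better suited to timing than time.time()
        start_time = time.perf_counter()
        
        response = azure_client.chat.completions.create(
            model=AZURE_OPENAI_DEPLOYMENT,  # Azure expects the deployment name here
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.7
        )
        
        response_time = time.perf_counter() - start_time
        
        # content may be None; normalize to a string so callers can use len()
        content = response.choices[0].message.content or ""
        return content, response_time, True
        
    except Exception as e:
        # Report failures as (message, 0, False) so callers can branch on the flag
        return f"Error: {str(e)}", 0, False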

print("✅ Azure query function defined")
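
`query_azure_model` accepts a `chat_history` list, although the cells in this lab never pass one. The sketch below shows the shape that list takes (a sequence of role/content dictionaries, as the chat completions API expects) and how one exchange could be appended. `append_turn` is a hypothetical helper for illustration, and the reply text is a placeholder rather than real model output.

```python
def append_turn(chat_history, user_prompt, assistant_reply):
    """Return a new history list with one user/assistant exchange appended."""
    return chat_history + [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": assistant_reply},
    ]

history = [{"role": "system", "content": "You are a concise assistant."}]
history = append_turn(history, "What is a hybrid AI system?", "(model reply here)")

# A follow-up call in the notebook would then look like:
#   reply, seconds, ok = query_azure_model("Summarize that in one line.", chat_history=history)
for message in history:
    print(f"{message['role']}: {message['content']}")
```

Because the function builds a new list instead of mutating its argument, an earlier history can be reused to branch a conversation.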

## Step 3.3: Test Basic Azure Connectivity

Let's start with a simple test to ensure our Azure connection is working:

In [None]:
# Test basic connectivity with a simple query
test_prompt = "Hello, Azure! Please respond with a brief greeting."

print("Testing Azure OpenAI connectivity...")
print("=" * 50)

response, response_time, success = query_azure_model(test_prompt, max_tokens=50)

if success:
    print(f"Query: {test_prompt}")
    print(f"Azure response: {response}")
    print(f"Response time: {response_time:.3f} seconds")
    print("✅ Azure OpenAI connection successful!")
else:
    print(f"❌ Connection failed: {response}")
    print("Please check your Azure configuration and try again.")

## Step 3.4: Test Complex Tasks - Document Summarization

Now let's test the Azure model with a complex task that would challenge a local model:

In [None]:
# Sample long text for summarization (simulating a document)
long_document = """
Artificial Intelligence (AI) is transforming industries across the globe at an unprecedented pace. 
From healthcare to finance, from transportation to entertainment, AI technologies are reshaping 
how we work, live, and interact with the world around us.

In healthcare, AI is enabling more accurate diagnostics through medical imaging analysis, 
drug discovery acceleration, and personalized treatment recommendations. Machine learning 
algorithms can now detect patterns in medical data that might be missed by human practitioners, 
leading to earlier disease detection and improved patient outcomes.

The financial sector has embraced AI for fraud detection, algorithmic trading, and risk assessment. 
Banks use machine learning models to analyze transaction patterns in real-time, identifying 
suspicious activities and protecting customers from financial crimes. Robo-advisors powered by 
AI are democratizing investment management, making sophisticated financial strategies accessible 
to retail investors.

Transportation is being revolutionized by autonomous vehicles and AI-powered traffic management 
systems. Self-driving cars promise to reduce accidents caused by human error, while intelligent 
traffic systems optimize routes and reduce congestion in urban areas.

However, the rapid adoption of AI also raises important ethical and societal questions. 
Concerns about job displacement, privacy, algorithmic bias, and the concentration of AI 
capabilities in the hands of a few large corporations are driving discussions about AI governance 
and regulation.

As we move forward, it's crucial to develop AI systems that are not only powerful and efficient 
but also transparent, fair, and aligned with human values. The future of AI depends on our 
ability to harness its potential while addressing its challenges responsibly.
"""

# Test document summarization
summarization_prompt = f"Please provide a concise summary of the following document:\n\n{long_document}"

print("Testing Azure model with document summarization:")
print("=" * 60)
print(f"Document length: {len(long_document)} characters")
print(f"Document word count: ~{len(long_document.split())} words")
print("\nRequesting summarization...")

summary, response_time, success = query_azure_model(summarization_prompt, max_tokens=300)

if success:
    summary = summary or ""  # an empty response should not crash the report
    print(f"\n📝 Azure Summary:")
    print(summary)
    print(f"\n⏱️  Response time: {response_time:.3f} seconds")
    print(f"📊 Summary length: {len(summary)} characters")
    if summary:  # avoid dividing by zero when the summary is empty
        print(f"🔄 Compression ratio: {len(long_document)/len(summary):.1f}:1")
else:
    print(f"❌ Summarization failed: {summary}")
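
The cell above reports document size in characters and words, while `max_tokens` and model context limits are expressed in tokens. As a rough sketch, the common rule of thumb of about four characters per token for English text gives a quick estimate. The helper names and the 8,192-token limit below are assumptions for illustration, and a real tokenizer would give exact counts.

```python
import math

def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; chars_per_token=4 is a rule of thumb, not exact."""
    return math.ceil(len(text) / chars_per_token)

def fits_budget(prompt, max_tokens, context_limit):
    """True if the estimated prompt tokens plus requested output fit the limit."""
    return estimate_tokens(prompt) + max_tokens <= context_limit

sample = "word " * 400  # 2,000 characters of filler text
print(f"Estimated prompt tokens: {estimate_tokens(sample)}")
# context_limit=8192 is an assumed value; check your deployment's actual limit
print(f"Fits budget: {fits_budget(sample, max_tokens=300, context_limit=8192)}")
```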

## Step 3.5: Test Creative and Analytical Tasks

Let's test the Azure model with tasks that require creativity and deep analysis:

In [None]:
# Creative and analytical test tasks
creative_tasks = [
    {
        "name": "Creative Writing",
        "prompt": "Write a short poem about the intersection of technology and nature, focusing on harmony rather than conflict.",
        "max_tokens": 200
    },
    {
        "name": "Business Analysis",
        "prompt": "Analyze the potential business impact of hybrid AI systems (combining local and cloud processing) for enterprise customers. Include benefits, challenges, and market opportunities.",
        "max_tokens": 400
    },
    {
        "name": "Technical Explanation",
        "prompt": "Explain quantum computing and its implications for current encryption methods. Make it accessible to a business audience without deep technical background.",
        "max_tokens": 350
    }
]

print("Testing Azure model with complex creative and analytical tasks:")
print("=" * 70)

total_time = 0
successful_tasks = 0

for i, task in enumerate(creative_tasks, 1):
    print(f"\n{i}. {task['name']}")
    print("-" * 40)
    print(f"Prompt: {task['prompt']}")
    print("\nGenerating response...")
    
    response, response_time, success = query_azure_model(
        task['prompt'], 
        max_tokens=task['max_tokens']
    )
    
    if success:
        successful_tasks += 1
        total_time += response_time
        
        print(f"\n🤖 Azure Response:")
        print(response)
        print(f"\n⏱️  Time: {response_time:.3f} seconds")
        print(f"📏 Length: {len(response)} characters")
    else:
        print(f"❌ Task failed: {response}")
    
    print("\n" + "="*70)

if successful_tasks > 0:
    avg_response_time = total_time / successful_tasks
    print(f"\n📊 Complex Task Performance Summary:")
    print(f"Successful tasks: {successful_tasks}/{len(creative_tasks)}")
    print(f"Average response time: {avg_response_time:.3f} seconds")
    print(f"Total processing time: {total_time:.3f} seconds")
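
An average over three tasks can be skewed by a single slow call. A small sketch of a fuller summary using the standard library's `statistics` module is shown below. `summarize_latencies` is a hypothetical helper, and the sample values are illustrative numbers, not measurements from this lab.

```python
import statistics

def summarize_latencies(samples):
    """Return min, median, mean, and max for a non-empty list of seconds."""
    if not samples:
        raise ValueError("samples must not be empty")
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
    }

# Illustrative numbers only: one slow call pulls the mean above the median
example = [1.0, 2.0, 6.0]
for name, value in summarize_latencies(example).items():
    print(f"{name}: {value:.3f} s")
```

To use it in the loop above, each successful `response_time` would be appended to a list that is passed to the helper afterwards.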

## Step 3.6: Compare Local vs Cloud Performance

Let's directly compare the same queries on both local and cloud models:

In [None]:
from foundry_local import FoundryLocalManager

# Initialize and optionally bootstrap with a model
manager = FoundryLocalManager(alias_or_model_id=None, bootstrap=True)

# List models in cache
local_models = manager.list_cached_models()
print(f"Models in cache: {local_models}")

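# Guard against an empty cache before indexing into the list
if local_models:
    print(f"Local model alias: {local_models[0].alias}")
    print(f"Local model ID: {local_models[0].id}")
else:
    print("⚠️  No cached local models found; local queries will fail until a model is downloaded.")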

In [None]:
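# Recreate the local OpenAI-compatible client used in Lab 2
from openai import OpenAI

try:
    # Constructing the client does not contact the server;
    # connection problems only surface on the first request
    local_client = OpenAI(
        base_url=f"{LOCAL_ENDPOINT}/v1",
        api_key="not-needed"
    )
    
    # Local queries also need at least one cached model (listed above)
    local_available = bool(local_models)
    if local_available:
        print("✅ Local model configuration loaded")
    else:
        print("⚠️  No cached local model found")
        print("Comparison will focus on Azure model only")
except Exception as e:
    local_available = False
    print(f"⚠️  Local model not available: {e}")
    print("Comparison will focus on Azure model only")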

def query_local_model_simple(prompt):
    """Simple local model query function."""
    if not local_available:
        return "Local model not available", 0, False
    
    try:
        start_time = time.time()
        response = local_client.chat.completions.create(
            model=local_models[0].id,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=150,
            temperature=0.7
        )
        end_time = time.time()
        
        content = response.choices[0].message.content
        return content, end_time - start_time, True
    except Exception as e:
        return f"Error: {str(e)}", 0, False

In [None]:
# Comparison queries
comparison_queries = [
    "What is the capital of France?",
    "Explain machine learning in simple terms.",
    "Write a brief summary of renewable energy benefits."
]

print("🔄 Comparing Local vs Azure Model Performance:")
print("=" * 60)

for i, query in enumerate(comparison_queries, 1):
    print(f"\nQuery {i}: {query}")
    print("-" * 40)
    
    # Test local model
    if local_available:
        local_response, local_time, local_success = query_local_model_simple(query)
        print(f"🏠 LOCAL ({local_time:.3f}s): {local_response[:100]}{'...' if len(local_response) > 100 else ''}")
    else:
        print(f"🏠 LOCAL: Not available")
    
    # Test Azure model
    azure_response, azure_time, azure_success = query_azure_model(query, max_tokens=150)
    if azure_success:
        print(f"☁️  AZURE ({azure_time:.3f}s): {azure_response[:100]}{'...' if len(azure_response) > 100 else ''}")
    else:
        print(f"☁️  AZURE: {azure_response}")
    
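    # Compare performance if both succeeded, reporting whichever side was faster
    if local_available and local_success and azure_success and local_time > 0 and azure_time > 0:
        if local_time <= azure_time:
            print(f"⚡ Speed comparison: Local is {azure_time / local_time:.1f}x faster than Azure")
        else:
            print(f"⚡ Speed comparison: Azure is {local_time / azure_time:.1f}x faster than Local")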
    
    print()
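
A single timed call per model is a noisy basis for comparison, since the first request may include one-time costs such as model loading or connection setup. One hedged way to reduce that noise is to repeat each call and look at all the durations. `time_call` below is a hypothetical helper, and `fake_model` is a stand-in so the sketch runs without any endpoint.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn repeatedly; return (last_result, list_of_durations_in_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, durations

# Stand-in for a model call; sleeps briefly to simulate latency
def fake_model(prompt):
    time.sleep(0.01)
    return prompt.upper()

result, durations = time_call(fake_model, "hello", repeats=3)
print(result, [f"{d:.3f}" for d in durations])
```

In the notebook, `fn` could be `query_azure_model` or `query_local_model_simple`. Note that repeated cloud calls also add to token usage.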

## Step 3.7: Error Handling and Rate Limiting

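Let's test how the Azure model handles edge-case inputs. These cases exercise input and output limits; they do not trigger rate limiting, which typically appears only when requests exceed the deployment's quota: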

In [None]:
# Test edge cases and error handling
edge_cases = [
    {
        "name": "Very Long Input",
        "prompt": "Explain this: " + "AI is transforming the world. " * 100,  # ~3,000 characters; raise the multiplier to probe the context limit
        "max_tokens": 100
    },
    {
        "name": "Empty Input",
        "prompt": "",
        "max_tokens": 50
    },
    {
        "name": "Large Token Request",
        "prompt": "Write a comprehensive essay about artificial intelligence.",
        "max_tokens": 2000  # Test if this exceeds limits
    }
]

print("Testing Azure model error handling and limits:")
print("=" * 50)

for test_case in edge_cases:
    print(f"\n🧪 Testing: {test_case['name']}")
    print(f"Prompt length: {len(test_case['prompt'])} characters")
    print(f"Requested max tokens: {test_case['max_tokens']}")
    
    response, response_time, success = query_azure_model(
        test_case['prompt'], 
        max_tokens=test_case['max_tokens']
    )
    
    if success:
        print(f"✅ Success ({response_time:.3f}s)")
        print(f"Response length: {len(response)} characters")
        if len(response) > 200:
            print(f"Response preview: {response[:200]}...")
        else:
            print(f"Full response: {response}")
    else:
        print(f"❌ Failed: {response}")
    
    print("-" * 30)
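
The edge cases above do not exercise rate limiting. A common way to cope with it is to retry with exponential backoff, sketched below under stated assumptions. `call_with_backoff` and `FakeRateLimit` are hypothetical names. With the v1 `openai` package, the exception to pass as `retryable` would be `openai.RateLimitError`, and the retry would need to wrap the raw `azure_client.chat.completions.create(...)` call, because `query_azure_model` catches exceptions itself. The client also retries some failures on its own through its `max_retries` option.

```python
import random
import time

def call_with_backoff(fn, retryable=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception wait base_delay * 2**attempt
    (plus a little jitter) and try again, up to max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Demo with a stand-in that fails twice before succeeding
class FakeRateLimit(Exception):
    pass

calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimit("429 Too Many Requests")
    return "ok"

waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky, retryable=(FakeRateLimit,), sleep=waits.append))
print(f"attempts: {calls['count']}, waits recorded: {len(waits)}")
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` applies the delays.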

## Step 3.8: Create Helper Functions for Future Labs

Let's create reusable functions for the Azure model that we'll use in subsequent labs:

In [None]:
def get_azure_client():
    """Get configured Azure OpenAI client."""
    return AzureOpenAI(
        api_key=AZURE_OPENAI_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )

def query_azure_with_history(prompt, chat_history=None, max_tokens=500):
    """Query Azure model with optional chat history.

    Thin wrapper around query_azure_model (Step 3.2), which already accepts
    chat_history; returns the same (content, response_time, success) tuple.
    """
    return query_azure_model(prompt, chat_history=chat_history, max_tokens=max_tokens)

# These functions live only in this notebook's kernel; nothing is written to disk
print("✅ Azure helper functions defined for future labs")
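
The helpers in this lab return a positional `(content, response_time, success)` tuple, which is easy to unpack in the wrong order. One possible refinement, offered only as a sketch, is a `NamedTuple` that keeps tuple unpacking working while adding field names. `QueryResult` and `describe` are hypothetical names, not part of the lab code.

```python
from typing import NamedTuple

class QueryResult(NamedTuple):
    """Hypothetical named version of the (content, response_time, success) tuple."""
    content: str
    response_time: float
    success: bool

def describe(result):
    """Format a result for display, reading fields by name."""
    status = "ok" if result.success else "failed"
    return f"[{status}] {result.response_time:.3f}s: {result.content}"

result = QueryResult("Hello!", 1.25, True)
content, response_time, success = result  # existing unpacking code still works
print(describe(result))
```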

## Step 3.9: Performance Summary and Analysis

Let's summarize what we've learned about the Azure model:

In [None]:
# Create a simple performance summary
print("📊 Azure Model Performance Analysis")
print("=" * 50)

azure_strengths = [
    "✅ Excellent at complex reasoning and analysis",
    "✅ High-quality creative content generation",
    "✅ Comprehensive document summarization",
    "✅ Detailed technical explanations",
    "✅ Large context window for long inputs",
    "✅ Latest knowledge and capabilities"
]

azure_considerations = [
    "⚠️  Higher latency due to network calls (often a few seconds; see the times measured above)",
    "⚠️  Requires internet connectivity",
    "⚠️  Usage costs per token/request",
    "⚠️  Data privacy considerations (data sent to cloud)",
    "⚠️  Potential rate limiting during high usage"
]

print("🎯 Azure Model Strengths:")
for strength in azure_strengths:
    print(f"   {strength}")

print("\n🤔 Considerations:")
for consideration in azure_considerations:
    print(f"   {consideration}")

print("\n💡 Optimal Use Cases for Azure Model:")
optimal_cases = [
    "📝 Document analysis and summarization",
    "🎨 Creative writing and content generation",
    "🧠 Complex reasoning and problem-solving",
    "📊 Business analysis and strategic planning",
    "🔬 Technical explanations and tutorials",
    "🌐 Multi-language tasks and translations"
]

for case in optimal_cases:
    print(f"   {case}")

print("\n🔄 This analysis will guide our routing logic in Lab 4!")

## 🎉 Lab 3 Complete!

### What You've Accomplished:
- ✅ Successfully connected to Azure OpenAI through Azure AI Foundry
- ✅ Tested complex tasks like document summarization and creative writing
- ✅ Compared performance characteristics with local models
- ✅ Understood error handling and service limitations
- ✅ Created reusable functions for the hybrid system

### Key Findings:
- **Quality**: Azure models excel at complex reasoning and high-quality generation
- **Capability**: Can handle long documents, creative tasks, and detailed analysis
- **Latency**: Higher response times than the local model due to network round trips and computation (often a few seconds; compare with the times you measured)
- **Cost**: Pay-per-use model with token-based pricing
- **Reliability**: Failures surface as exceptions, which our helper turns into a success flag; the service is also subject to rate limits

### Hybrid Architecture Insights:
1. **Azure models excel at**: Complex analysis, creative tasks, summarization, detailed explanations
2. **Trade-offs**: Higher quality and capability vs. increased latency and cost
3. **Complementary to local**: Local provides speed, Azure provides sophistication

### Next Steps:
- Proceed to Lab 4 to implement intelligent routing between local and Azure models
- The performance characteristics discovered here will inform our routing decisions
- Both local and Azure helper functions are now ready for integration

### Routing Strategy Preview:
Based on our testing, we'll route queries to:
- **Local**: Simple questions, greetings, basic calculations, quick responses
- **Azure**: Summarization, analysis, creative writing, complex reasoning
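
As a rough preview of how this split could be expressed in code, the sketch below routes on prompt length and a few keywords. Everything in it is an assumption for illustration: `route_query`, the keyword list, and the 30-word cutoff are hypothetical, and the router built in Lab 4 may work differently.

```python
# Hypothetical keyword heuristic; Lab 4 may use a different approach.
CLOUD_KEYWORDS = ("summarize", "summary", "analyze", "analysis", "poem", "essay", "story")
MAX_LOCAL_WORDS = 30  # assumed cutoff for a "short" prompt

def route_query(prompt):
    """Return "azure" for long or heavy-sounding prompts, otherwise "local"."""
    text = prompt.lower()
    if len(text.split()) > MAX_LOCAL_WORDS:
        return "azure"
    if any(keyword in text for keyword in CLOUD_KEYWORDS):
        return "azure"
    return "local"

for prompt in ["Hello there!", "What is 2 + 2?", "Summarize this report for me."]:
    print(f"{route_query(prompt):5s} <- {prompt}")
```

Substring matching is deliberately crude here; it is meant to show the shape of the decision, not to be a reliable classifier.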

Ready to build the intelligent router in Lab 4! 🚀