# 🔐 Admin Demo: Azure OpenAI Integration

**⚠️ ADMIN ONLY - Do not share this notebook with workshop participants**

This notebook demonstrates how LlamaStack abstracts multiple inference providers behind a single API.

## What You'll Demonstrate
1. LlamaStack with 2 LLM providers (vLLM + Azure OpenAI)
2. Switching between local and cloud models with the same API
3. The power of provider abstraction - clients don't need to change

In [None]:
# Install required packages
%pip install -q requests

In [None]:
import requests
import json

# Configuration - UPDATE THIS with your admin namespace!
ADMIN_NAMESPACE = "admin-workshop"  # <-- Change to your admin namespace

# LlamaStack endpoint (internal OpenShift service)
LLAMASTACK_URL = f"http://lsd-genai-playground-service.{ADMIN_NAMESPACE}.svc.cluster.local:8321"

print(f"Admin Namespace: {ADMIN_NAMESPACE}")
print(f"LlamaStack URL: {LLAMASTACK_URL}")

## 1. Show Available Models (2 Providers)

After applying the Phase 2 config, LlamaStack should have 2 LLM models:
- `llama-32-3b-instruct` (local vLLM)
- `gpt-4.1-mini` (Azure OpenAI)
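
Before the live demo, it can help to check programmatically that both expected models are registered. The sketch below is a minimal helper that works on an already-fetched model list; `find_missing_models` is a hypothetical name, and it assumes each entry from `/v1/models` carries an `identifier` field that may or may not include a provider prefix such as `vllm-inference/`.

```python
def find_missing_models(models, expected_names):
    """Return the expected model names that no listed model matches.

    Assumption: each entry has an `identifier` that either equals the
    expected name or ends with "/<name>" (provider-prefixed form).
    """
    identifiers = [m.get("identifier") or "" for m in models]
    missing = []
    for name in expected_names:
        found = any(i == name or i.endswith("/" + name) for i in identifiers)
        if not found:
            missing.append(name)
    return missing


# Hand-written sample listing (not real server output)
sample_models = [
    {"identifier": "vllm-inference/llama-32-3b-instruct", "model_type": "llm"},
]
expected = ["llama-32-3b-instruct", "gpt-4.1-mini"]
print(find_missing_models(sample_models, expected))  # ['gpt-4.1-mini']
```

An empty result means both providers are registered; a non-empty one names the model to investigate before presenting.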

In [None]:
response = requests.get(f"{LLAMASTACK_URL}/v1/models", timeout=10)
response.raise_for_status()  # Fail early if LlamaStack is unreachable or returns an error
models = response.json().get("data", [])

# Filter to LLM models only
llm_models = [m for m in models if m.get("model_type") == "llm"]

print(f"🤖 LLM Models Available: {len(llm_models)}")
print("=" * 60)
for m in llm_models:
    provider = m.get("provider_id") or "unknown"  # Guard against a missing provider_id
    icon = "☁️" if "azure" in provider.lower() else "🖥️"
    print(f"  {icon} {m.get('identifier')}")
    print(f"     Provider: {provider}")
    print()

## 2. Compare: Local vLLM vs Azure OpenAI

Let's ask the same question to both models and compare responses.

In [None]:
# Define model IDs
VLLM_MODEL = "vllm-inference/llama-32-3b-instruct"  # Local model
AZURE_MODEL = "azure-openai/gpt-4.1-mini"  # Azure OpenAI

# Test question
TEST_QUESTION = "What are the key benefits of using AI in healthcare? List 3 points briefly."

print(f"📝 Question: {TEST_QUESTION}")
print("=" * 70)

In [None]:
# Query LOCAL vLLM model
print("\n🖥️ LOCAL MODEL (vLLM - Llama 3.2-3B)")
print("-" * 70)

payload = {
    "model": VLLM_MODEL,
    "messages": [
        {"role": "user", "content": TEST_QUESTION}
    ],
    "temperature": 0.7,
    "max_tokens": 300
}

response = requests.post(
    f"{LLAMASTACK_URL}/v1/openai/v1/chat/completions",
    json=payload,
    timeout=60
)

if response.status_code == 200:
    result = response.json()
    # Fall back to an empty message if the response has no choices
    choices = result.get("choices") or [{}]
    content = choices[0].get("message", {}).get("content") or ""
    print(content)
else:
    print(f"❌ Error: {response.status_code} - {response.text}")

In [None]:
# Query AZURE OpenAI model
print("\n☁️ CLOUD MODEL (Azure OpenAI - GPT-4.1-mini)")
print("-" * 70)

payload = {
    "model": AZURE_MODEL,
    "messages": [
        {"role": "user", "content": TEST_QUESTION}
    ],
    "temperature": 0.7,
    "max_tokens": 300
}

response = requests.post(
    f"{LLAMASTACK_URL}/v1/openai/v1/chat/completions",
    json=payload,
    timeout=60
)

if response.status_code == 200:
    result = response.json()
    # Fall back to an empty message if the response has no choices
    choices = result.get("choices") or [{}]
    content = choices[0].get("message", {}).get("content") or ""
    print(content)
else:
    print(f"❌ Error: {response.status_code} - {response.text}")

## 3. Key Takeaways for Participants

**Show this summary to participants:**

| Aspect | What Changed | What Stayed the Same |
|--------|--------------|----------------------|
| Config | Added Azure provider to ConfigMap | LlamaStack API unchanged |
| Code | Only the `model` parameter | Same endpoint, same request format |
| Secrets | Azure keys stored in the admin namespace only | Users never see API keys |

**The Power of LlamaStack:**
- ✅ Same API for local and cloud models
- ✅ Switch providers by changing one parameter
- ✅ Secrets managed centrally by admin
- ✅ No code changes needed in client applications
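
The "one parameter" claim can be made concrete with a small helper. The following is a sketch, not part of the workshop code: `build_chat_payload`, `extract_content`, and `ask` are hypothetical names, and the optional `post` argument exists only so the helper can be exercised without a network connection (it defaults to `requests.post`).

```python
def build_chat_payload(model, question, temperature=0.7, max_tokens=300):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_content(result):
    """Return the first choice's text, or "" if the response has none."""
    choices = result.get("choices") or [{}]
    return (choices[0].get("message") or {}).get("content") or ""


def ask(base_url, model, question, timeout=60, post=None):
    """Send one question to the given model through the LlamaStack endpoint."""
    if post is None:
        import requests  # Imported lazily so the payload helpers work offline
        post = requests.post
    response = post(
        f"{base_url}/v1/openai/v1/chat/completions",
        json=build_chat_payload(model, question),
        timeout=timeout,
    )
    response.raise_for_status()
    return extract_content(response.json())


# The two request bodies differ only in the `model` field
local = build_chat_payload("vllm-inference/llama-32-3b-instruct", "Hello")
cloud = build_chat_payload("azure-openai/gpt-4.1-mini", "Hello")
changed = [key for key in local if local[key] != cloud[key]]
print(changed)  # ['model']
```

With the notebook's variables in scope, `ask(LLAMASTACK_URL, VLLM_MODEL, TEST_QUESTION)` and `ask(LLAMASTACK_URL, AZURE_MODEL, TEST_QUESTION)` would send the same request to each provider.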

## 4. (Optional) Show Tool Calling with Azure

Demonstrate that Azure OpenAI can also use the MCP tools.
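
In the OpenAI chat completion format, a model requests a tool when the request carries a `tools` list; the reply then contains `tool_calls` rather than plain text. The sketch below illustrates that shape under stated assumptions: `get_weather` and its schema are hypothetical stand-ins, not the workshop's actual MCP tool definitions, and whether the LlamaStack endpoint forwards `tools` to Azure unchanged has not been verified here.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def extract_tool_calls(result):
    """Return (name, arguments_dict) pairs from a chat completion response."""
    choices = result.get("choices") or [{}]
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        # Arguments arrive as a JSON-encoded string in this format
        arguments = json.loads(function.get("arguments") or "{}")
        calls.append((function.get("name"), arguments))
    return calls


# Hand-written sample response (not real server output)
sample_response = {
    "choices": [
        {
            "message": {
                "content": None,
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": '{"city": "Tokyo"}',
                        },
                    }
                ],
            }
        }
    ]
}
print(extract_tool_calls(sample_response))  # [('get_weather', {'city': 'Tokyo'})]
```

A request would include the definition as `"tools": [WEATHER_TOOL]` alongside `model` and `messages`.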

In [None]:
# List available tools
response = requests.get(f"{LLAMASTACK_URL}/v1/tools", timeout=10)
data = response.json()
tools = data if isinstance(data, list) else data.get("data", [])

# Keep only tools that belong to an MCP tool group (tolerating a missing toolgroup_id)
mcp_tools = [t for t in tools if (t.get("toolgroup_id") or "").startswith("mcp::")]
print(f"🛠️ MCP Tools Available: {len(mcp_tools)}")
for t in mcp_tools:
    print(f"  • {t.get('toolgroup_id')}/{t.get('name')}")

In [None]:
# Ask Azure OpenAI a question that an MCP weather tool could answer.
# Note: this payload sends no explicit `tools` field.
print("☁️ Azure OpenAI with MCP Tools")
print("-" * 70)

payload = {
    "model": AZURE_MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant with access to weather and HR tools."},
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
}

response = requests.post(
    f"{LLAMASTACK_URL}/v1/openai/v1/chat/completions",
    json=payload,
    timeout=60
)

if response.status_code == 200:
    result = response.json()
    # Fall back to an empty message if the response has no choices
    choices = result.get("choices") or [{}]
    content = choices[0].get("message", {}).get("content") or ""
    print(content)
else:
    print(f"❌ Error: {response.status_code} - {response.text}")

---

## 📋 Admin Setup Reminder

Before running this demo, ensure you have:

1. **Created Azure OpenAI secret:**
```bash
oc create secret generic azure-openai-secret \
  --from-literal=endpoint="https://YOUR-RESOURCE.openai.azure.com/" \
  --from-literal=api-key="YOUR-API-KEY" \
  --from-literal=api-version="2024-12-01-preview" \
  -n admin-workshop
```

2. **Applied Phase 2 LlamaStack config:**
```bash
oc apply -f manifests/llamastack/llama-stack-config-phase2.yaml -n admin-workshop
oc delete pod -l app=lsd-genai-playground -n admin-workshop
```

3. **Verified 2 models are available:**
```bash
oc exec deployment/lsd-genai-playground -n admin-workshop -- \
  curl -s http://localhost:8321/v1/models | python3 -c "
import json,sys
data=json.load(sys.stdin)
llms=[m for m in data.get('data',[]) if m.get('model_type')=='llm']
print(f'LLM Models: {len(llms)}')
for m in llms:
    print(f\"  - {m.get('identifier')} ({m.get('provider_id')})\")
"
```