# Observability: Tracing and Logging

----

This notebook focuses on **observability patterns** for AI applications.

You will learn:

- **Distributed Tracing**: Track requests across services
- **Structured Logging**: Log effectively for debugging
- **Metrics Collection**: Monitor key performance indicators
- **Azure Monitor Integration**: Use Azure-native observability

## Table of Contents

- [Why Observability Matters](#why-observability-matters)
- [Setup](#setup)
- [Part 1: OpenTelemetry + Azure Monitor Setup](#part-1-opentelemetry--azure-monitor-setup)
- [Part 2: Real LLM Calls with Tracing](#part-2-real-llm-calls-with-opentelemetry-tracing)
- [Part 3: Azure Managed Grafana Dashboard](#part-3-azure-managed-grafana-dashboard)
- [Best Practices Summary](#best-practices-summary)
- [Cleanup Resources](#cleanup-resources)
- [Wrap-up](#wrap-up)

## Why Observability Matters

### The Three Pillars of Observability

```
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ                    Observability Pillars                        ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ      LOGS         ‚îÇ       TRACES        ‚îÇ       METRICS       ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ What happened?    ‚îÇ Where did it go?    ‚îÇ How is it doing?    ‚îÇ
‚îÇ ‚Ä¢ Error details   ‚îÇ ‚Ä¢ Request flow      ‚îÇ ‚Ä¢ Latency           ‚îÇ
‚îÇ ‚Ä¢ Debug info      ‚îÇ ‚Ä¢ Service hops      ‚îÇ ‚Ä¢ Error rates       ‚îÇ
‚îÇ ‚Ä¢ Audit trail     ‚îÇ ‚Ä¢ Timing breakdown  ‚îÇ ‚Ä¢ Throughput        ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¥‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¥‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

### Essential AI Metrics

```
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ                    Essential AI Metrics                         ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ                                                                 ‚îÇ
‚îÇ  Performance:          Quality:           Cost:                 ‚îÇ
‚îÇ  ‚Ä¢ TTFT               ‚Ä¢ Success rate     ‚Ä¢ Tokens used          ‚îÇ
‚îÇ  ‚Ä¢ Total latency      ‚Ä¢ Error types      ‚Ä¢ $ per request        ‚îÇ
‚îÇ  ‚Ä¢ Tokens/second      ‚Ä¢ Retry count      ‚Ä¢ $ per token          ‚îÇ
‚îÇ                                                                 ‚îÇ
‚îÇ  Capacity:            Reliability:                              ‚îÇ
‚îÇ  ‚Ä¢ RPM usage          ‚Ä¢ 429 rate                                ‚îÇ
‚îÇ  ‚Ä¢ TPM usage          ‚Ä¢ 5xx rate                                ‚îÇ
‚îÇ  ‚Ä¢ Queue depth        ‚Ä¢ Circuit state                           ‚îÇ
‚îÇ                                                                 ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

### Environment Variables Required

* You need to create an Azure Application Insights resource via Azure portal

| Variable | Description |
|----------|-------------|
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | **Required**. Azure Application Insights connection string |
| `APPLICATIONINSIGHTS_RESOURCE_ID` | **Required**. Azure Application Insights resource ID |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
| `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME` | Deployment name for chat model |


### AI-Specific Observability Needs

| Aspect | What to Monitor | Why |
|--------|-----------------|-----|
| **Token Usage** | Input/output tokens | Cost tracking |
| **Latency** | TTFT, total time | User experience |
| **Quality** | Response relevance | Model performance |
| **Errors** | Rate limits, failures | Reliability |
| **Cost** | $ per request | Budget management |

## Setup

This notebook reuses the configuration file (`.foundry_config.json`) created by `0_setup/1_setup.ipynb`.

- If the file is missing, run the setup notebook first.
- Make sure you can authenticate (e.g., `az login`), so `DefaultAzureCredential` can work.

In [60]:
# Environment setup and PATH configuration
import json
import os
import subprocess
import time
import uuid
import logging
from datetime import datetime
from typing import List, Dict, Any, Optional, Callable
from dataclasses import dataclass, field, asdict
from contextlib import contextmanager
from functools import wraps
from dotenv import load_dotenv

load_dotenv(override=True)

# Ensure the notebook kernel can find Azure CLI (`az`) on PATH
possible_paths = [
    '/opt/homebrew/bin',   # macOS (Apple Silicon)
    '/usr/local/bin',      # macOS (Intel) / Linux
    '/usr/bin',            # Linux / Codespaces
    '/home/linuxbrew/.linuxbrew/bin',  # Linux Homebrew
]

az_path = None
try:
    result = subprocess.run(['which', 'az'], capture_output=True, text=True)
    if result.returncode == 0:
        az_path = os.path.dirname(result.stdout.strip())
        print(f'üîç Azure CLI found: {result.stdout.strip()}')
except Exception:
    pass

paths_to_add: list[str] = []
if az_path and az_path not in os.environ.get('PATH', ''):
    paths_to_add.append(az_path)
else:
    for path in possible_paths:
        if os.path.exists(path) and path not in os.environ.get('PATH', ''):
            paths_to_add.append(path)

if paths_to_add:
    os.environ['PATH'] = ':'.join(paths_to_add) + ':' + os.environ.get('PATH', '')
    print(f"‚úÖ Added to PATH: {', '.join(paths_to_add)}")
else:
    print('‚úÖ PATH looks good already')

print(f"\nPATH (first 150 chars): {os.environ['PATH'][:150]}...")

üîç Azure CLI found: /anaconda/envs/azureml_py38/bin//az
‚úÖ PATH looks good already

PATH (first 150 chars): /anaconda/envs/azureml_py38/bin/:/afh/code/agent-operator-lab/.venv/bin:/home/azureuser/.vscode-server/cli/servers/Stable-c9d77990917f3102ada88be140d2...


In [None]:
# Load Foundry project settings from .foundry_config.json
from azure.identity import DefaultAzureCredential

config_file = '../0_setup/.foundry_config.json'
try:
    with open(config_file, 'r', encoding='utf-8') as f:
        config = json.load(f)
except FileNotFoundError as e:
    print(f"‚ö†Ô∏è Could not find '{config_file}'.")
    print('üí° Run 0_setup/1_setup.ipynb first to create it.')
    raise e

# Project variables from config
FOUNDRY_NAME = config.get('FOUNDRY_NAME')
RESOURCE_GROUP = config.get('RESOURCE_GROUP')
LOCATION = config.get('LOCATION')
AZURE_AI_PROJECT_ENDPOINT = config.get('AZURE_AI_PROJECT_ENDPOINT')

# Azure OpenAI variables from env
AZURE_OPENAI_ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME = os.environ.get("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME")
AZURE_OPENAI_API_VERSION = os.environ.get("AZURE_OPENAI_API_VERSION")

os.environ['FOUNDRY_NAME'] = FOUNDRY_NAME or ''
os.environ['LOCATION'] = LOCATION or ''
os.environ['RESOURCE_GROUP'] = RESOURCE_GROUP or ''
os.environ['AZURE_SUBSCRIPTION_ID'] = config.get('AZURE_SUBSCRIPTION_ID', '')

print(f"‚úÖ Loaded settings from '{config_file}'.")
print(f"\nüìå Foundry name: {FOUNDRY_NAME}")
print(f"üìå Resource group: {RESOURCE_GROUP}")
print(f"üìå Location: {LOCATION}")
print(f"üìå Azure OpenAI endpoint: {AZURE_OPENAI_ENDPOINT}")
print(f"üìå Chat deployment: {AZURE_OPENAI_CHAT_DEPLOYMENT_NAME}")

# Initialize credential for Azure services
credential = DefaultAzureCredential()

## Part 1: OpenTelemetry + Azure Application Insights Setup

This section configures **OpenTelemetry** with **Azure Application Insights**:
- Uses `azure-monitor-opentelemetry` distro for automatic instrumentation
- Standard OpenTelemetry APIs for traces, metrics, and logs
- Automatic export to Azure Monitor

> **Prerequisites**: `APPLICATIONINSIGHTS_CONNECTION_STRING` `APPLICATIONINSIGHTS_RESOURCE_ID` environment variables must be set.

> **Why OpenTelemetry?** It's the industry standard for observability, vendor-neutral, and deeply integrated with Azure Monitor.

In [62]:
# Part 1: OpenTelemetry + Azure Monitor Setup
# =============================================
# Using azure-monitor-opentelemetry for Azure Application Insights
import sys
import httpx
from openai import AzureOpenAI, RateLimitError, APIStatusError

# OpenTelemetry imports
from opentelemetry import trace, metrics
from opentelemetry.trace import Status, StatusCode
from azure.monitor.opentelemetry import configure_azure_monitor

# -------------------------------------------------------------------
# Helper: Create mock HTTP response for error injection
# -------------------------------------------------------------------

def create_mock_response(status_code: int, json_body: dict, headers: dict = None) -> httpx.Response:
    """Create a mock httpx.Response with a request attached for OpenAI exceptions."""
    request = httpx.Request("POST", "https://mock-api.openai.com/v1/chat/completions")
    return httpx.Response(status_code, headers=headers or {}, json=json_body, request=request)


# -------------------------------------------------------------------
# Azure Monitor OpenTelemetry Configuration (Required)
# -------------------------------------------------------------------

SERVICE_NAME = "ai-gateway-observability"
CONNECTION_STRING = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")

if not CONNECTION_STRING:
    raise ValueError(
        "‚ùå APPLICATIONINSIGHTS_CONNECTION_STRING environment variable is required.\n"
        "   Please set it in your .env file or environment.\n"
        "   You can find it in Azure Portal ‚Üí Application Insights ‚Üí Overview ‚Üí Connection String"
    )

APPLICATIONINSIGHTS_RESOURCE_ID = os.environ.get("APPLICATIONINSIGHTS_RESOURCE_ID")

if not APPLICATIONINSIGHTS_RESOURCE_ID:
    raise ValueError(
        "‚ùå APPLICATIONINSIGHTS_RESOURCE_ID environment variable is required.\n"
        "   Please set it in your .env file or environment.\n"
        "   You can find it in Azure Portal ‚Üí Application Insights ‚Üí Properties ‚Üí Resource ID"
    )

print("üîß Configuring Azure Monitor OpenTelemetry...")
configure_azure_monitor(
    connection_string=CONNECTION_STRING,
    logger_name="ai_gateway",
    enable_live_metrics=True,
)
print(f"‚úÖ Azure Monitor configured successfully")

# Get tracer and meter instances
tracer = trace.get_tracer(SERVICE_NAME, "1.0.0")
meter = metrics.get_meter(SERVICE_NAME, "1.0.0")

# -------------------------------------------------------------------
# Create OpenTelemetry Metrics Instruments
# -------------------------------------------------------------------

# Counters
request_counter = meter.create_counter(
    name="ai_requests_total",
    description="Total number of AI requests",
    unit="1"
)
token_counter = meter.create_counter(
    name="ai_tokens_total", 
    description="Total tokens used",
    unit="tokens"
)
error_counter = meter.create_counter(
    name="ai_errors_total",
    description="Total AI errors",
    unit="1"
)

# Histograms
latency_histogram = meter.create_histogram(
    name="ai_latency_ms",
    description="AI request latency in milliseconds",
    unit="ms"
)


# -------------------------------------------------------------------
# Helper Functions for Recording Metrics
# -------------------------------------------------------------------

def record_success(model: str, input_tokens: int, output_tokens: int, latency_ms: float, span=None):
    """Record a successful AI request to OpenTelemetry metrics."""
    request_counter.add(1, {"model": model, "status": "success"})
    token_counter.add(input_tokens, {"model": model, "type": "input"})
    token_counter.add(output_tokens, {"model": model, "type": "output"})
    latency_histogram.record(latency_ms, {"model": model})
    
    if span:
        span.set_attribute("ai.input_tokens", input_tokens)
        span.set_attribute("ai.output_tokens", output_tokens)
        span.set_attribute("ai.latency_ms", latency_ms)
        span.set_attribute("ai.model", model)


def record_error(model: str, error_type: str, span=None):
    """Record an AI error to OpenTelemetry metrics."""
    error_counter.add(1, {"model": model, "error_type": error_type})
    request_counter.add(1, {"model": model, "status": "error"})
    
    if span:
        span.set_attribute("error.type", error_type)


print("\n‚úÖ Part 1: OpenTelemetry + Azure Monitor Setup Complete")
print("=" * 60)
print("üì¶ Configured components:")
print(f"   ‚Ä¢ OpenTelemetry Tracer: {tracer}")
print(f"   ‚Ä¢ OpenTelemetry Meter: {meter}")
print(f"   ‚Ä¢ Azure Application Insights: Connected ‚úÖ")
print("\nüìä Metrics instruments created:")
print("   ‚Ä¢ ai_requests_total (Counter)")
print("   ‚Ä¢ ai_tokens_total (Counter)")
print("   ‚Ä¢ ai_errors_total (Counter)")
print("   ‚Ä¢ ai_latency_ms (Histogram)")
print("\nüì§ All telemetry will be exported to Azure Application Insights")

Overriding of current MeterProvider is not allowed
Overriding of current TracerProvider is not allowed
Overriding of current LoggerProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


üîß Configuring Azure Monitor OpenTelemetry...
‚úÖ Azure Monitor configured successfully

‚úÖ Part 1: OpenTelemetry + Azure Monitor Setup Complete
üì¶ Configured components:
   ‚Ä¢ OpenTelemetry Tracer: <opentelemetry.sdk.trace.Tracer object at 0x74b572888750>
   ‚Ä¢ OpenTelemetry Meter: <opentelemetry.sdk.metrics._internal.Meter object at 0x74b599e9a850>
   ‚Ä¢ Azure Application Insights: Connected ‚úÖ

üìä Metrics instruments created:
   ‚Ä¢ ai_requests_total (Counter)
   ‚Ä¢ ai_tokens_total (Counter)
   ‚Ä¢ ai_errors_total (Counter)
   ‚Ä¢ ai_latency_ms (Histogram)

üì§ All telemetry will be exported to Azure Application Insights


## Part 2: Real LLM Calls with OpenTelemetry Tracing

Using OpenTelemetry to trace real LLM calls including:
- **Basic Chat Completion**: Simple Q&A requests
- **Function Calling (Tool Use)**: Calculate monthly computing usage
- **Remote MCP Calls**: Query Azure documentation via MCP server
- **Error injection** to demonstrate error tracing (429, 500, 502, 503)

### Request Flow with Tracing

```
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ                    Multi-Modal LLM Request Flow                     ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ                                                                     ‚îÇ
‚îÇ  user-1~4: Basic Chat Completion (with error injection)            ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê                      ‚îÇ
‚îÇ  ‚îÇ  User    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ   LLM    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ Response ‚îÇ                      ‚îÇ
‚îÇ  ‚îÇ  Query   ‚îÇ    ‚îÇ  (GPT-4) ‚îÇ    ‚îÇ or Error ‚îÇ                      ‚îÇ
‚îÇ  ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò                      ‚îÇ
‚îÇ       ‚îÇ              Error Injection: 429 ‚Üí 500 ‚Üí 502 ‚Üí 503        ‚îÇ
‚îÇ       ‚îÇ                                                             ‚îÇ
‚îÇ  user-5~8: More Chat Completions (success after errors)            ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê                      ‚îÇ
‚îÇ  ‚îÇ  User    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ   LLM    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ Response ‚îÇ                      ‚îÇ
‚îÇ  ‚îÇ  Query   ‚îÇ    ‚îÇ  (GPT-4) ‚îÇ    ‚îÇ          ‚îÇ                      ‚îÇ
‚îÇ  ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò                      ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îÇ  user-9: Function Calling (Tool Use)                               ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê     ‚îÇ
‚îÇ  ‚îÇ  User    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ   LLM    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ Function ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ Response ‚îÇ     ‚îÇ
‚îÇ  ‚îÇ  Query   ‚îÇ    ‚îÇ  (GPT-4) ‚îÇ    ‚îÇ  Call    ‚îÇ    ‚îÇ          ‚îÇ     ‚îÇ
‚îÇ  ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò     ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îÇ  user-10: MCP Server Call (Azure Documentation)                    ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê     ‚îÇ
‚îÇ  ‚îÇ  User    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ  Agent   ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ   MCP    ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ Response ‚îÇ     ‚îÇ
‚îÇ  ‚îÇ  Query   ‚îÇ    ‚îÇ          ‚îÇ    ‚îÇ  Server  ‚îÇ    ‚îÇ          ‚îÇ     ‚îÇ
‚îÇ  ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò     ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

### Error Scenarios Tested

| Error Code | Description | Use Case |
|------------|-------------|----------|
| **429** | Rate Limit Exceeded | Too many requests |
| **500** | Internal Server Error | Backend failure |
| **502** | Bad Gateway | Proxy/gateway issue |
| **503** | Service Unavailable | Service down |

In [63]:
# Part 2: Real LLM Calls with OpenTelemetry Tracing
# ===================================================
# Includes: Basic Chat, Function Calling, MCP Server Calls, and Error Scenarios

import asyncio
from agent_framework import MCPStreamableHTTPTool
from agent_framework.azure import AzureAIClient

# -------------------------------------------------------------------
# Function Definitions for Tool Use (Function Calling)
# -------------------------------------------------------------------

def calculate_monthly_usage(num_users: int, morning_hours: float, afternoon_hours: float, days_per_month: int = 22) -> dict:
    """
    Calculate monthly computing usage based on user activity patterns.
    
    Args:
        num_users: Number of active users
        morning_hours: Average usage hours per user in the morning (9AM-12PM)
        afternoon_hours: Average usage hours per user in the afternoon (1PM-6PM)
        days_per_month: Working days per month (default: 22)
    
    Returns:
        Dictionary with usage statistics
    """
    daily_hours_per_user = morning_hours + afternoon_hours
    monthly_hours_per_user = daily_hours_per_user * days_per_month
    total_monthly_hours = monthly_hours_per_user * num_users
    
    # Estimate compute units (1 hour = 1 vCPU-hour)
    peak_concurrent_users = int(num_users * 0.7)  # 70% peak concurrency
    recommended_vcpus = max(2, peak_concurrent_users)  # Minimum 2 vCPUs
    
    return {
        "num_users": num_users,
        "daily_hours_per_user": daily_hours_per_user,
        "monthly_hours_per_user": monthly_hours_per_user,
        "total_monthly_hours": total_monthly_hours,
        "peak_concurrent_users": peak_concurrent_users,
        "recommended_vcpus": recommended_vcpus,
        "estimated_monthly_cost_usd": round(total_monthly_hours * 0.05, 2)  # ~$0.05/vCPU-hour
    }


# Tool definition for OpenAI function calling
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "calculate_monthly_usage",
            "description": "Calculate monthly computing resource usage based on number of users and their average daily usage patterns (morning and afternoon hours)",
            "parameters": {
                "type": "object",
                "properties": {
                    "num_users": {
                        "type": "integer",
                        "description": "Number of active users"
                    },
                    "morning_hours": {
                        "type": "number",
                        "description": "Average usage hours per user in the morning (9AM-12PM)"
                    },
                    "afternoon_hours": {
                        "type": "number",
                        "description": "Average usage hours per user in the afternoon (1PM-6PM)"
                    },
                    "days_per_month": {
                        "type": "integer",
                        "description": "Working days per month (default: 22)"
                    }
                },
                "required": ["num_users", "morning_hours", "afternoon_hours"]
            }
        }
    }
]


# -------------------------------------------------------------------
# Multi-Error Injector (429, 500, 502, 503)
# -------------------------------------------------------------------

class MultiErrorInjector:
    """
    Inject various error types for comprehensive observability testing.
    Sequence: 429 ‚Üí 500 ‚Üí 502 ‚Üí 503 ‚Üí Success ‚Üí Success ‚Üí ...
    """
    
    def __init__(self, client: AzureOpenAI):
        self.client = client
        self.call_count = 0
        self.error_sequence = [
            (429, "Rate limit exceeded", {"retry-after": "2"}),
            (500, "Internal server error", {}),
            (502, "Bad gateway", {}),
            (503, "Service temporarily unavailable", {}),
        ]
    
    def create_completion(self, **kwargs):
        self.call_count += 1
        
        if self.call_count <= len(self.error_sequence):
            code, msg, headers = self.error_sequence[self.call_count - 1]
            resp = create_mock_response(code, {"error": {"message": msg}}, headers)
            
            if code == 429:
                raise RateLimitError(msg, response=resp, body=resp.json())
            else:
                raise APIStatusError(msg, response=resp, body=resp.json())
        
        return self.client.chat.completions.create(**kwargs)


# -------------------------------------------------------------------
# Function Call Handler with Tracing
# -------------------------------------------------------------------

def handle_function_call_with_tracing(client: AzureOpenAI, messages: list, user: str) -> tuple:
    """
    Handle function calling flow with OpenTelemetry tracing.
    Returns (final_response, tool_calls_made)
    """
    tool_calls_made = []
    
    with tracer.start_as_current_span("function_call_flow") as flow_span:
        flow_span.set_attribute("user", user)
        
        # Step 1: Initial LLM call with tools
        with tracer.start_as_current_span("llm_tool_selection") as tool_span:
            start = time.time()
            response = client.chat.completions.create(
                model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                messages=messages,
                tools=TOOLS,
                tool_choice="auto",
                max_tokens=500,
            )
            tool_span.set_attribute("latency_ms", (time.time() - start) * 1000)
            tool_span.set_attribute("has_tool_calls", bool(response.choices[0].message.tool_calls))
        
        # Step 2: Process tool calls if any
        if response.choices[0].message.tool_calls:
            messages.append(response.choices[0].message)
            
            for tool_call in response.choices[0].message.tool_calls:
                with tracer.start_as_current_span("tool_execution") as exec_span:
                    exec_span.set_attribute("tool_name", tool_call.function.name)
                    exec_span.set_attribute("tool_id", tool_call.id)
                    
                    # Parse arguments and execute function
                    import json as json_module
                    args = json_module.loads(tool_call.function.arguments)
                    exec_span.set_attribute("tool_args", str(args))
                    
                    if tool_call.function.name == "calculate_monthly_usage":
                        result = calculate_monthly_usage(**args)
                        tool_calls_made.append({
                            "name": tool_call.function.name,
                            "args": args,
                            "result": result
                        })
                        exec_span.set_attribute("result.recommended_vcpus", result["recommended_vcpus"])
                        exec_span.set_attribute("result.total_monthly_hours", result["total_monthly_hours"])
                    else:
                        result = {"error": f"Unknown function: {tool_call.function.name}"}
                    
                    # Add tool result to messages
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json_module.dumps(result)
                    })
            
            # Step 3: Final LLM call with tool results
            with tracer.start_as_current_span("llm_final_response") as final_span:
                start = time.time()
                final_response = client.chat.completions.create(
                    model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                    messages=messages,
                    max_tokens=500,
                )
                final_span.set_attribute("latency_ms", (time.time() - start) * 1000)
                
            return final_response, tool_calls_made
        
        return response, tool_calls_made


# -------------------------------------------------------------------
# MCP Server Call Handler with Tracing
# -------------------------------------------------------------------

async def call_mcp_server_with_tracing(query: str, user: str) -> str:
    """
    Call Microsoft Learn MCP server with OpenTelemetry tracing.
    """
    with tracer.start_as_current_span("mcp_server_call") as mcp_span:
        mcp_span.set_attribute("user", user)
        mcp_span.set_attribute("query", query[:100])
        mcp_span.set_attribute("mcp.server_url", "https://learn.microsoft.com/api/mcp")
        
        try:
            async with (
                MCPStreamableHTTPTool(
                    name="Microsoft Learn MCP",
                    url="https://learn.microsoft.com/api/mcp",
                ) as mcp_docs,
                AzureAIClient(
                    credential=credential, 
                    project_endpoint=AZURE_AI_PROJECT_ENDPOINT
                ).create_agent(
                    name="AzureDocsAgent",
                    instructions="You help with Azure computing resource recommendations based on usage calculations. Be concise.",
                    tools=mcp_docs,
                ) as agent,
            ):
                with tracer.start_as_current_span("mcp_agent_run") as agent_span:
                    start = time.time()
                    result = await agent.run(query)
                    latency_ms = (time.time() - start) * 1000
                    
                    agent_span.set_attribute("latency_ms", latency_ms)
                    agent_span.set_attribute("response_length", len(str(result)))
                    
                    mcp_span.set_status(Status(StatusCode.OK))
                    return str(result)
                    
        except Exception as e:
            mcp_span.set_status(Status(StatusCode.ERROR, str(e)[:100]))
            mcp_span.record_exception(e)
            return f"MCP Error: {str(e)[:200]}"


# -------------------------------------------------------------------
# Main Execution
# -------------------------------------------------------------------

print("üìä Part 2: Real LLM Calls with OpenTelemetry Tracing")
print("=" * 60)
print("   Includes: Basic Chat, Function Calling, MCP Server Calls")
print("   Error Scenarios: 429, 500, 502, 503")

if AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY:
    # Initialize OpenAI client
    client = AzureOpenAI(
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
    )
    
    # Multi-error injector: 429 ‚Üí 500 ‚Üí 502 ‚Üí 503 ‚Üí Success...
    injector = MultiErrorInjector(client)
    
    # Local stats for summary (not exported - just for notebook display)
    local_stats = {"requests": 0, "errors": 0, "tokens_in": 0, "tokens_out": 0, "latencies": []}
    
    # Extended request data with different request types and error scenarios
    requests_data = [
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        # Error Injection Phase (4 errors: 429, 500, 502, 503)
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        {"prompt": "What is Python?", "user": "user-1", "type": "chat", "expected": "429 Rate Limit"},
        {"prompt": "What is Azure?", "user": "user-2", "type": "chat", "expected": "500 Internal"},
        {"prompt": "What is OpenAI?", "user": "user-3", "type": "chat", "expected": "502 Bad Gateway"},
        {"prompt": "Explain cloud computing", "user": "user-4", "type": "chat", "expected": "503 Unavailable"},
        
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        # Success Phase (after errors exhausted)
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        {"prompt": "What is machine learning?", "user": "user-5", "type": "chat", "expected": "Success"},
        {"prompt": "What is deep learning?", "user": "user-6", "type": "chat", "expected": "Success"},
        {"prompt": "What is neural network?", "user": "user-7", "type": "chat", "expected": "Success"},
        {"prompt": "What is AI?", "user": "user-8", "type": "chat", "expected": "Success"},
        
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        # Function Calling - calculate computing usage
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        {
            "prompt": "Our team has 50 users. On average, they use computing resources for 2 hours in the morning and 3 hours in the afternoon. Calculate the total monthly usage and recommend the number of vCPUs needed for stable operation.",
            "user": "user-9",
            "type": "function_call",
            "expected": "Function Call"
        },
        
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        # MCP Server Call - Azure documentation query
        # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
        {
            "prompt": "Based on a team of 50 users needing 8 vCPUs for stable operation with approximately 5,500 compute hours per month, what Azure VM size would you recommend? Consider cost optimization.",
            "user": "user-10",
            "type": "mcp_call",
            "expected": "MCP Call"
        },
    ]
    
    print("\nüìù Making LLM requests with OpenTelemetry tracing:")
    print("-" * 60)
    print("   Phase 1: Error injection (429 ‚Üí 500 ‚Üí 502 ‚Üí 503)")
    print("   Phase 2: Successful completions")
    print("   Phase 3: Function calling & MCP server call")
    print("-" * 60)
    
    error_breakdown = {}
    
    for i, req in enumerate(requests_data, 1):
        request_id = f"req-{uuid.uuid4().hex[:8]}"
        request_type = req.get("type", "chat")
        expected = req.get("expected", "")
        
        # Use OpenTelemetry tracing directly
        with tracer.start_as_current_span(
            f"{request_type}_completion",
            attributes={
                "request_id": request_id,
                "user": req["user"],
                "request_type": request_type,
                "prompt_length": len(req["prompt"]),
            }
        ) as span:
            local_stats["requests"] += 1
            
            try:
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                # Type 1: Basic Chat Completion (with error injection)
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                if request_type == "chat":
                    with tracer.start_as_current_span("validate_input") as validate_span:
                        validate_span.set_attribute("input_length", len(req["prompt"]))
                    
                    with tracer.start_as_current_span("llm_call", attributes={"model": AZURE_OPENAI_CHAT_DEPLOYMENT_NAME}) as llm_span:
                        start = time.time()
                        response = injector.create_completion(
                            model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                            messages=[{"role": "user", "content": req["prompt"]}],
                            max_tokens=50,
                        )
                        latency_ms = (time.time() - start) * 1000
                        
                        # Record metrics directly via OpenTelemetry
                        record_success(
                            model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                            input_tokens=response.usage.prompt_tokens,
                            output_tokens=response.usage.completion_tokens,
                            latency_ms=latency_ms,
                            span=llm_span
                        )
                        
                        # Update local stats for summary
                        local_stats["tokens_in"] += response.usage.prompt_tokens
                        local_stats["tokens_out"] += response.usage.completion_tokens
                        local_stats["latencies"].append(latency_ms)
                    
                    answer = response.choices[0].message.content[:40]
                    print(f"   ‚úÖ {i:2d}. {req['user']:8} [chat]: '{answer}...'")
                    print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x}")
                
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                # Type 2: Function Calling (Tool Use)
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                elif request_type == "function_call":
                    messages = [{"role": "user", "content": req["prompt"]}]
                    
                    start = time.time()
                    response, tool_calls = handle_function_call_with_tracing(client, messages, req["user"])
                    latency_ms = (time.time() - start) * 1000
                    
                    # Record metrics directly via OpenTelemetry
                    record_success(
                        model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                        input_tokens=response.usage.prompt_tokens,
                        output_tokens=response.usage.completion_tokens,
                        latency_ms=latency_ms,
                        span=span
                    )
                    
                    # Update local stats
                    local_stats["tokens_in"] += response.usage.prompt_tokens
                    local_stats["tokens_out"] += response.usage.completion_tokens
                    local_stats["latencies"].append(latency_ms)
                    
                    span.set_attribute("tool_calls_count", len(tool_calls))
                    
                    answer = response.choices[0].message.content[:60] if response.choices[0].message.content else "No content"
                    print(f"   üîß {i:2d}. {req['user']:8} [function_call]: Tool calls={len(tool_calls)}")
                    
                    if tool_calls:
                        for tc in tool_calls:
                            print(f"       ‚îî‚îÄ {tc['name']}: users={tc['args'].get('num_users')}, "
                                  f"vcpus={tc['result'].get('recommended_vcpus')}, "
                                  f"monthly_hrs={tc['result'].get('total_monthly_hours')}")
                    
                    print(f"       ‚îî‚îÄ Response: '{answer}...'")
                    print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x}")
                
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                # Type 3: MCP Server Call
                # ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
                elif request_type == "mcp_call":
                    span.set_attribute("mcp.server", "Microsoft Learn")
                    
                    start = time.time()
                    # Run async MCP call
                    mcp_result = await call_mcp_server_with_tracing(req["prompt"], req["user"])
                    latency_ms = (time.time() - start) * 1000
                    
                    # Record latency (MCP doesn't return token counts)
                    latency_histogram.record(latency_ms, {"model": "mcp_agent"})
                    local_stats["latencies"].append(latency_ms)
                    
                    span.set_attribute("latency_ms", latency_ms)
                    span.set_attribute("mcp_response_length", len(mcp_result))
                    
                    answer = mcp_result[:80] if mcp_result else "No response"
                    print(f"   üåê {i:2d}. {req['user']:8} [mcp_call]: MCP Server Response")
                    print(f"       ‚îî‚îÄ Response: '{answer}...'")
                    print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x}")
                
            except RateLimitError as e:
                error_type = "RateLimitError_429"
                error_breakdown[error_type] = error_breakdown.get(error_type, 0) + 1
                local_stats["errors"] += 1
                record_error(AZURE_OPENAI_CHAT_DEPLOYMENT_NAME, error_type, span)
                span.set_status(Status(StatusCode.ERROR, "Rate limit exceeded"))
                print(f"   ‚ö†Ô∏è {i:2d}. {req['user']:8} [{request_type}]: Rate limited (429)")
                print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x} [ERROR]")
                
            except APIStatusError as e:
                error_type = f"APIStatusError_{e.status_code}"
                error_breakdown[error_type] = error_breakdown.get(error_type, 0) + 1
                local_stats["errors"] += 1
                record_error(AZURE_OPENAI_CHAT_DEPLOYMENT_NAME, error_type, span)
                span.set_status(Status(StatusCode.ERROR, str(e)[:50]))
                print(f"   ‚ùå {i:2d}. {req['user']:8} [{request_type}]: Server error ({e.status_code})")
                print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x} [ERROR]")
                
            except Exception as e:
                error_type = f"Exception_{type(e).__name__}"
                error_breakdown[error_type] = error_breakdown.get(error_type, 0) + 1
                local_stats["errors"] += 1
                record_error(AZURE_OPENAI_CHAT_DEPLOYMENT_NAME, error_type, span)
                span.set_status(Status(StatusCode.ERROR, str(e)[:50]))
                span.record_exception(e)
                print(f"   ‚ùå {i:2d}. {req['user']:8} [{request_type}]: {type(e).__name__}: {str(e)[:50]}")
                print(f"       ‚îî‚îÄ Trace: {span.get_span_context().trace_id:032x} [ERROR]")
    
    # Summary
    total_requests = local_stats['requests']
    total_errors = local_stats['errors']
    success_rate = ((total_requests - total_errors) / total_requests * 100) if total_requests > 0 else 0
    avg_latency = sum(local_stats["latencies"]) / len(local_stats["latencies"]) if local_stats["latencies"] else 0
    
    print(f"\nüìã Telemetry Summary:")
    print("-" * 60)
    print(f"   Total requests: {total_requests}")
    print(f"   ‚úÖ Successful:   {total_requests - total_errors}")
    print(f"   ‚ùå Failed:       {total_errors}")
    print(f"   Success rate:   {success_rate:.1f}%")
    print(f"   Tokens:         {local_stats['tokens_in']} in / {local_stats['tokens_out']} out")
    print(f"   Avg latency:    {avg_latency:.0f}ms")
    
    if error_breakdown:
        print(f"\n   Error breakdown:")
        for err_type, count in sorted(error_breakdown.items()):
            print(f"      ‚Ä¢ {err_type}: {count}")
    
    print(f"\nüìä Request Type Breakdown:")
    print(f"   ‚Ä¢ Chat completions: 8 (4 errors: 429/500/502/503, 4 success)")
    print(f"   ‚Ä¢ Function calls:   1 (with tool execution)")
    print(f"   ‚Ä¢ MCP server calls: 1 (remote documentation query)")
    
    print(f"\nüì§ Telemetry automatically exported to Azure Application Insights!")
    print(f"   View traces in: Azure Portal ‚Üí Application Insights ‚Üí Transaction search")
    print(f"   View errors in: Azure Portal ‚Üí Application Insights ‚Üí Failures")
else:
    print("‚ö†Ô∏è AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_API_KEY not set")

üìä Part 2: Real LLM Calls with OpenTelemetry Tracing
   Includes: Basic Chat, Function Calling, MCP Server Calls
   Error Scenarios: 429, 500, 502, 503

üìù Making LLM requests with OpenTelemetry tracing:
------------------------------------------------------------
   Phase 1: Error injection (429 ‚Üí 500 ‚Üí 502 ‚Üí 503)
   Phase 2: Successful completions
   Phase 3: Function calling & MCP server call
------------------------------------------------------------
   ‚ö†Ô∏è  1. user-1   [chat]: Rate limited (429)
       ‚îî‚îÄ Trace: 3d44b2f074dbb4c774197b2bb6adf108 [ERROR]
   ‚ùå  2. user-2   [chat]: Server error (500)
       ‚îî‚îÄ Trace: bf095fac94e201502f571d763858fdeb [ERROR]
   ‚ùå  3. user-3   [chat]: Server error (502)
       ‚îî‚îÄ Trace: 0eee1a4bca195cde7e5315d4c154521c [ERROR]
   ‚ùå  4. user-4   [chat]: Server error (503)
       ‚îî‚îÄ Trace: af10e5b5000c438701e737a6c49bf818 [ERROR]


   ‚úÖ  5. user-5   [chat]: '**Machine learning** is a branch of arti...'
       ‚îî‚îÄ Trace: 5c1db7a3909c5abe09efdd65b652141b
   ‚úÖ  6. user-6   [chat]: '**Deep learning** is a subfield of *mach...'
       ‚îî‚îÄ Trace: 0d4d770dd61a29ac9b90848983626d64
   ‚úÖ  7. user-7   [chat]: 'A **neural network** is a type of comput...'
       ‚îî‚îÄ Trace: 2980fee4052e7756c7ba720cbb6bb357
   ‚úÖ  8. user-8   [chat]: '**AI** stands for **Artificial Intellige...'
       ‚îî‚îÄ Trace: e605f9afccdfa403279419467911571a
   üîß  9. user-9   [function_call]: Tool calls=1
       ‚îî‚îÄ calculate_monthly_usage: users=50, vcpus=35, monthly_hrs=5500
       ‚îî‚îÄ Response: 'Let's break down the calculation and recommendation:

1. Tot...'
       ‚îî‚îÄ Trace: ca229987626bda22d5f1b4f9b3804c72
   üåê 10. user-10  [mcp_call]: MCP Server Response
       ‚îî‚îÄ Response: 'Given your requirement of **~8 vCPUs for stable operation** and **~5,500 compute...'
       ‚îî‚îÄ Trace: dfa83bbf35f828f8f3358e650ad59a6e


### Simulation Results on Azure Application Insights (End-to-end transaction details)

![../images/simulation_result_application_insight.png](../images/simulation_result_application_insight.png)

## Part 3: Grafana Visualization

OpenTelemetry metrics exported to Azure Monitor can be visualized in Grafana via the Azure Monitor data source.

### 3.1: Azure Managed Grafana Setup

In [64]:
# Part 3.1: Azure Managed Grafana Setup
# ======================================
# OpenTelemetry metrics exported to Azure Monitor can be visualized in Grafana

CREATE_GRAFANA = False  # Set to True to create a new Grafana workspace

print("üìä Part 3.1: Azure Managed Grafana Setup")
print("=" * 60)

print("""
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ              OpenTelemetry ‚Üí Azure Monitor ‚Üí Grafana                ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ                                                                     ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îÇ
‚îÇ  ‚îÇ Application ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ  Azure Monitor   ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ    Grafana      ‚îÇ    ‚îÇ
‚îÇ  ‚îÇ OpenTelemetry‚îÇ    ‚îÇ (App Insights)   ‚îÇ    ‚îÇ  (Dashboards)   ‚îÇ    ‚îÇ
‚îÇ  ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò    ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îÇ  OpenTelemetry SDK exports:                                         ‚îÇ
‚îÇ  ‚Ä¢ Traces ‚Üí customEvents, dependencies, requests                    ‚îÇ
‚îÇ  ‚Ä¢ Metrics ‚Üí customMetrics                                          ‚îÇ
‚îÇ  ‚Ä¢ Logs ‚Üí traces table                                              ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îÇ  Grafana queries via Azure Monitor Data Source:                     ‚îÇ
‚îÇ  ‚Ä¢ KQL queries against Log Analytics                                ‚îÇ
‚îÇ  ‚Ä¢ customMetrics | where name == "ai_requests_total"                ‚îÇ
‚îÇ                                                                     ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
""")

# Grafana configuration - check if already exists in environment
EXISTING_GRAFANA_NAME = os.environ.get("GRAFANA_NAME")
GRAFANA_NAME = EXISTING_GRAFANA_NAME or f"grafana-ot-{LOCATION[:4]}"
GRAFANA_LOCATION = os.environ.get("AZURE_LOCATION", "eastus2")


def run_az(args: list) -> str:
    """Run Azure CLI command and return output."""
    result = subprocess.run(["az"] + args, capture_output=True, text=True)
    if result.returncode != 0:
        raise Exception(f"Azure CLI error: {result.stderr}")
    return result.stdout


def get_grafana_workspace(name: str) -> dict:
    """Get existing Azure Managed Grafana workspace info."""
    try:
        output = run_az([
            "grafana", "show",
            "-g", RESOURCE_GROUP,
            "-n", name,
            "-o", "json"
        ])
        return json.loads(output)
    except Exception:
        return {}


def setup_grafana_workspace() -> dict:
    """Create Azure Managed Grafana workspace."""
    print(f"üîß Creating Azure Managed Grafana: {GRAFANA_NAME}")
    print(f"   Resource Group: {RESOURCE_GROUP}")
    print(f"   Location: {GRAFANA_LOCATION}")
    
    try:
        output = run_az([
            "grafana", "create",
            "-g", RESOURCE_GROUP,
            "-n", GRAFANA_NAME,
            "-l", GRAFANA_LOCATION,
            "-o", "json"
        ])
        grafana_info = json.loads(output)
        
        endpoint = grafana_info.get("properties", {}).get("endpoint", "N/A")
        
        print(f"   ‚úÖ Grafana workspace created!")
        print(f"   üîó Endpoint: {endpoint}")
        
        os.environ["GRAFANA_NAME"] = GRAFANA_NAME
        os.environ["GRAFANA_ENDPOINT"] = endpoint
        
        return grafana_info
    except Exception as e:
        print(f"   ‚ö†Ô∏è Could not create Grafana: {e}")
        return {}


if CREATE_GRAFANA:
    # Check if Grafana already exists in environment
    if EXISTING_GRAFANA_NAME:
        print(f"‚úÖ Found existing GRAFANA_NAME in environment: {EXISTING_GRAFANA_NAME}")
        print(f"   Skipping creation, fetching existing workspace info...")
        grafana_workspace = get_grafana_workspace(EXISTING_GRAFANA_NAME)
        if grafana_workspace:
            endpoint = grafana_workspace.get("properties", {}).get("endpoint", "N/A")
            print(f"   üîó Endpoint: {endpoint}")
            os.environ["GRAFANA_ENDPOINT"] = endpoint
        else:
            print(f"   ‚ö†Ô∏è Could not fetch workspace info. It may not exist or you lack permissions.")
    else:
        grafana_workspace = setup_grafana_workspace()
else:
    print("üí° Set CREATE_GRAFANA=True to create a new Grafana workspace.")
    print("   Or use an existing Grafana instance.")
    print("\nüìù To connect existing Grafana to Application Insights:")
    print("   1. Add Azure Monitor data source in Grafana")
    print("   2. Configure with your subscription and App Insights resource")
    print("   3. Query OpenTelemetry metrics from customMetrics table")

üìä Part 3.1: Azure Managed Grafana Setup

‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ              OpenTelemetry ‚Üí Azure Monitor ‚Üí Grafana                ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ                                                                     ‚îÇ
‚îÇ  ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê    ‚îÇ
‚îÇ  ‚îÇ Application ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ  Azure Monitor   ‚îÇ‚îÄ‚îÄ‚îÄ‚ñ∂‚îÇ    Grafana      ‚îÇ    ‚îÇ
‚îÇ  ‚îÇ OpenTelemetry‚îÇ    ‚îÇ (App Insights)   ‚îÇ    ‚îÇ  (Dashboards)   ‚îÇ    ‚î

### 3.2: Traffic Simulation with random traffic error injector for Dashboard Metrics

Generate realistic AI workload traffic with mixed success/error patterns for Grafana visualization.

In [65]:
# Part 3.2: Traffic Simulation for Dashboard Metrics
# ===================================================
# Generate realistic AI workload traffic with OpenTelemetry tracing

import random

class TrafficErrorInjector:
    """Inject random errors for realistic traffic simulation."""
    
    def __init__(self, client: AzureOpenAI, error_rate: float = 0.2):
        self.client = client
        self.error_rate = error_rate
        self.error_types = [
            (429, "Rate limit exceeded", {"retry-after": "1"}),
            (500, "Internal server error", {}),
            (502, "Bad gateway", {}),
            (503, "Service unavailable", {}),
        ]
    
    def create_completion(self, **kwargs):
        if random.random() < self.error_rate:
            code, msg, headers = random.choice(self.error_types)
            resp = create_mock_response(code, {"error": {"message": msg}}, headers)
            if code == 429:
                raise RateLimitError(msg, response=resp, body=resp.json())
            raise APIStatusError(msg, response=resp, body=resp.json())
        
        return self.client.chat.completions.create(**kwargs)


# Simulation configuration
SIMULATION_REQUESTS = 15
ERROR_RATE = 0.25  # 25% error rate

print("üìä Part 3.2: Traffic Simulation for Grafana")
print("=" * 60)

if AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY:
    # Create traffic-specific tracer and meter
    traffic_tracer = trace.get_tracer("ai_traffic_simulator")
    traffic_meter = metrics.get_meter("ai_traffic_simulator")
    
    # Create metrics instruments for traffic simulation
    traffic_requests = traffic_meter.create_counter("traffic_requests_total", description="Traffic simulation requests")
    traffic_tokens = traffic_meter.create_counter("traffic_tokens_total", description="Traffic simulation tokens")
    traffic_errors = traffic_meter.create_counter("traffic_errors_total", description="Traffic simulation errors")
    traffic_latency = traffic_meter.create_histogram("traffic_latency_ms", description="Traffic simulation latency")
    
    traffic_injector = TrafficErrorInjector(client, error_rate=ERROR_RATE)
    
    prompts = [
        "What is Python?", "Explain REST APIs", "What is Docker?",
        "Define microservices", "What is Kubernetes?", "Explain CI/CD",
        "What is serverless?", "Define cloud native", "What is DevOps?",
        "Explain infrastructure as code", "What is observability?",
        "Define SRE", "What is chaos engineering?", "Explain blue-green deployment",
        "What is canary release?",
    ]
    
    print(f"\nüìù Simulating {SIMULATION_REQUESTS} requests ({ERROR_RATE*100:.0f}% error rate):")
    print("-" * 60)
    
    traffic_stats = {"success": 0, "failed": 0, "errors": {}, "latencies": []}
    
    for i in range(SIMULATION_REQUESTS):
        prompt = prompts[i % len(prompts)]
        request_id = f"sim-{uuid.uuid4().hex[:8]}"
        
        with traffic_tracer.start_as_current_span("traffic_request") as span:
            span.set_attribute("request_id", request_id)
            span.set_attribute("request_index", i + 1)
            
            try:
                with traffic_tracer.start_as_current_span("llm_call") as llm_span:
                    start = time.time()
                    response = traffic_injector.create_completion(
                        model=AZURE_OPENAI_CHAT_DEPLOYMENT_NAME,
                        messages=[{"role": "user", "content": prompt}],
                        max_tokens=20,
                    )
                    latency_ms = (time.time() - start) * 1000
                    
                    llm_span.set_attribute("latency_ms", latency_ms)
                    llm_span.set_attribute("model", AZURE_OPENAI_CHAT_DEPLOYMENT_NAME)
                    llm_span.set_attribute("tokens.input", response.usage.prompt_tokens)
                    llm_span.set_attribute("tokens.output", response.usage.completion_tokens)
                    
                    # Record metrics
                    traffic_requests.add(1, {"model": AZURE_OPENAI_CHAT_DEPLOYMENT_NAME, "status": "success"})
                    traffic_tokens.add(response.usage.total_tokens, {"model": AZURE_OPENAI_CHAT_DEPLOYMENT_NAME})
                    traffic_latency.record(latency_ms, {"model": AZURE_OPENAI_CHAT_DEPLOYMENT_NAME})
                
                traffic_stats["success"] += 1
                traffic_stats["latencies"].append(latency_ms)
                status = "‚úÖ"
                msg = f"{latency_ms:.0f}ms"
                span.set_status(Status(StatusCode.OK))
                
            except (RateLimitError, APIStatusError) as e:
                error_type = "429" if isinstance(e, RateLimitError) else str(e.status_code)
                traffic_stats["failed"] += 1
                traffic_stats["errors"][error_type] = traffic_stats["errors"].get(error_type, 0) + 1
                
                # Record error metrics
                traffic_errors.add(1, {"model": AZURE_OPENAI_CHAT_DEPLOYMENT_NAME, "error_type": error_type})
                
                span.set_status(Status(StatusCode.ERROR, f"HTTP {error_type}"))
                span.set_attribute("error.type", error_type)
                
                status = "‚ùå" if error_type != "429" else "‚ö†Ô∏è"
                msg = f"Error {error_type}"
        
        # Progress bar style output
        progress = (i + 1) / SIMULATION_REQUESTS
        bar = "‚ñà" * int(progress * 20) + "‚ñë" * (20 - int(progress * 20))
        print(f"   [{bar}] {i+1}/{SIMULATION_REQUESTS} {status} {msg}")
    
    # Summary
    print(f"\nüìã Traffic Simulation Summary:")
    print("-" * 60)
    success_rate = traffic_stats["success"] / SIMULATION_REQUESTS * 100
    avg_latency = sum(traffic_stats["latencies"]) / len(traffic_stats["latencies"]) if traffic_stats["latencies"] else 0
    
    print(f"   Total requests:  {SIMULATION_REQUESTS}")
    print(f"   ‚úÖ Successful:    {traffic_stats['success']} ({success_rate:.1f}%)")
    print(f"   ‚ùå Failed:        {traffic_stats['failed']}")
    print(f"   ‚è±Ô∏è  Avg latency:   {avg_latency:.0f}ms")
    
    if traffic_stats["errors"]:
        print(f"\n   Error breakdown:")
        for err, count in sorted(traffic_stats["errors"].items()):
            print(f"      ‚Ä¢ {err}: {count}")
    
    print("\n‚úÖ Traffic simulation complete!")
    print("   ‚Üí Telemetry automatically exported to Azure Monitor via OpenTelemetry")
else:
    print("‚ö†Ô∏è AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_API_KEY not set")

üìä Part 3.2: Traffic Simulation for Grafana

üìù Simulating 15 requests (25% error rate):
------------------------------------------------------------


   [‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 1/15 ‚úÖ 925ms
   [‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 2/15 ‚úÖ 1030ms
   [‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 3/15 ‚úÖ 964ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 4/15 ‚úÖ 1493ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 5/15 ‚úÖ 1093ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 6/15 ‚ùå Error 503
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 7/15 ‚úÖ 877ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 8/15 ‚ùå Error 502
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 9/15 ‚úÖ 880ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 10/15 ‚úÖ 1216ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë] 11/15 ‚úÖ 878ms
   [‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñë‚ñë‚ñë‚ñë] 12/15 ‚úÖ 909ms
   [‚ñà‚ñà‚ñà

### 3.3: Grafana Dashboard JSON

Export dashboard configuration for import into Azure Managed Grafana.

In [69]:
# Part 3.3: Grafana Dashboard JSON Export
# ========================================
# Dashboard configured with actual Azure Monitor data source and OpenTelemetry metrics

print("üìä Part 3.3: Grafana Dashboard Export")
print("=" * 60)

# Get subscription ID
SUBSCRIPTION_ID = os.environ.get("AZURE_SUBSCRIPTION_ID", config.get("AZURE_SUBSCRIPTION_ID", ""))

# Try to get Application Insights name from various sources
APPLICATIONINSIGHTS_RESOURCE_ID = os.environ.get("APPLICATIONINSIGHTS_RESOURCE_ID", "")

print(f"üìå Configuration:")
print(f"   Subscription ID: {SUBSCRIPTION_ID[:2]}...{SUBSCRIPTION_ID[-4:] if len(SUBSCRIPTION_ID) > 12 else SUBSCRIPTION_ID}")
print(f"   App Insights Resource ID: {APPLICATIONINSIGHTS_RESOURCE_ID[:4]}")

# KQL Queries - use actual newlines for Grafana compatibility
QUERIES = {
    "total_requests": "customMetrics\n| where name == 'ai_requests_total'\n| summarize total = sum(value) by bin(timestamp, 1m)\n| order by timestamp desc\n| take 1",
    "success_rate": "customMetrics\n| where name == 'ai_requests_total'\n| extend status = tostring(customDimensions['status'])\n| summarize success = countif(status == 'success'), total = count()\n| project success_rate = round(100.0 * success / total, 2)",
    "avg_latency": "customMetrics\n| where name == 'ai_latency_ms'\n| summarize avg_latency = avg(value)",
    "total_tokens": "customMetrics\n| where name == 'ai_tokens_total'\n| summarize total_tokens = sum(value)",
    "latency_over_time": "customMetrics\n| where name == 'ai_latency_ms'\n| summarize avg_latency = avg(value), p95_latency = percentile(value, 95) by bin(timestamp, 1m)\n| order by timestamp asc",
    "status_distribution": "customMetrics\n| where name == 'ai_requests_total'\n| extend status = tostring(customDimensions['status'])\n| summarize count = count() by status",
    "error_distribution": "customMetrics\n| where name == 'ai_errors_total'\n| extend error_type = tostring(customDimensions['error_type'])\n| summarize count = count() by error_type\n| order by count desc",
    "requests_over_time": "customMetrics\n| where name == 'ai_requests_total'\n| extend status = tostring(customDimensions['status'])\n| summarize count = count() by status, bin(timestamp, 1m)\n| order by timestamp asc"
}

# Build the dashboard - use ${datasource} variable reference for flexibility
GRAFANA_DASHBOARD = {
    "dashboard": {
        "id": None,
        "uid": "ai-gateway-otel",
        "title": "AI Gateway Observability Dashboard (OpenTelemetry)",
        "tags": ["ai", "gateway", "openai", "azure", "opentelemetry"],
        "timezone": "browser",
        "schemaVersion": 39,
        "refresh": "30s",
        "time": {"from": "now-1h", "to": "now"},
        "panels": [
            {
                "id": 1,
                "title": "Total Requests",
                "type": "stat",
                "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "color": {"mode": "thresholds"},
                        "thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": None}]}
                    }
                },
                "options": {
                    "colorMode": "value",
                    "graphMode": "area",
                    "justifyMode": "auto",
                    "orientation": "auto",
                    "textMode": "auto",
                    "reduceOptions": {"calcs": ["sum"], "fields": "", "values": False}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["total_requests"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 2,
                "title": "Success Rate",
                "type": "gauge",
                "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "max": 100,
                        "min": 0,
                        "unit": "percent",
                        "color": {"mode": "thresholds"},
                        "thresholds": {
                            "mode": "absolute",
                            "steps": [
                                {"color": "red", "value": None},
                                {"color": "yellow", "value": 80},
                                {"color": "green", "value": 95}
                            ]
                        }
                    }
                },
                "options": {
                    "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": False},
                    "showThresholdLabels": False,
                    "showThresholdMarkers": True
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["success_rate"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 3,
                "title": "Avg Latency",
                "type": "stat",
                "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "unit": "ms",
                        "color": {"mode": "thresholds"},
                        "thresholds": {
                            "mode": "absolute",
                            "steps": [
                                {"color": "green", "value": None},
                                {"color": "yellow", "value": 500},
                                {"color": "red", "value": 1000}
                            ]
                        }
                    }
                },
                "options": {
                    "colorMode": "value",
                    "graphMode": "area",
                    "justifyMode": "auto",
                    "orientation": "auto",
                    "textMode": "auto",
                    "reduceOptions": {"calcs": ["mean"], "fields": "", "values": False}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["avg_latency"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 4,
                "title": "Total Tokens",
                "type": "stat",
                "gridPos": {"h": 4, "w": 6, "x": 18, "y": 0},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "color": {"mode": "thresholds"},
                        "thresholds": {"mode": "absolute", "steps": [{"color": "purple", "value": None}]}
                    }
                },
                "options": {
                    "colorMode": "value",
                    "graphMode": "area",
                    "justifyMode": "auto",
                    "orientation": "auto",
                    "textMode": "auto",
                    "reduceOptions": {"calcs": ["sum"], "fields": "", "values": False}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["total_tokens"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 5,
                "title": "Latency Over Time",
                "type": "timeseries",
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "unit": "ms",
                        "color": {"mode": "palette-classic"},
                        "custom": {
                            "lineWidth": 2,
                            "fillOpacity": 20,
                            "showPoints": "auto",
                            "spanNulls": False
                        }
                    }
                },
                "options": {
                    "legend": {"displayMode": "list", "placement": "bottom", "showLegend": True},
                    "tooltip": {"mode": "single", "sort": "none"}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["latency_over_time"],
                        "resultFormat": "time_series"
                    }
                }]
            },
            {
                "id": 6,
                "title": "Request Status Distribution",
                "type": "piechart",
                "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {"color": {"mode": "palette-classic"}},
                    "overrides": [
                        {
                            "matcher": {"id": "byName", "options": "success"},
                            "properties": [{"id": "color", "value": {"fixedColor": "green", "mode": "fixed"}}]
                        },
                        {
                            "matcher": {"id": "byName", "options": "error"},
                            "properties": [{"id": "color", "value": {"fixedColor": "red", "mode": "fixed"}}]
                        }
                    ]
                },
                "options": {
                    "legend": {"displayMode": "list", "placement": "right", "showLegend": True},
                    "pieType": "donut",
                    "reduceOptions": {"calcs": ["sum"], "fields": "", "values": False},
                    "tooltip": {"mode": "single", "sort": "none"}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["status_distribution"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 7,
                "title": "Error Distribution by Type",
                "type": "barchart",
                "gridPos": {"h": 6, "w": 12, "x": 0, "y": 12},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "color": {"mode": "palette-classic"},
                        "custom": {"fillOpacity": 80}
                    }
                },
                "options": {
                    "legend": {"displayMode": "list", "placement": "bottom", "showLegend": True},
                    "orientation": "horizontal",
                    "xTickLabelRotation": 0
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["error_distribution"],
                        "resultFormat": "table"
                    }
                }]
            },
            {
                "id": 8,
                "title": "Requests Over Time",
                "type": "timeseries",
                "gridPos": {"h": 6, "w": 12, "x": 12, "y": 12},
                "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                "fieldConfig": {
                    "defaults": {
                        "color": {"mode": "palette-classic"},
                        "custom": {
                            "lineWidth": 2,
                            "fillOpacity": 20,
                            "showPoints": "auto",
                            "stacking": {"mode": "normal", "group": "A"}
                        }
                    },
                    "overrides": [
                        {
                            "matcher": {"id": "byName", "options": "success"},
                            "properties": [{"id": "color", "value": {"fixedColor": "green", "mode": "fixed"}}]
                        },
                        {
                            "matcher": {"id": "byName", "options": "error"},
                            "properties": [{"id": "color", "value": {"fixedColor": "red", "mode": "fixed"}}]
                        }
                    ]
                },
                "options": {
                    "legend": {"displayMode": "list", "placement": "bottom", "showLegend": True},
                    "tooltip": {"mode": "multi", "sort": "none"}
                },
                "targets": [{
                    "refId": "A",
                    "datasource": {"type": "grafana-azure-monitor-datasource", "uid": "${datasource}"},
                    "queryType": "Azure Log Analytics",
                    "azureLogAnalytics": {
                        "resources": [APPLICATIONINSIGHTS_RESOURCE_ID],
                        "query": QUERIES["requests_over_time"],
                        "resultFormat": "time_series"
                    }
                }]
            }
        ],
        "templating": {
            "list": [
                {
                    "name": "datasource",
                    "type": "datasource",
                    "label": "Azure Monitor",
                    "query": "grafana-azure-monitor-datasource",
                    "current": {},
                    "hide": 0
                }
            ]
        }
    },
    "overwrite": True
}

print("\nüìã Dashboard Panels:")
for panel in GRAFANA_DASHBOARD["dashboard"]["panels"]:
    print(f"   {panel['id']}. {panel['title']} ({panel['type']})")

# Save to file - Grafana UI import needs the dashboard object directly (not wrapped)
dashboard_file = "grafana_dashboard.json"
with open(dashboard_file, 'w') as f:
    # Export only the dashboard object, not the wrapper
    json.dump(GRAFANA_DASHBOARD["dashboard"], f, indent=2)

print(f"\n‚úÖ Dashboard JSON saved to: {dashboard_file}")

print("\nüí° To import into Azure Managed Grafana:")
print(f"   1. Open Grafana: {os.environ.get('GRAFANA_ENDPOINT', 'https://<your-grafana>.grafana.azure.com')}")
print("   2. Go to: Dashboards ‚Üí Import")
print("   3. Upload the JSON file")
print("   4. Select Azure Monitor data source when prompted")

print("\nüìù Sample KQL query to test in Application Insights:")
print("   customMetrics | where name startswith 'ai_' | take 10")

üìä Part 3.3: Grafana Dashboard Export
üìå Configuration:
   Subscription ID: 3d...c6ca
   App Insights Resource ID: /sub

üìã Dashboard Panels:
   1. Total Requests (stat)
   2. Success Rate (gauge)
   3. Avg Latency (stat)
   4. Total Tokens (stat)
   5. Latency Over Time (timeseries)
   6. Request Status Distribution (piechart)
   7. Error Distribution by Type (barchart)
   8. Requests Over Time (timeseries)

‚úÖ Dashboard JSON saved to: grafana_dashboard.json

üí° To import into Azure Managed Grafana:
   1. Open Grafana: https://grafana-obs-swed-gecwcfc4fbamhzae.cse.grafana.azure.com
   2. Go to: Dashboards ‚Üí Import
   3. Upload the JSON file
   4. Select Azure Monitor data source when prompted

üìù Sample KQL query to test in Application Insights:
   customMetrics | where name startswith 'ai_' | take 10


### Simulation Results on Azure Managed Grafana Dashboard

![../images/simulation_result_grafana.png](../images/simulation_result_grafana.png)

## Best Practices Summary

### OpenTelemetry + Azure Application Insights Integration

| Component | OpenTelemetry API | Azure Monitor Destination |
|-----------|-------------------|---------------------------|
| **Traces** | `tracer.start_as_current_span()` | requests, dependencies |
| **Metrics** | `Counter`, `Histogram` | customMetrics |
| **Logs** | `logging` module | traces table |

### Key OpenTelemetry Patterns

```python
# 1. Azure Monitor setup (required)
from azure.monitor.opentelemetry import configure_azure_monitor

CONNECTION_STRING = os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
if not CONNECTION_STRING:
    raise ValueError("APPLICATIONINSIGHTS_CONNECTION_STRING is required")

configure_azure_monitor(
    connection_string=CONNECTION_STRING,
    enable_live_metrics=True,
)

# 2. Get tracer and meter instances
from opentelemetry import trace, metrics

tracer = trace.get_tracer("service_name", "1.0.0")
meter = metrics.get_meter("service_name", "1.0.0")

# 3. Tracing with context
with tracer.start_as_current_span("operation", attributes={"key": "value"}) as span:
    span.set_attribute("ai.model", "gpt-4o")
    # ... operation ...

# 4. Metrics collection
request_counter = meter.create_counter("ai_requests_total")
latency_histogram = meter.create_histogram("ai_latency_ms")

request_counter.add(1, {"model": "gpt-4o", "status": "success"})
latency_histogram.record(latency_ms, {"model": "gpt-4o"})

# 5. Error recording
from opentelemetry.trace import Status, StatusCode
span.set_status(Status(StatusCode.ERROR, "Error message"))
span.record_exception(exception)
```



## Cleanup Resources

If you created Azure resources (Grafana, Log Analytics, etc.) during this lab, you can clean them up to avoid ongoing costs.

In [None]:
# Cleanup Azure Resources
# =======================
# Uncomment and run this cell to delete resources created during this lab

DELETE_RESOURCES = False  # Set to True to delete resources


def delete_grafana_workspace() -> None:
    """Delete the Azure Managed Grafana workspace."""
    grafana_name = os.environ.get('GRAFANA_NAME', GRAFANA_NAME if 'GRAFANA_NAME' in dir() else '')
    
    if not grafana_name or not RESOURCE_GROUP:
        print("‚ö†Ô∏è Grafana name or resource group not found.")
        return
    
    print(f"üóëÔ∏è Deleting Azure Managed Grafana: {grafana_name}")
    try:
        run_az([
            "grafana", "delete",
            "-g", RESOURCE_GROUP,
            "-n", grafana_name,
            "--yes"
        ])
        print(f"   ‚úÖ Grafana workspace deleted: {grafana_name}")
    except Exception as e:
        print(f"   ‚ö†Ô∏è Could not delete Grafana: {e}")


print("üßπ Resource Cleanup")
print("=" * 60)

if DELETE_RESOURCES:
    print("\n‚ö†Ô∏è WARNING: This will delete the following resources:")
    print(f"   - Azure Managed Grafana: {os.environ.get('GRAFANA_NAME', 'N/A')}")
    print("\nüîÑ Proceeding with cleanup...")
    
    delete_grafana_workspace()
    
    print("\n‚úÖ Cleanup complete!")
else:
    print("\n‚ÑπÔ∏è Set DELETE_RESOURCES=True to delete Azure resources.")
    print("   Resources to clean up:")
    print(f"   - Azure Managed Grafana: {os.environ.get('GRAFANA_NAME', 'N/A')}")
    print("\nüí° Tip: You can also delete resources from the Azure Portal.")

## Wrap-up

### What You Learned

1. **OpenTelemetry Integration**: Use `azure-monitor-opentelemetry` for one-line Azure Monitor setup
2. **Distributed Tracing**: Track requests with `tracer.start_as_current_span()` context managers
3. **Metrics Collection**: Use `Counter` and `Histogram` instruments for custom metrics
4. **Error Handling**: Record errors with `Status`, `StatusCode`, and `span.record_exception()`
5. **Azure Monitor**: Automatic export of traces, metrics, and logs to Application Insights
6. **Grafana Dashboards**: Query OpenTelemetry metrics via KQL in customMetrics table

### Observability Architecture

```
‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
‚îÇ                 OpenTelemetry Observability Stack                   ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¨‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ   Application   ‚îÇ      Transport       ‚îÇ      Visualization         ‚îÇ
‚îú‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î§
‚îÇ OpenTelemetry   ‚îÇ azure-monitor-       ‚îÇ ‚Ä¢ Azure Portal             ‚îÇ
‚îÇ ‚Ä¢ Tracer        ‚îÇ   opentelemetry      ‚îÇ ‚Ä¢ Application Insights     ‚îÇ
‚îÇ ‚Ä¢ Meter         ‚îÇ ‚Ä¢ Auto-export        ‚îÇ ‚Ä¢ Log Analytics            ‚îÇ
‚îÇ ‚Ä¢ Logger        ‚îÇ ‚Ä¢ Batching           ‚îÇ ‚Ä¢ Azure Managed Grafana    ‚îÇ
‚îÇ                 ‚îÇ ‚Ä¢ Retry              ‚îÇ ‚Ä¢ Custom Dashboards        ‚îÇ
‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¥‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚î¥‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò
```

### Additional Resources

- [Azure Monitor OpenTelemetry Distro](https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable)
- [OpenTelemetry Python](https://opentelemetry.io/docs/instrumentation/python/)
- [Azure Managed Grafana](https://learn.microsoft.com/en-us/azure/managed-grafana/overview)
- [KQL for Application Insights](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/get-started-queries)

### Related Notebooks

- **3_reliability/**: Retry, rate limiting, timeout management
- **2_workload_optimization/5_caching_strategies.ipynb**: Redis caching strategies