# 🛍️ | Cora-For-Zava: Introduction to OpenTelemetry and Tracing

Welcome! This notebook provides beginner-friendly explanations of key observability concepts you'll encounter when working with AI agents.

## 🛒 Our Zava Scenario

**Cora** is a customer service chatbot for **Zava** - a fictitious retailer of home improvement goods for DIY enthusiasts. To ensure Cora provides reliable service in production, we need to observe and monitor how it processes customer requests, calls tools, and generates responses. This introduction helps you understand the fundamental concepts of OpenTelemetry and distributed tracing before diving into hands-on agent tracing.

## 🎯 What You'll Learn

- What OpenTelemetry is and why it matters
- Core tracing concepts: traces, spans, and attributes
- How tracing applies to AI agents like Cora
- GenAI semantic conventions for standardized telemetry
- How to think about observability in agentic workflows

## 💡 Why This Matters

Understanding these concepts before you dive into the hands-on agent tracing notebooks will help you:
- Instrument your AI applications effectively
- Troubleshoot issues in production
- Understand performance bottlenecks
- Monitor agent behavior and tool usage

Ready to learn about observability? Let's begin! 🚀

---

## What is OpenTelemetry?

OpenTelemetry is an open-source framework for collecting telemetry data (metrics, logs, and traces) from your applications. Think of it as a standardized way to answer: "What is my application doing right now, and how is it performing?"

**Key benefits:**
- **Vendor-neutral**: Works with many monitoring tools, not locked to one provider
- **Standardized**: Uses common terminology and data formats
- **Built-in**: Supported natively in modern cloud platforms like Azure

---

## Core Tracing Concepts

### 1. Trace

A **trace** represents the complete journey of a single request through your system.

**Example**: When a customer asks Cora "What paint do you have?", the trace captures:
- The customer's question arriving
- The agent deciding which tools to call
- Each tool execution (product lookup, inventory check)
- The final response generation

Think of a trace as the full story of one customer interaction.

### 2. Span

A **span** is a single operation within a trace. Spans can be nested (parent-child relationships) to show how work flows through your system.

**Example spans in the agent notebooks:**
- `zava_customer_session` - The overall customer interaction (parent span)
- `invoke_agent` - Agent processing the request (child span)
- `execute_tool: get_product_info` - Looking up product details (child span)
- `execute_tool: calculate_discount` - Computing discount (child span)

**Analogy**: If a trace is a recipe, spans are the individual steps (preheat oven, mix ingredients, bake).

### 3. Attributes

**Attributes** are key-value pairs that provide context about what happened in a span.

**Examples from the agent notebooks:**
```python
span.set_attribute("user.request", "What paint do you have?")
span.set_attribute("customer.tier", "gold")
span.set_attribute("agent.name", "Cora")
span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
```

Attributes let you filter and analyze traces: "Show me all failed requests from Gold tier customers."
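
To make that kind of query concrete, here is a small sketch that filters span records by attribute values. The records are hypothetical plain dictionaries invented for illustration; a real monitoring backend would run the equivalent filter for you through its own query language.

```python
# Hypothetical span records, flattened to plain dicts for illustration.
spans = [
    {"id": "a1", "attributes": {"customer.tier": "gold", "request.success": False}},
    {"id": "b2", "attributes": {"customer.tier": "gold", "request.success": True}},
    {"id": "c3", "attributes": {"customer.tier": "silver", "request.success": False}},
]

def find_spans(spans, wanted):
    """Return spans whose attributes contain every key-value pair in `wanted`."""
    return [
        s for s in spans
        if all(s["attributes"].get(key) == value for key, value in wanted.items())
    ]

# "Show me all failed requests from Gold tier customers."
failed_gold = find_spans(spans, {"customer.tier": "gold", "request.success": False})
print([s["id"] for s in failed_gold])
```

The filter only works if attribute names and values are set consistently, which is one reason the naming conventions discussed later matter.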

### 4. Trace ID

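Every trace gets a unique identifier, the trace ID: a 128-bit value, usually shown as 32 hexadecimal characters, that every span in the trace shares. This lets you: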
- Find all spans belonging to one customer interaction
- Correlate logs and metrics with traces
- Debug specific issues by trace ID
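
As a rough sketch of the correlation idea, the snippet below generates IDs in the same shape as OpenTelemetry trace IDs (128 bits, rendered as 32 lowercase hex characters) and groups log records by them. The helper name and the log records are hypothetical; in real code the SDK generates the ID for you.

```python
import secrets
from collections import defaultdict

def new_trace_id() -> str:
    """Generate a random 128-bit ID and render it as 32 lowercase hex characters."""
    return format(secrets.randbits(128), "032x")

trace_a, trace_b = new_trace_id(), new_trace_id()

# Hypothetical log records that carry the ID of the trace they belong to.
records = [
    {"trace_id": trace_a, "message": "customer question received"},
    {"trace_id": trace_b, "message": "customer question received"},
    {"trace_id": trace_a, "message": "tool get_product_info called"},
]

# Group messages by trace ID to reconstruct each customer interaction.
by_trace = defaultdict(list)
for record in records:
    by_trace[record["trace_id"]].append(record["message"])

print(trace_a, by_trace[trace_a])
```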

---

## Azure AI Foundry Tracing Features

Azure AI Foundry provides specialized tracing for AI agents with these capabilities:

### Automatic Instrumentation

When you use `OpenAIAgentsInstrumentor().instrument()`, the framework automatically captures:
- **Agent creation**: Model, instructions, tools configured
- **Agent invocations**: User messages, system prompts, reasoning steps
- **Tool executions**: Which tools were called, with what parameters, what they returned
- **Model calls**: Token usage, latency, responses

You don't need to manually create spans for these operations.

### GenAI Semantic Conventions

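OpenTelemetry defines standard attribute names for AI applications under the `gen_ai.*` namespace. The agent notebooks combine these with a few custom, application-specific attributes:

**For agents:**
- `gen_ai.provider.name` - Which AI provider (e.g., "azure.ai.openai")
- `gen_ai.request.model` - Model name (e.g., "gpt-4o-mini")
- `agent.name` - Agent identifier (custom; the standard convention is `gen_ai.agent.name`)

**For operations (custom attributes):**
- `user.request` - Customer's question
- `agent.response` - Agent's answer
- `request.success` - Did it work (true/false)

The standard `gen_ai.*` names keep your traces readable across different tools. Custom names are only meaningful to the queries and dashboards you build yourself.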

### Application Insights Integration

**Application Insights** is Azure's monitoring service. When you configure:
```python
APPLICATION_INSIGHTS_CONNECTION_STRING = "..."
```

Once your tracing setup exports to Application Insights with this connection string, your traces flow to Azure Monitor, where you can:
- **Search traces**: Find specific customer interactions
- **Build dashboards**: Visualize agent performance over time
- **Set alerts**: Get notified when errors spike
- **Analyze trends**: Track response times, tool usage patterns
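
Because a missing or malformed connection string usually means traces silently go nowhere, it can help to check the value at startup. The sketch below assumes the usual Application Insights format of semicolon-separated `Key=Value` pairs that include an `InstrumentationKey`; the function name and the sample value are hypothetical placeholders, not real credentials.

```python
def parse_connection_string(value: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict and check for the key we need."""
    parts = dict(
        item.split("=", 1) for item in value.split(";") if "=" in item
    )
    if not parts.get("InstrumentationKey"):
        raise ValueError("connection string has no InstrumentationKey")
    return parts

# Placeholder value for illustration only; a real one comes from your Azure resource.
sample = (
    "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
    "IngestionEndpoint=https://example.invalid/"
)
settings = parse_connection_string(sample)
print(sorted(settings))
```

Failing fast here is a design choice: an error at startup is easier to notice than an empty dashboard a week later.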

---

## How Tracing Helps You

### 1. Debugging

When something goes wrong, traces show you exactly what happened:
```
Customer asked about paint
  ✓ Agent invoked successfully
  ✓ Tool: get_product_info("PFIP000002") → Found product
  ✗ Tool: calculate_discount("gold", 200) → Error: Invalid tier format
  ✗ Agent failed to generate response
```

You can see the tool call failed and why.

### 2. Performance Optimization

Traces show how long each operation takes:
```
Total request: 2.3 seconds
  - Agent reasoning: 0.8 seconds
  - Tool: get_product_info: 0.1 seconds
  - Tool: check_inventory: 1.2 seconds ← SLOW!
  - Response generation: 0.2 seconds
```

Now you know to optimize the inventory check.
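
The same conclusion can be reached programmatically. This short sketch uses the durations from the example above, typed in by hand as a plain dictionary; with real traces you would compute each duration from a span's start and end timestamps.

```python
# Durations in seconds, copied from the example trace above.
durations = {
    "Agent reasoning": 0.8,
    "Tool: get_product_info": 0.1,
    "Tool: check_inventory": 1.2,
    "Response generation": 0.2,
}

total = sum(durations.values())
slowest = max(durations, key=durations.get)
share = durations[slowest] / total

print(f"{slowest}: {durations[slowest]:.1f}s ({share:.0%} of {total:.1f}s)")
```

Here the inventory check accounts for roughly half of the total request time, so it is the first place to look.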

### 3. Understanding Agent Behavior

Traces reveal how your agent makes decisions:
- Which tools does it call most often?
- Does it call tools in the right order?
- Are some tools never used?
- How does it handle ambiguous questions?

This helps you refine instructions and tool configurations.
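
As an illustration, the first and third questions can be answered by counting tool spans. The span names below follow the `execute_tool: <name>` pattern used earlier, but the list itself and the set of registered tools are hypothetical sample data.

```python
from collections import Counter

PREFIX = "execute_tool: "
registered_tools = {"get_product_info", "check_inventory", "calculate_discount"}

# Hypothetical span names collected from several traces.
span_names = [
    "invoke_agent",
    "execute_tool: get_product_info",
    "execute_tool: check_inventory",
    "invoke_agent",
    "execute_tool: get_product_info",
]

# Count how often each tool was executed.
tool_calls = Counter(
    name[len(PREFIX):] for name in span_names if name.startswith(PREFIX)
)
# Tools that are configured but never appear in any trace.
never_used = registered_tools - set(tool_calls)

print(tool_calls.most_common())
print(never_used)
```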

---

## Best Practices

### 1. Use Descriptive Span Names

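```python
from opentelemetry import trace
tracer = trace.get_tracer(__name__)

# Good
with tracer.start_as_current_span("calculate_gold_tier_discount"):
    ...  # discount logic goes here

# Less helpful
with tracer.start_as_current_span("calc"):
    ...
```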

### 2. Add Meaningful Attributes

```python
# Good - provides context
span.set_attribute("customer.tier", "gold")
span.set_attribute("cart.value", 200.0)
span.set_attribute("discount.amount", 30.0)

# Less useful
span.set_attribute("data", "some value")
```

### 3. Don't Log Sensitive Data

```python
# Bad - contains PII
span.set_attribute("customer.email", "john@example.com")
span.set_attribute("customer.credit_card", "1234-5678...")

# Good - use IDs
span.set_attribute("customer.id", "cust_12345")
```
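
One way to make this rule hard to break is to filter attributes before they reach a span. The sketch below assumes a simple denylist of key names; the keys and the helper are hypothetical, and a denylist only catches what you remember to list, so treat it as a safety net rather than a guarantee.

```python
SENSITIVE_KEYS = {"customer.email", "customer.credit_card"}  # hypothetical denylist

def safe_attributes(attributes: dict) -> dict:
    """Drop denylisted keys so they are never attached to a span."""
    return {k: v for k, v in attributes.items() if k not in SENSITIVE_KEYS}

raw = {
    "customer.id": "cust_12345",
    "customer.tier": "gold",
    "customer.email": "john@example.com",
}
clean = safe_attributes(raw)
print(clean)

# In instrumented code you would then call: span.set_attributes(clean)
```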

### 4. Set Trace Sampling for Production

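In production, tracing every request can produce more data than you want to store and pay for. Configure sampling to keep a representative fraction:
```python
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Trace 10% of requests
sampler = TraceIdRatioBased(0.1)
```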
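
To build intuition for how a ratio-based sampler can decide without coordination, here is a conceptual sketch in plain Python. It assumes the decision is made by comparing the low 64 bits of the trace ID against a threshold derived from the ratio; the SDK's exact implementation may differ, and `should_sample` is a hypothetical name.

```python
import random

MAX_64 = (1 << 64) - 1

def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its ID fall under the ratio threshold."""
    threshold = round(ratio * (MAX_64 + 1))
    return (trace_id & MAX_64) < threshold

# Simulate 10,000 random 128-bit trace IDs and count how many are kept.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)

print(f"kept {kept} of {len(trace_ids)} traces")  # roughly 10%
```

Because the decision depends only on the trace ID, the same trace gets the same answer every time it is evaluated, so a trace is kept or dropped as a whole.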

---

## Terminology Quick Reference

| Term | Simple Definition |
|------|-------------------|
| **Trace** | The full story of one request through your system |
| **Span** | A single step or operation within a trace |
| **Attribute** | Extra information about what happened (key-value pair) |
| **Tracer** | The object you use to create spans |
| **Exporter** | Sends trace data to a monitoring system |
| **Instrumentation** | Code that automatically creates spans |
| **Semantic Convention** | Standard names for common attributes |
| **Distributed Tracing** | Following a request across multiple services |

---

## What's Next?

Now that you understand the core concepts, you're ready to explore the hands-on tracing notebooks:

### Getting Started with Tracing

- **`50-trace-agent-session.ipynb`** - Learn how to emit OpenTelemetry spans for agent workflows
- **`50-collect-span-snapshots.ipynb`** - Capture and inspect spans locally before exporting

### Framework-Specific Examples

- **`51-openai-retailer-chatbot.ipynb`** - Build the Cora retail agent with OpenAI Agents and full telemetry
- **`51-openai-weekend-planner.ipynb`** - Weekend planning agent with OpenAI framework
- **`51-trace-cora-retail-agent.ipynb`** - Complete Cora implementation with tracing
- **`52-langchain-weekend-planner.ipynb`** - LangChain implementation with tracing
- **`53-langgraph-music_router.ipynb`** - LangGraph routing example with observability

---

## Further Reading

For deeper understanding:

- **[OpenTelemetry Concepts](https://opentelemetry.io/docs/concepts/)** - Official OTel documentation
- **[Azure AI Foundry Tracing Guide](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/trace-agents-sdk)** - Azure-specific tracing setup
- **[Application Insights Documentation](https://learn.microsoft.com/azure/azure-monitor/app/distributed-tracing)** - Using Azure Monitor for traces
- **[GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)** - Standard attributes for AI applications

---

Ready to get started? Open one of the hands-on notebooks above to begin tracing your AI agents! 🚀