# Step 11 — OpenTelemetry Export

**What we built**: An `OtelModule` that creates a root `arcllm.invoke` span per call with GenAI semantic convention attributes, plus deep integration in `BaseModule` giving every module `_tracer` and `_span()` helpers for child spans.

**Why it matters**: When thousands of autonomous agents call LLMs, distributed tracing is how you debug latency, find bottlenecks, and follow a request across services. OTel is the industry standard; Jaeger, Datadog, and Grafana all consume it natively.

**Key decisions**:
- **D-079**: Deep integration — `BaseModule` + per-module spans (full trace waterfall showing retry attempts, fallback hops, rate-limit waits)
- **D-080**: `opentelemetry-api` is a core dependency (~100KB, zero-cost no-op when SDK not configured)
- **D-081**: OtelModule creates root span, auto-nests under parent when agent framework provides OTel context
- **D-082**: GenAI semantic conventions (`gen_ai.*`) + custom `arcllm.*` attributes
- **D-083**: Config-driven SDK setup via `[modules.otel]` TOML section
- **D-084**: SDK + exporters are optional extras (`pip install arcllm[otel]`), clean error if missing
- **D-085**: TelemetryModule (structured logs) and OtelModule (distributed traces) coexist — different pillars
- **D-086**: `_span()` context manager in BaseModule — each module wraps its logic explicitly
- **D-087**: Record exceptions as events; ERROR status only on final failure
- **D-088**: Full enterprise config — auth headers, mTLS, batch tuning, resource attributes

**Stack position**: `Otel → Telemetry → Audit → Security → Retry → Fallback → RateLimit → Adapter`

In [None]:
# Setup: ensure arcllm is importable
import sys, os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..', 'src')))

---
## 1. The Two-Layer Architecture

OTel integration lives in **two places**:

| Layer | What | Purpose |
|-------|------|---------|
| `OtelModule` | Root span per `invoke()` + SDK setup | Entry point — GenAI attributes, config-driven exporter |
| `BaseModule._span()` | Child spans in every module | Waterfall — retry attempts, fallback hops, rate-limit waits |

```
OtelModule [arcllm.invoke]           ← root span with gen_ai.* attributes
  └─ TelemetryModule                 ← (no span; uses structured logs)
      └─ AuditModule                 ← (no span; uses structured logs)
          └─ SecurityModule [security]              ← child span
              ├─ [security.pii_redact_outbound]     ← grandchild span
              ├─ RetryModule [retry.attempt.1]      ← grandchild span
              │   └─ FallbackModule                 ← (delegates)
              │       └─ RateLimitModule            ← (token bucket)
              │           └─ Adapter (HTTP call)
              ├─ [security.pii_redact_inbound]      ← grandchild span
              └─ [security.sign]                    ← grandchild span
```

### API vs SDK — Why Two Packages?

| Package | Size | Purpose | When installed |
|---------|------|---------|----------------|
| `opentelemetry-api` | ~100KB | Tracer/Span interfaces, `NonRecordingSpan` | Always (core dep) |
| `opentelemetry-sdk` | ~1MB | `TracerProvider`, samplers, exporters, processors | Only with `pip install arcllm[otel]` |

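When only the API is installed (or the SDK is installed but no provider has been configured), every span is a `NonRecordingSpan`, so the overhead is effectively zero: a few no-op method calls. The `_span()` context manager still works; it just records nothing.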

---
## 2. BaseModule — `_tracer` and `_span()` Helper

In [None]:
from arcllm.modules.base import BaseModule
from opentelemetry import trace
import inspect

print("=== _tracer property ===")
print(inspect.getsource(BaseModule._tracer.fget))

print("\n=== _span() context manager ===")
print(inspect.getsource(BaseModule._span))

Key design points:

1. **`_tracer`** returns `trace.get_tracer("arcllm")` — all modules share one tracer, scoped to the library
2. **`_span()`** is a context manager that:
   - Creates a span via `start_as_current_span()` (auto-nests under parent)
   - Records exceptions and sets ERROR status on unhandled errors
   - Re-raises — never swallows exceptions
3. Without an SDK provider configured, `_tracer` returns the API's proxy tracer, which delegates to a no-op implementation, so spans are `NonRecordingSpan` and overhead is effectively zero
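A minimal, stdlib-only model of that contract follows. `FakeSpan` and `span_scope` are hypothetical stand-ins, not ArcLLM code: the real `_span()` goes through the OTel API (`start_as_current_span`, `record_exception`, `set_status`), and this sketch only mirrors its record, mark, and re-raise behavior so it runs without any OTel package.

```python
from contextlib import contextmanager


class FakeSpan:
    """Hypothetical stand-in for an OTel span: it just collects what is recorded."""

    def __init__(self, name):
        self.name = name
        self.events = []       # stands in for span events
        self.status = "UNSET"  # stands in for StatusCode.UNSET / StatusCode.ERROR
        self.ended = False

    def record_exception(self, exc):
        self.events.append(("exception", type(exc).__name__, str(exc)))

    def set_status(self, status):
        self.status = status


finished = []  # spans in the order they ended


@contextmanager
def span_scope(name):
    """Models _span(): open a span, record and mark errors, always re-raise."""
    span = FakeSpan(name)
    try:
        yield span
    except Exception as exc:
        span.record_exception(exc)  # 1. the exception becomes a span event
        span.set_status("ERROR")    # 2. the span is marked as failed
        raise                       # 3. never swallow: the caller still sees it
    finally:
        span.ended = True
        finished.append(span)


with span_scope("security.sign"):
    pass  # succeeds: status stays UNSET

try:
    with span_scope("retry.attempt.1"):
        raise RuntimeError("provider timeout")
except RuntimeError as exc:
    print(f"caller still sees: {exc}")

for span in finished:
    print(span.name, span.status, span.events)
```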

In [None]:
from unittest.mock import AsyncMock, MagicMock
from arcllm.types import LLMProvider, LLMResponse, Message, Usage

# Without SDK, spans are NonRecordingSpan (zero cost)
inner = MagicMock(spec=LLMProvider)
inner.name = "test"
inner.model_name = "test-model"

module = BaseModule({}, inner)

tracer = module._tracer
print(f"Tracer type: {type(tracer).__name__}")

with module._span("test.span") as span:
    print(f"Span type: {type(span).__name__}")
    print(f"Is recording: {span.is_recording()}")
    # Setting attributes is a no-op but doesn't error
    span.set_attribute("test.key", "value")

print("\nNo SDK = zero overhead. Spans work but record nothing.")

---
## 3. OtelModule Construction and Config Validation

In [None]:
from arcllm.modules.otel import OtelModule, _VALID_CONFIG_KEYS, _VALID_EXPORTERS, _VALID_PROTOCOLS
from arcllm.exceptions import ArcLLMConfigError

print(f"Valid config keys: {sorted(_VALID_CONFIG_KEYS - {'enabled'})}")
print(f"Valid exporters:   {sorted(_VALID_EXPORTERS)}")
print(f"Valid protocols:   {sorted(_VALID_PROTOCOLS)}")

In [None]:
# exporter='none' skips SDK setup — useful for testing
inner = MagicMock(spec=LLMProvider)
inner.name = "anthropic"
inner.model_name = "claude-sonnet-4-20250514"

module = OtelModule({"exporter": "none"}, inner)
print(f"Module type: {type(module).__name__}")
print(f"Inner:       {module.name}")
print("exporter='none' skips SDK setup — no collector needed for testing")

In [None]:
# Config validation: unknown keys
try:
    OtelModule({"exportr": "otlp"}, inner)  # Typo
except ArcLLMConfigError as e:
    print(f"Typo caught: {e}")

# Invalid exporter
try:
    OtelModule({"exporter": "prometheus"}, inner)
except ArcLLMConfigError as e:
    print(f"\nBad exporter: {e}")

# Invalid protocol
try:
    OtelModule({"protocol": "websocket"}, inner)
except ArcLLMConfigError as e:
    print(f"\nBad protocol: {e}")

# sample_rate out of range
try:
    OtelModule({"exporter": "none", "sample_rate": 1.5}, inner)
except ArcLLMConfigError as e:
    print(f"\nBad sample_rate: {e}")
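The four checks above can be sketched as one pure function. This is a hypothetical reconstruction, not the `OtelModule` source: the key set mirrors the `[modules.otel]` TOML shape in section 5, the defaults (`otlp`, `grpc`, `1.0`) are assumed from that shape, and `ValueError` stands in for `ArcLLMConfigError`.

```python
VALID_KEYS = {
    "enabled", "exporter", "endpoint", "protocol", "service_name",
    "sample_rate", "headers", "insecure", "certificate_file",
    "client_key_file", "client_cert_file", "timeout_ms", "max_batch_size",
    "max_queue_size", "schedule_delay_ms", "resource_attributes",
}
VALID_EXPORTERS = {"otlp", "console", "none"}
VALID_PROTOCOLS = {"grpc", "http"}


def validate_otel_config(config: dict) -> dict:
    """Hypothetical validator: raises ValueError on the first problem found."""
    unknown = set(config) - VALID_KEYS
    if unknown:  # catches typos such as "exportr"
        raise ValueError(f"Unknown otel config keys: {sorted(unknown)}")

    exporter = config.get("exporter", "otlp")
    if exporter not in VALID_EXPORTERS:
        raise ValueError(f"exporter must be one of {sorted(VALID_EXPORTERS)}, got {exporter!r}")

    protocol = config.get("protocol", "grpc")
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"protocol must be one of {sorted(VALID_PROTOCOLS)}, got {protocol!r}")

    rate = config.get("sample_rate", 1.0)
    # bool is a subclass of int, so reject it explicitly before the range check
    if isinstance(rate, bool) or not isinstance(rate, (int, float)) or not 0.0 <= rate <= 1.0:
        raise ValueError(f"sample_rate must be a number in [0.0, 1.0], got {rate!r}")

    return config


for bad in ({"exportr": "otlp"}, {"exporter": "prometheus"},
            {"protocol": "websocket"}, {"sample_rate": 1.5}):
    try:
        validate_otel_config(bad)
    except ValueError as e:
        print(f"{bad} -> {e}")
```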

---
## 4. OtelModule.invoke() — GenAI Semantic Convention Attributes

In [None]:
print(inspect.getsource(OtelModule.invoke))

The span gets these standard attributes:

| Attribute | Source | GenAI Convention |
|-----------|--------|------------------|
| `gen_ai.system` | `inner.name` (e.g., "anthropic") | Yes — vendor dashboards auto-detect |
| `gen_ai.request.model` | `inner.model_name` (e.g., "claude-sonnet-4-20250514") | Yes |
| `gen_ai.usage.input_tokens` | `response.usage.input_tokens` | Yes |
| `gen_ai.usage.output_tokens` | `response.usage.output_tokens` | Yes |
| `gen_ai.response.model` | `response.model` | Yes — may differ from request |
| `gen_ai.response.finish_reasons` | `response.stop_reason` | Yes |

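Using the standard `gen_ai.*` names lets any backend that understands the OpenTelemetry GenAI semantic conventions recognize these spans as LLM calls and render them accordingly, with no ArcLLM-specific mapping.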
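A minimal sketch of the mapping in the table, written as a pure function so it can be checked without an SDK. The `Response` and `Usage` dataclasses are hypothetical stand-ins for ArcLLM's `LLMResponse` and `Usage`. Two details are worth noting: OTel attribute values may not be `None`, so absent fields are skipped, and the GenAI conventions type `gen_ai.response.finish_reasons` as an array of strings, so the single `stop_reason` is wrapped in a list here (whether ArcLLM does the same is an assumption).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:  # hypothetical stand-in for arcllm.types.Usage
    input_tokens: int
    output_tokens: int


@dataclass
class Response:  # hypothetical stand-in for arcllm.types.LLMResponse
    model: Optional[str]
    stop_reason: Optional[str]
    usage: Usage


def genai_attributes(system: str, request_model: str, response: Response) -> dict:
    """Build the gen_ai.* attributes for the root span (sketch)."""
    attrs = {
        "gen_ai.system": system,
        "gen_ai.request.model": request_model,
        "gen_ai.usage.input_tokens": response.usage.input_tokens,
        "gen_ai.usage.output_tokens": response.usage.output_tokens,
    }
    # OTel attribute values may not be None, so only set what is present.
    if response.model:
        attrs["gen_ai.response.model"] = response.model
    if response.stop_reason:
        # The convention types this attribute as string[], hence the list.
        attrs["gen_ai.response.finish_reasons"] = [response.stop_reason]
    return attrs


attrs = genai_attributes(
    "anthropic",
    "claude-sonnet-4-20250514",
    Response("claude-sonnet-4-20250514", "end_turn", Usage(100, 50)),
)
for key, value in attrs.items():
    print(f"{key} = {value!r}")
```

On a real span, the whole dict could be applied in one call with `span.set_attributes(attrs)`.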

In [None]:
# Invoke with exporter='none' — spans created but not recorded
inner = MagicMock(spec=LLMProvider)
inner.name = "anthropic"
inner.model_name = "claude-sonnet-4-20250514"
inner.invoke = AsyncMock(return_value=LLMResponse(
    content="Hello!",
    usage=Usage(input_tokens=100, output_tokens=50, total_tokens=150),
    model="claude-sonnet-4-20250514",
    stop_reason="end_turn",
))

module = OtelModule({"exporter": "none"}, inner)

messages = [Message(role="user", content="What is 2+2?")]
result = await module.invoke(messages)

print(f"Response: {result.content}")
print(f"Tokens: {result.usage.input_tokens}in / {result.usage.output_tokens}out")
print(f"Stop: {result.stop_reason}")
print("\nSpan was created (NonRecordingSpan) — zero overhead without SDK")

---
## 5. SDK Setup — Config-Driven via TOML

When `exporter != 'none'`, `_setup_sdk()` configures the full OTel pipeline:

```
Resource (service.name + custom attrs)
  → TracerProvider (with sampler)
    → BatchSpanProcessor (with export tuning)
      → Exporter (OTLP gRPC/HTTP or Console)
```
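The batch knobs in the TOML use ArcLLM's names, while the SDK's `BatchSpanProcessor` takes `max_queue_size`, `schedule_delay_millis`, `max_export_batch_size`, and `export_timeout_millis`. A hypothetical helper makes the translation explicit and returns plain kwargs, so it runs without the SDK; with the SDK installed they would be passed as `BatchSpanProcessor(exporter, **kwargs)`. The defaults come from the config shape in this section, and mapping `timeout_ms` to the processor's export timeout (rather than the exporter's own timeout) is an assumption.

```python
DEFAULTS = {  # taken from the [modules.otel] config shape in this notebook
    "max_batch_size": 512,
    "max_queue_size": 2048,
    "schedule_delay_ms": 5000,
    "timeout_ms": 10000,
}


def batch_processor_kwargs(config: dict) -> dict:
    """Translate ArcLLM-style batch keys into BatchSpanProcessor kwargs (sketch)."""
    merged = {**DEFAULTS, **{k: v for k, v in config.items() if k in DEFAULTS}}
    kwargs = {
        "max_export_batch_size": merged["max_batch_size"],
        "max_queue_size": merged["max_queue_size"],
        "schedule_delay_millis": merged["schedule_delay_ms"],
        # Assumption: timeout_ms bounds each batch export.
        "export_timeout_millis": merged["timeout_ms"],
    }
    # The SDK refuses a batch larger than the queue that feeds it; fail early.
    if kwargs["max_export_batch_size"] > kwargs["max_queue_size"]:
        raise ValueError("max_batch_size must be <= max_queue_size")
    return kwargs


print(batch_processor_kwargs({"max_queue_size": 8192, "schedule_delay_ms": 1000}))
```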

In [None]:
from arcllm.modules.otel import _setup_sdk

print(inspect.getsource(_setup_sdk))

### Config Shape (`config.toml`)

```toml
[modules.otel]
enabled = false
exporter = "otlp"                    # otlp | console | none
endpoint = "http://localhost:4317"   # OTLP collector (gRPC; OTLP/HTTP uses 4318)
protocol = "grpc"                    # grpc | http
service_name = "arcllm"              # OTel resource
sample_rate = 1.0                    # 0.0-1.0
headers = {}                         # Auth headers for OTLP
insecure = false                     # Allow insecure gRPC
certificate_file = ""                # TLS CA cert
client_key_file = ""                 # mTLS client key
client_cert_file = ""                # mTLS client cert
timeout_ms = 10000                   # Export timeout
max_batch_size = 512                 # BatchSpanProcessor tuning
max_queue_size = 2048                # Queue for 10K agents
schedule_delay_ms = 5000             # Batch export interval

[modules.otel.resource_attributes]   # Custom resource attrs
# deployment.environment = "production"
# service.version = "1.2.3"
```

### Enterprise Features

| Feature | Config Keys | Use Case |
|---------|-------------|----------|
| Auth headers | `headers` | OTLP with Bearer token / API key |
| mTLS | `certificate_file`, `client_key_file`, `client_cert_file` | Federal zero-trust |
| Sampling | `sample_rate` | 10K agents → sample 10% (0.1) to control volume |
| Batch tuning | `max_batch_size`, `max_queue_size`, `schedule_delay_ms` | High-throughput agent pools |
| Resource attrs | `resource_attributes` | Deployment metadata for filtering |

---
## 6. Missing SDK — Clear Error Message

In [None]:
# When exporter='otlp' but SDK not installed, you get a clear error.
# We can't demo this directly (SDK may or may not be installed),
# but here's the guard code:

print("The guard in _setup_sdk():")
print()
print("    try:")
print("        from opentelemetry.sdk.resources import Resource")
print("        from opentelemetry.sdk.trace import TracerProvider")
print("        ...")
print("    except ImportError:")
print("        raise ArcLLMConfigError(")
print('            "OTel SDK not installed. Run: pip install arcllm[otel]"')
print("        )")
print()
print("Same pattern for gRPC and HTTP exporter packages.")
print("Clear message tells the user EXACTLY what to install.")
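The same guard can be written once as a reusable helper. `require_optional` is hypothetical and raises `ImportError` with the install hint, whereas ArcLLM raises `ArcLLMConfigError`. Chaining with `from exc` keeps the original import failure visible in the traceback, and the quotes around `arcllm[otel]` stop shells like zsh from treating the brackets as a glob.

```python
import importlib
from types import ModuleType


def require_optional(module: str, extra: str = "otel") -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"{module} is not installed. Run: pip install 'arcllm[{extra}]'"
        ) from exc


json_mod = require_optional("json")  # stdlib module, always importable

try:
    require_optional("opentelemetry.sdk.trace")
    print("OTel SDK is available")
except ImportError as e:
    print(e)
```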

---
## 7. Error Recording in Spans

The `_span()` context manager records any exception that propagates out of the span as a span event, sets that span's status to ERROR, and re-raises.

In [None]:
# When an exception occurs inside a span, it's recorded and re-raised
inner_fail = MagicMock(spec=LLMProvider)
inner_fail.name = "test"
inner_fail.model_name = "test-model"
inner_fail.invoke = AsyncMock(side_effect=RuntimeError("provider timeout"))

module = OtelModule({"exporter": "none"}, inner_fail)

try:
    await module.invoke([Message(role="user", content="hi")])
except RuntimeError as e:
    print(f"Exception propagated: {e}")
    print("\nThe span recorded:")
    print("  1. Exception event (exc type + message + traceback)")
    print("  2. StatusCode.ERROR")
    print("  3. Then re-raised to the caller")
    print("\nAgent sees the error. Trace backend shows it in the span.")

### D-087: Error Recording Philosophy

| Scenario | Span Status | Why |
|----------|-------------|-----|
| Retry attempt fails but retry succeeds | Exception event on attempt span, OK on root | Handled error — operational noise, not failure |
| All retries exhausted | ERROR on root span | True failure — agent can't proceed |
| Provider timeout (no retry or fallback configured) | ERROR on root span | Unrecoverable within this invoke |

This gives clean trace UIs — red spans only for actual failures, not intermediate retries.
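A minimal sketch of the D-087 policy, assuming a retry loop that catches each failed attempt itself. It uses a hypothetical in-memory `Span` record instead of the OTel API, and treating `RuntimeError` as retryable is also an assumption: a handled failure becomes an exception event while the attempt span keeps its default status, and ERROR lands on the root only after the last attempt fails. ArcLLM's RetryModule may differ in detail (backoff, which errors are retryable).

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical in-memory span record (not the OTel API)."""
    name: str
    status: str = "UNSET"  # OTel's default status; not rendered as an error
    events: list = field(default_factory=list)


def invoke_with_retry(call, spans: list, max_attempts: int = 3):
    """Retry `call`, recording spans per the D-087 policy (sketch)."""
    root = Span("arcllm.invoke")
    spans.append(root)
    for attempt in range(1, max_attempts + 1):
        span = Span(f"retry.attempt.{attempt}")  # a child of root in a real trace
        spans.append(span)
        try:
            return call(attempt)
        except RuntimeError as exc:
            # Handled failure: keep the evidence as an event, leave status alone.
            span.events.append(f"exception: {exc}")
            if attempt == max_attempts:
                # Nothing left to try: this is the real failure.
                root.status = "ERROR"
                root.events.append(f"exception: {exc}")
                raise


def flaky(attempt):
    if attempt == 1:
        raise RuntimeError("429 rate limited")
    return "ok"


def always_down(attempt):
    raise RuntimeError("provider timeout")


ok_spans = []
print("result:", invoke_with_retry(flaky, ok_spans))

failed_spans = []
try:
    invoke_with_retry(always_down, failed_spans)
except RuntimeError as exc:
    print("agent sees:", exc)

for span in ok_spans + failed_spans:
    print(span)
```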

---
## 8. Modules Using `_span()` — Child Spans in the Waterfall

Every module that subclasses `BaseModule` gets `_span()` for free. Here's how SecurityModule uses it:

In [None]:
from arcllm.modules.security import SecurityModule

print(inspect.getsource(SecurityModule.invoke))

Notice the nesting:
```python
with self._span("security"):                         # Parent span
    with self._span("security.pii_redact_outbound"): # Child span
        ...
    response = await self._inner.invoke(...)          # Inner modules create their own spans
    with self._span("security.pii_redact_inbound"):  # Child span
        ...
    with self._span("security.sign"):                # Child span
        ...
```

OTel nests these as parent and child automatically: `start_as_current_span()` reads the active context to find the parent, then makes the new span current for the duration of its `with` block.

---
## 9. Telemetry vs OTel — Different Pillars

D-085: Both modules coexist because they serve different purposes.

| | TelemetryModule | OtelModule |
|---|-----------------|------------|
| **Output** | Structured log lines | Distributed trace spans |
| **Consumer** | grep, Splunk, CloudWatch Logs | Jaeger, Datadog APM, Grafana Tempo |
| **Correlation** | Per-call (one log line) | Cross-service (trace ID propagation) |
| **Key data** | duration_ms, tokens, cost_usd | Span timing, attributes, parent-child tree |
| **Overhead** | Minimal (string formatting) | SDK span processing; export is batched on a background thread |
| **When to use** | Always (lightweight) | When you need distributed tracing |

Both can be enabled simultaneously — they don't interfere.

---
## 10. Registry Integration

In [None]:
from arcllm.registry import load_model, clear_cache

os.environ.setdefault("ANTHROPIC_API_KEY", "test-key")
os.environ.setdefault("OPENAI_API_KEY", "test-key")
clear_cache()

# Enable OTel with exporter='none' (no collector needed)
model = load_model("anthropic", otel={"exporter": "none"})
print(f"Type: {type(model).__name__}")
print(f"Inner: {type(model._inner).__name__}")

In [None]:
clear_cache()

# Full stack with OTel outermost
model = load_model(
    "anthropic",
    otel={"exporter": "none"},
    telemetry=True,
    audit=True,
    retry=True,
    rate_limit=True,
)

# Walk the stack
layers = []
layer = model
while hasattr(layer, '_inner'):
    layers.append(type(layer).__name__)
    layer = layer._inner
layers.append(type(layer).__name__)
print(f"Stack: {' → '.join(layers)}")
print(f"\n{len(layers)} layers. OtelModule is outermost — root span wraps everything.")

---
## 11. Auto-Nesting Under Agent Framework Spans

D-081: When an agent framework (LangGraph, CrewAI, custom) provides OTel context, ArcLLM spans automatically nest as children.

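```python
from opentelemetry import trace

tracer = trace.get_tracer("my_agent")  # the agent framework's own tracer

async def research(model, messages, tools):
    # Agent framework creates the parent span
    with tracer.start_as_current_span("agent.task.research"):
        # ArcLLM's invoke() creates its span automatically...
        response = await model.invoke(messages, tools)
        # ...and arcllm.invoke becomes a child of agent.task.research
    return response
```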

This happens because `start_as_current_span()` in `_span()` reads `opentelemetry.context` — if a parent exists, the new span attaches to it. No configuration needed.

The resulting trace:
```
agent.task.research (agent framework)          ← parent
  └─ arcllm.invoke (OtelModule)                ← auto-nested child
      └─ security (SecurityModule)             ← grandchild
          └─ retry.attempt.1 (RetryModule)     ← great-grandchild
```
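This keeps working across `await` and concurrent agents because the active context lives in `contextvars`, and `asyncio.create_task()` copies the current context into each new task. A toy model with a plain `ContextVar` standing in for OTel's context (all names here are hypothetical) shows two concurrent agent tasks, each of whose LLM calls sees only its own parent span.

```python
import asyncio
import contextvars

current_span = contextvars.ContextVar("current_span", default=None)
seen = {}  # agent task name -> span that its "LLM call" observed as parent


async def llm_invoke(label):
    """Stands in for model.invoke(): note which span is active when it runs."""
    await asyncio.sleep(0)  # yield to the loop, as a real HTTP call would
    seen[label] = current_span.get()


async def agent_task(task_name):
    current_span.set(task_name)  # the agent framework opens its span
    await llm_invoke(task_name)


async def main():
    # create_task() copies the context at creation, so each agent's span is
    # visible to its own LLM call but never leaks into the other task.
    await asyncio.gather(
        asyncio.create_task(agent_task("agent.task.research")),
        asyncio.create_task(agent_task("agent.task.summarize")),
    )
    print(seen)
    print("outer context:", current_span.get())


asyncio.run(main())  # in Jupyter, a loop is already running: use `await main()`
```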

---
## 12. Implementation Details

In [None]:
print("=== OtelModule.__init__ ===")
print(inspect.getsource(OtelModule.__init__))

In [None]:
print("=== OtelModule.invoke ===")
print(inspect.getsource(OtelModule.invoke))

---
## 13. Console Exporter Demo

When `exporter='console'`, spans are printed to stdout — useful for local development.

In [None]:
# Console exporter prints spans to stdout
# This requires opentelemetry-sdk to be installed
try:
    from opentelemetry.sdk.trace import TracerProvider
    has_sdk = True
except ImportError:
    has_sdk = False

if has_sdk:
    from opentelemetry import trace as trace_api
    
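    # Don't reset the global provider here: OTel allows it to be set only once
    # per process, so installing a NoOpTracerProvider would block the SDK provider
    # that exporter='console' needs. Restart the kernel if one was already set.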
    clear_cache()
    
    inner = MagicMock(spec=LLMProvider)
    inner.name = "anthropic"
    inner.model_name = "claude-sonnet-4-20250514"
    inner.invoke = AsyncMock(return_value=LLMResponse(
        content="4",
        usage=Usage(input_tokens=10, output_tokens=5, total_tokens=15),
        model="claude-sonnet-4-20250514",
        stop_reason="end_turn",
    ))
    
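    module = OtelModule({"exporter": "console"}, inner)
    result = await module.invoke([Message(role="user", content="2+2?")])

    # Spans go through a BatchSpanProcessor, which exports on a timer
    # (schedule_delay_ms); flush so the span prints now, not seconds later.
    provider = trace_api.get_tracer_provider()
    if hasattr(provider, "force_flush"):  # only SDK providers can flush
        provider.force_flush()

    print(f"\nResponse: {result.content}")
    print("\n(The console exporter printed the span as JSON above)")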
else:
    print("Skipped — opentelemetry-sdk not installed")
    print("Install with: pip install arcllm[otel]")

---
## Summary

| Component | What | Why |
|-----------|------|-----|
| `OtelModule` | Root `arcllm.invoke` span with GenAI attributes | Entry point for distributed tracing |
| `BaseModule._span()` | Context manager creating child spans | Every module gets tracing for free |
| `BaseModule._tracer` | Shared OTel tracer scoped to "arcllm" | Consistent instrumentation |
| `_setup_sdk()` | Config-driven SDK setup | OTLP/console/none exporters, sampling, batch tuning |
| GenAI attributes | `gen_ai.system`, `gen_ai.request.model`, etc. | Vendor dashboard auto-detection |
| API-only mode | `opentelemetry-api` always, SDK optional | Zero overhead by default |
| Error recording | Exception events + ERROR status on failure | Clean trace UI |
| Auto-nesting | `start_as_current_span()` reads context | Integrates with agent framework traces |
| Enterprise config | Auth headers, mTLS, sampling, batch tuning | Federal/zero-trust environments |

**Config**:
```python
load_model("anthropic", otel=True)                              # OTLP to localhost:4317
load_model("anthropic", otel={"exporter": "console"})           # Print to stdout
load_model("anthropic", otel={"exporter": "none"})              # No-op (testing)
load_model("anthropic", otel={"sample_rate": 0.1})              # Sample 10%
load_model("anthropic", otel={"endpoint": "https://otel.example.com:4317", "headers": {"Authorization": "Bearer tok"}})
```