# Observability in Agentic AI with ADK & AgentOps

This notebook is a practical walkthrough in three parts:
1) Observability in **agentic AI** (concepts & goals)
2) Observability **in ADK** (what the toolkit emits and how to capture it)
3) **AgentOps** integration for session replays, traces, metrics

> You can run code cells as snippets in your own project. They are designed to be copy‑paste friendly.


## Table of contents
- [1. Why Observability for Agentic AI?](#why)
- [2. Core Signals & Telemetry in Agent Systems](#signals)
- [3. ADK Observability: Logging & Tracing](#adk-obs)
  - [3.1 Logging (Python `logging`, CLI flags)](#adk-logging)
  - [3.2 Event stream mental model](#adk-events)
- [4. AgentOps: Replays, Traces, & Metrics](#agentops)
  - [4.1 What AgentOps adds on top of ADK](#agentops-why)
  - [4.2 Minimal integration](#agentops-min)
  - [4.3 What the spans look like](#agentops-spans)
  - [4.4 Cost/latency insights & common checks](#agentops-tips)
- [5. Patterns & Best Practices](#best)


<a id="why"></a>
## 1. Why Observability for Agentic AI?

Agentic apps are **stateful**, **tool-using**, and often **multi-step**. Observability lets you answer:

- What did the **user** ask, and how did the **agent** decide what to do?

- What **tools** were called, with what **arguments**, and what came back?

- What did the **LLM** see (system prompts, history, functions) and return (text, function calls)?

- Where did **latency** and **cost** accrue?

- Why did a **branch**/loop/transfer occur (and did it match policy)?

- What **state** or **artifacts** changed during the run?



### Visual: high-level observability in agentic AI
```mermaid
flowchart LR
  U[User] -->|message| R(Runner/UI)
  R -->|records| H[Event History]
  R -->|logs/traces| O[Observability Sink]
  subgraph Agentic Flow
    R --> A1[Agent / LLM]
    A1 -->|tool call| T1[Tool A]
    A1 -->|tool call| T2[Tool B]
    A1 -->|writes| S[Session State]
    T1 -->|artifacts| F[Artifact Store]
    T2 -->|artifacts| F
  end
  O --> Dash[Dashboards/Replay]
```


<a id="signals"></a>
## 2. Core Signals & Telemetry in Agent Systems

| Signal | What it shows | Why it matters |
|---|---|---|
| **Logs** | Textual diagnostics (who/when/what) | Quick debugging, grep‑ability |
| **Traces/Spans** | Hierarchical timing of steps (agent → LLM → tool) | Find hotspots, follow causal chain |
| **Events** | Immutable records of content + side effects | Auditability & replay |
| **Metrics** | Counts, durations, costs | SLOs, alerting, capacity planning |
| **Artifacts/State Deltas** | Files/blobs & key/value changes | Data lineage & correctness |
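
To make the **Metrics** row concrete, here is a minimal sketch that derives a call count, an error rate, and a nearest-rank p95 latency from per-step records. The `StepRecord` type, its field names, and the sample values are hypothetical; they are not part of ADK or AgentOps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical per-step measurement (not an ADK or AgentOps type)."""
    name: str
    duration_ms: float
    ok: bool


def summarize(records: list[StepRecord]) -> dict[str, float]:
    """Return the call count, error rate, and nearest-rank p95 duration."""
    if not records:
        return {"count": 0, "error_rate": 0.0, "p95_ms": 0.0}
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(durations))
    errors = sum(1 for r in records if not r.ok)
    return {
        "count": len(records),
        "error_rate": errors / len(records),
        "p95_ms": durations[rank - 1],
    }


# Illustrative sample data only.
records = [
    StepRecord("llm_call", 820.0, True),
    StepRecord("search_api", 140.0, True),
    StepRecord("summarize_pdf", 2300.0, False),
    StepRecord("llm_call", 610.0, True),
]
summary = summarize(records)
print(summary)
```

Numbers like these are what SLOs and alert thresholds are usually defined against; in practice a metrics backend would compute them over a time window rather than an in-memory list.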


<a id="adk-obs"></a>
## 3. ADK Observability: Logging & Tracing

ADK purposefully **does not auto‑configure** logging; you control verbosity. Tracing spans can be exported to your preferred backend; for simple workflows, **logs + event history** are often enough.


<a id="adk-logging"></a>
### 3.1 Logging (Python `logging`, CLI flags)

Configure logging in your entrypoint:
```python
import logging

logging.basicConfig(
    level=logging.INFO,  # use DEBUG while troubleshooting
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s"
)
```
When using the ADK CLI, you can override verbosity without touching code:
```bash
# Run the web UI with verbose logs
adk web --log_level DEBUG path/to/your/agents_dir

# Shorthand
adk web -v path/to/your/agents_dir
```
**Tip:** Use `INFO` or `WARNING` in prod; `DEBUG` can include full LLM prompts, which may contain user data.
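
Because `DEBUG` output can carry prompt text, one option is to mask obvious secrets before records are emitted. The sketch below uses only the standard-library `logging.Filter`; the regular expression, the `my_agent_app` logger name, and the sample key are assumptions for illustration, and a simple pattern like this will not catch every secret format.

```python
import logging
import re

# Hypothetical pattern: adjust it to the secret formats your app actually uses.
SECRET_PATTERN = re.compile(r"(api[_-]?key|token|password)\s*[=:]\s*\S+", re.IGNORECASE)


class RedactSecretsFilter(logging.Filter):
    """Mask key=value style secrets in a record's message."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg with its args first
        record.msg = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)
        record.args = ()  # args are already merged into msg
        return True  # keep the record; only its text changes


handler = logging.StreamHandler()
handler.addFilter(RedactSecretsFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(levelname)s - %(name)s - %(message)s")
)

logger = logging.getLogger("my_agent_app")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

logger.debug("calling tool with api_key=%s", "sk-123")
```

The filter is attached to the handler, so it applies to every record that handler emits, whichever logger produced it. Handlers without the filter are not covered.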


<a id="adk-events"></a>
### 3.2 Event stream mental model

Every significant step yields an **Event** (user message, agent text, tool call/result, state/artifact update). A simple model:

```mermaid
sequenceDiagram
  participant User
  participant Runner
  participant Agent
  participant LLM
  participant Tool

  User->>Runner: new message
  Runner->>Agent: invoke (InvocationContext)
  Agent->>LLM: prompt (+ tools schema)
  LLM-->>Agent: text / function_call
  Agent->>Tool: execute(args)
  Tool-->>Agent: result (dict/parts)
  Agent-->>Runner: Event(s) with actions (state/artifacts)
  Runner-->>User: final Event(s)
```
**Read events** to reconstruct what happened and **why**.
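
As a rough illustration of reading events back, the sketch below rebuilds a timeline and the final state from an ordered list. `SimpleEvent` is a hypothetical stand-in with invented field names; ADK's real `Event` class is richer and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleEvent:
    """Hypothetical stand-in for an event record (not ADK's Event class)."""
    author: str   # "user" or an agent name
    kind: str     # "text", "tool_call", or "tool_result"
    payload: Any = None
    state_delta: dict[str, Any] = field(default_factory=dict)


def replay(events: list[SimpleEvent]) -> tuple[list[str], dict[str, Any]]:
    """Rebuild a readable timeline and the final state from ordered events."""
    timeline: list[str] = []
    state: dict[str, Any] = {}
    for index, event in enumerate(events, start=1):
        timeline.append(f"{index}. {event.author} {event.kind}: {event.payload}")
        state.update(event.state_delta)  # later events overwrite earlier keys
    return timeline, state


# Illustrative run: one user message, one tool round trip, one final answer.
events = [
    SimpleEvent("user", "text", "Summarize report.pdf"),
    SimpleEvent("root_agent", "tool_call", {"name": "summarize_pdf"}),
    SimpleEvent(
        "root_agent",
        "tool_result",
        {"summary": "3 key findings"},
        {"last_summary": "3 key findings"},
    ),
    SimpleEvent("root_agent", "text", "The report has 3 key findings."),
]
timeline, final_state = replay(events)
print("\n".join(timeline))
```

Because state changes travel with the events that caused them, replaying the list in order is enough to explain how the final state came about.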


<a id="agentops"></a>
## 4. AgentOps: Replays, Traces, & Metrics

AgentOps adds **session replays**, **rich traces**, and **cost/latency** analytics with minimal code.


<a id="agentops-why"></a>
### 4.1 What AgentOps adds on top of ADK
- **Unified tracing & replay** across agent, LLM, tools

- **Prompt & completion visibility** (tokens, finish reasons)

- **Performance views** (latency buckets, slow steps)

- **Minimal setup** (a single `init` call) that works alongside ADK runners


<a id="agentops-min"></a>
### 4.2 Minimal integration

> Put this near the top of your app (before you run the ADK `Runner`). Set `AGENTOPS_API_KEY` in your environment.

```python
# pip install -U agentops
import os
import agentops

agentops.init(
    api_key=os.getenv("AGENTOPS_API_KEY"),  # export AGENTOPS_API_KEY=...
    trace_name="adk-demo-trace"             # optional label shown in UI
)
# ... define/load your ADK root_agent and Runner as usual ...
```
Once initialized, AgentOps auto‑instruments ADK calls so you get **nested spans** and **session replays** in the dashboard.


<a id="agentops-spans"></a>
### 4.3 What the spans look like

```mermaid
flowchart TD
  A[AgentOps Session Trace] --> B[ADK Runner / Root Agent]
  B --> C[Sub‑Agent / Workflow Step]
  C --> D[LLM Call: gemini-2.0-flash]
  C --> E[Tool Call: search_api]
  C --> F[Tool Call: summarize_pdf]
  D --> G{Tokens, Prompt, Latency}
  E --> H{Args, Result, Duration}
  F --> I{Args, Result, Duration}
```
You can expand each span to inspect **inputs/outputs**, **timings**, and **errors**.
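
To build intuition for how such a hierarchy arises, here is a toy in-memory tracer that nests spans with a context manager. It is a standard-library sketch only: `Span`, `Tracer`, and `render` are hypothetical names, and this is not how AgentOps or ADK implement tracing.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """Hypothetical in-memory span; real tracers export these to a backend."""
    name: str
    duration_s: float = 0.0
    error: Optional[str] = None
    children: list["Span"] = field(default_factory=list)


class Tracer:
    def __init__(self) -> None:
        self.roots: list[Span] = []
        self._stack: list[Span] = []  # currently open spans, outermost first

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        current = Span(name)
        # Attach to the innermost open span, or start a new root.
        parent_list = self._stack[-1].children if self._stack else self.roots
        parent_list.append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        except Exception as exc:
            current.error = repr(exc)  # record the failure, then re-raise
            raise
        finally:
            current.duration_s = time.perf_counter() - start
            self._stack.pop()


def render(span: Span, depth: int = 0) -> list[str]:
    """Return the span tree as indented lines, parents before children."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("runner"):
    with tracer.span("sub_agent"):
        with tracer.span("llm_call"):
            pass
        with tracer.span("tool:search_api"):
            pass

tree = render(tracer.roots[0])
print("\n".join(tree))
```

Each span records its own duration and any error, and a parent's duration includes its children, which is why reading a trace top-down quickly narrows in on the slow or failing step.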


<a id="agentops-tips"></a>
### 4.4 Cost/latency insights & common checks
- Check the **longest spans** first (often tools or large prompts)

- Inspect **prompts** to ensure correct instruction/templating

- Watch **function_call** frequency & fan‑out

- Correlate **tool errors** with agent **fallbacks** or **retries**
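
The first and third checks above can be sketched against a flat span export. The tuple layout `(name, kind, duration_ms)` and the sample values are assumptions for illustration; a real export from your tracing backend will have its own schema.

```python
from collections import Counter

# Hypothetical export row: (span name, kind, duration in ms).
SpanRow = tuple[str, str, float]

spans: list[SpanRow] = [
    ("root_agent", "agent", 3400.0),
    ("gemini-2.0-flash", "llm", 900.0),
    ("search_api", "tool", 450.0),
    ("summarize_pdf", "tool", 1800.0),
    ("search_api", "tool", 200.0),
]


def slowest(rows: list[SpanRow], kind: str, top_n: int = 3) -> list[SpanRow]:
    """Return the top_n longest spans of one kind, longest first."""
    matching = [row for row in rows if row[1] == kind]
    return sorted(matching, key=lambda row: row[2], reverse=True)[:top_n]


def tool_call_counts(rows: list[SpanRow]) -> Counter:
    """Count calls per tool to spot unexpected fan-out."""
    return Counter(name for name, kind, _ in rows if kind == "tool")


slow_tools = slowest(spans, "tool", top_n=2)
counts = tool_call_counts(spans)
print(slow_tools)
print(counts.most_common())
```

Filtering by kind matters because the root agent span always looks slowest: it contains every step beneath it.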


<a id="best"></a>
## 5. Patterns & Best Practices

- **Name things well**: agent names, tool names, and state keys improve trace readability.

- **Log at edges**: before/after tool calls; summarize inputs, not full blobs.

- **Guard PII and secrets**: scrub both from logs; keep credentials in a secure store.

- **Use levels**: `DEBUG` for prompt inspection locally; `INFO` or `WARNING` in prod.

- **Sample** high‑volume traffic; keep detailed traces for error cases.

- **Link runs** to user/session IDs to compare behavior over time.

- **Automate alerts** on latency spikes, error rates, and cost thresholds.
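
Two of these practices, logging at edges and sampling, can be sketched with the standard library alone. The decorator, the `search_api` tool, the logger name, and the 10% default sample rate are all hypothetical choices for illustration.

```python
import functools
import logging
import random

logger = logging.getLogger("my_agent_app.tools")  # hypothetical logger name


def summarize_value(value, limit=80):
    """Return a short repr so large blobs never reach the log."""
    text = repr(value)
    if len(text) <= limit:
        return text
    return text[:limit] + f"... ({len(text)} chars)"


def log_at_edges(func):
    """Log before and after a tool call with summarized inputs and outputs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("tool=%s start args=%s", func.__name__, summarize_value((args, kwargs)))
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.exception("tool=%s failed", func.__name__)
            raise
        logger.info("tool=%s done result=%s", func.__name__, summarize_value(result))
        return result
    return wrapper


def should_keep_trace(had_error, sample_rate=0.1, rng=random.random):
    """Always keep error traces; keep a random fraction of the rest."""
    return had_error or rng() < sample_rate


@log_at_edges
def search_api(query: str) -> dict:
    """Hypothetical tool used only for illustration."""
    return {"query": query, "hits": ["a", "b"]}


result = search_api("adk observability")
```

Passing `rng` as a parameter keeps the sampling decision deterministic in tests. Note that the truncation in `summarize_value` limits log size but does not remove sensitive values that appear in the first characters.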
