# BigQuery Agent Analytics - Setup & Queries

This notebook sets up the BigQuery dataset and table for ADK agent analytics logging,
and provides example queries for analyzing agent behavior.

**Reference:** [BigQuery Agent Analytics Plugin Documentation](https://google.github.io/adk-docs/observability/bigquery-agent-analytics/)

## Setup & Configuration

In [1]:
import os
from google.cloud import bigquery
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configuration
PROJECT_ID = os.getenv('GOOGLE_CLOUD_PROJECT')
LOCATION = os.getenv('GOOGLE_CLOUD_LOCATION', 'US')
DATASET_ID = os.getenv('BQ_ANALYTICS_DATASET', 'applied_ml_concept_bq')
TABLE_ID = os.getenv('BQ_ANALYTICS_TABLE', 'agent_events')

# Full table reference
FULL_TABLE_ID = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"

print(f"Project: {PROJECT_ID}")
print(f"Location: {LOCATION}")
print(f"Dataset: {DATASET_ID}")
print(f"Table: {FULL_TABLE_ID}")

# Initialize BigQuery client
bq_client = bigquery.Client(project=PROJECT_ID)

Project: statmike-mlops-349915
Location: us-central1
Dataset: applied_ml_concept_bq
Table: statmike-mlops-349915.applied_ml_concept_bq.agent_events
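
If `GOOGLE_CLOUD_PROJECT` is unset, `os.getenv` returns `None` and the f-string above silently yields a table ID that starts with `None.`. A minimal sketch of a guard follows. It assumes a hypothetical helper named `build_table_id`, which is not part of the notebook or the BigQuery client, and `my-project` is a placeholder project ID.

```python
def build_table_id(project_id, dataset_id, table_id):
    """Return 'project.dataset.table', failing fast on missing parts.

    Hypothetical helper for illustration; the notebook itself builds
    the ID with a plain f-string.
    """
    parts = {"project": project_id, "dataset": dataset_id, "table": table_id}
    # Collect the names of any parts that are None or empty
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing configuration value(s): {', '.join(missing)}")
    return ".".join(parts.values())


# "my-project" is a placeholder, not a real project ID
print(build_table_id("my-project", "applied_ml_concept_bq", "agent_events"))
```

Failing at configuration time gives a clearer error than a later "Not found" response from BigQuery.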


## Dataset Management

Check whether the dataset exists and create it if it is missing.

In [2]:
from google.api_core.exceptions import NotFound

def create_dataset_if_not_exists(client, project_id, dataset_id, location):
    """Return the dataset, creating it in `location` if it doesn't exist."""
    dataset_ref = f"{project_id}.{dataset_id}"

    try:
        dataset = client.get_dataset(dataset_ref)
    except NotFound:
        # Catch the specific API error instead of matching on the message text
        print(f"Dataset '{dataset_ref}' not found. Creating...")
        dataset = bigquery.Dataset(dataset_ref)
        dataset.location = location
        dataset.description = "ADK Agent Analytics - Logs agent events for analysis and debugging"
        dataset = client.create_dataset(dataset, exists_ok=True)
        print(f"Dataset '{dataset_ref}' created successfully.")
        return dataset

    print(f"Dataset '{dataset_ref}' already exists.")
    print(f"  Location: {dataset.location}")
    print(f"  Created: {dataset.created}")
    return dataset

dataset = create_dataset_if_not_exists(bq_client, PROJECT_ID, DATASET_ID, LOCATION)

Dataset 'statmike-mlops-349915.applied_ml_concept_bq' not found. Creating...
Dataset 'statmike-mlops-349915.applied_ml_concept_bq' created successfully.


## Table Management

Check whether the table exists and create it with the recommended schema if it is missing.

The schema includes:
- Event metadata (timestamp, event_type, agent, session_id, etc.)
- OpenTelemetry tracing (trace_id, span_id, parent_span_id)
- Content as JSON with multimodal support (content_parts with GCS references)
- Performance metrics (latency_ms, status, error_message)

In [3]:
# DDL for creating the table (from ADK documentation)
CREATE_TABLE_DDL = f"""
CREATE TABLE IF NOT EXISTS `{FULL_TABLE_ID}`
(
  timestamp TIMESTAMP NOT NULL OPTIONS(description="The UTC time at which the event was logged."),
  event_type STRING OPTIONS(description="Indicates the type of event being logged (e.g., 'LLM_REQUEST', 'TOOL_COMPLETED')."),
  agent STRING OPTIONS(description="The name of the ADK agent or author associated with the event."),
  session_id STRING OPTIONS(description="A unique identifier to group events within a single conversation or user session."),
  invocation_id STRING OPTIONS(description="A unique identifier for each individual agent execution or turn within a session."),
  user_id STRING OPTIONS(description="The identifier of the user associated with the current session."),
  trace_id STRING OPTIONS(description="OpenTelemetry trace ID for distributed tracing."),
  span_id STRING OPTIONS(description="OpenTelemetry span ID for this specific operation."),
  parent_span_id STRING OPTIONS(description="OpenTelemetry parent span ID to reconstruct hierarchy."),
  content JSON OPTIONS(description="The event-specific data (payload) stored as JSON."),
  content_parts ARRAY<STRUCT<
    mime_type STRING,
    uri STRING,
    object_ref STRUCT<
      uri STRING,
      version STRING,
      authorizer STRING,
      details JSON
    >,
    text STRING,
    part_index INT64,
    part_attributes STRING,
    storage_mode STRING
  >> OPTIONS(description="Detailed content parts for multi-modal data."),
  attributes JSON OPTIONS(description="Arbitrary key-value pairs for additional metadata."),
  latency_ms JSON OPTIONS(description="Latency measurements (e.g., total_ms)."),
  status STRING OPTIONS(description="The outcome of the event, typically 'OK' or 'ERROR'."),
  error_message STRING OPTIONS(description="Populated if an error occurs."),
  is_truncated BOOLEAN OPTIONS(description="Flag indicates if content was truncated.")
)
PARTITION BY DATE(timestamp)
CLUSTER BY event_type, agent, user_id;
"""

from google.api_core.exceptions import NotFound

def create_table_if_not_exists(client, full_table_id, ddl):
    """Return the table, creating it from `ddl` if it doesn't exist."""
    try:
        table = client.get_table(full_table_id)
    except NotFound:
        # Catch the specific API error instead of matching on the message text
        print(f"Table '{full_table_id}' not found. Creating...")
        query_job = client.query(ddl)
        query_job.result()  # Wait for completion
        print(f"Table '{full_table_id}' created successfully.")
        return client.get_table(full_table_id)

    print(f"Table '{full_table_id}' already exists.")
    print(f"  Created: {table.created}")
    print(f"  Rows: {table.num_rows}")
    print(f"  Size: {table.num_bytes / 1024 / 1024:.2f} MB")
    return table

table = create_table_if_not_exists(bq_client, FULL_TABLE_ID, CREATE_TABLE_DDL)

Table 'statmike-mlops-349915.applied_ml_concept_bq.agent_events' not found. Creating...
Table 'statmike-mlops-349915.applied_ml_concept_bq.agent_events' created successfully.


### Optional: Truncate Table (for demos)

**Warning:** This will delete all data in the table. Use only for demo resets.

In [4]:
# Set to True to truncate the table
TRUNCATE_TABLE = False

if TRUNCATE_TABLE:
    confirm = input(f"Are you sure you want to DELETE ALL DATA from '{FULL_TABLE_ID}'? Type 'yes' to confirm: ")
    if confirm.lower() == 'yes':
        truncate_query = f"TRUNCATE TABLE `{FULL_TABLE_ID}`"
        query_job = bq_client.query(truncate_query)
        query_job.result()
        print(f"Table '{FULL_TABLE_ID}' has been truncated.")
    else:
        print("Truncate cancelled.")
else:
    print("Skipping truncate (TRUNCATE_TABLE = False)")

Skipping truncate (TRUNCATE_TABLE = False)


## Preview Data

View recent events logged by agents.

In [6]:
preview_query = f"""
SELECT 
    timestamp,
    event_type,
    agent,
    session_id,
    status
FROM `{FULL_TABLE_ID}`
ORDER BY timestamp DESC
LIMIT 20
"""

df = bq_client.query(preview_query).to_dataframe()
print(f"Recent events ({len(df)} rows):")
df

Recent events (19 rows):


| | timestamp | event_type | agent | session_id | status |
|---|---|---|---|---|---|
| 0 | 2026-01-09 20:52:02.960056+00:00 | INVOCATION_COMPLETED | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 1 | 2026-01-09 20:52:02.959920+00:00 | AGENT_COMPLETED | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 2 | 2026-01-09 20:52:02.945318+00:00 | LLM_RESPONSE | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 3 | 2026-01-09 20:52:00.927035+00:00 | LLM_REQUEST | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 4 | 2026-01-09 20:52:00.894605+00:00 | TOOL_COMPLETED | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 5 | 2026-01-09 20:51:39.416494+00:00 | TOOL_STARTING | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 6 | 2026-01-09 20:51:39.401975+00:00 | LLM_RESPONSE | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 7 | 2026-01-09 20:51:37.321781+00:00 | LLM_REQUEST | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 8 | 2026-01-09 20:51:37.302030+00:00 | TOOL_COMPLETED | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |
| 9 | 2026-01-09 20:51:36.138299+00:00 | TOOL_STARTING | agent_convo_api | 719a8d55-52c1-466c-9c29-c71f1768551b | OK |


---

# Example Queries

The following queries are from the [ADK documentation](https://google.github.io/adk-docs/observability/bigquery-agent-analytics/) for analyzing agent behavior.

## Event Type Distribution

See what types of events are being logged and how many of each.

In [7]:
event_distribution_query = f"""
SELECT 
    event_type,
    COUNT(*) as event_count
FROM `{FULL_TABLE_ID}`
GROUP BY event_type
ORDER BY event_count DESC
"""

df = bq_client.query(event_distribution_query).to_dataframe()
df

| | event_type | event_count |
|---|---|---|
| 0 | LLM_REQUEST | 4 |
| 1 | LLM_RESPONSE | 4 |
| 2 | TOOL_STARTING | 3 |
| 3 | TOOL_COMPLETED | 3 |
| 4 | AGENT_STARTING | 1 |
| 5 | AGENT_COMPLETED | 1 |
| 6 | USER_MESSAGE_RECEIVED | 1 |
| 7 | INVOCATION_STARTING | 1 |
| 8 | INVOCATION_COMPLETED | 1 |
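
Most of the event types above look like start/finish pairs (for example `TOOL_STARTING` and `TOOL_COMPLETED`), so a mismatch within a pair can hint at an operation that began but never finished. A minimal sketch of that check in plain Python follows, using the counts from the output above. The pairing is an assumption inferred from the event names, and `unbalanced_pairs` is a hypothetical helper.

```python
# Assumed start/finish pairs, inferred from the event names in the output above
EVENT_PAIRS = [
    ("INVOCATION_STARTING", "INVOCATION_COMPLETED"),
    ("AGENT_STARTING", "AGENT_COMPLETED"),
    ("LLM_REQUEST", "LLM_RESPONSE"),
    ("TOOL_STARTING", "TOOL_COMPLETED"),
]


def unbalanced_pairs(counts):
    """Return (start, finish, start_count - finish_count) for each mismatched pair."""
    mismatches = []
    for start, finish in EVENT_PAIRS:
        diff = counts.get(start, 0) - counts.get(finish, 0)
        if diff != 0:
            mismatches.append((start, finish, diff))
    return mismatches


# Counts copied from the event distribution output above
counts = {
    "LLM_REQUEST": 4, "LLM_RESPONSE": 4,
    "TOOL_STARTING": 3, "TOOL_COMPLETED": 3,
    "AGENT_STARTING": 1, "AGENT_COMPLETED": 1,
    "USER_MESSAGE_RECEIVED": 1,
    "INVOCATION_STARTING": 1, "INVOCATION_COMPLETED": 1,
}
print(unbalanced_pairs(counts))  # prints [] because every pair is balanced
```

In the notebook, `counts` could be built from the query result with `dict(zip(df["event_type"], df["event_count"]))`.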


## Agent Activity Comparison

Compare LLM response counts, session counts, and average LLM latency across agents.

In [8]:
agent_comparison_query = f"""
SELECT 
    agent,
    COUNT(*) as events,
    COUNT(DISTINCT session_id) as sessions,
    AVG(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as avg_latency_ms
FROM `{FULL_TABLE_ID}`
WHERE event_type = 'LLM_RESPONSE'
GROUP BY agent
ORDER BY events DESC
"""

df = bq_client.query(agent_comparison_query).to_dataframe()
df

| | agent | events | sessions | avg_latency_ms |
|---|---|---|---|---|
| 0 | agent_convo_api | 4 | 1 | 2071.5 |


## Token Usage Analysis

Analyze token consumption from LLM responses.

In [9]:
token_usage_query = f"""
SELECT
    agent,
    COUNT(*) as response_count,
    AVG(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as avg_tokens,
    SUM(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as total_tokens
FROM `{FULL_TABLE_ID}`
WHERE event_type = 'LLM_RESPONSE'
GROUP BY agent
ORDER BY total_tokens DESC
"""

df = bq_client.query(token_usage_query).to_dataframe()
df

| | agent | response_count | avg_tokens | total_tokens |
|---|---|---|---|---|
| 0 | agent_convo_api | 4 | 4343.25 | 17373 |


## Latency Analysis (LLM & Tools)

Compare latency across different operation types.

In [26]:
latency_query = f"""
SELECT
    event_type,
    COUNT(*) as count,
    AVG(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as avg_latency_ms,
    MIN(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as min_latency_ms,
    MAX(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as max_latency_ms
FROM `{FULL_TABLE_ID}`
WHERE event_type IN ('LLM_RESPONSE', 'TOOL_COMPLETED')
GROUP BY event_type
"""

df = bq_client.query(latency_query).to_dataframe()
df

| | event_type | count | avg_latency_ms | min_latency_ms | max_latency_ms |
|---|---|---|---|---|---|
| 0 | LLM_RESPONSE | 6 | 2202.666667 | 2018 | 2481 |
| 1 | TOOL_COMPLETED | 4 | 11246.5 | 1163 | 21478 |


## Tool Usage Statistics

See which tools are being used and their performance.

In [27]:
tool_usage_query = f"""
SELECT
    JSON_VALUE(content, '$.tool') as tool_name,
    COUNT(*) as call_count,
    AVG(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as avg_latency_ms
FROM `{FULL_TABLE_ID}`
WHERE event_type = 'TOOL_COMPLETED'
GROUP BY tool_name
ORDER BY call_count DESC
"""

df = bq_client.query(tool_usage_query).to_dataframe()
df

| | tool_name | call_count | avg_latency_ms |
|---|---|---|---|
| 0 | conversational_chat | 2 | 18130.0 |
| 1 | list_dataset_ids | 1 | 7563.0 |
| 2 | list_table_ids | 1 | 1163.0 |


## Error Analysis

Review any errors that occurred during agent execution.

In [28]:
error_query = f"""
SELECT
    timestamp,
    agent,
    event_type,
    error_message,
    session_id
FROM `{FULL_TABLE_ID}`
WHERE status = 'ERROR' OR error_message IS NOT NULL
ORDER BY timestamp DESC
LIMIT 20
"""

df = bq_client.query(error_query).to_dataframe()
if len(df) == 0:
    print("No errors found!")
else:
    print(f"Found {len(df)} error(s):")
df

No errors found!


| | timestamp | agent | event_type | error_message | session_id |
|---|---|---|---|---|---|


## Trace a Specific Conversation

Use `trace_id` (or `invocation_id` when `trace_id` is NULL) to follow a complete conversation flow.

In [29]:
# First, get available trace IDs and invocation IDs
# Note: trace_id may be NULL in some ADK versions; invocation_id is always populated
trace_list_query = f"""
SELECT
    trace_id,
    invocation_id,
    MIN(timestamp) as started,
    COUNT(*) as event_count
FROM `{FULL_TABLE_ID}`
GROUP BY trace_id, invocation_id
ORDER BY started DESC
LIMIT 10
"""

df = bq_client.query(trace_list_query).to_dataframe()
print("Recent traces/invocations:")
print("(Use trace_id if populated, otherwise use invocation_id)")
display(df)

Recent traces/invocations:
(Use trace_id if populated, otherwise use invocation_id)


| | trace_id | invocation_id | started | event_count |
|---|---|---|---|---|
| 0 | e-4999e6e4-1114-464d-962d-8d5f3002133b | e-4999e6e4-1114-464d-962d-8d5f3002133b | 2026-01-09 20:56:45.392756+00:00 | 9 |
| 1 | | e-4999e6e4-1114-464d-962d-8d5f3002133b | 2026-01-09 20:56:45.378802+00:00 | 2 |
| 2 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | 2026-01-09 20:51:24.317327+00:00 | 17 |
| 3 | | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | 2026-01-09 20:51:24.283670+00:00 | 2 |


In [31]:
# Set a specific trace_id or invocation_id to analyze (copy from above)
# Use whichever is populated in your data
TRACE_ID = "e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474"  # e.g., "e-4999e6e4-1114-464d-962d-8d5f3002133b"
INVOCATION_ID = None  # Alternative: use invocation_id if trace_id is NULL

# Determine which ID to use
id_column = "trace_id" if TRACE_ID else "invocation_id" if INVOCATION_ID else None
id_value = TRACE_ID or INVOCATION_ID

if id_value:
    # First, verify the ID exists with a simple count
    verify_query = f"""
    SELECT COUNT(*) as count
    FROM `{FULL_TABLE_ID}`
    WHERE {id_column} = '{id_value}'
    """
    count_df = bq_client.query(verify_query).to_dataframe()
    print(f"Events matching {id_column}: {count_df['count'].iloc[0]}")
    
    # Now get the details
    trace_detail_query = f"""
    SELECT 
        timestamp, 
        event_type, 
        agent,
        trace_id,
        invocation_id,
        JSON_VALUE(content, '$.response') as response_summary,
        CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64) as latency_ms
    FROM `{FULL_TABLE_ID}`
    WHERE {id_column} = '{id_value}'
    ORDER BY timestamp ASC
    """
    df = bq_client.query(trace_detail_query).to_dataframe()
    print(f"\nTrace details ({len(df)} rows):")
    display(df)
else:
    print("Set TRACE_ID or INVOCATION_ID above to trace a specific conversation.")

Events matching trace_id: 17

Trace details (17 rows):


| | timestamp | event_type | agent | trace_id | invocation_id | response_summary | latency_ms |
|---|---|---|---|---|---|---|---|
| 0 | 2026-01-09 20:51:24.317327+00:00 | AGENT_STARTING | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
| 1 | 2026-01-09 20:51:24.325734+00:00 | LLM_REQUEST | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
| 2 | 2026-01-09 20:51:26.470579+00:00 | LLM_RESPONSE | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | call: list_dataset_ids | 2144.0 |
| 3 | 2026-01-09 20:51:26.488168+00:00 | TOOL_STARTING | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
| 4 | 2026-01-09 20:51:34.051275+00:00 | TOOL_COMPLETED | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | 7563.0 |
| 5 | 2026-01-09 20:51:34.076587+00:00 | LLM_REQUEST | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
| 6 | 2026-01-09 20:51:36.121099+00:00 | LLM_RESPONSE | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | call: list_table_ids | 2044.0 |
| 7 | 2026-01-09 20:51:36.138299+00:00 | TOOL_STARTING | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
| 8 | 2026-01-09 20:51:37.302030+00:00 | TOOL_COMPLETED | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | 1163.0 |
| 9 | 2026-01-09 20:51:37.321781+00:00 | LLM_REQUEST | agent_convo_api | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | e-a82ada8b-760a-4f8e-b3eb-1f8e98ce0474 | | |
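
The trace cell interpolates `id_value` directly into the SQL text, which breaks if the value contains a quote character. BigQuery query parameters avoid this for values. Column names cannot be parameterized, so the sketch below checks the column against an allow-list instead. `trace_events_sql` and `run_trace_query` are hypothetical helper names, and the `google.cloud.bigquery` import is deferred into the function so the SQL builder can be exercised without the client library.

```python
ALLOWED_ID_COLUMNS = {"trace_id", "invocation_id"}


def trace_events_sql(full_table_id, id_column):
    """Build the trace query with a named parameter for the ID value."""
    if id_column not in ALLOWED_ID_COLUMNS:
        raise ValueError(f"Unsupported ID column: {id_column!r}")
    return f"""
SELECT timestamp, event_type, agent, trace_id, invocation_id
FROM `{full_table_id}`
WHERE {id_column} = @id_value
ORDER BY timestamp ASC
"""


def run_trace_query(client, full_table_id, id_column, id_value):
    """Run the query, passing id_value as a STRING query parameter."""
    from google.cloud import bigquery  # deferred: only needed when querying

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("id_value", "STRING", id_value)
        ]
    )
    sql = trace_events_sql(full_table_id, id_column)
    return client.query(sql, job_config=job_config).to_dataframe()
```

With the notebook's variables, the call would look like `run_trace_query(bq_client, FULL_TABLE_ID, id_column, id_value)`.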


## Span Hierarchy & Duration Analysis

Analyze the call hierarchy using span IDs.

In [32]:
if id_value:
    span_hierarchy_query = f"""
    SELECT
        span_id,
        parent_span_id,
        event_type,
        timestamp,
        CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64) as duration_ms,
        COALESCE(
            JSON_VALUE(content, '$.tool'), 
            'LLM_CALL'
        ) as operation
    FROM `{FULL_TABLE_ID}`
    WHERE {id_column} = '{id_value}'
      AND event_type IN ('LLM_RESPONSE', 'TOOL_COMPLETED')
    ORDER BY timestamp ASC
    """
    df = bq_client.query(span_hierarchy_query).to_dataframe()
    print(f"Span hierarchy ({len(df)} rows):")
    display(df)
else:
    print("Set TRACE_ID or INVOCATION_ID above to analyze span hierarchy.")

Span hierarchy (7 rows):


| | span_id | parent_span_id | event_type | timestamp | duration_ms | operation |
|---|---|---|---|---|---|---|
| 0 | a6839f96-26ac-4013-8cdc-6e3834155492 | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | LLM_RESPONSE | 2026-01-09 20:51:26.470579+00:00 | 2144 | LLM_CALL |
| 1 | e9216e9b-ffeb-4f11-add4-46bc67c2fcaa | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | TOOL_COMPLETED | 2026-01-09 20:51:34.051275+00:00 | 7563 | list_dataset_ids |
| 2 | 6a237f88-6e60-4f2e-947f-68be83bea748 | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | LLM_RESPONSE | 2026-01-09 20:51:36.121099+00:00 | 2044 | LLM_CALL |
| 3 | 4b7b2272-8c47-47fe-be8c-67c820d5d01e | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | TOOL_COMPLETED | 2026-01-09 20:51:37.302030+00:00 | 1163 | list_table_ids |
| 4 | 68fa7f84-87d5-4bbb-9f73-ac103198d9d3 | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | LLM_RESPONSE | 2026-01-09 20:51:39.401975+00:00 | 2080 | LLM_CALL |
| 5 | a42c3e4a-863b-4466-9d5d-93389cbc83d2 | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | TOOL_COMPLETED | 2026-01-09 20:52:00.894605+00:00 | 21478 | conversational_chat |
| 6 | b0314379-3fb0-4bf2-ae41-45b7464309ce | c2a31c85-a7b7-49f6-ba86-9184f9386be5 | LLM_RESPONSE | 2026-01-09 20:52:02.945318+00:00 | 2018 | LLM_CALL |
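
The query returns a flat list of spans. Grouping rows by `parent_span_id` recovers the hierarchy. In the output above every span shares the same parent, so the tree has one level. A minimal sketch in plain Python follows, assuming rows are dicts with `span_id` and `parent_span_id` keys (for example from `df.to_dict("records")`). The helper names are hypothetical and the short IDs are placeholders, not real span IDs.

```python
from collections import defaultdict


def children_by_parent(rows):
    """Map each parent_span_id to the list of its child span_ids, in row order."""
    tree = defaultdict(list)
    for row in rows:
        tree[row["parent_span_id"]].append(row["span_id"])
    return dict(tree)


def render(tree, span_id, depth=0):
    """Return indented lines for span_id and all of its descendants."""
    lines = ["  " * depth + span_id]
    for child in tree.get(span_id, []):
        lines.extend(render(tree, child, depth + 1))
    return lines


# Placeholder rows shaped like the query output above
rows = [
    {"span_id": "llm-1", "parent_span_id": "agent"},
    {"span_id": "tool-1", "parent_span_id": "agent"},
    {"span_id": "llm-2", "parent_span_id": "agent"},
]
tree = children_by_parent(rows)
print("\n".join(render(tree, "agent")))
```

Because the query orders by `timestamp`, children appear under each parent in chronological order.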


## Query Multimodal Content (GCS References)

Find events with images or other multimodal content offloaded to GCS.

In [33]:
multimodal_query = f"""
SELECT
    timestamp,
    event_type,
    part.mime_type,
    part.storage_mode,
    part.object_ref.uri as gcs_uri
FROM `{FULL_TABLE_ID}`,
UNNEST(content_parts) as part
WHERE part.storage_mode = 'GCS_REFERENCE'
ORDER BY timestamp DESC
LIMIT 20
"""

df = bq_client.query(multimodal_query).to_dataframe()
if len(df) == 0:
    print("No multimodal content found in GCS.")
else:
    print(f"Found {len(df)} GCS references:")
df

No multimodal content found in GCS.


| | timestamp | event_type | mime_type | storage_mode | gcs_uri |
|---|---|---|---|---|---|
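
When the query does return rows, each `gcs_uri` can be split into a bucket and an object name before the object is fetched. A minimal sketch using only the standard library follows, assuming the URIs use the usual `gs://bucket/object` form. `split_gcs_uri` is a hypothetical helper, and the URI below is a made-up placeholder.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    # The path keeps its leading slash; object names do not start with one
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder URI for illustration only
bucket, name = split_gcs_uri("gs://example-bucket/agent/parts/image-0.png")
print(bucket, name)
```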


## Daily Activity Summary

View agent activity trends over time.

In [34]:
daily_summary_query = f"""
SELECT
    DATE(timestamp) as date,
    agent,
    COUNT(*) as total_events,
    COUNT(DISTINCT session_id) as unique_sessions,
    COUNTIF(event_type = 'LLM_RESPONSE') as llm_responses,
    COUNTIF(event_type = 'TOOL_COMPLETED') as tool_calls
FROM `{FULL_TABLE_ID}`
GROUP BY date, agent
ORDER BY date DESC, total_events DESC
LIMIT 30
"""

df = bq_client.query(daily_summary_query).to_dataframe()
df

| | date | agent | total_events | unique_sessions | llm_responses | tool_calls |
|---|---|---|---|---|---|---|
| 0 | 2026-01-09 | agent_convo_api | 30 | 1 | 6 | 4 |
