Refactor BigQuery Analytics Plugin to use Structured JSON
Problem Description
The current implementation of the BigQueryAgentAnalyticsPlugin stores complex event payloads (such as LLM requests, responses, and tool calls) as pipe-delimited, concatenated strings (e.g., Model: ... | Prompt: ...).
This approach has several significant drawbacks for production analytics:
- Query Difficulty: Downstream analysis requires complex regex patterns to extract specific fields like `token_usage`, `model_parameters`, or `tool_arguments`.
- Data Integrity: The current truncation logic cuts the string at a hard character limit (default 500) regardless of where the cut lands, which often loses data from large prompts or documents and breaks the formatting.
- Inflexibility: It forces a "log file" mentality onto a data warehouse, underutilizing BigQuery's native capabilities.
Proposed Solution
Refactor the plugin to leverage BigQuery's native JSON data type for the content column.
Key Changes:
- Schema Update: Change the `content` column definition from `STRING` to `JSON`.
- Structured Logging: Instead of formatting strings, callbacks should construct rich dictionaries (e.g., nesting `usage_metadata` under a `usage` key).
- Smart Truncation: Implement a `_recursive_smart_truncate` utility. Instead of chopping the entire row, it should traverse the JSON object and truncate individual string values that exceed a limit. This ensures the logged data is always valid JSON, even if specific text fields are shortened.
- Payload Mutation: Update the `content_formatter` configuration to accept a `dict` and return a `dict`. This allows users to programmatically redact PII or prune fields (like massive system instructions) before serialization.
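To make the last three changes concrete, here is a minimal sketch of how they might fit together. It is not the plugin's actual code: `recursive_smart_truncate` stands in for the proposed `_recursive_smart_truncate`, and `drop_system_instruction`, `serialize_content`, the `TRUNCATION_SUFFIX` marker, and every payload key except `usage` are hypothetical names chosen for illustration.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical marker appended to shortened strings (not specified in this issue).
TRUNCATION_SUFFIX = "...[truncated]"


def recursive_smart_truncate(value: Any, max_len: int = 500) -> Any:
    """Return a copy of `value` in which every over-long string is shortened.

    Dicts and lists are traversed recursively; other types pass through
    unchanged, so the overall structure stays intact.
    """
    if isinstance(value, str):
        if len(value) > max_len:
            return value[:max_len] + TRUNCATION_SUFFIX
        return value
    if isinstance(value, dict):
        return {k: recursive_smart_truncate(v, max_len) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [recursive_smart_truncate(v, max_len) for v in value]
    return value


def drop_system_instruction(payload: dict) -> dict:
    """Example dict-in, dict-out formatter that prunes one large field."""
    pruned = dict(payload)  # shallow copy so the caller's dict is not mutated
    pruned.pop("system_instruction", None)
    return pruned


def serialize_content(
    payload: dict,
    content_formatter: Optional[Callable[[dict], dict]] = None,
    max_len: int = 500,
) -> str:
    """Apply the optional formatter, truncate string leaves, then dump to JSON."""
    if content_formatter is not None:
        payload = content_formatter(payload)
    return json.dumps(recursive_smart_truncate(payload, max_len))


# Hypothetical event payload; only the nested "usage" key comes from this issue.
event = {
    "model": "example-model",
    "system_instruction": "x" * 10_000,
    "prompt": "p" * 1200,
    "usage": {"prompt_tokens": 300, "total_tokens": 420},
}

row_content = serialize_content(event, drop_system_instruction, max_len=500)
parsed = json.loads(row_content)  # still parses: only string leaves were cut
```

Running the formatter before truncation means pruned fields never count against the limit, and truncating before `json.dumps` is what keeps the stored value well-formed. A real implementation would also need to decide how to handle values that `json.dumps` cannot serialize.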
Benefits
- Native SQL Querying: Users can access fields directly (e.g., `JSON_VALUE(content, '$.usage.total_tokens')`).
- Safety: Eliminates the risk of logging invalid/broken data structures due to truncation.
- Observability: Provides a much cleaner, hierarchical view of the agent's lifecycle in BigQuery.
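As an illustration of the querying benefit, the sketch below builds a query that sums token usage from the `content` column. The helper name `build_total_tokens_query` and the table ID are hypothetical, and the `$.usage.total_tokens` path assumes the payload layout proposed above. `JSON_VALUE` returns a `STRING`, so the value is cast before aggregation.

```python
def build_total_tokens_query(table_id: str) -> str:
    """Build a BigQuery SQL query that sums `usage.total_tokens` over all rows.

    `table_id` is interpolated directly, so it must come from trusted
    configuration, never from user input.
    """
    return f"""
SELECT
  SUM(SAFE_CAST(JSON_VALUE(content, '$.usage.total_tokens') AS INT64)) AS total_tokens
FROM `{table_id}`
WHERE JSON_VALUE(content, '$.usage.total_tokens') IS NOT NULL
""".strip()


# Hypothetical table ID; substitute the table the plugin writes to.
query = build_total_tokens_query("my-project.my_dataset.agent_events")
print(query)
```

The resulting string could then be passed to the `google-cloud-bigquery` client (for example `bigquery.Client().query(query)`), which is left out here so the sketch runs without credentials.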
⚠️ Breaking Change Note
This change requires an update to the BigQuery table schema. Existing tables using the STRING content type will need to be migrated or replaced.