
Refactor BigQuery Analytics Plugin to use Structured JSON #3724

@amenegola


Problem Description

The current implementation of the BigQueryAgentAnalyticsPlugin stores complex event payloads (such as LLM requests, responses, and tool calls) as concatenated strings with pipe delimiters (e.g., Model: ... | Prompt: ...).

This approach has several significant drawbacks for production analytics:

  1. Query Difficulty: Downstream analysis requires complex regular expressions to extract specific fields like token_usage, model_parameters, or tool_arguments.

  2. Data Integrity: The current truncation logic cuts the string at a hard character limit (default 500) regardless of field boundaries. For large prompts or documents, this often loses data and leaves the formatting broken.

  3. Inflexibility: It forces a "log file" mentality onto a data warehouse, underutilizing BigQuery's native capabilities.
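
To make the first two drawbacks concrete, here is a minimal sketch in Python. The payload layout, the `Total tokens` field name, and the model name are hypothetical stand-ins for the pipe-delimited format described above, not the plugin's actual output.

```python
import re

# Hypothetical pipe-delimited payload in the style described above.
# Field names and the model name are illustrative assumptions.
prompt = "Summarize this document: " + "lorem ipsum " * 60
legacy = f"Model: example-model | Prompt: {prompt} | Total tokens: 1234"

MAX_LEN = 500  # the default hard character limit mentioned in the issue
truncated = legacy[:MAX_LEN]  # cuts wherever the limit happens to fall

TOKEN_RE = re.compile(r"Total tokens: (\d+)")


def extract_total_tokens(content: str):
    """Return the token count if the field survived, else None."""
    match = TOKEN_RE.search(content)
    return int(match.group(1)) if match else None


full_result = extract_total_tokens(legacy)
truncated_result = extract_total_tokens(truncated)
```

With a long prompt, the cut lands inside the prompt text, so the trailing token field is dropped entirely and the regex finds nothing in the truncated string.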

Proposed Solution

Refactor the plugin to leverage BigQuery's native JSON data type for the content column.

Key Changes:

  • Schema Update: Change the content column definition from STRING to JSON.

  • Structured Logging: Instead of formatting strings, callbacks should construct rich dictionaries (e.g., nesting usage_metadata under a usage key).

  • Smart Truncation: Implement a _recursive_smart_truncate utility. Instead of chopping the entire row, it should traverse the JSON object and truncate individual string values that exceed a limit. This ensures the logged data is always valid JSON, even if specific text fields are shortened.

  • Payload Mutation: Update the content_formatter configuration to accept a dict and return a dict. This allows users to programmatically redact PII or prune fields (like massive system instructions) before serialization.
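
The truncation and formatter changes could look roughly like the sketch below. It is one possible shape under stated assumptions, not the plugin's implementation: the function name `recursive_smart_truncate`, the truncation marker, the event keys, and the `redact_formatter` callback are all hypothetical.

```python
import json
from typing import Any


def recursive_smart_truncate(
    value: Any, max_len: int = 500, marker: str = "...[truncated]"
) -> Any:
    """Truncate over-long strings inside a nested structure.

    Dicts and lists are traversed; only string leaves are shortened,
    so the result still serializes to valid JSON.
    """
    if isinstance(value, str):
        return value[:max_len] + marker if len(value) > max_len else value
    if isinstance(value, dict):
        return {
            k: recursive_smart_truncate(v, max_len, marker)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [recursive_smart_truncate(v, max_len, marker) for v in value]
    return value  # numbers, booleans, None pass through unchanged


def redact_formatter(content: dict) -> dict:
    """Hypothetical dict-to-dict content_formatter: prune and redact."""
    pruned = {k: v for k, v in content.items() if k != "system_instruction"}
    if "user_email" in pruned:
        pruned["user_email"] = "[REDACTED]"
    return pruned


# Hypothetical event payload; key names are assumptions for illustration.
event = {
    "model": "example-model",
    "prompt": "x" * 50,
    "system_instruction": "y" * 1000,
    "user_email": "a@example.com",
    "usage": {"total_tokens": 1234},
    "tool_calls": [{"name": "search", "arguments": {"query": "z" * 30}}],
}

# Format first, then truncate, then serialize for the JSON column.
row_content = json.dumps(
    recursive_smart_truncate(redact_formatter(event), max_len=20)
)
```

Running the formatter before truncation means pruned fields never count against the limit, and the numeric `usage` values are left untouched because only strings are shortened.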

Benefits

  • Native SQL Querying: Users can access fields directly (e.g., JSON_VALUE(content, '$.usage.total_tokens')).

  • Safety: Eliminates the risk of logging invalid/broken data structures due to truncation.

  • Observability: Provides a much cleaner, hierarchical view of the agent's lifecycle in BigQuery.
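
As an illustration of the querying benefit, the sketch below builds a query that aggregates token usage per model. The table name, the `$.model` path, and the helper names are assumptions. The `client` argument is expected to be a `google.cloud.bigquery.Client`, which is not imported here so the snippet stays self-contained.

```python
from typing import Any


def build_token_usage_query(table: str) -> str:
    """Build a SQL query that sums total tokens per model.

    `table` is a fully qualified table name (hypothetical here);
    the JSON paths assume the nested layout proposed in this issue.
    """
    return f"""
        SELECT
          JSON_VALUE(content, '$.model') AS model,
          SUM(SAFE_CAST(JSON_VALUE(content, '$.usage.total_tokens') AS INT64))
            AS total_tokens
        FROM `{table}`
        GROUP BY model
    """


def fetch_token_usage(client: Any, table: str) -> list:
    """Run the query with a BigQuery client and return the result rows."""
    return list(client.query(build_token_usage_query(table)).result())


# Hypothetical table name, for illustration only.
query = build_token_usage_query("my-project.analytics.agent_events")
```

`JSON_VALUE` returns a STRING, so the sketch casts the token count before summing. No regular expressions are needed to reach the nested field.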

⚠️ Breaking Change Note

This change requires an update to the BigQuery table schema. Existing tables using the STRING content type will need to be migrated or replaced.

Metadata

Labels: core [Component] (This issue is related to the core interface and implementation)
