Problem
The BigQuery Agent Analytics plugin doc is technically rich, but it is difficult to scan and use because setup guidance, deployment instructions, configuration reference, schema details, event payload examples, GCS offloading, query recipes, security, and operational notes are all in one long flow.
After PR #1689, the code-doc parity gaps are mostly closed. The remaining issue is information architecture and readability:
- Readers cannot quickly answer "what events are captured?" without reading several subsections and JSON examples.
- Readers cannot quickly compare configuration options because they are currently a long bullet list.
- The table of contents mixes quickstart, Agent Runtime deployment, reference material, analysis examples, security, and multiprocessing guidance at the same level.
- GCS offloading examples are nested under "Event types and payloads", even though offloading is a storage/configuration behavior rather than an event type.
- The first local-use example is long and includes plugin setup, BigQuery tool setup, Vertex/Gemini environment setup, OpenTelemetry setup, and GCS offloading all at once. This makes the minimal path harder to find.
Suggested Restructure
Proposed top-level flow:
- Overview
- Quickstart: log events to BigQuery
- Prerequisites and IAM
- Configuration
- What the plugin captures
- BigQuery schema and generated views
- Storage behavior: truncation, multimodal content, and GCS offloading
- Query recipes
- Deploy to Agent Runtime
- Security and redaction
- Operations: tracing, flushing, troubleshooting, multiprocessing
- SDK / dashboards / additional resources
The goal is to make the page answer three common reader questions quickly:
- "How do I turn it on?"
- "What data will it write?"
- "How do I query or configure it?"
Specific Suggestions
1. Add a "Captured events" table near the top
Add a concise table before the JSON payload examples. This would make the event surface discoverable without forcing readers through every example.
Suggested columns:
| Event type | Captured when | Key payload fields | View |
| --- | --- | --- | --- |
| `USER_MESSAGE_RECEIVED` | A user message enters the invocation | text summary / content parts | `v_user_message_received` |
| `INVOCATION_STARTING` | An invocation starts | common columns only | `v_invocation_starting` |
| `INVOCATION_COMPLETED` | An invocation completes | common columns only | `v_invocation_completed` |
| `AGENT_STARTING` | Agent execution starts | instruction summary | `v_agent_starting` |
| `AGENT_COMPLETED` | Agent execution completes | latency | `v_agent_completed` |
| `LLM_REQUEST` | A model request is sent | model, request content, config, tools | `v_llm_request` |
| `LLM_RESPONSE` | A model response is received | response, usage, cache metadata, latency | `v_llm_response` |
| `LLM_ERROR` | A model call fails | error message, latency | `v_llm_error` |
| `TOOL_STARTING` | A tool starts | tool name, args, origin | `v_tool_starting` |
| `TOOL_COMPLETED` | A tool succeeds | tool name, result, origin, latency | `v_tool_completed` |
| `TOOL_ERROR` | A tool fails | tool name, args, origin, error, latency | `v_tool_error` |
| `STATE_DELTA` | Session state changes | state delta | `v_state_delta` |
| `HITL_CREDENTIAL_REQUEST` | Credential request is emitted | synthetic tool name, args | `v_hitl_credential_request` |
| `HITL_CONFIRMATION_REQUEST` | Confirmation request is emitted | synthetic tool name, args | `v_hitl_confirmation_request` |
| `HITL_INPUT_REQUEST` | User input request is emitted | synthetic tool name, args | `v_hitl_input_request` |
| `HITL_*_COMPLETED` | HITL response is received | synthetic tool name, result | base table only |
| `A2A_INTERACTION` | Remote A2A interaction metadata is present | response, task ID, context ID, request/response metadata | `v_a2a_interaction` |
The existing detailed JSON examples can stay, but they should become a "Payload examples" subsection after the overview table.
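The View column in the table above follows a regular pattern: the lowercased event type behind a `v_` prefix, with `HITL_*_COMPLETED` events as the one exception. A minimal Python sketch of that convention, which could be used to generate or sanity-check the table, might look like the following. The helper name `view_name_for` is hypothetical and is not part of the plugin, and the naming rule (including how `view_prefix` is joined with an underscore) is inferred from the table rather than confirmed against the plugin source.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Pattern copied from the table: matching events have no generated view.
BASE_TABLE_ONLY_PATTERNS = ("HITL_*_COMPLETED",)


def view_name_for(event_type: str, view_prefix: str = "v") -> Optional[str]:
    """Return the assumed generated-view name, or None for base-table-only events.

    Hypothetical helper: the "<prefix>_<lowercased event type>" rule is an
    assumption read off the proposed table, not the plugin's documented API.
    """
    if any(fnmatchcase(event_type, pattern) for pattern in BASE_TABLE_ONLY_PATTERNS):
        return None
    return f"{view_prefix}_{event_type.lower()}"


if __name__ == "__main__":
    for name in ("LLM_REQUEST", "TOOL_ERROR", "A2A_INTERACTION"):
        print(name, "->", view_name_for(name))
```

A check like this could run in the docs build so that the table and the generated views do not drift apart.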
2. Convert configuration options from bullets into a table
The current configuration section is accurate but hard to compare. A table would be easier to scan.
Suggested columns:
| Option | Type | Default | Use when | Notes |
| --- | --- | --- | --- | --- |
| `project_id` | `str` | required | Select the Google Cloud project | Constructor parameter |
| `dataset_id` | `str` | required | Select the BigQuery dataset | Constructor parameter |
| `table_id` | `str` | `agent_events` | Use a custom table name | Constructor value overrides config value |
| `location` | `str` | `US` | Match the BigQuery dataset location | Constructor parameter |
| `credentials` | `Credentials \| None` | `None` | Use explicit service-account, impersonated, or cross-project credentials | Falls back to ADC |
| `enabled` | `bool` | `True` | Temporarily disable logging | Config option |
| `event_allowlist` | `list[str] \| None` | `None` | Log only selected event types | Config option |
| `event_denylist` | `list[str] \| None` | `None` | Skip sensitive/noisy events | Config option |
| `max_content_length` | `int` | `500 * 1024` | Control inline payload size | Config option |
| `gcs_bucket_name` | `str \| None` | `None` | Offload large text/media | Config option |
| `connection_id` | `str \| None` | `None` | Use BigQuery ObjectRef / object tables | Config option |
| `log_multi_modal_content` | `bool` | `True` | Capture `content_parts` details | Config option |
| `batch_size` | `int` | `1` | Tune write throughput vs latency | Config option |
| `batch_flush_interval` | `float` | `1.0` | Flush partial batches periodically | Config option |
| `shutdown_timeout` | `float` | `10.0` | Wait for final flush on shutdown | Config option |
| `queue_max_size` | `int` | `10000` | Bound in-memory event queue | Config option |
| `content_formatter` | callable | `None` | Apply custom masking/formatting | Config option |
| `retry_config` | `RetryConfig` | default retry config | Tune retry behavior | Include subfields in notes |
| `log_session_metadata` | `bool` | `True` | Add session metadata to attributes | Config option |
| `custom_tags` | `dict[str, Any]` | `{}` | Add static tags to every event | Config option |
| `auto_schema_upgrade` | `bool` | `True` | Add new columns automatically | Config option |
| `create_views` | `bool` | `True` | Create generated views | Config option |
| `view_prefix` | `str` | `v` | Avoid view-name collisions | Config option |
The existing code sample can stay below the table as "Example: common configuration patterns".
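To keep a table like this in sync with the code, one option is to render it from structured data instead of maintaining the Markdown by hand. The sketch below is only an illustration of that idea: `render_markdown_table` is a hypothetical helper, and the two rows are copied from the table proposed above, not read from the plugin.

```python
from typing import Sequence


def render_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render headers and rows as a GitHub-flavored Markdown table."""

    def cell(text: str) -> str:
        # A literal pipe would end the cell early, so escape it.
        return text.replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}: {row!r}")
        lines.append("| " + " | ".join(cell(c) for c in row) + " |")
    return "\n".join(lines)


HEADERS = ["Option", "Type", "Default", "Use when", "Notes"]
# Sample rows copied from the proposed table (not introspected from the plugin).
ROWS = [
    ["`table_id`", "`str`", "`agent_events`", "Use a custom table name",
     "Constructor value overrides config value"],
    ["`credentials`", "`Credentials | None`", "`None`",
     "Use explicit service-account, impersonated, or cross-project credentials",
     "Falls back to ADC"],
]

table = render_markdown_table(HEADERS, ROWS)
print(table)
```

The pipe escaping matters for union types such as `Credentials | None`, which would otherwise split the Type cell in two.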
3. Separate "schema" from "generated views"
Right now the schema table, production DDL, generated views, and event payload examples run together. Consider:
- "Base table schema" for `agent_events`
- "Generated views" for the 16 view table
- "Payload examples" for event-specific JSON
This helps users decide whether to query the base table or a view.
4. Move GCS offloading out of "Event types and payloads"
GCS offloading is currently under the event section, after A2A. It would be easier to find under a dedicated section such as:
`## Storage behavior: truncation, multimodal content, and GCS offloading`
That section could cover:
- When content is stored inline
- When content is truncated
- When content is offloaded to GCS
- How `content_parts`, `object_ref`, `gcs_bucket_name`, and `connection_id` relate
- Example query for signed URLs
5. Shorten the first "Use with agent" example
The first example currently tries to show plugin setup, GCS offloading, BigQuery tools, Gemini/Vertex environment variables, and OpenTelemetry. Consider making the first example minimal:
- imports
- `BigQueryAgentAnalyticsPlugin(project_id, dataset_id, location)`
- `App(..., plugins=[plugin])`
- simple query to verify rows
Then add optional snippets:
- Enable GCS offloading
- Add OpenTelemetry tracing
- Use explicit credentials
- Add BigQuery tools
This would reduce the startup cost for readers who only need logging.
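For the "simple query to verify rows" step, a minimal sketch could build a per-event-type row count against the default `agent_events` table. The helper name `build_verify_query` and the project and dataset values are hypothetical, and the `event_type` column name is an assumption about the base table schema, not something this issue confirms.

```python
import re

# Conservative check for illustration only; real BigQuery naming rules are broader.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_\-]+$")


def build_verify_query(project_id: str, dataset_id: str, table_id: str = "agent_events") -> str:
    """Build a SQL string that counts logged rows per event type.

    Assumes the base table has an `event_type` column.
    """
    for part in (project_id, dataset_id, table_id):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected characters in identifier: {part!r}")
    return (
        "SELECT event_type, COUNT(*) AS event_count\n"
        f"FROM `{project_id}.{dataset_id}.{table_id}`\n"
        "GROUP BY event_type\n"
        "ORDER BY event_count DESC"
    )


# Placeholder project and dataset names, chosen for illustration.
sql = build_verify_query("my-project", "agent_analytics")
print(sql)
```

The resulting string can be pasted into the BigQuery console, or run with the `google-cloud-bigquery` client via `bigquery.Client(project=...).query(sql).result()`.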
6. Move Agent Runtime deployment later
Deployment is important, but it currently appears before configuration, schema, and event capture. For a reader trying to understand the plugin, this interrupts the core narrative. Consider moving the full Agent Runtime section after the reference sections, with a short link near the quickstart:
Deploying to Agent Runtime? See Deploy to Agent Runtime.
7. Consolidate analysis examples by job-to-be-done
The advanced query section is useful. It would be easier to scan if grouped by task:
- Debug a run: trace by `trace_id`, span hierarchy, errors
- Monitor cost/performance: token usage, cache hit rate, latency
- Inspect tools and HITL: tool provenance, HITL interaction analysis
- Analyze multimodal content: object refs and BigQuery ML
- Use Gemini/BigQuery ML for root cause analysis
Also consider using generated views as the default in examples wherever a view exists, with base-table JSON extraction as a fallback.
8. Keep security and operations as explicit late sections
Security/redaction, flushing/shutdown, troubleshooting, and multiprocessing are operational concerns. They should remain easy to find, but not interrupt the reference flow.
Possible structure:
`## Security and redaction`
`## Operations and troubleshooting`
- flushing and shutdown
- Agent Runtime troubleshooting
- multiprocessing and fork safety
- debug logging
Why this helps
This page serves multiple audiences:
- First-time users who need a minimal setup path.
- Operators who need IAM, deployment, flushing, and troubleshooting.
- Analysts who need event/view/query references.
- Security reviewers who need redaction and sensitive-data behavior.
The current doc has the content for all of these readers, but the sections are interleaved. A clearer structure plus two reference tables would make the doc much easier to scan without removing the detailed material.