feat(bqjdbc): extend OpenTelemetry instrumentation for metadata and pagination#12918
Conversation
There was a problem hiding this comment.
Code Review
This pull request integrates OpenTelemetry tracing into the BigQuery JDBC driver, focusing on ResultSet iteration and DatabaseMetaData operations. Key changes include capturing span contexts during result set initialization and propagating them during next() calls, as well as adding tracing to metadata methods like getTables, getColumns, and getSchemas. Review feedback identifies significant performance overhead from wrapping high-frequency next() methods in telemetry scopes and points out that spans for asynchronous metadata operations end prematurely, leading to inaccurate duration metrics and fragmented traces.
…er the blocking operation `buffer.take()`
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request integrates OpenTelemetry tracing across the BigQuery JDBC driver, specifically targeting metadata operations and result set processing. Key changes include wrapping blocking operations in appropriate tracing scopes, propagating span contexts to background threads, and standardizing tracer acquisition via a new utility method. Feedback focuses on refining the tracing implementation, such as correctly handling background span parents to avoid timeline anomalies, ensuring span statuses are updated on errors, and replacing redundant null checks on SpanContext with validity checks.
| try { | ||
| // Advance the cursor,Potentially blocking operation | ||
| this.cursor = this.buffer.take(); | ||
| try (Scope scope = Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()) { |
There was a problem hiding this comment.
Same comment from Gemini about this as well:
Similar to the Arrow implementation, wrapping the entire next() method in a Scope creates substantial performance overhead for every row processed. Since next() is a hot path, the cost of managing the OpenTelemetry context for every iteration can lead to a noticeable regression in throughput for large result sets. It is recommended to limit the scope of context propagation to the parts of the method that actually perform work requiring context, such as the blocking buffer.take() call. Ensure the scope is closed in an exception-safe manner to prevent leaks.
There was a problem hiding this comment.
Can expand to the try block (162-180), there is not much else happening either way
There was a problem hiding this comment.
Actually in this version, this.cursor = this.buffer.take() happens on every call to next(), so it'd already create new span each time. Should we check performance & see if it is needed? All we do here is taking an item from the queue.. So maybe no real tracing in next function needed. (in arrow tracing makes sense because we're reading next block in the same thread.. So maybe we can squeeze more from Arrow by reading in the background thread)
There was a problem hiding this comment.
Yeah, I guess you are right. Since this.buffer.take() is called for every row in and it doesn't trigger any new spans or heavy processing (unlike Arrow where we deserialize a whole batch), maintaining the scope here only adds overhead without any real benefit.
I will remove it for now, and create a task to investigate performance with and without to finalise it later
|
Please use |
b/491245568
Key Changes
Core Instrumentation Logic
BigQueryDatabaseMetaData.java(getCatalogs,getSchemas,getTables,getColumns) to capture underlying API calls.fetchNextPagesinBigQueryStatement.javaand linked background pagination spans back to it, avoiding timeline anomalies.SpanContextinBigQueryBaseResultSet.javaat creation time and made it current duringnext()inBigQueryJsonResultSet.javaandBigQueryArrowResultSet.javato survive thread hops.getSafeTracertoBigQueryJdbcOpenTelemetry.javaas a static utility to ensure consistent fallback behavior across the driver.populateArrowBufferedQueueinBigQueryStatement.javato its own private methodprocessArrowStreamto improve readability and maintainability.