impl(o11y): introduce error attributes#12189

Open
diegomarquezp wants to merge 22 commits into main from
observability/tracing-attr-error-type-transfer

@diegomarquezp
Contributor

@diegomarquezp diegomarquezp commented Mar 24, 2026

Ports the error.type and status.message telemetry features from sdk-platform-java.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the tracing telemetry by introducing dedicated attributes for error types, exception types, and status messages. These additions provide a more detailed and standardized way to categorize and understand failures occurring within client-side operations, improving observability and debugging capabilities. The changes ensure that critical error information is consistently captured in OpenTelemetry spans, offering clearer insights into the root causes of issues.

Highlights

  • Enhanced Error Telemetry: Introduced new OpenTelemetry attributes (error.type, status.message, exception.type) to provide more granular details on client-side errors within tracing spans.
  • Standardized Error Type Extraction: Added a new utility class ErrorTypeUtil to consistently extract low-cardinality error types from Throwable objects, prioritizing google.rpc.ErrorInfo.reason, client-side network/operational errors, specific server codes, and language-specific exception names.
  • Span Attribute Population: Modified SpanTracer to automatically populate the new error telemetry attributes on attempt spans when operations fail, including a recursive search for the most relevant error message.
  • API Tracer Update: Extended the ApiTracer interface with a new default method requestSent(long requestSize) to allow for tracking the size of streaming requests.
  • Comprehensive Testing: Added extensive unit and integration tests to validate the correct extraction and recording of error types and messages across various client-side and server-side failure scenarios for both gRPC and HTTP/JSON transports.
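
The priority order described for error type extraction can be illustrated with a minimal sketch. This is not the ErrorTypeUtil from this PR: the class name, method name, and parameters are hypothetical, the ErrorInfo reason and server code are modeled as plain nullable strings to keep the example dependency-free, and every label other than CLIENT_TIMEOUT is an assumed placeholder.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical sketch of the priority order; not the ErrorTypeUtil from this PR. */
class ErrorTypeSketch {

  /**
   * Resolves a low-cardinality error type.
   *
   * @param t the failure observed by the client
   * @param errorInfoReason stand-in for google.rpc.ErrorInfo.reason; null when absent
   * @param serverCode stand-in for a server status code name; null when absent
   */
  static String resolve(Throwable t, String errorInfoReason, String serverCode) {
    // 1. A reason reported through ErrorInfo wins when present.
    if (errorInfoReason != null && !errorInfoReason.isEmpty()) {
      return errorInfoReason;
    }
    // 2. Client-side network/operational errors.
    if (t instanceof SocketTimeoutException) {
      return "CLIENT_TIMEOUT";
    }
    if (t instanceof ConnectException) {
      return "CLIENT_CONNECTION_ERROR"; // assumed label, for illustration only
    }
    // 3. A specific server code, if one was returned.
    if (serverCode != null && !serverCode.isEmpty()) {
      return serverCode;
    }
    // 4. Fallback: the language-specific exception name.
    return t.getClass().getSimpleName();
  }
}
```

Under these assumptions, a SocketTimeoutException with no ErrorInfo reason resolves to CLIENT_TIMEOUT even when a server code is available, because client-side errors rank above server codes in the order listed.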

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new ErrorTypeUtil class to provide a standardized way of classifying exceptions into specific error types (e.g., client timeout, connection error, authentication error) for OpenTelemetry tracing. It adds new observability attributes (error.type, exception.type, status.message) and integrates this error classification into the SpanTracer to enrich span data upon failed attempts. The changes also include comprehensive unit and integration tests to validate the new error type extraction and tracing functionality. Feedback from the review includes correcting incorrect copyright years, removing redundant semicolons and toString() overrides, and updating Javadoc for accuracy.

@diegomarquezp diegomarquezp changed the title from "feat: Refine tracing telemetry for client-side attributes" to "impl(o11y): introduce error attributes" Mar 24, 2026
@diegomarquezp diegomarquezp marked this pull request as ready for review March 26, 2026 18:48
@diegomarquezp diegomarquezp requested a review from a team as a code owner March 26, 2026 18:48
import javax.annotation.Nullable;
import javax.net.ssl.SSLHandshakeException;

public class ErrorTypeUtil {
Contributor Author

Ideally other libraries can rely on this logic

Contributor

For bigquery etc?

@diegomarquezp diegomarquezp marked this pull request as draft March 26, 2026 20:32
@diegomarquezp
Contributor Author

01:03:56:592 [ERROR] Failed to execute goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test (test-native) on project google-auth-library-credentials: Execution test-native of goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test failed: Test configuration file wasn't found. -> [Help 1]
01:03:56:593 [ERROR] Failed to execute goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test (test-native) on project api-common: Execution test-native of goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test failed: Test configuration file wasn't found. -> [Help 1]
01:03:56:593 [ERROR] Failed to execute goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test (test-native) on project google-auth-library-appengine: Execution test-native of goal org.graalvm.buildtools:native-maven-plugin:0.10.6:test failed: Test configuration file wasn't found. -> [Help 1]
01:03:56:593 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.5.2:test (default-test) on project google-auth-library-cab-token-generator: 

GraalVM failures seem unrelated

@diegomarquezp diegomarquezp marked this pull request as ready for review March 27, 2026 15:43
* The specific error type. Value will be google.rpc.ErrorInfo.reason, a specific Server Error
* Code, Client-Side Network/Operational Error (e.g., CLIENT_TIMEOUT) or internal fallback.
*/
public static final String ERROR_TYPE_ATTRIBUTE = "error.type";
Contributor

FYI, Wes is adding the same attribute in #12202. Depending on which PR gets merged in first, you may need to rebase this PR based on his changes.

private static boolean hasErrorClassInCauseChain(
    Throwable t, Set<Class<? extends Throwable>> errorClasses) {
  Throwable current = t;
  while (current != null) {
Contributor

While it is nice to check all possible causes, I'm not sure this is the correct approach. Do you have a concrete example where the cause is nested? It also raises further questions: for example, what should happen when a SocketTimeoutException is caused by a ConnectException?

Is this what other languages are doing? If not, I'm leaning towards only checking the top level.
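
To make the ambiguity concrete, here is a minimal sketch contrasting a top-level check with a cause-chain walk. The class and method names are hypothetical and are not taken from this PR; only standard java.lang.Throwable methods are used.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical sketch; not the code under review. */
class CauseChainSketch {

  /** Checks only the exception itself. */
  static boolean isTopLevel(Throwable t, Class<? extends Throwable> type) {
    return type.isInstance(t);
  }

  /** Walks getCause() links and returns true if any link matches. */
  static boolean isInCauseChain(Throwable t, Class<? extends Throwable> type) {
    for (Throwable current = t; current != null; current = current.getCause()) {
      if (type.isInstance(current)) {
        return true;
      }
    }
    return false;
  }

  /** Builds the ambiguous case from the question: a timeout caused by a connect failure. */
  static Throwable timeoutCausedByConnect() {
    SocketTimeoutException timeout = new SocketTimeoutException("read timed out");
    timeout.initCause(new ConnectException("connection refused"));
    return timeout;
  }
}
```

For the exception built by timeoutCausedByConnect(), the top-level check matches only SocketTimeoutException, while the cause-chain walk matches both SocketTimeoutException and ConnectException. A chain-based classifier therefore needs an explicit precedence rule between the two, whereas a top-level check has a single answer.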

Contributor Author

I'll double-check the wrapper exceptions and specify them directly here. The point about ambiguity is valid.

Contributor Author

I removed the recursive check for now and am confirming internally whether it is needed.

@diegomarquezp diegomarquezp added the kokoro:force-run label Mar 27, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Mar 27, 2026

private String extractErrorMessage(Throwable error) {
  Throwable cause = error;
  while (cause != null) {
Contributor

I don't think we need to get the error message recursively either.

public class ErrorTypeUtil {

public enum ErrorType {
Contributor

This enum can be package-private. I don't think other libraries would use it directly.

3 participants