Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLP traces export errors use a consistent error message prefix #3516

Merged
merged 12 commits into from
Jan 27, 2023

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Dec 5, 2022

When a user configures both a tracing SDK and metrics SDK the, existing error-handling and logging mechanism do not indicate whether it is a tracing or a metrics failure. For SDKs and vendors that route metrics and traces to different services, it is important to indicate which of the two paths is experiencing problems.

Part of #3513.

@codecov
Copy link

codecov bot commented Dec 5, 2022

Codecov Report

Merging #3516 (8be26f1) into main (af3db6e) will increase coverage by 0.2%.
The diff coverage is 85.7%.

❗ Current head 8be26f1 differs from pull request most recent head 91e7554. Consider uploading reports for the commit 91e7554 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3516     +/-   ##
=======================================
+ Coverage   79.4%   79.7%   +0.2%     
=======================================
  Files        170     171      +1     
  Lines      12654   12673     +19     
=======================================
+ Hits       10056   10101     +45     
+ Misses      2388    2359     -29     
- Partials     210     213      +3     
Impacted Files Coverage Δ
exporters/otlp/otlptrace/exporter.go 66.6% <80.0%> (+66.6%) ⬆️
exporters/otlp/internal/wrappederror.go 86.6% <86.6%> (ø)
exporters/otlp/otlptrace/otlptracehttp/client.go 77.5% <100.0%> (ø)
sdk/trace/batch_span_processor.go 80.2% <0.0%> (-1.8%) ⬇️

@MrAlias
Copy link
Contributor

MrAlias commented Dec 8, 2022

Please fix the lint issues and add a changelog entry.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of code to add to ensure the two client error messages have a prefix. Why not just add the prefix to the client response?

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think adding information to the the error message is helpful, and worth the extra branch.

@jmacd
Copy link
Contributor Author

jmacd commented Dec 16, 2022

This seems like a lot of code to add to ensure the two client error messages have a prefix.

This point is well taken. This PR and #3515 can be done better and I will give it a try. My biggest concern among these three is actually #3514, which conceals the gRPC or HTTP error message when the final error was retryable. I'll close this and #3515 for now and reopen when it's in better shape.

Why not just add the prefix to the client response?

There are cases where the retry mechanism returns a local error that would not be prefixed, that's what led me to this in the first place. When you get the full gRPC error message you're likely to have enough context to determine whether it's a tracing SDK problem or a metrics SDK problem.

@jmacd
Copy link
Contributor Author

jmacd commented Dec 16, 2022

I changed this PR to use a wrapped error and restored the old tests w/ a call to errors.Unwrap(), since if you want to use os.IsTimeout() or the grpc/status package you need to have the exact error. This makes it a slight change for callers, but I've updated all the tests and the changelog to indicate that errors.Unwrap is needed.

@jmacd jmacd reopened this Dec 16, 2022
@jmacd
Copy link
Contributor Author

jmacd commented Dec 16, 2022

If the approach taken here is acceptable, I would restore #3515 with the same (using the new internal wrappedExportError type from this PR) for metrics. Thanks.

CHANGELOG.md Outdated Show resolved Hide resolved
@MrAlias MrAlias added this to the Release v1.12.0 milestone Jan 27, 2023
@MrAlias MrAlias merged commit ec13377 into open-telemetry:main Jan 27, 2023
@MrAlias MrAlias mentioned this pull request Jan 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants