Consider OTLP export failures handleable errors #1565

Merged: 13 commits into open-telemetry:main on Jan 22, 2024

Conversation

@plantfansam (Contributor) commented on Jan 16, 2024:

This PR attempts to address #1160.

The general idea is that:

  • we can consider OTLP exporter failures to be errors worth handling once retries have been exhausted
  • we should let the handle_error method decide whether these particular failures are important
  • the URI is useful information to include in these failure messages

Not everyone has a metrics reporter installed, so some log-level visibility might be desirable.

Alternatively, we could OpenTelemetry::Logger.warn (or .debug or .info, whatever) in these scenarios. I'm happy with that outcome too. Or whatever other great ideas folks may have 😄.
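To make that concrete, here is a minimal sketch of what "let handle_error decide" can look like from the application side. This is not the merged implementation: `MyMetrics` is a hypothetical stand-in, and the handler shape assumes the Ruby SDK's keyword-based error handler (`OpenTelemetry.error_handler` / `OpenTelemetry.handle_error`).

```ruby
# Minimal sketch, not the merged implementation. Because the exporter
# reports through OpenTelemetry.handle_error, an application can install
# its own handler and decide how important these failures are.
OpenTelemetry.error_handler = lambda do |exception: nil, message: nil|
  # MyMetrics is hypothetical; count exporter failures however you like
  MyMetrics.increment('otlp_export_failures') if message&.include?('OTLP exporter')
  OpenTelemetry.logger.warn("OpenTelemetry error: #{message} #{exception&.message}".strip)
end
```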

```diff
@@ -61,6 +61,7 @@ def initialize(endpoint: nil,
        @http = http_connection(@uri, ssl_verify_mode, certificate_file)

        @path = @uri.path
+       @uri_string = @uri.to_s
```
Contributor commented:

Doesn't feel worthwhile caching this just for use in the error message.
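Put differently, a sketch of the suggested alternative (the call shape is illustrative): build the string at failure time instead, since string interpolation calls URI#to_s implicitly.

```ruby
# Sketch: no cached @uri_string ivar; interpolate the URI only when the
# failure message is actually constructed.
OpenTelemetry.handle_error(message: "OTLP exporter failure for uri: '#{@uri}'")
```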


Comment on lines 165 to 170:

```ruby
          log_request_failure(response.code)
          FAILURE
        when Net::HTTPRequestTimeOut, Net::HTTPGatewayTimeOut, Net::HTTPBadGateway
          response.body # Read and discard body
          redo if backoff?(retry_count: retry_count += 1, reason: response.code)
          log_request_failure(response.code)
```
Contributor commented:
These two calls will double count if backoff? returns false, because backoff? also does @metrics_reporter.add_to_counter('otel.otlp_exporter.failure', ...).
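One way to resolve it, sketched under the assumption that backoff? keeps owning the counter (the actual fix in the PR may differ): make log_request_failure purely about the error handler and leave metrics to backoff?, so a terminal failure is counted exactly once.

```ruby
# Sketch only. log_request_failure reports through the error handler and
# deliberately does not touch @metrics_reporter, because backoff? already
# increments 'otel.otlp_exporter.failure' before returning false.
def log_request_failure(response_code)
  OpenTelemetry.handle_error(
    message: "OTLP exporter received http.code=#{response_code} for uri: '#{@uri_string}'"
  )
end
```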

@plantfansam (Contributor, Author) commented:

Nice catch 😅 - will 🤔 and 💻


@kaylareopelle (Contributor) left a comment:

I'm so glad you submitted this PR!

I've manually added logging like this around these errors so many times, and it'll be great to have that baked in. 🍰

@fbogsany merged commit d08bc1a into open-telemetry:main on Jan 22, 2024
55 checks passed
@muripic commented on Jan 25, 2024:

Hey! 👋
After this change we're getting a lot of error logs due to retries that eventually succeed (specifically, due to this line).

> Alternatively, we could OpenTelemetry::Logger.warn (or .debug or .info, whatever) in these scenarios. I'm happy with that outcome too. Or whatever other great ideas folks may have 😄.

Going back a bit to what you said there, we appreciate the visibility 💯 but at least for us it would make more sense if these failures coming from retries were logged at debug, info, or warning level, since the request is retried anyway (a different situation would be if it's still failing after all retry attempts). What do you think? Are you all still open to discussing it? 😄

@fbogsany (Contributor) commented on Feb 1, 2024:

😵 yeah, that's bad. We shouldn't call log_request_failure in backoff?, both because we're going to retry (maybe) and because there are many non-HTTP-response-code reasons, so the message can be incorrect.
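The direction that suggests, as a simplified sketch (constant name, label shape, and backoff math are assumptions here; #1589 is the authoritative fix): backoff? decides whether to retry and records metrics, but never logs, so a transient failure that succeeds on a later attempt stays quiet and the caller logs only once the retry budget is spent.

```ruby
MAX_RETRIES = 5 # hypothetical cap; the exporter's real constant may differ

# Sketch only: no log_request_failure call in here. The export loop calls
# log_request_failure after backoff? returns false, i.e. exactly once per
# exhausted request, never per transient attempt.
def backoff?(retry_count:, reason:)
  @metrics_reporter.add_to_counter('otel.otlp_exporter.failure', labels: { 'reason' => reason })
  return false if retry_count > MAX_RETRIES # caller logs the terminal failure

  sleep(rand(2**retry_count)) # simplified exponential backoff with jitter
  true
end
```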

@plantfansam (Contributor, Author) commented:
Sorry for the trouble @muripic, we will address this.

@muripic commented on Feb 2, 2024:

Thank you! ❤️

@plantfansam (Contributor, Author) commented:

> Thank you! ❤️

Of course! Should be addressed by #1589
