Add section on retry config, new OTLP Troubleshooting page
jack-berg committed May 1, 2024
1 parent 733f1b6 commit 1aa714c
Showing 3 changed files with 61 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
---
title: New Relic OTLP Troubleshooting
tags:
- Integrations
- Open source telemetry integrations
- OpenTelemetry
- OTLP
- Troubleshoot
metaDescription: Troubleshoot common OTLP ingest errors
freshnessValidatedDate: never
---

New Relic has supported [native OTLP ingest](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/) for several years. While working through support cases, we've learned which issues users commonly face. Some problems are easy to identify and fix. Others are deviously tricky: the internet is unreliable, and many components (software, networking, hardware, etc.) are involved, under the control of various parties (customers, New Relic, and public networking infrastructure outside the control of either). With so much complexity and configuration, and so many failure points, it can be difficult to determine which component is at fault and how best to address it.

Filing and working through a support case can be time-consuming and at times frustrating for customers (and for New Relic!). We've put together this troubleshooting guide to help establish a shared understanding and to provide tools to self-diagnose and fix issues when possible.

First, please review the New Relic [OTLP configuration requirements / recommendations](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#configuration). It contains essential advice and context that anyone looking to use OTLP with New Relic should be aware of.

The [Issue Catalog](#issue-catalog) lists errors we've seen customers experience, with mitigation steps that often reference items from [OTLP configuration requirements / recommendations](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#configuration).

# Issue Catalog [#issue-catalog]

| OTLP Protocol | Type | Language / Ecosystem | Fingerprint | Known Resolution | Notes |
|---|---|---|---|---|---|
| HTTP | 401 - Unauthorized | Java | `io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. Server responded with HTTP status code 401.` | [Include API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Missing `api-key` header |
| HTTP | 401 - Unauthorized | Collector | `Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlphttp", "error": "Permanent error: error exporting items, request to https://otlp.nr-data.net/v1/traces responded with HTTP Status Code 401, Message=, Details=[]", "dropped_items": 4}` | [Include API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Missing `api-key` header |
| HTTP | 401 - Unauthorized | Go | `failed to upload metrics: failed to send metrics to https://otlp.nr-data.net/v1/metrics: 401 Unauthorized` | [Include API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Missing `api-key` header |
| HTTP | 403 - Forbidden | Java | `io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. Server responded with HTTP status code 403.` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
| HTTP | 403 - Forbidden | Collector | `Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlphttp", "error": "Permanent error: error exporting items, request to https://otlp.nr-data.net/v1/traces responded with HTTP Status Code 403, Message=, Details=[]", "dropped_items": 14}` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
| HTTP | 403 - Forbidden | Go | `traces export: failed to send to https://otlp.nr-data.net/v1/traces: 403 Forbidden` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
| HTTP | 403 - Forbidden | .NET | `Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{https://otlp.nr-data.net:4317/v1/traces}{System.Net.Http.HttpRequestException: Response status code does not indicate success: 403 (Forbidden).` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
| HTTP | Timeout | Java | `io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. The request could not be executed. Full error message: timeout` | [Tune batching / timeout](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Occurs after export times out. Check timeout settings and client network status.<br></br>If you've ruled out client-side network and configuration issues, open a support case. |
| HTTP | Timeout | Collector | `max elapsed time expired failed to make an HTTP request: Post \"https://otlp.nr-data.net/v1/traces\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)` | [Tune batching / timeout](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Typically occurs after retry attempts fail and export times out. Can be related to client network, client retry / timeout configuration, or New Relic network / servers.<br></br>If you've ruled out client-side network and configuration issues, open a support case. |
| HTTP | Timeout | Go | `failed to upload metrics: context deadline exceeded: retry-able request failure` | [Tune batching / timeout](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Occurs after export times out. Check timeout settings and client network status.<br></br>If you've ruled out client-side network and configuration issues, open a support case. |
| HTTP | Rate limit | Collector | `Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp", "error": "Throttle (29s), error: error exporting items, request to https://otlp.nr-data.net:443/v1/metrics responded with HTTP Status Code 429", "interval": "29s"}` | [Tune batching](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Rate limit exceeded.<br></br>Adjust batching configuration to reduce request rate. |
| gRPC | Code 2 - Unknown<br></br>Timeout | Java | `io.opentelemetry.exporter.internal.grpc.GrpcExporter - Failed to export spans. Server responded with gRPC status code 2. Error message: timeout` | [Tune batching / timeout](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Occurs after export times out. Check timeout settings and client network status.<br></br>If you've ruled out client-side network and configuration issues, open a support case. |
| gRPC | Code 2 - Unknown<br></br>HTTP 500 | Collector | `rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); malformed header: missing HTTP content-type` | | New Relic's networking vendor produced a non-retriable status code for a transient error.<br></br>If this happens repeatedly, open a support case. |
| gRPC | Code 2 - Unknown<br></br>HTTP 530 | Collector | `rpc error: code = Unknown desc = unexpected HTTP status code received from server: 530 (); transport: received unexpected content-type \"text/html; charset=UTF-8\"` | | New Relic's networking vendor produced a non-retriable status code for a transient error.<br></br>If this happens repeatedly, open a support case. |
| gRPC | Code 4 - DeadlineExceeded | Collector | `rpc error: code = DeadlineExceeded desc = context deadline exceeded` | [Tune batching / timeout](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Typically occurs after retry attempts fail and export times out. Can be related to client network, client retry / timeout configuration, or New Relic network / servers.<br></br>If you've ruled out client-side network and configuration issues, open a support case. |
| gRPC | Code 7 - Unauthenticated | Java | `io.opentelemetry.exporter.internal.grpc.GrpcExporter - Failed to export spans. Server responded with gRPC status code 7.` | [Include API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Missing `api-key` header |
| gRPC | Code 7 - Unauthenticated | .NET | `Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{https://otlp.nr-data.net:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unauthenticated", Detail="")` | [Include API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Missing `api-key` header |
| gRPC | Code 8 - ResourceExhausted | Collector | `rpc error: code = ResourceExhausted desc = Too many requests", "dropped_items": 1024` | [Tune batching](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#payload) | Rate limit exceeded.<br></br>Adjust batching configuration to reduce request rate. |
| gRPC | Code 13 - Internal | Java | `io.opentelemetry.exporter.internal.grpc.GrpcExporter - Failed to export spans. Server responded with gRPC status code 13.` | | Not enough information to diagnose. Could be New Relic networking vendor produced non-retriable status code for a transient error.<br></br>If this happens repeatedly, open a support case. |
| gRPC | Code 13 - Internal<br></br>HTTP 400 | Collector | `rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request)` | | New Relic networking vendor produced non-retriable status code for a transient error.<br></br>If this happens repeatedly, open a support case. |
| gRPC | Code 14 - Unavailable<br></br>Connection reset | Collector | `rpc error: code = Unavailable desc = error reading from server: read tcp 100.127.0.171:47470->162.247.241.110:4317: read: connection reset by peer` | [Tune retry](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#retry) | Should resolve with retry. Ensure the collector has sufficient resources to handle retry backpressure. |
| gRPC | Code 14 - Unavailable<br></br>HTTP 502 | Collector | `rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"` | [Tune retry](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#retry) | Should resolve with retry. Ensure the collector has sufficient resources to handle retry backpressure. |
| gRPC | Code 14 - Unavailable<br></br>HTTP 503 | Collector | `rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 503 (Service Unavailable)` | [Tune retry](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#retry) | Should resolve with retry. Ensure the collector has sufficient resources to handle retry backpressure. |
| gRPC | Code 16 - PermissionDenied | Java | `io.opentelemetry.exporter.internal.grpc.GrpcExporter - Failed to export spans. Server responded with gRPC status code 16.` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
| gRPC | Code 16 - PermissionDenied | .NET | `Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{https://otlp.nr-data.net:4317/}{Grpc.Core.RpcException: Status(StatusCode="PermissionDenied", Detail="")` | [Verify API Key](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp/#api-key) | Invalid `api-key` header |
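
As a rough illustration of how the catalog can be used for self-diagnosis, the sketch below maps a log line to the resolution listed in the table. It is a heuristic under stated assumptions, not a New Relic tool: `suggest_action`, `HTTP_ACTIONS`, and `GRPC_ACTIONS` are hypothetical names, and the matching only covers fingerprints that contain an HTTP status code (401, 403, 429), a gRPC status name, or a timeout message. Fingerprints that only carry a numeric gRPC code (such as the Java `GrpcExporter` messages) fall through to the default.

```python
import re

# Hypothetical lookup tables derived from the catalog rows above.
HTTP_ACTIONS = {
    401: "Include API key (missing api-key header)",
    403: "Verify API key (invalid api-key header)",
    429: "Tune batching (rate limit exceeded)",
}
GRPC_ACTIONS = {
    "Unauthenticated": "Include API key (missing api-key header)",
    "PermissionDenied": "Verify API key (invalid api-key header)",
    "ResourceExhausted": "Tune batching (rate limit exceeded)",
    "DeadlineExceeded": "Tune batching / timeout",
    "Unavailable": "Tune retry",
}
DEFAULT_ACTION = "No known resolution; open a support case if it persists"


def suggest_action(log_line: str) -> str:
    """Return the catalog resolution that best matches a log line."""
    # gRPC status names appear as `code = Unavailable` (collector)
    # or `StatusCode="Unauthenticated"` (.NET).
    grpc = re.search(r'(?:code = |StatusCode=")(\w+)', log_line)
    if grpc and grpc.group(1) in GRPC_ACTIONS:
        return GRPC_ACTIONS[grpc.group(1)]

    # HTTP status codes that the catalog gives a resolution for.
    http = re.search(r"\b(401|403|429)\b", log_line)
    if http:
        return HTTP_ACTIONS[int(http.group(1))]

    # Timeout messages carry no status code worth matching on.
    lowered = log_line.lower()
    if "timeout" in lowered or "deadline exceeded" in lowered:
        return "Tune batching / timeout"

    return DEFAULT_ACTION
```

For example, passing the Go fingerprint `traces export: failed to send to https://otlp.nr-data.net/v1/traces: 403 Forbidden` returns the "Verify API key" resolution, while a gRPC `Unknown` error with HTTP 500 returns the default.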
@@ -274,6 +274,16 @@ The mechanism to configure the endpoint will vary, but OpenTelemetry language SD

If using the collector, `gzip` is the default compression, but `zstd` can be optionally configured.

### Retry [#retry]

Requirement level: **Recommended**

You should configure your OTLP exporter to retry when transient errors occur. The internet is unreliable, and failing to retry increases the likelihood of data loss.

The mechanism to configure retry will vary. Some OpenTelemetry SDKs have language-specific environment variables (for example, [Java supports](https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/autoconfigure) setting `OTEL_EXPERIMENTAL_EXPORTER_OTLP_RETRY_ENABLED=true`), but there is no general mechanism. Programmatic configuration may be required.

If using the collector, the `otlphttpexporter` and `otlpexporter` retry by default. See [exporterhelper](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md) for more details.
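
To make the recommendation concrete, here is a minimal sketch of retrying transient export failures with exponential backoff and jitter. It does not show how any OpenTelemetry SDK or the collector implements retry. `export_with_retry`, `send`, and the backoff values are hypothetical, and the set of HTTP status codes treated as retryable is an assumption for this example.

```python
import random
import time
from typing import Callable

# Assumption for this sketch: status codes treated as transient.
RETRYABLE_HTTP = {429, 502, 503, 504}


def export_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    initial_backoff: float = 1.0,
    max_backoff: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` until it succeeds, fails permanently, or attempts run out.

    `send` is a hypothetical callable that performs one export request
    and returns the HTTP status code of the response.
    """
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        status = send()
        if 200 <= status < 300:
            return True
        # Permanent errors (for example 401 or 403) are never retried.
        if status not in RETRYABLE_HTTP or attempt == max_attempts:
            return False
        # Full jitter: wait a random time up to the current backoff ceiling.
        sleep(random.uniform(0, backoff))
        backoff = min(backoff * 2, max_backoff)
    return False
```

Retried payloads must be held in memory while the exporter waits, so long backoff ceilings and high attempt counts increase memory pressure on the sender.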

### Metric Aggregation Temporality

Requirement level: **Recommended**
2 changes: 2 additions & 0 deletions src/nav/opentelemetry.yml
@@ -61,6 +61,8 @@ pages:
pages:
  - title: New Relic OTLP Endpoint
    path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp
  - title: OTLP Troubleshooting
    path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/best-practices/opentelemetry-otlp-troubleshooting
  - title: OTLP Data mapping
    pages:
      - title: Logs