chore(OTEL): move troubleshooting topics to separate pages #17055
Hi @ally-sassman 👋 Thanks for your pull request! Your PR is in a queue, and a writer will take a look soon. We generally publish small edits within one business day, and larger edits within three days. We will automatically generate a preview of your request, and will comment with a link when the preview is ready (usually 10 to 20 minutes). If you add any more commits, you can comment |
✅ Deploy Preview for docs-website-netlify ready!
@@ -0,0 +1,29 @@ | |||
--- | |||
title: "Troubleshoot OpenTelemetry: Collector issues" |
Is there a better title for this page?
This is another situation where I wonder if this really belongs as a "troubleshooting" document. I think it makes sense for us to have some documentation pertaining to the OpenTelemetry Collector, but this is one of those situations where we need to decide how much we should document ourselves versus directing people to the official OpenTelemetry documentation. For example https://opentelemetry.io/docs/collector/troubleshooting/
|
||
For solutions to common collector problems, see the [OpenTelemetry troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/). | ||
|
||
Within New Relic, we recommend using this NRQL query to show all collector metrics sent to New Relic: |
What are some examples of things they can find/troubleshoot with this data?
Some of the metrics the query below pertains to are documented here https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/monitoring.md.
|
||
## Solution | ||
|
||
OpenTelemetry entities are synthesized based on the public rules described for the [`EXT-SERVICE`](https://github.com/newrelic/entity-definitions/blob/b8e75d839eed7859044a9bb601a62e5323107350/definitions/ext-service/definition.yml) entity type. The standard rule to match relies on the presence of the `service.name` dimension (which follows the OpenTelemetry [semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/527206ea49e384de957eda105d9425aa8fefca25/specification/overview.md#semantic-conventions)). |
"The standard rule to match relies on the presence of the `service.name` dimension (which follows the OpenTelemetry semantic conventions)" is a bit too passive. How about this revision for clarity?
OpenTelemetry entities are synthesized based on the public rules described for the [`EXT-SERVICE`](https://github.com/newrelic/entity-definitions/blob/b8e75d839eed7859044a9bb601a62e5323107350/definitions/ext-service/definition.yml) entity type. The standard rule to match relies on the presence of the `service.name` dimension (which follows the OpenTelemetry [semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/527206ea49e384de957eda105d9425aa8fefca25/specification/overview.md#semantic-conventions)). | |
OpenTelemetry entities are synthesized based on the public rules described for the [`EXT-SERVICE`](https://github.com/newrelic/entity-definitions/blob/b8e75d839eed7859044a9bb601a62e5323107350/definitions/ext-service/definition.yml) entity type. How the entity name appears in New Relic is based on the `service.name` dimension, which follows the OpenTelemetry [semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/527206ea49e384de957eda105d9425aa8fefca25/specification/overview.md#semantic-conventions). |
|
||
To set the `service.name` with the OpenTelemetry Java SDK, include it in your [resource](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/opentelemetry-concepts/#resources): |
Is this available with other language SDKs, or just Java?
...urce-telemetry-integrations/opentelemetry/troubleshooting/missing-entities-relationships.mdx
|
||
* For all other SDK languages, set `service.name` by declaring it in the `OTEL_RESOURCE_ATTRIBUTES` or `OTEL_SERVICE_NAME` [environment variables](https://github.com/open-telemetry/opentelemetry-specification/blob/20c82de552d08428e8cadaaef3e6cb46812f7c00/specification/sdk-environment-variables.md#general-sdk-configuration). | ||
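As a rough illustration of how the two environment variables interact, the sketch below resolves `service.name` from a dictionary of environment values using only the Python standard library. The function name `resolve_service_name` is hypothetical and is not part of any OpenTelemetry SDK; it assumes the behavior described in the OpenTelemetry specification, where `OTEL_SERVICE_NAME` takes precedence over a `service.name` entry in `OTEL_RESOURCE_ATTRIBUTES`.

```python
from urllib.parse import unquote


def resolve_service_name(env):
    """Return the service.name implied by env, or None if it is not set.

    Hypothetical helper for illustration only; real SDKs do this internally
    and typically fall back to a default such as "unknown_service".
    """
    # OTEL_SERVICE_NAME wins over service.name in OTEL_RESOURCE_ATTRIBUTES.
    explicit = env.get("OTEL_SERVICE_NAME", "").strip()
    if explicit:
        return explicit

    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs.
    for pair in env.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip() == "service.name":
            # Values may be percent-encoded, so decode before returning.
            return unquote(value.strip()) or None
    return None
```

Calling `resolve_service_name(os.environ)` before starting an application is one way to check whether either variable would give the service a name, which is what the entity synthesis rule described above depends on.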
|
||
For New Relic <InlinePopover type="logs" />, you can use a structured log template to inject the `service.name`. See [Logs in context with Log4j2](https://github.com/newrelic/newrelic-opentelemetry-examples/blob/e3f5ee85b4dcd8dd29f8f69d78d122b82a9638ba/other-examples/java/logs-in-context-log4j2/Log4j2EventLayout.json#L2) for an example. |
When would they inject this via logs vs. declaring it in the `OTEL_RESOURCE_ATTRIBUTES` or `OTEL_SERVICE_NAME` environment variables?
<Callout variant="tip"> | ||
For more OpenTelemetry examples with New Relic, visit the [newrelic-opentelemetry-examples](https://github.com/newrelic/newrelic-opentelemetry-examples) repository on GitHub. |
This needs a better home/context (maybe after the "For all other SDKs..." bullet).
@@ -13,7 +13,7 @@ redirects: | |||
freshnessValidatedDate: never | |||
--- | |||
|
|||
Setting up OpenTelemetry and getting the most from it can be a challenge. To help you optimize your experience, we've created these best practice guides. If you're trying to resolve a specific problem, see our [troubleshooting guide](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/opentelemetry-troubleshooting). | |||
Setting up OpenTelemetry and getting the most from it can be a challenge. To help you optimize your experience, we've created these best practice guides. |
Setting up OpenTelemetry and getting the most from it can be a challenge. To help you optimize your experience, we've created these best practice guides. | |
Setting up OpenTelemetry and getting the most from it can be a challenge. To help you optimize your experience, we've created these best practice guides. If you're trying to resolve a specific problem, see our [troubleshooting topics](/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/troubleshooting/collector-issues). |
…rations/opentelemetry/troubleshooting/missing-entities-relationships.mdx
|
||
## Solution | ||
|
||
To correlate your logs with trace data, the logs need to include the trace context in `trace_id` and `span_id`. However, to ensure your logs show up in the New Relic UI, you'll need to configure rules in your log pipeline to translate `trace_id` and `span_id` to `trace.id` and `span.id`. |
How do they do this? Is there an example we can provide?
This document lacks a lot of context. That is, this is not a general problem people will experience. It is a problem that may occur in specific use cases. We can elaborate and improve this doc in a separate PR.
That said, we have an example here. https://github.com/newrelic/newrelic-opentelemetry-examples/tree/main/other-examples/java/logs-in-context-log4j2#introduction.
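To make the renaming described in the Solution text concrete, here is a minimal sketch of the translation step applied to a single log record represented as a dictionary. It is plain Python, not a New Relic or collector API: the names `FIELD_RENAMES` and `translate_trace_context` are hypothetical, and a real log pipeline would express the same rule in its own configuration.

```python
# Hypothetical mapping from the trace context keys emitted with the log
# to the dotted attribute names mentioned in the Solution text.
FIELD_RENAMES = {"trace_id": "trace.id", "span_id": "span.id"}


def translate_trace_context(record):
    """Return a copy of a log record with trace context keys renamed."""
    translated = dict(record)  # leave the caller's record untouched
    for old_key, new_key in FIELD_RENAMES.items():
        if old_key in translated:
            value = translated.pop(old_key)
            # Keep an existing dotted key if the record already has one.
            translated.setdefault(new_key, value)
    return translated
```

For example, a record carrying `trace_id` and `span_id` comes back with `trace.id` and `span.id` instead, and all other fields are unchanged.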
|
||
If you've checked the above, try out these New Relic features: | ||
|
||
* Check your data management hub to [facet data ingest](/docs/data-apis/manage-data/manage-data-coming-new-relic/#facet-data-ingest) and determine how much data is arriving from various sources. |
What are you faceting on in the data management hub? If the recommendation is `instrumentation.provider` or `newrelic.source`, then I should delete this bullet point (and depend on the second bullet point).
path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/troubleshooting/collector-issues | ||
- title: Missing entities or relationships | ||
path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/troubleshooting/missing-entities-relationships | ||
- title: Missing logs | ||
path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/troubleshooting/missing-logs | ||
- title: Missing OTLP data | ||
path: /docs/more-integrations/open-source-telemetry-integrations/opentelemetry/troubleshooting/missing-otlp-data |
I think we should consider if the pattern of a page per troubleshooting topic is what we want. I understand these pages were broken out from the original mega troubleshooting doc, but this list of pages strikes me as a strange/incomplete set of topics.
It might make sense to start with a high level organization of topics we believe we should cover, then decide if it makes sense in one doc or multiple.
@@ -0,0 +1,42 @@ | |||
--- | |||
title: "Troubleshoot OpenTelemetry: Missing entities or relationships" |
This document is 1) limited to service entities and 2) barely addresses relationships in any meaningful way.
A document about entities and relationships is sorely needed, though it's strange as a "troubleshooting" doc. We really need a top-level document addressing this subject holistically, including service, host, and other entity types that can be synthesized from otel data, plus how relationships work between these entity types.
I do not think it is worth reorganizing the existing troubleshooting document into separate documents.
@jack-berg and I discussed and we'd like to suggest a different plan of action.
- @jack-berg will submit a PR that better documents the "missing data" issue.
- Remove the "missing logs" document. Consider moving the content to the existing logs documentation. @jack-berg will make this assessment.
- Remove the "collector issues" document. Add a link (maybe here) to the official opentelemetry.io collector troubleshooting documentation.
- Remove the "entities/relationships" document. I will draft proper documentation about entities and relationships. It will not be a troubleshooting document. The existing content will be unnecessary.
In essence, I think all of this content just needs to be rewritten and the existing content just needs to be deleted.
In this PR, I’m separating out troubleshooting sections into subpages within a new troubleshooting section (to be linked to from other places). Each page will follow the same format (traditionally two headers titled Problem/Solution).