New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTEL Telemetry Exporter isn't setting service.name #7791
Comments
Potentially related to #8494 |
Perhaps I'm over-simplifying this, so please correct me if I'm wrong. Per https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/opentelemetry.proto.html:
We should be populating the Should this be the name of the service or the name of the gateway that this is defined on? |
I'd say the gateway name fits well for this value as it's defined in the |
@sam-heilbron Feedback from another customer is that setting this to the gateway name also fits their use-case and requirements. So I agree with @anessi, let's set this to the gateway name, keeping in mind that we need to support a shared environment with multiple gateways. |
I want to confirm the expected behavior with a specific example, using the example from https://docs.solo.io/gloo-edge/latest/guides/observability/tracing/otel/ This is the gateway config:
Where right now we are displaying
We would instead display:
Should |
I have a solution that depends upon the gateway populating some "parent" metadata with the gateway name. This happens in the "classic" gloo gateway, and I suspect it is not happening in ggv2 (investigation in progress). I also think that since no one is on ggv2 yet, there's value in this approach, and we can add the translation to ggv2 at the appropriate time (if its not currently implemented). I expect to have a PR for this approach ready Monday. Also some questions for how to handle some edge cases (which shouldn't occur, but we should protect against).
These edge cases wouldn't be expected to be a matter of bad user configuration (at least for classic gateway), and would be more of a matter of bad translation/application logic. The two main approaches to handling this would be:
Any opinions on which approach to use? |
IMO a See example-traces.json for two example traces, one from rate-limit and one from gateway-proxy. If a trace can't be clearly related to a gateway, then a more sensible values like the type of service should be used. Stop translating is not an option IMO. It should be possible to always put a sensible value there (hopefully :) ). |
@anessi , thanks for the feedback. I agree we shouldn't stop translating on these edge cases. However, in most of these cases I don't know that there will be a way to find a more sensible value. At the point in translation where we create the envoy Open Telemetry Configuration, we have already translated Gateways and VirtualServices to Proxys and VirtualHosts, and we don't have access to the runtime information/span definitions. The internal OpenTelemetry configuration only refers to the upstream ref. These edge cases can't be reached by updating configuration, and would only be encountered if changes to the gateway (or a new gateway) were introduced. If I wanted to create them in a running cluster, I would start by intentionally writing some bad gateway translation code, I don't think I could do it with just yaml and out-of-the-box gloo. So if the metadata for the gateway is wrong, all of our metadata is likely wrong (as in case 1a/1b). The "multiple gateways" use case would also not occur when multiple gateways refer to the same upstream collector. Current translation would create separate Otel Configurations that would each refer to a single gateway. This might be something easier discussed in a quick call/huddle with interested parties. |
@sheidkamp : thanks for the details. I don't know the internals, so if it's not possible to set any meaningful value, setting a string as you proposed above and logging additional details is the best that can be done I guess. If those are edge cases, it's maybe not such a big deal anyway. |
Since there was only one option, we decided to wait to introduce the API until there was a second option to choose from. |
Merged into 1.17/main |
Zendesk ticket #3478 has been linked to this issue. |
Gloo Edge Version
1.13.x (latest stable)
Kubernetes Version
1.23.x
Describe the bug
When using the opentelemetry configuration with gloo the resource attribute isn't having the
service.name
set. It's currently being set asunknown
Steps to reproduce the bug
Follow the OpenTelemetry Tracing guide: https://docs.solo.io/gloo-edge/latest/guides/observability/tracing/otel/
And in Step 8 you will see the following in the logs:
Relevant configuration:
Expected Behavior
Spans and traces should have the resource attribute of
service.name
properly.Additional Context
No response
┆Issue is synchronized with this Asana task by Unito
The text was updated successfully, but these errors were encountered: