Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extension: Add Span::setTag for setting attributes in OTLP Tracer #21893

Merged
merged 1 commit into from
Jun 27, 2022

Conversation

AlexanderEllis
Copy link
Contributor

Signed-off-by: Alex Ellis ellisonjtk@gmail.com

Commit Message: Add Span::setTag for setting attributes in OTLP Tracer
Additional Description: Adding attributes to the OTLP traces generated in the OTLP tracer (see #9958).

Here's a sampling of the attributes Envoy creates in a basic test setup (Envoy making a child span for its call to the Nighthawk test server) by just adding this method:

image

Risk Level: low
Testing: Added a unit test (with fleshed out proto expectations)
Docs Changes: N/A
Part of #9958

Signed-off-by: Alex Ellis <ellisonjtk@gmail.com>
@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #21893 was opened by AlexanderEllis.

see: more, trace.

@AlexanderEllis
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #21893 (comment) was created by @AlexanderEllis.

see: more, trace.

@AlexanderEllis
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #21893 (comment) was created by @AlexanderEllis.

see: more, trace.

@AlexanderEllis AlexanderEllis marked this pull request as ready for review June 26, 2022 00:35
@AlexanderEllis
Copy link
Contributor Author

Hi @htuch! When this auto-added you for review, I realized I added you for this extension under CODEOWNERS in the last PR without checking with you — happy to update that if you'd like, as I'm sure you get plenty of review requests already :)

@moderation
Copy link
Contributor

@AlexanderEllis initial testing and it looks like this is working well. Fantastic! And thanks for working on this

@AlexanderEllis
Copy link
Contributor Author

Awesome, glad to hear it, and thanks for testing it out!

@moderation
Copy link
Contributor

One thing I'm stuck on is the health check filter. I have a cluster health check on my front Envoy

                "health_checks": [
                    {
                        "always_log_health_check_failures": true,
                        "healthy_threshold": 2,
                        "http_health_check": {
                            "path": "/z/",
                            "codec_client_type": "HTTP2",
                            "service_name_matcher": {
                                "exact": "sync_service"
                            }
                        },
                        "interval": "5s",
                        "interval_jitter": "0.5s",
                        "timeout": "1s",
                        "unhealthy_threshold": 2
                    }
                ],

and then a health check filter on the upstream Envoy

                                        {
                                            "name": "envoy.filters.http.health_check",
                                            "typed_config": {
                                                "@type": "type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck",
                                                "headers": [
                                                    {
                                                        "name": ":path",
                                                        "string_match": {
                                                            "exact": "/z/"
                                                        }
                                                    }
                                                ],
                                                "pass_through_mode": false
                                            }
                                        },

But the OpenTelemetry Collector is still seeing and sending health checks with user_agent Envoy/HC.

Feels like this should be handled in the HCM. I'm pretty sure the filter worked when using the Lightstep tracer previously.

Any ideas?

@htuch
Copy link
Member

htuch commented Jun 27, 2022

LGTM, but we probably should have a non-Googler maintainer consistently reviewing these as well, @mattklein123 can you take a look at this short PR as you also reviewed the main OT extension?

I'm happy to remain CODEOWNERS, but it's possible @yanavlasov or someone else in our Envoy Platform team might take it if they have cycles.

@mattklein123 mattklein123 self-assigned this Jun 27, 2022
@AlexanderEllis
Copy link
Contributor Author

AlexanderEllis commented Jun 27, 2022

@moderation Thanks for calling that out! Just to make sure I'm on the same page, you have your front Envoy health checking your upstream Envoy, and the OTel collector is not seeing the traces from that check with the right user agent? I'm going to be afk this week, but I should be able to dig in and try that setup out myself after the 4th 👍

Edit: ah I think I may see what's going on: you're seeing the traces, but you don't want to, since it's just a health check. From a quick check, I think this is because this new OTel tracer isn't checking each Span's isSampled before it sends it. Contrast that with the Zipkin tracer, which checks first. I think this is the culprit because the health check filter sets sampled to false in order to prevent the trace from being sent (code pointer, PR where this was added), but it's not much help if we don't check it ourselves! I can update this as well, and I'll add a comment to the bug to track.

@htuch Sounds great to me, thanks!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +69 to +76
// Attribute keys MUST be unique.
// If a value already exists for this key, overwrite it.
for (auto& key_value : *span_.mutable_attributes()) {
if (key_value.key() == name) {
key_value.mutable_value()->set_string_value(std::string{value});
return;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of N^2 behavior seems sub-optimal, though I guess there aren't usually that many tags. It might be worth it to optimize this later.

@moderation
Copy link
Contributor

@AlexanderEllis yep your edit is exactly right. The health checks are being traced. A check like the Zipkin tracer should fix this. Thanks again

Amila-Rukshan pushed a commit to Amila-Rukshan/envoy that referenced this pull request Jun 28, 2022
…voyproxy#21893)

Add Span::setTag for setting attributes on OTLP span

Signed-off-by: Alex Ellis <ellisonjtk@gmail.com>
Signed-off-by: Amila Senadheera <amila.15@cse.mrt.ac.lk>
@AlexanderEllis AlexanderEllis deleted the otel-tracer branch July 7, 2022 01:01
@AlexanderEllis
Copy link
Contributor Author

@moderation Should be all set with #22047 merged in 👍 Thanks again for trying this out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants