Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OpenTelemetry support via ActivitySource #1261

Merged
merged 1 commit into from Jan 26, 2024

Conversation

stebet
Copy link
Collaborator

@stebet stebet commented Oct 11, 2022

Proposed Changes

This is the basic implementation to add OpenTelemetry support to the .NET RabbitMQ Client. This implements an ActivitySource that creates and propagates Activities for BasicPublish (send), BasicDeliver (receive) and when consumers receive messages (process). This enables distributed tracing scenarios such as showing traces. Below is a screenshot from Honeycomb.io to show what happens when tracing is enabled.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

This should have minimal overhead when tracing is not enabled since no Activity generation occurs and no tags need to be populated.

Currently, to propagate the trace and span ids to correlate events, I'm taking advantage of the BasicProperties.Headers to send and parse the information needed to correlate individual traces.

Here is a screenshot from Honeycomb of what this can look like:
image

@stebet
Copy link
Collaborator Author

stebet commented Oct 11, 2022

It would be great to get some feedback from @cijothomas, @MikeGoldsmith and @martinjt if they have time. I think I've populated the most important tags that are easy to get, but in case I'm missing some, it'd be great to get feedback.

@lukebakken
Copy link
Contributor

Hello @stebet - would you have a moment to give an update here?

@stebet
Copy link
Collaborator Author

stebet commented Oct 13, 2022

This should hopefully be ready now. Feel free to give it another pass @cijothomas if you have time :) @bollhals might find this interesting as well.

@michaelklishin michaelklishin added this to the 7.0.0 milestone Oct 13, 2022
@bollhals
Copy link
Contributor

This should have minimal overhead when tracing is not enabled since no Activity generation occurs and no tags need to be populated.

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case.
For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

@stebet
Copy link
Collaborator Author

stebet commented Oct 13, 2022

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case. For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

Yeah, will do. Although even when tracing is enabled, if the activity isn't sampled, it'll still skip A LOT of the processing since it checks first if the Activity is requesting all data.

B.t.w, overhead is expected, but the APIs are very fast and used for both runtime stuff (SQL client, HTTP client) as well as as third party libraries like StackExchange.Redis so i.m.o the overhead when tracing is enabled is acceptable and known. That's why Sampling is a built-in feature for Activities.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

Yes. For example if you are using the OpenTelemetry libraries you'll need to do something like this:

builder.Services.AddOpenTelemetryTracing(builder =>
{
    builder
    .Configure((IServiceProvider serviceProvider, TracerProviderBuilder traceBuilder) =>
    {
        traceBuilder
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("test-app", serviceVersion: "1.0.0"))
        .AddSource("RabbitMQ.Client") // Enable tracing for the RabbitMQ.Client library. Not setting this causes no activities to be created within the library.
        .AddAspNetCoreInstrumentation(options => options.RecordException = true)
        .... // more initialization stuff here
        ;
    });
});

@bollhals
Copy link
Contributor

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case. For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

Yeah, will do. Although even when tracing is enabled, if the activity isn't sampled, it'll still skip A LOT of the processing since it checks first if the Activity is requesting all data.

B.t.w, overhead is expected, but the APIs are very fast and used for both runtime stuff (SQL client, HTTP client) as well as as third party libraries like StackExchange.Redis so i.m.o the overhead when tracing is enabled is acceptable and known. That's why Sampling is a built-in feature for Activities.

Yes, sure it's expected if you enable such a feature, but there are still things we could optimize for to avoid allocations. But this can also wait for later. But we should look at overhead of the two other options of no tracing enabled and tracing enabled but not for RabbitMQ.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

Yes. For example if you are using the OpenTelemetry libraries you'll need to do something like this:

builder.Services.AddOpenTelemetryTracing(builder =>
{
    builder
    .Configure((IServiceProvider serviceProvider, TracerProviderBuilder traceBuilder) =>
    {
        traceBuilder
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("test-app", serviceVersion: "1.0.0"))
        .AddSource("RabbitMQ.Client") // Enable tracing for the RabbitMQ.Client library. Not setting this causes no activities to be created within the library.
        .AddAspNetCoreInstrumentation(options => options.RecordException = true)
        .... // more initialization stuff here
        ;
    });
});

Ah yes, forgot about it, as I only had to do it once for our project :D Thanks for reminding me.

@stebet
Copy link
Collaborator Author

stebet commented Oct 21, 2022

@bollhals
I did some simple benchmarks with the MassTest app modified to do 1000 batches (up from 100) with 500 items each, and seems the overhead is minimal (after I fixed a small bug):

Current master build
image

ActivitySource build with OpenTelemetry initialized but not listening for RabbitMQ.Client events
image

@xsoheilalizadeh
Copy link

Hi @stebet, thanks for making this PR. Do you think we could have it in the next coming months?

@stebet
Copy link
Collaborator Author

stebet commented Mar 13, 2023

Rebased this on the latest master and removed the need for OpenTelemetry.API reference as suggested by @rwkarg :)

Copy link

@Kielek Kielek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a lot of changes related to tag-naming for messaging. Still, it is not stable version, but it reflects current state.

projects/RabbitMQ.Client/client/impl/Connection.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/Connection.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/Connection.cs Outdated Show resolved Hide resolved
projects/RabbitMQ.Client/client/impl/Connection.cs Outdated Show resolved Hide resolved
projects/Unit/ActivitySource/TestActivitySource.cs Outdated Show resolved Hide resolved
@stebet
Copy link
Collaborator Author

stebet commented Mar 14, 2023

Updated to be in line with current spec. Thanks @Kielek, would love it if you could verify that I got things correcly :)

Copy link

@Kielek Kielek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One obvious typo.
I am not an subject expert for messaging system so please double check the second one.

Kielek
Kielek previously approved these changes Mar 14, 2023
Copy link

@Kielek Kielek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traces` attributes LGTM

@stebet
Copy link
Collaborator Author

stebet commented Jan 3, 2024

I would leave out the baggage support completely and document it as such. This PR is big enough that we don't need to block on that one feature. Let's open a new issue and come up with something reasonable outside of this PR.

In that case, this PR is pretty much ready to go @michaelklishin @lukebakken. Documentation/examples is handled in another repo AFAIK. I'm assuming there is still some time before a new stable major version of the NuGet package (which would include this PR) is released?

That should give us tome to iterate a bit on this and get better baggage handling, otel metrics etc. in.

{
public static class RabbitMQActivitySource
{
// These constants are defined in the OpenTelemetry specification:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would you point to a specific version, as these conventions are not stable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @stebet ?

Copy link

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
Would you add examples/docs as a follow up? The name of the ActivitySource used must be documented so OTel consumers can enable this instrumentation.
Also - please call out that this relies on experimental semantic convention and is subject to change, so users are not caught with surprise if the conventions change before they stabilize.

@rabbitmq rabbitmq deleted a comment from julealgon Jan 24, 2024
@rabbitmq rabbitmq deleted a comment from julealgon Jan 24, 2024
@lukebakken
Copy link
Contributor

If people need this feature soon, a user has found the following workaround that uses AMQP headers:

https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/MicroserviceExample

@lukebakken
Copy link
Contributor

@cijothomas I personally know next to nothing about OpenTelemetry and the changes in this PR, so someone who is more familiar will have to address your concerns:

  • Adding examples/docs. The name of the ActivitySource used must be documented so OTel consumers can enable this instrumentation
  • Call out that this relies on experimental semantic convention and is subject to change, so users are not caught with surprise if the conventions change before they stabilize.

@lukebakken
Copy link
Contributor

@stebet I'm merging the latest main into this PR at the moment.

@lukebakken
Copy link
Contributor

@stebet I have completed rebasing this PR on main ... please go over the code again to make sure I didn't change something inadertently.

@stebet
Copy link
Collaborator Author

stebet commented Jan 25, 2024

@stebet I have completed rebasing this PR on main ... please go over the code again to make sure I didn't change something inadertently.

I'll take a look in a bit.

@lukebakken
Copy link
Contributor

@stebet everything seems to be fine, except that the Test.Integration.TestActivitySource.TestPublisherAndBasicGetActivityTags is flaky on GitHub actions - https://github.com/rabbitmq/rabbitmq-dotnet-client/actions/runs/7658052226/job/20870377811

Of course it works every time on my machine.

@stebet
Copy link
Collaborator Author

stebet commented Jan 26, 2024

@stebet everything seems to be fine, except that the Test.Integration.TestActivitySource.TestPublisherAndBasicGetActivityTags is flaky on GitHub actions - https://github.com/rabbitmq/rabbitmq-dotnet-client/actions/runs/7658052226/job/20870377811

Of course it works every time on my machine.

I'll take a look now and see if I see anything obvious

@stebet
Copy link
Collaborator Author

stebet commented Jan 26, 2024

@lukebakken Moved the Activity tests over to the sequential tests since they are manipulating static variables.

@davidfowl
Copy link

Glad to see this making progress!

… enable OpenTelemetry scenarios.

Linking existing context for publish if one exists

dotnet format

Adding standard tags without allocating.

Updating code after comments

Moving ActivitySource tests to Integration

Making sure TaskCompletionSources execute asynchronously

Moving activity source tests to sequential integration
Copy link
Contributor

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stebet if you have time to address these comments that would be great:

@stebet
Copy link
Collaborator Author

stebet commented Jan 26, 2024

Will do.

  • Documentation/examples is handled in another repo AFAIK - @stebet are you referring to the repo that is used to create this web site?

Yes.

@lukebakken
Copy link
Contributor

@cijothomas I would appreciate example code and doc text that I can use to update the web site. Thanks.

#1478

@lukebakken lukebakken merged commit 220f5a5 into rabbitmq:main Jan 26, 2024
11 checks passed
@davidfowl
Copy link

Excellent! I think we'd like to get this backported to the older version. Before we do that though, is it possible to get a preview build that we can ship with an aspire preview so we can get some mileage on it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evaluating how to support tracing and OpenTelemetry