Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a http/protobuf opentelemetry reporter #1102

Merged
merged 26 commits into from
Apr 4, 2022

Conversation

hughsimpson
Copy link
Contributor

@hughsimpson hughsimpson commented Feb 1, 2022

Existing opentelemetry reporter only supports grpc. This adds a new reporter to support the http/protobuf protocol. I'm keeping it in a separate module to the grpc opentelemetry reporter because it has a completely different set of dependencies, and by virtue of directly using the opentelemetry java sdk, quite a different structure. replaces the existing approach to more heavily leverage the opentelemetry sdk. It should provide a more elegant approach to introducing log and metric bindings in the future

@hughsimpson hughsimpson changed the title http/protobuf for otel add a http/protobuf opentelemetry reporter Feb 1, 2022
@ivantopo
Copy link
Contributor

ivantopo commented Feb 2, 2022

Hey @hughsimpson, this is a really amazing piece of work! I'm really happy to see this now because we are recently looking at checking all the boxes for becoming "otel-compatible". Timing couldn't be better 😄

Now, one little thing that bugs me here is that there is certain amount of logic dedicated to translating from Kamon's model to OTLP, and that logic will now be repeated in both reporters. Have you experimented with the idea of using this same base for both OTLP/grpc and OTLP/http? In my head it sounds like the translation to protobuf-generated classes should be the same in both cases and then only transport will be a different concern, but I didn't dig in the code yet to figure it out.

Also, I would like to bring @pnerg around in case he has the time to check things out. He contributed the otel reporter we have at the moment.

@pnerg
Copy link
Contributor

pnerg commented Feb 3, 2022

On the surface the grpc and http reporters are very similar.
The obvious difference is of course the protocol but also the carrier object into which we convert from the internal Kamon span.
In the grpc variant we import

import io.opentelemetry.proto.common.v1.{AnyValue, InstrumentationLibrary, KeyValue}
import io.opentelemetry.proto.resource.v1.Resource
import io.opentelemetry.proto.trace.v1.{InstrumentationLibrarySpans, ResourceSpans, Status, Span => ProtoSpan}

The counterpart in this code is

import io.opentelemetry.sdk.common.InstrumentationLibraryInfo
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.data.SpanData

It would of course be possible abstract the actual protocol layer into own modules (like they are now) and keep a common library (kamon-otel-common-lib) for the OpenTelemetryHttpTraceReporter and SpanWrapper doing the conversion from Kamon data to io.opentelemetry.sdk.common. The actual protocol layer class would need to be pluggable.

But that would require refactoring the TraceService class from the grpc variant to accept carrier classes from io.opentelemetry.sdk.common package

in essence something like this

kamon-otel-common-lib
  + OpenTelemetryTraceReporter
  + SpanWrapper
kamon-otel-grpc
  + GrpcTraceService
kamon-otel-http
  + HttpProtoTraceService

@hughsimpson
Copy link
Contributor Author

hughsimpson commented Feb 4, 2022

Thanks for taking a look at this! I wasn't expecting such a prompt response, or I'd have checked back here sooner! I did have a think about how this could be combined with grpc reporting -- I think it would just be a matter of also including the opentelemetry-exporter-otlp-trace library in the dependencies and switching the delegate to OtlpGrpcSpanExporter based on the protocol configuration, which would indeed permit deduplication of the modules (and would let us acknowledge the otel.exporter.otlp.protocol java property if set). I didn't push that up here because I felt like deleting the entire existing module would be a little contentious 😅 . Perhaps that could be subsequently explored? Let me know how you want to proceed, anyway.

It did occur to me that there's a marginal inefficiency in translating to the otel internal format before serialisation, rather than translating directly as the existing grpc one does, but thought it was probably worth it to reduce the cost of maintaining compatibility if there are any future changes to the standard... Perhaps this was the wrong approach and I should have looked into using okhttp directly as an alternative transfer agent in the existing module 🤔 I don't know how much time I'll have to take another look at this next week but I'll try to have a stab at that approach and see if it was easier than I made it out the first time around

PS have pushed a fix for building under scala 2.11. I forgot to test that locally 🤦

@ivantopo
Copy link
Contributor

ivantopo commented Feb 8, 2022

Thanks for taking a look at this! I wasn't expecting such a prompt response

You happen to do this precisely when I'm deep into bringing support for OpenTelemetry, both in Kamon Telemetry and Kamon APM. So, interestes aligned 😂

I think it would be ideal to have both reporters in the same artifact so that we can implement the configurations that all OTLP exporter must accept. I didn't dig in the full details, but your comment about maybe using OkHttp directly sounds like something that could be added to the existent reporter 🤔 That would be a nice thing to explore.

BTW, remember you can always come over to our Discord and bounce ideas before getting deep into them. I'm hanging out there every day.

@hughsimpson
Copy link
Contributor Author

hughsimpson commented Feb 9, 2022

OK so I have an alternative take on this in #1106 -- I'm not sure I prefer it, since there's an awful lot of re-implementation of the stuff we can get for free from the otel sdk (this proves a lot nicer if you try implementing metrics reporting, for example) ... For a fairer comparison, I've pushed to this branch with the entire opentelemetry-reporter module migrated to the sdk

@ivantopo
Copy link
Contributor

Out of this and #1106, I think this PR seems to be more aligned with what we expect in the near future, including metrics. Are you guys fine with closing #1106 and focusing on getting this one merged?

@hughsimpson
Copy link
Contributor Author

hughsimpson commented Feb 15, 2022

I personally agree with that decision @ivantopo . If @pnerg has spotted something, then I guess I could change my mind, but will leave it at that for now

@hughsimpson
Copy link
Contributor Author

I see I dropped some META-INF stuff from this pr by mistake. I'm guessing it's important and should be re-added? Tbqh I don't know what it does

@dpsoft
Copy link
Contributor

dpsoft commented Feb 15, 2022

Hey @hughsimpson, the META-INF/native-image/* stuff is necessary in order to support Graal Native Image generation

@hughsimpson hughsimpson force-pushed the http_protobuf_for_otel branch from 7ea5515 to b6d035c Compare February 16, 2022 09:17
@hughsimpson
Copy link
Contributor Author

OK. I've re-included those dropped files, reverted some naming changes I'd left in by miskate, and added any metricsTags to the reported attributes. Think this might be ready to go in / get some more detailed review

@hughsimpson
Copy link
Contributor Author

I noticed some differences between how this reports errors in traces and how other sdks do it. Have made a related pr to increase compatibility for that (#1112)

Copy link
Contributor

@ivantopo ivantopo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hughsimpson, there are a few minor changes to make and after that it kind of looks good to me.

Do you have any experience to share from using a private build of this reporter?

@ivantopo
Copy link
Contributor

Something that just came to mind.. this PR is creating a single resource that uses Kamon as the instrumentation library, but in OTel they generate the data using each individual instrumentation "module" as the instrumentation library, so there are instrumentation library spans from a bunch of different places on each batch sent to the OTel collector. We don't have that concept, but we do have the component tag that is kind of the local equivalent of a library. I don't know how important would it be to have spans segregated by instrumentation library, but if it was, there is a way to sort of get it.

@hughsimpson
Copy link
Contributor Author

hughsimpson commented Feb 24, 2022

Thanks for the thoughts! Will start tackling tomorrow. In terms of experiences to report:

  • There's a very standard set of tags from opentelemetry, and I find that they don't always match kamon's. I've partially addressed this with the flag for attatching errors as events to spans, rather than just inline in the attributes, but there are plenty of places whether the default attribute names differ (especially around kafka). It might be worth introducing a feature flag that configured the attributes names in-bulk... although I've not put much thought into how one could sensibly do this.
  • I have spotted a couple of instances (esp. on startup) where some traces couldn't be sent and got dropped. Now that I think about it I should probably look into that a bit more 😅 It looked pretty mild but I'm sure there's some relatively trivial amelioration we could do for that - my default timeout is probably too low. That turned out to be an unrelated issue with the collector I was forwarding to
  • The kafka propagation doesn't play nicely with apps that use the default opentelemetry approach (they generally just have plaintext header values with the same names you'd see in an http trace header -- e.g. traceparent), so I've had to manually tweak that. It's possible I've overlooked some configuration there...?
  • Generally it works well! These issues are mostly ones that can be solved with some judicious wrapping of kamon usage at the local ecosystem level, and I've had relatively little trouble rolling this out. I think there's an argument to be made for making it easier to comply with the default opentelemetry attributes, but obviously nobody's gonna be happy if that changes between releases without at least an opt-in feature flag (or an opt-out and a major version bump). The fundamental mechanism seems to be pretty robust

I had initially been expecting to be getting into opentelemetry metrics by now, but that branch is rather on-hold for the time being. It seemed pretty trivial to do via the sdk, but also potentially a distraction, and not immediately vital for me. The only thing that struck me as potentially thorny there was unit conversions - but since it's a case of kamon having a richer notion of units than opentelemetry, it's probably not really an issue. Certainly I couldn't spot anything problematic in my initial spikes, but it was a very small set of test cases.

Hugh Simpson added 4 commits March 12, 2022 12:46
- Always include environment tags in otel resource
- Add service.instance.id and host.name variables to resource
- Remove the -/. replacement (not required)
- s/fullEndpoint/full-endpoint/
- Remove the protocol/host/port fields (they're unecessary given they're
  subsumed by endpoint/full-endpoint config fields)
- rename otelProtocol to protocol
@hughsimpson
Copy link
Contributor Author

@ivantopo Sorry about the delay - had a bit of a break. Have addressed your comments (nice catch about the instrumentation library... you know, I've never even really noticed that field 😅). Think it's ready for another review!

@hughsimpson
Copy link
Contributor Author

@ivantopo bump?

@hughsimpson hughsimpson requested a review from ivantopo March 25, 2022 19:39
Copy link
Contributor

@ivantopo ivantopo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hughsimpson, I went through with a finer comb and only found two minor issues. When you push those changes I'll blindly merge the PR 😄

@hughsimpson
Copy link
Contributor Author

@ivantopo pushed fixes for those two issues. Also spotted that the StatusData message should probably be null rather than the empty string if there's no corresponding ErrorMessage tag. Tests still passing for me on all scala versions, so 🤞 hopefully ready to go in

@hughsimpson hughsimpson requested a review from ivantopo March 30, 2022 07:40
@ivantopo ivantopo merged commit 1856d5d into kamon-io:master Apr 4, 2022
@hughsimpson hughsimpson deleted the http_protobuf_for_otel branch April 4, 2022 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants