Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: instruments finagle's netty-based stack #10141

Merged
merged 28 commits into from
Feb 15, 2024

Conversation

dmarkwat
Copy link
Contributor

@dmarkwat dmarkwat commented Dec 29, 2023

Instruments the twitter finagle http (netty-based) stack for otel tracing, building on the existing netty-4 instrumentation. This includes the necessary bits in finagle-netty4, finagle-http2, and finagle-base-http modules. finagle-http is the aggregate module and so this segments along those lines.

Covered explicitly:

  • server & client http/1.1 tracing
  • server & client http/2 tracing (limited -- see doc in code)
  • netty-4 HTTP 101 (Switching Protocols) handling

Covered implicitly (by other instrumentations, e.g.), and therefore not explicitly instrumented:

  • twitter's Futures (doc) are covered by the existing Java Executor instrumentations, as everything bound for a Future via all the typical twitter/finagle means end up in appropriately-wrapped Callables, Runnables, etc.
  • down-stack Netty integration is already thoroughly covered; no attempt was made here to reimplement any of the existing otel bits or yield yet another full-blown framework instrumentation

Not covered:

  • finagle's other protocol integrations, such as thrift
  • http/2-specific finagle tests: the armeria-based underlying framework seems to handle only http/1.1 and would otherwise have required quite the overhaul; as finagle uses the http2-to-1 netty codec (already assumed by netty's own instrumentation), this seemed like an OK thing to omit at the onset -- to be handled, perhaps, at a later time

A word on netty HTTP 101 support

I'm no HTTP ref spec expert, but based on the documentation for Protocol upgrade mechanism provided by MDN I pieced together the handling of the response code as presented by a netty-based server and experienced by a netty-based client to not end the span when 101 is seen as the existing netty instrumentation would normally do.

Why: simple reason is because status code 101 -- unlike the others, afaict -- represents a response continuation (preamble?) rather than a termination, and the existing netty instrumentation treats all status codes as terminations. IOW, the existing netty instrumentation unsuccessfully handles these 101s, which finagle emits (afaict, correctly), and afterwards -- upon seeing the final response -- tries to set attributes on already-ended Spans, resulting in test failures.

So given the situation, there seemed to me to be two options: a) try to rewrite things to handle this condition a bit more fancily (i.e. statefully), or b) simply emit this as an event. I chose the latter, simply because in the course of creating and consuming 101s, it strikes me as more a preamble -- and this can't be stressed enough -- without any client acknowledgement, or reciprocation with, the server. And while it's indeed bytes sent back from a server to a client, it's more "cumulative metadata" than trace-worthy and so warranted capturing as something -- just not a full span.

Very much curious about any thoughts regarding this, as option (a) to rewrite things seemed far less straightforward.

@dmarkwat dmarkwat requested a review from a team as a code owner December 29, 2023 00:47
@dmarkwat dmarkwat marked this pull request as draft December 29, 2023 17:45
@dmarkwat
Copy link
Contributor Author

Looking into netty test failures before considering this ready for review.

@dmarkwat dmarkwat marked this pull request as ready for review December 29, 2023 20:33
@dmarkwat
Copy link
Contributor Author

Netty errors addressed -- was my bad. Ready for review.

@dmarkwat
Copy link
Contributor Author

@laurit I made changes according to your suggestions. At this point, to my eye the finagle side looks good, lean, and hopefully more intelligible.

There's still the open netty questions. When you get a chance, can you provide some direction on what you might like to see wrt those? Specifically:

  • would you like to see the netty server/client wrappers moved over to netty proper, or simply leave them as is? (per my above question in this comment)
  • simply remove the event I'm creating? (your question here) This would tidy up the code and avoid (perhaps needlessly) creating an event which is as of now undocumented.

Copy link
Contributor

@laurit laurit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmarkwat
Copy link
Contributor Author

Review feedback applied, doc updated, and rebased. Just need an answer to #10141 (comment) and possibly recommendations on how to proceed.

@dmarkwat
Copy link
Contributor Author

dmarkwat commented Feb 6, 2024

Thank you for the time and attention 🙏. I've rebased and checked back on all the docs and notes I've left -- happy with where this landed. I'll get over to semantic-conventions with the question about HTTP 101, but otherwise eager to see this finally make it if there's nothing left you want me to address.

@trask trask added this to the v2.1.0 milestone Feb 13, 2024
@trask trask merged commit 205100e into open-telemetry:main Feb 15, 2024
47 checks passed
steverao pushed a commit to steverao/opentelemetry-java-instrumentation that referenced this pull request Feb 16, 2024
Co-authored-by: Lauri Tulmin <ltulmin@splunk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants