Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworking context propagation for Netty 4.0 #2323

Merged
merged 2 commits into from Feb 22, 2021

Conversation

iNikem
Copy link
Contributor

@iNikem iNikem commented Feb 17, 2021

I started to attack my old grudge against what context we propagate in our Netty instrumentation.

This PR changes one important aspect of Netty instrumentation. Before, we remembered the context that was active when Netty establishes a network connection. Now we remember the context that was active during write operation on the channel. This is the lowest level operation in Netty to trigger the actual request execution.

Now, integrations with specific http client libraries which are based on Netty have to ensure the context propagation from request initiation method in their high level API to the point where library's implementation calls Netty. To demonstrate this I added such propagation to Vert.x http client library. This allowed VertxHttpClientTest to pass high concurrency test that failed before.

Unfortunately, this broke Play 2.4 integration :D. I plan to write the similar integration for it as a follow up PR.

This PR also rewrites Netty40ClientTest to actually use Netty, and not higher level AsyncHttpClient.

Similar PR for Netty 4.1 will also follow if this gets approved.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! If the tests are passing then this is a net improvement. It wasn't immediately clear to me what write*() methods were actually being targeted, so maybe some clarification around that (comments?) is in order.

@Override
public Map<? extends ElementMatcher<? super MethodDescription>, String> transformers() {
return singletonMap(
isMethod().and(nameStartsWith("write")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see any write method on Channel but instead on its inner interface Channel$Unsafe. I see several write* methods on things like AbstractChannel, but those are implementing ChannelOutboundInvoker. I'm confused about this working as intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://netty.io/4.0/api/index.html -> io.netty.channel -> Channel -> 4 write methods :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're much more trusting of docs than I am. :). Here is the sourcehttps://github.com/netty/netty/blob/4.0/transport/src/main/java/io/netty/channel/Channel.java but I think my problem is that I didn't expect such a big change from 4.0 -> 4.1...those write methods are moved in 4.1: https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/Channel.java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... wait.. I have to go check my 4.1 PR :) But this one is good, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, and I failed to realize there was even a separate one. 😎

@trask
Copy link
Member

trask commented Feb 22, 2021

😅😅😅

thoughts:

It's not clear to me (yet) that WRITE is the best place to attach the context to the netty channel, because WRITE can be called multiple times in multiple ways (e.g. if we use WRITE, I think the vertx HttpClientRequest instrumentation may need to attach context on more methods, e.g. sendHead and the public write* methods?)

I see why you migrated away from attaching the context to the netty channel in CONNECT though, as some of those calls under concurrency look painful to propagate down to properly:

	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
	at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.base/java.lang.Thread.run(Thread.java:834)

I also looked at attaching to the netty channel in the existing ChannelPipelineAddAdvice in case that might be better. Doing this, the problematic (under concurrency) call stack is better (I only tried with the vertx test):

	at io.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:244)
	at io.netty.bootstrap.Bootstrap.init(Bootstrap.java:182)
	at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:308)
	at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134)
	at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
	at io.vertx.core.http.impl.HttpClientImpl.internalConnect(HttpClientImpl.java:653)
	at io.vertx.core.http.impl.HttpClientImpl.access$000(HttpClientImpl.java:59)
	at io.vertx.core.http.impl.HttpClientImpl$1.connect(HttpClientImpl.java:91)
	at io.vertx.core.http.impl.ConnectionManager$ConnQueue.createNewConnection(ConnectionManager.java:154)
	at io.vertx.core.http.impl.ConnectionManager$ConnQueue.connectionClosed(ConnectionManager.java:170)
	at io.vertx.core.http.impl.HttpClientImpl.lambda$createConn$72(HttpClientImpl.java:722)
	at io.vertx.core.net.impl.ConnectionBase.handleClosed(ConnectionBase.java:158)
	at io.vertx.core.http.impl.ClientConnection.handleClosed(ClientConnection.java:299)
	at io.vertx.core.impl.ContextImpl.lambda$wrapTask$15(ContextImpl.java:312)
	at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:217)
	at io.vertx.core.net.impl.VertxHandler.channelInactive(VertxHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208)

ConnectionManager$ConnQueue has tricky threading pattern around waiters, though there is io.vertx.core.impl.ContextImpl which could help?

oh no, and I just saw the pipelining option in ConnectionManager$ConnQueue. that doesn't sound good for us...

@iNikem
Copy link
Contributor Author

iNikem commented Feb 22, 2021

I think the vertx HttpClientRequest instrumentation may need to attach context on more methods, e.g. sendHead

That may be true. I thought about this and decided to leave it for now to be "complaint-driven". As am not an Vert.x expert, I am comfortable with waiting for bug reports with clear use-cases.

"Connect" is certainly a wrong time to attach context, because channels/connections can be reused. Thus "writing" the specific request is IMO the correct time to capture the context of that request. This also makes it very clear how to write an integration with higher level http APIs: one have to propagate context from the higher level API for making a http request to the point where this request is being written to Netty channel. And this should be very easy by attaching context to that very same request object.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear when netty channels are reused. Are they reused on keep-alive? I don't recall seeing any channel reuse in the vertx client test, maybe our TestHttpServer doesn't implement keep alives? It would be super helpful to add keep alive to the basic netty client test so we could see/verify the channel reuse behavior. And pipelining would still be broken I assume, even with the new changes? Anyways, definitely a huge improvement to fix the concurrency test failures 👍

@iNikem
Copy link
Contributor Author

iNikem commented Feb 22, 2021

"Connection reuse" tests are next in my todo list. I will start thinking about how to introduce them into HttpClientTest this week.

@iNikem iNikem merged commit 573e512 into open-telemetry:main Feb 22, 2021
@iNikem iNikem deleted the vertx-concurrency branch February 22, 2021 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants