Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When use_tasks is true, Netty seemingly can supply a freed ByteBuf to SslHandler::decode #680

Closed
ljnelson opened this issue Nov 18, 2021 · 4 comments · Fixed by netty/netty#11854

Comments

@ljnelson
Copy link

I am not sure whether it is more appropriate to file the bug here or in the main Netty repository. I'll file it here because I think any fix will have to be done in the C code.

Both grpc-java and Helidon (disclaimer: I work on it, but not in the area affected) tried to upgrade Netty past 4.1.63.Final with no success. The root cause of both projects' tests' failures seems to be related to the conditional blocks in sslcontext.c that perform different behaviors depending on whether use_tasks is true or not. I haven't gone deeper than that.

grpc-java issue: grpc/grpc-java#8605 (comment)

Helidon issue: helidon-io/helidon#3423 (comment)

In both issues, I can cause the failing tests in question to succeed by setting the Netty Java system property io.netty.handler.ssl.openssl.useTasks to false. This System property was introduced by netty/netty#11242 in version 4.1.64.Final.

The presenting issue seems to be that this results in a ByteBuf being supplied to SslHandler::decode whose refCnt() method returns 0, so SslHandler tries to read something from it and gets an IllegalReferenceCountException.

I can't speak for the grpc-java project, but in Helidon we noticed this problem only when doing mutual authentication. I believe those are the circumstances under which the grpc-java test fails as well, since the failure outputs io.grpc.netty.NettyClientTransportTest > tlsNegotiationServerExecutorShouldSucceed FAILED.

In Helidon's case, and maybe in grpc-java as well?, we're using OpenSSL and so I wonder if this is also related to netty/netty#11489.

@normanmaurer
Copy link
Member

Thanks I will have a look.

@normanmaurer
Copy link
Member

@ljnelson I have found the root-cause and fixed it (its a bug in netty itself). Currently working on a unit test, so stay tuned.

normanmaurer added a commit to netty/netty that referenced this issue Nov 23, 2021
…tException

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
normanmaurer added a commit to netty/netty that referenced this issue Nov 24, 2021
…tException (#11854)

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
normanmaurer added a commit to netty/netty that referenced this issue Nov 24, 2021
…tException (#11854)

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
@ljnelson
Copy link
Author

ljnelson commented Dec 7, 2021

Is there a rough idea of when 4.1.71.Final (containing this fix) will be released? Thanks for your work.

@normanmaurer
Copy link
Member

@ljnelson yes... this week

laosijikaichele pushed a commit to laosijikaichele/netty that referenced this issue Dec 16, 2021
…tException (netty#11854)

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
laosijikaichele pushed a commit to laosijikaichele/netty that referenced this issue Dec 16, 2021
…tException (netty#11854)

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
raidyue pushed a commit to raidyue/netty that referenced this issue Jul 8, 2022
…tException (netty#11854)

Motivation:

These days our native SSL implementation makes use of "tasks" to allow off-loading operations. Due a bug this could lead to an re-entrancy issue when the Executor used is directly executing the task by the same thread that submits it. As an end-result it was possible to see IllegalReferenceCountException when the tryDecode(...) method will cause the original buffer to be released.

Modifications:

- Always submit the resume to the eventloop to ensure we never have an re-entrancy problem
- Add unit tests that would fail before the fix.

Result:

Fixes netty/netty-tcnative#680
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants