Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSLHandler may throw AssertionError if writes occur before channelAct… #8486

Merged
merged 2 commits into from
Nov 11, 2018

Conversation

normanmaurer
Copy link
Member

…ive fired

Motivation:

If you attempt to write to a channel with an SslHandler prior to channelActive being called you can hit an assertion. In particular - if you write to a channel it forces some handshaking (through flush calls) to occur.

The AssertionError only happens on Java11+.

Modifications:

  • Replace assert by an "early return" in case of the handshake be done already.
  • Add unit test that verifies we do not hit the AssertionError anymore and that the future is correctly failed.

Result:

Fixes #8479.

…ive fired

Motivation:

If you attempt to write to a channel with an SslHandler prior to channelActive being called you can hit an assertion. In particular - if you write to a channel it forces some handshaking (through flush calls) to occur.

The AssertionError only happens on Java11+.

Modifications:

- Replace assert by an "early return" in case of the handshake be done already.
- Add unit test that verifies we do not hit the AssertionError anymore and that the future is correctly failed.

Result:

Fixes #8479.
@normanmaurer
Copy link
Member Author

@tbrooks8 PTAL as well... Thanks for reporting!

@normanmaurer
Copy link
Member Author

@netty-bot test this please

@normanmaurer
Copy link
Member Author

You can ignore the JDK12 failure on the CI.

Copy link
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should work but I would have personally thought that the right thing to do would be to defer writes/flushes until the channel was active. It looks like the flush is the key reason we get to this state: is it normal to want to flush a pipeline before it's considered active?

@@ -1745,9 +1745,12 @@ public void operationComplete(Future<Channel> future) throws Exception {
// is in progress. See https://github.com/netty/netty/issues/4718.
return;
} else {
if (handshakePromise.isDone()) {
// If the handshake is done already lets just return directly as there is no need to trigger it again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the condition under which we can expect this, specifically that the handshake must have already happened/started (and failed?) before the channelActive event was fired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryce-anderson yep.. Let me update the comment

@normanmaurer
Copy link
Member Author

@bryce-anderson the thing is the channel is active already but we did not propagate the channelActive(...) already so yep I think this is expected to work. The thing is that all the promises will fire before the even is propagated through the pipeline, so yep its expected to work

@normanmaurer
Copy link
Member Author

@bryce-anderson made the scenario more clear in the comments... PTAL.

Copy link
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take your word on the event ordering issue: I believe that is what's happening but it still seems strange not to wait on the handshake until after we get the channelActive event, but I suspect it's not a big deal.

@normanmaurer
Copy link
Member Author

@bryce-anderson yeah it is really about sequencing here.

Copy link
Contributor

@Tim-Brooks Tim-Brooks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this fixes my particular issue.

I have not spent a significant amount of time going through the SslHandler code in a while. But I am suspicious that there being multiple different code paths to start a handshake is introducing other issues.

If the handshake at the engine level gets kickstarted by a flush, it looks like there is other setup that does not happen:

    private void handshake(final Promise<Channel> newHandshakePromise) {
        final Promise<Channel> p;
        if (newHandshakePromise != null) {
            final Promise<Channel> oldHandshakePromise = handshakePromise;
            if (!oldHandshakePromise.isDone()) {
                // There's no need to handshake because handshake is in progress already.
                // Merge the new promise into the old one.
                oldHandshakePromise.addListener(new FutureListener<Channel>() {
                    @Override
                    public void operationComplete(Future<Channel> future) throws Exception {
                        if (future.isSuccess()) {
                            newHandshakePromise.setSuccess(future.getNow());
                        } else {
                            newHandshakePromise.setFailure(future.cause());
                        }
                    }
                });
                return;
            }

            handshakePromise = p = newHandshakePromise;
        } else if (engine.getHandshakeStatus() != HandshakeStatus.NOT_HANDSHAKING) {
            // Not all SSLEngine implementations support calling beginHandshake multiple times while a handshake
            // is in progress. See https://github.com/netty/netty/issues/4718.
            return;
        } else {
            // Forced to reuse the old handshake.
            p = handshakePromise;
            assert !p.isDone();
        }

        // Begin handshake.
        final ChannelHandlerContext ctx = this.ctx;
        try {
            engine.beginHandshake();
            wrapNonAppData(ctx, false);
        } catch (Throwable e) {
            setHandshakeFailure(ctx, e);
        } finally {
           forceFlush(ctx);
        }
        applyHandshakeTimeout(p);
    }

For example, in handshake(Promise) it looks like you will leave the method early if the engine's handshaking status is something besides HandshakeStatus.NOT_HANDSHAKING. Which means that a handshake timeout never gets scheduled.

@normanmaurer
Copy link
Member Author

@tbrooks8 yep seems like we should cleanup this a bit more. Let me take a look

@normanmaurer
Copy link
Member Author

@tbrooks8 actually I think the current implementation is fine as if the handshake status is anything different we should have already finished the handshake or it should be still in progress which means a timeout should already have been scheduled.

@normanmaurer
Copy link
Member Author

So to make it short I think what we have here as fix is fine... if there will be other issues we should fix these in a separate PR with a unit test included.

@Tim-Brooks
Copy link
Contributor

@normanmaurer -

I agree the fix here is fine for my case. But I think the following is possible currently (a different issue):

  1. A client (like my case) initiates the handshake in a flush call. The client hello message is generated and flushed. The SSLEngine now returns a handshake status of NEED_UNWRAP.

  2. The channelActive call happens, since the handshake status is not HandshakeStatus.NOT_HANDSHAKING (line 1740), the method returns immediately without scheduling a timeout. There is now no timeout for this handshake.

The flush route for starting a handshake does not called startHandshakeProcessing. That is only called when the handler is added and the channel is already active or in the channelActive call. The flush route allows for the SSLEngine handshake to start prior to these things happening. And then the status of the engine, causes the handshake(final Promise<Channel> newHandshakePromise) method to exit early.

@normanmaurer
Copy link
Member Author

normanmaurer commented Nov 10, 2018 via email

@normanmaurer
Copy link
Member Author

@tbrooks8 #8493

Copy link
Member

@carl-mastrangelo carl-mastrangelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants