Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deregister and re-register, ClosedChannelException is thrown #5125

Closed
purpleknightleo opened this issue Apr 12, 2016 · 16 comments
Closed

deregister and re-register, ClosedChannelException is thrown #5125

purpleknightleo opened this issue Apr 12, 2016 · 16 comments
Assignees
Labels

Comments

@purpleknightleo
Copy link

Hi,
I'm doing a project on middle-ware for distributed database, and I want to use Netty as client connecting to MySQL reading data. But a problem gets me stuck for such a long time so I have to ask for help. The scenario is as below:

  1. Netty dispatches SQL query to MySQL;
    1. MySQL returns a bunch of resultset which might be very large so I have to prevent it from OOM;
    2. When the data received exceed a user-defined high water mark threshold, I use channel.deregister() to stop the incoming data for a while;
    3. The business logic consumes the data received, when it reaches a low water mark threshold, I use eventloop.register(channel) to continue to receive the remaining data;
    4. Do the same thing until all data needed is received and processed.

But here the strange thing is when I run a benchmark tool with N concurrent clients executing the same SQL constantly, and the NioEventLoopGroup is with M threads created(N>M), after a while the N-M connections are shown closed, leaving M connections active to the end.
After debugging, I find during the deregister and re-register, the channel bounded to the later failed connection will be closed(seems like the SelectionKey is set invalid) , so when trying to re-register, ClosedChannelException will be thrown.
And it should be mentioned that, if I remove the deregister and re-register logic it works fine, so obviously my usage of deregister and re-register could probably be wrong in some place, could you do me a favor instructing me to locate the reason? Thank you so much in advance.

Bootstrap bootstrap = new Bootstrap();
bootstrap
    .group(MySQLConnectorConfig.group)
    .channel(NioSocketChannel.class)
    .handler(new ChannelInitializer<SocketChannel>() {
        @Override
        protected void initChannel(SocketChannel ch)
                    throws Exception {
            ch.pipeline().addLast(new MySQLClientDecoder(conn));
            ch.pipeline().addLast(new MySQLClientHandler(conn));
            ch.config().setAllocator(PooledByteBufAllocator.DEFAULT);
           }
     })
    .option(ChannelOption.SO_KEEPALIVE, true)
    .option(ChannelOption.TCP_NODELAY, true)
    .option(ChannelOption.SO_REUSEADDR, true)
    .option(ChannelOption.SO_RCVBUF, MySQLConnectorConfig.DEFAULT_SO_RCVBUF);

I deregister the channel in MySQLClientDecoder, and re-register in my business logic when consuming the data received.

@Scottmitch
Copy link
Member

the channel bounded to the later failed connection will be closed(seems like the SelectionKey is set invalid)

Can you provide some details as to why you say the SelectionKey is "set invalid", and what state is it in? Also is it possible to provide a reproducer for this scenario?

When the data received exceed a user-defined high water mark threshold, I use channel.deregister() to stop the incoming data for a while;

Sounds like this can be accomplished using a combination of primitives provided by Netty, and deregister/register could potentially be avoided:

@purpleknightleo
Copy link
Author

First thanks very much for your in-time reply.
The reason I say the SelectionKey is set invalid is when debugging the program goes into the following logic in NioEventLoop:

private static void processSelectedKey(SelectionKey k, AbstractNioChannel ch) {
        final NioUnsafe unsafe = ch.unsafe();
        if (!k.isValid()) {
            // close the channel if the key is not valid anymore
            unsafe.close(unsafe.voidPromise());
            return;
        }
...

channel close is triggered because of the invalidity of the SelectionKey, that's what I saw.

Actually, I just want to control the flow of the incoming data, so I deregister the channel from eventloop when overflow and re-register it back to the same eventloop when underflow. I don't know if it's the typical usage of deregister/re-register paradigm, you said this could potentially be avoided by the way you suggest, but here Netty acts as a client reading data from MySQL Server, what I should control is the read buffer I think, not the write buffer, so why it's all channel write related methods you offer? Could you plz explain? (maybe a naive question,:-D)
And also the high/low water mark I mention is not simply the accumulated bytes coming, but the accumulated bytes for complete packets I define in my process logic

@Scottmitch
Copy link
Member

What is the state of the Channel and the associated selector at this point? The SelectionKey will not be valid under the following circumstances:

A key is valid upon creation and remains so until it is cancelled,
* its channel is closed, or its selector is closed.

The SelectionKey is cancelled on deregister and an new SelectionKey is created on register ... can you tell me the state of ch.isRegistered() when the channel is closed in this condition (on the unsafe.close(...) line you pasted above)?

but here Netty acts as a client reading data from MySQL Server, what I should control is the read buffer I think, not the write buffer, so why it's all channel write related methods you offer? Could you plz explain?

The typical way to control how much you are reading is to disable autoRead and instead manually call read(). You can use w/e condition makes sense for your use case but it is also common that you want to stop reading if you are unable to write data to somewhere (for example stop reading requests if you are unable to write responses). This is what the writability stuff is useful for, but if you have other application semantics that is fine too.

@purpleknightleo
Copy link
Author

Yes, I really appreciate your sincere help on my problem. And glad to tell you it's been solved with your constructive suggestions.
I have dived deeply into the autoRead paradigm and found it's what I'm looking for. Actually here in my scenario I just want to control the incoming flow rate, and set autoRead is enough(remove the OP_READ and add it back later) , and it works. On the other hand, the water mark is still controlled by the application semantics, not the built-in writability control.
Thank you again.

@Scottmitch
Copy link
Member

@purpleknightleo - Let me reopen this issue as I think you uncovered a bug. Would you mind keeping your current code base around so you can verify a PR I will submit?

@Scottmitch Scottmitch reopened this Apr 14, 2016
@Scottmitch Scottmitch self-assigned this Apr 14, 2016
@Scottmitch Scottmitch added this to the 4.0.37.Final milestone Apr 14, 2016
@purpleknightleo
Copy link
Author

Ok, no problem. The versions I tried include 4.0.32/33/35, I don't know if it has been fixed in later version. As I mentioned, my need should be done with autoRead, but I think deregister/re-register can also satisfy though It's too heavy-weight.
I don't really know what you want me to do now, keep code base around so to verify a PR? Could you plz explain? I'm a newbie in github and open-source projects participation. Thank you.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Apr 20, 2016
Motivation:
If a channel is deregistered from an NioEventLoop the associated SelectionKey is cancelled. If the NioEventLoop has yet to process a pending event as a result of that SelectionKey then the NioEventLoop will see the SelecitonKey is invalid and close the channel. The NioEventLoop should not close a channel if it is not registered with that NioEventLoop.

Modifications:
- NioEventLoop.processSelectedKeys should check that the channel is still registered to itself before closing the channel

Result:
NioEventLoop doesn't close a channel that is no longer registered to it when the SelectionKey is invalid
Fixes netty#5125
Scottmitch added a commit to Scottmitch/netty that referenced this issue Apr 20, 2016
Motivation:
If a channel is deregistered from an NioEventLoop the associated SelectionKey is cancelled. If the NioEventLoop has yet to process a pending event as a result of that SelectionKey then the NioEventLoop will see the SelecitonKey is invalid and close the channel. The NioEventLoop should not close a channel if it is not registered with that NioEventLoop.

Modifications:
- NioEventLoop.processSelectedKeys should check that the channel is still registered to itself before closing the channel

Result:
NioEventLoop doesn't close a channel that is no longer registered to it when the SelectionKey is invalid
Fixes netty#5125
@Scottmitch
Copy link
Member

Scottmitch commented Apr 20, 2016

my need should be done with autoRead, but I think deregister/re-register can also satisfy though It's too heavy-weight.

Sorry for the confusion...

I would still recommend going the autoRead and controlling when you call read. I would like you to run your code which caused you to create this issue against #5167 to verify the PR fixes the original issue ... when you run with this PR let me know if you still observe the original issue or not.

Scottmitch added a commit that referenced this issue Apr 21, 2016
Motivation:
If a channel is deregistered from an NioEventLoop the associated SelectionKey is cancelled. If the NioEventLoop has yet to process a pending event as a result of that SelectionKey then the NioEventLoop will see the SelecitonKey is invalid and close the channel. The NioEventLoop should not close a channel if it is not registered with that NioEventLoop.

Modifications:
- NioEventLoop.processSelectedKeys should check that the channel is still registered to itself before closing the channel

Result:
NioEventLoop doesn't close a channel that is no longer registered to it when the SelectionKey is invalid
Fixes #5125
@purpleknightleo
Copy link
Author

@Scottmitch Sorry to tell you your fix doesn't solve my previous issue, and I think the code logic you changed has a problem. You try to judge whether the channel is still registered to the eventloop by:

if (eventLoop != this || eventLoop == null) {
                return;
}

But in my case, when I set one thread in NioEventLoopGroup, and two clients connect to my middleware, the final result is one client succeeds and the other dies due to the reason you have known. On the other hand, after deregistered from eventloop, we can still use ch.eventloop() to get the previously bound eventloop(of course, their real binding is dismissed due to deregistration). So when the selectionKey is invalid, it goes into the code logic pasted above, but actually here it satisfies eventloop !=null && eventloop == this(because here we have only one eventloop, it can't be another one), the close process still gets in as we don't expect.
Thus, I think why we can't use the following logic to take a judge:

if (!k.isValid()) {
            final EventLoop eventLoop;
            try {
                eventLoop = ch.eventLoop();
            } catch (Throwable ignored) {
                // If the channel implementation throws an exception because there is no event loop, we ignore this
                // because we are only trying to determine if ch is registered to this event loop and thus has authority
                // to close ch.
                return;
            }
            // Only close ch if ch is still registerd to this EventLoop. ch could have deregistered from the event loop
            // and thus the SelectionKey could be cancelled as part of the deregistration process, but the channel is
            // still healthy and should not be closed.
            // See https://github.com/netty/netty/issues/5125
            if (eventLoop != this || eventLoop == null || (eventloop == this && !ch.isRegistered())) {
                return;
            }
            // close the channel if the key is not valid anymore
            unsafe.close(unsafe.voidPromise());
            return;
        }

@Scottmitch
Copy link
Member

Scottmitch commented Apr 23, 2016

@purpleknightleo - Thanks for reporting back. The issue is that the isRegistered() will not be updated until the next event loop run, and I can't repo in this state (or any state :/). Let me think about this a bit. If you could provide a unit test which reproduces the behavior that would be much appreciated ... seems non-trivial to get into this state.

@purpleknightleo
Copy link
Author

Ok, after I try the newer version, I will give you a replay ASAP. Thanks.

Scottmitch added a commit to Scottmitch/netty that referenced this issue Apr 25, 2016
Motivation:
04e33fd was incomplete. The EventLoop may not change even if the channel is deregistered.

Modifications:
- Check if the channel is registered, and if not return before closing

Result:
Fixes netty#5125
@purpleknightleo
Copy link
Author

Sorry again, your fix still can't solve my problem and sorry for not offering the unit test for some reason.

if (eventLoop != this || !ch.isRegistered() || ch.selectionKey() != k) {
                  return;
 }

From you fixed code, I think since you said "isRegistered() will not be updated until the next event loop run", the !ch.isRegistered() judgement is of no use. On the other hand, ch.selectionKey() != k, why this? ch is got from k.attachment(), so I think these two objects are linked that ch.selectionKey() will always return k.

@Scottmitch
Copy link
Member

On the other hand, ch.selectionKey() != k, why this?

The AbstractNioChannel.selecitonKey is re-assigned under a few conditions ... one of these is when the channel is registered. This was added in case the channel is re-registered to the same EventLoop but an "old selection key" is being processed (and is now inactive) ... this should not result in the channel being closed.

!ch.isRegistered() judgement is of no use

In summary I can't reproduce the issue you are experiencing so I'm including things that should not be logically wrong ... but not sure it will impact the scenario you are running. I will need more information to debug this more:

  • Can you re-confirm you hit the line in NioEventLoop.processSelectedKey which calls unsafe.close(unsafe.voidPromise()); in this case?
  • Can you enumerate the state of the k, ch, and provide a stack trace in this state?
  • If you could relate when the NioEventLoop.processSelectedKey method is called relative to your call to register/dergister that would be helpful (e.g. I call degregister, processSelectedKey is called, then I call register).

sorry for not offering the unit test for some reason.

I also tried to create a unit test to recreate the sequence of events which I think would repro this issue (deregister, processSelectedKey, register) but was not able to get processSelectedKey to run in the middle. If you can confirm this is the sequence of events I should be able to figure out a way to get the state updated so we can see it "immediately" instead of waiting until the next event loop run.

@Scottmitch
Copy link
Member

@purpleknightleo - ping

@normanmaurer
Copy link
Member

No update so closing

pulllock pushed a commit to pulllock/netty that referenced this issue Oct 19, 2023
Motivation:
If a channel is deregistered from an NioEventLoop the associated SelectionKey is cancelled. If the NioEventLoop has yet to process a pending event as a result of that SelectionKey then the NioEventLoop will see the SelecitonKey is invalid and close the channel. The NioEventLoop should not close a channel if it is not registered with that NioEventLoop.

Modifications:
- NioEventLoop.processSelectedKeys should check that the channel is still registered to itself before closing the channel

Result:
NioEventLoop doesn't close a channel that is no longer registered to it when the SelectionKey is invalid
Fixes netty#5125
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants