
Using channel Local : java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread. #387

Closed
fredericBregier opened this issue Jun 9, 2012 · 17 comments

@fredericBregier
Member

Hi,

Between version 3.5.0.beta and 3.5.0 final, a change occurred concerning Local Channels, and it causes an issue in my project.

Previously, there was no problem with calling await*() on such a channel, since it is in-thread only.
Now it always raises the following error:

java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread.

My point of view is that, for a Local Channel, it should not be necessary to add an extra thread layer between the channel handling and the work to be done. Being forced to add such a thread could IMHO have a huge performance impact. Most of the time, those Local Channels are already running within a thread (in my case, always) that is separate from a real (network) Channel.

Is there any reason for this change?

Not to mention it will have a huge impact on my code; if there is a real reason, OK, I will make the necessary changes on my side, but I do think it seems unnecessary for Local Channel handling...

@fredericBregier
Member Author

Oh, and by the way, I just tried to add a new ExecutionHandler in the pipeline to ensure I have one, and...
(I had one on the server side but none on the client side, so I simply added one on the client side.)

The error is still there! So I suppose there is an issue in the detection?

But it does not impact "Network" channels...

@fredericBregier
Member Author

The only thing I saw that might be related is the following fix:

7800187#src/main/java/org/jboss/netty

but I am not sure it is related... It seems connected to the new:

public final void run() {
    try {
        DeadLockProofWorker.PARENT.set(executor);
        runTask();
    } finally {
        DeadLockProofWorker.PARENT.remove();
    }
}
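To see why any await*() inside a task wrapped like this must throw, here is a hypothetical, simplified re-implementation of the DeadLockProofWorker / checkDeadLock pair using only java.util.concurrent. The names PARENT and checkDeadLock come from Netty 3's source as quoted in this thread; the helper names (start, awaitTripsCheck) and the exact structure are illustrative, not Netty's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of Netty 3's dead-lock detection: any thread that
// entered through a wrapped Runnable has PARENT set, and await() refuses
// to run on such a thread.
public class DeadLockCheckDemo {

    // Mirrors DeadLockProofWorker.PARENT: non-null while a wrapped task
    // submitted through the executor is running on this thread.
    static final ThreadLocal<Executor> PARENT = new ThreadLocal<>();

    // Mirrors DefaultChannelFuture.checkDeadLock(): a set PARENT means
    // "this thread belongs to an executor" and blocking is forbidden.
    static void checkDeadLock() {
        if (PARENT.get() != null) {
            throw new IllegalStateException(
                "await*() in I/O thread causes a dead lock ...");
        }
    }

    // Mirrors the run() method quoted above: set PARENT around the task.
    static void start(Executor executor, Runnable task) {
        executor.execute(() -> {
            PARENT.set(executor);
            try {
                task.run();
            } finally {
                PARENT.remove();
            }
        });
    }

    // Returns true if checkDeadLock() threw inside the wrapped task.
    public static boolean awaitTripsCheck() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean tripped = new AtomicBoolean();
        CountDownLatch done = new CountDownLatch(1);
        start(pool, () -> {
            try {
                checkDeadLock();   // the first thing await() does
            } catch (IllegalStateException e) {
                tripped.set(true);
            } finally {
                done.countDown();
            }
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return tripped.get();
    }

    public static void main(String[] args) {
        System.out.println("tripped=" + awaitTripsCheck()); // prints tripped=true
    }
}
```

Note how the check does not distinguish between Netty's own I/O workers and a user-supplied pool: anything submitted through the wrapping executor marks its thread, which is the behavior this thread goes on to debug.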

I used a different ExecutorService, but with the same Executor behind it, on the server and client sides.
I even tried to use a completely different Executor, but that does not work either.

Any idea ?

@fredericBregier
Member Author

Small correction on the behavior; here is the scenario I found:

  1. A network channel receives a message (an OrderedMemoryAwareThreadPoolExecutor sits before the handler).
  2. It forwards the message to a local channel.
  3. The local channel server receives it (again with an OrderedMemoryAwareThreadPoolExecutor before the handler) in LocalServerHandler.messageReceived.
  4. It then decides to send a message back to the remote client over the network channel (writeAbstractLocalPacket).
  5. It waits for the message to be sent (same writeAbstractLocalPacket).
     At that point "checkDeadLock" raises the error.

But there is an OMATPE in each step, so why?
I believe the checkDeadLock test is not correct, then?

Here is the exception trace:

   Unexpected exception from downstream Ref Channel: [id: 0x52cab854, local:ephemeral => local:0]
    java.lang.IllegalStateException: await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread.
at org.jboss.netty.channel.DefaultChannelFuture.checkDeadLock(DefaultChannelFuture.java:343) ~[netty-3.5.0.Final.jar:na]
at org.jboss.netty.channel.DefaultChannelFuture.await(DefaultChannelFuture.java:230) ~[netty-3.5.0.Final.jar:na]
at openr66.protocol.utils.ChannelUtils.writeAbstractLocalPacket(ChannelUtils.java:278) ~[classes/:na]
at openr66.protocol.localhandler.LocalServerHandler.request(LocalServerHandler.java:1209) ~[classes/:na]
at openr66.protocol.localhandler.LocalServerHandler.messageReceived(LocalServerHandler.java:327) ~[classes/:na]
at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45) [netty-3.5.0.Final.jar:na]
at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:64) [netty-3.5.0.Final.jar:na]
at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:315) [netty-3.5.0.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_31]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_31]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_31]

So the exception is raised while the wait is really not called from an I/O thread (see the stack trace)...
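The exception message above recommends addListener() over await*(). As a sketch of the difference between the two styles, here is a self-contained analogue using java.util.concurrent.CompletableFuture as a stand-in for Netty's ChannelFuture. This is not Netty API; it only mirrors the shape of the two patterns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Blocking vs. listener-based completion, with CompletableFuture standing
// in for ChannelFuture. The listener style is what the exception message
// asks for: follow-up work runs in a callback, not on the handler thread.
public class ListenerVsAwait {

    // Blocking style: the calling thread parks until the write completes.
    // Done from an I/O thread, this is exactly what the check forbids.
    static String blockingStyle(CompletableFuture<String> writeFuture) {
        return writeFuture.join(); // analogous to future.awaitUninterruptibly()
    }

    // Listener style: register a callback and return immediately,
    // analogous to future.addListener(...). The close (or any follow-up)
    // happens in the callback once the "write" completes.
    static String listenerStyle() {
        CompletableFuture<String> write = new CompletableFuture<>();
        AtomicReference<String> out = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);

        write.thenAccept(result -> {   // the "addListener" step
            out.set(result);           // e.g. close the local channel here
            done.countDown();
        });

        ExecutorService io = Executors.newSingleThreadExecutor();
        io.execute(() -> write.complete("written")); // "I/O" finishes the write
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        io.shutdown();
        return out.get();
    }

    public static void main(String[] args) {
        System.out.println(listenerStyle()); // prints written
    }
}
```

In Netty 3 terms, step 5 of the scenario (await on the write before closing) would become a ChannelFutureListener registered on the write future, which is why the checker pushes users in that direction.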

@fredericBregier
Member Author

Finally, by stepping through, it seems the issue is more in ChannelEventRunnable.
When it sets "DeadLockProofWorker.PARENT.set(executor);", the stack trace is the following:

at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:64)
at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:315)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

and later on, when the wait occurs:

   DeadLockProofWorker.PARENT.get() != null    evaluates to true

So, do you have any idea?
I know this code is very new in 3.5, but I do not know its impact...

@normanmaurer
Member

Could you show me some code to reproduce it?

@fredericBregier
Member Author

And to finish with my analysis:

I double-checked, and the Executor that is set in ChannelEventRunnable and checked later on is my local OrderedMemoryAwareThreadPoolExecutor, not an Executor from Netty's I/O thread side...

@fredericBregier
Member Author

Hmm, for code to reproduce it, I will try (I have to go to a meeting right now).
But I think the code that sets the Executor into PARENT should maybe first check whether that Executor is an OMATPE. But how to do that?

@trustin
Member

trustin commented Jun 9, 2012

DefaultChannelFuture.setUseDeadLockChecker(false) will fix your problem although it's not recommended.
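DefaultChannelFuture.setUseDeadLockChecker(false) is a real static switch in Netty 3 that disables the check globally. The self-contained sketch below imitates its effect on the simplified checker from earlier in the thread; only the setUseDeadLockChecker idea mirrors Netty, the class and helper names here are illustrative.

```java
import java.util.concurrent.Executor;

// Sketch of a globally switchable dead-lock check, in the spirit of
// DefaultChannelFuture.setUseDeadLockChecker(false). Disabling it makes
// the symptom disappear but leaves the underlying blocking call in place.
public class DeadLockSwitchDemo {
    static final ThreadLocal<Executor> PARENT = new ThreadLocal<>();
    static volatile boolean useDeadLockChecker = true;

    static void setUseDeadLockChecker(boolean use) {
        useDeadLockChecker = use;
    }

    // The check short-circuits entirely when the flag is off.
    static void checkDeadLock() {
        if (useDeadLockChecker && PARENT.get() != null) {
            throw new IllegalStateException("await*() in I/O thread ...");
        }
    }

    // Returns true if the check threw with PARENT set on this thread.
    static boolean trips() {
        PARENT.set(Runnable::run); // pretend we are inside a wrapped task
        try {
            checkDeadLock();
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            PARENT.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(trips());  // true: checker enabled
        setUseDeadLockChecker(false); // the suggested bypass
        System.out.println(trips());  // false: check skipped
    }
}
```

As the follow-up comments note, this is a bypass rather than a fix: it silences the warning everywhere, including the cases where blocking in an I/O thread really would deadlock.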

@trustin trustin closed this as completed Jun 9, 2012
@fredericBregier
Member Author

@trustin I know that option, but as you said, it is not recommended.
And I think it might hide other issues besides mine.
So I am trying right now to create an example that illustrates the issue. If I can, I will reopen the issue, since it is really not fixed. What you proposed is only a bypass, not a fix.

@fredericBregier
Member Author

Hi,

I managed to make an almost clean example. Almost, since the client does not finish correctly with the answer, but that was not the goal of this example.
The main goal is to illustrate what happens on the server side.

The logic is:

  1. Client side: an HTTP codec that simply sends one request.
  2. Server side: an HTTP codec on a network channel, with all the necessary OMATPEs.
    a) At reception, it creates a connection to a local channel, again with all the necessary OMATPEs (new ones).
    b) It sends a simple message so that the local channel server will be able to retrieve the network channel.
    c) The local channel server receives the message and sends an answer back to the client directly on the network channel.
    d) The local channel server waits on the write and closes the local channel afterwards (I know I can do it differently, but this is only to illustrate the wait, not the close).

In Netty 3.5.beta, everything is OK (except that the client does not finish with the data, but that is out of scope here).
In Netty 3.5.final, I get the exception again at step 2d, but I should not. So I believe there is something to fix in Netty.

The code can be downloaded from the following link:

https://docs.google.com/open?id=0B-tOt1H9VwrFbkFTM3FyOVhvbE0

@trustin
Member

trustin commented Jun 10, 2012

Please do not create a milestone when not necessary. Setting the milestone to 3.5.1.Final.

@fredericBregier
Member Author

First, sorry, I mistyped the milestone when I added my comment.

I am trying to understand how to fix it. For the moment, I am able to fix it by testing whether the executor is of type MATPE (and therefore also OMATPE and its subtypes).

But I would like to highlight that I don't understand how a single static object can be shared among all threads to test whether the executor behaviour is correct, when this logic is local to one channel. So there might be another, more stable or cleaner solution...

By the way, I simplified the test example and uploaded it again to the link below. I also tried using only LocalChannel, but no issue was encountered. So the issue appears when we have both a network and a local channel (which I do, to enable multiplexing).

https://docs.google.com/open?id=0B-tOt1H9VwrFQ2QzMm9ZUDFJVWc
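The type-check fix described in the comment can be sketched in a self-contained way as follows. MemoryAwareThreadPoolExecutor is a real Netty 3 class; here a hypothetical HandlerPool marker interface stands in for it, and the guard in start() skips the PARENT bookkeeping for such application-level pools.

```java
import java.util.concurrent.Executor;

// Hypothetical sketch of the proposed fix: only mark threads belonging
// to a "real" executor, and skip application-level handler pools
// (Netty's MemoryAwareThreadPoolExecutor and subtypes in the real code).
public class TypeCheckGuardDemo {
    static final ThreadLocal<Executor> PARENT = new ThreadLocal<>();

    // Marker stand-in for MemoryAwareThreadPoolExecutor.
    interface HandlerPool extends Executor {}

    static void start(Executor executor, Runnable task) {
        executor.execute(() -> {
            boolean mark = !(executor instanceof HandlerPool); // the guard
            if (mark) PARENT.set(executor);
            try {
                task.run();
            } finally {
                if (mark) PARENT.remove();
            }
        });
    }

    // Returns whether PARENT was visible inside the task.
    static boolean parentSeen(Executor executor) {
        final boolean[] seen = new boolean[1];
        start(executor, () -> seen[0] = PARENT.get() != null);
        return seen[0];
    }

    public static void main(String[] args) {
        Executor plain = Runnable::run;   // stands in for an I/O worker
        HandlerPool pool = Runnable::run; // stands in for an OMATPE
        System.out.println(parentSeen(plain)); // true: would be checked
        System.out.println(parentSeen(pool));  // false: check skipped
    }
}
```

Note that parentSeen only works synchronously here because the sample executors run tasks inline; with a real thread pool it would need a latch. The sketch also shows the limitation the commenter raises: the decision is per-executor and global, not per-channel.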

@normanmaurer
Member

@fredericBregier sorry, I was quite busy... I will try to have a look today, or tomorrow at the latest.

@normanmaurer
Member

@fredericBregier Will check tomorrow... Too busy today :(

normanmaurer added a commit that referenced this issue Jun 13, 2012
@normanmaurer
Member

@fredericBregier could you please retest with the latest code from the 3 branch?

@fredericBregier
Member Author

@normanmaurer Yes, I confirm that it is OK with the latest 3 branch.
Making the PARENT different did the trick!
Thank you Norman!

I am closing this issue and the proposal.

@dimmacro

@normanmaurer Hey, I am facing the same issue in the same situation. My workaround is to save the Executor in a temporary variable before calling write and read on another Netty server, and to set it back after the connection finishes, like this:

Executor temp = DeadLockProofWorker.PARENT.get();
DeadLockProofWorker.PARENT.set(null);
obj.getChannel().write(request).awaitUninterruptibly();
Object o = obj.getHandler().read(NettyConnectionPoolUtil.read_timeout, TimeUnit.MILLISECONDS);
DeadLockProofWorker.PARENT.set(temp);

Though it works, I doubt whether this approach is OK?
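One risk of the save/clear/restore workaround above is that an exception during the blocking call leaves PARENT cleared for the rest of the thread's life. A self-contained sketch of the same idea, with the restore moved into a finally block, looks like this (PARENT here is a simplified stand-in for DeadLockProofWorker.PARENT, and runUnchecked is a hypothetical helper name):

```java
import java.util.concurrent.Executor;

// Save/clear/restore of the dead-lock marker around a blocking section,
// made exception-safe with try/finally. Mirrors the workaround in the
// comment above, on a simplified stand-in for DeadLockProofWorker.PARENT.
public class SaveRestoreDemo {
    static final ThreadLocal<Executor> PARENT = new ThreadLocal<>();

    static void checkDeadLock() {
        if (PARENT.get() != null) {
            throw new IllegalStateException("await*() in I/O thread ...");
        }
    }

    // Runs a blocking section with the marker cleared, restoring it
    // afterwards even if the section throws.
    static void runUnchecked(Runnable blockingSection) {
        Executor saved = PARENT.get();
        PARENT.set(null);
        try {
            blockingSection.run(); // e.g. write(...).awaitUninterruptibly()
        } finally {
            PARENT.set(saved);     // restore, exception or not
        }
    }

    public static void main(String[] args) {
        PARENT.set(Runnable::run);                    // inside a wrapped task
        runUnchecked(SaveRestoreDemo::checkDeadLock); // no longer throws
        System.out.println(PARENT.get() != null);     // true: marker restored
    }
}
```

Even with the finally block, this shares the downside of setUseDeadLockChecker(false): it defeats the detection locally, so it only makes sense where you are certain the blocked-on work cannot depend on the thread you are blocking.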
