ConcurrentModificationException in NioEventLoop #654

Closed
fbettag opened this issue Oct 15, 2012 · 26 comments

@fbettag

fbettag commented Oct 15, 2012

Under heavy load, I often get this exception:

Oct 15, 2012 7:32:38 PM io.netty.channel.socket.nio.NioEventLoop
WARNING: Unexpected exception in the selector loop.
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at io.netty.channel.socket.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:268)
        at io.netty.channel.socket.nio.NioEventLoop.run(NioEventLoop.java:211)
        at io.netty.channel.SingleThreadEventExecutor$1.run(SingleThreadEventExecutor.java:80)
        at java.lang.Thread.run(Unknown Source)

Reporting the issue as requested by Norman :)
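
For readers unfamiliar with this failure mode: the trace above shows a fail-fast `HashMap` key iterator detecting that the underlying collection was structurally modified during iteration. Despite its name, the exception does not require two threads. The following is a minimal standalone sketch of that general mechanism, not Netty's code; the class and method names are hypothetical, and the thread does not state which modification inside `NioEventLoop` triggered it.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class CmeSketch {
    /** Builds a small set of integer "keys" 0..9 (illustrative stand-in data). */
    static Set<Integer> newKeys() {
        Set<Integer> keys = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            keys.add(i);
        }
        return keys;
    }

    /** Removes through the set while a for-each iterator is live; returns true if a CME is thrown. */
    static boolean removeThroughSetThrows() {
        Set<Integer> keys = newKeys();
        try {
            for (Integer k : keys) {
                if (k % 2 == 0) {
                    keys.remove(k); // structural change the iterator does not know about
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the next call to Iterator.next() noticed the change
        }
    }

    /** Removes through the iterator itself, which keeps the iterator consistent. */
    static Set<Integer> removeThroughIterator() {
        Set<Integer> keys = newKeys();
        for (Iterator<Integer> it = keys.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("remove via set throws CME: " + removeThroughSetThrows());
        System.out.println("remove via iterator leaves: " + removeThroughIterator());
    }
}
```

The same exception appears when another piece of code changes the set between two `next()` calls, which is why it tends to show up only under load.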

@normanmaurer
Member

@trustin maybe you could check this out :)

@trustin
Member

trustin commented Oct 16, 2012

Thanks for reporting this, @fbettag. Let me take a look. Do you have any additional information (how to reproduce, etc)?

@fbettag
Author

fbettag commented Oct 16, 2012

We have no idea, other than the fact that we hit the server with ~3-4k concurrent real connections (within one second after startup). That's the peak, because they're long-polling/Comet connections.

@trustin
Member

trustin commented Oct 16, 2012

I tried to reproduce the problem by making a bunch of HTTP connections to the HTTP snoop example, but couldn't. Would you mind modifying the HTTP snoop example a little bit to reproduce the problem? I'm using httperf to drive it.

@fbettag
Author

fbettag commented Oct 16, 2012

Hm, I will try this tonight.

@trustin
Member

trustin commented Oct 16, 2012

Thanks in advance!

@fbettag
Author

fbettag commented Oct 16, 2012

We're basically just holding 2k long-polling connections (Comet/WebSockets). The snoop example simply answers them but does not hold them. Maybe that's the key difference?

@kelaneren

I got the same exception in my load test when there are more than 2k socket connections. I am using Netty with long-lived raw socket connections.

@normanmaurer
Member

@kelaneren would it be possible to write a test case?

normanmaurer pushed a commit that referenced this issue Nov 7, 2012
… raise a ConcurrentModificationException when using the nio transport
@ghost ghost assigned normanmaurer Nov 7, 2012
@normanmaurer
Member

@kelaneren @fbettag I think I fixed it... Could you please retest with the latest master?

@fbettag
Author

fbettag commented Nov 7, 2012

Yup in a second

@fbettag
Author

fbettag commented Nov 8, 2012

We don't see the exception anymore! Not sure if the problem is gone, though. @kelaneren anything?

@normanmaurer
Member

w00t :) I will close it as I'm sure it's fixed... Anyway, it would be nice if @kelaneren could also test it out...

@kelaneren

I upgraded Netty in my project from 4.0.0.alpha1 to 4.0.0.alpha8 with this fix and got 100% CPU usage when I ran the same load test with 2000 connections. I am going to dig deeper.

@normanmaurer
Member

Would it be possible to share your "loadtest" ?

@kelaneren

Sorry, it's specific to our project. While upgrading Netty, I changed

b = new ServerBootstrap()

from b.eventLoop(new NioEventLoop(), new NioEventLoop())
to b.group(new NioEventLoopGroup(), new NioEventLoopGroup())

And the buffer passed to channel.write(buff) changed from

var buff = new DynamicByteBuf(size);

to

var buff = new DirectByteBuf(size)

I will do some profiling, and will probably extract it into a more generic test in the future.

@normanmaurer
Member

Two things:

  1. Do you create many ServerBootstrap instances? If so, you should share your NioEventLoopGroup instances between them.
  2. Replace new DirectByteBuf(size) with Unpooled.buffer(size).
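
The first point can be illustrated without Netty. The sketch below uses plain `java.util.concurrent` thread pools as a stand-in, on the assumption that each `NioEventLoopGroup` owns its own set of threads; `SharedPoolSketch`, `CountingFactory`, and `threadsCreated` are hypothetical names, and none of this is Netty API. It only shows how the thread count grows when every server instance gets its own pool instead of sharing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolSketch {
    /** Counts every thread the pools ask for, so both setups can be compared. */
    static final class CountingFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            created.incrementAndGet();
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        }
    }

    /** Simulates `servers` server instances, each needing `poolSize` workers (stand-in for event loops). */
    static int threadsCreated(int servers, int poolSize, boolean share) {
        CountingFactory factory = new CountingFactory();
        List<ExecutorService> pools = new ArrayList<>();
        ExecutorService sharedPool = null;
        for (int s = 0; s < servers; s++) {
            ExecutorService pool;
            if (share) {
                if (sharedPool == null) {
                    sharedPool = Executors.newFixedThreadPool(poolSize, factory);
                    pools.add(sharedPool);
                }
                pool = sharedPool;
            } else {
                pool = Executors.newFixedThreadPool(poolSize, factory);
                pools.add(pool);
            }
            // A fixed pool starts a new thread per task until poolSize threads exist.
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> { });
            }
        }
        for (ExecutorService pool : pools) {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return factory.created.get();
    }

    public static void main(String[] args) {
        System.out.println("one pool per server: " + threadsCreated(5, 4, false)); // 20
        System.out.println("one shared pool:     " + threadsCreated(5, 4, true));  // 4
    }
}
```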

@kelaneren

Thanks, Norman. It was our own issue, caused by a config change. I confirmed this ConcurrentModificationException is fixed. Cheers

@fbettag
Author

fbettag commented Nov 8, 2012

@kelaneren could you elaborate on that 100% CPU issue? We're having the same thing over here and have no idea what's causing it.

@normanmaurer
Member

Could you get some thread dumps and identify the stack that is using the CPU?

http://www.semicomplete.com/blog/geekery/debugging-java-performance.html
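
When external tools are not at hand, similar information can be collected from inside the JVM. The following is a rough sketch using the standard `java.lang.management` API; the class name `ThreadDumpSketch` and the output format are made up for illustration, and it would have to be called from within the affected process (for example from a debug handler). It lists every live thread with its state, its CPU time where the JVM reports it, and its stack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {
    /** Builds a text dump of all live threads, including CPU time when the JVM supports it. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            // getThreadCpuTime returns nanoseconds, or -1 if unavailable for this thread
            long nanos = cpuAvailable ? mx.getThreadCpuTime(info.getThreadId()) : -1;
            out.append('"').append(info.getThreadName()).append('"')
               .append(" state=").append(info.getThreadState())
               .append(" cpuMs=").append(nanos < 0 ? "n/a" : String.valueOf(nanos / 1_000_000))
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking two dumps a few seconds apart and comparing the `cpuMs` values is one way to spot the thread that keeps burning CPU.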

@fbettag
Author

fbettag commented Nov 8, 2012

io ~ # jps
 * jps is not available for oracle-jre-bin-1.7 on x86_64
 * IMPORTANT: some Java tools are not available on some VMs on some architectures
io ~ # jstack
 * jstack is not available for oracle-jre-bin-1.7 on x86_64
 * IMPORTANT: some Java tools are not available on some VMs on some architectures

:(

@fbettag
Author

fbettag commented Nov 8, 2012

We'll try it with IcedTea now!

@normanmaurer
Member

You need to install the JDK...

Also, you can use kill -3 as a replacement for jstack.

@kelaneren

I got 100% CPU usage again after increasing the number of connections, and CPU usage didn't drop after all clients were closed. I ran the same test with Netty 3.5.9.Final, and CPU usage was normal. However, I can't increase the number of long-lived open connections: once the number reached 4k, I got

2012-11-09 11:18:30,810-[New I/O worker #1] ERROR c.s.b.net.FramedSocketServerHandler Unexpected exception from downstream.
java.nio.channels.ClosedChannelException: null
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:785) ~[game-service.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:729) ~[game-service.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:89) ~[game-service.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472) ~[game-service.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333) ~[game-service.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35) ~[game-service.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_07]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_07]
        at java.lang.Thread.run(Thread.java:722) [na:1.7.0_07]

Not sure whether these are related to this ticket. I will dig more.

@fbettag
Author

fbettag commented Nov 9, 2012

I can confirm this issue.

@normanmaurer
Member

@fbettag what exactly can you confirm?
@kelaneren please open another issue with as much information as possible.
