ConnectException for no reason #746

Closed
pveentjer opened this issue Oct 1, 2015 · 4 comments

@pveentjer
Contributor

Just got this exception.

When trying again, it works. So something is broken.

INFO  10:04:48 ClientConnector C -> C_A2 sends to /10.212.10.102:9500
INFO  10:04:48 HarakiriMonitor is not enabled or not running on EC2
INFO  10:04:48 Killing 9 Agents
INFO  10:04:48 Killing Agent 10.212.10.101
INFO  10:04:48 Killing Agent 10.212.10.102
INFO  10:04:48 Killing Agent 10.212.10.103
INFO  10:04:48 Killing Agent 10.212.10.105
INFO  10:04:48 Killing Agent 10.212.10.104
INFO  10:04:48 Killing Agent 10.212.10.109
INFO  10:04:48 Killing Agent 10.212.10.108
INFO  10:04:48 Killing Agent 10.212.10.107
INFO  10:04:48 Killing Agent 10.212.10.106
INFO  10:04:48 Successfully killed 9 Agents
FATAL 10:04:48 Could not start CoordinatorConnector
java.lang.RuntimeException: java.net.ConnectException: Connection refused: /10.212.10.105:9500
    at com.hazelcast.simulator.utils.ThreadSpawner.awaitCompletion(ThreadSpawner.java:140)
    at com.hazelcast.simulator.coordinator.Coordinator.startCoordinatorConnector(Coordinator.java:233)
    at com.hazelcast.simulator.coordinator.Coordinator.initAgents(Coordinator.java:159)
    at com.hazelcast.simulator.coordinator.Coordinator.run(Coordinator.java:113)
    at com.hazelcast.simulator.coordinator.Coordinator.main(Coordinator.java:533)
Caused by: java.net.ConnectException: Connection refused: /10.212.10.105:9500
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:225)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:527)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)

This is my run script:

#!/bin/bash

provisioner --kill
provisioner --install

for i in $(seq 1 1);
do
    coordinator --duration 3m \
                --monitorPerformance  \
                --workerVmOptions      "-ea -server -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log\
                                        -Dhazelcast.performance.monitor.delay.seconds=30" \
        map.properties
done


@pveentjer pveentjer added the bug label Oct 1, 2015
@pveentjer pveentjer added this to the 0.7 milestone Oct 1, 2015
@pveentjer
Contributor Author

And again:

INFO  10:23:19 ClientConnector C -> C_A2 sends to /10.212.10.102:9500
INFO  10:23:19 ClientConnector C -> C_A1 sends to /10.212.10.101:9500
INFO  10:23:19 ClientConnector C -> C_A8 sends to /10.212.10.108:9500
INFO  10:23:19 ClientConnector C -> C_A4 sends to /10.212.10.104:9500
INFO  10:23:19 ClientConnector C -> C_A5 sends to /10.212.10.105:9500
INFO  10:23:19 ClientConnector C -> C_A3 sends to /10.212.10.103:9500
INFO  10:23:19 ClientConnector C -> C_A6 sends to /10.212.10.106:9500
INFO  10:23:19 ClientConnector C -> C_A9 sends to /10.212.10.109:9500
INFO  10:23:19 HarakiriMonitor is not enabled or not running on EC2
INFO  10:23:19 Killing 9 Agents
INFO  10:23:19 Killing Agent 10.212.10.101
INFO  10:23:19 Killing Agent 10.212.10.103
INFO  10:23:19 Killing Agent 10.212.10.106
INFO  10:23:19 Killing Agent 10.212.10.104
INFO  10:23:19 Killing Agent 10.212.10.102
INFO  10:23:19 Killing Agent 10.212.10.109
INFO  10:23:19 Killing Agent 10.212.10.105
INFO  10:23:19 Killing Agent 10.212.10.107
INFO  10:23:19 Killing Agent 10.212.10.108
INFO  10:23:19 Successfully killed 9 Agents
FATAL 10:23:19 Could not start CoordinatorConnector
java.lang.RuntimeException: java.net.ConnectException: Connection refused: /10.212.10.107:9500
    at com.hazelcast.simulator.utils.ThreadSpawner.awaitCompletion(ThreadSpawner.java:140)
    at com.hazelcast.simulator.coordinator.Coordinator.startCoordinatorConnector(Coordinator.java:233)
    at com.hazelcast.simulator.coordinator.Coordinator.initAgents(Coordinator.java:159)
    at com.hazelcast.simulator.coordinator.Coordinator.run(Coordinator.java:113)
    at com.hazelcast.simulator.coordinator.Coordinator.main(Coordinator.java:533)
Caused by: java.net.ConnectException: Connection refused: /10.212.10.107:9500
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:225)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:527)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)

@pveentjer
Contributor Author

I guess it is a timing-related issue: if I restart the run script, the problem eventually goes away.
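
To illustrate the timing theory: if the Agent's listener is not yet bound when the Coordinator connects, the connect attempt fails with "Connection refused", and a later attempt succeeds. Below is a minimal sketch of waiting for a port to accept connections before proceeding. The class `PortWaiter`, the method `awaitPort` and the 1-second per-attempt connect timeout are hypothetical names and values chosen for illustration; this is not Simulator code and not necessarily how the issue was fixed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hazelcast Simulator.
public final class PortWaiter {

    // Assumed per-attempt connect timeout; tune for the environment.
    private static final int CONNECT_TIMEOUT_MILLIS = 1000;

    private PortWaiter() {
    }

    /**
     * Repeatedly tries to open a TCP connection to host:port.
     *
     * @return true as soon as one attempt succeeds, false if the timeout
     *         expires before any attempt succeeds
     */
    public static boolean awaitPort(String host, int port, long timeoutMillis, long retryDelayMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MILLIS);
                // Something is listening and accepted the connection.
                return true;
            } catch (IOException e) {
                // java.net.ConnectException ("Connection refused") is an IOException;
                // treat it as "not up yet" and retry until the deadline.
                if (System.nanoTime() >= deadline) {
                    return false;
                }
            }
            Thread.sleep(retryDelayMillis);
        }
    }
}
```

With the addresses from the log above, a call such as `PortWaiter.awaitPort("10.212.10.105", 9500, 30000, 250)` would poll the Agent port for up to 30 seconds. Retrying only hides a startup race; if the listener is started in the wrong order, fixing that order is the real remedy.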

@Donnerbart Donnerbart self-assigned this Oct 1, 2015
@Donnerbart
Contributor

There should be no timing issue since the new protocol listener is started synchronously before the old one - and the old network connection is still in place to see if the Agent is started.

Besides that, a ton of code has been changed since that bug occurred. Please re-open if this happens again. I would need the full logs of the Coordinator and the Agents.

@Donnerbart
Contributor

Argh, the startup order was indeed wrong. We had the same issue in the Jenkins builds...

Should be fixed via e13518a
