Hazelcast startup hangs in EC2 #2217

Closed
caarlos0 opened this Issue Apr 3, 2014 · 12 comments

caarlos0 commented Apr 3, 2014

Our app connects to the EC2 machine where Hazelcast is running. This works as expected.

But if the Hazelcast server is down for any reason, it looks like Hazelcast enters some kind of loop and never comes back, causing the whole app deployment to fail with a timeout.

I'm not able to reproduce this on my machine, only on EC2.


Info:

  • Hazelcast Version: 3.2
  • Client version: 3.2 (java)
  • App: Java EE 6 application, running on jBoss AS 7 with jRockit 6.

noctarius commented Apr 3, 2014

Are you able to get a thread dump on EC2 when it happens? Sounds like a potential deadlock.
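
For anyone following along: a thread dump is a snapshot of every JVM thread's stack, and it is usually taken with `jstack <pid>` or `kill -3 <pid>` rather than by killing the process. As a rough sketch of the same idea from inside the JVM, the standard `java.lang.management` API can dump all threads and check for deadlocks. The class name `ThreadDumpSketch` and its methods are made up for illustration and are not part of Hazelcast.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not part of Hazelcast): dumps all threads and counts deadlocked ones. */
final class ThreadDumpSketch {

    /** Returns a plain-text dump of every live thread with its state and stack frames. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details when the JVM supports them, otherwise the call may throw.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Returns how many threads are currently deadlocked (0 if none). */
    static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks, but needs JVM support;
        // fall back to monitor-only detection otherwise.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

If the deadlock count stays at zero while startup is stuck, the dump should still show which frame the connecting thread is sitting in, which helps tell a deadlock from a retry loop.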

caarlos0 commented Apr 3, 2014

I'm not sure I understood your question, but yes, I can stop jBoss (kill -9) and the machine keeps working.

Right now I'm trying to isolate this issue so it should be easier to identify what exactly is happening.

Will update here as soon as I have more info.

caarlos0 commented Apr 3, 2014

I was able to isolate the problem. It's not specific to EC2; it seems to happen on any server.

Please take a look: https://github.com/caarlos0/hazelcast-bug

noctarius commented Apr 3, 2014

Thanks Carlos, I'll have a look tomorrow since I'm just about to go to sleep :-)

Thanks for your efforts.

Chris

velo commented Apr 3, 2014

Cool =D

caarlos0 commented Apr 3, 2014

Thank you very much, @noctarius!

BTW, it does end up shutting down, but it takes a loooot of time (almost 10 minutes), so jBoss rolls back the deploy. Anyway, that is really too much time IMHO; it doesn't feel right to me.

Please let me know if I can help with anything else!
Cheers 🍺

pveentjer added the aws label on Apr 4, 2014

caarlos0 commented Apr 5, 2014

The issue is in ClientConnectorManagerImpl, line 295. There is a timeout, but it looks like it is ignored...
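
To illustrate what honoring such a timeout could look like, here is a minimal sketch of a retry loop that gives up once an overall deadline has passed. It is not the Hazelcast code and not the fix that was submitted; `BoundedConnectSketch`, `connectWithDeadline`, and the parameters are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch (not the Hazelcast implementation): retry a connect attempt until a deadline. */
final class BoundedConnectSketch {

    /**
     * Calls {@code attempt} until it succeeds or {@code timeoutMillis} has elapsed overall.
     * Throws TimeoutException instead of retrying forever.
     */
    static <T> T connectWithDeadline(Callable<T> attempt, long timeoutMillis, long retryDelayMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Exception lastFailure = null;
        int attempts = 0;
        do {
            attempts++;
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                throw e; // never swallow interruption, the caller wants us to stop
            } catch (Exception e) {
                lastFailure = e; // remember why the last attempt failed
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                break; // deadline reached: stop retrying
            }
            // Sleep for the retry delay, but never past the deadline.
            long delayNanos = TimeUnit.MILLISECONDS.toNanos(retryDelayMillis);
            TimeUnit.NANOSECONDS.sleep(Math.min(remainingNanos, delayNanos));
        } while (System.nanoTime() - deadline < 0);

        TimeoutException timeout = new TimeoutException(
                "gave up after " + attempts + " attempts within " + timeoutMillis + " ms");
        timeout.initCause(lastFailure);
        throw timeout;
    }
}
```

Note that the deadline is only checked between attempts, so a single attempt that blocks forever would still hang. Each attempt needs its own bound as well, for example a socket connect timeout.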

caarlos0 commented Apr 5, 2014

I might have fixed that! =D

noctarius commented Apr 5, 2014

We will have a look at your contribution, thanks for fixing it. To make sure we can include it in the Hazelcast core source, I have to ask you to sign the Hazelcast Contributor Agreement (https://hazelcast.atlassian.net/wiki/display/COM/Hazelcast+Contributor+Agreement).

Thanks Carlos

caarlos0 commented Apr 5, 2014

OK, I'll try to do this by Monday.

Thanks Christoph

caarlos0 commented Apr 7, 2014

Hi @noctarius, I just sent it by email. Cheers!

noctarius commented Apr 7, 2014

Ok thanks :)
