server memory leak with client connections #450

Closed
oma- opened this issue Mar 23, 2013 · 23 comments

@oma-
commented Mar 23, 2013

HazelcastClient instances connecting to the server cause a memory leak on the server.
Steps to reproduce:
Create and run basic server: Hazelcast.newHazelcastInstance(cfg);

Create basic client in a loop (1000 iterations) and connect/disconnect.
HazelcastClient hzc=HazelcastClient.newHazelcastClient(cfg);
System.out.println(hzc.getCluster());
hzc.shutdown();

Monitor server JVM with Java VisualVM. Heap usage grows relatively fast.
It seems that failed logins (bad password) cause a bigger leak, but successful logins leak as well. You can trigger an OutOfMemoryError on a Hazelcast server just by connecting and disconnecting, without ever storing any data on it!

@pveentjer

Member

commented Jun 27, 2013

Which hazelcast version are you using? I'll see if I can reproduce it.

@oma-

Author

commented Jun 27, 2013

I tried 2.5 and 3.0, and it happened on both. I was mainly using clients, but it might happen with lite members too. The key is to connect and disconnect hundreds of times. I think connects with bad passwords trigger it more quickly, but it happens even with good ones. Let me know if you have problems recreating it and I can give you some code; it should only be a few lines. Thanks, Oli

@pveentjer

Member

commented Jun 27, 2013

Do you have the server running in a different jvm? And where is the OOME happening, on the client or on the server?

[edit]
I see that you already gave the answers :)

@oma-

Author

commented Jun 27, 2013

Server and client each run in their own JVM. The OOM happens on the
server. For the client you can just use one thread and connect/disconnect
in a loop. Let the client run the loop with a sleep of a few hundred
milliseconds and watch the server's memory with jvisualvm (Java 1.7).
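
If attaching a profiler is inconvenient, used heap can also be logged from inside the server JVM. The following is only a minimal sketch using the standard `java.lang.management` API; `HeapSampler` is a hypothetical helper, not part of Hazelcast, and the GC request is merely a hint that the JVM may ignore.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper (not part of Hazelcast): reports used heap of the JVM it runs in.
final class HeapSampler {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Requests a GC (a hint only) and returns the used heap in bytes. */
    static long usedHeapAfterGcHint() {
        MEMORY.gc(); // effectively System.gc(); the JVM is free to ignore it
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a handful of samples, 200 ms apart.
        for (int i = 0; i < 5; i++) {
            System.out.println("sample " + i + ": " + usedHeapAfterGcHint() / 1024 + " KiB used");
            Thread.sleep(200);
        }
    }
}
```

Sampling after a GC hint makes the numbers closer to the live set than raw readings, which is what matters when looking for a leak.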

@pveentjer

Member

commented Jun 27, 2013

Thanks. I'm already running it in JProfiler and see the heap growing.. slowly.. but.. growing..

@pveentjer

Member

commented Jun 27, 2013

Hmm... I see a gc happening and the heap went down again.

[image: screen shot 2013-06-27 at 5 07 57 pm]

@pveentjer

Member

commented Jun 27, 2013

This is the program I'm using btw:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class Server {
    public static void main(String[] args) throws Exception {
        // Start a single member with the default configuration.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
    }
}

// Assumes the 2.x client API, where ClientConfig is in com.hazelcast.client
// (3.x moved it to com.hazelcast.client.config).
import com.hazelcast.client.ClientConfig;
import com.hazelcast.client.HazelcastClient;

public class Main {

    public static void main(String[] args) throws Exception {
        //HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        final ClientConfig clientConfig = new ClientConfig();
        clientConfig.setReconnectionAttemptLimit(1000000);
        clientConfig.addAddress("localhost:5701");

        // Connect, print the cluster, and disconnect, over and over.
        for (int k = 0; k < Integer.MAX_VALUE; k++) {
            HazelcastClient hzc = HazelcastClient.newHazelcastClient(clientConfig);
            System.out.println(k + " " + hzc.getCluster());
            hzc.shutdown();
        }
    }
}

@oma-

Author

commented Jun 27, 2013

Yes, it will zigzag, but the lows and highs will keep creeping up. I was
running with -Xmx30m and was able to hit an OOM in under 10 minutes.
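
The difference between a normal GC sawtooth and a leak, as described above, is whether the troughs rise over time. Below is a rough sketch of that check over a series of used-heap samples. `HeapTrend` and the sample values are hypothetical, and a strictly rising sequence of lows is a deliberately crude criterion; real measurements are noisy and would need a tolerance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides whether the lows of a heap sawtooth keep rising.
final class HeapTrend {

    /** Returns the local minima (troughs) of a series of used-heap samples. */
    static List<Long> troughs(long[] samples) {
        List<Long> lows = new ArrayList<>();
        for (int i = 1; i < samples.length - 1; i++) {
            if (samples[i] < samples[i - 1] && samples[i] <= samples[i + 1]) {
                lows.add(samples[i]);
            }
        }
        return lows;
    }

    /** True if there are at least two troughs and each one is higher than the one before. */
    static boolean lowsCreepUp(long[] samples) {
        List<Long> lows = troughs(samples);
        if (lows.size() < 2) {
            return false;
        }
        for (int i = 1; i < lows.size(); i++) {
            if (lows.get(i) <= lows.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] healthy = {10, 20, 10, 21, 10, 20, 10};    // GC returns to the same low
        long[] leaking = {10, 20, 12, 22, 14, 24, 16, 26}; // each low is higher
        System.out.println(lowsCreepUp(healthy)); // false
        System.out.println(lowsCreepUp(leaking)); // true
    }
}
```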

@pveentjer

Member

commented Jun 27, 2013

I still don't see anything bad happening. I have already started/stopped 400 clients.

Which Java version are you using?

[image: screen shot 2013-06-27 at 5 11 26 pm]

@pveentjer

Member

commented Jun 27, 2013

Ok. I'll let it run for some time and see what happens.

Nasty little critters..

@oma-

Author

commented Jun 27, 2013

I was using 1.6 and 1.7 mostly. Can you add a password to your connection?
Write the client so that it fails authentication on the server.

@pveentjer

Member

commented Jun 27, 2013

This is the client:

public class Main {

    public static void main(String[] args) throws Exception {
        //HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        final ClientConfig clientConfig = new ClientConfig();
        // Password chosen so that authentication on the server fails.
        clientConfig.getGroupConfig().setPassword("banana");
        clientConfig.setReconnectionAttemptLimit(1000000);
        clientConfig.addAddress("localhost:5701");

        for (int k = 0; k < Integer.MAX_VALUE; k++) {
            try {
                System.out.println("At iteration: " + k);
                HazelcastClient hzc = HazelcastClient.newHazelcastClient(clientConfig);
            } catch (Exception e) {
                // Expected here: the login is rejected, so ignore it and try again.
            }

            //System.out.println(k+" "+hzc.getCluster());
            //hzc.shutdown();
        }
    }
}

Running now with it.

@pveentjer

Member

commented Jun 27, 2013

I guess this is it:

[image: screen shot 2013-06-27 at 5 35 25 pm]

I'm also seeing the count of some Hazelcast structures increasing.. trying to find the root of them.

@oma-

Author

commented Jun 27, 2013

Yup looks good. Try jvisualvm it has great optional plugins but you need to
run 1.7. Thanks for looking at it!

@pveentjer

Member

commented Jun 27, 2013

I think I have the class that is the root of the evil, the ClientEndPoint.

I'll take care of this issue. Thanks for reporting it!

@oma-

Author

commented Jun 27, 2013

Awesome!

@pveentjer

Member

commented Jul 10, 2013

It took a bit longer than expected, but we have found the cause of the authentication OOME issue. It will be fixed shortly.

But I need to verify that I get an OOME when there is no authentication problem. Perhaps it is caused by the same defect.

@oma-

Author

commented Jul 10, 2013

Great! I was playing around with freeing some byte buffers and got it almost stable on good logins. Was it something tricky? Is it possible in Git to see a diff for a specific fix? I would love to see what you guys changed. Thanks, Oliver

@pveentjer

Member

commented Jul 10, 2013

Hi Oli,

I have not submitted anything yet, since someone needs to verify that the fix
we have in mind is not going to cause problems.

We know exactly what the cause is:

There is a method in ClientHandlerService called getClientEndpoint:

public ClientEndpoint getClientEndpoint(Connection conn) {
    ClientEndpoint clientEndpoint = mapClientEndpoints.get(conn);
    if (clientEndpoint == null) {
        // Creates a new endpoint whenever none is registered for the connection.
        clientEndpoint = new ClientEndpoint(node, conn);
        mapClientEndpoints.put(conn, clientEndpoint);
    }
    return clientEndpoint;
}

This method gets called by ClientRequestHandler.java after the
ClientEndpoint has been removed, which causes the ClientEndpoint to be
recreated.

So deleting is 'impossible', and therefore the ClientEndpoints keep being
added.

The idea I have in mind is to add another getClientEndpoint method (and
rename the original one) that does a pure get, so no creation. When the
ClientRequestHandler calls this get, it needs to check whether a null
ClientEndpoint is returned. If it is not null, it can do its original
logic; if it is null, afaik the call should be ignored. But whether this is
correct is beyond my current knowledge, hence the verification.

Peter
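
The pattern described in this comment can be reproduced without Hazelcast. The sketch below is only an illustration under stated assumptions: `EndpointRegistry`, `getOrCreate`, `getIfPresent`, and the string connection ids are hypothetical stand-ins, not the actual Hazelcast classes or the committed fix. It uses `computeIfAbsent` as an atomic variant of the get-then-put in the quoted method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the server-side endpoint map; names are illustrative only.
final class EndpointRegistry {
    private final Map<String, Object> endpoints = new ConcurrentHashMap<>();

    /** Get-or-create: same effect as the quoted method, a missing entry is created. */
    Object getOrCreate(String connectionId) {
        return endpoints.computeIfAbsent(connectionId, id -> new Object());
    }

    /** Pure lookup: returns null if the endpoint has already been removed. */
    Object getIfPresent(String connectionId) {
        return endpoints.get(connectionId);
    }

    void remove(String connectionId) {
        endpoints.remove(connectionId);
    }

    int size() {
        return endpoints.size();
    }

    public static void main(String[] args) {
        // Leaky variant: a request arriving after removal re-creates the endpoint.
        EndpointRegistry leaky = new EndpointRegistry();
        leaky.getOrCreate("conn-1");
        leaky.remove("conn-1");
        leaky.getOrCreate("conn-1");
        System.out.println("get-or-create leaves " + leaky.size() + " endpoint(s)");

        // Proposed variant: the late request does a pure get and is ignored on null.
        EndpointRegistry fixed = new EndpointRegistry();
        fixed.getOrCreate("conn-1");
        fixed.remove("conn-1");
        if (fixed.getIfPresent("conn-1") == null) {
            // Endpoint is gone, so the late request is dropped.
        }
        System.out.println("pure get leaves " + fixed.size() + " endpoint(s)");
    }
}
```

Whether silently ignoring the late request is the right behaviour is exactly the open question raised above; the sketch only shows why the pure get stops the map from growing.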

@pveentjer

Member

commented Jul 10, 2013

I have a fix for the 3.0 client authentication problem:

#556

On the 2.x (master) and 3.0 branches I don't see any problems when I let a (native) client run for a long time without authentication failures, so I'm not able to reproduce that. I'm going to see what happens on the 2.x branch when a lite member runs into the authentication problem, so that we can at least establish whether it is an issue or not.

@oma-

Author

commented Jul 10, 2013

Ok. I am pretty sure I saw it with successful authentication as well, just smaller, after many, many connect and disconnect cycles. Unfortunately I've not been on that project for a while now.

@pveentjer

Member

commented Jul 10, 2013

I'll let it run overnight and see what happens.

@pveentjer

Member

commented Jul 11, 2013

For Hazelcast 3.0 I also saw the memory leak when there is no authentication problem. We are working on a fix. We know exactly what the cause is (it is a listener which isn't unregistered and keeps the ClientEndpoint alive). But we need to find the correct way to deregister the listener.
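
A listener that is never deregistered keeps its owner reachable, which is the general shape of the leak described above. The following is a minimal sketch of that mechanism and of one common remedy, returning a deregistration handle at registration time. `EventSource` and `Endpoint` are hypothetical classes, not the Hazelcast ones, and this is not claimed to be the fix that was applied.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical long-lived object that holds strong references to its listeners.
final class EventSource {
    private final Set<Runnable> listeners = new CopyOnWriteArraySet<>();

    /** Registers a listener and returns a handle that removes it again. */
    Runnable addListener(Runnable listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    int listenerCount() {
        return listeners.size();
    }
}

// Hypothetical short-lived object, one per client connection.
final class Endpoint {
    private final Runnable deregister;

    Endpoint(EventSource source) {
        // The lambda captures `this`, so the source keeps this endpoint reachable
        // for as long as the listener stays registered.
        this.deregister = source.addListener(() -> onEvent());
    }

    void onEvent() {
        // React to the event; intentionally empty in this sketch.
    }

    /** Must be called when the connection goes away, otherwise the endpoint leaks. */
    void destroy() {
        deregister.run();
    }
}

final class ListenerLeakDemo {
    public static void main(String[] args) {
        EventSource source = new EventSource();
        Endpoint endpoint = new Endpoint(source);
        System.out.println("registered: " + source.listenerCount());
        endpoint.destroy();
        System.out.println("after destroy: " + source.listenerCount());
    }
}
```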

[image: screen shot 2013-07-11 at 7 23 42 pm]

Thank you very much for this very valuable bug report.

@pveentjer pveentjer closed this in 03c0174 Jul 12, 2013

mdogan added a commit that referenced this issue Jul 12, 2013

Merge pull request #562 from pveentjer/fix/2.x/oome-client-authentication-failure

Fixes #450 : OOME on client authentication failure

@ghost ghost assigned pveentjer Jul 12, 2013
