Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prepared queries memory leak #59

Closed
justmara opened this issue Jun 7, 2013 · 16 comments

Comments

Projects
None yet
2 participants
@justmara
Copy link

commented Jun 7, 2013

This line of code attaches handler to Connection object:

connection.OnFailure += ConnectionOnOnFailure;

But there is no code to remove this handler. So connection keeps handler reference to this query object forever.
It looks like this in ants memory profiler. Results in constantly growing gc gen2 and massive memory leak.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 7, 2013

This should not be a problem.
If the connection fails, it should be disposed, freeing in turn all subscribers.

Unless connection is not disposed, which could be the real problem if any.
I'll look at it.

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎07/‎06/‎2013 14:06
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Subject: [cassandra-sharp] prepared queries memory leak (#59)

This line of code attaches handler to Connection object:
connection.OnFailure += ConnectionOnOnFailure;
But there is no code to remove this handler. So connection keeps handler reference to this query object forever.
It looks like this in ants memory profiler. Results in constantly growing gc gen2 and massive memory leak.

Reply to this email directly or view it on GitHub.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 7, 2013

But prepared connections should be cached and reused ? Do you ?

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎07/‎06/‎2013 14:06
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Subject: [cassandra-sharp] prepared queries memory leak (#59)

This line of code attaches handler to Connection object:
connection.OnFailure += ConnectionOnOnFailure;
But there is no code to remove this handler. So connection keeps handler reference to this query object forever.
It looks like this in ants memory profiler. Results in constantly growing gc gen2 and massive memory leak.

Reply to this email directly or view it on GitHub.

@justmara

This comment has been minimized.

Copy link
Author

commented Jun 7, 2013

As of reusing- I do not cache prepared queries forever, because it will bind all these queries to single connection.
I create prepared querie for 'batch' of 5-10 queries in a row, that comes in single service request.
So in this case my service eats tonns of memory because of prepared queries.
Default (and only available so far?) connection type is LongRunning, it keeps single connection to each node which is,used by all the queries. As I can see by netstat during my loadtest- there is only 1-2 opened connections to each node in cluster.
I do not see any way to manually refresh connections, and for sure closing/reopening connections will affect performance compared to persistent ones.

@justmara

This comment has been minimized.

Copy link
Author

commented Jun 7, 2013

And regarding handler- even if nothing fails, connection will keep reference to every once created prepared query.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 7, 2013

I do not agree. You should really cache prepared queries. Otherwise there is no point in preparing.

If you are preparing several queries, they will use and stick to different endpoints if you are using the RoundRobin endpoint strategy.

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎07/‎06/‎2013 20:04
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Cc: "Pierre Chalamet" pierre@chalamet.net
Subject: Re: [cassandra-sharp] prepared queries memory leak (#59)

As of reusing- I do not cache prepared queries forever, because it will bind all these queries to single connection.
I create prepared querie for 'batch' of 5-10 queries in a row, that comes in single service request.
So in this case my service eats tonns of memory because of prepared queries.
Default (and only available so far?) connection type is LongRunning, it keeps single connection to each node which is,used by all the queries. As I can see by netstat during my loadtest- there is only 1-2 opened connections to each node in cluster.
I do not see any way to manually refresh connections, and for sure closing/reopening connections will affect performance compared to persistent ones.

Reply to this email directly or view it on GitHub.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 7, 2013

I can go the weak reference way (and will) to cover disposal of prepared queries.

But cache them as possible.

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎07/‎06/‎2013 20:06
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Cc: "Pierre Chalamet" pierre@chalamet.net
Subject: Re: [cassandra-sharp] prepared queries memory leak (#59)

And regarding handler- even if nothing fails, connection will keep reference to every once created prepared query.

Reply to this email directly or view it on GitHub.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 7, 2013

prepared queries are now holding a weak reference on connection. This mean prepared queries can gc'ed if no more used. Available on master (build from source).

Let me know if this fix your problem.

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 8, 2013

Version 3.2.3 should fix your issue (available on NuGet).

@justmara

This comment has been minimized.

Copy link
Author

commented Jun 10, 2013

So let me explain what am I talking about. Maybe I understand anything wrong about connection choosing strategy and you can point me how to deal with it:
I'm making a service. Its pretty stupid CRUD with 4 operations. Some of'em can be 'batched' (one request can contain a set of similar operations). Assume 'Create' method also makes background-priority request with multiple queries that is sent trough 'BEGIN/APPLY BATCH'. So no prepared query in it`s case ;)
Assume service, and 4 cassandra nodes in cluster, named C1,C2,C3,C4.
So if I follow your strategy of caching prepared I can easily achieve this situation:

  1. 'Create' service query populates prepared 'create' query. It binds to first available connection.So since then every 'create' c* query will go trough this connection to this single node C1.
  2. Aslo 'create' service request makes background request to casandra, which uses next connection from round-robin cluster pool, C2.
  3. Three 'Create' requests in a row will result in 3 requests to C1 (prepared) and one to each of C2,C3,C4 nodes.
  4. The comes 'Update' request. It requires new 'update' prepared query. Service populates it and round-robin choses C1 connection for it! just because its available now and this is its turn to be chosen.
    So this way we can easily achieve situation where MANY very frequent queries are forever attached to single connection to single node (oh, sure, until node or just connection to it dies).

And coming back to event handling.
I've looked at your code and imho you misunderstood me. In your case Connection is long-lived object. And the IPreparedQuery adds its eventhandler to connection's event. So even if query is released it cannot be garbage collected until Connection releases too. Connection still holds reference to query object in its event handlers collection. There must be something like this imo:

public interface IPreparedQuery<out T> : IDisposable
...
public void Dispose()
{
    if (_connection != null)
        _connection.OnFailure -= ConnectionOnOnFailure;
    _connection = null;
}
...
using (var prepared = Cluster.CreatePocoCommand().Prepare(cqlString))
{
    var tasks = queue.Select(obj => prepared.Execute(obj).AsFuture()).ToArray();
    Task.WaitAll(tasks);
}
@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 10, 2013

Hi,

What you would like is a way to round robin prepared queries among available nodes. That's reasonable. --> could you create an improvement issue on github please ?

The changes I made to the way events are manages does fix your problem. Connections are now holding a weak reference on the target handler (and here a prepared query specifically) so that if you release a prepared query, it will be unregistered if gc'ed.
I would rather make prepared query disposable - this would be wiser anyway.

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎6/‎10/‎2013 11:10
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Cc: "Pierre Chalamet" pierre@chalamet.net
Subject: Re: [cassandra-sharp] prepared queries memory leak (#59)

So let me explain what am I talking about. Maybe I understand anything wrong about connection choosing strategy and you can point me how to deal with it:
I'm making a service. Its pretty stupid CRUD with 4 operations. Some of'em can be 'batched' (one request can contain a set of similar operations). Assume 'Create' method also makes background-priority request with multiple queries that is sent trough 'BEGIN/APPLY BATCH'. So no prepared query in it`s case ;)
Assume service, and 4 cassandra nodes in cluster, named C1,C2,C3,C4.
So if I follow your strategy of caching prepared I can easily achieve this situation:

  1. 'Create' service query populates prepared 'create' query. It binds to first available connection.So since then every 'create' c* query will go trough this connection to this single node C1.
  2. Aslo 'create' service request makes background request to casandra, which uses next connection from round-robin cluster pool, C2.
  3. Three 'Create' requests in a row will result in 3 requests to C1 (prepared) and one to each of C2,C3,C4 nodes.
  4. The comes 'Update' request. It requires new 'update' prepared query. Service populates it and round-robin choses C1 connection for it! just because its available now and this is its turn to be chosen.
    So this way we can easily achieve situation where MANY very frequent queries are forever attached to single connection to single node (oh, sure, until node or just connection to it dies).
    And coming back to event handling.
    I've looked at your code and imho you misunderstood me. In your case Connection is long-lived object. And the IPreparedQuery adds its eventhandler to connection's event. So even if query is released it cannot be garbage collected until Connection releases too. Connection still holds reference to query object in its event handlers collection. There must be something like this imo:
    public interface IPreparedQuery : IDisposable
    ...
    public void Dispose()
    {
    if (_connection != null)
    _connection.OnFailure -= ConnectionOnOnFailure;
    _connection = null;
    }
    ...
    using (var prepared = Cluster.CreatePocoCommand().Prepare(cqlString))
    {
    var tasks = queue.Select(obj => prepared.Execute(obj).AsFuture()).ToArray();
    Task.WaitAll(tasks);
    }

    Reply to this email directly or view it on GitHub.
@justmara

This comment has been minimized.

Copy link
Author

commented Jun 13, 2013

And is it possible? I mean I did not gone deep to cql specifications: are prepared queries shared between nodes in cluster? Sure it will be nice to allow prepared queries to conform chosen connection strategy (random/round robin/..).
But accourding to this - it's impossible. also node restart will kill the prepared statement at all 8(

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 13, 2013

The problem is not what is possible server side but client side.

Prepared queries are not shared at all obviously but a smart client can
easily execute on different endpoints using a given strategy. Just prepare
on different nodes and switch accordingly.

This would require a strategy for prepared queries but that's not really a
problem imho.

Btw, turned prepared query as disposable in master. This is far more
elegant than garbage'able event but as of 3.2 this was not possible to fix this without changing interface contract.

  • Pierre
@justmara

This comment has been minimized.

Copy link
Author

commented Jun 17, 2013

  1. you've updated c*# lib to 3.2.3 in NuGet, but *.Interfaces library is still 3.1.6, so .Prepare command still returns non-disposable IPreparedQuery ;)
  2. looks like client must be "uber-smart" then. It must obey connection strategy for prepared queries, prepare itself for each new query and 'remember' for which connection it is already prepared.
@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 17, 2013

1/

3.2 is the version released through NuGet. This version won’t have disposable IPreparedQuery because this is a change in contract. Only gc’able prepared queries are supported in 3.2 (3.2.3 exactly) to solve your so-called memory leak.

IDisposable prepared queries are available on gihub/master. And then will be available on NuGet when 3.3 will be feature complete. If you want IDisposable prepared queries, use the sources from github.

2/

If you want something smart, build your own cache on top of cassandra-sharp (ConcurrentDictionary<string, IPreparedQuery> probably the easiest to build). It really just depend on the way client want to cache such queries: ttl, batch…

  • Pierre Chalamet

From: justmara [mailto:notifications@github.com]
Sent: Monday, June 17, 2013 8:05 AM
To: pchalamet/cassandra-sharp
Cc: Pierre Chalamet
Subject: Re: [cassandra-sharp] prepared queries memory leak (#59)

  1. you've updated c*# lib to 3.2.3 in NuGet, but *.Interfaces library is still 3.1.6, so .Prepare command still returns non-disposable IPreparedQuery ;)
    
  2. looks like client must be "uber-smart" then. It must obey connection strategy for prepared queries, prepare itself for each new query and 'remember' for which connection it is already prepared.
    


Reply to this email directly or view it on GitHub #59 (comment) . https://github.com/notifications/beacon/_W-KgiNRE22l2TYlJFJ2rL_0Qyn7UH_yNKQK9xtDyMKdTKhoQxL853DeoQWf1-s5.gif

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jun 17, 2013

Just thinking about this: there is a project named cassandra-sharp-contrib (https://github.com/pchalamet/cassandra-sharp-contrib) where you could eventually submit a patch to support such feature (the prepared query cache).

Let me know if you'd like some help !

Thanks !

  • Pierre

-----Original Message-----
From: "justmara" notifications@github.com
Sent: ‎6/‎17/‎2013 8:05
To: "pchalamet/cassandra-sharp" cassandra-sharp@noreply.github.com
Cc: "Pierre Chalamet" pierre@chalamet.net
Subject: Re: [cassandra-sharp] prepared queries memory leak (#59)

you've updated c*# lib to 3.2.3 in NuGet, but *.Interfaces library is still 3.1.6, so .Prepare command still returns non-disposable IPreparedQuery ;)
looks like client must be "uber-smart" then. It must obey connection strategy for prepared queries, prepare itself for each new query and 'remember' for which connection it is already prepared.

Reply to this email directly or view it on GitHub.

@pchalamet pchalamet closed this Jul 14, 2013

@pchalamet

This comment has been minimized.

Copy link
Owner

commented Jul 14, 2013

Released in 3.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.