Black Listing To Aggressive #51

Closed
edweip3 opened this Issue Aug 23, 2012 · 57 comments

Comments

Projects
None yet
7 participants
@edweip3
Contributor

edweip3 commented Aug 23, 2012

In the RoundRobinServerManager, if a connection cannot be made to a server, it is added to the blacklist. Once it's in the blacklist, it is there for the life time of RoundRobinServerManager, which is a problem if a connection pool is used, and the CassandraContext is held open for a long time.

@ghost ghost assigned nberardi Aug 26, 2012

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Aug 26, 2012

Contributor

Please mock up a change and submit a pull request this is probably a valuable addition to have.

Contributor

nberardi commented Aug 26, 2012

Please mock up a change and submit a pull request this is probably a valuable addition to have.

@lhowlader

This comment has been minimized.

Show comment Hide comment
@lhowlader

lhowlader Aug 28, 2012

I am currently working on implementing the Circuit Breaker pattern along with unit tests. Once it is implemented and tested, we will not need to blacklist a server. I believe the Circuit Breaker pattern is relevant to this issue. Please feel free to comment if you have considered the pattern and have any insight on why it should not be used. Current plan is to be able to configure the number of attempts that trip the breaker. Also, confugrable min (millisec) and max (millsec) params that allows the retries (to see if Breaker in Open state can be Closed) progrssively between min and max. Once max is reached, the Circuit Breaker for the node will publish an event that can be subsribed to indicating that the node require special attention/intervention.

I am currently working on implementing the Circuit Breaker pattern along with unit tests. Once it is implemented and tested, we will not need to blacklist a server. I believe the Circuit Breaker pattern is relevant to this issue. Please feel free to comment if you have considered the pattern and have any insight on why it should not be used. Current plan is to be able to configure the number of attempts that trip the breaker. Also, confugrable min (millisec) and max (millsec) params that allows the retries (to see if Breaker in Open state can be Closed) progrssively between min and max. Once max is reached, the Circuit Breaker for the node will publish an event that can be subsribed to indicating that the node require special attention/intervention.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Aug 29, 2012

Contributor

I think this will be a great addition. I never really liked the blacklist, so I am happy to hear somebody is trying to get rid of it.

Contributor

nberardi commented Aug 29, 2012

I think this will be a great addition. I never really liked the blacklist, so I am happy to hear somebody is trying to get rid of it.

@dlbromen

This comment has been minimized.

Show comment Hide comment
@dlbromen

dlbromen Oct 12, 2012

Contributor

Update:
@Smoles and myself have taken over this functionality from lhowlander. There will probably be a lot of discussion about what we've done so please let me know if you'd like to move it to another issue or a different communication channel.

Some of the design principles:

  1. Do not blacklist a server forever.
  2. Blacklist on a configurable number of failures (not just one).
  3. Automatically retry a server after a configurable amount of time. If it fails again, reblacklist. If it succeeds, re-enable it.
Contributor

dlbromen commented Oct 12, 2012

Update:
@Smoles and myself have taken over this functionality from lhowlander. There will probably be a lot of discussion about what we've done so please let me know if you'd like to move it to another issue or a different communication channel.

Some of the design principles:

  1. Do not blacklist a server forever.
  2. Blacklist on a configurable number of failures (not just one).
  3. Automatically retry a server after a configurable amount of time. If it fails again, reblacklist. If it succeeds, re-enable it.
@dlbromen

This comment has been minimized.

Show comment Hide comment
@dlbromen

dlbromen Oct 12, 2012

Contributor

Sorry, accidently hit submit...

  1. Don't open the floodgates when retrying a server. Just try a single connection
  2. When re-enabling a server, make it available for use (especially true when using Pooling=True)
  3. Don't consider everything a failure. Set of specific I/O related exceptions.

.. and there are probably more.

We've got something staged up, here: https://github.com/W3i/fluentcassandra/tree/circuit_breaker

Let us know how you would like to proceed. Thank you.

Contributor

dlbromen commented Oct 12, 2012

Sorry, accidently hit submit...

  1. Don't open the floodgates when retrying a server. Just try a single connection
  2. When re-enabling a server, make it available for use (especially true when using Pooling=True)
  3. Don't consider everything a failure. Set of specific I/O related exceptions.

.. and there are probably more.

We've got something staged up, here: https://github.com/W3i/fluentcassandra/tree/circuit_breaker

Let us know how you would like to proceed. Thank you.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Oct 23, 2012

Is this ready for prime time?

Is this ready for prime time?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 13, 2012

Contributor

I've implemented something similar that I am currently using in production but also takes it a step further and classify's various exception types as retry/clienthealthy -- so while you might get a connection that's good from CassandraSession.GetClient(), it might go down between the time it was opened successfully and when a command is executed against it. If we have more than one server in the list try to move to another one up to a max retry count.

Contributor

eplowe commented Nov 13, 2012

I've implemented something similar that I am currently using in production but also takes it a step further and classify's various exception types as retry/clienthealthy -- so while you might get a connection that's good from CassandraSession.GetClient(), it might go down between the time it was opened successfully and when a command is executed against it. If we have more than one server in the list try to move to another one up to a max retry count.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 13, 2012

eplowe: do you have the code you can submit for it? I;d like to take a look and possibly use it in our scenario as well.

eplowe: do you have the code you can submit for it? I;d like to take a look and possibly use it in our scenario as well.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 13, 2012

Contributor

Sure -- When I get home I'll commit into my fork. My retry implementation
is definitely not as elegant as yours, but works like a charm. I just
create a Timer in RoundRobinServerManager that is triggered on
Blacklisting. Also added two properties to CassandraException to signify
retry/healthy-ness of a node. And in Operation I modified TryExecute to
inspect the the type of exception thrown and then build a proper
cassandraexception from it as well as if it should retry and if the node is
healthy. ExecutionOperation() in CassandraSession was then modified to loop
while there was no exception or until a counter reaches the max defined
retry count.

On Tue, Nov 13, 2012 at 4:27 PM, ilushka85 notifications@github.com wrote:

eplowe: do you have the code you can submit for it? I;d like to take a
look and possibly use it in our scenario as well.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10344412.

Contributor

eplowe commented Nov 13, 2012

Sure -- When I get home I'll commit into my fork. My retry implementation
is definitely not as elegant as yours, but works like a charm. I just
create a Timer in RoundRobinServerManager that is triggered on
Blacklisting. Also added two properties to CassandraException to signify
retry/healthy-ness of a node. And in Operation I modified TryExecute to
inspect the the type of exception thrown and then build a proper
cassandraexception from it as well as if it should retry and if the node is
healthy. ExecutionOperation() in CassandraSession was then modified to loop
while there was no exception or until a counter reaches the max defined
retry count.

On Tue, Nov 13, 2012 at 4:27 PM, ilushka85 notifications@github.com wrote:

eplowe: do you have the code you can submit for it? I;d like to take a
look and possibly use it in our scenario as well.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10344412.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 13, 2012

Contributor

If any of you guys want to send a pull request, I would be happy to take it. I know this is an issue that we need to get resolved soon. Eventually I would like to make the a ServiceManager that is ring aware so that the RoundRobin can be retired.

Contributor

nberardi commented Nov 13, 2012

If any of you guys want to send a pull request, I would be happy to take it. I know this is an issue that we need to get resolved soon. Eventually I would like to make the a ServiceManager that is ring aware so that the RoundRobin can be retired.

@tjake

This comment has been minimized.

Show comment Hide comment
@tjake

tjake Nov 14, 2012

I was also thinking of the circuit breaker pattern. I'll try out @dlbromen's patch

tjake commented Nov 14, 2012

I was also thinking of the circuit breaker pattern. I'll try out @dlbromen's patch

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 14, 2012

eplowe did you get a chance to upload your changes?

eplowe did you get a chance to upload your changes?

@tjake

This comment has been minimized.

Show comment Hide comment
@tjake

tjake Nov 14, 2012

tjake commented Nov 14, 2012

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 14, 2012

Contributor

No sorry been kinda crazy with the day job. Let me get it by today.
On Nov 14, 2012 11:08 AM, "Jake Luciani" notifications@github.com wrote:

@eplowe https://github.com/eplowe ^


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10371782.

Contributor

eplowe commented Nov 14, 2012

No sorry been kinda crazy with the day job. Let me get it by today.
On Nov 14, 2012 11:08 AM, "Jake Luciani" notifications@github.com wrote:

@eplowe https://github.com/eplowe ^


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10371782.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

https://github.com/eplowe/fluentcassandra/tree/blacklisting-enhancements

Let me know what you think. So far from my testing, it appears to be done what I expect. With that said, there may be some edge cases not accounted for.

Contributor

eplowe commented Nov 15, 2012

https://github.com/eplowe/fluentcassandra/tree/blacklisting-enhancements

Let me know what you think. So far from my testing, it appears to be done what I expect. With that said, there may be some edge cases not accounted for.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 15, 2012

Contributor

Should we really be using RoundRobin anymore? It was created at a time when there were no mechanisms to describe the nodes in a cluster. But now, but we now have that ability with the describe_ring function.

https://github.com/managedfusion/fluentcassandra/blob/master/src/Operations/CassandraClientWrapper.cs#L154

So I guess my question is should we keep putting lipstick on this pig, or move to something that is better suited for Cassandra moving forward?

Contributor

nberardi commented Nov 15, 2012

Should we really be using RoundRobin anymore? It was created at a time when there were no mechanisms to describe the nodes in a cluster. But now, but we now have that ability with the describe_ring function.

https://github.com/managedfusion/fluentcassandra/blob/master/src/Operations/CassandraClientWrapper.cs#L154

So I guess my question is should we keep putting lipstick on this pig, or move to something that is better suited for Cassandra moving forward?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

Agreed 100% --- but for now either dlbromens patch, or my patch can act as a stop-gap until this new mechanism is in place.

Contributor

eplowe commented Nov 15, 2012

Agreed 100% --- but for now either dlbromens patch, or my patch can act as a stop-gap until this new mechanism is in place.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

With that said, I'd be more than happy to work on an auto discovery mechanism.

Contributor

eplowe commented Nov 15, 2012

With that said, I'd be more than happy to work on an auto discovery mechanism.

@tjake

This comment has been minimized.

Show comment Hide comment
@tjake

tjake Nov 15, 2012

What does autodiscover have todo with round robin? Wouldn't you use autodiscover to get the list of servers to round robin? Also auto discover means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you cut down on coordinators hops. But even then you would still need some kind of round robin across replicas.

We've been running with @dlbromen's patch and so far so good.

tjake commented Nov 15, 2012

What does autodiscover have todo with round robin? Wouldn't you use autodiscover to get the list of servers to round robin? Also auto discover means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you cut down on coordinators hops. But even then you would still need some kind of round robin across replicas.

We've been running with @dlbromen's patch and so far so good.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

You'd need something similar for sure. The idea is that as long as your
code can connect to at least one c* box it'll update itself with a list of
servers. So you could add/remove servers to your cluster without having to
update your configs/code in most situations.

Hector does this and uses policies to determine which machine to connect to
-- load balanced, round robin , etc
On Nov 15, 2012 8:56 AM, "Jake Luciani" notifications@github.com wrote:

What does autodiscover have todo with round robin? Wouldn't you use
autodiscover to get the list of servers to round robin? Also auto discover
means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you
cut down on coordinators hops. But even then you would still need some kind
of round robin across replicas.

We've been running with @dlbromen https://github.com/dlbromen's patch
and so far so good.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10409128.

Contributor

eplowe commented Nov 15, 2012

You'd need something similar for sure. The idea is that as long as your
code can connect to at least one c* box it'll update itself with a list of
servers. So you could add/remove servers to your cluster without having to
update your configs/code in most situations.

Hector does this and uses policies to determine which machine to connect to
-- load balanced, round robin , etc
On Nov 15, 2012 8:56 AM, "Jake Luciani" notifications@github.com wrote:

What does autodiscover have todo with round robin? Wouldn't you use
autodiscover to get the list of servers to round robin? Also auto discover
means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you
cut down on coordinators hops. But even then you would still need some kind
of round robin across replicas.

We've been running with @dlbromen https://github.com/dlbromen's patch
and so far so good.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10409128.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

@eplowe @dlbromen @tjake Are you guys using connection pool in your environment? If yes are you seeing it properly release connections?

@eplowe @dlbromen @tjake Are you guys using connection pool in your environment? If yes are you seeing it properly release connections?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

If you don't define a connection lifetime I am seeing the connections left
open, which makes sense, at least to me. The most expensive part of
creating a socket connection is the establish.

On Thu, Nov 15, 2012 at 11:48 AM, ilushka85 notifications@github.comwrote:

@eplowe https://github.com/eplowe @dlbromenhttps://github.com/dlbromen
@tjake https://github.com/tjake Are you guys using connection pool in
your environment? If yes are you seeing it properly release connections?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10415525.

Contributor

eplowe commented Nov 15, 2012

If you don't define a connection lifetime I am seeing the connections left
open, which makes sense, at least to me. The most expensive part of
creating a socket connection is the establish.

On Thu, Nov 15, 2012 at 11:48 AM, ilushka85 notifications@github.comwrote:

@eplowe https://github.com/eplowe @dlbromenhttps://github.com/dlbromen
@tjake https://github.com/tjake Are you guys using connection pool in
your environment? If yes are you seeing it properly release connections?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10415525.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

@eplowe well the connections should be left open... the question is if they get returned properly. This is an issue we have had that I have commented on and had a slight but improper fix for this issue.

@eplowe well the connections should be left open... the question is if they get returned properly. This is an issue we have had that I have commented on and had a slight but improper fix for this issue.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

@ilushka85 Understood. And by returned properly, you mean from say an
exception? Based on my tests, and this is just from my particular use cases
so YMMV, I am seeing the call to PooledConnectionProvider::Close happen
when CassandraSession::Dispose is called -- which will remove the
connection from the usedconnections list and return it back to the
freeconnection queue. I'll take a look at the issue for further context.

On Thu, Nov 15, 2012 at 12:11 PM, ilushka85 notifications@github.comwrote:

@eplowe https://github.com/eplowe well the connections should be left
open... the question is if they get returned properly. This is an issue we
have had that I have commented on and had a slight but improper fix for
this issue.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10416529.

Contributor

eplowe commented Nov 15, 2012

@ilushka85 Understood. And by returned properly, you mean from say an
exception? Based on my tests, and this is just from my particular use cases
so YMMV, I am seeing the call to PooledConnectionProvider::Close happen
when CassandraSession::Dispose is called -- which will remove the
connection from the usedconnections list and return it back to the
freeconnection queue. I'll take a look at the issue for further context.

On Thu, Nov 15, 2012 at 12:11 PM, ilushka85 notifications@github.comwrote:

@eplowe https://github.com/eplowe well the connections should be left
open... the question is if they get returned properly. This is an issue we
have had that I have commented on and had a slight but improper fix for
this issue.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10416529.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

heres a simple test if you debug this and say after the tenth time thru the loop look at the connection pool you will see ten connections are in use and not returned to freed.

var builder = new ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

    var db = new CassandraContext(builder);



    CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


    for (int x = 0; x < 100; x++)
    {
        var rows = usersColumnFamily.Get().TakeKeys(25);
        string lastKey = rows.Last().Key;


        Console.WriteLine(lastKey);
    }

heres a simple test if you debug this and say after the tenth time thru the loop look at the connection pool you will see ten connections are in use and not returned to freed.

var builder = new ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

    var db = new CassandraContext(builder);



    CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


    for (int x = 0; x < 100; x++)
    {
        var rows = usersColumnFamily.Get().TakeKeys(25);
        string lastKey = rows.Last().Key;


        Console.WriteLine(lastKey);
    }
@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

You're right -- I did a similar test against a my dev cassandra server. I
think I see a possible solution. I'll post reply again when I have more.

On Thu, Nov 15, 2012 at 12:32 PM, ilushka85 notifications@github.comwrote:

heres a simple test if you debug this and say after the tenth time thru
the loop look at the connection pool you will see ten connections are in
use and not returned to freed.

var builder = new
ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

var db = new CassandraContext(builder);



CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


for (int x = 0; x < 100; x++)
{
    var rows = usersColumnFamily.Get().TakeKeys(25);
    string lastKey = rows.Last().Key;


    Console.WriteLine(lastKey);
}


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10417317.

Contributor

eplowe commented Nov 15, 2012

You're right -- I did a similar test against a my dev cassandra server. I
think I see a possible solution. I'll post reply again when I have more.

On Thu, Nov 15, 2012 at 12:32 PM, ilushka85 notifications@github.comwrote:

heres a simple test if you debug this and say after the tenth time thru
the loop look at the connection pool you will see ten connections are in
use and not returned to freed.

var builder = new
ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

var db = new CassandraContext(builder);



CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


for (int x = 0; x < 100; x++)
{
    var rows = usersColumnFamily.Get().TakeKeys(25);
    string lastKey = rows.Last().Key;


    Console.WriteLine(lastKey);
}


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10417317.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

@eplowe i posted a hack that fixes this in my other message:

My crude attempt at fixing this was inserting the following snippets in each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt really execute code till later.

IN addition i noticed connections having an exception do not get returned to pool or closed in original code.

@eplowe i posted a hack that fixes this in my other message:

My crude attempt at fixing this was inserting the following snippets in each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt really execute code till later.

IN addition i noticed connections having an exception do not get returned to pool or closed in original code.

@dlbromen

This comment has been minimized.

Show comment Hide comment
@dlbromen

dlbromen Nov 15, 2012

Contributor

I haven't had a chance to catch up with the recent activity here, but can you please take the Connection releasing discussion to issue #29. Thank you.

Contributor

dlbromen commented Nov 15, 2012

I haven't had a chance to catch up with the recent activity here, but can you please take the Connection releasing discussion to issue #29. Thank you.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

also current round robin method and connection pool means that second and subsequent queries you execute are not hit on next server in round robin but on the same one open connection ... so if your code only requires the use of one connection and you are using a pool all executions will happen on that one server regardless of how many you have defined.

also current round robin method and connection pool means that second and subsequent queries you execute are not hit on next server in round robin but on the same one open connection ... so if your code only requires the use of one connection and you are using a pool all executions will happen on that one server regardless of how many you have defined.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 15, 2012

Contributor

I have a general idea in my head that I am churning over while at lunch
atm. Connections not returning due to exception (server down, invalid
request, etc) should be fixed in my patch that is currently in my fork.
Let's move this thread to the proper issue # moving forward.
On Nov 15, 2012 12:45 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe i posted a hack that fixes this in my
other message:

My crude attempt at fixing this was inserting the following snippets in
each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt
really execute code till later.

IN addition i noticed connections having an exception do not get returned
to pool or closed in original code.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10417842.

Contributor

eplowe commented Nov 15, 2012

I have a general idea in my head that I am churning over while at lunch
atm. Connections not returning due to exception (server down, invalid
request, etc) should be fixed in my patch that is currently in my fork.
Let's move this thread to the proper issue # moving forward.
On Nov 15, 2012 12:45 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe i posted a hack that fixes this in my
other message:

My crude attempt at fixing this was inserting the following snippets in
each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt
really execute code till later.

IN addition i noticed connections having an exception do not get returned
to pool or closed in original code.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10417842.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 15, 2012

@eplowe Ok. Waiting for your input there.

@eplowe Ok. Waiting for your input there.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 16, 2012

@eplowe any thoughts on my comment above about your round robin replacement changes not taking into account using other servers when using a pool?

@eplowe any thoughts on my comment above about your round robin replacement changes not taking into account using other servers when using a pool?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 16, 2012

Contributor

I have a change that I am still testing/not happy with that does auto
discovery plus returning potentially a diff node that's up for a single
context that's running many different commands so its not just leaning on
the first node in the queue that's up and available
On Nov 16, 2012 11:13 AM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe any thoughts on my comment above
about your round robin replacement changes not taking into account using
other servers when using a pool?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10452221.

Contributor

eplowe commented Nov 16, 2012

I have a change that I am still testing/not happy with that does auto
discovery plus returning potentially a diff node that's up for a single
context that's running many different commands so its not just leaning on
the first node in the queue that's up and available
On Nov 16, 2012 11:13 AM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe any thoughts on my comment above
about your round robin replacement changes not taking into account using
other servers when using a pool?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10452221.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 17, 2012

@eplowe using your blacklisting enhancements with enough timeout exceptions using the pool the connections still do not get reset and all subsequent connections fail as pool is exhausted.

@eplowe using your blacklisting enhancements with enough timeout exceptions using the pool the connections still do not get reset and all subsequent connections fail as pool is exhausted.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 17, 2012

Contributor

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535.

Contributor

eplowe commented Nov 17, 2012

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 17, 2012

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589.

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 17, 2012

Contributor

I've simulated throwing random timeout executions and from what I am seeing
I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about fluentcassandra throwing this exception?

"No connection could be made because all servers have failed."

If so, that's because I am marking the Client as not healthy for a
timeout, it should actually still technically be healthy, that's an
oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 notifications@github.com wrote:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479597.

Contributor

eplowe commented Nov 17, 2012

I've simulated throwing random timeout executions and from what I am seeing
I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about fluentcassandra throwing this exception?

"No connection could be made because all servers have failed."

If so, that's because I am marking the Client as not healthy for a
timeout, it should actually still technically be healthy, that's an
oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 notifications@github.com wrote:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479597.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 17, 2012

Contributor

So for a brief moment all servers will be blacklisted. When the timer runs
it'll re-add to the serverqueue if no exception is thrown when it tries to
open the connection. I modified CassnadraOperationException to mark the
IsHealthy = true, instead of false for a TimedOutException and then set
MaxRetries = 5 in my connection string and re ran the test with random
timeoutexceptions being thrown in Operation::Execute and everything behaved
as expected.

On Sat, Nov 17, 2012 at 5:28 PM, Eric Plowe eric.plowe@gmail.com wrote:

I've simulated throwing random timeout executions and from what I am
seeing I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about cassandra throwing this exception?

"No connection could be made because all servers have failed."
?? If so, thats because I am marking the Client as not healthy for a timeout, it should actually still technically be healthy, that's an oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 notifications@github.comwrote:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com
wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479597.

Contributor

eplowe commented Nov 17, 2012

So for a brief moment all servers will be blacklisted. When the timer runs
it'll re-add to the serverqueue if no exception is thrown when it tries to
open the connection. I modified CassnadraOperationException to mark the
IsHealthy = true, instead of false for a TimedOutException and then set
MaxRetries = 5 in my connection string and re ran the test with random
timeoutexceptions being thrown in Operation::Execute and everything behaved
as expected.

On Sat, Nov 17, 2012 at 5:28 PM, Eric Plowe eric.plowe@gmail.com wrote:

I've simulated throwing random timeout executions and from what I am
seeing I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about cassandra throwing this exception?

"No connection could be made because all servers have failed."
?? If so, thats because I am marking the Client as not healthy for a timeout, it should actually still technically be healthy, that's an oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 notifications@github.comwrote:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe notifications@github.com
wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" notifications@github.com wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479597.

@ilushka85

This comment has been minimized.

Show comment Hide comment
@ilushka85

ilushka85 Nov 20, 2012

@eplowe I have not been able to reproduce since we corrected what was causing the timeout issue in the first place but will continue to test it.

We have found a bug in cassandra with secondary indexes just so you are aware that we reported to datastax who is working on a solution. Basically timeouts occur if you rely on secondary indexes and perform deletes on the column with the secondary index. If you delete enough of them a timeout will occur during querys against the secondary index while it skips over tombstones as they still exist in the secondary index. this is what lead to our timeouts.

In any case.... Were you able to make any more progress on your changes with auto discovery etc?

@eplowe I have not been able to reproduce since we corrected what was causing the timeout issue in the first place but will continue to test it.

We have found a bug in cassandra with secondary indexes just so you are aware that we reported to datastax who is working on a solution. Basically timeouts occur if you rely on secondary indexes and perform deletes on the column with the secondary index. If you delete enough of them a timeout will occur during querys against the secondary index while it skips over tombstones as they still exist in the secondary index. this is what lead to our timeouts.

In any case.... Were you able to make any more progress on your changes with auto discovery etc?

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

Any chance we can get a patch? Was hoping to wrap up the next release before Thanksgiving.

Contributor

nberardi commented Nov 20, 2012

Any chance we can get a patch? Was hoping to wrap up the next release before Thanksgiving.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

I'll push the fix to mark TimedOutException::IsClientHealthy as true vs
false shortly. As for the auto discovery that will not be wrapped up before
the holidays as its a rather big change that I'd like to put more time into
it before pushing the branch for review.
On Nov 20, 2012 9:11 AM, "Nick Berardi" notifications@github.com wrote:

Any chance we can get a patch? Was hoping to wrap up the next release
before Thanksgiving.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10555661.

Contributor

eplowe commented Nov 20, 2012

I'll push the fix to mark TimedOutException::IsClientHealthy as true vs
false shortly. As for the auto discovery that will not be wrapped up before
the holidays as its a rather big change that I'd like to put more time into
it before pushing the branch for review.
On Nov 20, 2012 9:11 AM, "Nick Berardi" notifications@github.com wrote:

Any chance we can get a patch? Was hoping to wrap up the next release
before Thanksgiving.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10555661.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

@eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another connection manager, since for the time being we still need to support the current methodology of server lists that are provided. Right now I am favoring if in the connection string the server is just a server:* then we will use auto discovery, anything else will be the old method. How does that sound?

Contributor

nberardi commented Nov 20, 2012

@eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another connection manager, since for the time being we still need to support the current methodology of server lists that are provided. Right now I am favoring if in the connection string the server is just a server:* then we will use auto discovery, anything else will be the old method. How does that sound?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

Agreed. You still do need to provide at least one server because you need
something to connect to run describe ring against. I've also added auto
discover as a connection string/builder property as well as the ability to
define how often it should attempt to auto discover.
On Nov 20, 2012 2:36 PM, "Nick Berardi" notifications@github.com wrote:

@eplowe https://github.com/eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another
connection manager, since for the time being we still need to support the
current methodology of server lists that are provided. Right now I am
favoring if in the connection string the server is just a server:* then
we will use auto discovery, anything else will be the old method. How does
that sound?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10569330.

Contributor

eplowe commented Nov 20, 2012

Agreed. You still do need to provide at least one server because you need
something to connect to run describe ring against. I've also added auto
discover as a connection string/builder property as well as the ability to
define how often it should attempt to auto discover.
On Nov 20, 2012 2:36 PM, "Nick Berardi" notifications@github.com wrote:

@eplowe https://github.com/eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another
connection manager, since for the time being we still need to support the
current methodology of server lists that are provided. Right now I am
favoring if in the connection string the server is just a server:* then
we will use auto discovery, anything else will be the old method. How does
that sound?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10569330.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

You are right, I like your method better. I think auto discover should be turned on by default, what is your thoughts?

Contributor

nberardi commented Nov 20, 2012

You are right, I like your method better. I think auto discover should be turned on by default, what is your thoughts?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

That's the route I am going. Also, to give you a heads up I am removing the
concept of normal vs pool provider and moving to a more generalized manager
concept that will manage separate pools of connections per host instead of
one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" notifications@github.com wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10570168.

Contributor

eplowe commented Nov 20, 2012

That's the route I am going. Also, to give you a heads up I am removing the
concept of normal vs pool provider and moving to a more generalized manager
concept that will manage separate pools of connections per host instead of
one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" notifications@github.com wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10570168.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

The change for marking TimedOutException as IsClientHealthy = true instead
of false has been committed and pushed.

On Tue, Nov 20, 2012 at 3:03 PM, Eric Plowe eric.plowe@gmail.com wrote:

That's the route I am going. Also, to give you a heads up I am removing
the concept of normal vs pool provider and moving to a more generalized
manager concept that will manage separate pools of connections per host
instead of one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" notifications@github.com wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10570168.

Contributor

eplowe commented Nov 20, 2012

The change for marking TimedOutException as IsClientHealthy = true instead
of false has been committed and pushed.

On Tue, Nov 20, 2012 at 3:03 PM, Eric Plowe eric.plowe@gmail.com wrote:

That's the route I am going. Also, to give you a heads up I am removing
the concept of normal vs pool provider and moving to a more generalized
manager concept that will manage separate pools of connections per host
instead of one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" notifications@github.com wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10570168.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

Did you send a pull request?

Contributor

nberardi commented Nov 20, 2012

Did you send a pull request?

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi notifications@github.comwrote:

Did you send a pull request?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10572899.

Contributor

eplowe commented Nov 20, 2012

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi notifications@github.comwrote:

Did you send a pull request?


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10572899.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

Thanks for the commit from issue #88, I think that closes the issue for now. At least until we open the code back up for #29, regarding auto searching of servers.

Contributor

nberardi commented Nov 20, 2012

Thanks for the commit from issue #88, I think that closes the issue for now. At least until we open the code back up for #29, regarding auto searching of servers.

@nberardi nberardi closed this Nov 20, 2012

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

Enumerator/lazy should be switched to lists since it is an RPC call.
Pooling we will wait and see how the removal of lazy commands effect it. I
would like to keep those commits separate so test they can be evaluated
individually.

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 4:07 PM, Eric Plowe notifications@github.com wrote:

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi notifications@github.comwrote:

Did you send a pull request?


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10572899>.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10573079.

Contributor

nberardi commented Nov 20, 2012

Enumerator/lazy should be switched to lists since it is an RPC call.
Pooling we will wait and see how the removal of lazy commands effect it. I
would like to keep those commits separate so test they can be evaluated
individually.

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 4:07 PM, Eric Plowe notifications@github.com wrote:

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi notifications@github.comwrote:

Did you send a pull request?


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10572899>.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10573079.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 20, 2012

Contributor

Just looked at the refactoring -- You should stop the timer when you try to recover, and the recovery time IMO is too long. You could have a situation where you're having some network hiccups and given the amount of traffic to your application you blacklist all your servers and now you have 30 seconds until we attempt to check if they are back up or not. 30 seconds when you're getting getting hammered with traffic feels like years.

Contributor

eplowe commented Nov 20, 2012

Just looked at the refactoring -- You should stop the timer when you try to recover, and the recovery time IMO is too long. You could have a situation where you're having some network hiccups and given the amount of traffic to your application you blacklist all your servers and now you have 30 seconds until we attempt to check if they are back up or not. 30 seconds when you're getting getting hammered with traffic feels like years.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 20, 2012

Contributor

You do realize you weren't actually starting or stopping anything? You were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe notifications@github.com wrote:

Just looked at the refactoring -- You should stop the timer when you try to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic to
your application you blacklist all your servers and now you have 30 seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579487.

Contributor

nberardi commented Nov 20, 2012

You do realize you weren't actually starting or stopping anything? You were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe notifications@github.com wrote:

Just looked at the refactoring -- You should stop the timer when you try to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic to
your application you blacklist all your servers and now you have 30 seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579487.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 21, 2012

Contributor

http://msdn.microsoft.com/en-us/library/yz1c7148%28v=vs.80%29.aspx

Timer.Change Method (Int32, Int32)

dueTime

The amount of time to delay before the invoking the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to prevent the timer from restarting. Specify zero (0) to restart the timer immediately. 

period

The time interval between invocations of the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to disable periodic signaling. 

Also, there is a slight bug -- the reason I was passing the two states on the exception was to know what to do based on the exception. But you are doing things like blacklisting a connection and retrying for TimedOutExceptions -- which is wrong, a timeout doesn't necessarily indicate an unhealthy client. Just that the particular command timed out. You're also not taking into account IOException and NotFoundException -- which are important. That is a pretty easy fix.

Contributor

eplowe commented Nov 21, 2012

http://msdn.microsoft.com/en-us/library/yz1c7148%28v=vs.80%29.aspx

Timer.Change Method (Int32, Int32)

dueTime

The amount of time to delay before the invoking the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to prevent the timer from restarting. Specify zero (0) to restart the timer immediately. 

period

The time interval between invocations of the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to disable periodic signaling. 

Also, there is a slight bug -- the reason I was passing the two states on the exception was to know what to do based on the exception. But you are doing things like blacklisting a connection and retrying for TimedOutExceptions -- which is wrong, a timeout doesn't necessarily indicate an unhealthy client. Just that the particular command timed out. You're also not taking into account IOException and NotFoundException -- which are important. That is a pretty easy fix.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 21, 2012

Contributor

Also, I just noticed this as well, and when we do remove a server from the blacklist we need to add it back to _serverQueue.

Contributor

eplowe commented Nov 21, 2012

Also, I just noticed this as well, and when we do remove a server from the blacklist we need to add it back to _serverQueue.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 21, 2012

Contributor

And sorry if I diverged from your style I'll make sure future changes
adhere! With that said thanks for the comments and insight. So far its been
great working with you.
On Nov 20, 2012 6:54 PM, "Nick Berardi" notifications@github.com wrote:

You do realize you weren't actually starting or stopping anything? You
were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time
between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe notifications@github.com wrote:

Just looked at the refactoring -- You should stop the timer when you try
to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic
to
your application you blacklist all your servers and now you have 30
seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579487>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579912.

Contributor

eplowe commented Nov 21, 2012

And sorry if I diverged from your style I'll make sure future changes
adhere! With that said thanks for the comments and insight. So far its been
great working with you.
On Nov 20, 2012 6:54 PM, "Nick Berardi" notifications@github.com wrote:

You do realize you weren't actually starting or stopping anything? You
were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time
between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe notifications@github.com wrote:

Just looked at the refactoring -- You should stop the timer when you try
to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic
to
your application you blacklist all your servers and now you have 30
seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579487>.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579912.

@nberardi

This comment has been minimized.

Show comment Hide comment
@nberardi

nberardi Nov 21, 2012

Contributor

I know what they say, but how do you think it actually work under the covers? It is essentially this.

while(true) {
    if (TickCount == CallbackTickCount)
        Callback();
}

I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspx

System.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you might consider using System.Timers.Timer, which raises events and has additional features.

Also here is the difference between System.Timers.Timer and System.Threading.Timer

http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer

RE: IOException and NotFoundException

I didn't actually see you implement them anywhere, so I assumed it was a mistake.

https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs

Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit one entity.

Contributor

nberardi commented Nov 21, 2012

I know what they say, but how do you think it actually work under the covers? It is essentially this.

while(true) {
    if (TickCount == CallbackTickCount)
        Callback();
}

I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspx

System.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you might consider using System.Timers.Timer, which raises events and has additional features.

Also here is the difference between System.Timers.Timer and System.Threading.Timer

http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer

RE: IOException and NotFoundException

I didn't actually see you implement them anywhere, so I assumed it was a mistake.

https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs

Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit one entity.

@eplowe

This comment has been minimized.

Show comment Hide comment
@eplowe

eplowe Nov 21, 2012

Contributor

Oh I know. I actually believe they sleep to avoid spinning -- my memory
might be a little hazy from looking at the implementation in reflector. I
took them into account in CassandraOperationException. And I agree a 100%
about keeping the code base inline. I just noticed the issue and needed to
rectify it for my environment but didn't take the time to refactor the
code. Bad discipline on my part.
On Nov 20, 2012 7:27 PM, "Nick Berardi" notifications@github.com wrote:

I know what they say, but how do you think it actually work under the
covers? It is essentially this.

while(true) {
if (TickCount == CallbackTickCount)
Callback();}I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspxSystem.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you<
/span> might consider using System.Timers.Timer, which raises events and has additional features.`
Also here is the difference between System.Timers.Timer and System.Threading.Timer
http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer
RE: IOException and NotFoundException
I didn't actually see you implement them anywhere, so I assumed it was a mistake.
https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs
Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit<
/span> one entity.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10580764.

Contributor

eplowe commented Nov 21, 2012

Oh I know. I actually believe they sleep to avoid spinning -- my memory
might be a little hazy from looking at the implementation in reflector. I
took them into account in CassandraOperationException. And I agree a 100%
about keeping the code base inline. I just noticed the issue and needed to
rectify it for my environment but didn't take the time to refactor the
code. Bad discipline on my part.
On Nov 20, 2012 7:27 PM, "Nick Berardi" notifications@github.com wrote:

I know what they say, but how do you think it actually work under the
covers? It is essentially this.

while(true) {
if (TickCount == CallbackTickCount)
Callback();}I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspxSystem.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you<
/span> might consider using System.Timers.Timer, which raises events and has additional features.`
Also here is the difference between System.Timers.Timer and System.Threading.Timer
http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer
RE: IOException and NotFoundException
I didn't actually see you implement them anywhere, so I assumed it was a mistake.
https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs
Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit<
/span> one entity.


Reply to this email directly or view it on GitHubhttps://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10580764.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment