Conversation
- added SQL connection stat logging
- one retry on connection errors
- prevent pools from getting removed
)
var stats = nodes.reduce(
  function (totals, node) {
    totals.errors += node.errorCount
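The hunk above is truncated; presumably the reduce accumulates per-node counters into a totals object for logging, roughly like this (a reconstruction, not the actual diff; the initial value and anything beyond errorCount are assumptions):

// Reconstruction of the truncated hunk above; the accumulator shape is a guess.
var stats = nodes.reduce(
  function (totals, node) {
    totals.errors += node.errorCount
    return totals
  },
  { errors: 0 }
)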
Will these error counters be reset to zero after a successful connection attempt?
The logic is kind of hard to follow, but I think it's supposed to "recover" back to 0:
https://github.com/felixge/node-mysql/blob/master/lib/PoolCluster.js#L148-L160
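Roughly, the pattern in that range seems to be: bump a node's errorCount on each failure and drop the node once it reaches removeNodeErrorCount, while successes walk the count back toward zero. A paraphrased sketch of that accounting (an approximation, not the library's exact source):

// Paraphrase of the linked PoolCluster logic; names mirror the library,
// but the bodies are an approximation.
function increaseErrorCount (cluster, node) {
  node.errorCount += 1
  if (node.errorCount >= cluster._removeNodeErrorCount) {
    cluster._removeNode(node) // node leaves the rotation entirely
  }
}

function decreaseErrorCount (node) {
  if (node.errorCount > 0) {
    node.errorCount -= 1 // each success "recovers" the count toward 0
  }
}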
r+ LGTM
log.error( { op: 'MySql.getConnection', name: name, err: err } )
if (retry) {
  retry = false
  return self.poolCluster.getConnection(name, gotConnection)
}
So we're just going to try twice and then fail if it doesn't work? If the retry happens immediately after the previous attempt, it's likely to fail as well - just a thought.
Depends on the cause of the error I guess. There may be errors that don't appear until you actually try to use the returned connection, and that we should also retry on (e.g. the server killed an idle connection but it's still in the pool). Perhaps the retry logic should move up to a higher level in a subsequent refactor.
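One possible shape for that higher-level retry, wrapping the whole query rather than just getConnection (a sketch only; retryableQuery and the specific error-code check are illustrative, not part of this patch):

// Illustrative sketch, not part of this patch: retry a query once when a
// pooled connection turns out to be dead on first use.
function retryableQuery (self, sql, params, callback) {
  self.getConnection(function (err, connection) {
    if (err) { return callback(err) }
    connection.query(sql, params, function (err, results) {
      connection.release()
      if (!err || err.code !== 'PROTOCOL_CONNECTION_LOST') {
        return callback(err, results)
      }
      // The pool handed us a stale connection; make one fresh attempt.
      self.getConnection(function (err, connection) {
        if (err) { return callback(err) }
        connection.query(sql, params, function (err, results) {
          connection.release()
          callback(err, results)
        })
      })
    })
  })
}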
It depends on the error. If the db is unreachable, reconnecting right away probably won't help. But in the too-many-connections case that we've been experiencing, there's a very good chance the retry will get a free connection instead of trying to make a new one. My numbers above help support that claim, at least locally.
We should monitor these errors closely to see what happens and how we can react to them.
Ah yep, different errors. :) Sounds good.
r+. Passes all tests related to MySql locally.
Some MySQL connection improvements
Setting
removeNodeErrorCount = Infinity
seems better than listening to the 'remove' event, because there's a slight gap in service between the pool being removed and re-added where things can get weird or stuck. Our usage of the pools with RDS isn't really what this was meant for; it seems intended for direct connections to a cluster of db servers not behind a load balancer. This does a fairly good job of handling connection errors. I ran it in a test environment with MySQL set to 4 max connections and the app configured for 20 total, against a load test of 8 concurrent clients. There were two 503 errors returned to the clients, but the connection retry was able to recover from over 10k
ER_CON_COUNT_ERROR
MySQL errors (from a run of ~26k requests). Obviously the right thing to do is never set the app connection limit higher than the database's, but at least we continue to service requests when we're over. We can still do with a nice refactor of the query logic, transaction handling, etc.
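For reference, the relevant setup looks roughly like this (a sketch; the host, credentials, and connectionLimit are placeholders, and only removeNodeErrorCount is the setting this PR actually changes):

var mysql = require('mysql')

// Sketch of the PoolCluster setup described above. removeNodeErrorCount
// defaults to 5; Infinity keeps nodes from ever being dropped from rotation.
var poolCluster = mysql.createPoolCluster({
  canRetry: true,                  // retry connections on failure
  removeNodeErrorCount: Infinity   // never remove a node on repeated errors
})

poolCluster.add('MASTER', {
  host: 'db.example.com',          // placeholder, not the real config
  user: 'app',
  password: 'secret',
  connectionLimit: 20              // keep this <= the db's max_connections
})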