handle stalling in mysql driver #990
Comments
So dudes, is this crazy? I'm thinking a very simple way to handle TCP connection breakdown is to stop using the mysql driver's built-in query queue completely. Instead, apply something that has its own queue, is well tested, and where we can easily introduce the ability to cancel jobs if they take too long. node-compute-cluster doesn't have the right name for this application, but I think it could work very nicely and easily, and would also give us a trivial way to configure how many simultaneous DB connections a process should manage. I also think it can help us handle database failover. My first pass at this would be to write an abstraction around the mysql driver which exports the very same interface as the driver itself, but sends queries off to child processes for execution, and cancels them (and the process in which they were run) if they fail. Logging all the way. ho ho ho. First reactions?
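The queue-with-cancellation idea above can be sketched without the driver itself. This is a minimal illustration (all names are hypothetical, and a real version would dispatch to child processes rather than call `runQuery` in-process): jobs go through our own queue, and any job that outlives a deadline is failed and skipped past.

```javascript
// Sketch: wrap an arbitrary query function so calls go through our own
// queue, and a job that exceeds `timeoutMs` is cancelled with an error.
// `runQuery(sql, cb)` stands in for the driver (or a child process).
function makeGuardedQuery(runQuery, timeoutMs) {
  var queue = [];
  var busy = false;

  function next() {
    if (busy || queue.length === 0) return;
    busy = true;
    var job = queue.shift();
    var done = false;
    var timer = setTimeout(function () {
      if (done) return;
      done = true; // give up on this job; move on to the next one
      busy = false;
      job.cb(new Error('query timed out after ' + timeoutMs + 'ms'));
      next();
    }, timeoutMs);
    runQuery(job.sql, function (err, rows) {
      if (done) return; // already timed out; late result is discarded
      done = true;
      clearTimeout(timer);
      busy = false;
      job.cb(err, rows);
      next();
    });
  }

  // Same call shape as the driver's query(sql, cb).
  return function query(sql, cb) {
    queue.push({ sql: sql, cb: cb });
    next();
  };
}
```

In the child-process version described above, "cancel" would additionally mean killing the worker process that ran the query.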
Sounds reasonable. We want to make sure that cancelling a transaction also cancels it on the server side (this should happen in the case of a client-side query timeout, too).
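For the server-side half of that cancellation, MySQL provides the `KILL QUERY <thread_id>` statement, which aborts the running statement without closing the connection (as opposed to `KILL CONNECTION`). A sketch of a helper (hypothetical, not part of the driver) that builds the statement from a connection's numeric thread id:

```javascript
// Sketch: build the server-side cancellation statement for a stalled
// query. The id would come from the stalled connection's thread id, and
// the statement would be issued over a second "control" connection.
function buildKillQuery(threadId) {
  // Validate strictly: the id is interpolated into SQL, so only accept
  // a positive integer.
  if (typeof threadId !== 'number' || !isFinite(threadId) ||
      threadId <= 0 || Math.floor(threadId) !== threadId) {
    throw new Error('invalid mysql thread id: ' + threadId);
  }
  return 'KILL QUERY ' + threadId;
}
```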
I see we have a single connection to mysql per process. This is probably fine, as we have many processes running. Another way to go is to create a db pool, so that a single process is more resilient to slow queries. Having multiple connections in a pool would allow us to tune the timeout to allow for longer queries, since the probability of all connections hitting slow queries is lower under healthy conditions. We'd still be able to kill and reset connections for queries that aren't responding.

Multiple connections should improve throughput, but again this has to be balanced against how many processes are hitting the DB. If a single process is handling dozens of concurrent requests which use the DB (out of the hundreds of requests it's servicing), this could be a win. Probably outside the scope of this bug, but definitely a feature for a more general purpose solution.

Doesn't mysql have slow query logging? (I know Postgres does.) If so, we should turn that on and aggregate timing to drive Issues for poorly written SQL or missing indexes. This isn't a big deal with BrowserID, since it has a small schema that changes infrequently.
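The pool idea above can be sketched in a few lines. This is illustrative only (not the driver's API, and real connections come from the driver, not `createConn`): connections are checked out per query, and a connection whose query outlives the deadline is destroyed and replaced, so one slow statement can't wedge the whole process.

```javascript
// Sketch: a tiny fixed-size pool with a per-query deadline. A stuck
// connection is destroyed and replaced by a fresh one.
function Pool(createConn, size, timeoutMs) {
  this.createConn = createConn;
  this.timeoutMs = timeoutMs;
  this.idle = [];
  this.waiting = []; // jobs queued while all connections are busy
  for (var i = 0; i < size; i++) this.idle.push(createConn());
}

Pool.prototype.query = function (sql, cb) {
  var self = this;
  if (this.idle.length === 0) {
    this.waiting.push([sql, cb]);
    return;
  }
  var conn = this.idle.pop();
  var settled = false;
  var timer = setTimeout(function () {
    if (settled) return;
    settled = true;
    conn.destroy();                  // kill the stuck connection...
    self.release(self.createConn()); // ...and replace it with a fresh one
    cb(new Error('query exceeded ' + self.timeoutMs + 'ms'));
  }, this.timeoutMs);
  conn.query(sql, function (err, rows) {
    if (settled) return; // already timed out; late result is discarded
    settled = true;
    clearTimeout(timer);
    self.release(conn);
    cb(err, rows);
  });
};

Pool.prototype.release = function (conn) {
  this.idle.push(conn);
  if (this.waiting.length > 0) {
    var job = this.waiting.shift();
    this.query(job[0], job[1]);
  }
};
```

With, say, 4 connections and a 10-second deadline, all 4 would have to stall simultaneously before the process stopped serving DB traffic, which is the resilience argument made above.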
QA will test this with Ops in Stage during the 12-03-01 load testing.
Verified as fixed through load testing and Stage breakage today... |
We are seeing sporadic mysql driver timeouts and stalling. The first step in addressing this is to detect the situation and recover: implement application-level detection of slow queries that logs an error, then reconnects and continues running if a query takes longer than N seconds.
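The detect-log-reconnect loop described above can be sketched as a watchdog around the driver's query call. All names here are illustrative (`reconnect` stands in for whatever tears down the wedged connection and opens a fresh one):

```javascript
// Sketch: time every query; if one exceeds maxMs, log an error, swap in
// a fresh connection, and keep serving subsequent queries.
function watchQuery(conn, reconnect, maxMs, logError) {
  return function (sql, cb) {
    var settled = false;
    var timer = setTimeout(function () {
      if (settled) return;
      settled = true;
      logError('query stalled > ' + maxMs + 'ms: ' + sql);
      conn = reconnect(conn); // drop the wedged connection, keep running
      cb(new Error('query stalled'));
    }, maxMs);
    conn.query(sql, function (err, rows) {
      if (settled) return; // already gave up; late result is discarded
      settled = true;
      clearTimeout(timer);
      cb(err, rows);
    });
  };
}
```

The key property is the last step: after a stall is detected, later queries run against the replacement connection, so the process recovers instead of hanging.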