This repository has been archived by the owner on May 10, 2019. It is now read-only.

handle stalling in mysql driver #990

Closed
lloyd opened this issue Jan 26, 2012 · 6 comments

Comments

@lloyd
Contributor

lloyd commented Jan 26, 2012

We are seeing sporadic mysql driver timeouts and stalling. The first step to addressing this is to detect the situation and recover. We should implement application-level detection of slow queries that outputs an error in the log but reconnects and continues running if a query takes longer than N seconds.
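A minimal sketch of what that detection could look like, assuming a node-mysql style client with `client.query(sql, cb)`. The names `timedQuery` and `onStall` are illustrative, not from the actual codebase:

```javascript
// Sketch only: application-level detection of stalled queries.
// If a query exceeds timeoutMs, log an error, fire the (hypothetical)
// onStall hook (e.g. tear down and reconnect), and fail the callback.
function timedQuery(client, sql, timeoutMs, cb, onStall) {
  var finished = false;
  var timer = setTimeout(function () {
    if (finished) return;
    finished = true;
    console.error('query exceeded ' + timeoutMs + 'ms: ' + sql);
    if (onStall) onStall();
    cb(new Error('query timeout'));
  }, timeoutMs);
  client.query(sql, function (err, rows) {
    if (finished) return; // late result after a timeout: drop it
    finished = true;
    clearTimeout(timer);
    cb(err, rows);
  });
}
```

The key subtlety is the `finished` flag: once the watchdog has fired and the app has reconnected, a late result from the old connection must be discarded, not delivered.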

@ghost ghost assigned lloyd Jan 26, 2012
@lloyd
Contributor Author

lloyd commented Jan 26, 2012

/cc @vmunix @fetep

So dudes, is this crazy? I'm thinking a very simple way to detect TCP connection falldown is to stop using the mysql driver's built-in query queue completely, and instead apply something that has its own queue, is well tested, and would let us easily introduce the ability to cancel jobs if they take too long.

node-compute-cluster doesn't have the right name for this application, but I think it could work very nicely and easily, and also provide us with a trivial way to configure how many simultaneous db connections should be managed in a process. I also think it can help us handle database failover.

my first pass at thinking about this would be to write an abstraction around the mysql driver which exports the very same interface as the driver itself, but sends queries off to child processes for execution, and cancels them (and the process in which they were run) if they fail. logging all the way. ho ho ho.

first reactions?

@fetep
Contributor

fetep commented Jan 31, 2012

sounds reasonable. we want to make sure cancelling transactions cancels on the server side, too (this should also happen in the case of a client-side query timeout).
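A sketch of that point, assuming a node-mysql style client: a client-side timeout alone leaves the statement running on the server, so the cancel also needs to issue `KILL QUERY` over a second ("admin") connection. The helper name `cancelOnServer` is hypothetical.

```javascript
// KILL QUERY aborts the running statement but keeps the session alive;
// KILL CONNECTION (or plain KILL) would drop the whole session.
// threadId is the mysql connection/thread id of the stalled query.
function cancelOnServer(adminClient, threadId, cb) {
  adminClient.query('KILL QUERY ' + threadId, cb);
}
```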

@lloyd
Contributor Author

lloyd commented Jan 31, 2012

[12:05:55] <mmayo> applying full compute-cluster feels too.. brutish?
[12:16:24] <lloyd> I think the best first approach is to try to write a tiny little wrapper around felixge's stuff that does query timing to detect stalls, and keeps the queue of work in the application.
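A minimal sketch of that approach: keep the work queue in the application, feed the driver one query at a time, and time each one so stalls are observable. All names here are illustrative, not from the actual wrapper.

```javascript
// runner is a function(sql, cb) that actually executes the query
// (e.g. a thin call into felixge's mysql driver).
function QueryQueue(runner) {
  this.runner = runner;
  this.pending = [];
  this.busy = false;
}
QueryQueue.prototype.push = function (sql, cb) {
  this.pending.push({ sql: sql, cb: cb });
  this.drain();
};
QueryQueue.prototype.drain = function () {
  if (this.busy || this.pending.length === 0) return;
  var self = this;
  var job = this.pending.shift();
  this.busy = true;
  var started = Date.now();
  this.runner(job.sql, function (err, rows) {
    var elapsed = Date.now() - started; // per-query timing for stall detection
    self.busy = false;
    job.cb(err, rows, elapsed);
    self.drain();
  });
};
```

Because the queue lives in the application, a stalled query can be cancelled and the remaining jobs replayed against a fresh connection, which the driver's internal queue doesn't allow.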

@ozten
Contributor

ozten commented Feb 24, 2012

I see we have a single connection to mysql per process. This is probably fine, as we have many processes running.

Another way to go is to create a db pool, so that a single process is more resilient to slow queries. Having multiple connections in a pool would allow us to tune the timeout to allow for longer queries, since the probability of all connections hitting slow queries at once is low under healthy conditions. We'd still be able to kill and reset connections for queries that aren't responding.

Multiple connections should improve throughput, but again has to be balanced with how many processes are hitting the DB. If a single process is handling dozens of concurrent requests which use the DB (out of the hundreds of requests it's servicing), this could be a win. Probably outside the scope of this bug, but definitely a feature for a more general purpose solution.
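A hypothetical sketch of the pool idea above: N independent connections with checkout/checkin, so one stalled query ties up one connection rather than the whole process. `makeClient` stands in for whatever creates a real mysql connection.

```javascript
// Tiny fixed-size pool: acquire hands out an idle client or queues the
// caller; release hands the client to a waiter or returns it to idle.
function Pool(makeClient, size) {
  this.idle = [];
  this.waiters = [];
  for (var i = 0; i < size; i++) this.idle.push(makeClient());
}
Pool.prototype.acquire = function (cb) {
  if (this.idle.length) return cb(this.idle.pop());
  this.waiters.push(cb);
};
Pool.prototype.release = function (client) {
  // A killed/reset connection would be replaced with a fresh client here.
  if (this.waiters.length) return this.waiters.shift()(client);
  this.idle.push(client);
};
```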

Doesn't MySQL have slow query logging? (I know Postgres does.) If so, we should turn it on and aggregate the timings to drive issues for poorly written SQL or missing indexes. This isn't a big deal with BrowserID, since it has a small schema that changes infrequently.
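It does. A minimal `my.cnf` fragment to enable it (MySQL 5.1+ option names; thresholds here are illustrative):

```ini
[mysqld]
# log any statement slower than 1 second
slow_query_log  = 1
long_query_time = 1
# also flag full scans, useful for spotting missing indexes
log_queries_not_using_indexes = 1
```

The bundled `mysqldumpslow` tool can then aggregate the log by statement pattern.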

@jbonacci
Contributor

jbonacci commented Mar 6, 2012

QA will test this with Ops in Stage during the 12-03-01 load testing.

@jbonacci
Contributor

jbonacci commented Mar 8, 2012

Verified as fixed through load testing and Stage breakage today...
