This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Treat timeout error before pool shutting down error #359

Open
wandenberg wants to merge 3 commits into mongoid:master from wandenberg:treat_timeout_error_before_pool_shutting_down_error

Conversation

wandenberg
Contributor

When the connection_pool gem is not able to return a connection from the pool within the time configured by pool_timeout, it raises a Timeout::Error. That error is not handled properly and results in an attempt to mark the node as down, leaving the pool in an invalid state that later surfaces as a ConnectionPool::PoolShuttingDownError exception.

This pull request was made using the script posted by @InvisibleMan in #353.

I also applied this commit to the operation_timeout branch.
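
To make the failure mode concrete, here is a minimal, self-contained sketch (not the Moped code itself; FakeConnection is a stand-in) of how a checkout timeout that gets treated as a dead node ends with the pool shut down, after which every later checkout raises ConnectionPool::PoolShuttingDownError:

```ruby
require "connection_pool"
require "timeout"

# Stand-in for a real Moped connection.
FakeConnection = Struct.new(:id) do
  def query(_cmd)
    "ok"
  end

  def close
    # nothing to release in this stand-in
  end
end

pool = ConnectionPool.new(size: 1, timeout: 0.1) { FakeConnection.new(1) }

# Hold the only connection on another thread so the next checkout has to wait.
blocker = Thread.new { pool.with { |_conn| sleep 0.5 } }
sleep 0.05

begin
  pool.with { |conn| conn.query(:ping) }
rescue Timeout::Error => e
  # Checkout timed out: the pool is exhausted, but the node may be healthy.
  warn "checkout timed out: #{e.class}"
  # The unhandled path effectively amounted to this -- treating the timeout
  # as a node failure and tearing the pool down:
  pool.shutdown { |conn| conn.close }
end

blocker.join

begin
  pool.with { |conn| conn.query(:ping) }
rescue ConnectionPool::PoolShuttingDownError => e
  warn "every later checkout now fails: #{e.class}"
end
```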

@coveralls

Coverage Status

Coverage increased (+0.02%) to 93.77% when pulling 3e43409 on wandenberg:treat_timeout_error_before_pool_shutting_down_error into 7ef2b2c on mongoid:master.

@dblock
Contributor

dblock commented Mar 25, 2015

@wandenberg Do you believe this permanently fixes the ConnectionPool::PoolShuttingDownError that people are seeing? If so, #353 didn't, and the CHANGELOG needs to be updated to say that.

@dblock
Contributor

dblock commented Mar 25, 2015

@arthurnn Could you please try to review this one before 2.0.5, to reduce the surface of issues discussed in #346?

@wandenberg
Contributor Author

@dblock I believe so. The problem was a poorly handled exception: when Moped could not get a connection from the pool, it marked the node as down and tried to close its connections, and it even used a connection to check whether the node was broken. That resulted in misleading errors such as ConnectionPool::PoolShuttingDownError and, in some cases, "could not connect to a primary node".
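
A hypothetical sketch of that ordering (the method and error names here are illustrative, not Moped's actual internals): rescue the pool-checkout timeout first and re-raise it as its own retryable error, so the node-down path only runs for genuine socket failures.

```ruby
require "connection_pool"
require "timeout"

# Illustrative error class; Moped's real error hierarchy differs.
class ConnectionTimeoutError < StandardError; end

def mark_node_down!
  # Placeholder: flag the node as down and schedule a refresh.
end

def with_connection(pool)
  pool.with { |conn| yield conn }
rescue Timeout::Error => e
  # Checkout timed out: the pool is exhausted, but nothing is known to be
  # wrong with the node itself, so leave the node state and the pool alone.
  raise ConnectionTimeoutError, "could not check out a connection: #{e.message}"
rescue IOError, SystemCallError
  # Only a real socket-level failure should mark the node as down.
  mark_node_down!
  raise
end
```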

@dblock
Contributor

dblock commented Apr 27, 2015

@arthurnn Bump?

@jonhyman
Contributor

jonhyman commented May 4, 2015

@arthurnn can you take a look? We just upgraded to Moped 2 in production last night and have been wrecked by this bug so far.

@jonhyman
Contributor

jonhyman commented May 4, 2015

@wandenberg I just applied this patch to prod; I'll let you know if we see the connection pool shutdown again. We're still getting "could not connect to a primary node" every few minutes. I'm still debugging that one; this patch doesn't seem to have fixed it (at least for us).

@jonhyman
Contributor

jonhyman commented May 7, 2015

@wandenberg unfortunately, even with this patch, this just happened again and didn't go away until we restarted services. I guess it's possible that there are other scenarios in which this can happen and your patch fixes a subset of them.

@sahin

sahin commented May 22, 2015

+1, @arthurnn any update on this one?

@jonhyman
Contributor

@sahin

Give this branch a try. We upgraded from 1.5 to 2.0 about 3 weeks ago and have seen absolutely horrible failover handling with Moped 2.0. We can finally do stepdowns in production without a single error and haven't seen this error anymore. I cherry-picked various commits from other pull requests (such as @wandenberg's) that address this and also added many commits of my own to handle different failure scenarios.

https://github.com/jonhyman/moped/tree/feature/15988-and-logging

It has some extra logging in there that I've been using while doing failover testing, so feel free to fork the branch and strip that out if you inspect your Moped logs. We've also successfully tested kill -9'ing the primary mongod and killing a mongos on this branch, whereas 2.0.4 couldn't handle either of those scenarios.

@sahin

sahin commented May 22, 2015

@jonhyman right now we have some issues in production across dozens of servers and websites, plus an API that is used by many vendors, movie studios, and our apps.

Right now, if anything happens to a node in the replica set, we get:
No route to host - connect(2) for "20.0.0.16" port 27017
ConnectionPool::PoolShuttingDownError
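
One possible stopgap while the underlying bug is open is to retry around these transient failover errors at the application level. A sketch only; the retry count and the exact error classes to rescue are assumptions:

```ruby
require "moped"
require "connection_pool"

# Illustrative retry wrapper: retries the block a few times when a
# failover-related error bubbles up, instead of failing the request outright.
def with_failover_retry(attempts: 3, wait: 0.5)
  tries = 0
  begin
    yield
  rescue Moped::Errors::ConnectionFailure, ConnectionPool::PoolShuttingDownError => e
    tries += 1
    raise if tries >= attempts
    warn "retrying after #{e.class}: #{e.message}"
    sleep wait
    retry
  end
end

# Hypothetical usage with a Mongoid model:
# with_failover_retry { Order.where(status: "paid").count }
```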

@jonhyman
Contributor

Give my branch a try, see if it helps.

@deepredsky

@arthurnn Bump!

@deepredsky

@jonhyman your branch seems to get rid of the pool shutdown error. Are you using this in production?

@jonhyman
Contributor

Yeah we are. And we've done numerous stepdowns in prod without issues with my branch.


@wandenberg force-pushed the treat_timeout_error_before_pool_shutting_down_error branch from 3e43409 to 49f65ef on June 3, 2015 at 13:55
@coveralls

Coverage Status

Coverage increased (+0.03%) to 93.92% when pulling 372f22a on wandenberg:treat_timeout_error_before_pool_shutting_down_error into 68923e0 on mongoid:master.

@agis

agis commented Sep 18, 2015

@jonhyman Hey, are the issues you mention in your comment fixed in 2.0.7 which contains #380? Or do you still use a fork?

@jonhyman
Contributor

Yeah, it should all be fixed in 2.0.7. We're still on my fork because we've stopped putting any resources behind Moped (even though it's just a gem update that conceptually should be fine, I'm not going to spend the time testing it in staging). We're instead entirely focused on getting to Mongoid 5 and the official driver.
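
For readers following the same migration path, the dependency change looks roughly like this (the version constraint is an assumption): Mongoid 5 swaps Moped for the official mongo Ruby driver.

```ruby
# Gemfile (sketch)
source "https://rubygems.org"

gem "mongoid", "~> 5.0" # pulls in the official `mongo` driver instead of moped
```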
