
Add signaling support for connection pool waiting #115

Merged - 3 commits merged into globalsign:development on Apr 3, 2018

Conversation

KJTsanaktsidis

The current behaviour when the poolLimit is reached and a new connection
is required is to poll every 100ms to see if there is now headroom to
make a new connection. This adds tremendous latency to the
limit-hit-path.

This commit changes the checkout behaviour to watch on a condition
variable for connections to become available, and the checkin behaviour
to signal this variable. This should allow waiters to use connections
immediately after they become available.

A new parameter is also added to DialInfo, PoolTimeout, which is the
maximum time that clients will wait for connection headroom to become
available. By default this is unlimited.
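For illustration, a minimal usage sketch (Addrs, Timeout and PoolLimit are existing DialInfo fields; only PoolTimeout is new):

info := &mgo.DialInfo{
	Addrs:       []string{"localhost:27017"},
	Timeout:     10 * time.Second,
	PoolLimit:   64,              // existing per-server socket cap
	PoolTimeout: 5 * time.Second, // new: give up waiting for pool headroom after 5s
}
session, err := mgo.DialWithInfo(info)

With PoolTimeout left at its zero value, waiters block indefinitely, matching the old behaviour.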

Full disclosure - I haven't actually tried this in our production environment yet. The only testing it has had so far is that it passes the regression tests. I'm pull-requesting this early to get some feedback on the design of this change and to see if this would be useful for other people :). I intend to try and get this into production sometime in the next couple of weeks.

cluster.go Outdated
if err == errPoolTimeout {
// No need to remove servers from the topology if acquiring a socket fails for this reason.
return nil, err
} else if err != nil {

I think this should be caught by go vet (or gometalinter, I can't remember which one would complain), but the else if can be replaced by a normal if, as the last statement in the previous if case is a return.
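i.e. roughly (with the body of the error branch elided):

if err == errPoolTimeout {
	// No need to remove servers from the topology if acquiring a socket fails for this reason.
	return nil, err
}
if err != nil {
	// ... existing error handling ...
}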

Author

Good catch - this should be fixed now

case <-waitDone:
case <-time.After(poolTimeout):
// timeoutHit is part of the wait condition, so needs to be changed under mutex.
server.Lock()

Also, I might be missing something here, but the server lock is locked on line 124.

The waiter on line 158 waits for a signal, but the signal is sent only if a socket is returned to the pool.
The broadcast on line 151 won't be executed before an already-used socket is released, as that code path tries to acquire a lock that is held by the parent goroutine (line 124) and only released on Wait() (line 158), no?

This is only after a cursory reading of the code. Please correct me if I am wrong.

Author

the server lock is locked in line 124.

This lock on 148 is in a separate goroutine - there is a new goroutine spawned here to wait for the timeout to expire and signal the waiters if so.

The broadcast in line 151 won't be executed before an already used socket is released, as that code path tries to acquire lock that is is held by the parent goroutine

Perhaps this is the most confusing part - the invocation of Wait on 158 actually releases the server mutex - the CondVar's L member is initialised to the server mutex on line 90, and the Wait method is defined to unlock L, wait for a signal, and then lock L again before returning control to the caller.

So, in the case where a socket is released in time -

  • The call to AcquireSocket locks the server mutex
  • The call to AcquireSocket observes that there are no free connections
  • The call to AcquireSocket unlocks the server mutex and waits for a signal (I believe this is atomic)
  • The call to RecycleSocket locks the server mutex
  • The call to RecycleSocket returns a connection to the free list
  • The call to RecycleSocket broadcasts
  • The call to AcquireSocket is woken from its Wait, but is still blocked waiting for the server mutex, so Wait does not yet return
  • The call to RecycleSocket unlocks the server mutex
  • The call to AcquireSocket locks the server mutex and returns control from Wait
  • The call to AcquireSocket observes that there are now free connections and continues as usual.

In the case where a socket is not released on time -

  • The call to AcquireSocket locks the server mutex
  • The call to AcquireSocket observes that there are no free connections
  • The call to AcquireSocket unlocks the server mutex and waits for a signal
  • The timer goroutine hits the time.After timeout
  • The timer goroutine takes the server mutex
  • The timer goroutine sets the timeout flag
  • The timer goroutine broadcasts
  • The call to AcquireSocket is woken from its Wait, but is still blocked waiting for the server mutex, so Wait does not yet return
  • The timer goroutine unlocks the server mutex
  • The call to AcquireSocket locks the server mutex and returns control from Wait
  • The call to AcquireSocket notices that the timeout flag has been set
  • The call to AcquireSocket unlocks the mutex and errors out

I mean, this is concurrent code with mutexes and semaphores, so I'm practically guaranteed to be wrong, but this is my understanding of how this works and hopefully this answers your question.
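To make this concrete, here's a small self-contained model of the same idea - not the mgo code itself, just an illustration of the sync.Cond semantics described above (Wait atomically releases Cond.L while blocked and re-acquires it before returning):

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errPoolTimeout = errors.New("could not acquire connection within the pool timeout")

type pool struct {
	sync.Mutex
	waiter *sync.Cond
	inUse  int
	limit  int
}

func newPool(limit int) *pool {
	p := &pool{limit: limit}
	p.waiter = sync.NewCond(p) // Cond.L is the pool's own mutex
	return p
}

func (p *pool) acquire(timeout time.Duration) error {
	p.Lock()
	defer p.Unlock()

	timedOut := false
	// Timer goroutine: flips the flag under the mutex, then broadcasts, mirroring
	// the "socket not released in time" sequence above.
	go func() {
		<-time.After(timeout)
		p.Lock()
		timedOut = true
		p.waiter.Broadcast()
		p.Unlock()
	}()

	// Wait releases the mutex while blocked and re-acquires it before returning,
	// so release() and the timer goroutine can take the lock and signal us.
	for p.inUse >= p.limit && !timedOut {
		p.waiter.Wait()
	}
	if p.inUse >= p.limit {
		return errPoolTimeout
	}
	p.inUse++
	return nil
}

func (p *pool) release() {
	p.Lock()
	p.inUse--
	p.waiter.Broadcast() // every waiter wakes and re-checks its own limit
	p.Unlock()
}

func main() {
	p := newPool(1)
	_ = p.acquire(time.Second) // takes the only slot
	go func() {
		time.Sleep(50 * time.Millisecond)
		p.release()
	}()
	fmt.Println(p.acquire(time.Second)) // <nil>: woken as soon as release() broadcasts
}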


Right. I forgot that the Wait() releases the lock before yielding the goroutine.

// that there is always a connection available for replset topology discovery). Thus, once
// a connection is returned to the pool, _every_ waiter needs to check if the connection count
// is underneath their particular value for poolLimit.
server.poolWaiter.Broadcast()

Again, I am not 100% sure of my reading of the code (mgo was never super fun to read, and I haven't had a close look at it for some time), but you take a lock on the server when you try to acquire a socket, and then block waiting on a signal. The signal won't be raised if you first need to take a lock on the same mutex, no?

Before, the pool was unlocked before returning an errPoolLimit, but that is not the case anymore.

Author

mgo was never super fun to read

Yyyyyyup......

The signal won't be raised, if you first need to take a lock on the same mutex, no ?

I think my answer above covers this - the acquiring code implicitly releases the mutex in the call to Wait and re-acquires it when Wait returns. So the release code should be able to grab the mutex and send the signal.

Before, the pool was unlocked before returning an ErrPoolLimit, but it is not the case anymore.

I think I'm calling Unlock() on line 167:

} else {
	if len(server.liveSockets)-len(server.unusedSockets) >= poolLimit {
		server.Unlock()
		return nil, false, errPoolLimit
	}
}

Is this not sufficient?


Yes, it is. I'd like to blame the low caffeine level in my blood for raising this issue ;)

@KJTsanaktsidis (Author)

Thanks a bunch for your review. I know this kind of code is really complex to reason about so I'm really glad to have another pair of eyes on it! Let me know if the above makes sense or if I just sound crazy...

@KJTsanaktsidis (Author)

The test failure in TestStreamsNextNoEventTimeout, only on Go 1.9 / MongoDB 3.6, looks unrelated...

szank previously approved these changes Feb 28, 2018
@domodwyer

Hi @KJTsanaktsidis,

This is a great improvement to the limit code - it seems a lot of people use the pool limit, so this will definitely be a big win!

I'll try and get something together to drive this code path properly - all my existing benchmark/verification setup limits concurrency at the application layer - so it might take a couple of days, I hope you understand.

Thanks for taking the time to improve this - it's really appreciated - I'll get back to you!

Dom

@KJTsanaktsidis (Author)

No worries, thanks for your effort picking up this project and maintaining this important piece of infrastructure for Go/Mongo users!

Looks like I got some conflicts from #116 - I'll push up a fix shortly. I'm still hoping to get this into production for our product in the next couple of weeks and give you a report on how it went.

@KJTsanaktsidis (Author) commented Mar 23, 2018

@domodwyer I added a couple of commits to this PR to report metrics around how long goroutines spend waiting for the pool limit. In order to deploy this to production, we want to be able to report pool wait times into a Datadog histogram. Unfortunately, the stats mechanism in mgo is only really good for counters and gauges.

I added in the last commit a mechanism for mgo to send event-based information to the library user over a set of channels; the user of the library can optionally watch these channels and do something with the events (we are sending them to statsd for histogram/percentile estimation).

We kind of need this information in some way to know if the connection limit stuff is going to have an adverse effect, but I'm totally flexible on the API design if you have any better ideas. The channel-solution had the benefit of being low-overhead, optional, and unopinionated about what to do with event data. It also shouldn't be a breaking API change to add extra channels to the struct to publish different kinds of events. I did consider doing histogram bucketing in the stats code, but that raises lots of questions about what buckets to use, what percentiles to export, etc.
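Roughly, the consumer side looks like this (a sketch; reportHistogram stands in for whatever metrics client is in use, while GetStatsEvents, OnSocketAcquired, OnPoolTimeout and WaitTime are the names visible in the diff hunks below):

events := mgo.GetStatsEvents()
go func() {
	for ev := range events.OnSocketAcquired {
		// reportHistogram stands in for a real statsd/Datadog histogram call
		reportHistogram("mgo.pool.wait_ms", float64(ev.WaitTime)/float64(time.Millisecond))
	}
}()
go func() {
	for ev := range events.OnPoolTimeout {
		reportHistogram("mgo.pool.timeout_wait_ms", float64(ev.WaitTime)/float64(time.Millisecond))
	}
}()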

szank previously requested changes Mar 23, 2018
@@ -0,0 +1,2 @@
{

I don't think you wanted this file to be added.
Could you please remove it and add .vscode to .gitignore?

Author

Oops. That's embarrassing :)

cluster_test.go Outdated
return nil
}
}
c.Assert(getEvent().WaitTime, Equals, time.Duration(0))

This is a nitpick, but I would check that the result of getEvent() is not nil before accessing WaitTime.
If all works well that's not a problem, but if a subsequent commit breaks the stats by chance, this will panic. And a failing unit test is much easier to deal with than a panicking one (imho).
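For example, something like:

ev := getEvent()
c.Assert(ev, NotNil)
c.Assert(ev.WaitTime, Equals, time.Duration(0))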

Author

This is fixed; I just added an extra NotNil assert before this one.

cluster_test.go Outdated
case ev := <-statsEvents.OnPoolTimeout:
c.Assert(ev.WaitTime, Equals, stats.TotalPoolWaitTime)
default:
c.Fail()

c.Fatalf("Could not retrieve OnPoolTimeout statsEvent")?

Author

Ah yeah, this should have a message. Fixed.

stats.go Outdated
case statsEvents.OnSocketAcquired <- eventStruct:
default:
}
statsMutex.Unlock()

Another nitpick:
You could either defer the mutex unlock, or move it before the send on the channel.

Author

The channel send doesn't need the mutex, and there's no particular reason to order the events in the channel according to the order the stats mutex was acquired in (which isn't even guaranteed to be the order mongo saw the connections in), so I moved the unlock above the send.
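In code the shape is now roughly this (a sketch; waited is an illustrative local, while statsMutex, statsEvents.OnSocketAcquired and eventStruct are the names from the diff above):

statsMutex.Lock()
stats.TotalPoolWaitTime += waited // counter updates still happen under statsMutex
statsMutex.Unlock()               // release before the send; the channel provides its own synchronisation

// Non-blocking send: if nobody is draining the events channel, drop the event
// rather than stalling the socket-acquire path.
select {
case statsEvents.OnSocketAcquired <- eventStruct:
default:
}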

stats.go Outdated
func GetStatsEvents() *StatsEvents {
return statsEvents
}

// ResetStats reset Stats to the previous database state
func ResetStats() {

After the library user calls ResetStats(), GetStatsEvents() needs to be called immediately, at least from my understanding of the intended usage.

Why not return the new stats from the ResetStats() func? It can always be ignored.

Author

My understanding is that this would be a breaking change - someone might be passing the ResetStats function somewhere that expects a func() callback type or something, for example.

It might be better to just decouple the StatsEvents thing from the Stats thing entirely, and require an explicit call to SetStatsEvents(bool) or something to turn it on and off (and not provide a reset function at all - it doesn't really seem necessary).

@szank commented Mar 23, 2018

Hi @KJTsanaktsidis.
I like the metrics idea.
Here is a suggestion though.
What do you think about making the Stats an interface?
Something like

type MgoMetrics interface {
	OnSocketAcquired(SocketAcquireMetric)
	OnPoolTimeout(SocketAcquireMetric)
}

You could modify your current implementation to send data on the chan on every call of the interface,
and provide a default NOP implementation if the library user doesn't want stats (see the sketch after the list below).

Your current implementation could stay as a reference one, it's perfectly valid.
I am suggesting it because:

  1. Setting an arbitrary chan size rubs me the wrong way. Using my suggested approach, the user would be able to decide for themselves what channel sizes to use.

  2. I don't think that the metrics (while very useful!) are an integral part of the database API. Using the interface approach is IMHO cleaner. The reference metrics implementation could even live in another package, somewhere under mgo/metrics?

  3. The user would have an easier time integrating the metrics in their own system if they wanted to have a struct gathering and processing metrics from the whole system, that would implement the mgo metrics interface among other things.
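Building on the interface above, here is a rough sketch of what I mean (SocketAcquireMetric is hypothetical - any payload carrying the wait duration would do):

type SocketAcquireMetric struct {
	WaitTime time.Duration
}

// Default NOP implementation, used when the library user doesn't care about stats.
type nopMetrics struct{}

func (nopMetrics) OnSocketAcquired(SocketAcquireMetric) {}
func (nopMetrics) OnPoolTimeout(SocketAcquireMetric)    {}

// A channel-backed implementation (like the current one) can then sit behind the
// interface, with the user picking the buffer sizes themselves.
type chanMetrics struct {
	Acquired chan SocketAcquireMetric
	Timeouts chan SocketAcquireMetric
}

func (c chanMetrics) OnSocketAcquired(m SocketAcquireMetric) {
	select {
	case c.Acquired <- m:
	default: // drop rather than block the driver
	}
}

func (c chanMetrics) OnPoolTimeout(m SocketAcquireMetric) {
	select {
	case c.Timeouts <- m:
	default:
	}
}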

@KJTsanaktsidis (Author)

I did consider the interface approach - unfortunately, if we did that, adding a new kind of event to it would be a breaking change since people's structs would no longer match the expanded interface. In a similar vein, I considered a struct with a whole bunch of callback-members you could assign, like this:

type StatsEventsCallbacks struct {
    OnSocketAcquired func(time.Duration)
    OnPoolTimeout    func(time.Duration)
}

// library user code
callbacks := &StatsEventsCallbacks{
    OnSocketAcquired: func(t time.Duration) {
        // something
    },
}
mgo.SetStatsEventsCallbacks(callbacks)

That way, if new events were added, users could just not fill them in on their structs, so it wouldn't be a breaking change. Ultimately the channel sends just felt like more idiomatic Go to me, but if you prefer the callback approach I'm more than happy to roll with that instead (and maybe just wrap it in a set of channels in my application code).

If the only thing that concerns you about the channel approach is the arbitrary channel size, I'm also happy to let the caller specify that in some kind of SetStatsEvents(enabled bool, bufferSize int) call and to separate it out of the SetStats/ResetStats mechanism entirely.

@KJTsanaktsidis (Author)

Thanks again for the feedback by the way :)

@KJTsanaktsidis (Author)

FYI - I have shipped this change into production for my team today. I will report back any findings in maybe a week or so after it's baked for a bit.

@domodwyer

Hi @KJTsanaktsidis

Again, sorry for disappearing as I was out of the country, but it seems @szank has been keeping this going along 👍

I'm a little concerned this has grown the scope of the PR a fair bit - I understand you needed it for metrics to ensure the fix worked as expected, but as you correctly point out there's quite a lot of complexity around the existing and proposed metrics (how to collect, whose job it is to aggregate, if/how to handle metric drops, etc.)

You're totally right though, it's time the stats were improved - I think it might serve the public mgo better to tackle this as a separate PR however, maybe in a different branch till we're happy with the design, and get this connection pooling change merged sooner rather than later. I might have an idea for a lockless solution that should be backwards compatible, but I'm currently trying to stave off jet lag by keeping myself busy, so I might not be fit for the job ;)

Thanks again for all your hard work and solid solutions.

Dom

@KJTsanaktsidis (Author)

Yeah, I totally understand the metrics raise some broader design questions about how that should be done. Sounds like I should back that bit out tomorrow; it's not a huge patch for us to carry out of tree and it won't really stop us switching to an upstream solution once that's baked.

Hopefully this metrics stuff has been a good spike to explore some ideas :)

Other than letting this bake in our production environment for a bit, is there anything else you think I should do here?

@KJTsanaktsidis (Author)

FWIW I’d be keen to hear your thoughts on the metrics when you’ve put them together too :) I’ll keep a look out here!

@domodwyer

No, I think the connections bit of the PR is all good! No problems with a merge - I'll run a test during the normal release process and it should be good to go for a release.

Regarding the metrics I'd love it if you helped collaborate on the design/implementation/whatever-you-like - I'll open an issue for the improvement design in the coming days - even just a "this would work for us" or a "we also need X" would be most appreciated!

Dom

Add signaling support for connection pool waiting

The current behaviour when the poolLimit is reached and a new connection
is required is to poll every 100ms to see if there is now headroom to
make a new connection. This adds tremendous latency to the
limit-hit-path.

This commit changes the checkout behaviour to watch on a condition
variable for connections to become available, and the checkin behaviour
to signal this variable. This should allow waiters to use connections
immediately after they become available.

A new parameter is also added to DialInfo, PoolTimeout, which is the
maximum time that clients will wait for connection headroom to become
available. By default this is unlimited.

Gitignore .vscode directory

I'm using vscode and accidentally committed the .vscode directory;
.gitignore this footgun.

Add stats for connection pool timings

This exposes four new counters:

* The number of times a socket was successfully obtained from the
  connection pool
* The number of times the connection pool needed to be waited on
* The total time that has been spent waiting for a connection to become
  available
* The number of times socket acquisition failed due to a pool timeout
@KJTsanaktsidis (Author)

Alright, I've done some rebasing. I've left in "Add stats for connection pool timings", which simply adds some counters to the existing stats struct, but taken out "send events through a channel", which added the new API surface for receiving individual events. If you prefer I can remove the counters too, but they don't add much in the way of new API, so I don't think they create a ton of tech debt, and they're useful enough to leave there.

@domodwyer

Honestly, this is great work - thanks @KJTsanaktsidis.

Dom

@KJTsanaktsidis (Author)

Thank you for your help!

Just an update on how this is going in our production usage. The pool limit kicked in a couple of times this week with our app, and it seems that my change had the desired effect - the wait time hit a max of ~3ms, whereas the old code would have taken at least 100ms due to the sleep-polling. This can be seen in the attached Datadog graph (sorry about the redactions); when the app goes to open up a whole bunch of new connections, the limit kicks in and queues everything up for a few ms:

[attached screenshot: Datadog graph of pool wait times, 2018-03-27 09:48:06]

All in all, I'm pretty happy with how this is going. (Subsequent to this graph, I turned on maxIdleTimeMs as well, which makes the surge in connections drop off after a couple of minutes - thanks to @gnawux!)

@domodwyer merged commit 76ea203 into globalsign:development on Apr 3, 2018
@domodwyer

That's cool! Thanks for sharing (and thanks @gnawux too!).

I'll look at getting our staging environment up and running to check for any performance regressions, but I don't expect to find any - after that I'll cut a release.

Dom

@domodwyer mentioned this pull request Apr 23, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022