wait-free buffer #75

Merged: 5 commits into master, Nov 15, 2017

Conversation

achille-roussel (Contributor)

This PR changes the approach taken to handle concurrent use of a stats buffer, getting rid of all locks to offer better throughput in highly concurrent environments. What motivated this change is the observation that on a high-traffic service, 50% of the time spent locking/unlocking was on the mutexes in the stats.(*Buffer).HandleMeasures method (see the profile attached).

The proposed solution is to use an array of buffers instead of a single buffer, with a wait-free lock on each buffer that goroutines have to acquire before writing metrics. They don't block on the lock, though: if a buffer is busy they just move on and try the next one until one becomes available.
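
To make the approach concrete, here is a minimal, self-contained sketch of the scheme described above, assuming an atomic-CAS try-lock per buffer; the bufferPool and shard names, the sizing, and the append logic are invented for illustration and are not the code in this PR:

	// Sketch only: an array of buffers, each guarded by an atomic flag that
	// writers try to acquire with a CAS; a busy buffer is skipped instead of
	// waited on, so no goroutine ever blocks on a mutex.
	package main

	import (
		"fmt"
		"runtime"
		"sync"
		"sync/atomic"
	)

	type shard struct {
		busy uint32 // 0 = free, 1 = held; acquired with a CAS instead of a mutex
		data []byte
	}

	type bufferPool struct {
		shards []shard
	}

	// newBufferPool sizes the pool at 2 x GOMAXPROCS, mirroring the default
	// mentioned later in this thread.
	func newBufferPool() *bufferPool {
		return &bufferPool{shards: make([]shard, 2*runtime.GOMAXPROCS(0))}
	}

	// write appends p to the first shard whose flag it manages to grab. A busy
	// shard is skipped rather than waited on.
	func (b *bufferPool) write(p []byte) {
		for i := 0; ; i++ {
			s := &b.shards[i%len(b.shards)]
			if atomic.CompareAndSwapUint32(&s.busy, 0, 1) {
				s.data = append(s.data, p...)
				atomic.StoreUint32(&s.busy, 0)
				return
			}
		}
	}

	func main() {
		pool := newBufferPool()
		var wg sync.WaitGroup
		for i := 0; i < 64; i++ {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				pool.write([]byte(fmt.Sprintf("metric-%d\n", i)))
			}(i)
		}
		wg.Wait()

		total := 0
		for i := range pool.shards {
			total += len(pool.shards[i].data)
		}
		fmt.Println("bytes buffered:", total)
	}

Strictly speaking, a writer in this sketch keeps spinning if every buffer is held, which is the contention scenario raised in the review below; the 2 x GOMAXPROCS sizing discussed later in the thread is meant to make that unlikely.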

The chunks that metrics were initially serialized into are gone as well, because sync.Pool has an internal mutex that gets hit frequently when the GC is configured to be more aggressive.

I'll be testing this in production before merging, but please take a look and let me know if anything should be changed.

[Attached: profile screenshot, 2017-10-30]

@abraithwaite (Contributor) left a comment

This looks great, but I'm worried that there might be diminishing returns as things get busier. Perhaps that suspicion is unfounded in practice, but if this is lockless, is there the potential for high amounts of contention?

I recognize that if you're seeing contention writing metrics, your problems probably have more to do with the metrics you're writing, but it's food for thought.

buffer.go (Outdated)

	} else {
		b.buffer.append(chunk)
	}
	if length != 0 {
Contributor

I don't understand what this is doing.

If we get length before updating buffer.data on line 53, couldn't we potentially be truncating things off here? I would understand truncating to size, but I'm sure I'm just missing something.

Contributor Author

It prevents exceeding the buffer size in the common case: when the buffer is full it gets flushed and the code loops back to write the measures.
Now that I'm re-reading it, I realize this code would likely cause an infinite loop since we'll keep retrying. Thanks for pointing it out (it seems like this calls for a test).
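
To make the hazard concrete, here is a small standalone sketch of a flush-and-retry write loop of the kind described above; the buffer type, flush, and writeChunk below are invented for illustration, not this PR's actual code:

	// Sketch only: a flush-then-retry loop spins forever when a single chunk
	// is larger than the buffer's capacity.
	package main

	import "fmt"

	type buffer struct {
		data []byte
	}

	func (b *buffer) flush() {
		fmt.Printf("flushing %d bytes\n", len(b.data))
		b.data = b.data[:0]
	}

	// writeChunk appends chunk if it fits; otherwise it flushes and loops
	// back to retry the write.
	func (b *buffer) writeChunk(chunk []byte) {
		for {
			if len(b.data)+len(chunk) <= cap(b.data) {
				b.data = append(b.data, chunk...)
				return
			}
			// If len(chunk) > cap(b.data), flushing never makes room and this
			// loop never terminates: the infinite loop acknowledged above.
			b.flush()
		}
	}

	func main() {
		b := &buffer{data: make([]byte, 0, 16)}
		for i := 0; i < 4; i++ {
			b.writeChunk([]byte("12345678")) // a flush is triggered once the buffer fills, then the write succeeds
		}
		// b.writeChunk(make([]byte, 32)) // would spin forever: chunk exceeds capacity
	}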

Contributor

I thought the break statement above would take care of not exceeding the buffer size.

I totally missed the infinite loop after the write, though. I thought I had understood what you were getting at, but I missed that.

buffer.go (Outdated)

	}

	// Flush satisfies the Flusher interface.
	func (b *Buffer) Flush() {
		b.init(b.bufferSize())
Contributor

Is this meant to be here? init looks like it zeros buffers to me.

Maybe that's what you want, but it doesn't sound like what a Flush should do to me. Then again, I don't know how this is used as a flusher.

Contributor

I see, init is a once.Do and we don't want to flush before the buffers are initialized.

Still a bit confusing to the reader; I wonder if there's a cleaner way. It's also confusing going back and forth between the Buffer and buffer types in this file when they share method names.
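
For context, the pattern being described is roughly the following; this is a sketch under assumptions about the Buffer's fields, not this PR's actual implementation:

	// Sketch only: a sync.Once-guarded init makes a zero-value Buffer safe to
	// flush, which is why Flush calls init before doing anything else.
	package main

	import (
		"fmt"
		"sync"
	)

	type Buffer struct {
		once    sync.Once
		buffers [][]byte
	}

	// init allocates the internal buffers exactly once; later calls are no-ops.
	func (b *Buffer) init(size int) {
		b.once.Do(func() {
			b.buffers = make([][]byte, 4)
			for i := range b.buffers {
				b.buffers[i] = make([]byte, 0, size)
			}
		})
	}

	// Flush initializes first so that flushing before any write does not touch
	// nil buffers, then resets each buffer (a stand-in for a real flush).
	func (b *Buffer) Flush() {
		b.init(1024)
		for i := range b.buffers {
			b.buffers[i] = b.buffers[i][:0]
		}
	}

	func main() {
		var b Buffer
		b.Flush() // safe even though nothing was ever written
		fmt.Println("buffers allocated:", len(b.buffers))
	}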

Contributor Author

How about renaming the method to initOnce, or prepare?

Contributor

I think I was confused more by there being two init methods than by the name itself. I wasn't in my editor and jumped to the other function when reviewing in the browser.

It's mostly just a nit.

@achille-roussel (Contributor Author)

You're right, not waiting can potentially result in longer busy-waits, with goroutines just looping trying to acquire buffers. In practice this would call for increasing the pool size. I'm not too worried about it: the default pool size of 2 x GOMAXPROCS should prevent this kind of scenario from occurring, since there should always be roughly half of the buffers available to grab.
Profiles will tell what the right approach is; it's really hard to prove what the best approach is without exercising the code in real-world applications (which is why I'll wait for results before merging this; the benchmarks in this package show no difference between the versions, for example).

@achille-roussel (Contributor Author)

Actually, I wasn't running the right benchmark; we do see some differences:

benchmark                                                       old ns/op     new ns/op     delta
BenchmarkClient/write_a_batch_of_1_measures_to_a_client-4       171           81.5          -52.34%
BenchmarkClient/write_a_batch_of_10_measures_to_a_client-4      445           385           -13.48%
BenchmarkClient/write_a_batch_of_100_measures_to_a_client-4     3309          3367          +1.75%

In most cases the number of metrics in a batch is somewhere between 1 and 20, so we'll likely get better throughput with this change. While the improvement may be related to contention, it's hard to tell from here, because the block profiler picks up so much noise from synchronization within the test package that it doesn't show anything about the mutex in the buffer.
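
For reference, sub-benchmarks with names like the ones above are typically written with testing.B.Run, and the old/new/delta layout matches what the benchcmp tool prints when comparing two runs; the -4 suffix in the names is the GOMAXPROCS value the benchmarks ran with. The sketch below is generic (the consume helper, sink, and batch type are stand-ins, not this package's client API):

	// Sketch only: sub-benchmarks over batch sizes. Spaces in the Run name
	// become underscores in the reported benchmark names.
	package stats_test

	import (
		"fmt"
		"testing"
	)

	var sink int

	// consume stands in for handing a batch of measures to the client under test.
	func consume(batch []int) int {
		sum := 0
		for _, v := range batch {
			sum += v
		}
		return sum
	}

	func BenchmarkClient(b *testing.B) {
		for _, n := range []int{1, 10, 100} {
			batch := make([]int, n)
			b.Run(fmt.Sprintf("write a batch of %d measures to a client", n), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					sink = consume(batch)
				}
			})
		}
	}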

@jnjackins (Contributor) left a comment

LGTM!

@achille-roussel (Contributor Author)

Thanks for the reviews, guys. I'm still testing this in production and it seems to be helping so far; I'll merge after running it for a little longer.

@achille-roussel (Contributor Author)

Alright, this has been running without issues for a few days; there's definitely no contention on this code path anymore.

I'll go ahead and merge!

@achille-roussel merged commit b82e0ef into master on Nov 15, 2017
@achille-roussel deleted the wait-free-buffer branch on November 15, 2017 at 04:07