Fix possible memory leak due to worker goroutines going out of sync when reporting partial results #117
Conversation
scylla-bench spawns a bunch of worker goroutines that are responsible for executing CQL operations and a single collector goroutine that collects partial results from the workers. Each worker goroutine has its own separate channel that it uses to send partial results to the collector. The collector goroutine, in a loop, fetches a single partial result from each channel, merges them and prints.

There are some problems with this setup:

- The channel capacities are very large (10k elements each),
- The size of a partial result can be large (each one has a hdrhistogram),
- The intervals that the workers use to send partial results are prone to skew, so workers might go out of sync.

More specifically, the algorithm that a worker uses looks like this:

```
last_send_timestamp := now
loop:
    do_an_operation()
    if now - last_send_timestamp >= log_interval:  # log_interval == 1s
        send_partial_results()
        last_send_timestamp := now
```

Notice that when the condition for sending partial results is met, more than one second might have elapsed. Because the loop just resets last_send_timestamp to the current timestamp, the worker does not try to catch up the lag in any way.

Because the lag depends on operation latency - which is random from our POV - the difference between the worker with the lowest lag and the one with the highest lag will keep increasing. The collector takes a single value from each channel in each round, so it is bottlenecked on the slowest worker. If the difference in lag between the slowest worker and some other worker is large enough, that other worker's channel will be non-empty and will always contain a number of partial results proportional to the worker's lag.

Because of the above, memory consumption may grow over time. In addition, results might become inaccurate because the collector will merge current results of the slow workers with outdated results, from several seconds ago, of the fast workers.

This commit fixes this in a very simple way: instead of resetting last_send_timestamp to the current timestamp, last_send_timestamp is just advanced by the log interval. This ensures that the worker goroutines try to stay synchronized with each other.

Might be related to: scylladb#112

This is the likely cause, although I don't have definitive proof. In my one-hour test I observed that memory use looks correlated with the number of items sitting in the channels, and indeed was higher before the fix - but not much higher (by ~100MB), and the number of items fluctuated instead of growing steadily. Perhaps a longer test, like in the original issue, would be needed to reproduce this.
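To make the setup described above concrete, here is a minimal Go sketch of the worker/collector wiring. The names (`partialResult`, `runCollector`) and the merge step are illustrative, not scylla-bench's actual identifiers; the point is only the shape of the problem: one buffered channel per worker, and a collector round that blocks on the slowest worker while faster workers keep queueing.

```go
package main

import (
	"fmt"
	"time"
)

// partialResult stands in for scylla-bench's per-interval result; in the
// real tool each one also carries an hdrhistogram, which is what makes a
// backlog of queued results expensive to keep in memory.
type partialResult struct {
	ops int
}

// runCollector takes exactly one partial result from every worker's channel
// per round and merges them. Each round therefore advances at the pace of
// the slowest worker; faster workers keep filling their buffered channels
// in the meantime.
func runCollector(channels []chan partialResult, rounds int) {
	for r := 0; r < rounds; r++ {
		merged := partialResult{}
		for _, ch := range channels {
			pr := <-ch // blocks until this particular worker reports
			merged.ops += pr.ops
		}
		fmt.Printf("round %d: %d ops\n", r, merged.ops)
	}
}

func main() {
	const workers = 4
	channels := make([]chan partialResult, workers)
	for i := range channels {
		// The real channels are buffered with 10k slots each, which is how
		// a lagging collector round can hide a large backlog for a while.
		channels[i] = make(chan partialResult, 10000)
		ch := channels[i]
		go func() {
			for {
				ch <- partialResult{ops: 100}
				time.Sleep(100 * time.Millisecond)
			}
		}()
	}
	runCollector(channels, 3)
}
```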
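And a sketch of the timing bug and the fix side by side, again with hypothetical names (`reportLoopBuggy`, `reportLoopFixed`, `doOp`, `send` are not the actual scylla-bench functions). The only difference between the two loops is how the next deadline is computed:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// reportLoopBuggy resets the deadline to "now" after each send, so any lag
// accumulated while waiting for a slow operation is silently kept: a worker
// that once fell 300ms behind stays 300ms behind forever, and workers drift
// apart from each other.
func reportLoopBuggy(logInterval time.Duration, doOp func(), send func()) {
	lastSend := time.Now()
	for {
		doOp()
		if time.Since(lastSend) >= logInterval {
			send()
			lastSend = time.Now() // lag is never caught up
		}
	}
}

// reportLoopFixed advances the deadline by exactly one interval instead. A
// worker that fell several intervals behind will emit a few partial results
// back to back until time.Since(lastSend) drops below the interval, which
// pulls it back onto the shared 1s grid with the other workers.
func reportLoopFixed(logInterval time.Duration, doOp func(), send func()) {
	lastSend := time.Now()
	for {
		doOp()
		if time.Since(lastSend) >= logInterval {
			send()
			lastSend = lastSend.Add(logInterval) // catch up the lag
		}
	}
}

func main() {
	// Simulate operations with random latency and watch the send cadence.
	doOp := func() { time.Sleep(time.Duration(rand.Intn(300)) * time.Millisecond) }
	send := func() { fmt.Println("partial result at", time.Now().Format("15:04:05.000")) }
	go reportLoopFixed(time.Second, doOp, send)
	time.Sleep(5 * time.Second)
}
```

Because the fixed loop catches up after every overshoot, all workers produce on average one partial result per interval, which is what keeps the per-worker channels from accumulating a permanent backlog.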
I've created a docker image with this one: can you run it with the case that showed the leak?
Running a test in: https://jenkins.scylladb.com/job/scylla-staging/job/Longevity_yaron/job/longevity-large-partition-4days-test-rq2/22/ It looks like the memory leak is not reproduced after ~20 hours.
LGTM