Refactor client outbound queues #3733

Merged · 1 commit · Jan 9, 2023

Conversation

@neilalexander (Member) commented on Dec 21, 2022:

Previously, the client outbound queues used dynamically sized buffers that were frequently allocated and discarded. This PR removes the dynamic sizing and replaces those buffers with fixed-size coalesced buffers, backed by a `sync.Pool` for efficient reuse. The result is that the server allocates dramatically less memory when sending data to clients.
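
For illustration, here is a minimal sketch of the fixed-size-buffer-plus-`sync.Pool` pattern described above. This is not the actual server code: the 64 KiB size and the `outBufSize`, `outPool`, `enqueue`, and `flush` names are assumptions.

```go
// Package outbound is a hypothetical package name used for this sketch.
package outbound

import (
	"net"
	"sync"
)

// outBufSize is an assumed fixed buffer size; the real server chooses its
// own size classes.
const outBufSize = 64 * 1024

// outPool hands out fixed-size buffers so that flushing to a client does not
// allocate a fresh buffer for every flush.
var outPool = sync.Pool{
	New: func() any { return make([]byte, 0, outBufSize) },
}

// enqueue coalesces data into the last pending buffer while it has spare
// capacity, otherwise it takes another fixed-size buffer from the pool.
func enqueue(pending [][]byte, data []byte) [][]byte {
	for len(data) > 0 {
		if n := len(pending); n > 0 && cap(pending[n-1]) > len(pending[n-1]) {
			buf := pending[n-1]
			c := copy(buf[len(buf):cap(buf)], data)
			pending[n-1] = buf[:len(buf)+c]
			data = data[c:]
			continue
		}
		pending = append(pending, outPool.Get().([]byte))
	}
	return pending
}

// flush writes the pending buffers with a single vectored write and only
// then returns them to the pool for reuse.
func flush(conn net.Conn, pending [][]byte) error {
	bufs := make(net.Buffers, len(pending))
	copy(bufs, pending) // WriteTo advances its own slice; keep pending intact
	_, err := bufs.WriteTo(conn)
	for _, buf := range pending {
		outPool.Put(buf[:0])
	}
	return err
}
```

Because every pooled buffer has the same fixed capacity, any recycled buffer can serve any future flush, which is what removes the per-flush allocations.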

The following results were generated by running `-bench=BenchmarkJetStreamConsume -benchtime=100000x -memprofile=`.
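
As background (purely illustrative, not the real `BenchmarkJetStreamConsume`), a Go benchmark that surfaces the throughput and allocation figures quoted below typically has this shape; the `-memprofile` flag then writes a heap profile readable with `go tool pprof`:

```go
package outbound

import "testing"

// BenchmarkPublishSketch is illustrative only: it shows the shape of a Go
// benchmark that reports MB/s (via SetBytes) and allocations per operation
// (via ReportAllocs). The real benchmark body is elided.
func BenchmarkPublishSketch(b *testing.B) {
	payload := make([]byte, 128)
	b.SetBytes(int64(len(payload)))
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = append([]byte(nil), payload...) // stand-in for publishing a message
	}
}
```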

Memory allocations on main (2.9.10) as of this morning (Dec 21, 2022):

[Screenshots: memory allocation profiles on main, captured 2022-12-21]

Memory allocations on this PR:

[Screenshots: memory allocation profiles with this PR, captured 2022-12-21]

I don't know if this will make a huge difference to throughput in the usual case, but for servers that are pushing a lot of traffic or handling a large number of concurrent clients, this should reduce the GC burden noticeably and, with any luck, bring load averages down a little bit.

There are a couple of TODOs around WebSockets, but I figured they would be better dealt with separately.

Signed-off-by: Neil Twigg <neil@nats.io>

/cc @nats-io/core

@neilalexander (Member, Author) commented:

Investigating the `TestWSCompressionWithPartialWrite` failure.

@neilalexander force-pushed the neil/clientoutbound branch 2 times, most recently from 7072394 to bc0d67f on December 21, 2022 at 15:24
@neilalexander force-pushed the neil/clientoutbound branch 2 times, most recently from e18c7c1 to e260af9 on December 29, 2022 at 11:24
@neilalexander (Member, Author) commented:

4K+64K fixed buffer sizes vs main (an illustrative sketch of the two-size scheme follows the benchmark table):

benchmark                           old MB/s     new MB/s     speedup
Benchmark______Pub0b_Payload-16     185.50       178.70       0.96x
Benchmark______Pub8b_Payload-16     313.33       303.27       0.97x
Benchmark_____Pub32b_Payload-16     673.18       652.82       0.97x
Benchmark____Pub128B_Payload-16     1807.57      1720.96      0.95x
Benchmark____Pub256B_Payload-16     2937.77      2905.47      0.99x
Benchmark______Pub1K_Payload-16     7438.41      7205.29      0.97x
Benchmark______Pub4K_Payload-16     5349.21      5728.48      1.07x
Benchmark______Pub8K_Payload-16     4083.02      2379.72      0.58x
Benchmark_____Pub32K_Payload-16     2168.19      1626.91      0.75x
Benchmark__AuthPub0b_Payload-16     85.36        84.67        0.99x
Benchmark______FanOut_8x1x10-16     232.15       325.98       1.40x
Benchmark_____FanOut_8x1x100-16     245.85       403.99       1.64x
Benchmark____FanOut_8x10x100-16     138.72       236.03       1.70x
Benchmark___FanOut_8x10x1000-16     361.59       440.56       1.22x
Benchmark___FanOut_8x100x100-16     425.92       385.28       0.90x
Benchmark__FanOut_8x100x1000-16     428.71       422.80       0.99x
Benchmark__FanOut_8x10x10000-16     451.37       419.97       0.93x
Benchmark___FanOut_8x500x100-16     429.13       371.97       0.87x
Benchmark___FanOut_128x1x100-16     464.37       2116.05      4.56x
Benchmark__FanOut_128x10x100-16     509.55       846.38       1.66x
Benchmark_FanOut_128x10x1000-16     954.77       2317.24      2.43x
Benchmark_FanOut_128x100x100-16     1627.18      1897.50      1.17x
BenchmarkFanOut_128x100x1000-16     1816.02      2136.97      1.18x
BenchmarkFanOut_128x10x10000-16     1790.16      2157.75      1.21x
BenchmarkFanOut__128x500x100-16     1720.79      1826.24      1.06x
Benchmark_FanOut_512x100x100-16     3498.85      4873.54      1.39x
Benchmark__FanOut_512x100x1k-16     4091.18      5626.64      1.38x
Benchmark____FanOut_1kx10x1k-16     3822.01      8174.05      2.14x
Benchmark__FanOut_1kx100x100-16     3795.23      7438.82      1.96x
Benchmark_____RFanOut_8x1x10-16     107.98       303.84       2.81x
Benchmark____RFanOut_8x1x100-16     179.57       405.74       2.26x
Benchmark___RFanOut_8x10x100-16     179.14       252.19       1.41x
Benchmark__RFanOut_8x10x1000-16     443.64       453.00       1.02x
Benchmark__RFanOut_8x100x100-16     414.94       388.85       0.94x
Benchmark_RFanOut_8x100x1000-16     432.00       417.43       0.97x
Benchmark_RFanOut_8x10x10000-16     448.99       419.25       0.93x
Benchmark_RFanOut_1kx10x1000-16     3665.21      8164.32      2.23x
Benchmark_____FanIn_1kx100x1-16     3691.06      4420.21      1.20x
Benchmark_____FanIn_4kx100x1-16     5184.43      6001.12      1.16x
Benchmark_____FanIn_8kx100x1-16     5545.94      7457.00      1.34x
Benchmark____FanIn_16kx100x1-16     5594.28      7070.22      1.26x
Benchmark____FanIn_64kx100x1-16     4329.60      5892.91      1.36x
Benchmark___FanIn_128kx100x1-16     4764.75      5648.84      1.19x
Benchmark___BumpPubCount_1x3-16     112.43       294.79       2.62x
Benchmark___BumpPubCount_2x3-16     267.78       434.67       1.62x
Benchmark___BumpPubCount_5x3-16     398.54       121.10       0.30x
Benchmark__BumpPubCount_10x3-16     85.19        355.62       4.17x
Benchmark____GWs_Opt_1kx01x0-16     3645.74      3653.55      1.00x
Benchmark____GWs_Opt_2kx01x0-16     5565.68      5460.11      0.98x
Benchmark____GWs_Opt_4kx01x0-16     5558.51      5508.58      0.99x
Benchmark____GWs_Opt_1kx10x0-16     9385.95      9781.65      1.04x
Benchmark____GWs_Opt_2kx10x0-16     12544.53     12614.63     1.01x
Benchmark____GWs_Opt_4kx10x0-16     11868.34     11857.18     1.00x
Benchmark____GWs_Opt_1kx01x1-16     701.42       5266.21      7.51x
Benchmark____GWs_Opt_2kx01x1-16     918.64       7144.90      7.78x
Benchmark____GWs_Opt_4kx01x1-16     1112.30      8082.00      7.27x
Benchmark____GWs_Opt_1kx10x1-16     868.60       4851.29      5.59x
Benchmark____GWs_Opt_2kx10x1-16     1160.86      6486.34      5.59x
Benchmark____GWs_Opt_4kx10x1-16     1344.30      7582.30      5.64x
Benchmark____GWs_Int_1kx01x0-16     3637.42      3592.07      0.99x
Benchmark____GWs_Int_2kx01x0-16     5359.40      5335.23      1.00x
Benchmark____GWs_Int_4kx01x0-16     5613.72      5842.08      1.04x
Benchmark____GWs_Int_1kx10x0-16     9298.72      9134.23      0.98x
Benchmark____GWs_Int_2kx10x0-16     13618.82     12871.17     0.95x
Benchmark____GWs_Int_4kx10x0-16     11922.15     12748.66     1.07x
Benchmark____GWs_Int_1kx01x1-16     687.65       5290.96      7.69x
Benchmark____GWs_Int_2kx01x1-16     867.25       7448.40      8.59x
Benchmark____GWs_Int_4kx01x1-16     1006.29      8412.82      8.36x
Benchmark____GWs_Int_1kx10x1-16     975.18       5038.44      5.17x
Benchmark____GWs_Int_2kx10x1-16     3365.42      6686.50      1.99x
Benchmark____GWs_Int_4kx10x1-16     4201.79      7518.25      1.79x
Benchmark__GWs_Reqs_1_SubAll-16     224.87       234.15       1.04x
Benchmark__GWs_Reqs_1SubEach-16     38.13        34.25        0.90x
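
To illustrate the 4K+64K scheme benchmarked above (a sketch under assumptions, not the server's actual selection logic; `smallPool`, `largePool`, `getBuffer`, and `putBuffer` are hypothetical names):

```go
package outbound

import "sync"

const (
	smallBufSize = 4 * 1024  // assumed "4K" size class
	largeBufSize = 64 * 1024 // assumed "64K" size class
)

var (
	smallPool = sync.Pool{New: func() any { return make([]byte, 0, smallBufSize) }}
	largePool = sync.Pool{New: func() any { return make([]byte, 0, largeBufSize) }}
)

// getBuffer picks the smallest size class that can hold n bytes without
// growing, falling back to a one-off allocation for oversized payloads.
func getBuffer(n int) []byte {
	switch {
	case n <= smallBufSize:
		return smallPool.Get().([]byte)
	case n <= largeBufSize:
		return largePool.Get().([]byte)
	default:
		return make([]byte, 0, n)
	}
}

// putBuffer returns pooled buffers to their size class; oversized one-off
// buffers are simply left for the garbage collector.
func putBuffer(buf []byte) {
	switch cap(buf) {
	case smallBufSize:
		smallPool.Put(buf[:0])
	case largeBufSize:
		largePool.Put(buf[:0])
	}
}
```

The general idea of two size classes is that small protocol writes do not pin large buffers while big payloads still coalesce into relatively few buffers; a follow-up commit referenced at the end of this thread adds a third size class along the same lines.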

8-byte message:

=== RUN   TestClientOutboundQueueMemory
Message size:       0.0KB
Subscribed clients: 50000
Heap allocs before: 2733.0MB
Heap allocs after:  2925.2MB
Heap allocs delta:  7.0%

48000-byte message:

=== RUN   TestClientOutboundQueueMemory
Message size:       46.9KB
Subscribed clients: 50000
Heap allocs before: 2734.7MB
Heap allocs after:  2871.0MB
Heap allocs delta:  5.0%
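
For context on how figures like these can be produced (illustrative only; the internals of `TestClientOutboundQueueMemory` are not shown in this thread), Go's `runtime.MemStats.TotalAlloc` is a cumulative allocation counter:

```go
package outbound

import (
	"fmt"
	"runtime"
)

// measureHeapDelta shows how "heap allocs before/after/delta" figures can be
// derived from Go's cumulative TotalAlloc counter around some workload fn.
func measureHeapDelta(fn func()) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	fn() // e.g. publish one message to all subscribed clients

	runtime.GC()
	runtime.ReadMemStats(&after)

	b := float64(before.TotalAlloc) / (1024 * 1024)
	a := float64(after.TotalAlloc) / (1024 * 1024)
	fmt.Printf("Heap allocs before: %.1fMB\n", b)
	fmt.Printf("Heap allocs after:  %.1fMB\n", a)
	fmt.Printf("Heap allocs delta:  %.1f%%\n", (a-b)/b*100)
}
```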

@neilalexander force-pushed the neil/clientoutbound branch 2 times, most recently from f48254b to 9aa4d4f on December 29, 2022 at 12:07
@neilalexander force-pushed the neil/clientoutbound branch 2 times, most recently from cf1eed9 to 2b37beb on January 4, 2023 at 17:40
@neilalexander force-pushed the neil/clientoutbound branch 3 times, most recently from 928f520 to 14f3abb on January 6, 2023 at 17:48
@derekcollison self-requested a review on January 6, 2023 at 18:54
@neilalexander force-pushed the neil/clientoutbound branch 2 times, most recently from e72f9b7 to 0f7e419 on January 6, 2023 at 19:37
@neilalexander (Member, Author) commented:

@derekcollison Review comments addressed!

@derekcollison (Member) commented:

Let's squash down to 1 commit.

@derekcollison (Member) left a review:

LGTM!

Also try to reduce flakiness of `TestClusterQueueSubs` and `TestCrossAccountServiceResponseTypes`
@derekcollison (Member) left a review:

LGTM

derekcollison added a commit that referenced this pull request on Apr 3, 2023:
…dd buffer reuse) (#3965)

This brings #3733 forward from `dev` into `main`, to go into the next
release.

Signed-off-by: Neil Twigg <neil@nats.io>
derekcollison added a commit that referenced this pull request on Apr 21, 2023:
This extends the previous work in #3733 with the following:

1. Remove buffer coalescing, as this could result in a race condition
during the `writev` syscall in rare circumstances
2. Add a third buffer size, to ensure that we aren't allocating more
than we need to without coalescing
3. Refactor buffer handling in the WebSocket code to reduce allocations and
ensure owned buffers aren't incorrectly pooled, which could otherwise lead to
further race conditions

Fixes nats-io/nats.ws#194.

Signed-off-by: Neil Twigg <neil@nats.io>
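
To illustrate point 1 of that commit message (a sketch under assumptions, not the server's implementation; `flushQueue` and its parameters are hypothetical): buffers handed to a vectored write must remain exclusively owned by the writer until the write returns, which is what makes coalescing into, or pooling, a queued buffer racy.

```go
package outbound

import (
	"net"
	"sync"
)

// flushQueue drains queued buffers with one vectored write (net.Buffers uses
// writev where the platform supports it). The queue lock is not held across
// WriteTo, so every buffer handed to the writer must stay exclusively owned
// by it until WriteTo returns: recycling a buffer into a pool, or coalescing
// new data into it, before the write completes would be a data race.
func flushQueue(conn net.Conn, mu *sync.Mutex, queue *[][]byte, pool *sync.Pool) error {
	mu.Lock()
	pending := *queue
	*queue = nil // detach: producers append to a fresh queue from now on
	mu.Unlock()

	bufs := make(net.Buffers, len(pending))
	copy(bufs, pending) // WriteTo consumes its slice; keep pending intact
	_, err := bufs.WriteTo(conn)

	// Only after the write has completed is it safe to reuse the buffers.
	for _, buf := range pending {
		pool.Put(buf[:0])
	}
	return err
}
```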