Avoid growbuffer, take two #197

jech · 2022-05-13T18:50:20Z

Take two of #77. Because overhead has since been removed elsewhere, the results are much more dramatic this time, especially for GCM.

When I first submitted that, @Sean-Der noticed that it relies on nextConn.write not retaining a reference to the buffer. That is still true of this version. However, relying on write methods not retaining the buffer is standard practice in Go: the io.Writer documentation (https://pkg.go.dev/io#Writer) states that implementations must not retain p. I remain open to other approaches, but I think the performance improvement is too large to leave on the table, especially on embedded systems with slow memory interconnects (where the results are even more dramatic, as pointed out by @adriancable).
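The following is a minimal sketch of why the non-retention rule matters when the caller reuses its buffer. The types copyingWriter and retainingWriter and the function sendTwice are hypothetical names made up for illustration; they are not part of pion/srtp.

```go
package main

import (
	"fmt"
	"io"
)

// copyingWriter (hypothetical) honours the io.Writer contract: it needs the
// data after Write returns, so it takes a private copy instead of keeping p.
type copyingWriter struct {
	packets [][]byte
}

func (w *copyingWriter) Write(p []byte) (int, error) {
	w.packets = append(w.packets, append([]byte(nil), p...))
	return len(p), nil
}

// retainingWriter (hypothetical) violates the contract by keeping p itself.
type retainingWriter struct {
	packets [][]byte
}

func (w *retainingWriter) Write(p []byte) (int, error) {
	w.packets = append(w.packets, p)
	return len(p), nil
}

// sendTwice writes two packets through the same scratch buffer, the way a
// caller that reuses its buffer would.
func sendTwice(w io.Writer) error {
	buf := make([]byte, 4)
	copy(buf, "AAAA")
	if _, err := w.Write(buf); err != nil {
		return err
	}
	copy(buf, "BBBB") // reuse the buffer for the next packet
	_, err := w.Write(buf)
	return err
}

func main() {
	good := &copyingWriter{}
	bad := &retainingWriter{}
	_ = sendTwice(good)
	_ = sendTwice(bad)
	fmt.Printf("copying:   %q %q\n", good.packets[0], good.packets[1])
	fmt.Printf("retaining: %q %q\n", bad.packets[0], bad.packets[1])
}
```

With the copying writer both packets survive intact. The retaining writer ends up with two references to the same buffer, so its first packet is silently overwritten by the second.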

This depends on #196.

name                 old time/op    new time/op    delta
WriteRTP/CTR-100-8      951ns ± 0%     934ns ± 0%   -1.76%  (p=0.029 n=4+4)
WriteRTP/CTR-1400-8    4.64µs ± 0%    4.52µs ± 0%   -2.59%  (p=0.029 n=4+4)
WriteRTP/GCM-100-8      305ns ± 6%     259ns ± 0%  -15.07%  (p=0.029 n=4+4)
WriteRTP/GCM-1000-8     704ns ±12%     460ns ± 1%  -34.72%  (p=0.029 n=4+4)

name                 old speed      new speed      delta
WriteRTP/CTR-100-8    118MB/s ± 0%   120MB/s ± 0%   +1.80%  (p=0.029 n=4+4)
WriteRTP/CTR-1400-8   218MB/s ± 0%   224MB/s ± 0%   +2.65%  (p=0.029 n=4+4)
WriteRTP/GCM-100-8    368MB/s ± 6%   433MB/s ± 0%  +17.54%  (p=0.029 n=4+4)
WriteRTP/GCM-1000-8  1.45GB/s ±11%  2.20GB/s ± 1%  +51.73%  (p=0.029 n=4+4)

name                 old alloc/op   new alloc/op   delta
WriteRTP/CTR-100-8       188B ± 0%       60B ± 0%  -68.09%  (p=0.029 n=4+4)
WriteRTP/CTR-1400-8    1.08kB ± 0%    0.06kB ± 0%  -94.46%  (p=0.029 n=4+4)
WriteRTP/GCM-100-8       144B ± 0%       16B ± 0%  -88.89%  (p=0.029 n=4+4)
WriteRTP/GCM-1000-8    1.17kB ± 0%    0.02kB ± 0%  -98.63%  (p=0.029 n=4+4)

name                 old allocs/op  new allocs/op  delta
WriteRTP/CTR-100-8       5.00 ± 0%      4.00 ± 0%  -20.00%  (p=0.029 n=4+4)
WriteRTP/CTR-1400-8      5.00 ± 0%      4.00 ± 0%  -20.00%  (p=0.029 n=4+4)
WriteRTP/GCM-100-8       2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.029 n=4+4)
WriteRTP/GCM-1000-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.029 n=4+4)

codecov · 2022-05-13T18:51:36Z

Codecov Report

Merging #197 (704fd0a) into master (16044eb) will increase coverage by 0.18%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
+ Coverage   74.97%   75.16%   +0.18%     
==========================================
  Files          17       17              
  Lines        1215     1224       +9     
==========================================
+ Hits          911      920       +9     
  Misses        208      208              
  Partials       96       96

Flag   Coverage Δ
go     75.16% <100.00%> (+0.18%)  ⬆️
wasm   74.67% <100.00%> (+0.18%)  ⬆️

Impacted Files     Coverage Δ
session_srtcp.go   76.85% <100.00%> (+0.58%)  ⬆️
session_srtp.go    74.16% <100.00%> (+1.35%)  ⬆️


jech · 2022-05-18T09:45:20Z

@adriancable, on an in-order ARMv7 with no hardware crypto (AM335x, Cortex-A8 at 1GHz) the results are less dramatic (due to the time spent in crypto) but still significant:

name               old time/op    new time/op    delta
WriteRTP/CTR-100     29.3µs ± 1%    28.6µs ± 1%  -2.31%  (p=0.029 n=4+4)
WriteRTP/CTR-1400     150µs ± 0%     139µs ± 1%  -7.41%  (p=0.029 n=4+4)
WriteRTP/GCM-100     34.5µs ± 0%    33.1µs ± 1%  -3.93%  (p=0.029 n=4+4)
WriteRTP/GCM-1000     215µs ± 1%     200µs ± 0%  -6.67%  (p=0.029 n=4+4)

name               old speed      new speed      delta
WriteRTP/CTR-100   3.82MB/s ± 1%  3.92MB/s ± 1%  +2.42%  (p=0.029 n=4+4)
WriteRTP/CTR-1400  6.73MB/s ± 0%  7.27MB/s ± 1%  +7.95%  (p=0.029 n=4+4)
WriteRTP/GCM-100   3.25MB/s ± 0%  3.38MB/s ± 1%  +4.08%  (p=0.029 n=4+4)
WriteRTP/GCM-1000  4.71MB/s ± 1%  5.05MB/s ± 0%  +7.21%  (p=0.029 n=4+4)

adriancable

I've been running these changes for a few days on ARMv7 and x86 with no issues. The benchmark improvements are impressive. A few % shouldn't be left on the table.

Add benchmarks for large packets

f6c6e96

Add benchmarks for 1000-byte packets in addition to the existing benchmarks for 100-byte packets.
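As a rough illustration of how size-parameterised benchmarks of this kind can be written, here is a sketch using only the standard testing package. The function fakeWrite is a hypothetical stand-in for the code under test, and benchmarkWrite and runSize are made-up helper names; the actual benchmarks in this commit may be structured differently.

```go
package main

import (
	"fmt"
	"testing"
)

// fakeWrite is a hypothetical stand-in for the function being measured.
func fakeWrite(dst, payload []byte) int {
	return copy(dst, payload)
}

// benchmarkWrite measures fakeWrite for a single payload size.
func benchmarkWrite(b *testing.B, size int) {
	payload := make([]byte, size)
	dst := make([]byte, size)
	b.SetBytes(int64(size)) // lets the framework report throughput (MB/s)
	b.ReportAllocs()        // lets the framework report B/op and allocs/op
	b.ResetTimer()          // exclude the setup above from the timing
	for i := 0; i < b.N; i++ {
		fakeWrite(dst, payload)
	}
}

// runSize runs the benchmark for one size outside of "go test".
func runSize(size int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) { benchmarkWrite(b, size) })
}

func main() {
	for _, size := range []int{100, 1000} {
		r := runSize(size)
		fmt.Printf("size=%d: %s %s\n", size, r.String(), r.MemString())
	}
}
```

In a real test file the same helper would normally be wrapped in functions named BenchmarkXxx and run with go test -bench, with the old and new results compared by a tool such as benchstat.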

jech mentioned this pull request May 13, 2022

Avoid growbuffer #77

Closed

jech force-pushed the avoid-growbuffer-2 branch from 55d292c to 51d420f Compare May 13, 2022 18:58

jech mentioned this pull request May 13, 2022

Pion issues related to Galène jech/galene#14

Open

19 tasks

jech force-pushed the avoid-growbuffer-2 branch 4 times, most recently from 437cb98 to b35932d Compare May 14, 2022 00:56

Avoid buffer allocation in Write

a6d09d2

Use a sync.Pool for buffers used by (*SessionRTP).Write and friends. The pool is global, so its cost can be amortised across multiple sessions.
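A global buffer pool of the kind described here might look roughly like the sketch below. It is not the code from this commit: bufferPool, protect, captureConn and writePacket are hypothetical names, the 1500-byte buffer size is an assumption, and the XOR in protect merely stands in for real SRTP encryption.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// bufferPool is shared by all sessions. It stores *[]byte rather than []byte
// so that putting a buffer back does not allocate a new slice header.
var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 1500) // assumed size, roughly one MTU
		return &b
	},
}

// protect is a hypothetical stand-in for encryption: it writes a transformed
// copy of plaintext into dst's storage and returns the resulting slice.
func protect(dst, plaintext []byte) []byte {
	dst = append(dst[:0], plaintext...)
	for i := range dst {
		dst[i] ^= 0x5a
	}
	return dst
}

// captureConn is a hypothetical next-hop writer. It copies what it needs,
// so it never retains the caller's buffer.
type captureConn struct {
	last []byte
}

func (c *captureConn) Write(p []byte) (int, error) {
	c.last = append(c.last[:0], p...)
	return len(p), nil
}

// writePacket encrypts into a pooled buffer, hands it to next, and returns
// the buffer to the pool. This is only safe if next does not retain the slice.
func writePacket(next io.Writer, plaintext []byte) (int, error) {
	bufp := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(bufp)

	// If plaintext exceeds the pooled capacity, append inside protect
	// allocates a larger slice and the pooled buffer is left untouched.
	encrypted := protect(*bufp, plaintext)
	return next.Write(encrypted)
}

func main() {
	conn := &captureConn{}
	n, err := writePacket(conn, []byte("hello"))
	fmt.Println(n, err, len(conn.last))
}
```

Because the pool is a package-level variable, buffers released by one session can be picked up by another, which is what allows the cost to be amortised across sessions.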

jech force-pushed the avoid-growbuffer-2 branch from b35932d to a6d09d2 Compare May 14, 2022 00:59

jech mentioned this pull request May 16, 2022

Fix testRTPReplayProtection and testRTPLifecyleInPlace for GCM #194

Closed

TheBeachMaster mentioned this pull request May 17, 2022

SRTP optimizations to improve CPU utilization livekit/livekit#129

Closed

adriancable approved these changes May 19, 2022

Merge branch 'master' into avoid-growbuffer-2

704fd0a

adriancable merged commit 7f52ce5 into pion:master May 19, 2022
