(feat) add pooled buf allocator #161

Merged: 14 commits, May 22, 2019

fengjiachun (Contributor) commented May 14, 2019

Motivation:

Predict the capacity to allocate for a buffer so that it does not need to be expanded repeatedly.

Modification:

Add AdaptiveBufAllocator

The allocation size table:
[16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 268435456, 536870912, 1073741824]

The code comes from https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/AdaptiveRecvByteBufAllocator.java
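The size table above follows a simple rule: 16-byte steps below 512, then doubling up to 2^30. A minimal sketch of how such a table and a size predictor on top of it could look is shown here. It is modelled on the linked Netty class, but the class name `AdaptiveSizeSketch`, the step constants, and the method names are assumptions for illustration, not the code added by this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and constants are assumptions. */
final class AdaptiveSizeSketch {
    static final int INDEX_INCREMENT = 4; // grow quickly when the guess was too small
    static final int INDEX_DECREMENT = 1; // shrink slowly when the guess was too large
    static final int[] SIZE_TABLE = buildSizeTable();

    private int index;           // position of the current guess in SIZE_TABLE
    private boolean decreaseNow; // set after one "too large" observation

    AdaptiveSizeSketch(final int initialSize) {
        this.index = indexFor(initialSize);
    }

    /** Builds the table: 16, 32, ..., 496, then 512, 1024, ..., 1073741824. */
    static int[] buildSizeTable() {
        final List<Integer> sizes = new ArrayList<>();
        for (int i = 16; i < 512; i += 16) {
            sizes.add(i);
        }
        // The loop ends when the left shift overflows int and turns negative.
        for (int i = 512; i > 0; i <<= 1) {
            sizes.add(i);
        }
        final int[] table = new int[sizes.size()];
        for (int i = 0; i < table.length; i++) {
            table[i] = sizes.get(i);
        }
        return table;
    }

    /** Index of the smallest table entry >= size, clamped to the last entry. */
    static int indexFor(final int size) {
        int low = 0;
        int high = SIZE_TABLE.length - 1;
        while (low < high) {
            final int mid = (low + high) >>> 1;
            if (SIZE_TABLE[mid] < size) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }

    /** The capacity to allocate next. */
    int guess() {
        return SIZE_TABLE[this.index];
    }

    /** Feeds back the number of bytes that were actually needed. */
    void record(final int actualBytes) {
        if (actualBytes <= SIZE_TABLE[Math.max(0, this.index - INDEX_DECREMENT)]) {
            if (this.decreaseNow) {
                // Second consecutive small observation: step down.
                this.index = Math.max(this.index - INDEX_DECREMENT, 0);
                this.decreaseNow = false;
            } else {
                this.decreaseNow = true;
            }
        } else if (actualBytes >= SIZE_TABLE[this.index]) {
            // The guess was filled completely: step up several entries at once.
            this.index = Math.min(this.index + INDEX_INCREMENT, SIZE_TABLE.length - 1);
            this.decreaseNow = false;
        }
    }
}
```

In this sketch a caller would invoke `guess()` before allocating and `record(actualBytes)` afterwards. The asymmetric steps mean a single small payload does not shrink the guess immediately.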

Result:

Fixes #158

fengjiachun (Contributor, Author) commented:

TODO: add benchmark

* (fix) add metric for recyclers

* (fix) add metric for ByteBufferCollector.capacity

* (fix) code format

* (fix) by review comment
fengjiachun changed the title from "(feat) add AdaptiveBufAllocator" to "(DONT_MERGE) add AdaptiveBufAllocator" on May 16, 2019
fengjiachun (Contributor, Author) commented:

Please don't merge this yet; I have thought of a better approach.

* (fix) zero copy with replicator

* (fix) support zero copy and add benchmark

* (fix) rename field

* (fix) rm zero copy and unnecessary metric
fengjiachun changed the title from "(DONT_MERGE) add AdaptiveBufAllocator" to "(feat) add pooled buf allocator" on May 17, 2019
fengjiachun (Contributor, Author) commented May 18, 2019

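The final approach adopted after testing:

  1. ByteBufferCollector is pooled on top of Recyclers. ByteBufferCollector gains a new allocateByRecyclers interface; an instance obtained from alloc must be released by calling recycle.
  2. The Recyclers code is forked from Netty. It is based on ThreadLocal, but recycle may be called from other threads (the object is returned to the space of the thread that allocated it). It is thread-safe and does not leak memory; from the point of view of a single thread it is equivalent to an MPSC queue. It is used widely and maturely in the Netty code base, so it should not cause problems.
  3. A new AdaptiveBufAllocator was added to predict the alloc size, but it is not applied in this scenario because the size can be computed in advance during log replication. AdaptiveBufAllocator was not removed, though, since it may be applied to other code in the future, for example snapshot replication.
  4. ZeroByteStringHelper gains a concatenate API for zero-copy ByteString. It is not applied this time either, and was kept because it will quite possibly be used in other scenarios later.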
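The pooling idea in points 1 and 2 (allocate from a per-thread pool, recycle from any thread, and hand the object back to the allocating thread) can be sketched as follows. This is a deliberately simplified illustration, not the forked Recyclers implementation: `PooledBufferSketch`, `Pool`, `PooledBuffer`, and `giveBack` are hypothetical names, the cross-thread queue is unbounded, and there is no guard against recycling the same object twice.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Simplified thread-local buffer pool; all names here are hypothetical. */
final class PooledBufferSketch {
    // Mirrors the default of 256 quoted in this PR; an assumption for the sketch.
    static final int MAX_POOLED_PER_THREAD = 256;

    /** A buffer that remembers which pool it came from. */
    static final class PooledBuffer {
        final ByteBuffer buffer;
        private final Pool owner;

        PooledBuffer(final Pool owner, final ByteBuffer buffer) {
            this.owner = owner;
            this.buffer = buffer;
        }

        /** May be called from any thread; the buffer goes back to its owner's pool. */
        void recycle() {
            this.owner.giveBack(this);
        }
    }

    /** One pool per allocating thread. */
    static final class Pool {
        private final Thread ownerThread = Thread.currentThread();
        // Touched only by the owner thread, so no synchronization is needed.
        private final ArrayDeque<PooledBuffer> local = new ArrayDeque<>();
        // Many producers (other threads), one consumer (the owner thread).
        private final Queue<PooledBuffer> fromOtherThreads = new ConcurrentLinkedQueue<>();

        PooledBuffer allocate(final int minCapacity) {
            PooledBuffer pooled = this.local.poll();
            if (pooled == null) {
                pooled = this.fromOtherThreads.poll();
            }
            if (pooled == null || pooled.buffer.capacity() < minCapacity) {
                // Nothing suitable in the pool: fall back to a fresh allocation.
                return new PooledBuffer(this, ByteBuffer.allocate(minCapacity));
            }
            pooled.buffer.clear();
            return pooled;
        }

        void giveBack(final PooledBuffer pooled) {
            if (Thread.currentThread() == this.ownerThread) {
                if (this.local.size() < MAX_POOLED_PER_THREAD) {
                    this.local.push(pooled);
                }
                // Otherwise drop it and let the GC reclaim the buffer.
            } else {
                this.fromOtherThreads.offer(pooled);
            }
        }
    }

    private static final ThreadLocal<Pool> POOLS = ThreadLocal.withInitial(Pool::new);

    /** Allocates from the calling thread's pool. */
    static PooledBuffer allocate(final int minCapacity) {
        return POOLS.get().allocate(minCapacity);
    }
}
```

The important contract is the one stated in point 1: every instance obtained from `allocate` has to be handed back with `recycle()`, otherwise the pool never gets anything to reuse.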

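Benchmark data follows. Each operation replicates a batch of 256 log entries; the size of a single entry differs from run to run, and the runs covered 2048, 1024, 512, 256, 128, 64, and 16 bytes.

  • adaptiveAndPooled: the pooled approach based on AdaptiveBufAllocator
  • copy: the existing code
  • pooled: the pooled approach whose first allocation defaults to 1024 bytes
  • zeroCopy: the zero-copy approach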

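The zeroCopy approach is based on protobuf's RopeByteString; for the principle behind it, see the paper https://web.archive.org/web/20060202015456/http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

As the numbers show, zeroCopy only starts to approach pooled once the total data volume reaches 64K (256 * 256). Part of the reason is that RopeByteString is a tree structure that has to be balanced, which carries some overhead. I tried changing it to a list structure; in our scenario performance improved but still could not catch up with pooled, so that data is not listed here. In the end the pooled approach was chosen.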
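To illustrate what a list-structured concatenation means here, the following is a minimal sketch of appending byte segments without copying them. It is not protobuf's RopeByteString and not the list variant that was benchmarked; `SegmentListSketch` and its methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list-backed byte sequence: appends keep references, no copying. */
final class SegmentListSketch {
    private final List<byte[]> segments = new ArrayList<>();
    private int size;

    /** Appends a segment by reference; the caller must not modify it afterwards. */
    void append(final byte[] segment) {
        this.segments.add(segment);
        this.size += segment.length;
    }

    int size() {
        return this.size;
    }

    /** Random access has to walk the segment list, unlike a flat array. */
    byte byteAt(final int index) {
        if (index < 0 || index >= this.size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int remaining = index;
        for (final byte[] segment : this.segments) {
            if (remaining < segment.length) {
                return segment[remaining];
            }
            remaining -= segment.length;
        }
        throw new AssertionError("unreachable: size is out of sync with segments");
    }

    /** Flattening is the only place where bytes are actually copied. */
    byte[] toByteArray() {
        final byte[] out = new byte[this.size];
        int offset = 0;
        for (final byte[] segment : this.segments) {
            System.arraycopy(segment, 0, out, offset, segment.length);
            offset += segment.length;
        }
        return out;
    }
}
```

The trade-off is that appends become cheap while reads and the final flattening pay for the indirection. That is consistent with the observation above that zero copy only pays off for larger payloads.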

* entryCount=256, sizeOfEntry=2048
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt  Score   Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  4.139 ± 2.662  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3  0.148 ± 0.027  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  3.730 ± 0.355  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  3.069 ± 3.563  ops/ms
 *
 *
 * entryCount=256, sizeOfEntry=1024
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt  Score   Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  8.290 ± 5.438  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3  0.326 ± 0.137  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  7.559 ± 1.245  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  6.602 ± 0.859  ops/ms
 *
 * entryCount=256, sizeOfEntry=512
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt   Score   Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  14.358 ± 8.622  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3   1.625 ± 0.058  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  15.332 ± 1.531  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  12.614 ± 5.904  ops/ms
 *
 * entryCount=256, sizeOfEntry=256
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt   Score    Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  32.506 ± 21.961  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3   6.595 ±  5.772  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  27.847 ± 14.010  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  26.427 ±  5.187  ops/ms
 *
 * entryCount=256, sizeOfEntry=128
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt   Score    Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  60.014 ± 47.206  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3  22.884 ±  3.286  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  57.373 ±  8.201  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  43.923 ±  7.133  ops/ms
 *
 * entryCount=256, sizeOfEntry=64
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt    Score    Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  114.016 ± 84.874  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3   71.699 ± 19.016  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  107.714 ±  7.944  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3   71.767 ± 14.510  ops/ms
 *
 * entryCount=256, sizeOfEntry=16
 * ---------------------------------------------------------------------------
 * Benchmark                                  Mode  Cnt    Score     Error   Units
 * AppendEntriesBenchmark.adaptiveAndPooled  thrpt    3  285.386 ± 114.361  ops/ms
 * AppendEntriesBenchmark.copy               thrpt    3  243.805 ±  31.725  ops/ms
 * AppendEntriesBenchmark.pooled             thrpt    3  293.779 ±  76.557  ops/ms
 * AppendEntriesBenchmark.zeroCopy           thrpt    3  124.669 ±  32.460  ops/ms
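
To make the tables easier to compare, the ratio of the pooled score to the copy score can be computed per entry size. The sketch below only divides the Score columns quoted above; the class name `BenchmarkSpeedupSketch` is an assumption for illustration. With Cnt = 3 and wide error margins, the ratios should be read as rough indications only.

```java
/** Computes pooled/copy throughput ratios from the Score columns above. */
final class BenchmarkSpeedupSketch {
    // Columns: sizeOfEntry (bytes), copy score (ops/ms), pooled score (ops/ms).
    static final double[][] ROWS = {
        { 2048, 0.148, 3.730 },
        { 1024, 0.326, 7.559 },
        { 512, 1.625, 15.332 },
        { 256, 6.595, 27.847 },
        { 128, 22.884, 57.373 },
        { 64, 71.699, 107.714 },
        { 16, 243.805, 293.779 },
    };

    /** How many times faster pooled is than copy for one row. */
    static double speedup(final double copyScore, final double pooledScore) {
        return pooledScore / copyScore;
    }

    public static void main(final String[] args) {
        for (final double[] row : ROWS) {
            System.out.printf("sizeOfEntry=%4d  pooled/copy = %.1fx%n",
                (int) row[0], speedup(row[1], row[2]));
        }
    }
}
```

The ratio shrinks steadily as entries get smaller: roughly 25x at 2048 bytes, and about 1.2x at 16 bytes.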

- final ByteBufferCollector dateBuffer) {
- if (dateBuffer.capacity() >= this.raftOptions.getMaxBodySize()) {
+ final RecyclableByteBufferList dateBuffer) {
+ if (dateBuffer.getByteNumber() >= this.raftOptions.getMaxBodySize()) {
Review comment (Contributor):

The name capacity is still the more appropriate one here.

* -Djraft.max_collector_size_pre_thread, default 256
*/
public static final int MAX_COLLECTOR_SIZE_PRE_THREAD = Integer.parseInt(System.getProperty(
"jraft.max_collector_size_pre_thread", "256"));
Review comment (Contributor):

typo? pre_thread


@Override
public boolean recycle() {
// TODO If the size is too large, it should not be reused?
Review comment (Contributor):

Let's hard-code a size limit for now, for example 4 MB or so.
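
The suggestion in this comment could look roughly like the sketch below: on recycle, a backing buffer larger than a fixed limit is dropped instead of being kept for reuse, so one unusually large request does not pin memory in the pool. The class `CappedRecycleSketch`, the constant name, and the return-value convention are assumptions for illustration; this is not presented as the code that was merged.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder that refuses to keep oversized buffers when recycled. */
final class CappedRecycleSketch {
    // 4 MB, the figure suggested in the review comment.
    static final int MAX_CAPACITY_TO_RECYCLE = 4 * 1024 * 1024;

    private ByteBuffer buffer;

    CappedRecycleSketch(final int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    ByteBuffer buffer() {
        return this.buffer;
    }

    /**
     * Prepares the holder for reuse.
     *
     * @return true if the backing buffer was kept, false if it was dropped
     */
    boolean recycle() {
        if (this.buffer == null) {
            return false;
        }
        if (this.buffer.capacity() > MAX_CAPACITY_TO_RECYCLE) {
            // Too large to keep around: release the reference so the GC can reclaim it.
            this.buffer = null;
            return false;
        }
        // Small enough: reset position and limit so the buffer can be reused.
        this.buffer.clear();
        return true;
    }
}
```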


private static final int DEFAULT_INITIAL_CAPACITY = 8;

private int byteNumber = 0;
Review comment (Contributor):

Rename this to capacity.

- if (DEFAULT_MAX_CAPACITY == 0) {
- LOG.debug("-Drhea.recyclers.maxCapacity.default: disabled");
+ if (DEFAULT_MAX_CAPACITY_PER_THREAD == 0) {
+ LOG.debug("-Djraft.recyclers.maxCapacityPerThread: disabled");
Review comment (Contributor):

This change may need to go into the release notes.

"1024"));

/**
* Default max {@link ByteBufferCollector} size pre thread for recycle, it can be set by
Review comment (Contributor):

typo, pre thread

killme2008 (Contributor) commented:

@fengjiachun I took a look and there are no major problems, but the newly added classes all need unit tests.

fengjiachun (Contributor, Author) commented:

@killme2008 OK, I will add the unit tests as soon as possible.

killme2008 merged commit ba43a16 into master on May 22, 2019
killme2008 deleted the feat/add_adaptive_allocator branch on May 22, 2019 09:39
fengjiachun added a commit that referenced this pull request May 24, 2019
* (feat) add AdaptiveBufAllocator

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (fix) rename method name

* (fix) minor fix

* (fix) add metric for recyclers (#164)

* (fix) add metric for recyclers

* (fix) add metric for ByteBufferCollector.capacity

* (fix) code format

* (fix) by review comment

* feat/zero copy with replicator (#167)

* (fix) zero copy with replicator

* (fix) support zero copy and add benchmark

* (fix) rename field

* (fix) rm zero copy and unnecessary metric

* (fix) by review comment

* (feat) add unit test AdaptiveBufAllocatorTest

* (feat) add unit test RecyclersTest

* (feat) add unit test RecyclableByteBufferListTest

* (feat) add unit test ByteBufferCollectorTest
killme2008 pushed a commit that referenced this pull request Jul 5, 2019
* (feat) add FixedThreadsExecutorGroup #168

* (feat) rename method

* (feat) add MpscSingleThreadExecutor and benchmark #168

* (fix) forget to warmup producers

* (fix) fix some bugs and add unit test

* (fix) add more unit test

* (fix) add more unit test

* (fix) add more unit test

* (fix) add some comments

* (fix) unit test

* (fix) add some comments

* (fix) refactoring Utils class

* (fix) refactoring Utils class

* (fix) jraft.closure.threadpool.size.max update default value

* (fix) fix unit test

* (fix) fix unit test

* (feat) refactor ThreadId and replicator (#169)

* (feat) refactor ThreadId and replicator

* (feat) Adds javadoc

* (feat) add pooled buf allocator (#161)

* (feat) add AdaptiveBufAllocator

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (fix) rename method name

* (fix) minor fix

* (fix) add metric for recyclers (#164)

* (fix) add metric for recyclers

* (fix) add metric for ByteBufferCollector.capacity

* (fix) code format

* (fix) by review comment

* feat/zero copy with replicator (#167)

* (fix) zero copy with replicator

* (fix) support zero copy and add benchmark

* (fix) rename field

* (fix) rm zero copy and unnecessary metric

* (fix) by review comment

* (feat) add unit test AdaptiveBufAllocatorTest

* (feat) add unit test RecyclersTest

* (feat) add unit test RecyclableByteBufferListTest

* (feat) add unit test ByteBufferCollectorTest

* Add unit tests for com.alipay.sofa.jraft.util.BytesUtil (#166)

These tests were written using Diffblue Cover.

* (fix) Utils.java format

* (feat) add FixedThreadsExecutorGroup #168

* (feat) rename method

* (feat) add MpscSingleThreadExecutor and benchmark #168

* (fix) forget to warmup producers

* (fix) fix some bugs and add unit test

* (fix) add more unit test

* (fix) add more unit test

* (fix) add more unit test

* (fix) add some comments

* (fix) unit test

* (fix) add some comments

* (fix) refactoring Utils class

* (fix) refactoring Utils class

* (fix) jraft.closure.threadpool.size.max update default value

* (fix) fix unit test

* (fix) fix unit test

* (feat) add pooled buf allocator (#161)

* (feat) add AdaptiveBufAllocator

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (feat) pooled for ByteBufferCollector #158

* (fix) rename method name

* (fix) minor fix

* (fix) add metric for recyclers (#164)

* (fix) add metric for recyclers

* (fix) add metric for ByteBufferCollector.capacity

* (fix) code format

* (fix) by review comment

* feat/zero copy with replicator (#167)

* (fix) zero copy with replicator

* (fix) support zero copy and add benchmark

* (fix) rename field

* (fix) rm zero copy and unnecessary metric

* (fix) by review comment

* (feat) add unit test AdaptiveBufAllocatorTest

* (feat) add unit test RecyclersTest

* (feat) add unit test RecyclableByteBufferListTest

* (feat) add unit test ByteBufferCollectorTest

* (fix) Utils.java format

* (fix) fix bad key with executor map

* (fix) bad import

* (fix) fix unit test

* (feat) add mor benchmark

* (fix) code format

* (fix) code format

* (fix) benchmark with jmh

* (fix) benchmark with jmh

* (fix) set common daemon

* (fix) fix unit test

* (fix) should be no radical changes, especially if they are not fully tested.

* (feat) add jctools

* (feat) configure the number of processors #180 (#181)

* (fix) format
@fengjiachun fengjiachun mentioned this pull request Aug 15, 2019
Successfully merging this pull request may close these issues.

Creating large numbers of byte[] triggers frequent GC; hoping this can be optimized