
test(bench): introduce bench tool for the file cache system #3889

Merged
merged 16 commits into main from xx/bench-file-cache on Jul 19, 2022

Conversation

@MrCroxx (Contributor) commented on Jul 14, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

As titled, this PR introduces a bench tool for the file cache system.

Quick start:

cargo run --release --package risingwave_bench --bin file-cache-bench -- -p /<path>/<to>/<file>/<cache>

Usage:

USAGE:
    file-cache-bench [OPTIONS] --path <PATH>

OPTIONS:
        --bs <BS>                                                  [default: 1048576]
        --cache-file-fallocate-unit <CACHE_FILE_FALLOCATE_UNIT>    [default: 67108864]
        --capacity <CAPACITY>                                      [default: 1073741824]
        --concurrency <CONCURRENCY>                                [default: 8]
    -h, --help                                                     Print help information
        --look-up-range <LOOK_UP_RANGE>                            [default: 10000]
        --loop-min-interval <LOOP_MIN_INTERVAL>                    [default: 0]
    -p, --path <PATH>
        --read <READ>                                              [default: 1]
        --report-interval <REPORT_INTERVAL>                        [default: 1]
        --time <TIME>                                              [default: 600]
        --total-buffer-capacity <TOTAL_BUFFER_CAPACITY>            [default: 134217728]
        --write <WRITE>                                            [default: 1]
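
For reference, a minimal sketch of how options like these could be declared with clap's derive API. The flag names mirror the help output above, but the struct, field types, and subset shown are assumptions for illustration, not the actual bench code:

use clap::Parser;

/// Hypothetical mirror of a subset of the options above.
#[derive(Parser, Debug)]
struct Args {
    /// Directory that backs the file cache.
    #[clap(short, long)]
    path: String,
    /// Block size of each I/O, in bytes.
    #[clap(long, default_value = "1048576")]
    bs: usize,
    /// Total cache capacity, in bytes.
    #[clap(long, default_value = "1073741824")]
    capacity: usize,
    /// Number of concurrent benchmark loops.
    #[clap(long, default_value = "8")]
    concurrency: usize,
    /// Benchmark duration, in seconds.
    #[clap(long, default_value = "600")]
    time: u64,
}

fn main() {
    let args = Args::parse();
    println!("{:?}", args);
}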

Output example:

Total:
disk total iops: 1459.7
disk total throughput: 361.2 MiB/s
disk read iops: 1344.6
disk read throughput: 334.8 MiB/s
disk write iops: 115.1
disk write throughput: 26.4 MiB/s
insert iops: 26.4/s
insert throughput: 26.4 MiB/s
insert lat p50: 4us
insert lat p90: 6us
insert lat p99: 10us
get iops: 343.1/s
get miss: 0.00%
get hit lat p50: 1527us
get hit lat p90: 4159us
get hit lat p99: 5791us
get miss lat p50: 0us
get miss lat p90: 0us
get miss lat p99: 0us
flush iops: 20.3/s
flush throughput: 26.4 MiB/s

To monitor:

  • Disk (partition) I/O statistics for the partition where the cache directory is located (see the sketch below).
  • CRUD critical-path latency.
  • Foreground operation statistics.
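
For the disk statistics, here is a minimal sketch of the kind of sampling involved, assuming the numbers are derived from /proc/diskstats deltas over the report interval (an assumption; the device name below is hypothetical):

use std::{fs, thread, time::Duration};

/// Returns (reads completed, sectors read, writes completed, sectors written) for a device.
fn disk_counters(dev: &str) -> Option<(u64, u64, u64, u64)> {
    let stats = fs::read_to_string("/proc/diskstats").ok()?;
    for line in stats.lines() {
        let f: Vec<&str> = line.split_whitespace().collect();
        if f.len() > 9 && f[2] == dev {
            return Some((
                f[3].parse().ok()?, // reads completed
                f[5].parse().ok()?, // sectors read (512 B each)
                f[7].parse().ok()?, // writes completed
                f[9].parse().ok()?, // sectors written (512 B each)
            ));
        }
    }
    None
}

fn main() {
    let dev = "nvme0n1"; // hypothetical: the disk backing the cache dir
    let (r0, sr0, w0, sw0) = disk_counters(dev).unwrap();
    thread::sleep(Duration::from_secs(1)); // report interval
    let (r1, sr1, w1, sw1) = disk_counters(dev).unwrap();
    println!("disk read iops: {}", r1 - r0);
    println!("disk read throughput: {:.1} MiB/s", (sr1 - sr0) as f64 * 512.0 / 1048576.0);
    println!("disk write iops: {}", w1 - w0);
    println!("disk write throughput: {:.1} MiB/s", (sw1 - sw0) as f64 * 512.0 / 1048576.0);
}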

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#198
#3556

@MrCroxx MrCroxx self-assigned this Jul 14, 2022
@github-actions bot added the component/test (Test related issue.) label on Jul 14, 2022
@github-actions bot left a comment

license-eye has checked 937 files in total.

Valid: 933, Invalid: 3, Ignored: 1, Fixed: 0
Invalid files:
  • src/bench/file_cache_bench/bench.rs
  • src/bench/file_cache_bench/main.rs
  • src/bench/file_cache_bench/utils.rs

Review threads (resolved):
  • src/bench/file_cache_bench/bench.rs (outdated)
  • src/bench/file_cache_bench/main.rs
  • src/bench/file_cache_bench/utils.rs
@codecov bot commented on Jul 14, 2022

Codecov Report

Merging #3889 (33437bb) into main (66e37d8) will decrease coverage by 0.28%.
The diff coverage is 1.08%.

@@            Coverage Diff             @@
##             main    #3889      +/-   ##
==========================================
- Coverage   73.88%   73.60%   -0.29%     
==========================================
  Files         828      832       +4     
  Lines      117044   117500     +456     
==========================================
+ Hits        86481    86487       +6     
- Misses      30563    31013     +450     
Flag Coverage Δ
rust 73.60% <1.08%> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/bench/file_cache_bench/analyze.rs 0.00% <0.00%> (ø)
src/bench/file_cache_bench/bench.rs 0.00% <0.00%> (ø)
src/bench/file_cache_bench/utils.rs 0.00% <0.00%> (ø)
src/bench/file_cache_bench/main.rs 11.11% <11.11%> (ø)
src/storage/src/hummock/file_cache/cache.rs 92.76% <75.00%> (+0.06%) ⬆️
src/storage/src/hummock/file_cache/test_utils.rs 87.35% <100.00%> (ø)
src/meta/src/manager/id.rs 94.94% <0.00%> (-0.57%) ⬇️
src/frontend/src/expr/mod.rs 81.95% <0.00%> (-0.31%) ⬇️
src/common/src/types/ordered_float.rs 29.10% <0.00%> (+0.19%) ⬆️
...c/frontend/src/optimizer/plan_node/logical_scan.rs 88.57% <0.00%> (+1.11%) ⬆️


@MrCroxx MrCroxx mentioned this pull request Jul 14, 2022
@github-actions bot left a comment

license-eye has checked 942 files in total.

Valid: 940, Invalid: 1, Ignored: 1, Fixed: 0
Invalid files:
  • src/bench/file_cache_bench/analyze.rs

Review thread (resolved): src/bench/file_cache_bench/analyze.rs
@MrCroxx MrCroxx marked this pull request as ready for review July 18, 2022 07:59
@MrCroxx MrCroxx requested review from wenym1 and hzxa21 July 18, 2022 08:00
@MrCroxx (Contributor, Author) commented on Jul 18, 2022

Calculating quantiles seems to suffer from a performance issue after long runs. Investigating the cause.

UPDATE: Fixed by switching to another quantile library.
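
The replacement library isn't named in this comment. As a hedged illustration of why a bucketed streaming histogram stays cheap over long runs (quantile queries cost is bounded by the bucket count, not the sample count), here is a sketch using the hdrhistogram crate as one possible choice:

use hdrhistogram::Histogram;

fn main() {
    // Track latencies in microseconds with 3 significant digits, up to 60 s.
    let mut hist = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3).unwrap();

    // Record samples as operations complete (values here are made up).
    for lat_us in [4u64, 6, 10, 1527, 4159, 5791] {
        hist.record(lat_us).unwrap();
    }

    // Quantile queries scale with the bucket count, not the number of samples.
    println!("p50: {}us", hist.value_at_quantile(0.50));
    println!("p90: {}us", hist.value_at_quantile(0.90));
    println!("p99: {}us", hist.value_at_quantile(0.99));
}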

@MrCroxx (Contributor, Author) commented on Jul 18, 2022

The get hit latency looks high. I'll run fio to investigate.

@MrCroxx (Contributor, Author) commented on Jul 18, 2022

fio results:

fio --name=read --directory=. --numjobs=1 --size=5G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=1M --iodepth=1 --rw=randread --group_reporting=1

read: (groupid=0, jobs=1): err= 0: pid=41020: Mon Jul 18 09:55:53 2022
  read: IOPS=378, BW=379MiB/s (397MB/s)(22.2GiB/60002msec)
    slat (usec): min=34, max=109, avg=56.19, stdev=10.45
    clat (usec): min=695, max=20243, avg=2578.14, stdev=268.01
     lat (usec): min=734, max=20333, avg=2634.67, stdev=267.41
    clat percentiles (usec):
     |  1.00th=[ 1827],  5.00th=[ 2245], 10.00th=[ 2376], 20.00th=[ 2474],
     | 30.00th=[ 2507], 40.00th=[ 2540], 50.00th=[ 2573], 60.00th=[ 2606],
     | 70.00th=[ 2638], 80.00th=[ 2671], 90.00th=[ 2769], 95.00th=[ 2900],
     | 99.00th=[ 3392], 99.50th=[ 3589], 99.90th=[ 4178], 99.95th=[ 4686],
     | 99.99th=[ 5932]
   bw (  KiB/s): min=385795, max=389120, per=100.00%, avg=387777.72, stdev=866.31, samples=120
   iops        : min=  376, max=  380, avg=378.50, stdev= 0.91, samples=120
  lat (usec)   : 750=0.03%, 1000=0.04%
  lat (msec)   : 2=1.83%, 4=97.94%, 10=0.16%, 50=0.01%
  cpu          : usr=0.62%, sys=3.04%, ctx=22714, majf=0, minf=58
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22711,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=379MiB/s (397MB/s), 379MiB/s-379MiB/s (397MB/s-397MB/s), io=22.2GiB (23.8GB), run=60002-60002msec

Disk stats (read/write):
  nvme0n1: ios=95380/41, merge=0/20, ticks=155245/80, in_queue=155326, util=99.89%
fio --name=read --directory=. --numjobs=2 --size=5G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=1M --iodepth=1 --rw=randread --group_reporting=1

read: (groupid=0, jobs=2): err= 0: pid=41056: Mon Jul 18 09:57:29 2022
  read: IOPS=378, BW=379MiB/s (397MB/s)(22.2GiB/60005msec)
    slat (usec): min=32, max=111, avg=57.27, stdev= 8.41
    clat (usec): min=752, max=76481, avg=5218.48, stdev=772.94
     lat (usec): min=788, max=76542, avg=5276.13, stdev=772.70
    clat percentiles (usec):
     |  1.00th=[ 4080],  5.00th=[ 4621], 10.00th=[ 4883], 20.00th=[ 5080],
     | 30.00th=[ 5145], 40.00th=[ 5211], 50.00th=[ 5211], 60.00th=[ 5276],
     | 70.00th=[ 5276], 80.00th=[ 5407], 90.00th=[ 5538], 95.00th=[ 5800],
     | 99.00th=[ 6259], 99.50th=[ 6521], 99.90th=[ 7046], 99.95th=[ 7439],
     | 99.99th=[15533]
   bw (  KiB/s): min=352256, max=422314, per=100.00%, avg=387838.55, stdev=2556.02, samples=239
   iops        : min=  344, max=  412, avg=378.63, stdev= 2.49, samples=239
  lat (usec)   : 1000=0.04%
  lat (msec)   : 2=0.11%, 4=0.68%, 10=99.15%, 20=0.01%, 100=0.01%
  cpu          : usr=0.37%, sys=1.55%, ctx=22717, majf=0, minf=116
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22712,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=379MiB/s (397MB/s), 379MiB/s-379MiB/s (397MB/s-397MB/s), io=22.2GiB (23.8GB), run=60005-60005msec

Disk stats (read/write):
  nvme0n1: ios=95386/14, merge=0/8, ticks=406737/36, in_queue=406773, util=99.91%
fio --name=read --directory=. --numjobs=4 --size=5G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=1M --iodepth=1 --rw=randread --group_reporting=1

read: (groupid=0, jobs=4): err= 0: pid=41093: Mon Jul 18 10:00:45 2022
  read: IOPS=378, BW=378MiB/s (397MB/s)(22.2GiB/60008msec)
    slat (usec): min=28, max=194, avg=59.14, stdev=10.35
    clat (usec): min=890, max=96193, avg=10499.62, stdev=1818.22
     lat (usec): min=939, max=96265, avg=10559.21, stdev=1818.58
    clat percentiles (usec):
     |  1.00th=[ 7570],  5.00th=[ 9241], 10.00th=[ 9896], 20.00th=[10290],
     | 30.00th=[10421], 40.00th=[10421], 50.00th=[10552], 60.00th=[10552],
     | 70.00th=[10683], 80.00th=[10814], 90.00th=[11076], 95.00th=[11600],
     | 99.00th=[13304], 99.50th=[13698], 99.90th=[29492], 99.95th=[49021],
     | 99.99th=[76022]
   bw (  KiB/s): min=329728, max=436224, per=100.00%, avg=387865.92, stdev=3544.72, samples=476
   iops        : min=  322, max=  426, avg=378.71, stdev= 3.46, samples=476
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.31%, 4=0.14%, 10=11.64%, 20=87.75%, 50=0.11%
  lat (msec)   : 100=0.04%
  cpu          : usr=0.21%, sys=0.78%, ctx=22718, majf=0, minf=233
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22712,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=378MiB/s (397MB/s), 378MiB/s-378MiB/s (397MB/s-397MB/s), io=22.2GiB (23.8GB), run=60008-60008msec

Disk stats (read/write):
  nvme0n1: ios=95395/11, merge=0/10, ticks=892291/72, in_queue=892363, util=99.92%
fio --name=read --directory=. --numjobs=8 --size=5G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=1M --iodepth=1 --rw=randread --group_reporting=1

read: (groupid=0, jobs=8): err= 0: pid=41147: Mon Jul 18 10:03:41 2022
  read: IOPS=378, BW=378MiB/s (397MB/s)(22.2GiB/60015msec)
    slat (usec): min=31, max=3443, avg=101.59, stdev=287.90
    clat (usec): min=2263, max=85737, avg=21024.46, stdev=4245.46
     lat (usec): min=2312, max=85815, avg=21126.54, stdev=4237.47
    clat percentiles (usec):
     |  1.00th=[ 8029],  5.00th=[13173], 10.00th=[15926], 20.00th=[20055],
     | 30.00th=[20841], 40.00th=[20841], 50.00th=[21103], 60.00th=[21103],
     | 70.00th=[21365], 80.00th=[21627], 90.00th=[25822], 95.00th=[28967],
     | 99.00th=[33817], 99.50th=[34341], 99.90th=[40633], 99.95th=[59507],
     | 99.99th=[81265]
   bw (  KiB/s): min=299008, max=411954, per=100.00%, avg=388033.03, stdev=3156.21, samples=952
   iops        : min=  292, max=  402, avg=378.88, stdev= 3.08, samples=952
  lat (msec)   : 4=0.13%, 10=1.24%, 20=18.24%, 50=80.32%, 100=0.07%
  cpu          : usr=0.12%, sys=0.37%, ctx=23223, majf=0, minf=469
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=22714,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=378MiB/s (397MB/s), 378MiB/s-378MiB/s (397MB/s-397MB/s), io=22.2GiB (23.8GB), run=60015-60015msec

Disk stats (read/write):
  nvme0n1: ios=95409/8, merge=0/9, ticks=1882672/98, in_queue=1882770, util=99.97%

@MrCroxx (Contributor, Author) commented on Jul 18, 2022

An observation: when the disk pressure is high, the latency is much higher, but because of the bandwidth limit the read throughput no longer increases. The latency can then be even higher than reading from S3.

FYI: ScyllaDB has characterized this disk behavior before. See: https://www.scylladb.com/2018/04/19/scylla-i-o-scheduler-3/

To solve the problem, IMO we can introduce an I/O scheduler:

  1. If the read workload overwhelms the disk's capability, redirect the read requests to S3 to reduce disk pressure.
  2. Control the I/O quota between put and get to reduce their mutual interference.

It's not all good news, though: we cannot tell in advance how heavy the read/write workload on the file cache system will be. Maybe we can simply introduce a threshold for I/Os.

What do you think? @hzxa21 @wenym1 @Little-Wallace

@MrCroxx (Contributor, Author) commented on Jul 19, 2022

With a limited rate, the latency can be much better:

--default

Total:
disk total iops: 1559.9
disk total throughput: 382.7 MiB/s
disk read iops: 264.8
disk read throughput: 66.1 MiB/s
disk write iops: 1295.1
disk write throughput: 316.6 MiB/s
insert iops: 13043.8/s
insert throughput: 12.7 GiB/s
insert lat p50: 5us
insert lat p90: 9us
insert lat p99: 45us
get iops: 13043.8/s
get miss: 98.34%
get hit lat p50: 2975us
get hit lat p90: 20479us
get hit lat p99: 132095us
get miss lat p50: 4us
get miss lat p90: 7us
get miss lat p99: 707us
flush iops: 5.0/s
flush throughput: 315.2 MiB/s

--loop-min-interval 150 --read 13 --write 1 --concurrency 4

Total:
disk total iops: 1459.7
disk total throughput: 361.2 MiB/s
disk read iops: 1344.6
disk read throughput: 334.8 MiB/s
disk write iops: 115.1
disk write throughput: 26.4 MiB/s
insert iops: 26.4/s
insert throughput: 26.4 MiB/s
insert lat p50: 4us
insert lat p90: 6us
insert lat p99: 10us
get iops: 343.1/s
get miss: 0.00%
get hit lat p50: 1527us
get hit lat p90: 4159us
get hit lat p99: 5791us
get miss lat p50: 0us
get miss lat p90: 0us
get miss lat p99: 0us
flush iops: 20.3/s
flush throughput: 26.4 MiB/s

So IMO the key points are (see the sketch below):

  1. Limit the I/O rate.
  2. Reads should have higher priority than writes; reads that would overwhelm the disk can be scheduled directly to S3.
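
A minimal sketch of one possible shape for such a scheduler, using per-class tokio semaphores as I/O quotas. The quota sizes, method names, and the S3 fallback hook are assumptions for illustration, not the design in this PR:

use std::{future::Future, sync::Arc};
use tokio::sync::Semaphore;

/// Hypothetical I/O scheduler: separate quotas for reads and writes, with
/// reads redirected to S3 when the disk read quota is exhausted.
struct IoScheduler {
    read_quota: Arc<Semaphore>,
    write_quota: Arc<Semaphore>,
}

impl IoScheduler {
    fn new(read_permits: usize, write_permits: usize) -> Self {
        Self {
            read_quota: Arc::new(Semaphore::new(read_permits)),
            write_quota: Arc::new(Semaphore::new(write_permits)),
        }
    }

    /// Reads never queue behind a saturated disk: if no permit is available,
    /// fall back to fetching from S3 instead.
    async fn read(&self, fetch_from_s3: impl Future<Output = Vec<u8>>) -> Vec<u8> {
        match self.read_quota.try_acquire() {
            Ok(_permit) => vec![0u8; 4096], // stub: read the block from the file cache on disk
            Err(_) => fetch_from_s3.await,  // disk saturated: redirect to S3
        }
    }

    /// Writes always wait for a permit, yielding disk bandwidth to reads.
    async fn write(&self, block: Vec<u8>) {
        let _permit = self.write_quota.acquire().await.unwrap();
        drop(block); // stub: write the block to the file cache on disk
    }
}

#[tokio::main]
async fn main() {
    let sched = IoScheduler::new(64, 16);
    let data = sched.read(async { vec![1u8; 4096] /* pretend S3 fetch */ }).await;
    sched.write(data).await;
}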

@MrCroxx MrCroxx added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jul 19, 2022
@mergify mergify bot merged commit 7a0b2ee into main Jul 19, 2022
@mergify mergify bot deleted the xx/bench-file-cache branch July 19, 2022 04:02
nasnoisaac pushed a commit to nasnoisaac/risingwave that referenced this pull request Aug 9, 2022
…velabs#3889)

* test(bench): introduce bench tool for the file cache system

* add foreground write and flush metrics

* rename metrics

* add more metrics

* fix license header

* fix file early close when benching and fix timeout

* remove unused args

* replace quantile lib

* replace f64 with u64 for lat

* separate get hit and miss lat

* add more options