
[QUESTION] Very disappointing performance, is this expected? #3037

Open
liyimeng opened this issue Sep 20, 2021 · 55 comments
Labels
area/performance System, volume performance area/v1-data-engine v1 data engine (iSCSI tgt) kind/question Please use `discussion` to ask questions instead
Comments

@liyimeng

Question

I ran a fio test with Longhorn 1.1.2 and the result is kind of disappointing. On the native SSD, 4K random write reaches IOPS=42.5k, BW=166MiB/s, while on Longhorn I only get about IOPS=6k, BW=16MiB/s.

Is this the expected Longhorn performance?

Environment:

  • Longhorn version: 1.1.2
  • Kubernetes version: 1.21
  • Node config
    • OS type and version: Ubuntu 20.04
    • CPU per node: 40
    • Memory per node: 512 GB
    • Disk type: SATA SSD
    • Network bandwidth and latency between the nodes: 10GbE x 2 with link aggregation, jumbo frames enabled
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): bare metal

Additional context

SSD test (not the raw disk; the disk is already formatted as ext4 and mounted)

fio -direct=1 -iodepth 16 -thread -rw=randwrite -ioengine=libaio -numjobs=16 -runtime=300 -group_reporting -name=4K-100Write100random -fsync=0 -bs=4k -end-fsync=1 -size=2g
4K-100Write100random: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.27
Starting 16 threads
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
4K-100Write100random: Laying out IO file (1 file / 2048MiB)
^Cbs: 16 (f=16): [w(16)][23.7%][w=171MiB/s][w=43.8k IOPS][eta 02m:31s]
fio: terminating on signal 2

4K-100Write100random: (groupid=0, jobs=16): err= 0: pid=79351: Mon Sep 20 13:07:50 2021
  write: IOPS=42.5k, BW=166MiB/s (174MB/s)(7817MiB/47039msec); 0 zone resets
    slat (usec): min=4, max=183526, avg=15.49, stdev=438.11
    clat (usec): min=140, max=286711, avg=6000.97, stdev=2266.19
     lat (usec): min=161, max=286722, avg=6016.56, stdev=2344.46
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    6], 10.00th=[    6], 20.00th=[    6],
     | 30.00th=[    6], 40.00th=[    6], 50.00th=[    6], 60.00th=[    6],
     | 70.00th=[    6], 80.00th=[    6], 90.00th=[    7], 95.00th=[    7],
     | 99.00th=[    7], 99.50th=[   11], 99.90th=[   28], 99.95th=[   38],
     | 99.99th=[  142]
   bw (  KiB/s): min=122128, max=179328, per=100.00%, avg=170262.37, stdev=611.52, samples=1504
   iops        : min=30532, max=44832, avg=42565.56, stdev=152.88, samples=1504
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.03%, 10=99.40%, 20=0.36%, 50=0.17%
  lat (msec)   : 100=0.02%, 250=0.02%, 500=0.01%
  cpu          : usr=0.46%, sys=3.22%, ctx=1278629, majf=0, minf=3097
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2001151,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=166MiB/s (174MB/s), 166MiB/s-166MiB/s (174MB/s-174MB/s), io=7817MiB (8197MB), run=47039-47039msec

Disk stats (read/write):
  sdd: ios=20/1997114, merge=0/85763, ticks=123/11832844, in_queue=8067712, util=99.85%

Longhorn with 3 replicas, testing directly against the Longhorn volume device


fio -direct=1 -filename=/dev/longhorn/testssd   -iodepth 16 -thread -rw=randwrite -ioengine=libaio  -numjobs=16 -runtime=300 -group_reporting -name=4K-100Write100random -fsync=1 -bs=4k  -end-fsync=1 -size=2g
4K-100Write100random: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.27
Starting 16 threads
^Cbs: 16 (f=16): [w(16)][15.0%][w=24.3MiB/s][w=6230 IOPS][eta 04m:15s]]
fio: terminating on signal 2
Jobs: 16 (f=0): [f(16)][100.0%][w=24.5MiB/s][w=6263 IOPS][eta 00m:00s]
4K-100Write100random: (groupid=0, jobs=16): err= 0: pid=10354: Mon Sep 20 13:10:33 2021
  write: IOPS=6167, BW=24.1MiB/s (25.3MB/s)(1105MiB/45870msec); 0 zone resets
    slat (nsec): min=1572, max=184830, avg=6255.05, stdev=3755.88
    clat (usec): min=783, max=86783, avg=38909.39, stdev=2512.99
     lat (usec): min=805, max=86790, avg=38915.77, stdev=2513.02
    clat percentiles (usec):
     |  1.00th=[34341],  5.00th=[35390], 10.00th=[35914], 20.00th=[36963],
     | 30.00th=[38011], 40.00th=[38536], 50.00th=[39060], 60.00th=[39584],
     | 70.00th=[40109], 80.00th=[40633], 90.00th=[41157], 95.00th=[42206],
     | 99.00th=[46924], 99.50th=[47449], 99.90th=[48497], 99.95th=[56361],
     | 99.99th=[73925]
   bw (  KiB/s): min=22046, max=27008, per=100.00%, avg=24673.82, stdev=64.38, samples=1456
   iops        : min= 5504, max= 6752, avg=6168.37, stdev=16.11, samples=1456
  lat (usec)   : 1000=0.01%
  lat (msec)   : 4=0.01%, 10=0.02%, 20=0.03%, 50=99.87%, 100=0.06%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=22, max=11380, avg=100.76, stdev=77.14
    sync percentiles (nsec):
     |  1.00th=[   46],  5.00th=[   47], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   61], 40.00th=[   73], 50.00th=[   86], 60.00th=[  103],
     | 70.00th=[  119], 80.00th=[  133], 90.00th=[  169], 95.00th=[  209],
     | 99.00th=[  302], 99.50th=[  338], 99.90th=[  506], 99.95th=[  604],
     | 99.99th=[ 1128]
  cpu          : usr=0.15%, sys=0.60%, ctx=307774, majf=0, minf=16
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=199.8%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,282917,0,282693 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=24.1MiB/s (25.3MB/s), 24.1MiB/s-24.1MiB/s (25.3MB/s-25.3MB/s), io=1105MiB (1159MB), run=45870-45870msec

Disk stats (read/write):
  sdk: ios=0/562755, merge=0/0, ticks=0/1748630, in_queue=296688, util=99.84%

Longhorn with 1 replica and data locality (to exclude potential negative network impact)

fio -direct=1 -filename=/dev/longhorn/testlhlocal    -iodepth 16 -thread -rw=randwrite -ioengine=libaio  -numjobs=16 -runtime=300 -group_reporting -name=4K-100Write100random -fsync=1 -bs=4k  -end-fsync=1 -size=2g
4K-100Write100random: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.27
Starting 16 threads
^Cbs: 16 (f=16): [w(16)][18.3%][w=27.6MiB/s][w=7060 IOPS][eta 04m:05s]
fio: terminating on signal 2

4K-100Write100random: (groupid=0, jobs=16): err= 0: pid=30445: Mon Sep 20 13:15:09 2021
  write: IOPS=6922, BW=27.0MiB/s (28.4MB/s)(1493MiB/55193msec); 0 zone resets
    slat (usec): min=2, max=543, avg= 5.85, stdev= 3.61
    clat (usec): min=728, max=96857, avg=34666.00, stdev=2460.70
     lat (usec): min=832, max=96881, avg=34671.97, stdev=2461.14
    clat percentiles (usec):
     |  1.00th=[29230],  5.00th=[32637], 10.00th=[32900], 20.00th=[33162],
     | 30.00th=[33424], 40.00th=[33817], 50.00th=[33817], 60.00th=[34341],
     | 70.00th=[35390], 80.00th=[35914], 90.00th=[38011], 95.00th=[39584],
     | 99.00th=[42206], 99.50th=[42730], 99.90th=[46924], 99.95th=[48497],
     | 99.99th=[72877]
   bw (  KiB/s): min=22146, max=28856, per=99.98%, avg=27686.71, stdev=78.03, samples=1760
   iops        : min= 5530, max= 7214, avg=6921.60, stdev=19.53, samples=1760
  lat (usec)   : 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.02%, 50=99.91%
  lat (msec)   : 100=0.05%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=22, max=18331, avg=103.58, stdev=83.05
    sync percentiles (nsec):
     |  1.00th=[   47],  5.00th=[   50], 10.00th=[   55], 20.00th=[   63],
     | 30.00th=[   71], 40.00th=[   78], 50.00th=[   87], 60.00th=[  105],
     | 70.00th=[  113], 80.00th=[  129], 90.00th=[  171], 95.00th=[  209],
     | 99.00th=[  290], 99.50th=[  322], 99.90th=[  474], 99.95th=[  572],
     | 99.99th=[  956]
  cpu          : usr=0.15%, sys=0.60%, ctx=414609, majf=0, minf=16
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=199.9%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,382093,0,381869 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=27.0MiB/s (28.4MB/s), 27.0MiB/s-27.0MiB/s (28.4MB/s-28.4MB/s), io=1493MiB (1565MB), run=55193-55193msec

Disk stats (read/write):
  sdl: ios=0/762806, merge=0/1, ticks=0/2094405, in_queue=92300, util=99.86%

@liyimeng liyimeng added the kind/question Please use `discussion` to ask questions instead label Sep 20, 2021
@liyimeng
Author

According to https://longhorn.io/blog/performance-scalability-report-aug-2020/ , the bandwidth should be expected to be close to that of the native disk, but my result seems about 10 times lower. What could be wrong?

@yasker
Member

yasker commented Sep 20, 2021

Can you try https://github.com/yasker/kbench ? In your fio job, you're testing bandwidth with a 4k block size, which is mostly used for IOPS tests (since the block size is small). Also, with 16 jobs running at the same time, I think CPU might become a point of contention.
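For anyone following along, kbench runs as a Kubernetes Job against a PVC. A rough sketch of the usual deployment, assuming the manifest path from the kbench README (verify it against the repository before applying):

# Assumed manifest path; check https://github.com/yasker/kbench for the current one.
kubectl apply -f https://raw.githubusercontent.com/yasker/kbench/main/deploy/fio.yaml
# Follow the benchmark output; the Job is labeled kbench=fio, as seen in the logs below.
kubectl logs -l kbench=fio -f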

@liyimeng
Author

@yasker Thanks for your attention!
Yes, I was actually targeting IOPS. I ran kbench as you suggested; here is the outcome.

kubectl logs   -f  -l kbench=fio
TEST_FILE: /volume/test
TEST_OUTPUT_PREFIX: test_device
TEST_SIZE: 30G
Benchmarking iops.fio into test_device-iops.json
Benchmarking bandwidth.fio into test_device-bandwidth.json
Benchmarking latency.fio into test_device-latency.json

=====================
FIO Benchmark Summary
For: test_device
SIZE: 30G
QUICK MODE: DISABLED
=====================
IOPS (Read/Write)
        Random:           28,832 / 5,730
    Sequential:          51,415 / 11,329
  CPU Idleness:                      86%

Bandwidth in KiB/sec (Read/Write)
        Random:      1,034,002 / 278,604
    Sequential:      1,047,018 / 221,776
  CPU Idleness:                      80%

Latency in ns (Read/Write)
        Random:      303,098 / 1,029,192
    Sequential:        371,862 / 934,854
  CPU Idleness:                      93%

I was expecting better IOPS. My SSD reaches 50K+ IOPS raw, and 47K IOPS when formatted as ext4.

Have I done something wrong?

@yasker
Member

yasker commented Sep 20, 2021

@liyimeng I think your latest result looks valid. There is performance overhead in using Longhorn, though Longhorn should still be better than most other software-defined distributed storage solutions out there due to its simple architecture. The only thing I don't quite understand is the IOPS discrepancy between the random RW and sequential RW. You can try the comparison mode in kbench, using the local-path provisioner vs Longhorn, to get a relative result and see what the overhead is.

In general, the read performed very well IMO. Your write latency is slightly higher than expected though it's hard for me to tell why ATM.
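For reference, a minimal sketch of the storage side of such a comparison run, assuming the kbench comparison job mounts two volumes (the /volume1 and /volume2 paths that show up in the logs below); the PVC names and sizes are illustrative, and the actual manifest shipped with kbench should be used as the starting point:

# Hypothetical PVC pair for a kbench comparison run.
# One volume comes from the local-path provisioner, the other from Longhorn,
# so the job can benchmark both and report the relative overhead.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kbench-local-path
spec:
  storageClassName: local-path
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 33Gi   # leave headroom above the 30G test size
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kbench-longhorn
spec:
  storageClassName: longhorn
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 33Gi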

@liyimeng
Author

@yasker Hi, here come the results of local-path vs Longhorn. Local-path can easily maintain 40K+ write IOPS, while Longhorn shows a huge drop.

kubectl logs   -f  -l kbench=fio
TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 30G
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Longhorn
TEST_SIZE: 30G
Benchmarking iops.fio into Longhorn-iops.json
Benchmarking bandwidth.fio into Longhorn-bandwidth.json
Benchmarking latency.fio into Longhorn-latency.json

================================
FIO Benchmark Comparsion Summary
For: Local-Path vs Longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                              Local-Path   vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:          58,550 / 40,665   vs           29,420 / 6,897    :   -49.75% / -83.04%
    Sequential:          61,937 / 41,463   vs          46,028 / 13,470    :   -25.69% / -67.51%
  CPU Idleness:                      96%   vs                      85%    :                -11%

Bandwidth in KiB/sec (Read/Write)
        Random:        406,288 / 262,054   vs        493,542 / 337,472    :     21.48% / 28.78%
    Sequential:        485,302 / 307,509   vs        593,200 / 366,985    :     22.23% / 19.34%
  CPU Idleness:                      97%   vs                      85%    :                -12%

Latency in ns (Read/Write)
        Random:        131,945 / 129,217   vs        392,504 / 619,945    :   197.48% / 379.77%
    Sequential:        141,914 / 114,732   vs        394,481 / 615,687    :   177.97% / 436.63%
  CPU Idleness:                      95%   vs                      92%    :                 -3%

What confuses me even more is that Longhorn achieves higher write bandwidth! I actually posted another result earlier, which showed a similar outcome, but I then found out that Longhorn was running on a different model of SSD. So I deleted it and re-tested, making sure both local-path and Longhorn run on the same model of SSD.

@liyimeng
Author

Repeating the same test seems to just confirm the previous result.

kubectl logs   -f  -l kbench=fio
TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 30G
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Longhorn
TEST_SIZE: 30G
Benchmarking iops.fio into Longhorn-iops.json
Benchmarking bandwidth.fio into Longhorn-bandwidth.json
Benchmarking latency.fio into Longhorn-latency.json

================================
FIO Benchmark Comparsion Summary
For: Local-Path vs Longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                              Local-Path   vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:          58,386 / 43,299   vs           27,357 / 6,709    :   -53.14% / -84.51%
    Sequential:          62,129 / 42,527   vs          51,477 / 13,376    :   -17.14% / -68.55%
  CPU Idleness:                      97%   vs                      86%    :                -11%

Bandwidth in KiB/sec (Read/Write)
        Random:        414,700 / 333,020   vs        495,104 / 340,691    :      19.39% / 2.30%
    Sequential:        490,799 / 360,981   vs        598,450 / 369,216    :      21.93% / 2.28%
  CPU Idleness:                      95%   vs                      86%    :                 -9%

Latency in ns (Read/Write)
        Random:        131,863 / 122,037   vs        393,911 / 648,344    :   198.73% / 431.27%
    Sequential:        141,507 / 111,135   vs        391,161 / 639,665    :   176.43% / 475.57%
  CPU Idleness:                      96%   vs                      92%    :                 -4%

@yasker
Member

yasker commented Sep 21, 2021

@liyimeng The higher write bandwidth is probably just due to fluctuations in the test. Increasing the test size might help.

The latest result is consistent with what we've observed and is expected. The main reason for the IOPS drop is the latency increase, which comes from Longhorn adding additional layers on top of native disks for HA, snapshots, and other mechanisms.

It's very hard to achieve near-native performance in terms of IOPS or latency. Longhorn is already one of the fastest SDS solutions out there (with a similar functionality set). We're working on a prototype engine which is able to do so, but it will likely need a couple of years to reach feature parity with the current mature Longhorn engine.

@liyimeng
Author

liyimeng commented Sep 21, 2021

@yasker Thanks a lot! I understand the added latency is unfortunately unavoidable when more work needs to be done. Is it possible to increase the IOPS by introducing some kind of parallelism? I guess in the current implementation the underlying disk still has spare bandwidth. The Mayastor folks say they have borrowed some ideas from the NVMe implementation, which significantly improves IOPS. Is that something Longhorn can try? I know little about storage, but their numbers look tempting.

BTW, jumbo frames don't seem to help, even though they usually improve iSCSI performance.

@yasker
Member

yasker commented Sep 21, 2021

@liyimeng IOs are already happening in parallel. The new prototype Longhorn is building is based on SPDK (which is also used by Mayastor and other storage vendors), which should provide near-native performance. However, it's going to be hard to further optimize the current Longhorn engine. You might be able to increase performance a bit more by raising the iSCSI queue depth, but that will result in more CPU consumption as well.
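As an illustration only: raising the per-session limits of the open-iscsi initiator is the generic way to experiment with iSCSI queue depth. This is standard open-iscsi tuning, not a documented Longhorn setting, and the values below are placeholders, so treat it as a sketch:

# /etc/iscsi/iscsid.conf (excerpt) -- illustrative values, not a Longhorn recommendation
node.session.cmds_max = 256      # maximum outstanding commands per session
node.session.queue_depth = 64    # per-LUN queue depth

New sessions pick up these values; existing sessions typically need to be re-created, and higher queue depths trade extra CPU for throughput, as noted above.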

@jenting jenting added this to New in Community Issue Review via automation Sep 21, 2021
@jenting jenting moved this from New to In progress in Community Issue Review Sep 22, 2021
@liyimeng
Author

@yasker I am still not getting the full picture here. Take the example output above: the latency is about 600 us, i.e. 0.6 ms. For a single write thread that is about 1,600 IOPS, yet 32 threads did not end up at 50K (1,600 x 32) IOPS, but at 6K. Why does parallelism drop off that much, even though the writes are not contending for the same file?

@yasker
Member

yasker commented Sep 30, 2021

There are some inefficiencies in the data path, like CPU contention, context switches, memory copies, the efficiency of the protocol, etc. In general, it won't be 1,600 x 32 even if you have 32 threads. Also, Longhorn uses 16 threads for each volume (instead of 32).
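A back-of-the-envelope check with the numbers from the 3-replica fio run earlier in this thread (plain Little's law, nothing Longhorn-specific) illustrates the point: 16 jobs at iodepth 16 keep about 256 I/Os in flight, and the measured completion latency in that run was roughly 38.9 ms, so

\text{IOPS} \approx \frac{\text{I/Os in flight}}{\text{mean latency}} = \frac{16 \times 16}{0.0389\,\text{s}} \approx 6580

which matches the measured ~6.1k IOPS. The backend saturates around that rate, so queuing more requests mostly inflates latency (from the ~0.6 ms seen at low queue depth up to ~39 ms) instead of adding throughput.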

@liyimeng
Author

Thanks for sharing the insight! @yasker We will have to wait for the next-generation Longhorn engine then :D

@shuo-wu
Contributor

shuo-wu commented Oct 7, 2021

Actually, there is a setting named Disable Revision Counter. By design, it can increase IOPS performance a little bit. But the risk is that there may be data loss/inconsistency when all replicas crash simultaneously and auto salvage is triggered.

Notice that it is a relatively dangerous setting in terms of HA. That's why we did not enable it by default. If you are interested, you can give it a quick try.

@shuo-wu shuo-wu moved this from In progress to Pending user response in Community Issue Review Oct 7, 2021
@liyimeng
Author

@shuo-wu I tried to disable the revision counter, but in the UI it says:

Disable Revision Counter:
Required.
This setting is only for volumes created by UI. 
....

How can that be applied to a dynamically provisioned volume like the one kbench is using?

@shuo-wu
Contributor

shuo-wu commented Oct 26, 2021

@liyimeng Maybe you can create such a volume with a PV/PVC in the UI first, then modify the kbench deployment YAML so that it uses the existing Longhorn volume for testing.
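If you would rather keep dynamic provisioning, a sketch of a dedicated StorageClass for the test could look like the following; the revisionCounterDisabled parameter is taken from Longhorn's example StorageClass, but double-check that your Longhorn version supports it before relying on it:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-no-revcounter        # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  revisionCounterDisabled: "true"     # assumed parameter; verify against the Longhorn docs

Pointing the kbench PVC at this class would let the dynamically provisioned volume be created with the revision counter disabled.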

@liyimeng
Author

Thanks, I will see if I get a chance to do so and will report back if I make it.

@liyimeng
Author

liyimeng commented Nov 6, 2021

@shuo-wu The numbers were not impressive, but since there was too much noise in the cluster, I discarded it as an unfair test. I see QingStor presenting their solution NeonIO; it is really impressive. If #3202 is ready, will Longhorn catch up by any chance?

@liyimeng
Author

Can this be re-opened? It is time to look at performance for Longhorn, I guess?

@innobead innobead reopened this Dec 17, 2021
Community Issue Review automation moved this from Pending user response to New Dec 17, 2021
@longhorn longhorn deleted a comment from github-actions bot Dec 17, 2021
@longhorn longhorn deleted a comment from github-actions bot Dec 17, 2021
@innobead innobead removed the stale label Dec 17, 2021
@joshimoo joshimoo added the area/performance System, volume performance label Dec 18, 2021
@innobead innobead modified the milestones: Planning, Backlog Mar 21, 2022
@LarsBingBong

Another day - another test.


The test result with hardware version "ESXi 7.0 U2 and later" (VM version 19) on the worker nodes, on the VMware HCI, was:

TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 39G
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Longhorn
TEST_SIZE: 39G
Benchmarking iops.fio into Longhorn-iops.json
Benchmarking bandwidth.fio into Longhorn-bandwidth.json
Benchmarking latency.fio into Longhorn-latency.json

================================
FIO Benchmark Comparsion Summary
For: Local-Path vs Longhorn
CPU Idleness Profiling: disabled
Size: 39G
Quick Mode: disabled
================================
                              Local-Path   vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:           47,687 / 9,780   vs              6,935 / 258    :   -85.46% / -97.36%
    Sequential:          39,378 / 12,350   vs             10,995 / 578    :   -72.08% / -95.32%

Bandwidth in KiB/sec (Read/Write)
        Random:      1,075,840 / 250,161   vs         411,294 / 28,939    :   -61.77% / -88.43%
    Sequential:      1,111,645 / 319,936   vs         566,433 / 70,555    :   -49.05% / -77.95%

Latency in ns (Read/Write)
        Random:      488,575 / 2,559,169   vs    1,951,327 / 9,963,974    :   299.39% / 289.34%
    Sequential:      451,978 / 2,287,137   vs    1,439,435 / 7,084,941    :   218.47% / 209.77%

So no change

@LarsBingBong

Just quickly the CPU idleness ...

[screenshot of CPU idleness metrics omitted]

Clearly NOT the issue.

@yasker
Member

yasker commented Mar 21, 2022

Hi @LarsBingBong , thanks for the benchmarking.

In general, we don't recommend running Longhorn on top of another software-defined storage layer. The performance characteristics can be very hard to determine in that case. Though I know many users are running Longhorn on top of VMware vSAN, so we will look into that. cc @joshimoo

@LarsBingBong

Thank you very much for chiming in @yasker - much appreciated. I can imagine you have a very busy schedule. Okay, I wasn't aware of the "we don't recommend running Longhorn on top of another software-defined storage" guidance, but I can see there being a potential issue there.

Would love for that to work better performance wise. So, I'm happy that the Longhorn team will give it a look. Much appreciated indeed!
Especially as I don't really see any alternative to Longhorn. OSS and all that nice jazz.

@LarsBingBong

LarsBingBong commented Mar 21, 2022

The results were a bit better when I tried without the Cilium CNI, using Flannel instead.

[screenshot of the benchmark results with Flannel omitted]

But, still - not the smoking gun.


I'll stop my testing for now and wait for the Longhorn team to give this a look - ;-) @joshimoo

@liyimeng
Author

@LarsBingBong I don't remember if the fio test runs with fsync. If not, vSAN might cheat by caching writes locally, while Longhorn, by the nature of its design, never caches writes or reads locally. If you really want to see what Longhorn can truly achieve, better to go with raw disks (see the fio sketch just below).

But again, even in my raw-disk testing, Longhorn sees a significant drop in performance. @yasker Good to see you are still around :D. Looking forward to seeing Longhorn come with a new engine update.
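One way to check whether a write cache underneath is flattering the numbers is to re-run the same 4K random-write job with per-I/O syncs, mirroring the commands earlier in this thread (the target path is a placeholder):

# Same job as above; -fsync=1 forces a flush after every write, which defeats
# write-back caching in the layer underneath the filesystem.
fio -direct=1 -filename=/path/to/testfile -iodepth 16 -thread -rw=randwrite -ioengine=libaio -numjobs=16 -runtime=300 -group_reporting -name=4K-100Write100random -fsync=1 -bs=4k -end-fsync=1 -size=2g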

@LarsBingBong

@joshimoo just kindly asking whether you have plans to look at this? Thank you very much. The Longhorn product and all you do to make it prosper are greatly appreciated.

@joshimoo
Contributor

@LarsBingBong thanks for all the tests; we are very much looking into general IO performance enhancements.

@keithalucas is currently working on the SPDK frontend tracked at #3202
Some of that code can be found below, but it is not ready for end-user use yet.
https://github.com/longhorn/longhorn-spdk

We are also looking at smaller optimizations to the existing longhorn-engine to lower IO latency, which should improve overall IO performance.

@fzyzcjy

fzyzcjy commented Apr 26, 2022

Hi, is there any estimate of when it will be released? Without this, HDDs are almost unusable.

@liyimeng
Author

We are also looking at smaller optimizations to the existing longhorn-engine to lower IO latency, which should improve overall IO performance.

@joshimoo is this something coming shortly?

@gp187

gp187 commented May 16, 2022

This is still a huge issue. Any updates?

@thiscantbeserious

Seems like @keithalucas left Rancher mid-year and there hasn't been any work done on this since then.

So I guess SPDK is on hold then; is there anyone from Rancher still working on it?

@joshimoo
Contributor

joshimoo commented Nov 3, 2022

cc @innobead. @DamiaSan is currently looking into this.

@liyimeng
Author

liyimeng commented Nov 5, 2022

@joshimoo It has been more than 3 years since the Longhorn team started to look into SPDK. Is it technically achievable within the Longhorn framework?

@lictw

lictw commented Jan 19, 2023

While setting up a new cluster I performed a bunch of performance tests. I gathered the information into a Google Sheet and am posting it here. All tests were performed via kbench; all nodes have a solid 1 GiB/s local network and the same SSD drives. The cluster is vanilla K3s with a single master and Flannel networking.

Conclusions:

  1. The number of nodes running Longhorn doesn't affect performance; only the number of replicas and data locality matter.
  2. IOPS drops heavily even with a single replica on the same node, and doesn't change significantly as replicas increase.
  3. Write bandwidth drops as replicas increase, as it should, but read bandwidth decreases too, while it should stay stable or even increase thanks to parallelism across replicas.

So, Longhorn is really feature-rich, easy to use and stable, but it comes at a very high performance cost. I believe and hope that someday this price will be lower, so that my database migrations (a large number of operations, very dependent on latency) in testing environments take 2-3 times longer, not ~10 times like now.

@jazZcarabazZ


Hi, did you try the new data locality feature from the latest 1.4.0 release (the Data Locality setting, Strict Local)?

@lictw

lictw commented Jan 19, 2023

Hello! I'm currently using Longhorn 1.3.2; I will check the new release. Data locality was of course tested: there is a Same Node column in the table for data locality, indicating whether the node running the workload has a replica or not. Is Strict Local a new 1.4.0 feature?

@larssb

larssb commented Jan 19, 2023

Yes @lictw - Strict Local is a new v1.4.0 only feature.
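For anyone who wants to try it, a sketch of a strict-local StorageClass (values follow the Longhorn v1.4 documentation; strict-local requires a single replica, so treat this as a starting point and verify against the docs for your version):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-strict-local         # illustrative name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"               # strict-local only works with one replica
  dataLocality: "strict-local"
  staleReplicaTimeout: "30"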

@liyimeng
Author

According to the benchmark in #3957, it is still far from ideal.

@lictw

lictw commented Jan 19, 2023

Why do I see IOPS drop 5-6x compared to local-path in my tests, even with 1 replica and 1 node (so best-effort), when in your tests it's only about 2x? What could the reason be? It's a performance node with 2 x AMD EPYC 7282 (64 cores in total) with no other load during the tests, and a RAID 1 SSD with LVM under /var/lib/longhorn; I can't understand what's wrong there. K3s version is 1.21.14; could that matter?

@derekbit
Member


@lictw Suggest providing your steps, scripts and disk model. Thank you.

@lictw

lictw commented Jan 23, 2023

I used kbench with a 10 GiB test file; the StorageClass parameters were:

parameters:
  dataLocality: disabled
  fstype: xfs
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"

Steps:

  1. Install K3s v1.21.14 (single node).
  2. Install Longhorn v1.3.2 (CSI components single replica).
  3. Start testing.

Disks are 2x INTEL SSDSC2KB96 960GB, RAID 1 with LVM, with an LV for /var/lib/longhorn. Thanks in advance.

@derekbit
Member

derekbit commented Jan 24, 2023 via email

@lictw

lictw commented Jan 24, 2023

local-path is the first row (the baseline) in the table; all other results are compared to it. I will test ext4; I use xfs because of the lost+found directory.

@safts

safts commented May 17, 2024

I'm seeing similar performance. I've used kbench to test 2 combinations:

  • local-path vs longhorn (data locality: best-effort, 3 replicas)
  • local-path vs strict-local (data locality: strict-local)

My nodes have 1 Gbit networking, so that should be a bottleneck in the Longhorn case. However, it shouldn't be in the strict-local case, and that's pretty much as bad.

TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 30G
MODE: full
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Longhorn
TEST_SIZE: 30G
MODE: full
Benchmarking iops.fio into Longhorn-iops.json
Benchmarking bandwidth.fio into Longhorn-bandwidth.json
Benchmarking latency.fio into Longhorn-latency.json

================================
FIO Benchmark Comparsion Summary
For: Local-Path vs Longhorn
CPU Idleness Profiling: enabled
Size: 30G
Quick Mode: disabled
================================
                              Local-Path   vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:         102,215 / 99,646   vs            5,102 / 2,869    :   -95.01% / -97.12%
    Sequential:          26,542 / 96,409   vs            7,388 / 4,575    :   -72.16% / -95.25%
  CPU Idleness:                      72%   vs                      43%    :                -29%

Bandwidth in KiB/sec (Read/Write)
        Random:        438,565 / 433,142   vs         103,426 / 53,543    :   -76.42% / -87.64%
    Sequential:        438,188 / 420,605   vs         101,684 / 54,092    :   -76.79% / -87.14%
  CPU Idleness:                      90%   vs                      38%    :                -52%

Latency in ns (Read/Write)
        Random:          92,931 / 38,282   vs    1,297,516 / 1,661,621    : 1296.21% / 4240.48%
    Sequential:          43,091 / 39,181   vs    1,358,046 / 1,824,472    : 3051.58% / 4556.52%
  CPU Idleness:                      81%   vs                      81%    :                  0%
TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 30G
MODE: full
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Strict-Local
TEST_SIZE: 30G
MODE: full
Benchmarking iops.fio into Strict-Local-iops.json
Benchmarking bandwidth.fio into Strict-Local-bandwidth.json
Benchmarking latency.fio into Strict-Local-latency.json

================================
FIO Benchmark Comparsion Summary
For: Local-Path vs Strict-Local
CPU Idleness Profiling: enabled
Size: 30G
Quick Mode: disabled
================================
                              Local-Path   vs             Strict-Local    :              Change
IOPS (Read/Write)
        Random:        104,861 / 103,289   vs            4,849 / 4,969    :   -95.38% / -95.19%
    Sequential:         97,949 / 105,513   vs            7,838 / 7,193    :   -92.00% / -93.18%
  CPU Idleness:                      70%   vs                      49%    :                -21%

Bandwidth in KiB/sec (Read/Write)
        Random:        440,566 / 435,960   vs        124,951 / 124,133    :   -71.64% / -71.53%
    Sequential:        441,136 / 436,143   vs        128,719 / 127,085    :   -70.82% / -70.86%
  CPU Idleness:                      91%   vs                      32%    :                -59%

Latency in ns (Read/Write)
        Random:          84,898 / 36,534   vs        876,757 / 823,783    :  932.72% / 2154.84%
    Sequential:          32,588 / 34,382   vs        850,450 / 910,298    : 2509.70% / 2547.60%
  CPU Idleness:                      80%   vs                      77%    :                 -3%

I am wondering what could be going on. I would expect strict-local to be significantly better (also tbh I would expect the best-effort locality option to use the replica on the node being benchmarked at least for reads, which I'm not sure is happening).
