
Epic: Data Path Performance #1541

Open
yasker opened this issue Jul 2, 2020 · 13 comments
Labels
area/v1-data-engine v1 data engine (iSCSI tgt) Epic kind/feature Feature request, new feature
Milestone
Backlog
Comments

@yasker
Member

yasker commented Jul 2, 2020

This serves as the general thread for performance-related discussion in Longhorn.

@yasker yasker changed the title Performance Epic: Performance Jul 2, 2020
@yasker yasker added area/v1-data-engine v1 data engine (iSCSI tgt) Epic labels Jul 2, 2020
@yasker yasker changed the title Epic: Performance Epic: Data Path Performance Jul 2, 2020
@yasker yasker added the kind/feature Feature request, new feature label Jul 2, 2020
@liyimeng

@yasker do you have any benchmark numbers to share? The only thing I have seen on the Internet is https://itnext.io/state-of-persistent-storage-in-k8s-a-benchmark-77a96bb1ac29, and it seems Longhorn's performance there is not impressive.

@yasker
Member Author

yasker commented Jul 17, 2020

@liyimeng We're still working on optimizing the performance. But that article is not quite an apples-to-apples comparison, since it compares Longhorn (which is crash-consistent and writes synchronously to multiple replicas) against solutions that are either cached (e.g. without O_DIRECT), asynchronous (Piraeus), or not replicated (1 replica). A more valid comparison for that picture would be Longhorn vs OpenEBS vs Piraeus, each with 2 replicas and O_DIRECT. Also, Longhorn uses 3 replicas by default, so I am not sure whether the author tweaked that as well.
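For anyone re-running that kind of comparison, the key is forcing O_DIRECT on every solution so the page cache doesn't inflate the numbers for the cached setups. A minimal fio sketch for that (the mount path, size, and queue depth below are illustrative choices, not taken from the article):

# Random-write IOPS against a file on the volume under test, bypassing the page cache.
fio --name=randwrite \
    --filename=/mnt/test-vol/fio-data \
    --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=64 --numjobs=1 \
    --size=10G --runtime=60 --time_based \
    --group_reporting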

Nonetheless, Longhorn is not the fastest in the group. And we're aiming to change that.

@liyimeng

@yasker thanks for sharing the info and the awesome work!

@liyimeng

liyimeng commented Nov 28, 2020

@yasker this (OpenEBS) seems impressive. Does Longhorn have any plan to adopt this? They learned the architecture from Longhorn; is it time for Longhorn to take some good bits from them?

@yasker
Member Author

yasker commented Nov 30, 2020

We already have SPDK (which is what Mayastor uses for its frontend) on our roadmap for the v1.2 release.

@liyimeng

liyimeng commented Dec 2, 2020

@yasker super exciting 👍

@rajivml

rajivml commented Feb 6, 2021

@yasker do you have metrics comparing Longhorn vs Ceph performance? We had almost decided to use Longhorn, but after reading Longhorn's blog post about the IOPS hit one has to take compared to a bare-metal disk, we started exploring other options.

@stale stale bot added the wontfix label Mar 19, 2021
@jonathon2nd

bump

@stale stale bot removed the wontfix label Mar 19, 2021
@longhorn longhorn deleted a comment from stale bot Mar 20, 2021
@michaelandrepearce

https://storageos.com/wp-content/uploads/2021/02/Performance-Benchmarking-Cloud-Native-Storage-Solutions-for-Kubernetes.pdf

It seems there has been some competitor performance testing, which will clearly be skewed toward their own product. Of note, though, is the conclusion section, where Longhorn misses a trick with local caching and reading from memory.

@hofalk

hofalk commented Sep 9, 2021

Hi @yasker,

We are currently evaluating whether we can switch to a distributed storage solution for our Rancher vSphere clusters to replace the native vSphere storage (currently using the in-tree provisioner, as our vSphere is < v7), which is quite flaky at times.

Some of our deployments are unfortunately very I/O sensitive, so I am trying to figure out how much of a performance decline we would have. That is why I am running performance tests comparing local-path / vSphere (which have pretty much identical performance) with Longhorn v1.2 and Rook-Ceph v1.7.2.

To be honest, I don't understand the results I am getting, as they are very bad on the distributed-storage side (for both Longhorn and Ceph), so maybe I am doing something wrong?

For benchmarking I mostly relied on the https://github.com/yasker/kbench test, but also did some general iperf / dd steps to establish a baseline.
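(Roughly how I drove kbench, from memory; the manifest name below is my recollection of the repo layout, so check its README for the exact file:)

git clone https://github.com/yasker/kbench.git
cd kbench
# point the PVC in the fio manifest at the StorageClass under test, then:
kubectl apply -f deploy/fio.yaml
# the IOPS / bandwidth / latency summary ends up in the benchmark pod's logs:
kubectl logs -f <kbench-test-pod>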

BASELINE:

iperf over 60s:
~ 5 GBit (best-case, for communication between two nodes on the same physical host)
~ 900 MBit (worst-case, for communication between two nodes on different physical hosts)

dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.20552 s, 255 MB/s
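(The iperf numbers above came from a plain 60-second run between two nodes; reconstructed here with iperf3 syntax, though the original run may have used iperf2:)

# on the first node:
iperf3 -s
# on the second node, 60-second run against the first node's address:
iperf3 -c <first-node-ip> -t 60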

From those results I presumed that the default 30 GB PVC of kbench should be more than sufficient to avoid caching effects (5 Gbit/s × 25 s ≈ 15.6 GB).

kbench standalone results for local-path / vsphere in-tree (average of 5 runs) are as follows:

IOPS (Read/Write)
  Random:   1204 / 22010
  Sequential: 7960 / 22099
  CPU Idleness: 70%

Bandwidth in KiB/sec (Read/Write)
  Random: 144,463 / 495,160
  Sequential: 722,922 / 636,442
  CPU Idleness: 75%

Latency in ns (Read/Write)
  Random: 14,326,507 / 397,082
  Sequential: 1,289,344 / 377,331
  CPU Idleness: 74%

I then compared vsphere/longhorn, vsphere/ceph, and longhorn/ceph, where both distributed systems were configured to keep 3 replicas spread over multiple physical hosts (the worst-case bandwidth scenario):

================================
FIO Benchmark Comparison Summary
For: vsphere vs Longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                              vsphere      vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:           1,467 / 24,981   vs              811 / 1,443    :   -44.72% / -94.22%
    Sequential:          10,556 / 25,520   vs            3,811 / 2,795    :   -63.90% / -89.05%
  CPU Idleness:                      72%   vs                      48%    :                -24%

Bandwidth in KiB/sec (Read/Write)
        Random:        142,946 / 483,583   vs          69,222 / 27,082    :   -51.57% / -94.40%
    Sequential:        717,757 / 646,241   vs         100,296 / 30,402    :   -86.03% / -95.30%
  CPU Idleness:                      78%   vs                      45%    :                -33%

Latency in ns (Read/Write)
        Random:     17,399,691 / 318,467   vs   16,328,140 / 4,767,117    :   -6.16% / 1396.90%
    Sequential:      1,590,285 / 321,547   vs    3,744,054 / 4,509,962    :  135.43% / 1302.58%
  CPU Idleness:                      78%   vs                      66%    :                -12%

================================
FIO Benchmark Comparison Summary
For: vsphere vs ceph
SIZE: 30G
QUICK MODE: DISABLED
================================
                                 vsphere   vs                     ceph    :              Change
IOPS (Read/Write)
        Random:             967 / 21,653   vs                777 / 886    :   -19.65% / -95.91%
    Sequential:           8,520 / 20,253   vs                702 / 763    :   -91.76% / -96.23%
  CPU Idleness:                      69%   vs                      62%    :                 -7%

Bandwidth in KiB/sec (Read/Write)
        Random:        148,331 / 477,384   vs          79,271 / 36,099    :   -46.56% / -92.44%
    Sequential:        763,028 / 567,657   vs         113,277 / 30,039    :   -85.15% / -94.71%
  CPU Idleness:                      64%   vs                      63%    :                 -1%

Latency in ns (Read/Write)
        Random:     11,390,927 / 421,355   vs 24,575,176 / 348,190,854    : 115.74% / 82535.98%
    Sequential:      1,153,712 / 411,457   vs   7,540,324 / 33,412,297    :  553.57% / 8020.48%
  CPU Idleness:                      74%   vs                      67%    :                 -7%


================================
FIO Benchmark Comparison Summary
For: longhorn vs ceph
SIZE: 30G
QUICK MODE: DISABLED
================================
                                longhorn   vs                     ceph    :              Change
IOPS (Read/Write)
        Random:                507 / 801   vs              799 / 1,113    :     57.59% / 38.95%
    Sequential:            2,170 / 1,551   vs                868 / 924    :   -60.00% / -40.43%
  CPU Idleness:                      47%   vs                      67%    :                 20%

Bandwidth in KiB/sec (Read/Write)
        Random:          74,736 / 29,953   vs          80,697 / 21,504    :     7.98% / -28.21%
    Sequential:         140,356 / 43,907   vs         108,995 / 27,622    :   -22.34% / -37.09%
  CPU Idleness:                      36%   vs                      69%    :                 33%

Latency in ns (Read/Write)
        Random:   17,468,093 / 6,125,374   vs 20,407,513 / 323,664,604    :   16.83% / 5184.00%
    Sequential:    3,726,745 / 5,531,482   vs   6,282,552 / 92,793,978    :   68.58% / 1577.56%
  CPU Idleness:                      56%   vs                      84%    :                 28%

I also did some standalone kbench tests for each, but they more or less confirmed the results above.
Trying to rule out bandwidth as my bottleneck, I configured a separate Longhorn StorageClass that keeps only a single replica (sketched at the end of this comment) and ran another comparison:

================================
FIO Benchmark Comparison Summary
For: longhorn-single vs longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                         longhorn-single   vs                 longhorn    :              Change
IOPS (Read/Write)
        Random:              764 / 2,017   vs                755 / 896    :    -1.18% / -55.58%
    Sequential:            3,766 / 3,348   vs            3,189 / 1,505    :   -15.32% / -55.05%
  CPU Idleness:                      42%   vs                      42%    :                  0%

Bandwidth in KiB/sec (Read/Write)
        Random:          54,010 / 75,360   vs          75,282 / 33,579    :    39.39% / -55.44%
    Sequential:         104,818 / 68,571   vs         140,474 / 47,664    :    34.02% / -30.49%
  CPU Idleness:                      42%   vs                      36%    :                 -6%

Latency in ns (Read/Write)
        Random:   21,460,384 / 2,744,107   vs   17,864,406 / 6,888,851    :   -16.76% / 151.04%
    Sequential:    3,580,601 / 3,675,651   vs    3,800,925 / 7,019,119    :      6.15% / 90.96%
  CPU Idleness:                      59%   vs                      51%    :                 -8%

I am somewhat baffled by these results. Shouldn't performance drastically increase if I take away the distribution part? Obviously write performance improved a little, but it is nowhere near local-path / vsphere BW.

As I'm no expert on storage, I didn't try optimizing anything in the longhorn / ceph setups and just went with the defaults. So maybe it's simply a configuration issue, but I wouldn't know where to begin.

If anyone can provide input on what to try or do differently, I would greatly appreciate it.
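For reference, the single-replica class mentioned above is just the stock Longhorn class with the replica count overridden; roughly the following (provisioner and parameter names as documented by Longhorn, the class name is mine):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
EOF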

@yasker
Member Author

yasker commented Sep 9, 2021

Hi @hofalk,

Can you describe your vSphere storage setup, e.g. SSD vs spinning disk, how many disks per node, and how much memory each node has? Also, it seems the vSphere storage is running on a different network (or is it non-HA?), since it can exceed the maximum possible bandwidth of 125 MB/s on a gigabit link.

  1. It sounds like your network bandwidth is a major bottleneck for the distributed storage systems. From your iperf result, ~900 Mbit/s is about 112 MB/s, close to the 125 MB/s ceiling of a gigabit link, and that caps either the read or the write bandwidth of a distributed storage system. That is likely why the read bandwidth of both Longhorn and Ceph is only around 100 MB/s.
  2. It seems you're using spinning disks or a hybrid setup rather than SSDs? The discrepancy between random-read and sequential-read IOPS is too big, even for local-path. Or it could be vSphere's caching mechanism.
  3. One thing that is weird is that Longhorn reports lower IOPS than Ceph on random access. As you can see from the latency numbers, a single Longhorn IO's latency is much lower than Ceph's, so in general that should result in better IOPS, as it does in the sequential IOPS numbers. Maybe the CPU becomes a bottleneck in this case.
  4. Regarding your question about single replica vs multiple replicas: normally it doesn't make a big difference in Longhorn, so I am a little surprised to see the single-replica setup roughly double the write performance compared to the 3-replica setup. It might be due to a combination of CPU and network limits.
  5. When you're testing the single-replica case, is that replica on the same node the volume is attached to? You can enable the data locality feature in Longhorn for that (see the StorageClass sketch at the end of this comment).
  6. Currently no mature distributed storage system has near-native performance, as far as I know, because implementing the HA piece on top of local storage adds overhead, both in performance and in CPU utilization. We're working on a new generation of the storage engine to address that. At the moment, based on our testing, Longhorn should be one of the fastest distributed storage systems available. As you can see from the latency results, Longhorn is orders of magnitude better than Ceph, especially on writes.

As for suggestions: if you can get a 10G network and more CPUs, I think the results will improve a lot, especially on the bandwidth side. But I would not expect it to reach the same level as a native disk in terms of IOPS; matching the bandwidth is possible. You can take a look at https://longhorn.io/blog/performance-scalability-report-aug-2020/ for more information on our last benchmark results. We will update it with v1.2.0 soon.
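For the data-locality point above, it is just a StorageClass parameter; a sketch (the class name here is only an example):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"
EOF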

@larssb

larssb commented Feb 18, 2022

Did you update the performance report for v1.2?

github-actions bot commented Jan 25, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jan 25, 2024
@innobead innobead removed the stale label Jan 25, 2024
@innobead innobead added this to the Backlog milestone Jan 25, 2024