
Epic: Data Path Performance #1541

Open
yasker opened this issue Jul 2, 2020 · 13 comments
Labels
area/v1-data-engine v1 data engine (iSCSI tgt) Epic kind/feature Feature request, new feature
Milestone
Backlog
Comments

@yasker
Member

yasker commented Jul 2, 2020

This serves as the general thread for performance-related discussion in Longhorn.

@yasker yasker changed the title Performance Epic: Performance Jul 2, 2020
@yasker yasker added area/v1-data-engine v1 data engine (iSCSI tgt) Epic labels Jul 2, 2020
@yasker yasker changed the title Epic: Performance Epic: Data Path Performance Jul 2, 2020
@yasker yasker added the kind/feature Feature request, new feature label Jul 2, 2020
@liyimeng

@yasker do you have any benchmark numbers to share? The only thing I have seen on the Internet is https://itnext.io/state-of-persistent-storage-in-k8s-a-benchmark-77a96bb1ac29, and it seems Longhorn's performance there is not impressive.

@yasker
Member Author

yasker commented Jul 17, 2020

@liyimeng We're still working on optimizing the performance. But that article is not quite an apples-to-apples comparison, since it compares Longhorn (which is crash-consistent and writes synchronously to multiple replicas) against solutions that are either cached (e.g. without O_DIRECT), asynchronous (Piraeus), or not replicated (1 replica). A more valid comparison for that picture would be Longhorn vs OpenEBS vs Piraeus, each with 2 replicas and O_DIRECT. Also, Longhorn uses 3 replicas by default, so I am not sure whether the author tweaked that as well.
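For anyone re-running that kind of comparison, the key is forcing O_DIRECT on every solution so the page cache doesn't inflate the numbers for the cached setups. A minimal fio sketch for that (the mount path, size, and queue depth below are illustrative choices, not taken from the article):

# Random-write IOPS against a file on the volume under test, bypassing the page cache.
fio --name=randwrite \
    --filename=/mnt/test-vol/fio-data \
    --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=64 --numjobs=1 \
    --size=10G --runtime=60 --time_based \
    --group_reporting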

Nonetheless, Longhorn is not the fastest in the group. And we're aiming to change that.

@liyimeng

@yasker thanks for sharing the info and the awesome work!

@liyimeng

liyimeng commented Nov 28, 2020

@yasker this (OpenEBS) seems impressive. Does Longhorn have any plan to adopt this? They learned the architecture from Longhorn; is it time for Longhorn to take some good bits from them?

@yasker
Member Author

yasker commented Nov 30, 2020

We already have SPDK (which is what Mayastor uses for its frontend) on our roadmap for the v1.2 release.

@liyimeng

liyimeng commented Dec 2, 2020

@yasker super exciting 👍

@rajivml

rajivml commented Feb 6, 2021

@yasker do you have metrics comparing Longhorn vs Ceph performance? We had almost decided to use Longhorn, but after reading Longhorn's blog post about the IOPS hit one has to take compared to a bare-metal disk, we started exploring other options.

@stale stale bot added the wontfix label Mar 19, 2021
@jonathon2nd

bump

@stale stale bot removed the wontfix label Mar 19, 2021
@longhorn longhorn deleted a comment from stale bot Mar 20, 2021
@michaelandrepearce

https://storageos.com/wp-content/uploads/2021/02/Performance-Benchmarking-Cloud-Native-Storage-Solutions-for-Kubernetes.pdf

It seems there has been some competitor performance testing, which will clearly be skewed toward their own product. Of note, though, is the conclusion section, where Longhorn misses a trick with local caching and reading from memory.

@hofalk

hofalk commented Sep 9, 2021

Hi @yasker,

We are currently evaluating whether we can switch to a distributed storage solution for our Rancher vSphere clusters to replace the native vSphere storage (currently using the in-tree provisioner, as our vSphere is < v7), which is quite flaky at times.

Some of our deployments are unfortunately very I/O sensitive, so I am trying to figure out how much of a performance decline we would have. That is why I am running performance tests comparing local-path / vSphere (which have pretty much identical performance) with Longhorn v1.2 and Rook-Ceph v1.7.2.

To be honest, I don't understand the results I am getting, as they are very bad on the distributed-storage side (for both Longhorn and Ceph), so maybe I am doing something wrong?

For benchmarking I mostly relied on the https://github.com/yasker/kbench test, but also did some general iperf / dd steps to establish a baseline.
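(Roughly how I drove kbench, from memory; the manifest name below is my recollection of the repo layout, so check its README for the exact file:)

git clone https://github.com/yasker/kbench.git
cd kbench
# point the PVC in the fio manifest at the StorageClass under test, then:
kubectl apply -f deploy/fio.yaml
# the IOPS / bandwidth / latency summary ends up in the benchmark pod's logs:
kubectl logs -f <kbench-test-pod>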

BASELINE:

iperf over 60s:
~ 5 GBit (best-case, for communication between two nodes on the same physical host)
~ 900 MBit (worst-case, for communication between two nodes on different physical hosts)

dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.20552 s, 255 MB/s
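(The iperf numbers above came from a plain 60-second run between two nodes; reconstructed here with iperf3 syntax, though the original run may have used iperf2:)

# on the first node:
iperf3 -s
# on the second node, 60-second run against the first node's address:
iperf3 -c <first-node-ip> -t 60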

From those results I presumed that the default 30 GB PVC of kbench should be more than sufficient to avoid caching effects (5 Gbit/s × 25 s ≈ 15.6 GB).

kbench standalone results for local-path / vsphere in-tree (average of 5 runs) are as follows:

IOPS (Read/Write)
  Random:   1204 / 22010
  Sequential: 7960 / 22099
  CPU Idleness: 70%

Bandwidth in KiB/sec (Read/Write)
  Random: 144,463 / 495,160
  Sequential: 722,922 / 636,442
  CPU Idleness: 75%

Latency in ns (Read/Write)
  Random: 14,326,507 / 397,082
  Sequential: 1,289,344 / 377,331
  CPU Idleness: 74%

I then compared vsphere/longhorn, vsphere/ceph, and longhorn/ceph, where both distributed systems were configured to keep 3 replicas spread over multiple physical hosts (the worst-case bandwidth scenario):

================================
FIO Benchmark Comparison Summary
For: vsphere vs Longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                              vsphere      vs                 Longhorn    :              Change
IOPS (Read/Write)
        Random:           1,467 / 24,981   vs              811 / 1,443    :   -44.72% / -94.22%
    Sequential:          10,556 / 25,520   vs            3,811 / 2,795    :   -63.90% / -89.05%
  CPU Idleness:                      72%   vs                      48%    :                -24%

Bandwidth in KiB/sec (Read/Write)
        Random:        142,946 / 483,583   vs          69,222 / 27,082    :   -51.57% / -94.40%
    Sequential:        717,757 / 646,241   vs         100,296 / 30,402    :   -86.03% / -95.30%
  CPU Idleness:                      78%   vs                      45%    :                -33%

Latency in ns (Read/Write)
        Random:     17,399,691 / 318,467   vs   16,328,140 / 4,767,117    :   -6.16% / 1396.90%
    Sequential:      1,590,285 / 321,547   vs    3,744,054 / 4,509,962    :  135.43% / 1302.58%
  CPU Idleness:                      78%   vs                      66%    :                -12%

================================
FIO Benchmark Comparison Summary
For: vsphere vs ceph
SIZE: 30G
QUICK MODE: DISABLED
================================
                                 vsphere   vs                     ceph    :              Change
IOPS (Read/Write)
        Random:             967 / 21,653   vs                777 / 886    :   -19.65% / -95.91%
    Sequential:           8,520 / 20,253   vs                702 / 763    :   -91.76% / -96.23%
  CPU Idleness:                      69%   vs                      62%    :                 -7%

Bandwidth in KiB/sec (Read/Write)
        Random:        148,331 / 477,384   vs          79,271 / 36,099    :   -46.56% / -92.44%
    Sequential:        763,028 / 567,657   vs         113,277 / 30,039    :   -85.15% / -94.71%
  CPU Idleness:                      64%   vs                      63%    :                 -1%

Latency in ns (Read/Write)
        Random:     11,390,927 / 421,355   vs 24,575,176 / 348,190,854    : 115.74% / 82535.98%
    Sequential:      1,153,712 / 411,457   vs   7,540,324 / 33,412,297    :  553.57% / 8020.48%
  CPU Idleness:                      74%   vs                      67%    :                 -7%


================================
FIO Benchmark Comparison Summary
For: longhorn vs ceph
SIZE: 30G
QUICK MODE: DISABLED
================================
                                longhorn   vs                     ceph    :              Change
IOPS (Read/Write)
        Random:                507 / 801   vs              799 / 1,113    :     57.59% / 38.95%
    Sequential:            2,170 / 1,551   vs                868 / 924    :   -60.00% / -40.43%
  CPU Idleness:                      47%   vs                      67%    :                 20%

Bandwidth in KiB/sec (Read/Write)
        Random:          74,736 / 29,953   vs          80,697 / 21,504    :     7.98% / -28.21%
    Sequential:         140,356 / 43,907   vs         108,995 / 27,622    :   -22.34% / -37.09%
  CPU Idleness:                      36%   vs                      69%    :                 33%

Latency in ns (Read/Write)
        Random:   17,468,093 / 6,125,374   vs 20,407,513 / 323,664,604    :   16.83% / 5184.00%
    Sequential:    3,726,745 / 5,531,482   vs   6,282,552 / 92,793,978    :   68.58% / 1577.56%
  CPU Idleness:                      56%   vs                      84%    :                 28%

I also did some standalone kbench tests for each, but they more or less confirmed the results above.
Trying to rule out bandwidth as my bottleneck, I configured a separate Longhorn StorageClass that keeps only a single replica (sketched at the end of this comment) and ran another comparison:

================================
FIO Benchmark Comparison Summary
For: longhorn-single vs longhorn
SIZE: 30G
QUICK MODE: DISABLED
================================
                         longhorn-single   vs                 longhorn    :              Change
IOPS (Read/Write)
        Random:              764 / 2,017   vs                755 / 896    :    -1.18% / -55.58%
    Sequential:            3,766 / 3,348   vs            3,189 / 1,505    :   -15.32% / -55.05%
  CPU Idleness:                      42%   vs                      42%    :                  0%

Bandwidth in KiB/sec (Read/Write)
        Random:          54,010 / 75,360   vs          75,282 / 33,579    :    39.39% / -55.44%
    Sequential:         104,818 / 68,571   vs         140,474 / 47,664    :    34.02% / -30.49%
  CPU Idleness:                      42%   vs                      36%    :                 -6%

Latency in ns (Read/Write)
        Random:   21,460,384 / 2,744,107   vs   17,864,406 / 6,888,851    :   -16.76% / 151.04%
    Sequential:    3,580,601 / 3,675,651   vs    3,800,925 / 7,019,119    :      6.15% / 90.96%
  CPU Idleness:                      59%   vs                      51%    :                 -8%

I am somewhat baffled by these results. Shouldn't performance drastically increase if I take away the distribution part? Obviously write performance improved a little, but it is nowhere near local-path / vsphere BW.

As I'm no expert on storage, I didn't try optimizing anything in the longhorn / ceph setups and just went with the defaults. So maybe it's simply a configuration issue, but I wouldn't know where to begin.

If anyone can provide input on what to try or do differently, I would greatly appreciate it.
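For reference, the single-replica class mentioned above is just the stock Longhorn class with the replica count overridden; roughly the following (provisioner and parameter names as documented by Longhorn, the class name is mine):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
EOF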

@yasker
Member Author

yasker commented Sep 9, 2021

Hi @hofalk,

Can you describe your vSphere storage setup, e.g. SSD vs spinning disk, how many disks per node, and how much memory each node has? Also, it seems the vSphere storage is running on a different network (or is it non-HA?), since it can exceed the maximum possible bandwidth of 125 MB/s on a gigabit link.

  1. It sounds like your network bandwidth is a major bottleneck for the distributed storage systems. From your iperf result, ~900 Mbit/s is about 112 MB/s, close to the 125 MB/s ceiling of a gigabit link, and that caps either the read or the write bandwidth of a distributed storage system. That is likely why the read bandwidth of both Longhorn and Ceph is only around 100 MB/s.
  2. It seems you're using spinning disks or a hybrid setup rather than SSDs? The discrepancy between random-read and sequential-read IOPS is too big, even for local-path. Or it could be vSphere's caching mechanism.
  3. One thing that is weird is that Longhorn reports lower IOPS than Ceph on random access. As you can see from the latency numbers, a single Longhorn IO's latency is much lower than Ceph's, so in general that should result in better IOPS, as it does in the sequential IOPS numbers. Maybe the CPU becomes a bottleneck in this case.
  4. Regarding your question about single replica vs multiple replicas: normally it doesn't make a big difference in Longhorn, so I am a little surprised to see the single-replica setup roughly double the write performance compared to the 3-replica setup. It might be due to a combination of CPU and network limits.
  5. When you're testing the single-replica case, is that replica on the same node the volume is attached to? You can enable the data locality feature in Longhorn for that (see the StorageClass sketch at the end of this comment).
  6. Currently no mature distributed storage system has near-native performance, as far as I know, because implementing the HA piece on top of local storage adds overhead, both in performance and in CPU utilization. We're working on a new generation of the storage engine to address that. At the moment, based on our testing, Longhorn should be one of the fastest distributed storage systems available. As you can see from the latency results, Longhorn is orders of magnitude better than Ceph, especially on writes.

As for suggestions: if you can get a 10G network and more CPUs, I think the results will improve a lot, especially on the bandwidth side. But I would not expect it to reach the same level as a native disk in terms of IOPS; matching the bandwidth is possible. You can take a look at https://longhorn.io/blog/performance-scalability-report-aug-2020/ for more information on our last benchmark results. We will update it with v1.2.0 soon.
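For the data-locality point above, it is just a StorageClass parameter; a sketch (the class name here is only an example):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"
EOF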

@larssb

larssb commented Feb 18, 2022

Did you update the performance report for v1.2?

github-actions bot commented Jan 25, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Jan 25, 2024
@innobead innobead removed the stale label Jan 25, 2024
@innobead innobead added this to the Backlog milestone Jan 25, 2024