[QUESTION] Very disappointing performance, is this expected? #3037
According to https://longhorn.io/blog/performance-scalability-report-aug-2020/ , the bandwidth should be expected to be close to the native disk, but my test shows roughly 10x less. What could be wrong?
Can you try https://github.com/yasker/kbench ? In your fio job, you're testing bandwidth with a 4k block size, which is mostly used for IOPS tests (since the block size is small). Also, with 16 jobs running at the same time, I think CPU might become a contention point.
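[Editor's note] To make the block-size distinction concrete, here is a minimal sketch of the two kinds of fio jobs; the path, size, queue depths, and runtimes are placeholders, not values from this thread:

```sh
# IOPS-style job: small blocks, deeper queue; the interesting number is IOPS.
fio --name=rand-iops --filename=/mnt/test/fio.dat --size=10G \
  --rw=randread --bs=4k --iodepth=64 --numjobs=1 \
  --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting

# Bandwidth-style job: large sequential blocks; the interesting number is MiB/s.
fio --name=seq-bw --filename=/mnt/test/fio.dat --size=10G \
  --rw=read --bs=1M --iodepth=16 --numjobs=1 \
  --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting
```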
@yasker Thanks for your attention!
I was expecting better IOPS. My SSD reaches 50K+ IOPS raw, and 47K IOPS formatted as ext4. Have I done something wrong?
@liyimeng I think your latest result looks valid. There is performance overhead when using Longhorn, though Longhorn should still be better than most other software-defined distributed storage solutions out there due to its simple architecture. The only thing I don't quite understand is the IOPS discrepancy between the random RW and sequential RW. You can try to use the comparison mode in kbench (see the sketch below). In general, the read performed very well IMO. Your write latency is slightly higher than expected, though it's hard for me to tell why ATM.
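[Editor's note] A sketch of running that comparison mode; the manifest name and the fields to edit are assumptions about the kbench repo layout, so verify them against its README:

```sh
# Assumed manifest name; check https://github.com/yasker/kbench for the
# current path and for which PVC/StorageClass fields to edit before applying.
curl -sLO https://raw.githubusercontent.com/yasker/kbench/main/deploy/fio-cmp.yaml
# Point the two PVCs at the storage classes to compare
# (e.g. local-path vs longhorn), then deploy and follow the logs:
kubectl apply -f fio-cmp.yaml
kubectl get pods               # find the benchmark pod
kubectl logs -f <kbench-pod>   # the comparison summary prints at the end
```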
@yasker Hi, here come the results of local-path vs Longhorn. Local-path can easily maintain 40K+ write IOPS, while Longhorn shows a huge drop.
What confuses me even more is that Longhorn achieves higher write bandwidth! I actually posted another result earlier that showed a similar outcome, but I then found out that Longhorn was running on a different model of SSD. So I deleted it and re-tested, making sure both local-path and Longhorn run on the same model of SSD.
Repeating the same test seems to just confirm the previous result.
@liyimeng The higher write bandwidth is probably just due to fluctuations in the test; increasing the test size might help. The latest result is consistent with what we've observed and is expected. The main reason for the IOPS drop is the latency increase, which comes from Longhorn adding additional layers on top of the native disks for the HA/snapshot mechanisms. It's very hard to achieve near-native performance in terms of IOPS or latency. Longhorn is already one of the fastest SDS out there (with a similar functionality set). We're working on a prototype engine that is able to, but it would likely need a couple of years to reach feature parity with the current mature Longhorn engine.
@yasker Thanks a lot! I understand latency is unfortunately unavoidable when more work needs to be done. Is it possible to increase the IOPS by introducing some kind of parallelism? I guess in the current implementation the underlying disk still has spare bandwidth. The Mayastor folks say they have borrowed some ideas from the NVMe implementation, which significantly improved IOPS. Is that something Longhorn can try? I know little about storage, but their numbers look tempting. BTW, jumbo frames don't seem to help, although they usually improve iSCSI.
@liyimeng IOs are already happening in parallel. The new prototype Longhorn is building is based on SPDK (which is also used by Mayastor and other storage vendors), which should provide near-native performance. However, it's going to be hard to further optimize the current Longhorn engine. You might be able to increase performance a bit more with a larger iSCSI queue depth, but that will result in more CPU consumption as well.
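[Editor's note] For context on that knob, a hedged sketch using open-iscsi's admin tool. Longhorn manages its own iSCSI sessions internally, so this illustrates the mechanism rather than a supported Longhorn tuning path; the target IQN is a placeholder:

```sh
# Session defaults typically live in /etc/iscsi/iscsid.conf, e.g.:
#   node.session.queue_depth = 32
#   node.session.cmds_max = 128
# Raising the queue depth on an existing node record (placeholder IQN):
iscsiadm -m node -T iqn.2019-10.io.longhorn:example-vol \
  -o update -n node.session.queue_depth -v 128
```

More outstanding commands can hide latency, but each in-flight I/O costs CPU on both the initiator and the replica side.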
@yasker I am still not getting the picture here. Take the example output above: the latency is about 600us, i.e. 0.6ms. For a single write thread, that is about 1,600 IOPS. But 32 threads did not end up with ~50K (1,600 x 32) IOPS; they ended up with 6K. Why does parallelism drop off that much, even when the writes are not contending for the same file?
There are some inefficiencies in the data path: contention for CPU, context switches, memory copies, the efficiency of the protocol, etc. In general, it won't be 1,600 x 32 even if you have 32 threads. Also, Longhorn uses 16 threads for each volume (instead of 32).
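[Editor's note] As a rough sanity check on those numbers, using the usual approximation IOPS ≈ outstanding I/Os ÷ average latency: with 16 threads and the 0.6 ms single-threaded latency quoted above, the ceiling would be about 16 ÷ 0.0006 ≈ 26.7K IOPS if latency stayed flat under load. Observing ~6K IOPS instead implies the effective per-I/O latency grew to roughly 16 ÷ 6,000 ≈ 2.7 ms at that concurrency, which is where the data-path overheads described here show up.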
Thanks for sharing the insight, @yasker! We gotta wait for the next-generation Longhorn engine then :D
Actually, there is a setting named `Disable Revision Counter`. Notice that it is a relatively dangerous setting in terms of HA; that's why we did not enable it by default. If you are interested, you can give it a quick try.
@shuo-wu I tried to disable the revision counter, but the UI says:
How can that be applied to a dynamically provisioned volume like the one kbench uses?
@liyimeng Maybe you can create such a volume with a PV/PVC in the UI first, then modify the bench deployment YAML so that it uses the existing Longhorn volume for testing. Something like the sketch below.
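[Editor's note] A sketch of that static PV/PVC pair, assuming a Longhorn volume named `bench-vol` was already created through the UI with the revision counter disabled; all names and sizes are placeholders:

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bench-vol-pv
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: driver.longhorn.io   # Longhorn CSI driver
    fsType: ext4
    volumeHandle: bench-vol      # must match the pre-created Longhorn volume name
  storageClassName: longhorn-static
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bench-vol-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: longhorn-static
  volumeName: bench-vol-pv
EOF
```

The kbench deployment could then be pointed at `bench-vol-pvc` instead of provisioning its own claim.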
Thanks, I will see if I get a chance to do so. Will report back if I make it work.
Can this be re-opened? It is time to look at Longhorn performance, I guess?
Another day, another test. The test results with my hardware:
TEST_FILE: /volume1/test
TEST_OUTPUT_PREFIX: Local-Path
TEST_SIZE: 39G
Benchmarking iops.fio into Local-Path-iops.json
Benchmarking bandwidth.fio into Local-Path-bandwidth.json
Benchmarking latency.fio into Local-Path-latency.json
TEST_FILE: /volume2/test
TEST_OUTPUT_PREFIX: Longhorn
TEST_SIZE: 39G
Benchmarking iops.fio into Longhorn-iops.json
Benchmarking bandwidth.fio into Longhorn-bandwidth.json
Benchmarking latency.fio into Longhorn-latency.json
================================
FIO Benchmark Comparison Summary
For: Local-Path vs Longhorn
CPU Idleness Profiling: disabled
Size: 39G
Quick Mode: disabled
================================
Local-Path vs Longhorn : Change
IOPS (Read/Write)
Random: 47,687 / 9,780 vs 6,935 / 258 : -85.46% / -97.36%
Sequential: 39,378 / 12,350 vs 10,995 / 578 : -72.08% / -95.32%
Bandwidth in KiB/sec (Read/Write)
Random: 1,075,840 / 250,161 vs 411,294 / 28,939 : -61.77% / -88.43%
Sequential: 1,111,645 / 319,936 vs 566,433 / 70,555 : -49.05% / -77.95%
Latency in ns (Read/Write)
Random: 488,575 / 2,559,169 vs 1,951,327 / 9,963,974 : 299.39% / 289.34%
Sequential: 451,978 / 2,287,137 vs 1,439,435 / 7,084,941 : 218.47% / 209.77%
So no change.
Hi @LarsBingBong, thanks for the benchmarking. In general, we don't recommend running Longhorn on top of another software-defined storage layer; the performance characteristics can be very hard to determine in that case. That said, I know many users run Longhorn on top of VMware vSAN, so we will look into that. cc @joshimoo
Thank you very much for chiming in, @yasker - much appreciated. I can imagine you have a very busy schedule. Okay, I wasn't aware that running Longhorn on top of another software-defined storage is not recommended, but I can see the potential issue there. Would love for that to work better performance-wise, so I'm happy the Longhorn team will give it a look. Much appreciated indeed!
The results were a bit better when I tried without the Cilium CNI, using Flannel instead. But still not the smoking gun. I'll stop my testing for now and wait for the Longhorn team to give this a look ;-) @joshimoo
@LarsBingBong I don't remember if the fio test runs with fsync. If not, vSAN might cheat by caching writes locally, while Longhorn, by its design nature, will never cache writes or reads locally. If you do want to see what Longhorn can truly achieve, better to go with raw disks. But again, in my raw-disk testing, Longhorn saw a significant performance drop. @yasker Good to see you are still around :D Looking forward to seeing Longhorn come with a new engine update.
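[Editor's note] A minimal sketch of a fio job that makes write caching below the filesystem much harder to hide; the path and size are placeholders:

```sh
# fsync=1 forces an fsync after every write, and direct=1 bypasses the
# page cache, so a lower layer that caches writes locally shows up as latency.
fio --name=sync-randwrite --filename=/mnt/test/fio.dat --size=10G \
  --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
  --direct=1 --fsync=1 --ioengine=libaio \
  --runtime=60 --time_based --group_reporting
```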
@joshimoo Just kindly asking whether you have planned to look at this? Thank you very much. The Longhorn product and all you do to make it prosper are greatly appreciated.
@LarsBingBong Thanks for all the tests. We are very much looking at general I/O performance enhancements. @keithalucas is currently working on the SPDK frontend, tracked at #3202. We are also looking at smaller optimizations to the existing longhorn-engine to lower I/O latency, which should improve overall I/O performance.
Hi, is there any estimate of when this will be released? Without it, HDDs are almost unusable.
@joshimoo Is this something coming shortly?
This is still a huge issue. Any updates?
It seems like @keithalucas left Rancher mid-year, and there hasn't been any work done on this since then. So I guess SPDK is on hold; is there anyone at Rancher still working on it?
@joshimoo It has been more than 3 years since the Longhorn team started looking into SPDK. Is it technically achievable within the Longhorn framework?
While setting up a new cluster I performed a bunch of performance tests. I gathered the information into a Google Sheet and am posting it here. All tests were performed via kbench; all nodes have a solid 1 GiB/s local network and the same SSD drives. The cluster is vanilla K3s with a single master and Flannel networking. Conclusions:
So, Longhorn is really feature-rich, easy-to-use, and stable software, but it comes at a very high performance cost. I believe and hope that someday this price will come down, and my database migrations (a large number of operations, very latency-dependent) on testing environments will take 2-3 times longer than native, not ~10x like now.
Hi, did you try the new data locality feature from the latest v1.4.0 release (the Strict Local option of the Data Locality setting)?
Hello! I'm on Longhorn 1.3.2 now; I will check the new release. Data locality was of course tested; there is a column for it in the sheet.
Yes @lictw - Strict Local is a new, v1.4.0-only feature. A sketch of a StorageClass using it is below.
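[Editor's note] A sketch of what such a StorageClass might look like, based on the parameters already shown in this thread plus the v1.4.0 strict-local value; treat the exact strings as assumptions to verify against the Longhorn docs:

```sh
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-strict-local
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"         # strict-local implies a single, node-local replica
  dataLocality: "strict-local"  # v1.4.0+ value; older releases only know disabled/best-effort
  staleReplicaTimeout: "30"
EOF
```

Note the trade-off: pinning the single replica to the workload's node removes the network hop but also removes HA.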
According to the benchmark in #3957, it is still far from ideal.
Why do I get an IOPS drop of 5-6x compared to local-path in my tests, even with 1 replica and 1 node (so best-effort), when in your tests it's just 2x? What could the reason be? It's a performance node with 2x AMD EPYC 7282 (64 cores in total) with no other load during the tests, and RAID 1 SSDs with LVM under /var/lib/longhorn; I can't understand what's wrong. K3s version is 1.21.14 - can that matter?
@lictw I suggest providing your steps, scripts, and disk model. Thank you.
I used KBench with a 10GiB file. The SC was:

parameters:
  dataLocality: disabled
  fstype: xfs
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"

Steps:

1. Install K3s v1.21.14 (single node).
2. Install Longhorn v1.3.2 (CSI components single replica).
3. Start testing.

Disks are 2x INTEL SSDSC2KB96 960GB, RAID 1 with LVM, with an LV for /var/lib/longhorn. Thanks in advance.
Can you provide the kbench results of the local-path provisioner and Longhorn? In addition to xfs, you can test ext4 as well. The results we provided are on an ext4 filesystem.
local-path is the first row (the base) in the table; all other results are compared to it. I will test ext4; I use xfs because of the lost+found directory.
I'm seeing similar performance. I've used kbench to test 2 combinations:
My nodes have 1Gbit networking, so that should be a bottleneck in the Longhorn case. However, it shouldn't be in the
I am wondering what could be going on. I would expect
Question
I ran a fio test with Longhorn 1.1.2, and the result is kind of disappointing. On the native SSD, 4K random write could reach IOPS=42.5k, BW=166MiB/s, while on Longhorn I only get about IOPS=6k, BW=16MiB/s.
Is this the expected Longhorn performance?
Environment:
Additional context
SSD test (not a raw disk; already formatted as ext4 and mounted)
Longhorn with 3 replicas, tested directly against the Longhorn volume
Longhorn with 1 replica with data locality (to exclude potential negative network impact)
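[Editor's note] The original job file isn't shown above, but from the description elsewhere in the thread (4K blocks, 16 jobs) it was presumably something along these lines; the path, size, and queue depth here are assumptions:

```sh
fio --name=randwrite-4k --filename=/mnt/longhorn-vol/fio.dat --size=10G \
  --rw=randwrite --bs=4k --iodepth=16 --numjobs=16 \
  --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting
```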