UDP Performance Issues #1772
Comments
Congratulations on a nice project :) I've been using your service to run some of your benchmarks myself (by taking some shortcuts and tweaking some numbers), and the bottleneck seems to be that all the requests are being printed to serial. This is very expensive, since writing a char to the serial port causes a VM exit. With printing enabled I'm reaching roughly the same values as you (loss at around 900 QPS). With printing disabled I'm still running the benchmark and just passed 2600 QPS without loss. Here is a nifty snippet that we use in the kernel to enable/disable debugging output.
You can add this to your service and replace all your |
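The kernel snippet mentioned in this comment is not shown in the thread. A toggle of this kind could look roughly like the sketch below. It is not the IncludeOS macro: `SERVICE_DEBUG`, `SERVICE_LOG` and `on_query` are hypothetical names chosen for illustration, and the sketch assumes the service's output goes through `printf`.

```cpp
#include <cstdio>

// Hypothetical compile-time switch: build with -DSERVICE_DEBUG=1 to get output.
#ifndef SERVICE_DEBUG
#define SERVICE_DEBUG 0
#endif

// With SERVICE_DEBUG set to 0 the condition is a constant false, so the
// compiler can drop the printf call as dead code while still type-checking
// its arguments. Nothing is written to the serial port in that case.
#define SERVICE_LOG(...)                               \
  do {                                                 \
    if (SERVICE_DEBUG) std::printf(__VA_ARGS__);       \
  } while (0)

// Hypothetical per-request handler standing in for the service's UDP callback.
static unsigned long handled_queries = 0;

unsigned long on_query(unsigned query_id) {
  SERVICE_LOG("query %u received\n", query_id);  // silent unless SERVICE_DEBUG=1
  return ++handled_queries;                      // the work itself is unaffected
}
```

Routing every per-request print through one macro like this means the benchmark build and the demo build differ by a single compiler flag.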
At about 21 500 QPS the throughput starts to fall off when running emulated with qemu on my Mac, using the
|
I also ran some tests on a Linux machine (@fwsGonzo can fill in the specs) which is not emulated. With virtionet I reach around 150 000 QPS.
|
Wow, I would never have guessed the serial output would be the root cause of this. I originally set it up in order to present the functionality to my professors. I probably should have run the test with a version without serial output, one shortcut too many on my part. :/ Thanks for the feedback, I'll be sure to update the report with this information to reflect that the error came from my lack of knowledge and my benchmarking process rather than from IncludeOS. :) Out of curiosity, concerning the various QPS results, what do you reckon are the main bottlenecks behind these numbers? Is it the interaction with the hypervisors (which would explain the increase on bare metal)? |
Hehe, well, it's not obvious that print is expensive. Please keep us updated with any new results/comparisons you find! :) I'm not totally sure what you mean by the various results. If you mean comparing Test 6 and Test 7 in my post above, I don't actually know. |
I was mostly referring to the difference between the emulated and non-emulated results. I'm assuming that by emulated you mean virtualized, so non-emulated would be bare metal. My curiosity lies in why there is such a gap in performance between emulated and non-emulated, and what the root cause of it would be. Furthermore, have you performed any benchmarks on other hypervisors like ESXi and OpenStack, or even on cloud platforms? |
@GaetanLongree in both cases we're talking about virtual machines controlled by Qemu. By "emulated" we mean Qemu running without hardware acceleration, i.e. without hardware-assisted virtualization, which has to be supported both by the CPU and by the host kernel. On Linux we enable hardware virtualization using the |
There is some good discussion here to explain emulation vs virtualization and why there is a performance gap: https://stackoverflow.com/questions/6044978/full-emulation-vs-full-virtualization |
As I mentioned a while back on Gitter, during an internship project I attempted to test the UDP performance of unikernels by developing a very (probably extremely) simplistic DNS server using the UDP sockets in IncludeOS (code for the DNS server here).
Due to my command line only environment, I used DNSPerf as a benchmarking tool to perform queries from a benchmarking server (100 simulated clients over 4 threads) to another server hosting the service. The tests were performed over 5-minute periods, starting at 100 queries per second then increasing until service failure.
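The ramp-up procedure described above (fixed-length steps at increasing query rates until the service starts failing) boils down to finding the last step that completed without significant loss. A minimal sketch of that bookkeeping follows. The `StepResult` fields and the loss threshold are hypothetical and are not taken from DNSPerf's output format or from the benchmark scripts.

```cpp
#include <vector>

// One benchmark step. Field names are hypothetical, for illustration only.
struct StepResult {
  double target_qps;        // offered load during this step
  unsigned long sent;       // queries sent by the load generator
  unsigned long completed;  // queries that received a response
};

// Fraction of queries in a step that got no response.
double loss_ratio(const StepResult& s) {
  if (s.sent == 0 || s.completed >= s.sent) return 0.0;
  return static_cast<double>(s.sent - s.completed) /
         static_cast<double>(s.sent);
}

// Highest offered load reached before the first step whose loss exceeds
// max_loss. Returns 0.0 if even the first step is too lossy.
double max_sustained_qps(const std::vector<StepResult>& steps,
                         double max_loss) {
  double best = 0.0;
  for (const auto& s : steps) {
    if (loss_ratio(s) > max_loss) break;  // service started dropping queries
    best = s.target_qps;
  }
  return best;
}
```

Reporting the loss ratio per step, rather than only the point of failure, makes it easier to tell a hard limit from gradual degradation.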
My results, partially posted here with full data here, showed that the service refused to process queries at a throughput higher than approximately 850 queries per second.
I doubt the issue is related to the code, as the same code used with containers and tested in the same manner proved capable of processing higher throughput (results of my benchmark here).
This benchmark should be reproducible, as I've written scripts to both deploy the unikernel and launch the benchmark, which are available in the project's repository. Do note that I performed my tests on Ubuntu 16.04 using KVM/QEMU (as should be documented in the project repo).