Slower networking of OSv on firecracker vs QEMU/KVM #1034

Open
wkozaczuk opened this issue Mar 24, 2019 · 5 comments

@wkozaczuk
commented Mar 24, 2019

Before I dive into the details of my performance test results, I would like to take this occasion to announce on this forum that firecracker is officially and fully supported by the OSv unikernel as of the latest 0.53.0 release (nickname "Firebird", for details please read here). It can boot in as little as 5ms per bootchart and 10ms per the firecracker guest boot time measurement. Maybe it is worth mentioning on https://firecracker-microvm.github.io/ (section "What operating systems are supported by Firecracker?") that besides Linux, OSv can boot on firecracker as well ;-) ? As far as I am aware, OSv is the only unikernel, and possibly the only other OS besides Linux, that can claim this at this point in time.

As far as the performance comparison between OSv running on firecracker vs QEMU/KVM goes, first I must say that in at least one aspect firecracker beats QEMU: file I/O. I have not done any elaborate file I/O tests, but for example mounting a ZFS filesystem is about 4 times faster on firecracker - on average 60ms on firecracker vs 260ms on QEMU.
Now as far as networking goes, OSv performs worse on firecracker than on QEMU: it reaches between 50% and 90% of the QEMU performance in terms of requests per second, depending mostly on the number of vCPUs and the type of application I used to test.

My tests focused on the number of REST API requests handled per second by a typical microservice app implemented in Rust (using hyper), Golang, and Java (using vertx.io). Each app in essence implements a simple todo REST API returning a JSON payload 100-200 characters long.

The test setup looked like this:

Host:

  • MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 CPUs reported by lscpu) and 16GB of RAM, running Ubuntu 18.10
  • firecracker 0.15.0
  • QEMU 2.12.0

Client machine:

The host and client machines were connected directly to a 1 Gbit Ethernet switch, and the host exposed the guest IP using a bridged TAP NIC.

Here is a list of pure req/sec results:

Go 1 CPU - FC & QEMU
-------------------
Requests/sec:  16422.33
Requests/sec:  16540.24
Requests/sec:  16721.56
-------------------
Requests/sec:  23300.26
Requests/sec:  23874.74
Requests/sec:  24313.06

Go 2 CPU - FC & QEMU
-------------------
Requests/sec:  26676.68
Requests/sec:  28100.00
Requests/sec:  28538.35
-------------------
Requests/sec:  33581.87
Requests/sec:  35475.22
Requests/sec:  37089.26


Rust 1 CPU - FC & QEMU
-------------------
Requests/sec:  23379.86
Requests/sec:  23477.19
Requests/sec:  23604.27
-------------------
Requests/sec:  41100.07
Requests/sec:  43455.34
Requests/sec:  43927.73

Rust 2 CPU - FC & QEMU
-------------------
Requests/sec:  46128.15
Requests/sec:  46590.41
Requests/sec:  46973.84
-------------------
Requests/sec:  48076.98
Requests/sec:  49120.31
Requests/sec:  49298.28


Java 1 CPU - FC & QEMU
-------------------
Requests/sec:  20191.95
Requests/sec:  21384.60
Requests/sec:  21705.82
-------------------
Requests/sec:  41049.41
Requests/sec:  43622.81
Requests/sec:  44777.60

Java 2 CPU - FC & QEMU
-------------------
Requests/sec:  40625.69
Requests/sec:  40876.17
Requests/sec:  43766.45
-------------------
Requests/sec:  45746.48
Requests/sec:  46224.42
Requests/sec:  46245.95

For more detailed results please see the files where I captured full output from wrk - https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote/OSv_firecracker and https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote/OSv_qemu.
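To make the "50-90%" figure easy to check against the raw numbers, here is a small sketch that computes the firecracker-to-QEMU ratio from the median of each group of three runs. It assumes that in every pair above the first group is firecracker (FC) and the second is QEMU, following the "FC & QEMU" order of the headings; the function name `fc_to_qemu_ratio` and the choice of the median are illustrative, not part of the original test scripts.

```python
from statistics import median

# Requests/sec copied from the results above.
# Assumption: in each pair the first group of runs is firecracker (FC)
# and the second group is QEMU, matching the "FC & QEMU" heading order.
RESULTS = {
    "Go 1 CPU": ([16422.33, 16540.24, 16721.56], [23300.26, 23874.74, 24313.06]),
    "Go 2 CPU": ([26676.68, 28100.00, 28538.35], [33581.87, 35475.22, 37089.26]),
    "Rust 1 CPU": ([23379.86, 23477.19, 23604.27], [41100.07, 43455.34, 43927.73]),
    "Rust 2 CPU": ([46128.15, 46590.41, 46973.84], [48076.98, 49120.31, 49298.28]),
    "Java 1 CPU": ([20191.95, 21384.60, 21705.82], [41049.41, 43622.81, 44777.60]),
    "Java 2 CPU": ([40625.69, 40876.17, 43766.45], [45746.48, 46224.42, 46245.95]),
}


def fc_to_qemu_ratio(fc_runs, qemu_runs):
    """Return median FC throughput as a fraction of median QEMU throughput."""
    return median(fc_runs) / median(qemu_runs)


if __name__ == "__main__":
    for name, (fc_runs, qemu_runs) in RESULTS.items():
        ratio = fc_to_qemu_ratio(fc_runs, qemu_runs)
        print(f"{name}: FC reaches {ratio:.0%} of QEMU")
```

Under that assumption, the single-vCPU Java and Rust runs sit at the bottom of the range and the two-vCPU Rust run at the top, so the gap narrows noticeably once a second vCPU is added.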

Would you have any insight into what might be the reason for the relatively slower performance on firecracker? I think I have disabled rate limiting, which is what this script is meant to do - https://github.com/cloudius-systems/osv/blob/master/scripts/firecracker.py#L23-L97. It could also be that the virtio-mmio implementation on the OSv side is not very well optimized - with QEMU, OSv uses virtio-pci.

Any help will be greatly appreciated.

@andreeaflorescu

Member

commented Mar 24, 2019

If you do not want to have a rate limiter, you can just write:

    def add_network_interface(self, interface_name, host_interface_name):
        # No rate limiter keys are sent at all, since the field is optional.
        self.make_put_call('/network-interfaces/%s' % interface_name, {
            'iface_id': interface_name,
            'host_dev_name': host_interface_name,
            'guest_mac': "52:54:00:12:34:56"
        })

because the rate_limiter is an optional field. I am not sure what the effects are of setting every field of the rate_limiter to 0.
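As a minimal sketch of the "optional field" idea, the helper below builds the request body and only includes limiter settings when the caller passes them, so nothing zero-valued is ever sent. The function name `network_interface_payload` is hypothetical, and the keys `rx_rate_limiter` and `tx_rate_limiter` are my assumption about how the limiter fields are named in the API, so please check them against the API specification of the Firecracker version in use.

```python
def network_interface_payload(iface_id, host_dev_name, guest_mac,
                              rx_rate_limiter=None, tx_rate_limiter=None):
    """Build the body for PUT /network-interfaces/<iface_id>.

    Rate limiter keys are left out entirely unless a value is supplied,
    rather than being sent with every field set to 0.
    """
    payload = {
        "iface_id": iface_id,
        "host_dev_name": host_dev_name,
        "guest_mac": guest_mac,
    }
    # Assumed key names; omitted when no limiter is requested.
    if rx_rate_limiter is not None:
        payload["rx_rate_limiter"] = rx_rate_limiter
    if tx_rate_limiter is not None:
        payload["tx_rate_limiter"] = tx_rate_limiter
    return payload


if __name__ == "__main__":
    # "fc_tap0" is a placeholder host TAP device name.
    print(network_interface_payload("eth0", "fc_tap0", "52:54:00:12:34:56"))
```

The resulting dictionary can be handed to make_put_call in place of the inline literal, which keeps the "no limiter" and "with limiter" cases in one code path.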

We will get back to you after we get a chance to investigate this.

@raduweiss

Contributor

commented Mar 25, 2019

@wkozaczuk, first of all I'll say that the folks in the Firecracker maintainer team have seen (and several of us are really excited by) OSv running with Firecracker. Your start-up times are awesome! Frankly, I think we're a bit behind on recognizing Firecracker integrations with other projects, and we will be working on making this better.

We also appreciate the in-depth issue descriptions that have helped us make Firecracker better.

The website (and maybe a docs page in the repo) is one place to showcase this, but I'd also like to write about our current integrations in something like a blog post. If that's all right with you, we'll get in touch once we have a clearer idea, probably in a couple of weeks.

Regarding IO, I'm not surprised by the results. Rate limiting aside, we simply didn't spend much time on IO optimization (especially disk), since it wasn't a priority for our current users/customers. While IO is definitely something we want to improve on, prioritizing it will depend on user/customer demand (unless someone contributes it 🙂). Here I mean users/customers in a sense that includes everyone using Firecracker. So if your group has a specific use case where you're IO-bottlenecked, let us know.

@wkozaczuk

Author

commented Mar 26, 2019

@raduweiss I am very much open to collaborating on a blog post.

I do not think that we have any specific use case in mind. I was curious myself to compare how OSv fares on firecracker vs QEMU. I wonder if the slowness on the networking side is caused by more frequent exits to the host compared with QEMU. It would be nice to do a similar comparison with Linux.

@wkozaczuk

Author

commented Apr 23, 2019

@raduweiss I have recently published an article on the OSv blog about what it took to enhance OSv to make it boot on firecracker.

@andreeaflorescu

Member

commented Apr 23, 2019

@wkozaczuk Nice article! Congrats!
