Merge pull request #575 from egernst/CLH-docs
arch: add virtualization specific document
jcvenegas committed Dec 9, 2019
2 parents 1fd8ac6 + 952c98d commit 851db27
Showing 5 changed files with 143 additions and 54 deletions.
Binary file added design/arch-images/api-to-construct.png
Binary file added design/arch-images/construct-to-vm-concept.png
Binary file added design/arch-images/vm-concept-to-tech.png
68 changes: 14 additions & 54 deletions design/architecture.md
@@ -1,11 +1,13 @@
# Kata Containers Architecture


* [Overview](#overview)
* [Hypervisor](#hypervisor)
* [Assets](#assets)
* [Virtualization](#virtualization)
* [Guest assets](#guest-assets)
* [Guest kernel](#guest-kernel)
* [Root filesystem image](#root-filesystem-image)
* [Initrd image](#initrd-image)
* [Guest Image](#guest-image)
* [Root filesystem image](#root-filesystem-image)
* [Initrd image](#initrd-image)
* [Agent](#agent)
* [Runtime](#runtime)
* [Configuration](#configuration)
@@ -101,71 +103,29 @@ configured, `virtio-scsi` will be used. In all other cases a 9pfs VIRTIO mount p
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.

## Hypervisor

Kata Containers is designed to support multiple virtual machine monitors (VMMs) and hypervisors.

As of the 1.9 release, Kata Containers supports [QEMU](http://www.qemu-project.org/)/[KVM](http://www.linux-kvm.org/page/Main_Page),
[Firecracker](https://github.com/firecracker-microvm/firecracker)/KVM, as well as the [ACRN hypervisor](https://projectacrn.org/).

### QEMU/KVM

Depending on the host architecture, Kata Containers supports various machine types,
for example `pc` and `q35` on x86 systems, `virt` on ARM systems and `pseries` on IBM Power systems. The default Kata Containers
machine type is `pc`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](#configuration) file.

The following QEMU features are used in Kata Containers to manage resource constraints, improve
boot time and reduce memory footprint:

- Machine accelerators.
- Hot plug devices.

Each feature is documented below.

#### Machine accelerators

Machine accelerators are architecture specific and can be used to improve the performance
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:

- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.

#### Hot plug devices

The Kata Containers VM starts with a minimum amount of resources, allowing for faster boot time and a reduction in memory footprint. As the container launch progresses, devices are hotplugged to the VM. For example, when a CPU constraint is specified which includes additional CPUs, they can be hot added. Kata Containers has support for hot-adding the following devices:
- Virtio block
- Virtio SCSI
- VFIO
- CPU

### Firecracker/KVM
## Virtualization

As of the 1.5 release of Kata Containers, Firecracker VMM is supported. Because of its limited
device support, Firecracker does not support filesystem sharing (good for security and footprint!) As a result,
only block-based storage drivers are supported. Similarly, Firecracker does not support updating
container resources after boot (there is not any device hotplug support), nor does it support VFIO.
How Kata Containers maps container concepts to virtual machine technologies, and how this is realized in the multiple
hypervisors and VMMs that Kata supports, is described in the [virtualization documentation](./virtualization.md).

### Assets
## Guest assets

The hypervisor will launch a virtual machine which includes a minimal guest kernel
and a guest image.

#### Guest kernel
### Guest kernel

The guest kernel is passed to the hypervisor and used to boot the virtual
machine. The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those services
required by a container workload. This is based on a very current upstream Linux
kernel.

#### Guest image
### Guest image

Kata Containers supports both an `initrd` and `rootfs` based minimal guest image.

##### Root filesystem image
#### Root filesystem image

The default packaged root filesystem image, sometimes referred to as the "mini O/S", is a
highly optimized container bootstrap system based on [Clear Linux](https://clearlinux.org/). It provides an extremely minimal environment and
@@ -187,7 +147,7 @@ For example, when `docker run -ti ubuntu date` is run:
new context, first setting the root filesystem to the expected Ubuntu\* root
filesystem.

##### Initrd image
#### Initrd image

A compressed `cpio(1)` archive, created from a rootfs which is loaded into memory and used as part of the Linux startup process. During startup, the kernel unpacks it into a special instance of a `tmpfs` that becomes the initial root filesystem.

129 changes: 129 additions & 0 deletions design/virtualization.md
@@ -0,0 +1,129 @@
# Virtualization in Kata Containers

- [Virtualization in Kata Containers](#virtualization-in-kata-containers)
- [Mapping container concepts to virtual machine technologies](#mapping-container-concepts-to-virtual-machine-technologies)
- [Kata Containers Hypervisor and VMM support](#kata-containers-hypervisor-and-vmm-support)
- [QEMU/KVM](#qemukvm)
- [Machine accelerators](#machine-accelerators)
- [Hotplug devices](#hotplug-devices)
- [Firecracker/KVM](#firecrackerkvm)
- [Cloud Hypervisor/KVM](#cloud-hypervisorkvm)
- [Summary](#summary)


In Kata Containers, a second layer of isolation is created on top of the isolation provided by traditional namespace-based containers. The
hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight virtual machine
and uses the guest’s Linux kernel to create a container workload, or workloads in the case of multi-container pods. In Kubernetes
and in the Kata implementation, sandboxing is carried out at the pod level. In Kata, this sandbox is created using a virtual machine.

This document describes how Kata Containers maps container concepts to virtual machine technologies, and how this is realized in
the multiple hypervisors and virtual machine monitors that Kata supports.

## Mapping container concepts to virtual machine technologies

A typical deployment of Kata Containers will be in Kubernetes by way of a Container Runtime Interface (CRI) implementation. On every node,
Kubelet will interact with a CRI implementor (such as containerd or CRI-O), which will in turn interface with Kata Containers (an OCI-based runtime).

The CRI API, as defined at the [Kubernetes CRI-API repo](https://github.com/kubernetes/cri-api/), implies a few constructs that must be supported by the
CRI implementation, and ultimately by Kata Containers. To support the full [API](https://github.com/kubernetes/cri-api/blob/a6f63f369f6d50e9d0886f2eda63d585fbd1ab6a/pkg/apis/runtime/v1alpha2/api.proto#L34-L110) with the CRI implementor, Kata must provide the following constructs:

![API to construct](./arch-images/api-to-construct.png)

These constructs can then be further mapped to the devices necessary for interfacing with the virtual machine:

![construct to VM concept](./arch-images/construct-to-vm-concept.png)

Ultimately, these concepts map to specific para-virtualized devices or virtualization technologies.

![VM concept to underlying technology](./arch-images/vm-concept-to-tech.png)

Each hypervisor or VMM varies in how, or whether, it handles each of these.

## Kata Containers Hypervisor and VMM support

Kata Containers is designed to support multiple virtual machine monitors (VMMs) and hypervisors.
Kata Containers supports:
- [ACRN hypervisor](https://projectacrn.org/)
- [Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor)/[KVM](https://www.linux-kvm.org/page/Main_Page)
- [Firecracker](https://github.com/firecracker-microvm/firecracker)/KVM
- [QEMU](http://www.qemu-project.org/)/KVM

Which configuration to use will depend on the end user's requirements. Details of each solution and a summary are provided below.

### QEMU/KVM

Kata Containers with QEMU has complete compatibility with Kubernetes.

Depending on the host architecture, Kata Containers supports various machine types,
for example `pc` and `q35` on x86 systems, `virt` on ARM systems, and `pseries` on IBM Power systems. The default Kata Containers
machine type is `pc`. The machine type and its [`Machine accelerators`](#machine-accelerators) can
be changed by editing the runtime [`configuration`](./architecture.md#configuration) file.

Devices and features used:
- virtio VSOCK or virtio serial
- virtio block or virtio SCSI
- virtio net
- virtio fs or virtio 9p (recommended: virtio fs)
- VFIO
- hotplug
- machine accelerators

Machine accelerators and hotplug are used in Kata Containers to manage resource constraints, improve boot time and reduce memory footprint. These are documented below.

#### Machine accelerators

Machine accelerators are architecture specific and can be used to improve the performance
and enable specific features of the machine types. The following machine accelerators
are used in Kata Containers:

- NVDIMM: This machine accelerator is x86 specific and only supported by `pc` and
`q35` machine types. `nvdimm` is used to provide the root filesystem as a persistent
memory device to the Virtual Machine.

#### Hotplug devices

The Kata Containers VM starts with a minimum amount of resources, allowing for faster boot time and a reduction in memory footprint. As the container launch progresses,
devices are hotplugged to the VM. For example, when a CPU constraint is specified which includes additional CPUs, they can be hot added. Kata Containers has support
for hot-adding the following devices:
- Virtio block
- Virtio SCSI
- VFIO
- CPU
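
The CPU case described above can be sketched as follows. This is a minimal illustration, not the Kata runtime's implementation: the function name `vcpus_to_hotplug`, the default of one boot vCPU, the `max_vcpus` cap, and the rounding of a fractional CPU limit up to whole vCPUs are all assumptions made for the example.

```python
import math


def vcpus_to_hotplug(cpu_limit: float, boot_vcpus: int = 1, max_vcpus: int = 8) -> int:
    """Return how many vCPUs to hot-add so the VM can honour cpu_limit.

    All parameters are hypothetical: the VM is assumed to boot with
    boot_vcpus and to be capped at max_vcpus.
    """
    if cpu_limit <= 0:
        # No CPU constraint requested: keep the minimal boot configuration.
        return 0
    # A fractional limit (for example 2.5 CPUs) needs the next whole vCPU.
    wanted = min(math.ceil(cpu_limit), max_vcpus)
    # Only the vCPUs beyond those present at boot have to be hot-added.
    return max(0, wanted - boot_vcpus)
```

Under these assumptions, a container limited to 2.5 CPUs in a VM booted with one vCPU would need two more vCPUs hot-added, while a limit of 0.5 CPUs would need none.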

### Firecracker/KVM

Firecracker, built on many Rust crates from [rust-VMM](https://github.com/rust-vmm), has a very limited device model, providing a lighter
footprint and attack surface, and focuses on function-as-a-service-like use cases. As a result, Kata Containers with Firecracker VMM supports a subset of the CRI API.
Firecracker does not support file-system sharing, so only block-based storage drivers are supported. Firecracker also supports neither device
hotplug nor VFIO. As a result, Kata Containers with Firecracker VMM does not support updating container resources after boot, nor
does it support device passthrough.

Devices used:
- virtio VSOCK
- virtio block
- virtio net

### Cloud Hypervisor/KVM

Cloud Hypervisor, based on [rust-VMM](https://github.com/rust-vmm), is designed to have a lighter footprint and attack surface. For Kata Containers,
relative to Firecracker, the Cloud Hypervisor configuration provides better compatibility at the expense of exposing additional devices: file system
sharing and direct device assignment. As of the 1.10 release of Kata Containers, Cloud Hypervisor does not support device hotplug, and as a result
Kata with Cloud Hypervisor does not support updating container resources after boot or using block-based volumes. While Cloud Hypervisor does support VFIO, Kata is still adding
this support, so direct device assignment is also unavailable as of 1.10. See the [Cloud Hypervisor device support documentation](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/docs/device_model.md)
for more details on Cloud Hypervisor.

Devices used:
- virtio VSOCK
- virtio block
- virtio net
- virtio fs

### Summary

| Solution | Release introduced | Brief summary |
|-|-|-|
| QEMU | 1.0 | upstream QEMU, with support for hotplug and filesystem sharing |
| NEMU | 1.4 | Deprecated, removed as of 1.10 release. Slimmed down fork of QEMU, with experimental support of virtio-fs |
| Firecracker | 1.5 | upstream Firecracker, rust-VMM based, no VFIO, no FS sharing, no memory/CPU hotplug |
| QEMU-virtio-fs | 1.7 | upstream QEMU with support for virtio-fs. Will be removed once virtio-fs lands in upstream QEMU |
| Cloud Hypervisor | 1.10 | rust-VMM based, includes VFIO and FS sharing through virtio-fs, no hotplug |
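
The trade-offs summarized above can be encoded as a small lookup to help choose a VMM for a given requirement. This is only a sketch: the `VmmSupport` type, the `SUPPORT` table, and the `candidates` function are hypothetical names, the feature flags are read from the prose in this document as of the 1.10 release, and ACRN is left out because its feature set is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmmSupport:
    """Features available to Kata with a given VMM, as described in this document."""
    fs_sharing: bool  # shared filesystem (virtio fs or virtio 9p)
    hotplug: bool     # CPU or device hotplug after boot
    vfio: bool        # direct device assignment usable from Kata


# Hypothetical encoding of the prose; not an API of Kata Containers.
SUPPORT = {
    "QEMU": VmmSupport(fs_sharing=True, hotplug=True, vfio=True),
    "Firecracker": VmmSupport(fs_sharing=False, hotplug=False, vfio=False),
    # Cloud Hypervisor itself supports VFIO, but Kata 1.10 does not use it yet.
    "Cloud Hypervisor": VmmSupport(fs_sharing=True, hotplug=False, vfio=False),
}


def candidates(need_fs_sharing=False, need_hotplug=False, need_vfio=False):
    """Return the sorted names of VMMs that satisfy every requested feature."""
    return sorted(
        name
        for name, s in SUPPORT.items()
        if (s.fs_sharing or not need_fs_sharing)
        and (s.hotplug or not need_hotplug)
        and (s.vfio or not need_vfio)
    )
```

For example, `candidates(need_fs_sharing=True)` would return Cloud Hypervisor and QEMU, while `candidates(need_hotplug=True)` would leave only QEMU.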
