kata-deploy: add support for deploying Kata on K8S
A Dockerfile is created and reference daemonsets are also
provided for deploying Kata Containers onto a running Kubernetes
cluster. A few daemonsets are introduced:

1) runtime-labeler: This daemonset will create a label on each node in
the cluster identifying the CRI shim in use. For example,
container-runtime=crio or container-runtime=containerd.

2) crio and containerd kata installer: Assuming either CRI-O or
containerd is the CRI runtime on the node (determined from the label
created in (1)), either the crio or containerd variant will execute. These daemonsets
will install the VM artifacts and host binaries required for using
Kata Containers. Once installed, the daemonset will add a node label kata-runtime=true
and reconfigure either crio or containerd to make use of Kata for untrusted workloads.
As a final step it will restart the CRI shim and kubelet. Upon deletion,
the daemonset will remove the kata binaries and VM artifacts and update
the label to kata-runtime=cleanup.

3) crio and containerd cleanup: Either of these two daemonsets will run,
depending on the container-runtime label value, and only if the node has the
label kata-runtime=cleanup. This daemonset simply restarts crio/containerd as
well as kubelet. This was not feasible in a preStopHook, hence the
separate cleanup step.

An RBAC is created to allow the daemonsets to modify labels on the node.

To deploy kata:
kubectl apply -f kata-rbac.yaml
kubectl apply -f kata-deploy.yaml

To remove kata:
kubectl delete -f kata-deploy.yaml
kubectl apply -f kata-cleanup.yaml
kubectl delete -f kata-cleanup.yaml
kubectl delete -f kata-rbac.yaml

This initial commit is based on contributions by a few folks on
github.com/egernst/kata-deploy

Also-by: Saikrishna Edupuganti <saikrishna.edupuganti@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Signed-off-by: Ricardo Aravena <raravena@branch.io>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Eric Ernst committed Jul 10, 2018
1 parent 540d3a2 commit e642e32
Showing 18 changed files with 625 additions and 1 deletion.
7 changes: 6 additions & 1 deletion README.md
@@ -1,3 +1,8 @@
# Kata Containers packaging

This repository is used to generate packages for Kata Containers components.
Kata Containers currently supports packages for many distributions. Tooling to aid in creating these
packages is contained within this repository.

In addition, Kata build artifacts are available within a container image, created by a
[Dockerfile](kata-deploy/Dockerfile). Reference daemonsets are provided in [kata-deploy](kata-deploy),
which make installation of Kata Containers in a running Kubernetes cluster straightforward.
21 changes: 21 additions & 0 deletions kata-deploy/Dockerfile
@@ -0,0 +1,21 @@
FROM centos/systemd
ARG KATA_VER=1.0.0
ARG KATA_URL=https://github.com/kata-containers/runtime/releases/download/${KATA_VER}

RUN yum install -y wget
WORKDIR /tmp/kata/
RUN wget -q ${KATA_URL}/{vmlinuz.container,kata-containers.img}

WORKDIR /tmp/kata/bin/
RUN wget -q ${KATA_URL}/{kata-runtime,kata-proxy,kata-shim}

ARG KUBECTL_VER=v1.10.2
RUN wget -qO /bin/kubectl https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VER}/bin/linux/amd64/kubectl && \
    chmod +x /bin/kubectl

COPY bin /tmp/kata/bin
COPY qemu-artifacts /tmp/kata/share/qemu

COPY configuration.toml /tmp/kata/
COPY scripts /tmp/kata/scripts

131 changes: 131 additions & 0 deletions kata-deploy/README.md
@@ -0,0 +1,131 @@
# kata-deploy


- [kata-deploy](#kata-deploy)
  * [Quick start](#quick-start-)
    + [Install Kata on a running Kubernetes cluster](#install-kata-on-a-running-kubernetes-cluster)
    + [Run a sample workload](#run-a-sample-workload-)
    + [Remove Kata from the Kubernetes cluster](#remove-kata-from-the-kubernetes-cluster-)
  * [kata-deploy details](#kata-deploy-details)
    + [Dockerfile](#dockerfile)
    + [Daemonsets and RBAC](#daemonsets-and-rbac-)
      - [runtime-labeler](#runtime-labeler-)
      - [CRI-O and containerd kata installer](#cri-o-and-containerd-kata-installer-)
    + [Kata cleanup](#kata-cleanup-)


[kata-deploy](kata-deploy) provides a Dockerfile which contains all of the binaries
and artifacts required to run Kata Containers, as well as reference daemonsets which can be utilized to install Kata Containers on a running Kubernetes cluster.

Note: installation through the daemonsets only successfully installs `kata-containers.io/kata-runtime` on
a node if that node uses either the containerd or CRI-O CRI shim.
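
Before deploying, you can check which CRI shim each node reports. The command below is only a convenience check; the sample output is illustrative:

```
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion
NAME     RUNTIME
node-1   containerd://1.1.0
node-2   cri-o://1.10.1
```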

## Quick start:

### Install Kata on a running Kubernetes cluster

```
kubectl apply -f kata-rbac.yaml
kubectl apply -f kata-deploy.yaml
```
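
Once applied, you can watch the daemonsets roll out and confirm that nodes get labeled. This assumes the reference daemonsets are deployed into the `kube-system` namespace, as the cleanup daemonset in this repository is:

```
kubectl -n kube-system get daemonsets
kubectl get nodes --show-labels | grep kata-containers.io
```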

### Run a sample workload

Untrusted workloads can node-select based on `kata-containers.io/kata-runtime=true`, and are
run through `kata-containers.io/kata-runtime` if they are marked with the appropriate CRI-O or containerd
annotation:
```
CRIO: io.kubernetes.cri-o.TrustedSandbox: "false"
containerd: io.kubernetes.cri.untrusted-workload: "true"
```

The following is a sample workload for running an untrusted workload on a Kata-enabled node:
```
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.kubernetes.cri-o.TrustedSandbox: "false"
    io.kubernetes.cri.untrusted-workload: "true"
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kata-containers.io/kata-runtime: "true"
```

To run:
```
kubectl apply -f examples/nginx-untrusted.yaml
```

Now, you should see the pod start. You can verify that the pod is making use of
`kata-containers.io/kata-runtime` by comparing the container ID observed with the following:
```
/opt/kata/bin/kata-containers.io/kata-runtime list
kubectl describe pod nginx-untrusted
```
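
For example, one way to do that comparison (run the runtime command on the node hosting the pod; exact output formats may vary between releases):

```
# Container ID as reported by Kubernetes
kubectl describe pod nginx-untrusted | grep "Container ID"

# Containers known to the Kata runtime; the ID above should appear in this list
sudo /opt/kata/bin/kata-containers.io/kata-runtime list
```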

The following removes the test pod:
```
kubectl delete -f examples/nginx-untrusted.yaml
```

### Remove Kata from the Kubernetes cluster

```
kubectl delete -f kata-deploy.yaml
kubectl apply -f kata-cleanup.yaml
kubectl delete -f kata-cleanup.yaml
kubectl delete -f kata-rbac.yaml
```
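
After the cleanup daemonset has run and been deleted, you can confirm that no Kata labels remain on the nodes:

```
kubectl get nodes --show-labels | grep kata-containers.io || echo "no kata labels remain"
```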

## kata-deploy Details

### Dockerfile

The Dockerfile used to create the container image deployed in the DaemonSet is provided here.
This image contains all the necessary artifacts for running Kata Containers.

Host artifacts:
* kata-containers.io/kata-runtime: pulled from Kata GitHub releases page
* kata-proxy: pulled from Kata GitHub releases page
* kata-shim: pulled from Kata GitHub releases page
* qemu-system-x86_64: statically built and included in this repo, based on Kata's QEMU repo
* qemu/* : supporting binaries required for qemu-system-x86_64

Virtual Machine artifacts:
* kata-containers.img: pulled from Kata GitHub releases page
* vmlinuz.container: pulled from Kata GitHub releases page
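
A quick spot-check that these artifacts landed on a node after installation might look like the following. The paths assume the `/opt/kata` prefix used by the bundled `configuration.toml`; exact file names depend on the install scripts:

```
# Run on a node labeled kata-containers.io/kata-runtime=true
ls /opt/kata/bin        # kata runtime, proxy, shim and qemu binaries
ls /opt/kata            # vmlinuz.container, kata-containers.img, configuration.toml
/opt/kata/bin/kata-containers.io/kata-runtime kata-env   # prints the resolved runtime configuration
```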

### Daemonsets and RBAC:

A few daemonsets are introduced for kata-deploy, as well as an RBAC to facilitate
applying labels to the nodes.

#### runtime-labeler:

This daemonset creates a label on each node in
the cluster identifying the CRI shim in use. For example,
`kata-containers.io/container-runtime=crio` or `kata-containers.io/container-runtime=containerd`.
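
Conceptually, the labeling step amounts to something like the sketch below. The real detection logic lives in the daemonset's scripts, so treat this purely as an illustration:

```
# Hypothetical sketch of the labeler, not the actual script
if systemctl is-active --quiet crio; then
  runtime=crio
elif systemctl is-active --quiet containerd; then
  runtime=containerd
fi
kubectl label node "$NODE_NAME" "kata-containers.io/container-runtime=${runtime}"
```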

#### CRI-O and containerd kata installer

Depending on the value of the `kata-containers.io/container-runtime` label on the node, either the CRI-O or
containerd kata installation daemonset executes. These daemonsets install
the necessary kata binaries, configuration files and virtual machine artifacts on
the node. Once installed, the daemonset adds a node label `kata-containers.io/kata-runtime=true` and reconfigures
either CRI-O or containerd to make use of Kata for untrusted workloads. As a final step the daemonset
restarts either CRI-O or containerd and kubelet. Upon deletion, the daemonset removes the kata binaries
and VM artifacts and updates the node label to `kata-containers.io/kata-runtime=cleanup`.
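
For containerd, for instance, the reconfiguration boils down to registering Kata as the untrusted workload runtime and restarting the services. The sketch below is only a rough illustration: the real work is done by the daemonset's install script, the exact configuration keys depend on the containerd version, and the runtime path `/opt/kata/bin/kata-runtime` is assumed:

```
# Hypothetical illustration of the containerd case, not the daemonset's actual script
cat <<EOF >> /etc/containerd/config.toml
[plugins.cri.containerd.untrusted_workload_runtime]
  runtime_type = "io.containerd.runtime.v1.linux"
  runtime_engine = "/opt/kata/bin/kata-runtime"
EOF
systemctl daemon-reload && systemctl restart containerd && systemctl restart kubelet
kubectl label node "$NODE_NAME" kata-containers.io/kata-runtime=true
```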

### Kata cleanup:
This daemonset runs if the node has the label `kata-containers.io/kata-runtime=cleanup`. It removes
the `kata-containers.io/container-runtime` and `kata-containers.io/kata-runtime` labels, and restarts either the CRI-O or containerd systemd
service and kubelet. These restarts cannot be executed during the preStopHook of the Kata installer daemonset,
which necessitated this final cleanup daemonset.
Binary file added kata-deploy/bin/qemu-system-x86_64
Binary file not shown.
144 changes: 144 additions & 0 deletions kata-deploy/configuration.toml
@@ -0,0 +1,144 @@
# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "cli/config/configuration.toml.in"
# XXX: Project:
# XXX: Name: Kata Containers
# XXX: Type: kata

[hypervisor.qemu]
path = "/opt/kata/bin/qemu-system-x86_64"
kernel = "/opt/kata/vmlinuz.container"
# initrd = "/opt/kata/vm-artifacts/kata-containers-initrd.img"
image = "/opt/kata/kata-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""
#kernel_params = " agent.log=debug"

# Path to the firmware.
# If you want qemu to use the default firmware, leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0 --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores --> will be set to the actual number of physical cores
default_vcpus = 1


# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Up to 30 devices per bridge can be hot plugged.
# * Up to 5 PCI bridges can be cold plugged per VM.
# This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0 --> will be set to 1
# > 1 <= 5 --> will be set to the specified number
# > 5 --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set to 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor;
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
#
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top of a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true

[proxy.kata]
path = "/opt/kata/bin/kata-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.kata]
path = "/opt/kata/bin/kata-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.kata]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to
# the container network interface
# Options:
#
# - bridged
# Uses a linux bridge to interconnect the container interface to
# the VM. Works for most cases except macvlan and ipvlan.
#
# - macvtap
# Used when the Container network interface can be bridged using
# macvtap.
internetworking_model="macvtap"
14 changes: 14 additions & 0 deletions kata-deploy/example/nginx-untrusted.yaml
@@ -0,0 +1,14 @@
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.kubernetes.cri-o.TrustedSandbox: "false"
    io.kubernetes.cri.untrusted-workload: "true"
  name: nginx-untrusted
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    kata-containers.io/kata-runtime: "true"
50 changes: 50 additions & 0 deletions kata-deploy/kata-cleanup.yaml
@@ -0,0 +1,50 @@
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kubelet-kata-cleanup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kubelet-kata-cleanup
  template:
    metadata:
      labels:
        name: kubelet-kata-cleanup
    spec:
      serviceAccountName: kata-label-node
      nodeSelector:
        kata-containers.io/kata-runtime: cleanup
      containers:
      - name: kube-kata-cleanup
        image: egernst/kata-deploy
        imagePullPolicy: Always
        command: [ "sh", "-c" ]
        args:
        - kubectl label node $NODE_NAME kata-containers.io/container-runtime- kata-containers.io/kata-runtime-;
          systemctl daemon-reload && systemctl restart containerd && systemctl restart crio && systemctl restart kubelet;
          tail -f /dev/null;
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          privileged: false
        volumeMounts:
        - name: dbus
          mountPath: /var/run/dbus
        - name: systemd
          mountPath: /run/systemd
      volumes:
      - name: dbus
        hostPath:
          path: /var/run/dbus
      - name: systemd
        hostPath:
          path: /run/systemd
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
