docs: improve installation docs
Reorganize cluster node setup, start adding sections for each k8s
implementation I've tried.

Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
qlyoung committed Aug 25, 2020
1 parent f580eb4 commit ea196e0
1 changed file: docs/installing.rst (184 additions, 95 deletions)

Guide
-----
The steps below assume you are using Ubuntu 18.04 LTS on your cluster nodes.
More generic instructions should be available prior to the initial release.

NFS
^^^
lagopus uses NFS as its storage system; fuzzers use the share to download jobs
and to store their results when done. This lets you keep lagopus storage on any
device you want; it doesn't even have to be on a cluster node. As long as the
NFS server is accessible from the cluster you can use it.

This section describes how to set up an NFS share on Ubuntu 18.04. If you want
to use some other system, that's fine; there are plenty of tutorials online for
setting up NFS shares, and it's straightforward.

- Pick a machine to host NFS on - the master node is okay for this and usually
  easiest, but any cluster-accessible machine will work.

.. warning::

   This node should have **lots** of disk space - at least 200 GB for
   production deployments; more depending on how heavy your usage is.
   Presently Lagopus doesn't do any management of disk resources itself,
   which is a known limitation; for now, just give yourself as much storage
   headroom as you can. If you're just trying it out, 10 GB or so should be
   sufficient depending on your job sizes.

- Install NFS::

sudo apt update && sudo apt install -y nfs-kernel-server

- Make a share directory::

sudo mkdir -p /opt/lagopus_storage
sudo chown nobody:nogroup /opt/lagopus_storage

- Export this share to NFS::

echo "/opt/lagopus_storage *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo systemctl restart nfs-server

- Open firewall to allow NFS, if necessary

- Verify that NFS is working by trying to access it from a cluster node::

apt install -y nfs-common && showmount -e <nfs_host>

If it's working, you should see:

.. code-block:: console

Export list for <nfs_host>:
/opt/lagopus_storage *
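
Optionally, you can also check that the share is mountable and writable from a
node. A quick manual test might look like this (a sketch - substitute your own
``<nfs_host>`` and share path)::

   sudo mount -t nfs <nfs_host>:/opt/lagopus_storage /mnt
   sudo touch /mnt/write-test && sudo rm /mnt/write-test
   sudo umount /mnt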


Take note of the hostname or IP address of the NFS server, and the share path.
You will need to specify them when installing lagopus.

Cluster Configuration
^^^^^^^^^^^^^^^^^^^^^

This section is broken down by platform. Each k8s implementation has its
quirks. If you're setting up a new cluster I recommend `k3s
<https://k3s.io/>`_. If you want to test locally I recommend `kind
<https://kind.sigs.k8s.io/>`_ or `minikube
<https://kubernetes.io/docs/tasks/tools/install-minikube/>`_. `microk8s
<https://microk8s.io/>`_ is also an acceptable choice, but you have to deal
with snaps, which have many problems. Don't use microk8s if you have ZFS
anywhere in your cluster; your troubles will be endless.

.. _basic_node_setup:

Basic node setup
""""""""""""""""

This section assumes you already have a cluster. It is agnostic to whatever
implementation of k8s you choose.

Each node in the cluster needs a few tweaks to support lagopus. The necessary
changes are:

* Install NFS support
* Normalize core dumps
* Disable apport (Ubuntu only)
* Disable swap
* Allow the kubelet to provision static CPU resources
  (``--cpu-manager-policy=static``)
* Set kernel CPU scheduler to performance mode

The last 3 are required for AFL to work as a fuzzing driver.

On each node, do the following:

1. Install NFS support

This is OS-dependent. For example, on Ubuntu::

apt update
apt install -y nfs-common

2. Normalize core dumps::

echo "kernel.core_pattern=core" >> /etc/sysctl.conf
sysctl -p

3. If on Ubuntu, the previous setting will be overwritten by Apport each boot.
You need to disable Apport::

systemctl stop apport
systemctl disable apport

4. Next, disable swap to prevent fuzzer memory from being swapped, which hurts
performance::

swapoff -a

5. Set the CPU governor to ``performance``::

cd /sys/devices/system/cpu; echo performance | tee cpu*/cpufreq/scaling_governor

6. Set the following kubelet parameters on each of your nodes and restart
kubelet::

--cpu-manager-policy=static
If the service fails, check ``journalctl -u snap.microk8s.daemon-kubelet``
for debugging logs.

On the master node (or the host when using ``kind``) you need to install `Helm
<https://github.com/helm/helm>`_. Lagopus is packaged as a Helm chart, so you
need Helm to install it.

Installing helm is easy; go `here <https://github.com/helm/helm/releases>`_,
download the latest 3.x release for your platform, extract the tarball and put
the ``helm`` binary in :file:`/usr/local/bin`.
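
For example, on a Linux amd64 host the commands might look like the following
(the version number is illustrative - check the releases page for the current
3.x release)::

   curl -LO https://get.helm.sh/helm-v3.3.0-linux-amd64.tar.gz
   tar xzf helm-v3.3.0-linux-amd64.tar.gz
   sudo mv linux-amd64/helm /usr/local/bin/helm
   helm version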


kind
""""

`kind <https://kind.sigs.k8s.io/>`_ is a nice option for running locally
without needing a physical cluster. ``kind`` spins up a cluster on your local
machine by running k8s inside of docker. It's oriented towards proof-of-concept
and local deployments.

Follow the instructions on the ``kind`` homepage to install kind and create a
cluster. After creating a cluster, go through the steps in
:ref:`basic_node_setup`.
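
Creating a cluster is quick; a minimal sketch (assuming ``kind`` is already
installed - the cluster name is just an example)::

   kind create cluster --name lagopus
   kubectl cluster-info --context kind-lagopus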

In ``kind``, you can log into the nodes as you would a docker container. Find
the container IDs of the cluster nodes with ``docker ps``:

::

qlyoung@host ~> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
98bae8548619 kindest/node:v1.18.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours 127.0.0.1:39245->6443/tcp kind-control-plane
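
To get a shell on a node, ``exec`` into its container using the name or ID from
the ``docker ps`` output above, for example::

   docker exec -it kind-control-plane bash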


After running through the :ref:`basic_node_setup`, you need to get the LAN IP
of the ``kind`` master node. This is the IP that lagopus will expose its web
interface on. Log into the master node, then:

.. code-block:: console

ip addr show eth0

It should be the first address. For example, on my ``kind`` cluster:

.. code-block:: console

# ip addr show eth0
30: eth0@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.19.0.2/16 brd 172.19.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe13:2/64 scope link
valid_lft forever preferred_lft forever

The address is ``172.19.0.2``. You should verify that this address is reachable
from your host by pinging it. Note this address; this is what you'll use as
``lagopusIP`` when installing lagopus.
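
For example::

   ping -c 3 172.19.0.2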

At this point you can skip to :ref:`installing`.


k3s
"""

Go through the steps in :ref:`basic_node_setup`.

TODO: document how to enable static CPU scheduling for k3s kubelets
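
One possible approach (a sketch, not yet verified in this guide) is to pass the
flags through k3s's ``--kubelet-arg`` option when starting the server (or, on
workers, ``k3s agent``)::

   k3s server --kubelet-arg="cpu-manager-policy=static" \
              --kubelet-arg="kube-reserved=cpu=200m,memory=512Mi"

As with microk8s, the kubelet will refuse to change its CPU manager policy if
an old ``cpu_manager_state`` file exists, so you may need to delete that file
(its location depends on your kubelet root directory) and restart k3s.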


microk8s
""""""""

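To create a microk8s cluster, install microk8s on each node (``snap install
microk8s --classic``), then join the nodes together. On the "master" node::

   microk8s.add-node

On each node to be joined, run the join command printed by ``add-node``::

   microk8s.join <id>

Verify the cluster from the control plane node, e.g.:

.. code-block:: console

   root@k8s-master:~# kubectl get no
   NAME         STATUS   ROLES    AGE     VERSION
   microk8s-1   Ready    <none>   38m     v1.17.0
   k8s-master   Ready    <none>   5d15h   v1.17.0

All nodes should read ``Ready``.
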
If you already have a microk8s cluster running on Ubuntu 18.04, here is an
Ansible playbook that performs all of the node setup steps described above.
Change ``qlyoung`` to any root-privileged account.

.. code-block:: yaml

   - hosts: fuzzers
     vars:
       fuzzing_user: qlyoung
     remote_user: "{{ fuzzing_user }}"
     become: yes
     become_method: sudo
     gather_facts: no
     pre_tasks:
       - name: 'install python2'
         raw: sudo apt-get -y install python
     tasks:
       - name: install-microk8s
         command: snap install microk8s --classic
       - name: microk8s-perms
         command: sudo usermod -a -G microk8s {{ fuzzing_user }}
       - name: microk8s-enable-dns
         command: microk8s.enable dns
       - name: disable-apport
         shell: |
           systemctl disable apport
           systemctl stop apport
         ignore_errors: yes
       - name: set-kernel-core-pattern
         shell: echo 'kernel.core_pattern=core' >> /etc/sysctl.conf && sysctl -p
       - name: set-kubelet-resources
         shell: |
           echo '--cpu-manager-policy=static' >> /var/snap/microk8s/current/args/kubelet
           echo '--kube-reserved="cpu=200m,memory=512Mi"' >> /var/snap/microk8s/current/args/kubelet
           rm /var/snap/microk8s/common/var/lib/kubelet/cpu_manager_state
           systemctl reset-failed snap.microk8s.daemon-kubelet
           systemctl restart snap.microk8s.daemon-kubelet
       - name: install-nfs
         command: apt install -y nfs-common
       - name: set-kernel-scheduler-performance
         shell: cd /sys/devices/system/cpu; echo performance | tee cpu*/cpufreq/scaling_governor
         ignore_errors: yes

If the service fails, check ``journalctl -u snap.microk8s.daemon-kubelet``
for debugging logs.

At this point the cluster is set up to run fuzzing jobs.

Building
^^^^^^^^
After that you need to replace all the hardcoded references to my repo in the
Helm templates with yours (look for ``qlyoung`` in
``chart/lagopus/templates``).
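
For example, to find the references that need replacing::

   grep -rn qlyoung chart/lagopus/templates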

.. _installing:

Installing
^^^^^^^^^^
