docs: improve installation docs
Reorganize cluster node setup, start adding sections for each k8s
implementation I've tried.

Signed-off-by: Quentin Young <qlyoung@qlyoung.net>
qlyoung committed Aug 25, 2020
1 parent f580eb4 commit ea196e0
1 changed file: docs/installing.rst (184 additions, 95 deletions)

Guide
-----
The steps below assume you are using Ubuntu 18.04 LTS on your cluster nodes.
More generic instructions should be available prior to the initial release.

NFS
^^^
lagopus uses NFS as its storage system; fuzzers use the share to download jobs
and to store their results when done. This lets you keep lagopus storage on any
device you want; it doesn't even have to be on a cluster node. As long as the
NFS server is accessible from the cluster you can use it.

This section describes how to set up an NFS share on Ubuntu 18.04. If you want
to use some other system, that's fine; there are plenty of tutorials online for
setting up NFS shares, and it's straightforward.

- Pick a machine to host NFS on - the master node is okay for this and usually
  easiest, but any cluster-accessible machine will work.

.. warning::

   This node should have **lots** of disk space - at least 200 GB for
   production deployments; more depending on how heavy your usage is.
   Presently Lagopus doesn't do any management of disk resources itself,
   which is a known limitation; for now, just give yourself as much storage
   headroom as you can. If you're just trying it out, 10 GB or so should be
   sufficient depending on your job sizes.

- Install NFS::

sudo apt update && sudo apt install -y nfs-kernel-server

- Make a share directory::

sudo mkdir -p /opt/lagopus_storage
sudo chown nobody:nogroup /opt/lagopus_storage

- Export this share to NFS::

echo "/opt/lagopus_storage *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo systemctl restart nfs-server

- Open firewall to allow NFS, if necessary

- Verify that NFS is working by trying to access it from a cluster node::

apt install -y nfs-common && showmount -e <nfs_host>

If it's working, you should see:

.. code-block:: console

Export list for <nfs_host>:
/opt/lagopus_storage *
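
Optionally, you can also check that the share is mountable and writable from a
node. A quick manual test might look like this (a sketch - substitute your own
``<nfs_host>`` and share path)::

   sudo mount -t nfs <nfs_host>:/opt/lagopus_storage /mnt
   sudo touch /mnt/write-test && sudo rm /mnt/write-test
   sudo umount /mnt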


Take note of the hostname or IP address of the NFS server, and the share path.
You will need to specify them when installing lagopus.

Cluster Configuration
^^^^^^^^^^^^^^^^^^^^^

This section is broken down by platform. Each k8s implementation has its
quirks. If you're setting up a new cluster I recommend `k3s
<https://k3s.io/>`_. If you want to test locally I recommend `kind
<https://kind.sigs.k8s.io/>`_ or `minikube
<https://kubernetes.io/docs/tasks/tools/install-minikube/>`_. `microk8s
<https://microk8s.io/>`_ is also an acceptable choice, but you have to deal
with snaps, which have many problems. Don't use microk8s if you have ZFS
anywhere in your cluster; your troubles will be endless.

.. _basic_node_setup:

Basic node setup
""""""""""""""""

This section assumes you already have a cluster. It is agnostic to whatever
implementation of k8s you choose.

Each node in the cluster needs a few tweaks to support lagopus. The necessary
changes are:

* Install NFS support
* Normalize core dumps
* Disable apport (Ubuntu only)
* Disable swap
* Allow the kubelet to provision static CPU resources
  (``--cpu-manager-policy=static``)
* Set kernel CPU scheduler to performance mode

The last 3 are required for AFL to work as a fuzzing driver.

On each node, do the following:

1. Install NFS support

This is OS-dependent. For example, on Ubuntu::

apt update
apt install -y nfs-common

2. Normalize core dumps::

echo "kernel.core_pattern=core" >> /etc/sysctl.conf
sysctl -p

3. If on Ubuntu, the previous setting will be overwritten by Apport each boot.
You need to disable Apport::

systemctl stop apport
systemctl disable apport

4. Next, disable swap to prevent fuzzer memory from being swapped, which hurts
performance::

swapoff -a

5. Set the CPU governor to ``performance``::

cd /sys/devices/system/cpu; echo performance | tee cpu*/cpufreq/scaling_governor

6. Set the following kubelet parameters on each of your nodes and restart
kubelet::

--cpu-manager-policy=static
If the service fails, check ``journalctl -u snap.microk8s.daemon-kubelet``
for debugging logs.

On the master node (or the host when using ``kind``) you need to install `Helm
<https://github.com/helm/helm>`_. Lagopus is packaged as a Helm chart, so you
need Helm to install it.

Installing helm is easy; go `here <https://github.com/helm/helm/releases>`_,
download the latest 3.x release for your platform, extract the tarball and put
the ``helm`` binary in :file:`/usr/local/bin`.
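
For example, on a Linux amd64 host the commands might look like the following
(the version number is illustrative - check the releases page for the current
3.x release)::

   curl -LO https://get.helm.sh/helm-v3.3.0-linux-amd64.tar.gz
   tar xzf helm-v3.3.0-linux-amd64.tar.gz
   sudo mv linux-amd64/helm /usr/local/bin/helm
   helm version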


kind
""""

`kind <https://kind.sigs.k8s.io/>`_ is a nice option for running locally
without needing a physical cluster. ``kind`` spins up a cluster on your local
machine by running k8s inside of docker. It's oriented towards proof-of-concept
and local deployments.

Follow the instructions on the ``kind`` homepage to install kind and create a
cluster. After creating a cluster, go through the steps in
:ref:`basic_node_setup`.
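
Creating a cluster is quick; a minimal sketch (assuming ``kind`` is already
installed - the cluster name is just an example)::

   kind create cluster --name lagopus
   kubectl cluster-info --context kind-lagopus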

In ``kind``, you can log into the nodes as you would a docker container. Find
the container IDs of the cluster nodes with ``docker ps``:

::

qlyoung@host ~> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
98bae8548619 kindest/node:v1.18.2 "/usr/local/bin/entr…" 2 hours ago Up 2 hours 127.0.0.1:39245->6443/tcp kind-control-plane
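
To get a shell on a node, ``exec`` into its container using the name or ID from
the ``docker ps`` output above, for example::

   docker exec -it kind-control-plane bash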


After running through the :ref:`basic_node_setup`, you need to get the LAN IP
of the ``kind`` master node. This is the IP that lagopus will expose its web
interface on. Log into the master node, then:

.. code-block:: console

ip addr show eth0

It should be the first address. For example, on my ``kind`` cluster:

.. code-block:: console

# ip addr show eth0
30: eth0@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.19.0.2/16 brd 172.19.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe13:2/64 scope link
valid_lft forever preferred_lft forever

The address is ``172.19.0.2``. You should verify that this address is reachable
from your host by pinging it. Note this address; this is what you'll use as
``lagopusIP`` when installing lagopus.
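
For example::

   ping -c 3 172.19.0.2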

At this point you can skip to :ref:`installing`.


k3s
"""

Go through the steps in :ref:`basic_node_setup`.

TODO: document how to enable static CPU scheduling for k3s kubelets
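
One possible approach (a sketch, not yet verified in this guide) is to pass the
flags through k3s's ``--kubelet-arg`` option when starting the server (or, on
workers, ``k3s agent``)::

   k3s server --kubelet-arg="cpu-manager-policy=static" \
              --kubelet-arg="kube-reserved=cpu=200m,memory=512Mi"

As with microk8s, the kubelet will refuse to change its CPU manager policy if
an old ``cpu_manager_state`` file exists, so you may need to delete that file
(its location depends on your kubelet root directory) and restart k3s.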


microk8s
""""""""

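To create a microk8s cluster, install microk8s on each node (``snap install
microk8s --classic``), then join the nodes together. On the "master" node::

   microk8s.add-node

On each node to be joined, run the join command printed by ``add-node``::

   microk8s.join <id>

Verify the cluster from the control plane node, e.g.:

.. code-block:: console

   root@k8s-master:~# kubectl get no
   NAME         STATUS   ROLES    AGE     VERSION
   microk8s-1   Ready    <none>   38m     v1.17.0
   k8s-master   Ready    <none>   5d15h   v1.17.0

All nodes should read ``Ready``.
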
If you already have a microk8s cluster running on Ubuntu 18.04, here is an
Ansible playbook that performs all of the node setup steps described above.
Change ``qlyoung`` to any root-privileged account.

.. code-block:: yaml

   - hosts: fuzzers
     vars:
       fuzzing_user: qlyoung
     remote_user: "{{ fuzzing_user }}"
     become: yes
     become_method: sudo
     gather_facts: no
     pre_tasks:
       - name: 'install python2'
         raw: sudo apt-get -y install python
     tasks:
       - name: install-microk8s
         command: snap install microk8s --classic
       - name: microk8s-perms
         command: sudo usermod -a -G microk8s {{ fuzzing_user }}
       - name: microk8s-enable-dns
         command: microk8s.enable dns
       - name: disable-apport
         shell: |
           systemctl disable apport
           systemctl stop apport
         ignore_errors: yes
       - name: set-kernel-core-pattern
         shell: echo 'kernel.core_pattern=core' >> /etc/sysctl.conf && sysctl -p
       - name: set-kubelet-resources
         shell: |
           echo '--cpu-manager-policy=static' >> /var/snap/microk8s/current/args/kubelet
           echo '--kube-reserved="cpu=200m,memory=512Mi"' >> /var/snap/microk8s/current/args/kubelet
           rm /var/snap/microk8s/common/var/lib/kubelet/cpu_manager_state
           systemctl reset-failed snap.microk8s.daemon-kubelet
           systemctl restart snap.microk8s.daemon-kubelet
       - name: install-nfs
         command: apt install -y nfs-common
       - name: set-kernel-scheduler-performance
         shell: cd /sys/devices/system/cpu; echo performance | tee cpu*/cpufreq/scaling_governor
         ignore_errors: yes

If the service fails, check ``journalctl -u snap.microk8s.daemon-kubelet``
for debugging logs.

At this point the cluster is set up to run fuzzing jobs.

Building
^^^^^^^^
After that you need to replace all the hardcoded references to my repo in the
Helm templates with yours (look for ``qlyoung`` in
``chart/lagopus/templates``).
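
For example, to find the references that need replacing::

   grep -rn qlyoung chart/lagopus/templates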

.. _installing:

Installing
^^^^^^^^^^
