
ci: add self-hosted runners #278

Closed
mudler opened this issue Jun 15, 2021 · 7 comments

@mudler (Contributor) commented Jun 15, 2021

We are at capacity with the GHA concurrent job limit, and this slows down development quite a lot.

Let's see if we can configure AWS spot instances, or whatever else can provide a bunch of runners, for this repository.

e.g. by following along the lines of https://github.com/philips-labs/terraform-aws-github-runner to run our tests on. We also have to figure out whether we can run VirtualBox (or QEMU) on top of those runners for our test suite.
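
For illustration only - the runner labels and build command below are placeholders, not our actual workflow - jobs would then pick up those runners via runs-on:

# Illustrative sketch: labels and build entrypoint are placeholders.
name: build
on: [push, pull_request]
jobs:
  build:
    # Any repository-level self-hosted runner carrying these labels can take the job
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v2
      - name: Build
        run: make build   # placeholder for the real build entrypoint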

@mudler mudler added the kind/enhancement New feature or request label Jun 15, 2021
@mudler mudler self-assigned this Jun 15, 2021
@mudler mudler added this to 💡 Untriaged in Releases Jun 15, 2021
@mudler mudler removed their assignment Jun 15, 2021
@kkaempf (Contributor) commented Jun 15, 2021

A self-hosted runner is ready. I just need to boot it.

@mudler (Contributor, Author) commented Jun 15, 2021

This issue is about having more than one. We run quite a few jobs in parallel, and a single runner would quickly become another bottleneck.

@mudler mudler self-assigned this Jun 21, 2021
@mudler mudler moved this from 💡 Untriaged to 🏃🏼‍♂️ In Progress in Releases Jun 21, 2021
@mudler mudler assigned Itxaka and mudler and unassigned mudler Jun 21, 2021
@mudler (Contributor, Author) commented Jun 21, 2021

@Itxaka I'll take care of the single-node-only setup then 👍

@Itxaka (Contributor) commented Jun 21, 2021

I will have a look at some Heat templates to easily add/remove ECP self-hosted runners.

@mudler (Contributor, Author) commented Jun 21, 2021

@Itxaka I added a sample k8s deployment of a GH runner here: https://github.com/rancher-sandbox/cOS-toolkit/wiki/Github-runner-on-k8s

@Itxaka (Contributor) commented Jun 21, 2021

This is a POC of a script to deploy GitHub runners in ECP: #304

Tested and working; see the README for details.

It works... OK-ish. The workflows would need some adaptations to fully work, and the user-data might also need adaptations if we want to use this, but it makes no sense to develop it further as it is.

  • Need to remove the build dir on each job, as it can be left over from a previous job.
  • No Vagrant/QEMU/Packer installed by default, so not all jobs can run.
  • We can add those packages and run everything on the same base; no more macOS required for QEMU builds.
  • We could have workers based on size and OS and then use labels to run the jobs on them; this still requires more configuration and work on the deployment scripts and workflows.
  • We could have a "master" node that receives a job, creates an on-demand instance with these scripts, uses it for that one job, and deletes it afterwards.
    • Makes it behave more like GitHub-hosted runners.
    • Requires one "master" machine to sync everything.
    • Should be easy to automate the creation, deletion, token acquisition, etc.
    • Requires investing in automation.
    • Requires creating custom images with software preinstalled to avoid 5-10 minutes of machine boot and configuration per job.
  • We could also have a big node that spawns several Docker containers with the workers (rough sketch after this list).
    • Gives us more flexibility.
    • Supports a wide range of base OSes.
    • REALLY easy to grow or shrink.
    • If managed via these scripts, it's really easy to duplicate/delete/recreate.
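
For the last option, a rough docker-compose sketch of the "big node with several Docker containers" idea - the runner image and environment variable names are placeholders and would have to match whatever runner image we pick:

# Illustrative only: image and env var names are placeholders.
version: "3"
services:
  runner:
    image: ghcr.io/example/github-actions-runner:latest   # placeholder runner image
    restart: always
    environment:
      REPO_URL: https://github.com/rancher-sandbox/cOS-toolkit   # repo to register against
      RUNNER_TOKEN: ${RUNNER_TOKEN}                               # registration token, injected at deploy time
      RUNNER_LABELS: self-hosted,linux,docker                     # labels referenced by runs-on

Growing or shrinking would then be a single command, e.g. docker-compose up -d --scale runner=8.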

@mudler (Contributor, Author) commented Jun 24, 2021

#319 should fix the problem for the time being.

The templating mechanism supports switching to local runners - I've added ~8 of them without noticing notable performance gains beyond the increased parallelism. That wouldn't last long anyway, as we run many more than 8 parallel jobs for each run.

The pipeline has been reworked and build times have shrunk - the template supports using the local runner only as a build node and not as a test node (as tests require virtualization and such).
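
The split, sketched for illustration only (labels, job names and commands are placeholders, not the actual template output):

# Illustrative sketch of the build/test split; not the generated workflow.
jobs:
  build:
    runs-on: [self-hosted, linux]   # local runner: plain builds, no virtualization needed
    steps:
      - uses: actions/checkout@v2
      - run: make build             # placeholder build step
  test:
    needs: build
    runs-on: macos-10.15            # hosted runner where VirtualBox/QEMU tests can run
    steps:
      - uses: actions/checkout@v2
      - run: make test              # placeholder test step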

To bring up the workers, I've created a cOS VM with the following cloud-init config:

name: "Default user"
stages:
   boot:
     - name: "Hostname and setup"
       hostname: "cos-node-1"
       commands:
       - echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
       dns:
        nameservers:
        - X
#       commands:
#       - passwd -d root
   network:
     - name: "Setup SSH keys"
       authorized_keys:
         admincos:
         - github:mudler
         root:
         - github:mudler
     - if: '[ -z "$(blkid -L COS_SYSTEM || true)" ]'
       name: "Load persisted ssh fingerprint"
       commands:
       - |
             # load ssh fingerprint
             if [ ! -d /usr/local/etc/ssh ]; then
               systemctl start sshd
               mkdir /usr/local/etc/ssh || true
               for i in /etc/ssh/*.pub; do cp -rf $i /usr/local/etc/ssh; done
             fi
     - name: "Setup k3s"
       if: '[ -z "$(blkid -L COS_SYSTEM || true)" ]'
       directories:
       - path: "/usr/local/bin"
         permissions: 0755
         owner: 0
         group: 0
       commands:
       - |
            curl -sfL https://get.k3s.io | \
            INSTALL_K3S_VERSION="v1.20.4+k3s1" \
            INSTALL_K3S_EXEC="--tls-san additional-outside-ip" \
            INSTALL_K3S_SELINUX_WARN="true" \
            sh -
   initramfs:
     - if: '[ -z "$(blkid -L COS_SYSTEM || true)" ]'
       name: "Persist"
       commands:
       - |
             target=/usr/local/.cos-state

             # Always want the latest update of systemd conf from the image
             mkdir -p ${target}/etc/systemd/
             rsync -av /etc/systemd/ ${target}/etc/systemd/
             # Only populate ssh conf once
             if [ ! -e ${target}/etc/ssh ]; then
               mkdir -p ${target}/etc/ssh/
               rsync -av /etc/ssh/ ${target}/etc/ssh/
             fi
             # make /tmp tmpfs
             cp -f /usr/share/systemd/tmp.mount ${target}/etc/systemd/system/
             # undo /home /opt mount from cos immutable-rootfs module
             sed -i '/overlay \/home /d' /etc/fstab
             sed -i '/overlay \/opt /d' /etc/fstab
             umount /home
             umount /opt
             # setup directories as persistent
             for i in root opt home var/lib/rancher var/lib/kubelet etc/systemd etc/rancher etc/ssh usr/libexec; do
               mkdir -p ${target}/$i /$i
               mount ${target}/$i /$i -t none -o bind
             done
             # This is hidden so that if you run some selinux label checking or relabeling the bind
             # mount won't screw up things.  If you have two files at different paths they will get
             # labeled with two different labels.
             mkdir -p ${target}/empty
             mount ${target}/empty ${target} -o bind,ro
             # persist machine-id
             if [ -s /usr/local/etc/machine-id ]; then
               cat /usr/local/etc/machine-id > /etc/machine-id
             else
               mkdir -p /usr/local/etc
               cp /etc/machine-id /usr/local/etc
             fi
             # ensure /var/log/journal exists so it's labeled correctly
             mkdir -p /var/log/journal
     - name: "Setup users"
       users: 
          admincos: 
            homedir: "/home/admincos"
     - name: "groups"
       ensure_entities: 
       - entity: |
                 kind: "group"
                 group_name: "wheel"
                 password: "x"
                 gid: 1020
                 users: "admincos"
       files: 
       - path: "/etc/sudoers.d/wheel"
         owner: 0
         group: 0
         permission: 0600   
         content: |
                   %wheel ALL=(ALL) NOPASSWD: ALL
       - path: "/etc/modprobe.d/ipv6.conf"
         owner: 0
         group: 0
         permission: 0664  
         content: |
                    alias net-pf-10 off
                    alias ipv6 off
                    options ipv6 disable_ipv6=1

For the GH deployment, I've followed https://github.com/rancher-sandbox/cOS-toolkit/wiki/Github-runner-on-k8s
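
The general idea, as a minimal sketch (not the wiki's exact manifest; the runner image and environment variable names are placeholders):

# Minimal sketch of GH runners as a k8s Deployment; illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner
spec:
  replicas: 4                       # one pod per concurrent job slot
  selector:
    matchLabels:
      app: github-runner
  template:
    metadata:
      labels:
        app: github-runner
    spec:
      containers:
        - name: runner
          image: ghcr.io/example/github-actions-runner:latest   # placeholder image
          env:
            - name: REPO_URL
              value: https://github.com/rancher-sandbox/cOS-toolkit
            - name: RUNNER_TOKEN
              valueFrom:
                secretKeyRef:
                  name: runner-token     # secret holding the registration token
                  key: token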

@mudler mudler closed this as completed Jun 24, 2021
Releases automation moved this from 🏃🏼‍♂️ In Progress to ✅ Done Jun 24, 2021
frelon pushed a commit to frelon/elemental-toolkit that referenced this issue May 12, 2023
…her#278)

This commit makes upgrade|reset|install create and update a
`state.yaml` file including system-wide data (deployed images,
partition labels, etc.).

It introduces the concept of installation state and stores such
data in a `state.yaml` file in two different locations: the state
partition root and the recovery partition root.

The purpose of this duplication is to be able to always find the
state.yaml file in a known location regardless of the image we are
booting.