Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: LXC support for self-hosted runners #1682

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

earl-warren
Copy link

@earl-warren earl-warren commented Mar 16, 2023

Description

The LXC support for self-hosted runners is used to run tests that do not fit the constraints imposed by the docker backend such as having a systemd capable environment.

It creates a container from scratch on every run. If the tests accidentally damage essential services such as the ssh server, it will not have any impact on the host running the LXC container. If the same accident happens without the isolation provided by the LXC container, the host itself will be damaged. The LXC support provides a robust isolation for each job in the workflows, which the self-hosted platform does not.

Implementation details

It is roughly the equivalent of doing the following:

  • Creating a LXC container (with lxc-create)
  • Running the commands in the container (with lxc-attach)
  • Destroying the LXC container (with lxc-destroy)

This is inherently insecure, in the same way the self-hosted platform is. Hardening LXC containers is possible but it makes them no more useful than docker containers.

FAQ

  • Why not run LXC inside a Docker container? Because it does not work. Although it could probably be hacked, that would not be more than that.
  • Why not run the LXC logic from the sef-hosted platform? Because it would require significant work and LXC knowledge from each ACT user. Instead they only have to care for their own tests, not to setup and teardown well isolated LXC containers.
  • Why not run the LXC containers as unprivileged to improve security? Because it does not work with most templates. Given the restrictions it imposes, the user is better of using Docker.
  • Why is apparmor disabled? Because it cannot be configured to allow essential operations such as mounting file systems

@earl-warren earl-warren requested a review from a team as a code owner March 16, 2023 11:37
@earl-warren
Copy link
Author

This is an early draft go gauge interest and I'm ready to do the needful if there is. It is developed and used in the context of the https://forgejo.org project.

Comment on lines +144 to +213
var startTemplate = template.Must(template.New("start").Parse(`#!/bin/sh -xe
lxc-create --name="{{.Name}}" --template={{.Template}} -- --release {{.Release}} $packages
tee -a /var/lib/lxc/{{.Name}}/config <<'EOF'
security.nesting = true
lxc.cap.drop =
lxc.apparmor.profile = unconfined
#
# /dev/net (docker won't work without /dev/net/tun)
#
lxc.cgroup2.devices.allow = c 10:200 rwm
lxc.mount.entry = /dev/net dev/net none bind,create=dir 0 0
#
# /dev/kvm (libvirt / kvm won't work without /dev/kvm)
#
lxc.cgroup2.devices.allow = c 10:232 rwm
lxc.mount.entry = /dev/kvm dev/kvm none bind,create=file 0 0
#
# /dev/loop
#
lxc.cgroup2.devices.allow = c 10:237 rwm
lxc.cgroup2.devices.allow = b 7:* rwm
lxc.mount.entry = /dev/loop-control dev/loop-control none bind,create=file 0 0
#
# /dev/mapper
#
lxc.cgroup2.devices.allow = c 10:236 rwm
lxc.mount.entry = /dev/mapper dev/mapper none bind,create=dir 0 0
#
# /dev/fuse
#
lxc.cgroup2.devices.allow = b 10:229 rwm
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0
EOF

mkdir -p /var/lib/lxc/{{.Name}}/rootfs/{{ .Root }}
mount --bind {{ .Root }} /var/lib/lxc/{{.Name}}/rootfs/{{ .Root }}

mkdir /var/lib/lxc/{{.Name}}/rootfs/tmpdir
mount --bind {{.TmpDir}} /var/lib/lxc/{{.Name}}/rootfs/tmpdir

lxc-start {{.Name}}
lxc-wait --name {{.Name}} --state RUNNING

#
# Wait for the network to come up
#
cat > /var/lib/lxc/{{.Name}}/rootfs/tmpdir/networking.sh <<'EOF'
#!/bin/sh -xe
for d in $(seq 60); do
getent hosts wikipedia.org > /dev/null && break
sleep 1
done
getent hosts wikipedia.org
EOF
chmod +x /var/lib/lxc/{{.Name}}/rootfs/tmpdir/networking.sh

lxc-attach --name {{.Name}} -- /tmpdir/networking.sh

cat > /var/lib/lxc/{{.Name}}/rootfs/tmpdir/node.sh <<'EOF'
#!/bin/sh -xe
# https://github.com/nodesource/distributions#debinstall
apt-get install -y curl git
curl -fsSL https://deb.nodesource.com/setup_16.x | bash -
apt-get install -y nodejs
EOF
chmod +x /var/lib/lxc/{{.Name}}/rootfs/tmpdir/node.sh

lxc-attach --name {{.Name}} -- /tmpdir/node.sh

`))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be separate file an embedded into go

Comment on lines +215 to +222
var stopTemplate = template.Must(template.New("stop").Parse(`#!/bin/sh -x
lxc-ls -1 --filter="^{{.Name}}" | while read container ; do
lxc-stop --kill --name="$container"
umount "/var/lib/lxc/$container/rootfs/{{ .Root }}"
umount "/var/lib/lxc/$container/rootfs/tmpdir"
lxc-destroy --force --name="$container"
done
`))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

command := make([]string, len(commandparam))
copy(command, commandparam)
if user == "root" {
command = append([]string{"/usr/bin/sudo"}, command...)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoded sudo

command = append([]string{"/usr/bin/sudo"}, command...)
} else {
common.Logger(ctx).Debugf("lxc-attach --name %v %v", e.Name, command)
command = append([]string{"/usr/bin/sudo", "--preserve-env", "--preserve-env=PATH", "/usr/bin/lxc-attach", "--keep-env", "--name", e.Name, "--"}, command...)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@panekj
Copy link

panekj commented Mar 16, 2023

This should be separate backend from host

@KnisterPeter
Copy link
Member

@earl-warren This sounds very interesting. I haven't read through the code but I would like to see it integrated into act. I guess we need to discuss about the structure and interfaces a bit more. It's not always clearly separated currently.

@ChristopherHX might have some ideas and expectations here as well.

@earl-warren
Copy link
Author

earl-warren commented Mar 23, 2023

Great to hear 🎉

It could also be an optional feature of the self-hosted platform. If LXC is available and the option is set, wrap it into a LXC container for better isolation. Otherwise not. It would be entirely transparent to the workflows being run.

@ChristopherHX is is something you think to be a valuable addition to the self-hosted platform you implemented? Or would you prefer to go in another direction?

@ChristopherHX
Copy link
Contributor

ChristopherHX commented Mar 23, 2023

@ChristopherHX is is something you think to be a valuable addition to the self-hosted platform you implemented? Or would you prefer to go in another direction?

I primary made my self-hosted for platforms like freebsd, openbsd, plan9 etc. where docker is not a thing (and GitHub Actions doesn't support), so this should be a different implemention of the ExecutionsEnvironment interface.

I prefer that you create a new struct, which overrides (in golang it is probably a different wording, using interfaces it should work) some HostEnvironment functions. Then instantiate it via a custom identifier from the runcontext, maybe -lxc:<baseimage> instead of -self-hosted ( needs design consent, it took long for my PR to land ).

I don't think we need to add Root() and Name() to the interface. You can override the Close() executor inside the struct with access to custom fields. The start method also have access to the fields, if you create a local var with the specfic type and later assign it to jobcontainer.

@earl-warren
Copy link
Author

Thanks for the feedback @ChristopherHX

I prefer that you create a new struct, which overrides (in golang it is probably a different wording, using interfaces it should work) some HostEnvironment functions. Then instantiate it via a custom identifier from the runcontext, maybe -lxc: instead of -self-hosted ( needs design consent, it took long for my PR to land ).

I'm happy to go in this direction. The current implementation is more of a proof of concept / hack than anything else and reworking it entirely is what I was expecting anyways.

If anyone think it should go differently please speak up. Otherwise I'll start working as soon as time permits, which could be as soon as in two weeks from now.

@mergify
Copy link
Contributor

mergify bot commented Jul 17, 2023

@earl-warren this pull request is now in conflict 😩

@mergify mergify bot added the conflict PR has conflicts label Jul 17, 2023
@earl-warren
Copy link
Author

I'm still motivated, it just takes a little longer than expected.

@ChristopherHX
Copy link
Contributor

The HostEnvironment took me ca. 1,5 years (1 year in my fork, 1/2 years in review) to get in here. You can still be faster than me 😅.

Copy link
Contributor

PR is stale and will be closed in 14 days unless there is new activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conflict PR has conflicts size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants