19-linux_os_virt_tech.md

SPDX-FileCopyrightText: © 2023 Menacit AB <foss@menacit.se>
SPDX-License-Identifier: CC-BY-SA-4.0
title: Virtualisation course: OS-level virtualisation technology
author: Joel Rangsmo <joel@menacit.se>
footer: © Course authors (CC BY-SA 4.0)
description: Overview of features and technology which enables OS-level virtualisation on Linux
keywords: virtualisation, os, vm, container, internals, cgroup, namespaces, chroot, linux, devops
color: #ffffff
class: invert
style: section.center { text-align: center; }

OS-level virtualisation

How does it work on Linux?


There is no such thing as an OS-level VM in the Linux kernel.

Gluing together features like chroot, namespaces and cgroups creates the illusion of one.

This functionality has other neat use-cases besides virtualisation.


chroot

  • Introduced in UNIX during the 70s
  • "Change file system root"
  • Not designed as a security feature


# Build a minimal Debian file system tree in ./my_debian_root
$ sudo debootstrap buster my_debian_root http://deb.debian.org/debian/
I: Retrieving InRelease
I: Resolving dependencies of required packages...
I: Retrieving libacl1 2.2.53-4
[...]
I: Configuring libc-bin...
I: Base system installed successfully.

$ ls my_debian_root/
bin  boot  dev  etc  home  lib  lib32  lib64 [...]

# The host runs Ubuntu...
$ grep -F PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"

# ...but a shell started with the new root sees Debian
$ sudo chroot my_debian_root /bin/bash
root@node-1:/# grep -F PRETTY_NAME /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

# Only the file system view changed: the kernel log, process list
# and network interfaces are still those of the host
root@node-1:/# dmesg | wc -l
1002

root@node-1:/# ps -xa | grep password
/bin/pacemaker loop --password G0d!

root@node-1:/# tcpdump -i eth0
tcpdump: listening on eth0
[...]
160 packets captured
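
A minimal sketch of how to check which directory a process treats as its root, assuming a Linux host with /proc mounted. The variable name own_root is chosen for illustration only.

```shell
# /proc/<pid>/root is a symbolic link to the root directory of that process.
# A process inspecting itself always sees "/", even inside a chroot.
own_root=$(readlink /proc/self/root)
echo "Root directory as seen from here: ${own_root}"
```

Read as root on the host for the PID of the chrooted bash, the same link should resolve to the my_debian_root directory instead: the changed root applies only to that process and its children.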


Namespaces

"Functionality to partition a group of processes' view of the system."

The host can see into every namespace; members can't see out of theirs.

We'll focus on "process", "network" and "user".
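
Namespace membership can be inspected through /proc. A small sketch, assuming a kernel that exposes /proc/<pid>/ns; the loop and the pid_ns variable are illustrative names.

```shell
# Each entry in /proc/<pid>/ns is a symbolic link of the form "type:[inode]".
for ns in pid net user; do
    printf '%-5s %s\n' "$ns" "$(readlink "/proc/self/ns/$ns")"
done

# Keep one of them around for comparison with another process.
pid_ns=$(readlink /proc/self/ns/pid)
```

Two processes are members of the same namespace when the numbers in brackets match.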


Process/PID namespace

Members get their own view of the process tree.


$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

# chroot alone does not hide the host's process tree
$ sudo chroot my_debian_root ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp


$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

# In a new PID namespace, bash becomes PID 1 and host processes are out of view
$ sudo unshare --fork --pid -- chroot my_debian_root /bin/bash

root@node-1:/# ps -e
PID   TTY   TIME      CMD
1     ?     00:00:00  bash
2     ?     00:00:00  ps
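
The PID renumbering can also be tried without sudo on hosts that permit unprivileged user namespaces, which is an assumption and not the case everywhere. In this sketch inner_pid is an illustrative name, and the fallback value covers hosts where the call is refused.

```shell
# --fork makes unshare start the command as a child, which becomes
# the first process (PID 1) of the new PID namespace.
inner_pid=$(unshare --user --map-root-user --fork --pid sh -c 'echo $$' 2>/dev/null) \
    || inner_pid="unavailable"

echo "PID of this shell on the host:    $$"
echo "PID of the shell in the namespace: ${inner_pid}"
```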

Network namespace

A separate "network stack" for member processes.

Example use-cases:

  • Configure per-application firewall rules
  • Force cherry-picked services through a VPN
  • Handle overlapping network segments
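
As a rough illustration of the separate network stack, the sketch below lists the interfaces visible inside a freshly created network namespace. It assumes unprivileged user namespaces are permitted; list_ifaces and inner_ifaces are illustrative names.

```shell
# /proc/net/dev has two header lines followed by one line per interface.
list_ifaces='tail -n +3 /proc/net/dev | cut -d: -f1 | tr -d " "'

echo "Interfaces seen from this shell:"
sh -c "$list_ifaces"

# --net runs the command in a new network namespace.
inner_ifaces=$(unshare --user --map-root-user --net sh -c "$list_ifaces" 2>/dev/null) \
    || inner_ifaces="unavailable"

echo "Interfaces seen inside the new namespace:"
echo "$inner_ifaces"
```

A new namespace typically starts out with little more than a loopback interface (lo), so anything else has to be created or moved into it explicitly.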


User namespace

Root inside a chroot is still root on the host.

Lots of things, such as package managers, expect root privileges but don't really need them.

User namespaces give members their own user and group ID mappings.
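
A minimal sketch of such a mapping, assuming the host permits unprivileged user namespaces. The variables outer_uid and inner_uid are illustrative, and the fallback value covers hosts where the call is refused.

```shell
outer_uid=$(id -u)

# --map-root-user maps the calling user to UID 0 inside the new namespace.
inner_uid=$(unshare --user --map-root-user id -u 2>/dev/null) \
    || inner_uid="unavailable"

echo "UID outside the namespace: ${outer_uid}"
echo "UID inside the namespace:  ${inner_uid}"
```

Inside, the process appears as UID 0, while for anything outside the namespace the kernel still treats it as the original user.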


cgroup

Members' usage of system resources (CPU, memory, disk I/O, etc.) can be limited.

Used together with CRIU (Checkpoint/Restore In Userspace) for live migration.

Not just used for virtualisation.
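
A small sketch for finding out which cgroup the current shell belongs to and whether a memory limit applies. It assumes cgroup v2 mounted at /sys/fs/cgroup, which may not match every host; membership, cg_path and limit_file are illustrative names.

```shell
# Each line of /proc/<pid>/cgroup reads hierarchy-ID:controllers:path.
if [ -r /proc/self/cgroup ]; then
    membership=$(cat /proc/self/cgroup)
else
    membership="unavailable"
fi
echo "cgroup membership: ${membership}"

# Assumption: cgroup v2, where the single entry starts with "0::".
cg_path=$(printf '%s\n' "$membership" | sed -n 's/^0:://p')
limit_file="/sys/fs/cgroup${cg_path}/memory.max"

if [ -r "$limit_file" ]; then
    # Prints a byte count, or "max" when no limit is set.
    echo "memory limit: $(cat "$limit_file")"
else
    echo "no readable memory.max for this cgroup"
fi
```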


seccomp

Limits which system calls can be used, and how.

Some syscalls allow breaking out of isolation.

Minimises the attack surface of the shared kernel.
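
Whether a process runs under a seccomp policy is visible in /proc. A minimal sketch, assuming a kernel that reports the Seccomp field in /proc/<pid>/status; seccomp_mode is an illustrative name.

```shell
# The seccomp mode is inherited by child processes, so the value read by
# awk here matches that of the shell that started it.
seccomp_mode=$(awk '/^Seccomp:/ {print $2}' /proc/self/status)

case "$seccomp_mode" in
    0) echo "seccomp disabled: all system calls reach the kernel" ;;
    1) echo "strict mode: only a tiny fixed set of system calls is allowed" ;;
    2) echo "filter mode: a filter program decides per system call" ;;
    *) echo "seccomp state not reported by this kernel" ;;
esac
```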


Capabilities

Originally developed to make root less omnipotent.

Caps like "NET_BIND_SERVICE" and "SYS_CHROOT" can be granted to processes running as non-root users.

Not very fine-grained and some are unsafe.
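
A sketch for checking which of the two capabilities named above are in effect for processes started from the current shell. It assumes /proc/<pid>/status reports CapEff as a hexadecimal bitmask; has_cap and report are hypothetical helpers written for this example.

```shell
# CapEff is a hexadecimal bitmask of the effective capabilities.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
cap_eff=${cap_eff:-0}

# has_cap BIT: succeeds if the given bit is set in the mask.
has_cap() {
    [ $(( (0x${cap_eff} >> $1) & 1 )) -eq 1 ]
}

# report NAME BIT: print whether the capability is present.
report() {
    if has_cap "$2"; then
        echo "$1: present"
    else
        echo "$1: absent"
    fi
}

# Bit numbers as defined in linux/capability.h.
report NET_BIND_SERVICE 10
report SYS_CHROOT 18
```

As an ordinary user both are normally absent, while a root shell on the host normally has both.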


These and other features make up the beautiful mess we call OS-level virtualisation on Linux!
