Summary of our first workshop about Linux containers
- unshare: run a program in new namespaces, disassociated from those of its parent
# run bash in a separate pid namespace
sudo unshare --fork --pid --mount-proc bash
ps -aux # this will show 2 processes (bash[pid=1] and ps[pid=2])
# run bash in a separate user namespace
sudo unshare --map-root-user --user bash
whoami # this will show root
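To check that unshare really placed the shell in a new namespace, one option is to compare the namespace identifiers the kernel exposes as symlinks under /proc/<pid>/ns. The helper below is a minimal sketch for bash; the function name list_ns and the PROC_ROOT override are hypothetical conveniences, not part of util-linux.

```shell
# bash sketch: list the namespace identifiers of a process.
# PROC_ROOT is a hypothetical override (an assumption for illustration),
# it defaults to the real /proc.
list_ns() {
    local pid="${1:-$$}"
    local root="${PROC_ROOT:-/proc}"
    local link
    for link in "$root/$pid"/ns/*; do
        # skip anything that is not a namespace symlink
        [ -L "$link" ] || continue
        # each link target looks like "pid:[4026531836]"
        printf '%s\t%s\n' "${link##*/}" "$(readlink "$link")"
    done
}

# Namespaces of the current shell
list_ns "$$"
```

Running it once in the host shell and once inside the unshared bash should show a different number on the pid line (or on the user line for the second example), while the namespaces that were not unshared keep the same number.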
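- nsenter: run a program in the namespaces of another process
podman ps --ns # get namespace info about containers, including the pid
docker inspect --format '{{.State.Pid}}' <container> # with docker, which has no --ns flag on ps, this prints the pid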
sudo nsenter --target <pid-of-container> --mount --user --uts --ipc --net --pid /bin/sh
ps -aux # lists the processes running inside the container (inside the pid namespace of the container)
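A quick way to confirm that a shell started with nsenter shares a namespace with the container is to compare the two /proc/<pid>/ns links from the host. The following is a small bash sketch; same_ns and PROC_ROOT are hypothetical names introduced here, and reading the links of processes owned by another user normally requires root.

```shell
# bash sketch: same_ns PID_A PID_B TYPE
# Exit status 0 if both processes are in the same namespace of the
# given type (pid, net, mnt, uts, ipc, user, ...), 1 if they differ,
# 2 if a link cannot be read.
# PROC_ROOT is a hypothetical override, it defaults to the real /proc.
same_ns() {
    local root="${PROC_ROOT:-/proc}"
    local a b
    a=$(readlink "$root/$1/ns/$3") || return 2
    b=$(readlink "$root/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Usage, from the host (placeholders to fill in):
# sudo bash -c "$(declare -f same_ns); same_ns <pid-of-nsenter-shell> <pid-of-container> net && echo shared"
```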
- Cgroups: a Linux kernel feature used to limit the resources of processes. Cgroup information is available under /sys/fs/cgroup/
# limit memory to 250M and the number of CPUs to 1
docker run -m 250m --cpus=1 <container-image>
# disable swap (--memory-swap needs the same value and unit as -m)
docker run -m 250m --memory-swap=250m <container-image>
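The limit set with -m can be read back from the cgroup files. The sketch below assumes the usual file names: memory.max on cgroup v2 and memory/memory.limit_in_bytes on cgroup v1. The function name cgroup_mem_limit is hypothetical.

```shell
# bash sketch: print the memory limit found under a cgroup directory.
# The directory defaults to /sys/fs/cgroup, which inside a container
# normally points at the container's own cgroup.
cgroup_mem_limit() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"                      # cgroup v2
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"    # cgroup v1
    else
        echo "no memory limit file found under $root" >&2
        return 1
    fi
}

# Best-effort call: on a host root cgroup the file may not exist
cgroup_mem_limit 2>/dev/null || true
```

Inside a container started with -m 250m on cgroup v2, this should print 262144000 (250 * 1024 * 1024); a container without a limit typically shows max.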
- When a container exceeds its memory limit, the kernel raises an out-of-memory (OOM) condition and starts killing processes, starting (most of the time) with those of the container. (0)
- If --memory-swap is set to the same value as -m or --memory, swap is disabled. (0)
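The reason equal values disable swap is that --memory-swap is the total of memory plus swap, so the swap allowance is the difference between the two. A small arithmetic sketch in bash follows; to_bytes and swap_bytes are hypothetical helpers that only handle the b/k/m/g suffixes and ignore special values such as -1.

```shell
# bash sketch: convert a docker-style size (250m, 1g, 4096) to bytes
to_bytes() {
    local value="$1" number unit
    number="${value%[bkmgBKMG]}"     # strip one trailing unit letter
    unit="${value#"$number"}"        # whatever was stripped
    case "$unit" in
        ""|b|B) echo "$number" ;;
        k|K)    echo $((number * 1024)) ;;
        m|M)    echo $((number * 1024 * 1024)) ;;
        g|G)    echo $((number * 1024 * 1024 * 1024)) ;;
    esac
}

# swap allowance = (memory + swap total) - memory
swap_bytes() {
    local mem memswap
    mem=$(to_bytes "$1")
    memswap=$(to_bytes "$2")
    echo $((memswap - mem))
}

swap_bytes 250m 250m   # prints 0: no swap left for the container
swap_bytes 300m 1g     # prints 759169024 (724 MiB of swap)
```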
# get the pid of the container
docker inspect --format '{{.State.Pid}}' <container> # or, with podman: podman ps --ns
# get and decode Capabilities used by a container.
cat /proc/<pid-container>/status | grep ^Cap | cut -d: -f2 | xargs -I{} -n1 capsh --decode={}
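The Cap* lines in /proc/<pid>/status are hexadecimal bitmasks in which bit N stands for capability number N. When capsh is not installed, the mask can be decoded by hand; the bash sketch below does so with a hard-coded name table that follows the numbering in linux/capability.h up to CAP_CHECKPOINT_RESTORE (newer kernels may define more). decode_caps is a hypothetical helper, not a replacement for capsh.

```shell
# bash sketch: capability names indexed by bit number
CAP_NAMES=(chown dac_override dac_read_search fowner fsetid kill setgid setuid
           setpcap linux_immutable net_bind_service net_broadcast net_admin
           net_raw ipc_lock ipc_owner sys_module sys_rawio sys_chroot
           sys_ptrace sys_pacct sys_admin sys_boot sys_nice sys_resource
           sys_time sys_tty_config mknod lease audit_write audit_control
           setfcap mac_override mac_admin syslog wake_alarm block_suspend
           audit_read perfmon bpf checkpoint_restore)

# decode_caps HEXMASK: print one capability name per set bit
decode_caps() {
    local mask=$((16#$1)) bit
    for bit in "${!CAP_NAMES[@]}"; do
        if (( (mask >> bit) & 1 )); then
            echo "cap_${CAP_NAMES[bit]}"
        fi
    done
}

# A mask commonly seen on containers started with docker's default settings
decode_caps 00000000a80425fb
```

For that mask the function lists 14 capabilities, among them cap_chown, cap_net_bind_service and cap_net_raw, and not cap_sys_admin.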
- Each image is composed of multiple layers (tar files), each with a JSON file containing metadata, plus a manifest containing metadata about the entire image.
- Each tar file contains the file system changes (the files added) made by the corresponding Dockerfile instruction.
docker pull busybox # download the busybox image, you can pull whatever image you want
docker save busybox > busybox.tar # save the image into a tar file
mkdir -p busybox && tar -xvf busybox.tar -C busybox # extract the tar file (the target directory must exist)
tree busybox # or ls to explore the files
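To make the layering idea concrete, the sketch below rebuilds a root file system by extracting layer tar files in order, so that later layers overwrite files from earlier ones. It is a simplification: real engines also process whiteout entries (deleted files), which this bash sketch ignores. apply_layers is a hypothetical helper name.

```shell
# bash sketch: apply_layers ROOTFS LAYER.tar [LAYER.tar ...]
# Extract each layer on top of the previous ones, oldest first.
apply_layers() {
    local rootfs="$1"
    shift
    mkdir -p "$rootfs"
    local layer
    for layer in "$@"; do
        tar -xf "$layer" -C "$rootfs"
    done
}

# Usage (paths are placeholders; the layer order comes from the
# manifest inside the saved image):
# apply_layers rootfs <first-layer>.tar <second-layer>.tar
```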
- Containers are stored on the file system, under /var/lib/containers/storage/overlay-containers for podman and /var/lib/docker for docker.
- A container creates a rw layer on top of the image layers, containing the data written by the container.
docker commit container-id # write the data in the rw layer of the container to a new image
docker export container-id > image.tar # export the file system of the container to a tar file
docker import image.tar image-name # import the tar file as an image and store it under the given repository name
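The behaviour of the rw layer can be imitated with two plain directories: reads look in the upper (container) directory first and fall back to the lower (image) directory, and writes only ever touch the upper one. This is a conceptual bash simulation of that lookup order, not real overlayfs; overlay_read and overlay_write are hypothetical names.

```shell
# bash sketch: read PATH through a two-directory "overlay"
overlay_read() {
    local upper="$1" lower="$2" path="$3"
    if [ -e "$upper/$path" ]; then
        cat "$upper/$path"      # the container's own copy wins
    else
        cat "$lower/$path"      # otherwise fall back to the image layer
    fi
}

# bash sketch: writes always land in the upper (rw) directory
overlay_write() {
    local upper="$1" path="$2" data="$3"
    mkdir -p "$upper/$(dirname "$path")"
    printf '%s\n' "$data" > "$upper/$path"
}
```

After an overlay_write, the lower directory is unchanged, which is why removing a container discards its data while the image stays intact, and why docker commit only has to capture the upper layer.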
pull -> expand -> mount -> create a spec file (config.json) -> run
- Container engine (containerd (docker), podman, CRI-O ...): is responsible for pull -> expand -> mount -> create a spec file (config.json)
- Container runtime (runc, crun, Kata Containers ...): reads the spec file and instructs the kernel.
- Linux kernel: responsible for running and killing the container.
watch -n 0.1 'ps aux | grep -E "[r]unc|[c]run"' # watch for the container runtime in a separate terminal (the brackets keep grep from matching itself)
docker run -it <image> bash # you'll notice runc appearing and disappearing briefly in the watch
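To give an idea of what the spec file handed to the runtime looks like, the sketch below writes a heavily trimmed config.json into a bundle directory. The field names follow the OCI runtime specification, but the content is an illustration only: a real file (for example the one produced by runc spec) contains many more fields, and this trimmed version is not guaranteed to be accepted by a runtime as-is. write_spec is a hypothetical helper name.

```shell
# bash sketch: write_spec BUNDLE_DIR
# Creates BUNDLE_DIR/rootfs and a minimal, illustrative config.json.
write_spec() {
    local bundle="$1"
    mkdir -p "$bundle/rootfs"
    cat > "$bundle/config.json" <<'EOF'
{
  "ociVersion": "1.0.2",
  "process": { "args": ["sh"], "cwd": "/" },
  "root": { "path": "rootfs" },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "mount" },
      { "type": "uts" },
      { "type": "ipc" },
      { "type": "network" }
    ]
  }
}
EOF
}

# Usage: write_spec ./bundle && cat ./bundle/config.json
```

The namespaces list is where the namespaces discussed earlier reappear: the runtime creates one of each listed type before starting the process.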
Security-Enhanced Linux for mere mortals (Red Hat Summit) - YouTube
Using SELinux with container runtimes (DevConf) - YouTube
How Docker Works - Intro to Namespaces (LiveOverflow) - YouTube
Containers unplugged: Linux namespaces (NDC Conferences / Michael Kerrisk) - YouTube
Cgroups, namespaces, and beyond: what are containers made from? - YouTube