# Working with Containers

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/ok-transformer/blob/master/docs/nb/dk/00-containers.ipynb)
[![Stars](https://img.shields.io/github/stars/particle1331/ok-transformer?style=social)](https://github.com/particle1331/ok-transformer)

---

## Introduction

Docker, at its core, solves the problem of installing dependencies consistently across different machines. Together with reproducibility, this is a central concern when deploying machine learning systems. To work with Docker, we have to understand the concepts of **images** and **containers**. Docker can, after all, be thought of as an entire ecosystem for creating containers from images and running them. This notebook is adapted from the first three sections of [this course](https://www.udemy.com/course/docker-and-kubernetes-the-complete-guide/).

```{figure} diagrams/00-dockerfix.jpeg
---
name: dockerfix
---
Docker solves the problem of installing software consistently.
```

### OS Kernel

To understand containers, we give a quick overview of **operating systems** (OS). Most operating systems have a **kernel**, a software process that governs access between all running programs and the physical hardware connected to your computer. The kernel acts as an intermediate layer between running processes and hardware through system calls. Docker isolates resources using **namespaces** and sets usage and prioritization limits using **control groups** (cgroups), both of which are features of the Linux kernel. See this [blog post](https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/) for more details.

```{figure} diagrams/00-osarch.svg
---
name: osarch
---
OS architecture.
```
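
On a Linux machine (or inside any container), the namespaces and control groups mentioned above can be inspected through the `/proc` filesystem. The following is a minimal sketch, assuming a Linux kernel that exposes `/proc/self/ns` and `/proc/self/cgroup`; on macOS or Windows hosts these paths do not exist, so the script only prints a notice. The variable `ns_status` is purely illustrative.

```shell
#!/bin/sh
# Linux-only sketch: show which namespaces and cgroups the current shell belongs to.
ns_status="skipped"   # illustrative variable, not part of any Docker tooling
if [ -d /proc/self/ns ]; then
    # Each entry is a symlink such as "pid -> pid:[4026531836]".
    # Two processes in the same namespace see the same bracketed inode number.
    ls -l /proc/self/ns && ns_status="inspected"
    # Control-group membership of this process (format differs between cgroup v1 and v2).
    if [ -r /proc/self/cgroup ]; then
        cat /proc/self/cgroup
    fi
else
    echo "No /proc/self/ns on this host; try again inside a container."
fi
echo "namespace inspection: $ns_status"
```

Running the same script on the host and inside a container should show different namespace identifiers, which is one way to see the isolation that Docker sets up.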

Note that the host computer does not necessarily run Linux. In that case, Docker sets up a Linux **virtual machine** (VM) inside the host computer. This can be seen below, where the OS of the running Docker server is `linux/arm64` while that of the client is `darwin/arm64`. Here we have [Docker Desktop](https://www.docker.com/products/docker-desktop/) running in the background on a macOS laptop.

In [1]:
!docker version

Client:
 Cloud integration: v1.0.29
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:01:18 2022
 OS/Arch:           darwin/arm64
 Context:           desktop-linux
 Experimental:      true

Server: Docker Desktop 4.14.1 (91661)
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 17:59:41 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.9
  GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


### Hello world

The following example demonstrates pulling an image and running a container from it:

In [2]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

e35b49f5: Pull complete
Digest: sha256:6e8b6f026e0b9c419ea0fd02d3905dd0952ad1feea67543f525c73a0a790fefb
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https:/

The above message describes the entire process by which the `hello-world` container ends up running on our machine. The image was pulled from [Docker Hub](https://hub.docker.com/), which is a registry of Docker images. Note that the container is created locally since the local machine is also our compute layer. The container then runs its default command, which executes the `/hello` program that prints the message on the terminal. The `hello-world` image produces a minimal container whose sole purpose is to print this message.

```{figure} diagrams/00-dockerhub.svg
---
name: dockerhub
---
Running `hello-world` on our local machine. 
```

Running this image a second time is significantly faster since Docker keeps a local **cache** of it. This makes sense since multiple containers are usually created from the same image. The architecture of an image and a container is shown in the following diagram.

```{figure} diagrams/00-helloworld.svg
---
name: helloworld
---
Anatomy of a Docker image and the resulting `hello-world` container in the context of the Linux kernel. Note the specific partition on the hard disk for the filesystem of the image.  
```

Here we see that an **image** is essentially a filesystem snapshot together with a startup command. It can be thought of as a read-only template that provides the daemon with a set of instructions for creating a container. A **container**, on the other hand, is a running process (in our case inside the Linux VM) with its own partition of hardware resources allocated by the kernel.
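
The startup command stored in an image can be read from its metadata without creating a container. The following is a minimal sketch, assuming a reachable Docker daemon and using `docker image inspect` with a Go template on the `Config.Cmd` field; the `inspect_status` variable is a hypothetical helper for this example only.

```shell
#!/bin/sh
# Sketch: print the default startup command recorded in the hello-world image.
inspect_status="skipped"   # illustrative variable
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # Use the local cache when possible; pull only if the image is missing.
    docker image inspect hello-world >/dev/null 2>&1 || docker pull hello-world
    # Config.Cmd holds the startup command baked into the image.
    docker image inspect --format 'startup command: {{.Config.Cmd}}' hello-world
    inspect_status="inspected"
else
    echo "Docker daemon not reachable; skipping."
fi
```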

### Container isolation

As mentioned, containers have **isolated filesystems** by default. This means we can blow up a container and just create a fresh healthy container from the same image. This also ensures that our running processes will not affect the host computer which can be running other important processes.

```bash
$ docker run -it ubuntu
root@0d8680f49620:/# ls
bin   dev  home  media  opt   root  sbin  sys  usr
boot  etc  lib   mnt    proc  run   srv   tmp  var
root@0d8680f49620:/# rm -rf bin/ls
root@0d8680f49620:/# ls
bash: /usr/bin/ls: No such file or directory
```

We can simply create a fresh container in which `ls` works again. Note that the container ID is different:

```bash
$ docker run -it ubuntu
root@b1230895737b:/# ls
bin   dev  home  media  opt   root  sbin  sys  usr
boot  etc  lib   mnt    proc  run   srv   tmp  var
root@b1230895737b:/#
```

## Manipulating containers using the Docker Client

In this section, we take a deeper look at the commands available in the Docker client. First, we run the `ubuntu` image, which is quite a bit more complex than `hello-world`. It runs `bash` by default, which then exits, so it looks like nothing happens after the image is pulled from Docker Hub:

In [3]:
!docker run ubuntu

Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu

Digest: sha256:9a0bdde4188b896a372804be2384015e90e3f84906b750c1a53539b585fbbe7f
Status: Downloaded newer image for ubuntu:latest


### Overriding startup command

Instead, we can override the default command using some other command such as `ls`. Note that this command works because `ls` is a program that exists in the `ubuntu` image. The familiar program `echo` also exists, so we can also test that.

In [4]:
!docker run ubuntu ls -C

bin   dev  home  media	opt   root  sbin  sys  usr
boot  etc  lib	 mnt	proc  run   srv   tmp  var


In [5]:
!docker run ubuntu echo 'hello, world!'

hello, world!


```{figure} diagrams/00-ubuntu.svg
---
name: ubuntu
---
Overriding the default command of `ubuntu`. Running `ls` instead of `bash`. 
```

To list all running containers we can use the following command. This will be very useful for determining the ID of a running container when we want to issue commands on specific containers.

In [6]:
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


All our previous containers exit immediately after running, so there are no running containers. To see all containers, we use the `--all` flag. Note that we have multiple containers (distinct IDs) for the same `ubuntu` image since we called `docker run` on it several times. This is discussed further below. Also, the [exit statuses](https://docs.docker.com/engine/reference/run/#exit-status) are all zero, which means no errors were encountered:

In [7]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS                              PORTS     NAMES
455b29036415   ubuntu        "echo 'hello, world!'"   1 second ago     Exited (0) Less than a second ago             musing_torvalds
55154e481da2   ubuntu        "ls -C"                  1 second ago     Exited (0) Less than a second ago             trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              2 seconds ago    Exited (0) 1 second ago                       cranky_merkle
b6103fa1567e   hello-world   "/hello"                 16 seconds ago   Exited (0) 15 seconds ago                     hopeful_cerf
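
When the list of containers grows, the output of `docker ps` can be narrowed down with `--filter` and reshaped with `--format`. The sketch below assumes a reachable Docker daemon; the column selection is just one possible choice and `ps_status` is an illustrative variable.

```shell
#!/bin/sh
# Sketch: filter and format the container listing.
ps_status="skipped"   # illustrative variable
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # Only containers that have exited, shown as a three-column table.
    docker ps --all --filter status=exited \
        --format 'table {{.ID}}\t{{.Image}}\t{{.Status}}'
    # Only the IDs of containers created from the ubuntu image, one per line.
    docker ps --all --quiet --filter ancestor=ubuntu
    ps_status="listed"
else
    echo "Docker daemon not reachable; skipping."
fi
```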


### Container life cycle

From above, we saw that running the same image more than once resulted in distinct containers. `docker run` is actually equivalent to two separate steps: **create** and **start**. Creating a container sets up its filesystem, while starting it executes its startup command. It follows that to start a container we have to point to a specific container ID:

In [10]:
!docker start -a 455b29036415

hello, world!


The `-a` flag attaches the container's output to the terminal so that we can view it. Alternatively, we can use `docker logs <id>` to see all the logs from the container. We can see that the restarted container exited more recently:

In [11]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED              STATUS                          PORTS     NAMES
455b29036415   ubuntu        "echo 'hello, world!'"   55 seconds ago       Exited (0) 1 second ago                   musing_torvalds
55154e481da2   ubuntu        "ls -C"                  55 seconds ago       Exited (0) 54 seconds ago                 trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              56 seconds ago       Exited (0) 55 seconds ago                 cranky_merkle
b6103fa1567e   hello-world   "/hello"                 About a minute ago   Exited (0) About a minute ago             hopeful_cerf


As mentioned, we can create a new container without starting it. Below we create an `ubuntu` container with a modified startup command. Note that we cannot modify the startup command of a container that has already been created. This is important: the startup command can only be overridden at container creation. The status of a newly created container is `Created` instead of the usual `Exited`:

In [12]:
!docker create ubuntu echo 'hi, there'

23bc0fa3195160671f7104bc3a5c5ca91a9f32f1865a270306c88eedb437c688


In [13]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED              STATUS                          PORTS     NAMES
23bc0fa31951   ubuntu        "echo 'hi, there'"       6 seconds ago        Created                                   clever_margulis
455b29036415   ubuntu        "echo 'hello, world!'"   About a minute ago   Exited (0) 18 seconds ago                 musing_torvalds
55154e481da2   ubuntu        "ls -C"                  About a minute ago   Exited (0) About a minute ago             trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              About a minute ago   Exited (0) About a minute ago             cranky_merkle
b6103fa1567e   hello-world   "/hello"                 About a minute ago   Exited (0) About a minute ago             hopeful_cerf
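
The create and start steps can also be chained in a script by capturing the ID printed by `docker create`. This is a minimal sketch, assuming a reachable Docker daemon and a pullable `ubuntu` image; `cid` and `lifecycle_status` are illustrative variable names, and the final `docker rm` is added only to clean up after the example.

```shell
#!/bin/sh
# Sketch: `docker run` split into its two steps, create and start.
lifecycle_status="skipped"   # illustrative variable
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # Step 1: create the container; the startup command is fixed at this point.
    cid=$(docker create ubuntu echo 'hi, there')
    # The container exists but has not run yet, so its status should be "Created".
    docker ps --all --filter "id=$cid" --format '{{.Status}}'
    # Step 2: start it and attach to its output.
    docker start -a "$cid"
    # After the command finishes, the status changes to "Exited (...)".
    docker ps --all --filter "id=$cid" --format '{{.Status}}'
    # Clean up the stopped container.
    docker rm "$cid" >/dev/null
    lifecycle_status="done"
else
    echo "Docker daemon not reachable; skipping."
fi
```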


### Stopping containers

Let us create a container that runs for a long time (e.g. forever). Here we use `busybox`, which combines tiny versions of many common UNIX utilities into a single small executable. The `busybox` image takes up only about 4 MB on our machine compared to 60+ MB for the `ubuntu` image. Also, the `ping` command does not exist in the `ubuntu` image.

In [15]:
!docker create busybox ping google.com

Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox

Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Downloaded newer image for busybox:latest
c8ed44e2c4e82464b2a49a597f85546d5eb10476b563aa7c9c88edd48c79a8fe


In [16]:
!docker start c8ed44e2c4e8

c8ed44e2c4e8


Notice the `Up` status:

In [17]:
!docker ps

CONTAINER ID   IMAGE     COMMAND             CREATED          STATUS         PORTS     NAMES
c8ed44e2c4e8   busybox   "ping google.com"   12 seconds ago   Up 4 seconds             friendly_wilson


To stop containers, we can use either `stop` or `kill`. The `stop` command sends a SIGTERM to the running process. This gives it 10 seconds (by default) for cleanup, after which a fallback SIGKILL is sent to terminate the process immediately. The `kill` command sends SIGKILL right away. Indeed, the `stop` command below takes about 10.5 seconds, while `kill` takes less than a second.

In [18]:
!docker stop c8ed44e2c4e82

c8ed44e2c4e82


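Note the nonzero exit status 137 = 128 + 9. The `ping` process did not exit on SIGTERM within the grace period, so it was terminated by the fallback SIGKILL (signal 9):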

In [19]:
!docker ps --all --filter id=c8ed44e2c4e82

CONTAINER ID   IMAGE     COMMAND             CREATED          STATUS                                PORTS     NAMES
c8ed44e2c4e8   busybox   "ping google.com"   31 seconds ago   Exited (137) Less than a second ago             friendly_wilson
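
The exit status 137 in the listing above follows the usual shell convention that a process terminated by signal N is reported with status 128 + N. The following sketch reproduces this without Docker, assuming a POSIX shell such as `bash` or `dash`; the child shell simply sends SIGKILL to itself.

```shell
#!/bin/sh
# Sketch: a child process killed by SIGKILL (signal 9) is reported as 128 + 9.
status=0
# The single quotes make $$ expand inside the child shell, so it kills itself.
sh -c 'kill -KILL $$' || status=$?
echo "exit status: $status"                # prints "exit status: 137"
echo "signal number: $((status - 128))"    # prints "signal number: 9"
```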


### Interacting with containers

In this section, we will use [Redis](https://redis.io/) from the `redis` image. Redis is an in-memory data store that is useful as a database and cache engine. Like `ubuntu`, this image has multiple programs installed. In particular, two commands are of interest to us: `redis-server` and `redis-cli`. Here we run `redis` in the background using the detach flag `-d`:

In [20]:
!docker run -d redis

Unable to find image 'redis:latest' locally
latest: Pulling from library/redis

adb3a4ab: Pulling fs layer
d00da4bd: Pulling fs layer
1d284940: Pulling fs layer
7ed7779d: Pulling fs layer
9c3f82f2: Pulling fs layer
Digest: sha256:6a59f1cbb8d28ac484176d52c473494859a512ddba3ea62a547258cf16c9b3ae

From the logs, we see that the redis server is running:

In [21]:
!docker logs 9ae89bda2c27f

1:C 27 Feb 2023 19:52:11.028 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 27 Feb 2023 19:52:11.028 # Redis version=7.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:M 27 Feb 2023 19:52:11.029 * monotonic clock: POSIX clock_gettime
1:M 27 Feb 2023 19:52:11.029 * Running mode=standalone, port=6379.
1:M 27 Feb 2023 19:52:11.029 # Server initialized
1:M 27 Feb 2023 19:52:11.031 * Ready to accept connections


Since the server is isolated inside the container, we need to start the client inside the container to access it. This can be done using `exec` with the `-it` interactive flag since we want to maintain control over the client using our terminal.

In [23]:
!docker exec -it 9ae89bda2c27f redis-cli

127.0.0.1:6379> 


```{figure} diagrams/00-interactive.svg
---
width: 80%
name: exec
---
Executing a command from our terminal to a running process inside the container. Each running Linux process in a container has STDIN, STDOUT, and STDERR channels. These channels connect with the terminal during interactive mode.
```

**Remark.** Note that we can also use `docker run` with the `-it` flag, e.g. with `bash` or `sh`, to access the shell. This overrides the default startup command, so it will not run, which can be useful for exploring the filesystem of an image without its startup command running.

## Building custom images

Throughout the above examples we have been using public images created by other engineers and pushed to Docker Hub. We want to know how to create our own images so that we can run our own applications inside custom containers. This also means that our custom images can be uploaded to an image registry (e.g. Docker Hub) which our services can then pull to run our applications. This can be done using a **Dockerfile** which can be thought of as container config.

```{figure} diagrams/00-images.svg
---
width: 80%
name: images
---
Existing images from Docker Hub.
```

### Dockerfile

Dockerfiles start with specifying a **base image**. This makes sense for the level of abstraction that we are working in. To demonstrate this, we create a Dockerfile for a minimal container that runs the `redis-server`:

In [30]:
!tree redis-image

redis-image
└── Dockerfile

0 directories, 1 file


```{figure} diagrams/00-dockerfile.svg
---
name: dockerfile
---
Dockerfiles have three main parts which define container behavior.
```

We use [`alpine`](https://hub.docker.com/_/alpine) as the base image, which is based on [Alpine Linux](https://www.alpinelinux.org/). For our purposes, we choose this as a minimal image (only a few MB!) that comes with a sufficiently useful set of preinstalled programs.

In [31]:
!cat redis-image/Dockerfile

FROM alpine

RUN apk add --update redis

CMD ["redis-server"]


### Docker build process

Docker [build](https://docs.docker.com/engine/reference/commandline/build) simulates actual sequential installation steps starting from the base image. First, a container is run from the base image with its startup command given by the first `RUN` instruction. A snapshot of the resulting filesystem is taken to get an intermediate image, which is used as the base for the next `RUN` instruction. This is repeated sequentially for each `RUN` instruction. Finally, the last intermediate container is created with the startup command changed to that in `CMD`. A snapshot of this container's filesystem is taken to get the final image, which completes the build process. Note that the [actual build process](https://www.youtube.com/watch?v=ImgeeEnnS-w) based on [BuildKit](https://docs.docker.com/build/buildkit/) is a bit more complex and uses DAGs to make the build process concurrent and cache-efficient.

**Remark.** Note that we set `DOCKER_BUILDKIT=0` since BuildKit does not expose the intermediate containers. Moreover, all intermediate containers are shut down and removed unless we set `--rm=false`.

In [48]:
!DOCKER_BUILDKIT=0 docker build redis-image --rm=false

Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM alpine
latest: Pulling from library/alpine

af76a39c: Already exists
Digest: sha256:69665d02cb32192e52e07644d76bc6f25abeb5410edc1c7a81a10ba3f0efb90a
Status: Downloaded newer image for alpine:latest
 ---> d74e625d9115
Step 2/3 : RUN apk add --update redis
 ---> Running in 7f486acb1990
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/aarch64/APKINDEX.tar.gz
(1/1) Installing redis (7.0.8-r0)
Executing redis-7.0.8-r0.pre-install
Executing redis-7.0.8-r0.post-install
Executing busybox-1.35.0-r29.trigger
OK: 11 MiB in 16 packages
 ---> e1cec61c476f
Step 3/3 : CMD ["redis-server"]
 ---> Running in 8c3061228392
 ---> 6144cc6393d1
Successfully built 6144cc6393d1


From the build logs we see that container `7f486acb1990` installs redis based on the `alpine` base image `d74e625d9115`. The final container is `8c3061228392` with `redis-server` command resulting in the built image `6144cc6393d1`. In practice, we do not set `DOCKER_BUILDKIT=0` as BuildKit significantly improves build performance and avoids the side effect of creating intermediate images and containers. Indeed, the two intermediate containers were created with the said commands:

In [49]:
!docker ps --all --no-trunc 

CONTAINER ID                                                       IMAGE                                                                     COMMAND                                           CREATED          STATUS                     PORTS     NAMES
8c3061228392860c176f9fb026a1dd153745c838cb178e33d520c731701fa74d   sha256:e1cec61c476fbe276662d0f45953821898790fd3f0b0d491f0b887fc4007e1fe   "/bin/sh -c '#(nop) ' 'CMD [\"redis-server\"]'"   10 seconds ago   Created                              funny_albattani
7f486acb1990e43392ad567b82aa237461b16b3b9514d107a700ed7a9a0e08d8   sha256:d74e625d91152966d38fe8a62c60daadb96d4b94c1a366de01fab5f334806239   "/bin/sh -c 'apk add --update redis'"             13 seconds ago   Exited (0) 9 seconds ago             silly_stonebraker


<br>

```{figure} diagrams/00-imagebuild.svg
---
width: 80%
name: imagebuild
---
Building the Dockerfile sequentially.
```

<br>

To see the built image:

In [50]:
!docker image ls

REPOSITORY   TAG       IMAGE ID       CREATED              SIZE
<none>       <none>    6144cc6393d1   About a minute ago   13.4MB
alpine       latest    d74e625d9115   2 weeks ago          7.46MB


The image can be tagged to give it a more descriptive name. Note that this is the same image (same image ID):

In [54]:
!DOCKER_BUILDKIT=0 docker build -t okt/redis:latest redis-image --quiet

sha256:6144cc6393d161f26bb341623316085478c29379dfd7ef47434261c6e5690d9b


In [55]:
!docker image ls

REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
okt/redis    latest    6144cc6393d1   5 minutes ago   13.4MB
alpine       latest    d74e625d9115   2 weeks ago     7.46MB
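
Each instruction in the Dockerfile contributes one entry to the history of the resulting image. The following sketch lists these entries with `docker history`, assuming a reachable Docker daemon and that the `okt/redis` image tagged above exists locally; `history_status` is an illustrative variable.

```shell
#!/bin/sh
# Sketch: list the entries that make up the okt/redis image.
history_status="skipped"   # illustrative variable
if command -v docker >/dev/null 2>&1 \
        && docker image inspect okt/redis >/dev/null 2>&1; then
    # The CREATED BY column shows the instruction behind each entry,
    # with the base image entries at the bottom of the listing.
    docker history okt/redis
    history_status="listed"
else
    echo "Docker or the okt/redis image is not available; skipping."
fi
```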


Now we run the server container in the background. Note that omitting the tag automatically selects the `latest` tag. We will access the Redis client on this container using the `exec` command:

In [56]:
!docker run -d okt/redis

46942562be022b6577e0cb3fd9ec7972bf87b0ee694dafed993db668a6494b3f


In [57]:
!docker ps

CONTAINER ID   IMAGE       COMMAND          CREATED         STATUS         PORTS     NAMES
46942562be02   okt/redis   "redis-server"   3 seconds ago   Up 2 seconds             sharp_spence


In [58]:
!docker exec 46942562be02 redis-cli set hello world

OK


In [59]:
!docker exec 46942562be02 redis-cli get hello

world


### Appendix: Vulnerabilities

The `okt/redis` image that we created produces a minimal container that runs the Redis server. It is smaller, and ships fewer packages that could carry vulnerabilities, than the [official redis image](https://hub.docker.com/_/redis). Indeed, we can check this using the image vulnerabilities feature of the Docker Desktop UI:

```{figure} diagrams/00-redisimages.png
---
name: redisimages
---
```

```{figure} diagrams/00-oktredisvul.png
---
name: oktredisvul
---
```

```{figure} diagrams/00-offredisvul.png
---
name: offredisvul
---
```

It turns out that the official `redis` image is based on `debian`. Instead, we can pull the variant based on `alpine` to get a significantly smaller image with fewer reported vulnerabilities:

```{figure} diagrams/00-offredisalpinevul.png
---
name: offredisalpinevul
---
```