(dk/00-containers)=
# Docker Containers

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Finished&color=brightgreen)
[![Source](https://img.shields.io/static/v1.svg?label=GitHub&message=Source&color=181717&logo=GitHub)](https://github.com/particle1331/ok-transformer/blob/master/docs/nb/notes/dk/00-containers.ipynb)
[![Stars](https://img.shields.io/github/stars/particle1331/ok-transformer?style=social)](https://github.com/particle1331/ok-transformer)

---

**References:** https://github.com/StephenGrider/DockerCasts

## Introduction

Containerization solves two problems: running applications with conflicting dependencies on the same machine, and running applications consistently across different machines. Together with reproducibility, this is a central concern in deploying machine learning systems. Both are achieved by means of **images** and **containers**. An image can be thought of as a template specifying installation steps, which the host machine uses to create actual running containers. Images can be pushed to a registry which other machines can pull from (e.g. during an update, see {numref}`docker-envs`). This ensures no configuration drift between the rebuilt containers. Finally, applications run inside isolated containers. [Docker](https://www.docker.com/) provides an entire ecosystem for efficiently working with containers.


```{figure} diagrams/03-docker.png
---
width: 600px
name: docker-envs
---
Updating a new dependency version and shipping it to other environments. Source: [KodeKloud](https://kodekloud.com/wp-content/uploads/2023/03/Kubernetes-Crash-Course-For-PDF-1.pdf)
```

### OS Kernel

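On macOS and Windows, Docker sets up a Linux **virtual machine** (VM) inside the host computer, since containers rely on features of the Linux kernel. On a Linux host, containers use the host kernel directly. This can be seen below, where the OS of the running Docker server is `linux` while that of the client is `darwin`. Here we have [Docker Desktop](https://www.docker.com/products/docker-desktop/) running in the background on a macOS laptop.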

In [28]:
!docker version

Client:
 Cloud integration: v1.0.31
 Version:           20.10.23
 API version:       1.41
 Go version:        go1.18.10
 Git commit:        7155243
 Built:             Thu Jan 19 17:35:19 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux
 Experimental:      true

Server: Docker Desktop 4.17.0 (99724)
 Engine:
  Version:          20.10.23
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.10
  Git commit:       6051f14
  Built:            Thu Jan 19 17:31:28 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.18
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


Docker isolates resources using **namespaces** and sets usage and prioritization limits using **control groups**, both of which are features of the **Linux kernel**. The kernel acts as an intermediate layer between running processes and hardware through system calls. See this [blog post](https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/) for more details.

```{figure} diagrams/00-osarch.svg
---
width: 80%
name: osarch
---
OS architecture.
```
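To make namespaces a little more concrete, the following minimal sketch lists the namespaces that the current process belongs to by reading `/proc/<pid>/ns`. It assumes a Linux system (for example, a shell inside a container or inside the Docker VM); on other systems the directory does not exist and the hypothetical helper `list_namespaces` simply returns an empty dictionary.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process.

    Reads the symlinks under /proc/<pid>/ns, which only exist on Linux.
    Returns an empty dict when that directory is not available.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            namespaces[name] = "unreadable"
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Two processes that show the same identifier for a namespace type share that namespace. Running this on the host and inside a container should therefore show different identifiers for the namespaces that isolate the container.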

### Hello world

The following example demonstrates pulling an image and running a container from it:

In [29]:
!docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world

Digest: sha256:fc6cf906cbfa013e80938cdf0bb199fbdbb86d6e3e013783e5a766f50f5dbce0
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:


The above message describes the entire process by which the `hello-world` container ends up running on our machine. The image was pulled from [Docker Hub](https://hub.docker.com/), which is a registry of Docker images. Note that the container is created locally from the pulled image, since the local machine is also our compute layer.

The container proceeds to run its default command which is to execute the `/hello` program which prints the message on the terminal. The `hello-world` image produces a minimal container whose sole purpose is to print this message.

```{figure} diagrams/00-dockerhub.svg
---
name: dockerhub
---
Running `hello-world` on our local machine. 
```

Running this image a second time would be significantly faster since Docker keeps a local **cache** of the pulled image. This makes sense since multiple containers are usually created from the same image. The architecture of an image and a container is shown in the following diagram.

```{figure} diagrams/00-helloworld.svg
---
name: helloworld
---
Anatomy of a Docker image and the resulting `hello-world` container in the context of the Linux kernel. Note the specific partition on the hard disk for the filesystem of the image.  
```

Here we see that an **image** is essentially a filesystem snapshot together with a startup command. It can be thought of as a read-only template which provides the daemon with a set of instructions for creating a container. A **container**, on the other hand, is a running process inside the Linux VM with partitioned hardware resources allocated by the kernel.

### Container isolation

As mentioned, containers have **isolated filesystems** by default. This means we can blow up a container and just create a fresh healthy container from the same image. This also ensures that our running processes will not affect the host computer which can be running other important processes.

```bash
$ docker run -it ubuntu
root@0d8680f49620:/# ls
bin   dev  home  media  opt   root  sbin  sys  usr
boot  etc  lib   mnt    proc  run   srv   tmp  var
root@0d8680f49620:/# rm -rf bin/ls
root@0d8680f49620:/# ls
bash: /usr/bin/ls: No such file or directory
```

Creating a fresh container from the same image gives us a working `ls` again. Note that the container ID is different:

```bash
$ docker run -it ubuntu
root@b1230895737b:/# ls
bin   dev  home  media  opt   root  sbin  sys  usr
boot  etc  lib   mnt    proc  run   srv   tmp  var
root@b1230895737b:/#
```
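The behaviour above can be mimicked with a toy model in Python. This is only a sketch of the idea, not how Docker is implemented: the class names `Image` and `Container` and their fields are hypothetical, and the real mechanism is a writable layer stacked on top of read-only image layers. The point is that changes made in one container are recorded privately and never touch the shared image.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Read-only template: a filesystem snapshot plus a startup command."""
    name: str
    files: frozenset
    cmd: tuple


@dataclass
class Container:
    """A container records its own changes on top of the shared image."""
    image: Image
    deleted: set = field(default_factory=set)  # private to this container
    added: set = field(default_factory=set)    # private to this container

    def ls(self):
        # Visible files = image files minus local deletions plus local additions.
        return sorted((self.image.files - self.deleted) | self.added)

    def rm(self, path):
        self.added.discard(path)
        self.deleted.add(path)


ubuntu = Image("ubuntu", frozenset({"/bin/bash", "/bin/ls"}), ("bash",))

broken = Container(ubuntu)
broken.rm("/bin/ls")          # damage is confined to this container

fresh = Container(ubuntu)     # new container from the same image
print(broken.ls())
print(fresh.ls())
```

Because `Image` is frozen and each `Container` keeps its own change sets, the fresh container still sees `/bin/ls` even though the first one removed it.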

## Docker Client

In this section, we look at commands available in the Docker client for working with containers. We will use the `ubuntu` image which is an order of magnitude more complex than `hello-world`. Its default command is `bash`, which exits immediately here since no interactive terminal is attached, so it looks like nothing happened after the image is pulled from Docker Hub:

In [3]:
!docker run ubuntu

Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu

Digest: sha256:9a0bdde4188b896a372804be2384015e90e3f84906b750c1a53539b585fbbe7f
Status: Downloaded newer image for ubuntu:latest


```{figure} diagrams/00-lifecycle.png
---
name: lifecycle
---
Complete Docker container lifecycle. [[source](https://docker-saigon.github.io/post/Docker-Internals/)]

```

### Startup command

Instead, we can override the default command using some other command such as `ls`. Note that this command works because `ls` is a program that exists in the `ubuntu` image. The familiar program `echo` also exists.

In [4]:
!docker run ubuntu ls -C

bin   dev  home  media	opt   root  sbin  sys  usr
boot  etc  lib	 mnt	proc  run   srv   tmp  var


In [5]:
!docker run ubuntu echo 'hello, world!'

hello, world!


```{figure} diagrams/00-ubuntu.svg
---
width: 80%
name: ubuntu
---
Overriding the default command of `ubuntu`. Running `ls` instead of `bash`. 
```

To list all running containers we can use the following command. This will be very useful for determining the ID of a running container when we want to issue commands on specific containers.

In [6]:
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


All our previous containers exited immediately after running, so there is no running container. To see all containers, including stopped ones, we use the `--all` flag. Note that we have multiple containers (distinct IDs) for the same `ubuntu` image, one for each call to `docker run`. This is discussed further below. Also, the [exit statuses](https://docs.docker.com/engine/reference/run/#exit-status) are all zero, which means no errors were encountered:

In [7]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS                              PORTS     NAMES
455b29036415   ubuntu        "echo 'hello, world!'"   1 second ago     Exited (0) Less than a second ago             musing_torvalds
55154e481da2   ubuntu        "ls -C"                  1 second ago     Exited (0) Less than a second ago             trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              2 seconds ago    Exited (0) 1 second ago                       cranky_merkle
b6103fa1567e   hello-world   "/hello"                 16 seconds ago   Exited (0) 15 seconds ago                     hopeful_cerf


### Start and create

We saw that running the same image twice resulted in two distinct containers. `docker run` is actually equivalent to two separate steps: **create** and **start**. Creating a container sets up its filesystem from the image, while starting it executes its startup command. It follows that to start a container we have to point to a specific container ID:

In [10]:
!docker start -a 455b29036415

hello, world!


The `-a` flag attaches the container's output to the terminal so we can view it. Alternatively, we can use `docker logs <id>` to see all the logs from the container. We can see that the restarted container exited more recently:

In [11]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED              STATUS                          PORTS     NAMES
455b29036415   ubuntu        "echo 'hello, world!'"   55 seconds ago       Exited (0) 1 second ago                   musing_torvalds
55154e481da2   ubuntu        "ls -C"                  55 seconds ago       Exited (0) 54 seconds ago                 trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              56 seconds ago       Exited (0) 55 seconds ago                 cranky_merkle
b6103fa1567e   hello-world   "/hello"                 About a minute ago   Exited (0) About a minute ago             hopeful_cerf


Note that the startup command of a container can only be overridden at container creation. Hence, we cannot assign a new startup command when starting a container. Below we create an `ubuntu` container. This allocates resources but does not yet execute the startup command:

In [12]:
!docker create ubuntu echo 'hi, there'

23bc0fa3195160671f7104bc3a5c5ca91a9f32f1865a270306c88eedb437c688


In [13]:
!docker ps --all

CONTAINER ID   IMAGE         COMMAND                  CREATED              STATUS                          PORTS     NAMES
23bc0fa31951   ubuntu        "echo 'hi, there'"       6 seconds ago        Created                                   clever_margulis
455b29036415   ubuntu        "echo 'hello, world!'"   About a minute ago   Exited (0) 18 seconds ago                 musing_torvalds
55154e481da2   ubuntu        "ls -C"                  About a minute ago   Exited (0) About a minute ago             trusting_germain
58ba2877d113   ubuntu        "/bin/bash"              About a minute ago   Exited (0) About a minute ago             cranky_merkle
b6103fa1567e   hello-world   "/hello"                 About a minute ago   Exited (0) About a minute ago             hopeful_cerf
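The relationship between `run`, `create`, and `start` can be summarised with a small toy model. The names below (`ToyContainer`, `create`, `run`) are hypothetical and only mirror the behaviour observed above: the command is fixed when the container is created, every `run` yields a new container, and `start` re-executes the stored command of an existing one.

```python
import itertools

_next_id = itertools.count(1)


class ToyContainer:
    def __init__(self, image, command):
        self.id = next(_next_id)   # every created container gets a new ID
        self.image = image
        self.command = command     # fixed at creation, cannot change on start
        self.status = "created"
        self.output = []

    def start(self):
        # Starting executes the stored startup command, then the container exits.
        self.status = "running"
        self.output.append(self.command())
        self.status = "exited"
        return self


def create(image, command):
    return ToyContainer(image, command)


def run(image, command):
    # run = create + start
    return create(image, command).start()


first = run("ubuntu", lambda: "hello, world!")
second = run("ubuntu", lambda: "hello, world!")
pending = create("ubuntu", lambda: "hi, there")

first.start()  # restart the same container: same ID, same command
print(first.id != second.id, first.output, pending.status)
```

In this sketch, `start` takes no command argument at all, which reflects why a new startup command cannot be assigned to an existing container.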


### Stop

Let us create a container that runs for a long time (e.g. forever). Here we use `busybox` which combines tiny versions of many common UNIX utilities into a single small executable. The `busybox` image takes up only about 4 MB on our machine compared to 60+ MB for the `ubuntu` image. Also, the `ping` command is not installed in the `ubuntu` image.

In [15]:
!docker create busybox ping google.com

Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox

Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Downloaded newer image for busybox:latest
c8ed44e2c4e82464b2a49a597f85546d5eb10476b563aa7c9c88edd48c79a8fe


In [16]:
!docker start c8ed44e2c4e8

c8ed44e2c4e8


Notice the up status:

In [17]:
!docker ps

CONTAINER ID   IMAGE     COMMAND             CREATED          STATUS         PORTS     NAMES
c8ed44e2c4e8   busybox   "ping google.com"   12 seconds ago   Up 4 seconds             friendly_wilson


To stop containers we can use either `stop` or `kill`. The `stop` command sends a SIGTERM to the running process. This gives it 10 seconds (by default) for cleanup, after which a fallback SIGKILL is sent to terminate the process immediately. The `kill` command sends SIGKILL right away. Here we use `stop`:

In [18]:
!docker stop c8ed44e2c4e82

c8ed44e2c4e82


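The container reports a nonzero exit status. By convention, an exit status above 128 encodes 128 plus a signal number, so 137 = 128 + 9 means the process was terminated by SIGKILL (signal 9). This suggests that `ping` did not exit on SIGTERM within the grace period, so the fallback SIGKILL was sent: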

In [19]:
!docker ps --all --filter id=c8ed44e2c4e82

CONTAINER ID   IMAGE     COMMAND             CREATED          STATUS                                PORTS     NAMES
c8ed44e2c4e8   busybox   "ping google.com"   31 seconds ago   Exited (137) Less than a second ago             friendly_wilson
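Exit statuses above 128 conventionally encode 128 plus the number of the signal that terminated the process. The following sketch decodes a status using Python's standard `signal` module; the helper name `describe_exit_status` is hypothetical, and signal names depend on the platform (the values here assume Linux).

```python
import signal


def describe_exit_status(code):
    """Interpret a container exit status using the 128 + signal convention."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"error (exit code {code})"


for status in (0, 1, 137, 143):
    print(status, describe_exit_status(status))
```

Under this convention, 137 corresponds to SIGKILL (signal 9), while a process that exits because of SIGTERM (signal 15) would report 143.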


### Interacting with containers

In this section, we will use services from the `redis` image. [Redis](https://redis.io/) is an in-memory data store that is useful as a database and cache engine. Like `ubuntu`, this image has multiple programs installed. In particular, two commands are of interest to us: `redis-server` and `redis-cli`. Here we run `redis` in the background using the detach flag `-d`:

In [20]:
!docker run -d redis

Unable to find image 'redis:latest' locally
latest: Pulling from library/redis

adb3a4ab: Pulling fs layer
d00da4bd: Pulling fs layer
1d284940: Pulling fs layer
7ed7779d: Pulling fs layer
9c3f82f2: Pulling fs layer
Digest: sha256:6a59f1cbb8d28ac484176d52c473494859a512ddba3ea62a547258cf16c9b3ae

From the logs, we see that the redis server is running:

In [21]:
!docker logs 9ae89bda2c27f

1:C 27 Feb 2023 19:52:11.028 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 27 Feb 2023 19:52:11.028 # Redis version=7.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:M 27 Feb 2023 19:52:11.029 * monotonic clock: POSIX clock_gettime
1:M 27 Feb 2023 19:52:11.029 * Running mode=standalone, port=6379.
1:M 27 Feb 2023 19:52:11.029 # Server initialized
1:M 27 Feb 2023 19:52:11.031 * Ready to accept connections


Since the server is isolated inside the container, we need to start the client inside the container to access it. This can be done using `exec` with the `-it` interactive flag since we want to maintain control over the client using our terminal.

In [23]:
!docker exec -it 9ae89bda2c27f redis-cli

127.0.0.1:6379> 


```{figure} diagrams/00-interactive.svg
---
width: 80%
name: exec
---
Executing a command from our terminal to a running process inside the container. Each running Linux process in a container has STDIN, STDOUT, and STDERR channels. These channels connect with the terminal during interactive mode.
```

**Remark.** Note that we can also use `docker run` with the `-it` flag, e.g. with `bash` or `sh`, to access the shell. Since this overrides the default startup command, that command will not run. This can be useful for exploring the default filesystem without the startup command running.

## Docker Build

Throughout the above examples we have been using public images from Docker Hub. 
In this section, we create our own images for running our own containers. Our custom images can be pushed to container
registries, such as Docker Hub or [ECR](https://aws.amazon.com/ecr/), which our servers can pull from
to run our containers remotely. An image is defined using a **Dockerfile** which acts as a configuration file for the build.

```{figure} diagrams/00-images.svg
---
width: 80%
name: images
---
Existing images from Docker Hub that we used for examples.
```

### Dockerfile

Dockerfiles start by specifying a **base image**. This makes sense since we rarely want to assemble a filesystem from scratch: the base image provides an initial set of programs that later steps build on. To demonstrate this, we create a Dockerfile for a minimal container that runs the `redis-server`:

In [30]:
!tree redis-image

redis-image
└── Dockerfile

0 directories, 1 file


```{figure} diagrams/00-dockerfile.svg
---
name: dockerfile
---
Dockerfiles have three main step groups for the build process.
```

We use [`alpine`](https://hub.docker.com/_/alpine) as base image which is based on [Alpine Linux](https://www.alpinelinux.org/). For our purposes, we choose this as a minimal image (only 5 MB!) that comes with a sufficiently useful set of preinstalled programs.

In [31]:
!cat redis-image/Dockerfile

FROM alpine

RUN apk add --update redis

CMD ["redis-server"]


### Build process

Docker [build](https://docs.docker.com/engine/reference/commandline/build) simulates actual sequential installation steps by following the instructions in the Dockerfile starting from the base image specified in `FROM`. Docker builds the image using a **layered** architecture. Each instruction creates a new layer in the image containing just the changes from the previous layer (instructions that only set metadata, such as `CMD`, produce an empty 0 B layer). This makes the build process cache-efficient.

In [14]:
!docker build redis-image -t okt/redis

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.1s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 36B                                        0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m => [internal] load metadata for docker.io/library/alpine:latest           0.0s
[?25h[1A[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/3)                                                         
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 36B                                        0.0s
[0m[34m => [internal] load .dockerignore                           

The built layers can be viewed as follows:

In [15]:
!docker history okt/redis

IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
c0a93b037d13   8 minutes ago   CMD ["redis-server"]                            0B        buildkit.dockerfile.v0
<missing>      8 minutes ago   RUN /bin/sh -c apk add --update redis # buil…   5.28MB    buildkit.dockerfile.v0
<missing>      2 weeks ago     /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      2 weeks ago     /bin/sh -c #(nop) ADD file:df7fccc3453b6ec14…   7.73MB    


Note that the sizes of the layers add up to the size of the image. To see built images:

In [16]:
!docker image ls

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
okt/redis    latest    c0a93b037d13   10 minutes ago   13MB


(00-containers-cache-busting)=
### Cache busting

During the build process, each [layer](https://docs.docker.com/glossary/#layer) of an image is **cached**. Any change in a layer causes that layer and all subsequent layers to be rebuilt. This means the ordering of layers in a Dockerfile is important: we want to put expensive steps that rarely change earlier in the build process when possible, to take advantage of caching. For example, the following Dockerfile is not efficient:

```Dockerfile
FROM python:3.9.15-slim

COPY main.py requirements.txt ./
RUN pip install -r requirements.txt

CMD ["python", "main.py"]
```

Updating the `main.py` script here results in reinstalling all packages. Instead, we place the expensive install layer at the start of the build process, so that only the layer that copies `main.py` is rebuilt when we refactor.

```Dockerfile
FROM python:3.9.15-slim

COPY requirements.txt ./
RUN pip install -r requirements.txt

COPY main.py ./
CMD ["python", "main.py"]
```

<br>

```{figure} diagrams/00-cache-busting.svg
---
name: cache
---
Busting an expensive cached layer (left). Cache optimized version (right).
```

Note that layers up to [3/4] are cached after modifying `main.py`:

```
$ docker build . -t cache-busting
[+] Building 1.4s (9/9) FINISHED
 => [internal] load build definition from Dockerfile          0.0s
 => => transferring dockerfile: 36B                           0.0s
 => [internal] load .dockerignore                             0.0s
 => => transferring context: 2B                               0.0s
 => [internal] load metadata for docker.io/library/python:3.  1.2s
 => [1/4] FROM docker.io/library/python:3.9.15-slim@sha256:f  0.0s
 => [internal] load build context                             0.0s
 => => transferring context: 97B                              0.0s
 => CACHED [2/4] COPY requirements.txt ./                     0.0s
 => CACHED [3/4] RUN pip install -r requirements.txt          0.0s
 => [4/4] COPY main.py ./                                     0.0s
 => exporting to image                                        0.0s
 => => exporting layers                                       0.0s
 => => writing image sha256:b813245428c1122921fab76c7b13bfff  0.0s
 => => naming to docker.io/library/cache-busting              0.0s
```
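The invalidation rule can be sketched with a simplified model in which each layer's cache key is a hash of its parent's key, the instruction text, and the contents of any copied files. This is an illustrative assumption about how such a cache could work, not Docker's actual implementation, and the function names are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Return one cache key per instruction, chained through the parent key."""
    keys = []
    parent = ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        words = instruction.split()
        if words[0] == "COPY":
            # Contents of the copied source files are part of the key.
            for src in words[1:-1]:
                h.update(file_contents[src].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """Instructions whose cache key changes when files go from `before` to `after`."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [ins for ins, a, b in zip(instructions, old, new) if a != b]


inefficient = [
    "FROM python:3.9.15-slim",
    "COPY main.py requirements.txt ./",
    "RUN pip install -r requirements.txt",
    'CMD ["python", "main.py"]',
]
optimized = [
    "FROM python:3.9.15-slim",
    "COPY requirements.txt ./",
    "RUN pip install -r requirements.txt",
    "COPY main.py ./",
    'CMD ["python", "main.py"]',
]

# Hypothetical file contents before and after editing main.py.
before = {"main.py": "print('v1')", "requirements.txt": "numpy"}
after = {"main.py": "print('v2')", "requirements.txt": "numpy"}

print(rebuilt_layers(inefficient, before, after))
print(rebuilt_layers(optimized, before, after))
```

In this model, editing `main.py` invalidates the `pip install` step in the first ordering but not in the second, because the install layer no longer sits downstream of the changed file.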

**Remark.** See this article for [best practices](https://testdriven.io/blog/docker-best-practices/) with writing Dockerfiles (e.g. `ENTRYPOINT` vs `CMD`).

## Appendix: Vulnerabilities

The `okt/redis` image that we created produces a minimal container that runs the redis server. It is smaller than the [official redis image](https://hub.docker.com/_/redis), with fewer installed packages and hence a smaller attack surface. Indeed, we can check this by using the image vulnerabilities feature of the Desktop UI:

```{figure} diagrams/00-redisimages.png
---
name: redisimages
---
```

The official redis image is based on `debian`:

```{figure} diagrams/00-offredisvul.png
---
name: offredisvul
---
```

The version based on `alpine` is significantly smaller and more secure:

```{figure} diagrams/00-offredisalpinevul.png
---
name: offredisalpinevul
---
```