# `Industrial Machine Learning on Hadoop and Spark`
## `Seminar 01: Docker intro`

### `Maks Nakhodnov (nakhodnov17@gmail.com)`
#### `Bremen, 2025`

What can be learned from this notebook:
* Principles of portable application development
* Containerization
* Docker

Source of illustrations: [http://pointful.github.io/docker-intro/#/](http://pointful.github.io/docker-intro/#/)

### `What problem are we trying to solve?`

* The diversity of development tools and runtime environments requires complex dependency management
* Setting up the environment for complex projects becomes a difficult task
* Manual management of all components is **non-portable** and **hard to maintain**
* Modifying components in an existing system becomes practically impossible (**dependency hell**)

![Example](https://pointful.github.io/docker-intro/docker-img/the-challenge.png)

### `Approach to solving the problem`

To solve this problem, we need **an abstraction mechanism** that lets us work with different types of environments and applications in a uniform way.

#### `"Real world" solution`

![An analogy from everyday life](https://pointful.github.io/docker-intro/docker-img/cargo-transport-pre-1960.png)

![Solution](https://pointful.github.io/docker-intro/docker-img/intermodal-shipping-container.png)

#### `Returning to software`

![ISO Docker container](https://pointful.github.io/docker-intro/docker-img/shipping-container-for-code.png)

### `Advantages of Docker containers`

* **Encapsulation** of infrastructure components
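* **Platform independence** at both the hardware and software levels (within limits: an image is built for a specific CPU architecture, and Linux containers require a Linux kernel)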
* Infrastructure as **code** (versioning, snapshots, reproducibility)
* **Isolation** of code, environments, and runtimes
* **Automation** of routine operations (testing, deployment, CI/CD)
* Simplified creation of **resilient** services
* **Simplified** collaboration between different development teams
* Easier work on **software integration** by third-party developers

And all of this with minimal resource overhead!

### `What's under the hood?`

* Docker uses the **host OS kernel**
* Through **resource isolation** (Linux namespaces and control groups, cgroups), it allows creating **isolated** groups of processes and file systems
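Both claims can be checked from Python: the kernel release reported inside a container matches the host's, while the namespace identifiers differ. A minimal sketch, assuming a Linux host (with Docker Desktop on Windows or macOS the "host" kernel is that of the Linux VM Docker runs in); the file name `ns_info.py` and the `python:3` image used below are illustrative choices.

```python
import os
import platform


def kernel_release() -> str:
    """Kernel release as seen by this process (shared with the host inside a container)."""
    return platform.release()


def namespace_ids(kinds=("pid", "mnt", "net", "uts", "ipc")) -> dict:
    """Map namespace kind -> identifier such as 'pid:[4026531836]' (Linux only).

    Processes in the same namespace see the same identifier; a container
    gets its own namespaces, so the identifiers differ from the host's.
    """
    ids = {}
    for kind in kinds:
        try:
            ids[kind] = os.readlink(f"/proc/self/ns/{kind}")
        except OSError:
            # Not Linux, or /proc is unavailable: the namespace is unknown
            ids[kind] = None
    return ids


if __name__ == "__main__":
    print("kernel:", kernel_release())
    for kind, ns in namespace_ids().items():
        print(f"{kind:>4}: {ns}")
```

Running `python3 ns_info.py` on the host and `docker run --rm -v "$(pwd)":/app python:3 python /app/ns_info.py` should print the same kernel line but different namespace identifiers.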

### `Differences from VM / Hypervisor`

- [x] Docker does not boot a separate kernel for each container
- [x] Containers start up much faster
- [x] VMs require more resources to run
- [x] Controlling changes and versioning in VMs is difficult
- [x] Containers can share resources with each other (binaries, libraries)
- [x] When an ~~container~~ image is modified, only the difference between the original and the new version is saved

- [ ] VMs provide stronger isolation between each other
- [ ] VMs are slightly easier to use

![](https://pointful.github.io/docker-intro/docker-img/containers-vs-vms.png)

### `Basic Docker infrastructure`

![](https://pointful.github.io/docker-intro/docker-img/basics-of-docker-system.png)

![](https://pointful.github.io/docker-intro/docker-img/changes-and-updates.png)

### `Key Docker concepts`

Docker ~~containers~~ images are stored as a **filesystem**. A process space is created only at runtime. Note, however, that the processes run directly on the **host system kernel**!

<dl>
  <dd>0. <b>Dockerfile</b>: <b>a description of the procedure</b> for building an image's filesystem</dd>
  <dd>1. <b>Layer</b>: an atomic set of changes to the <b>filesystem</b></dd>
  <dd>2. <b>Image</b>: a <b>RO</b> (read-only) filesystem made of layers</dd>
  <dd>3. <b>Container</b>: an image with a <b>RW</b> (read-write) layer on top</dd>
  <dd>3*. <b>Running container</b>: a container (RW filesystem) with a <b>process space</b></dd>
</dl>

* Each layer is produced by a command (instruction) in the Dockerfile
* An image consists of layers
* A container is created from an image
* A running container is obtained by starting a container

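The relationship between layers, images and containers can be illustrated with a toy in-memory model: reads fall through the layer stack from top to bottom, while writes and deletions touch only the container's RW layer. This is a sketch of the idea only, not how Docker stores data (real images live on disk and are combined by a storage driver such as overlay2); the class names and the `WHITEOUT` marker are hypothetical.

```python
from types import MappingProxyType

# Hypothetical marker for a file deleted in an upper layer (a "whiteout")
WHITEOUT = object()


class Image:
    """Toy model: an image is an ordered stack of read-only layers (bottom first)."""

    def __init__(self, layers):
        # Each layer maps path -> content; read-only proxies mimic RO layers
        self.layers = tuple(MappingProxyType(dict(layer)) for layer in layers)


class Container:
    """Toy model: a container is an image plus one writable (RW) layer on top."""

    def __init__(self, image):
        self.image = image
        self.rw_layer = {}

    def read(self, path):
        # Search from the RW layer down to the bottom image layer
        for layer in (self.rw_layer, *reversed(self.image.layers)):
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Writes go only to the RW layer; the image layers are never modified
        self.rw_layer[path] = content

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file does not exist
        # Lower layers are RO, so the deletion is recorded as a whiteout in the RW layer
        self.rw_layer[path] = WHITEOUT


if __name__ == "__main__":
    ubuntu = Image([{"/etc/os-release": "Ubuntu", "/bin/ls": "<binary>"}])
    c1, c2 = Container(ubuntu), Container(ubuntu)
    c1.write("/root/test.txt", "Hello World")
    c1.delete("/bin/ls")
    print(c1.read("/root/test.txt"))  # visible in c1 only
    print(c2.read("/bin/ls"))         # still present in c2 and in the image
```

Because two containers share the same RO image layers and differ only in their RW layers, creating another container from an image is cheap.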
![](https://cdn.buttercms.com/CLQJN3yRRcS7oGqm7yKb)

### `Description of the Docker image creation process`

1. Start with an empty filesystem
2. Sequentially, for each command in the Dockerfile:
   1. **Add** a new RW **layer** on top
   2. Execute the command and **record the changes** it makes **in the current layer**
   3. Make the current layer RO
3. Make the final layer RO
4. Save the resulting set of layers
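The build procedure above can be sketched as a loop over commands. In this toy version a "command" is a hypothetical Python function that sees the filesystem built so far and returns only the changes it makes. In real Docker, only instructions that modify the filesystem (`RUN`, `COPY`, `ADD`) produce filesystem layers, the base comes from `FROM` rather than being built, and unchanged layers are reused from the build cache; none of this is modeled here.

```python
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Sequence, Tuple

# A hypothetical stand-in for a Dockerfile instruction: it sees a read-only view
# of the filesystem built so far and returns only the changes it makes.
Command = Callable[[Mapping[str, str]], Dict[str, str]]


def merged_view(layers: Sequence[Mapping[str, str]]) -> Dict[str, str]:
    """Flatten layers bottom-to-top: upper layers override lower ones."""
    view: Dict[str, str] = {}
    for layer in layers:
        view.update(layer)
    return view


def build_image(commands: Sequence[Command]) -> Tuple[Mapping[str, str], ...]:
    layers = []  # 1. start with an empty filesystem
    for command in commands:  # 2. for each command, in order:
        rw_layer: Dict[str, str] = {}  # 2.1 add a new RW layer on top
        current = MappingProxyType(merged_view(layers))
        rw_layer.update(command(current))  # 2.2 record only its changes in this layer
        layers.append(MappingProxyType(rw_layer))  # 2.3 make the layer RO
    # 3.-4. every layer is already RO; return the resulting set of layers
    return tuple(layers)


if __name__ == "__main__":
    image = build_image([
        lambda fs: {"/etc/os-release": "Ubuntu"},       # stands in for the base image
        lambda fs: {"/usr/bin/tmux": "<binary>"},       # like `RUN apt install -y tmux`
        lambda fs: {"/app/main.py": "print('hello')"},  # like `COPY main.py /app/`
    ])
    for i, layer in enumerate(image):
        print(i, dict(layer))
```

Each layer stores only its own diff, which is why modifying an image saves just the difference from the original.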

### `Docker Lifecycle`

![](./Docker%20Livecycle.svg)
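The lifecycle can also be written down as a small state machine. A simplified sketch: the state names follow those reported by `docker inspect` (`State.Status`: created, running, paused, exited), `None` stands for "does not exist", the `ToyContainer` class is hypothetical, and states such as `restarting` or `dead` are omitted, so it may not match the diagram in every detail.

```python
# Simplified container lifecycle: (current state, docker command) -> next state.
# None stands for "the container does not exist" (not created yet, or removed).
TRANSITIONS = {
    (None, "create"): "created",        # docker create
    ("created", "start"): "running",    # docker start
    ("exited", "start"): "running",
    ("running", "stop"): "exited",      # SIGTERM, then SIGKILL after a timeout
    ("running", "kill"): "exited",      # SIGKILL
    ("running", "pause"): "paused",     # docker pause
    ("paused", "unpause"): "running",   # docker unpause
    ("running", "restart"): "running",  # docker restart
    ("exited", "restart"): "running",
    ("created", "rm"): None,            # docker rm
    ("exited", "rm"): None,
}


class ToyContainer:
    """Tracks the state of a single hypothetical container."""

    def __init__(self):
        self.state = None

    def apply(self, command, force=False):
        if command == "run":
            # docker run = docker create + docker start
            self.apply("create")
            return self.apply("start")
        if command == "rm" and force and self.state is not None:
            # docker rm -f also removes running containers
            self.state = None
            return self.state
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {command!r} a container in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    c = ToyContainer()
    for cmd in ["run", "stop", "start", "pause", "unpause", "kill", "rm"]:
        print(f"docker {cmd:<8} -> {c.apply(cmd)}")
```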

### `Examples`

#### `1. Simple example: Docker Hub`

```bash
docker pull hello-world
docker image ls
docker run hello-world
docker container ls -a
docker rm "<CONTAINER_ID>"
```

#### `2. Docker Hub: Ubuntu`

* The first image that is actually useful in practice
* Demonstration of executing commands inside a container (using `ls` as an example)
* Overview of run flags (`--rm`, `-i`, `-t`, `-d`, `-p`, `-v`)

```bash
docker pull ubuntu
docker image ls
docker run ubuntu ls
docker rm "<CONTAINER_ID>"
# Run with automatic removal on exit
docker run --rm ubuntu ls
# Run with connection to a pseudo-terminal
docker run -i -t ubuntu bash
```

#### `3. Container filesystem vs. process space`
* Demonstration of changes to the temporary filesystem inside the container
* Destruction of the process space

```bash
# Running with the -i -t combination allows detaching from the container using ^P^Q
docker run -i -t ubuntu bash
>> echo "Hello World" > ~/test.txt
>> cat ~/test.txt
# If the --rm flag had been specified, the container would be removed along with its filesystem
# Without this flag, only the process space is destroyed, while the filesystem remains intact
>> exit
# Find the ID of the stopped container
docker ps -a
docker start -i "<CONTAINER_ID>"
# The file is still there: the filesystem state has not changed
>> cat ~/test.txt
```

#### `4. Process space and background execution`
* Demonstration of working with the process space
* Detaching from the container
* Running in the background

```bash
docker run -i -t ubuntu bash
>> apt update && apt install -y tmux
# Create a background application
>> tmux new -s run
>> while true; do echo "1" >> test.txt; sleep 1; done;
>> ^B D
>> exit
docker start -i "<CONTAINER_ID>"
# We see that exiting the container this way indeed destroys the process space
>> tmux ls
# Create a background application again
>> tmux new -s run
>> while true; do echo "1" >> test.txt; sleep 1; done;
>> ^B D
# Detach from the container without stopping it
>> ^P ^Q
# Reattach to the still-running container
docker attach "<CONTAINER_ID>"
# We see that the process space remains intact
>> tmux ls
>> exit

# The -d flag can be used to run in the background
docker run --rm -d ubuntu bash -c "while true; do echo '0'; sleep 1; done;"
# Follow the output of the background container
docker logs -f "<CONTAINER_ID>"
```

#### `5. Building containers from a Dockerfile`

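```python
from IPython.display import Code
Code('./example.py', language='python')
```

```python
# `!` runs a shell command from the notebook; the `wsl` prefix executes it via WSL (Windows Subsystem for Linux)
! wsl echo $'1.234\n2.345\n3.4314' > data.txt
! wsl cat data.txt
```

```
1.234
2.345
3.4314
```

```python
Code('./Dockerfile', language='Dockerfile')
```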
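On machines without WSL (Linux, macOS), the same `data.txt` file can be created from plain Python instead of the `wsl echo` cell. A minimal sketch that writes the three numbers, one per line, into the current working directory:

```python
from pathlib import Path

# Write the same three numbers as the `wsl echo` cell, one per line
data_path = Path("data.txt")
data_path.write_text("1.234\n2.345\n3.4314\n")
print(data_path.read_text(), end="")
```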

```bash
# Build the image
docker build -t maksim64/test_app .

# Run a container from the image
docker run maksim64/test_app

# Run with environment variables
docker run -e SECRET_KEY=hi maksim64/test_app

# Run with directory mounting ($(pwd) gives an absolute host path, which -v always accepts)
docker run -e SECRET_KEY=hi -v "$(pwd):/root/data" maksim64/test_app
```
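The contents of `example.py` are displayed by the notebook cell above but are not reproduced in this text. As a hypothetical illustration of how a containerized script can consume the two inputs used in these commands (an environment variable passed with `-e` and a file from a directory mounted with `-v`), a sketch might look like this; the function name, the default path `/root/data/data.txt`, and the summing logic are assumptions, not the seminar's actual script.

```python
import os
from pathlib import Path


def load_config_and_data(data_path="/root/data/data.txt"):
    # Value passed with `docker run -e SECRET_KEY=...`; None if not set
    secret_key = os.environ.get("SECRET_KEY")
    path = Path(data_path)
    # The file is visible only if the host directory was mounted with -v
    values = [float(token) for token in path.read_text().split()] if path.exists() else []
    return secret_key, values


if __name__ == "__main__":
    key, values = load_config_and_data()
    print("SECRET_KEY is", "set" if key else "not set")
    print("sum of values:", sum(values))
```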

### `Cheat sheet for essential Docker commands`

```bash
# Pull an image from a registry (Docker Hub by default)
docker pull image_name
```

```bash
# Run a container
docker run \
    [-d] [-i] [-t] [-p 1234:5000] [-v local_path:container_path] [-w container_working_path] image_name [COMMAND]
# Here
# -d -- run in detached (background) mode
# -i -- interactive mode, keeping STDIN open for input to the container
# -t -- allocate a pseudo-TTY
# -p -- map container port 5000 to host port 1234
# -v -- mount local file/folder local_path into the container at container_path
# -w -- set the working directory inside the container
# image_name -- name of the image to create the container from
# COMMAND -- command to execute inside the container (replaces the image's default CMD)

# Examples:
# docker run hello-world       # runs the default command in a container created from the hello-world image
# docker run -i -t ubuntu bash # runs a bash shell in a container created from the ubuntu image,
#                                with a pseudo-TTY (-t) and interactive input (-i)
```

```bash
# View the list of running [or previously run] containers
docker ps [-a]
```

```bash
# View the list of processes inside a running container
docker top CONTAINER_ID
```

```bash
# Stop a container (i.e., send it a SIGTERM and allow it time to shut down gracefully)
docker stop CONTAINER_ID
```

```bash
# Kill a container (i.e., send it a SIGKILL and terminate immediately)
docker kill CONTAINER_ID
```

```bash
# Remove a container (-f to force-remove a running one)
docker rm [-f] CONTAINER_ID
```

```bash
# Stop all containers and remove them
docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q)
```

```bash
# Create an image from the current state of a container, named repo_name/image_name with tag image_tag (latest if omitted)
docker commit -m "message" CONTAINER_ID repo_name/image_name:image_tag
```

```bash
# View the list of images [including intermediate images]
docker images [-a]
```

```bash
# Remove an image
docker image rm IMAGE_ID
```

```bash
# Build an image from the Dockerfile in the current directory (`.` is the build context)
docker build [--no-cache] -t repo_name/image_name:image_tag .
```

```bash
# Push an image to Docker Hub (hub.docker.com); repo_name must be your Docker Hub username or organization, and you must log in first with `docker login`
docker push repo_name/image_name:image_tag
```
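When Docker is driven from a notebook or a script, it is convenient to assemble the `docker run` command line as an argument list rather than by concatenating strings. A minimal sketch, assuming Python 3.8+ (for `shlex.join`); `docker_run_argv` and `run_container` are hypothetical helpers covering only the flags from the `docker run` entry of the cheat sheet plus `--rm` and `-e`.

```python
import shlex
import subprocess


def docker_run_argv(image, command=(), *, detach=False, interactive=False, tty=False,
                    remove=False, ports=None, volumes=None, env=None, workdir=None):
    """Assemble a `docker run` argument list; all options go before the image name."""
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")
    if detach:
        argv.append("-d")
    if interactive:
        argv.append("-i")
    if tty:
        argv.append("-t")
    for host_port, container_port in (ports or {}).items():
        argv += ["-p", f"{host_port}:{container_port}"]
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", f"{host_path}:{container_path}"]
    for name, value in (env or {}).items():
        argv += ["-e", f"{name}={value}"]
    if workdir:
        argv += ["-w", workdir]
    return argv + [image, *command]


def run_container(argv):
    # Requires Docker to be installed and the daemon to be running
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = docker_run_argv("ubuntu", ["ls", "/root/data"], remove=True,
                           volumes={"/tmp/data": "/root/data"}, workdir="/root/data")
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.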