<center><h1><b>DOCKER</b></h1></center>
Docker is a platform that uses OS-level virtualization to deliver software in packages called containers. The software that hosts the containers is called Docker Engine. Docker automates the deployment of applications in lightweight containers, so that applications run consistently, and in isolation, across different environments.

See the installation guide [here](https://docs.docker.com/get-docker/). Run `docker version` in a shell to check that Docker is correctly installed and to see which version you have.

---

## 1. VIRTUAL MACHINES vs CONTAINERS

#### VIRTUAL MACHINES
Virtual Machines (VMs) are an abstraction of the physical hardware of a computer system, each with its own share of memory, disk and CPU resources, its own OS, and running its own application(s). Virtualization is provided by software called a hypervisor. There are two types of hypervisors: type 1, which runs directly on the hardware; type 2, like VirtualBox, which runs on top of an already existing OS.

Each running VM includes a full copy of the guest OS (Win/Linux/…), all the necessary binaries and libraries, and finally the application you want to run. All this can take up tens of GBs, and also makes VMs quite slow to start (boot). VMs provide full process isolation for applications: the software running in the guest operating system does not interfere with the host OS, and vice versa.

#### CONTAINERS
Containers are instead “software packages” that include the code, libraries and all dependencies required to run your applications, without the need to bring along a guest OS as in VMs. Containerization is still a type of virtualization that allows applications to run independently and in isolation, but it is more efficient than virtual machines because all containers share the host OS kernel. The resulting “software packages” (the containers) are lightweight compared to VMs, faster to start, and use far fewer resources.

**Docker is a containerization platform that offers a way to create and run containers.**

Remember, containers are isolated environments:
- The host’s (your computer’s) files aren’t visible inside the container
- When the container is deleted, data created inside the container is lost
- By default, the container doesn’t accept connections on any port (e.g. 8888)

---

## 2. IMAGES AND CONTAINERS

To create a Docker container, we first need a **Docker image**, i.e. a template used as a blueprint to generate containers. To create our own image, we write a **Dockerfile**, i.e. a text file with all the instructions for Docker on how to build the image: which software/packages to include, how to run it, etc. Once the image is built, we can create as many containers as we want from it. Ready-made images are available on [Docker Hub](https://hub.docker.com/).

Main commands for images:
* `docker pull ubuntu:noble` : To pull an image (e.g. Ubuntu version 24.04, named 'noble') from Docker Hub
* `docker image ls` or `docker images`: To list all existing images
* `docker image rm <imageID>` : To delete a local image
* `docker image prune` : To remove dangling (untagged) images; add `-a` to remove ALL images not used by any container

Main commands for containers:
* `docker ps -a` : To list the containers. Plain `docker ps` shows only the running ones; with '-a' (= all) the exited ones are listed too. Note: 'ps' = process status
* `docker attach <container-name>` : To attach your terminal to the main process of a running container. To detach without stopping the container, press ctrl+p followed by ctrl+q (this works when the container was started with *-i -t*); ctrl+c usually stops the process instead
* `docker exec -it <container-name> /bin/bash` : To open a new shell inside a running container
* `docker start/restart/stop/rm <container-name>` : To start/restart/stop/remove a container
* `docker run --rm -i -t -d --name myubuntu ubuntu:noble /bin/bash` : To create and launch a container from an existing image (ex. Ubuntu)
    * *--name myubuntu* gives the name myubuntu to the container
    * */bin/bash* is an (optional!) command that will be run inside the container
    * *--rm* tells Docker to remove the container once it stops executing
    * *-i* keeps STDIN open, so the command can be used Interactively (here, the bash shell)
    * *-t* allocates a (pseudo-)Terminal
    * *-d* instructs the container to run in the background (Detached)

Host-container connection: as options to `docker run`, we can also specify shared volumes and ports for our container
* `-v $PWD/test_volumes:/mnt` : Creates a shared Volume to share data between our computer and the container. The syntax is *-v HOST_PATH:CONTAINER_PATH*
* `-p 1234:8888` : Opens a Port to allow communication between the host and the container. The syntax is *-p HOST_PORT:CONTAINER_PORT*. (note: 8888 is the default port of Jupyter Notebook)
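
The way these options combine into a single `docker run` command can be sketched in Python. This is only an illustration: the helper `docker_run_args` and the host path `/home/me/test_volumes` are hypothetical names chosen for the example, and the code only builds and prints the command, it does not run Docker.

```python
import shlex


def docker_run_args(image, name, volumes=None, ports=None, command=None, detached=True):
    """Build the argument list for `docker run` (hypothetical helper, nothing is executed)."""
    args = ["docker", "run", "--rm", "-i", "-t"]
    if detached:
        args.append("-d")
    args += ["--name", name]
    # -v HOST_PATH:CONTAINER_PATH, one flag per shared directory
    for host_path, container_path in (volumes or {}).items():
        args += ["-v", f"{host_path}:{container_path}"]
    # -p HOST_PORT:CONTAINER_PORT, one flag per published port
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    if command:
        args.append(command)  # optional command run inside the container
    return args


args = docker_run_args(
    image="ubuntu:noble",
    name="myubuntu",
    volumes={"/home/me/test_volumes": "/mnt"},  # assumed host path
    ports={1234: 8888},
    command="/bin/bash",
)
print(shlex.join(args))
```

Note that all options come before the image name, and the optional command comes after it.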

#### DOCKER HUB:
Images can be pushed to Docker Hub to be stored and shared:
1. Create a repository on Docker Hub, and log in from your machine with:  `docker login`
2. Tag the image with your username, a repository name and a tag (called `image_name` here):  `docker image tag my_image username/repo-name:image_name`
3. Push the image to the remote repository:  `docker push username/repo-name:image_name`
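
As a rough sketch of how the remote name is put together, the snippet below composes the `username/repo-name:tag` reference and the three commands from the steps above. The function names and the values `my_image`, `username` and `repo-name` are placeholders for illustration; nothing is executed.

```python
def image_reference(username, repository, tag="latest"):
    """Compose a Docker Hub reference; Docker assumes the tag 'latest' when none is given."""
    return f"{username}/{repository}:{tag}"


def publish_commands(local_image, username, repository, tag="latest"):
    """Return the login, tag and push commands as argument lists (hypothetical helper)."""
    remote = image_reference(username, repository, tag)
    return [
        ["docker", "login"],
        ["docker", "image", "tag", local_image, remote],
        ["docker", "push", remote],
    ]


for cmd in publish_commands("my_image", "username", "repo-name", tag="v1"):
    print(" ".join(cmd))
```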

---

## 3. WRITING A DOCKERFILE

A Dockerfile is a text file that contains the instructions used to build a Docker image. It defines the steps for creating the image: the base image to use (e.g., an official Ubuntu image, or a specific version of a language runtime), the build steps to perform (e.g., installing software, copying files, setting environment variables), and how the container should behave (e.g., which command to run when the container starts).

Key parts of a Dockerfile:

* *FROM*: specifies the base image for the container.
* *WORKDIR*: sets the working directory for any following instructions.
* *ENV*: sets one or more environment variables inside the container.
* *RUN*: executes commands while the image is being built, like installing packages.
* *EXPOSE*: documents that the container will listen on a specific network port (like 8888) at runtime. It does not publish the port: that is still done with *-p* in `docker run`.
* *COPY* or *ADD*: copies files or directories from the host into the image.
* *CMD*: specifies the default command that runs when the container starts.

To build the image from the Dockerfile, run: `docker build --tag mapd_notebook -f my_dockerfile.dockerfile .` This creates an image from the custom Dockerfile my_dockerfile.dockerfile. The options are:
* *--tag* assigns the name mapd_notebook to the image
* *-f* specifies the Dockerfile to use
* *.* specifies that the build context (the files Docker can access during the build) is the current directory

Once built, the image appears in the list of available images *docker image ls* and can be used to run a container with *docker run --rm -i -t (-d) --name my_container mapd_notebook*.

Example: 
```
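# Base image: slim Debian-based image with Python 3.13.2
FROM python:3.13.2-slim

# All following instructions (and the running container) use this directory
WORKDIR /mapd-workspace

# pip and Python settings suited to container builds
ENV PIP_DEFAULT_TIMEOUT=100 \
    PYTHONUNBUFFERED=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1

# Install Jupyter Notebook and the Python packages used in the notebooks
RUN pip install notebook \
    matplotlib \
    SQLAlchemy==2.0.38 \
    ipython-sql==0.5.0 \
    mysql-connector-python==9.2.0 \
    pandas

# Document the port Jupyter listens on (publish it with -p at run time)
EXPOSE 8888

# Start Jupyter on all interfaces; the empty token disables
# authentication, so use this only on a local machine
CMD jupyter notebook \
    --ip=0.0.0.0 \
    --port=8888 \
    --no-browser \
    --allow-root \
    --NotebookApp.token=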
```
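
To make the structure of a Dockerfile more concrete, the following Python sketch splits a Dockerfile into (instruction, arguments) pairs, joining lines continued with a backslash. It is a simplified illustration, not Docker's actual parser: the function name `dockerfile_instructions` is made up for the example, and features such as parser directives, heredocs and the JSON (exec) form are ignored.

```python
def dockerfile_instructions(text):
    """Return (INSTRUCTION, arguments) pairs from Dockerfile text (simplified sketch)."""
    instructions = []
    pending = ""  # accumulates lines that end with a backslash
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "
            continue
        logical = (pending + line).strip()
        pending = ""
        keyword, _, args = logical.partition(" ")
        instructions.append((keyword.upper(), args.strip()))
    return instructions


# A shortened version of the example Dockerfile
EXAMPLE = """\
FROM python:3.13.2-slim
WORKDIR /mapd-workspace
RUN pip install notebook \\
    pandas
EXPOSE 8888
"""

for keyword, args in dockerfile_instructions(EXAMPLE):
    print(f"{keyword:8} {args}")
```

Each pair corresponds to one build step, processed by Docker in the order in which it appears in the file.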

---

## 4. DOCKER-COMPOSE
Docker Compose is a Docker tool for defining and running applications made of multiple containers. A single Compose `.yaml` file defines all the services (containers), as well as volumes, networks, etc. The main commands are simple:
- `docker compose up` : to start all services described by a Compose file.
- `docker compose down` : to stop the services and remove the containers and networks created by `docker compose up`.

Key parts of a .yaml file:
* *version*: Specifies the Docker Compose file format version. Recent versions of Docker Compose treat this field as obsolete and ignore it.
* *services*: Defines the containers that will be created. Each service represents a container.
* *image*: Specifies the Docker image used for a service.
* *environment*: Defines environment variables for the container.
* *volumes*: Mounts host directories (or named volumes) into the container.
* *command*: Overrides the default command of the container.
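* *depends_on*: Specifies service dependencies, ensuring that one service starts after another. By default it only waits for the other container to start, not for the application inside it to be ready.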
* *platform*: Specifies the target architecture for the container.
* *ports*: Maps ports from the host to the container.

Example with two services:
```
version: '3.9'
services:
    db:
        image: mysql:9.2.0
        environment:
            MYSQL_USER: "my_user"
            MYSQL_PASSWORD: "my_pwd"
            MYSQL_ROOT_PASSWORD: "root_pwd"
        volumes:
            - $PWD:/mapd-workspace
        command: --secure_file_priv="/mapd-workspace"
    jupyter:
        depends_on:
            - db
        image: mapd_notebook
        platform: linux/amd64
        ports:
            - 1234:8888
        volumes:
            - $PWD:/mapd-workspace
```
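
The effect of *depends_on* on start-up order can be illustrated with a small Python sketch that sorts the services so that every service comes after the ones it depends on. This is only a conceptual model of the ordering, not how Docker Compose is implemented; the dictionary simply restates the `depends_on` relations of the two services in the example.

```python
from graphlib import TopologicalSorter

# service -> list of services it depends on (taken from the example file)
depends_on = {
    "db": [],
    "jupyter": ["db"],
}

# static_order() yields each service only after all of its dependencies
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)  # ['db', 'jupyter']
```

Here `db` is started before `jupyter`, which matches the behaviour described for *depends_on*.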

---

## 5. RESOURCE USAGE
Docker will use your computer resources to run containers. You can check the resource usage and (most important) free up some of them with the following commands:
* `docker system df` : check the disk space used by Docker
* `docker stats` : live-monitor the computing resources used by running Docker containers (similar to top)
* `docker container prune` : reclaim resources by removing all stopped containers
* `docker image prune` : reclaim resources by removing all dangling images (untagged leftovers from older builds)
* `docker system prune` : reclaim resources by removing stopped containers, unused networks, dangling images and build cache
* `docker system prune -a` : as above, but also remove all(!) images not used by any container; add `--volumes` to remove unused volumes as well