# Simulate a cluster
By its nature, testing and experimenting with distributed software requires multiple machines. Luckily, we live in the age of virtual machines and containers: we can simulate as many machines as we like using various VM/container technologies. Let's try Docker.

### Docker
In this section, we will use Docker to simulate a cluster. Please note that this is not a Docker tutorial, which is why we will avoid using `docker compose`, `docker swarm` or even a `Dockerfile`. We are pretending that these are full-fledged machines rather than instances of a specific VM/container technology.

First, let's make sure we have Docker installed. If not, please install Docker Desktop: https://www.docker.com/products/docker-desktop/

In [None]:
!docker --version
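
If you prefer to run the same check from Python rather than a shell escape, a minimal sketch could look like the following. The helper name `docker_version` is our own and not part of Docker; it only assumes that the `docker` executable, if installed, is on your `PATH`.

```python
import shutil
import subprocess


def docker_version():
    """Return the text printed by `docker --version`, or None if docker is unavailable."""
    # shutil.which looks the executable up on PATH without running it.
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = docker_version()
print(version if version else "docker not found; install Docker Desktop first")
```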

#### Pull an image

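We will set up a cluster of 5 machines. Let's use a Docker image that ships with Python. The cell below pulls `continuumio/miniconda3`; the official Python image (https://hub.docker.com/_/python) is an alternative.
(The download may take around 5 minutes.)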

In [None]:
%%time
!docker pull continuumio/miniconda3

#### Set up a network

This will allow our "machines" to talk to each other

In [None]:
!docker network create simulated-cluster

#### Start a machine which belongs to our simulated cluster

Note that you can get help for docker flags via `!docker run --help`

Here are the flags we are using below:
- `run` This will run an instance of the image we downloaded earlier. Think of the image download as the CD which contains the operating system and `docker run` as us setting up a physical machine to run the operating system
- `-d` This will run the container in the background (detached), so it doesn't take over Jupyter (similar to `python ./app.py &`)
- `-i` This will keep the container's standard input open, so we can interact with it
- `-t` This will allocate a terminal (pseudo-TTY) for the container
- `--rm` This will remove the container (not the image) once it stops
- `--network` This will attach the container to the network we created above
- `--name` This will give the container a name we can refer to later

In [None]:
!docker run -dit --rm --network simulated-cluster --name node1 continuumio/miniconda3

Once a container is running, you can "ssh" into it via the `exec` command. This will not work from Jupyter; run it from the command line (terminal) instead:

`docker exec -it node1 bash`

Check which containers are running

In [None]:
!docker ps

Run a few more machines

In [None]:
!docker run -dit --rm --network simulated-cluster --name node2 continuumio/miniconda3
!docker run -dit --rm --network simulated-cluster --name node3 continuumio/miniconda3
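
Typing one `docker run` line per machine gets tedious as the cluster grows. As a rough sketch, the commands can be generated in Python instead. The helpers `run_command` and `node_names` are hypothetical names introduced here for illustration; the flags, image, and network name are the ones used in the cells above. The block only prints the commands and does not start anything.

```python
import shlex

IMAGE = "continuumio/miniconda3"
NETWORK = "simulated-cluster"


def run_command(name, image=IMAGE, network=NETWORK):
    """Build the argument list that starts one detached node on the cluster network."""
    return ["docker", "run", "-dit", "--rm", "--network", network, "--name", name, image]


def node_names(count, prefix="node"):
    """Return names node1..nodeN for a cluster of `count` machines."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


for name in node_names(3):
    # shlex.join renders the argument list as a copy-pasteable shell command.
    print(shlex.join(run_command(name)))
```

To actually start the containers, each list could be passed to `subprocess.run(cmd, check=True)`. Starting a name that is already in use will fail, so skip any nodes that are already running.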

In [None]:
!docker ps

#### Install required software in those machines
These machines don't even have the `ping` command installed, as the next cell shows. We will install it so we can confirm that the machines can talk to each other.

In [None]:
!docker exec node1 ping -c 3 cnn.com

Even `ping` is not installed??

In [None]:
!docker exec node1 apt-get update 
!docker exec node1 apt-get install -y iputils-ping iproute2

In [None]:
!docker exec node1 ping -c 3 cnn.com
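
The install above only affects `node1`; the other containers still lack `ping`. A minimal sketch for generating the same install command for every node follows. `install_command` is a hypothetical helper, and the node list assumes the three containers started earlier. Wrapping the two steps in `sh -c` keeps the `&&` inside the container rather than in the notebook's shell.

```python
import shlex

PACKAGES = ["iputils-ping", "iproute2"]


def install_command(node, packages=PACKAGES):
    """Build a `docker exec` call that refreshes the package index and installs packages."""
    script = "apt-get update && apt-get install -y " + " ".join(packages)
    return ["docker", "exec", node, "sh", "-c", script]


for node in ["node1", "node2", "node3"]:
    # Printed only; pass each list to subprocess.run to execute it.
    print(shlex.join(install_command(node)))
```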

### Can these machines see each other?

In [None]:
!docker exec node2 hostname

In [None]:
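# d6040cf9bf45 is the hostname printed by the previous cell on the author's machine; substitute your own output.
# On a user-defined network the container name (node2) also resolves.
!docker exec node1 ping -c 3 d6040cf9bf45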

Nice!
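
To check every pair of machines rather than a single one, the ping commands can be generated for all ordered pairs. This is only a sketch: `ping_commands` is a hypothetical helper, it addresses nodes by container name (which Docker resolves on a user-defined network such as `simulated-cluster`), and it assumes `ping` has been installed on every node that acts as a source.

```python
from itertools import permutations


def ping_commands(nodes, count=3):
    """Build one `docker exec <src> ping -c <count> <dst>` call per ordered pair of nodes."""
    return [
        ["docker", "exec", src, "ping", "-c", str(count), dst]
        for src, dst in permutations(nodes, 2)
    ]


commands = ping_commands(["node1", "node2", "node3"])
for cmd in commands:
    print(" ".join(cmd))
```

Three nodes give six ordered pairs, so a full check costs six short pings.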

### A drive can be shared among containers to avoid downloading and installing packages multiple times

Notice that if we need to install the same package in each container, those containers will pull packages off the web again and again and again!
We can avoid this by mounting a shared volume over apt's download cache (`/var/cache/apt/archives`) in every container.

#### Shared `apt` volume

In [None]:
!docker volume create apt-cache

!docker run -dit --rm --name apt-container-1 --network simulated-cluster -v apt-cache:/var/cache/apt/archives continuumio/miniconda3
!docker run -dit --rm --name apt-container-2 --network simulated-cluster -v apt-cache:/var/cache/apt/archives continuumio/miniconda3
!docker run -dit --rm --name apt-container-3 --network simulated-cluster -v apt-cache:/var/cache/apt/archives continuumio/miniconda3

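Now let's see how long it takes to install a package in the first container, and then in the second, which should be able to reuse the downloads cached by the first.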

In [None]:
%%time
!docker exec apt-container-1 sh -c "apt-get update && apt-get install -y iputils-ping iproute2"

In [None]:
%%time
!docker exec apt-container-2 sh -c "apt-get update && apt-get install -y iputils-ping iproute2"
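
`%%time` only works inside a notebook. For comparing the two installs from a plain Python script, a small timing helper along these lines may be useful. `timed_run` is a hypothetical name, and the demo at the bottom times a harmless Python command so that the block runs even without Docker.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, time.perf_counter() - start


# Harmless demo: time an empty Python program instead of a docker command.
code, seconds = timed_run([sys.executable, "-c", "pass"])
print(f"exit code {code}, took {seconds:.2f}s")
```

To time the installs, pass `["docker", "exec", "apt-container-1", "sh", "-c", "apt-get update && apt-get install -y iputils-ping iproute2"]` and then the same list with `apt-container-2`. If the second run is not noticeably faster, it is worth checking whether the image's apt configuration deletes downloaded archives after each install, which some Docker base images are set up to do.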