# Tutorial: distributed computations on Kubernetes

This short tutorial explains how to run the hyperparameter optimisation (HPO) over multiple servers. It requires a Kubernetes cluster: refer to the `kubernetes/readme.md` file, which explains how to set up the cluster with a few lines of code, as all the configuration files are ready.

In [None]:
# auto-reload in notebooks
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# constants: do not change these if you run on the K8 config we provided!
USER = "root"
MYSQL_IP = "mysql-service"
PSW = "password"
REDIS_IP = "redis-service"

## General principles of distributing computations

In this notebook we show how to distribute the HyperParametersOptimisation computations across different pods (i.e. computing servers).

There are two databases involved: Redis is used to queue the computations, while MySQL is used to keep track of the state of the HPO.
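
As a rough illustration of how the two databases are addressed, the sketch below builds one connection URL for each of them. The helper names `mysql_storage_url` and `redis_url`, the `pymysql` driver and the database name `example` are assumptions made for this illustration; they are not taken from the `utils` module used later in this notebook.

```python
from urllib.parse import quote_plus


def mysql_storage_url(user, password, host, database, port=3306):
    """Build an SQLAlchemy-style MySQL URL (hypothetical helper, assumes the pymysql driver)."""
    # quote_plus escapes characters such as '@' or '/' that would otherwise break the URL
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def redis_url(host, port=6379, db=0):
    """Build a Redis URL of the redis://host:port/db form (hypothetical helper)."""
    return f"redis://{host}:{port}/{db}"


# the values mirror the constants defined at the top of the notebook
print(mysql_storage_url("root", "password", "mysql-service", "example"))
print(redis_url("redis-service"))
```

URLs of the first kind are what optuna expects for a relational storage, while the second form is the one accepted by `Redis.from_url`.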

In a local setting, in which neither `K8` nor `minikube` is running, you would start MySQL with the following command:

```
docker run --name=user_mysql_1 --env="MYSQL_ROOT_PASSWORD=password" -p 3306:3306 -d mysql:latest
```

Similarly, to make sure that Redis is also running:

```
redis-server
```

Of course, both Redis and MySQL have to be installed (for MySQL we use a Docker image, while we installed Redis via `brew install redis`).

Each Redis worker must also be able to connect to MySQL.


**Addendum**:
 - to stop a locally installed MySQL server:

```
/usr/local/bin/mysql.server stop
```
 - to make sure that the MySQL login is `root:password`, run the following SQL statement:

```
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
```


In [None]:
# unit test
from redis import Redis
from rq import Queue
from rq import Retry

# needed! make sure the file parallel_hpo.py exists in the cwd
from utils import connect_to_mysql, test_fnc, run_hpo_in_parallel
from parallel_hpo import hpo_parallel

# connect to mysql
connect_to_mysql(USER, PSW, MYSQL_IP)
print("MySQL connected")

# connect to redis
redis = Redis(host=REDIS_IP, port=6379, db=0)
print("Redis connected")


## Enqueuing principles

To distribute computations we use queues. All the computations are stored in a queue (in Redis!) and, whenever a worker is available, it starts crunching the next job (in FIFO order).
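
The queue-and-worker idea can be sketched with the Python standard library alone. In this simplified illustration threads inside one process stand in for the workers and `queue.Queue` stands in for Redis; it is not how `rq` is implemented, and the names `square` and `worker` are made up for the example.

```python
import queue
import threading


def square(x):
    """Stand-in for an expensive computation."""
    return x * x


def worker(jobs, results):
    """Take jobs from the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, func, arg = item
        results[job_id] = func(arg)


jobs = queue.Queue()  # FIFO: jobs are handed out in the order they were enqueued
results = {}

# enqueue the computations first; nothing runs yet
for job_id, arg in enumerate([1, 2, 3, 4, 5]):
    jobs.put((job_id, square, arg))

n_workers = 2
for _ in range(n_workers):
    jobs.put(None)  # one stop signal per worker, placed after all the jobs

# start the workers: each one picks up the next job as soon as it is free
threads = [
    threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```

Which worker processes which job is not fixed in advance: whichever worker becomes free first takes the job at the head of the queue.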

In [None]:
# prepare the job queue
q = Queue(connection=redis)

# unit test the connection
job = q.enqueue(test_fnc, "hello!", retry=Retry(max=3))
job

So far you have enqueued the jobs: now you have to start the workers so that the jobs can be crunched!

In our `minikube` or `K8` setup the workers are set up automatically. If you are running locally from scratch, you need to fire up some workers yourself (each from a different terminal, or by moving the process to the background with `&`):

```
rq worker --url <redis-url> high default low
```

`<redis-url>` would most probably be `redis://127.0.0.1:6379` (or `redis://localhost:6379`).

To monitor the workers and the jobs, you can run the dashboard with:

```
rq-dashboard
```


In [None]:
# you need to wait a bit before being able to see the result
job.result
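
Because the job runs asynchronously, `job.result` stays `None` until a worker has finished it. Rather than re-running the cell by hand, one option is a small polling helper like the sketch below. The name `wait_for_result` and its default timings are hypothetical; the helper only relies on the `is_finished`, `is_failed` and `result` attributes of the job object.

```python
import time


def wait_for_result(job, timeout=30.0, poll_interval=0.5):
    """Poll a job until it finishes, fails, or the timeout expires (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.is_finished:
            return job.result
        if job.is_failed:
            raise RuntimeError("the job failed; check the worker logs")
        # give the worker some time before asking again
        time.sleep(poll_interval)
    raise TimeoutError(f"job not finished after {timeout} seconds")
```

In this notebook you would call `wait_for_result(job)` instead of reading `job.result` directly.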

## Enqueuing the jobs for HPO

In the next section we enqueue the HPO and make sure that the workers are actively crunching the jobs! If more than one worker is active, the jobs get distributed among them!

But how does `optuna` know how to distribute the computations? This is what the MySQL database is for.

You can set up multiple workers to run the HPO in parallel: every time a trial finishes, optuna stores its data in the MySQL database. Every time a new trial starts, the database is read and, depending on the HPO technique, the new set of hyperparameters is chosen and recorded.

In [None]:
# enqueue hpo jobs
study_name = "distributed-example-2"
run_hpo_in_parallel(q, hpo_parallel, [USER, PSW, MYSQL_IP, study_name], 4)
print("jobs enqueued")
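
The last argument of `run_hpo_in_parallel` is presumably the number of jobs to enqueue. The sketch below shows one plausible shape for such a helper; it is a hypothetical stand-in, and the real implementation in `utils` may differ. `RecordingQueue` is a minimal fake that only mimics the `enqueue(func, *args)` call shape of an `rq` queue, so the example runs without Redis.

```python
class RecordingQueue:
    """Minimal stand-in that records what would be enqueued (not an rq class)."""

    def __init__(self):
        self.jobs = []

    def enqueue(self, func, *args):
        self.jobs.append((func, args))
        return len(self.jobs) - 1  # position in the queue, used here as a job id


def enqueue_n_times(q, func, args, n_jobs):
    """Enqueue the same function call n_jobs times (hypothetical helper)."""
    return [q.enqueue(func, *args) for _ in range(n_jobs)]


q_demo = RecordingQueue()
enqueue_n_times(
    q_demo, print, ["root", "password", "mysql-service", "distributed-example-2"], 4
)
print(len(q_demo.jobs))
```

Each enqueued job runs the same function with the same study name, so every worker that picks one up joins the same study through the shared database.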