### Cluster Init

Since each cluster worker has a 4-core CPU, Dask will try to run 4 tasks in parallel on a single worker. This is a problem for two reasons. First, Dask does not know that a single Task (a TensorFlow simulation) will likely utilize all 4 cores anyway. More importantly, it does not take into account the very limited RAM (~3 GB) of each worker. Hence, the workers will run out of memory unless we limit how many Tasks run concurrently on each worker.

We can use the `resources` functionality to define custom resources for our workers. We define a `PROCESS` resource and give each worker one unit of it. When we later `.submit` tasks, we tell Dask that a single task uses all of a worker's `PROCESS` resource, i.e., `{"PROCESS": 1}`, so that Dask will not assign another Task to that worker while the first one is running. See the docs on [Resources](https://distributed.dask.org/en/stable/resources.html) and the relevant *stackoverflow* question [one task per worker](https://stackoverflow.com/questions/45052535/dask-distributed-how-to-run-one-task-per-worker-making-that-task-running-on-a).

Note: Dask does not understand what the `PROCESS` resource means; it is purely conceptual. Dask only knows that each worker has one unit of an arbitrary resource named `PROCESS` (it could stand for a GPU, CPU, RAM, or whatever we decide it represents).
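To make the bookkeeping concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not Dask's scheduler code: the `ToyWorker` class and its methods are hypothetical names invented for illustration. It only shows why requiring `{"PROCESS": 1}` per task caps a worker that holds one unit of `PROCESS` at a single running task, even when it has free threads.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a worker; not part of Dask."""
    name: str
    threads: int
    resources: dict                               # total units, e.g. {"PROCESS": 1}
    running: list = field(default_factory=list)   # (task_name, required_resources) pairs

    def available(self, resource):
        # Units of `resource` not claimed by currently running tasks.
        used = sum(req.get(resource, 0) for _, req in self.running)
        return self.resources.get(resource, 0) - used

    def can_run(self, required):
        # A task needs a free thread AND enough of every resource it requires.
        has_thread = len(self.running) < self.threads
        has_resources = all(self.available(r) >= n for r, n in required.items())
        return has_thread and has_resources

    def start(self, task, required):
        if not self.can_run(required):
            return False
        self.running.append((task, required))
        return True


# A 4-thread worker holding one unit of PROCESS: only one simulation starts.
worker = ToyWorker(name="w0", threads=4, resources={"PROCESS": 1})
started = [worker.start(f"sim-{i}", {"PROCESS": 1}) for i in range(4)]
print(started)  # [True, False, False, False]

# Without a resource requirement, all 4 threads are filled.
unrestricted = ToyWorker(name="w1", threads=4, resources={"PROCESS": 1})
started_free = [unrestricted.start(f"sim-{i}", {}) for i in range(4)]
print(started_free)  # [True, True, True, True]
```

In this toy model the remaining three simulations simply wait until the first one releases its unit of `PROCESS`, which is the behaviour we want from the real cluster.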

In [1]:
from distributed import LocalCluster
import dask

with dask.config.set({"distributed.worker.resources.PROCESS": 1}):
    cluster = LocalCluster(
        n_workers=4,
        threads_per_worker=2,
        memory_limit='4GB'
    )

In [2]:
cluster

Dashboard: http://127.0.0.1:8787/status
Workers: 4
Total threads: 8
Total memory: 14.90 GiB
Status: running
Using processes: True

Comm: tcp://127.0.0.1:54270
Workers: 4
Dashboard: http://127.0.0.1:8787/status
Total threads: 8
Started: Just now
Total memory: 14.90 GiB

Comm: tcp://127.0.0.1:54298
Total threads: 2
Dashboard: http://127.0.0.1:54301/status
Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:54273
Local directory: C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-kol3wr4_

Comm: tcp://127.0.0.1:54294
Total threads: 2
Dashboard: http://127.0.0.1:54295/status
Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:54274
Local directory: C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-8ngipdpg

Comm: tcp://127.0.0.1:54297
Total threads: 2
Dashboard: http://127.0.0.1:54299/status
Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:54275
Local directory: C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-thp3csvy

Comm: tcp://127.0.0.1:54291
Total threads: 2
Dashboard: http://127.0.0.1:54292/status
Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:54276
Local directory: C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-wp1fou4c


### Client Init

In [3]:
from dask.distributed import Client

client = Client(cluster)

client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


### Load Data Lazily

In [4]:
import tensorflow as tf
from dask import delayed

@delayed
def load_data():
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    X_train, X_test = X_train / 255.0, X_test / 255.0

    return X_train, y_train, X_test, y_test

In [5]:
data_delayed = load_data()

### Upload the simulation module

This code must run **only** after all the workers have been initialized by the `cluster`. Workers created later will not have the file.

For future needs: if necessary, we can register a callback so that new workers get this file uploaded to them automatically.

In [6]:
client.upload_file('TF_Simulation_FDA_CNN.py')

{'tcp://127.0.0.1:54291': {'status': 'OK'},
 'tcp://127.0.0.1:54294': {'status': 'OK'},
 'tcp://127.0.0.1:54297': {'status': 'OK'},
 'tcp://127.0.0.1:54298': {'status': 'OK'}}

### Training Simulation

We import `TF_Simulation_FDA_CNN.py`, which corresponds to `10_TF_Simulation_FDA_CNN.ipynb` from the `progress_notebooks` directory.

In [7]:
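def worker_simulation(data_delayed, num_clients_list, batch_size_list, num_steps_until_rtc_check_list,
                      theta_list, num_epochs, sketch_width, sketch_depth, bench_test=False):
    """Run the simulations for the given parameter lists on a worker and return their metrics."""
    # Imported inside the function so the imports are resolved on the worker,
    # where `TF_Simulation_FDA_CNN.py` has been uploaded.
    import TF_Simulation_FDA_CNN as sim
    import gc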

    X_train, y_train, X_test, y_test = data_delayed.compute()
    
    train_dataset, test_dataset = sim.convert_to_tf_dataset(X_train, y_train, X_test, y_test)
    
    del X_train, y_train, X_test, y_test
    
    all_epoch_metrics, all_round_metrics = sim.run_simulations(
        train_dataset=train_dataset,
        test_dataset=test_dataset,
        num_clients_list=num_clients_list,
        batch_size_list=batch_size_list,
        num_steps_until_rtc_check_list=num_steps_until_rtc_check_list,
        theta_list=theta_list,
        num_epochs=num_epochs,
        sketch_width=sketch_width,
        sketch_depth=sketch_depth,
        bench_test=bench_test
    )
    
    del train_dataset, test_dataset
    
    gc.collect()  # force garbage collection
    sim.tf.keras.backend.clear_session()  # Clear TensorFlow session
    
    return all_epoch_metrics, all_round_metrics

### Dask Tasks

In [8]:
num_clients_list = [5, 6, 7, 8]
batch_size_list = [32]
num_steps_until_rtc_check_list = [1]
theta_list = [0.05]
num_epochs = 1

sketch_width = 500
sketch_depth = 7

The `run_simulations` function in `TF_Simulation_FDA_CNN.py` takes parameter lists as input and performs a simulation for every parameter combination. Given a single value in each list, it performs exactly one test. Instead of passing large lists to `run_simulations`, we pass single-item parameter lists because we want Dask to create many small Tasks. This way, a failed Task is not a big deal: only that one test has to be rerun (Dask can retry it, for example via the `retries` argument of `client.submit`). In the other approach, where each worker receives a single big Task (i.e., many simulation tests from full parameter lists), a Task might take days, and if it fails for whatever reason, all of its work is lost.
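The splitting described here can be sketched without Dask. The snippet below is a minimal illustration that reuses the parameter lists from the cell above; `single_test_kwargs` is a hypothetical helper name, not something defined in `TF_Simulation_FDA_CNN.py`. It uses `itertools.product` to produce one set of single-item lists per parameter combination.

```python
from itertools import product

# Same parameter lists as in the cell above.
num_clients_list = [5, 6, 7, 8]
batch_size_list = [32]
num_steps_until_rtc_check_list = [1]
theta_list = [0.05]


def single_test_kwargs(num_clients_list, batch_size_list, num_steps_list, theta_list):
    """Yield one kwargs dict per parameter combination (hypothetical helper)."""
    for num_clients, batch_size, num_steps, theta in product(
        num_clients_list, batch_size_list, num_steps_list, theta_list
    ):
        # Wrap each value in a one-item list, since `run_simulations` expects lists.
        yield {
            "num_clients_list": [num_clients],
            "batch_size_list": [batch_size],
            "num_steps_until_rtc_check_list": [num_steps],
            "theta_list": [theta],
        }


task_kwargs = list(
    single_test_kwargs(
        num_clients_list, batch_size_list, num_steps_until_rtc_check_list, theta_list
    )
)

print(len(task_kwargs))  # 4
print(task_kwargs[0])
```

Each dict corresponds to one Task, so the number of Tasks equals the product of the list lengths (here 4 x 1 x 1 x 1 = 4). Each dict could then be unpacked into a `client.submit(worker_simulation, ...)` call together with the shared arguments.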

In [9]:
futures = []

In [10]:
for num_clients in num_clients_list:
    for batch_size in batch_size_list:
        for num_steps_until_rtc_check in num_steps_until_rtc_check_list:
            for theta in theta_list:
                
                future = client.submit(
                    worker_simulation,
                    data_delayed=data_delayed, 
                    num_clients_list=[num_clients], 
                    batch_size_list=[batch_size], 
                    num_steps_until_rtc_check_list=[num_steps_until_rtc_check],
                    theta_list=[theta], 
                    num_epochs=num_epochs,
                    sketch_width=sketch_width,
                    sketch_depth=sketch_depth,
                    resources={'PROCESS': 1}  # Tell Dask that the resource `PROCESS` is consumed in one task!
                ) 
                
                futures.append(future)

In [11]:
from dask.distributed import progress

progress(*futures)

In [None]:
results = client.gather(futures)



### Process Results

In [27]:
from itertools import chain

all_tests_epoch_metrics, all_tests_round_metrics = zip(*results)

all_epoch_metrics = chain.from_iterable(all_tests_epoch_metrics)  # flatten, careful, iterator
all_round_metrics = chain.from_iterable(all_tests_round_metrics)  # flatten, careful, iterator
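The "careful, iterator" comments refer to the fact that `chain.from_iterable` returns a one-shot iterator: once it has been consumed, a second pass yields nothing. The sketch below demonstrates this with made-up data; the shape of `per_test_metrics` (a list of metric dicts per test) is an assumption for illustration, not the actual output of `run_simulations`.

```python
from itertools import chain

# Hypothetical per-test results: each test returns a list of metric dicts.
per_test_metrics = [
    [{"test": 0, "epoch": 1}],
    [{"test": 1, "epoch": 1}, {"test": 1, "epoch": 2}],
]

flat_iter = chain.from_iterable(per_test_metrics)
first_pass = list(flat_iter)    # consumes the iterator
second_pass = list(flat_iter)   # nothing left to yield

print(len(first_pass))   # 3
print(len(second_pass))  # 0

# Materializing once into a list makes the flattened data reusable.
flat_list = list(chain.from_iterable(per_test_metrics))
print(len(flat_list))    # 3
```

If the flattened metrics are needed more than once (for example, to inspect them and then build a DataFrame), wrapping the iterator in `list(...)` avoids silently getting an empty result the second time.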

In [28]:
import pandas as pd

epoch_metrics_df = pd.DataFrame(all_epoch_metrics)
round_metrics_df = pd.DataFrame(all_round_metrics)

### Save Results

We save as `.parquet` files.

In [29]:
epoch_metrics_df

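| | dataset_name | fda_name | num_clients | batch_size | num_steps_until_rtc_check | theta | nn_num_weights | sketch_width | sketch_depth | epoch | total_rounds | total_fda_steps | accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EMNIST | naive | 7 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1028 |
| 1 | EMNIST | linear | 7 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1028 |
| 2 | EMNIST | sketch | 7 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 1 | 1 | 5 | 0.1028 |
| 3 | EMNIST | naive | 6 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1135 |
| 4 | EMNIST | linear | 6 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1028 |
| 5 | EMNIST | sketch | 6 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 1 | 1 | 5 | 0.0958 |
| 6 | EMNIST | naive | 8 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1028 |
| 7 | EMNIST | linear | 8 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 5 | 0.1028 |
| 8 | EMNIST | sketch | 8 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 1 | 1 | 5 | 0.0982 |
| 9 | EMNIST | naive | 2 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 3 | 5 | 0.1028 |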


In [30]:
round_metrics_df

| | dataset_name | fda_name | num_clients | batch_size | num_steps_until_rtc_check | theta | nn_num_weights | sketch_width | sketch_depth | epoch | round | total_fda_steps | est_var | actual_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EMNIST | naive | 7 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.525528 | 0.444461 |
| 1 | EMNIST | linear | 7 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.444269 | 0.379996 |
| 2 | EMNIST | sketch | 7 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 2 | 1 | 5 | 0.378839 | 0.373495 |
| 3 | EMNIST | naive | 6 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.622352 | 0.517178 |
| 4 | EMNIST | linear | 6 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.569023 | 0.471629 |
| 5 | EMNIST | sketch | 6 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 2 | 1 | 5 | 0.463931 | 0.454362 |
| 6 | EMNIST | naive | 8 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.368683 | 0.320951 |
| 7 | EMNIST | linear | 8 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 2 | 1 | 5 | 0.358903 | 0.313541 |
| 8 | EMNIST | sketch | 8 | 32 | 1 | 1.0 | 2592202 | 500 | 7 | 2 | 1 | 5 | 0.342747 | 0.340101 |
| 9 | EMNIST | naive | 2 | 32 | 1 | 1.0 | 2592202 | -1 | -1 | 1 | 1 | 2 | 1.693361 | 0.842146 |


### Terminate `Client` and `Cluster`

In [18]:
client.close()
cluster.close()

### TODO

1. Fix loop
2. Move the Dask code and the `.py` file to the simulation dir

Think about the approach: try to run many tests from one Task instead of many Tasks. 4 GB of RAM handled 20 clients just fine.