### Cluster Init

Each cluster worker has 4 CPU cores, so by default Dask will try to run 4 tasks in parallel on a single worker. This is a problem for two reasons. First, Dask does not know that a single task (a TensorFlow simulation) will likely use all 4 cores anyway. More importantly, it does not take into account the very limited RAM (~3GB) of each worker. Unless we restrict this, the workers will run out of memory.

We can use Dask's `resources` feature to define custom resources for our workers. We define a `PROCESS` resource and give each worker one unit of it. When we later `.submit` tasks, we tell Dask that a single task consumes all of the worker's `PROCESS` resource, i.e., `{"PROCESS": 1}`, so Dask will not assign another task to that worker while it runs. See the [Resources](https://distributed.dask.org/en/stable/resources.html) docs and the related Stack Overflow question [one task per worker](https://stackoverflow.com/questions/45052535/dask-distributed-how-to-run-one-task-per-worker-making-that-task-running-on-a).

Note: Dask does not attach any meaning to the `PROCESS` resource; it is purely an abstract label. Dask only knows that each worker has one unit of an arbitrary resource named `PROCESS` (it could equally stand for a GPU, CPU, RAM, or anything else we choose).
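The bookkeeping behind this idea can be sketched with a toy model. The snippet below is not Dask's scheduler; `ToyWorker` and `assign` are hypothetical names used only to illustrate how a per-worker budget of one `PROCESS` unit limits each worker to one task at a time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    name: str
    available: dict                      # remaining units per abstract resource
    running: list = field(default_factory=list)

    def can_run(self, required):
        # A task fits only if every required resource is still available
        return all(self.available.get(k, 0) >= v for k, v in required.items())

    def start(self, task, required):
        # Reserve the resources for the task (this toy model never releases them)
        for k, v in required.items():
            self.available[k] -= v
        self.running.append(task)


def assign(tasks, workers, required):
    """Place each task on the first worker with enough resources; return the rest."""
    waiting = []
    for task in tasks:
        worker = next((w for w in workers if w.can_run(required)), None)
        if worker is None:
            waiting.append(task)
        else:
            worker.start(task, required)
    return waiting


workers = [ToyWorker("w0", {"PROCESS": 1}), ToyWorker("w1", {"PROCESS": 1})]
waiting = assign(["sim-a", "sim-b", "sim-c"], workers, {"PROCESS": 1})
```

With two workers holding one `PROCESS` unit each, the first two tasks start on separate workers and the third has to wait, regardless of how many threads a worker has.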

In [2]:
from distributed import LocalCluster
import dask

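# Give every worker one unit of the abstract `PROCESS` resource
with dask.config.set({"distributed.worker.resources.PROCESS": 1}):
    cluster = LocalCluster(
        n_workers=4,
        threads_per_worker=2,
        memory_limit='4GB'  # per-worker limit (reported below as 3.73 GiB)
    )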

In [3]:
cluster

Cluster

Dashboard: http://127.0.0.1:8787/status
Workers: 4, Total threads: 8, Total memory: 14.90 GiB
Status: running, Using processes: True

Scheduler

Comm: tcp://127.0.0.1:52542
Dashboard: http://127.0.0.1:8787/status
Workers: 4, Total threads: 8, Total memory: 14.90 GiB
Started: Just now

Workers

| Comm | Dashboard | Nanny | Total threads | Memory | Local directory |
|---|---|---|---|---|---|
| tcp://127.0.0.1:52567 | http://127.0.0.1:52568/status | tcp://127.0.0.1:52545 | 2 | 3.73 GiB | `C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-j7e9563u` |
| tcp://127.0.0.1:52570 | http://127.0.0.1:52571/status | tcp://127.0.0.1:52546 | 2 | 3.73 GiB | `C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-jiry6owy` |
| tcp://127.0.0.1:52564 | http://127.0.0.1:52565/status | tcp://127.0.0.1:52547 | 2 | 3.73 GiB | `C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-wml6slkf` |
| tcp://127.0.0.1:52561 | http://127.0.0.1:52562/status | tcp://127.0.0.1:52548 | 2 | 3.73 GiB | `C:\Users\miket\AppData\Local\Temp\dask-worker-space\worker-3m001ri4` |

### Client Init

In [4]:
from dask.distributed import Client

client = Client(cluster)

client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


### Load Data Lazily

In [5]:
import tensorflow as tf
from dask import delayed

@delayed
def load_data():
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    X_train, X_test = X_train / 255.0, X_test / 255.0

    return X_train, y_train, X_test, y_test

In [6]:
data_delayed = load_data()

### Upload the simulation module

This cell must run **only after** all the workers have been initialized by the `cluster`. Workers created later will not receive the file.

For future needs: if necessary, we can register a callback so that new workers get this file uploaded to them automatically.
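One possible way to do this, sketched under assumptions: `distributed` ships a built-in `UploadFile` worker plugin, and worker plugins registered through the client are also applied to workers that join later. The helper name `register_on_all_workers` is hypothetical, and whether the client exposes `register_plugin` or only the older `register_worker_plugin` depends on the installed `distributed` version, so check its documentation before relying on this.

```python
def register_on_all_workers(client, plugin):
    """Register a worker plugin through whichever client method is available."""
    # Assumption: newer distributed releases expose `register_plugin`, older
    # ones only `register_worker_plugin`. Use the first one that exists.
    register = getattr(client, "register_plugin", None)
    if register is None:
        register = client.register_worker_plugin
    return register(plugin)


# Intended usage against a live cluster (left commented out so the sketch
# does not need a running scheduler):
# from distributed.diagnostics.plugin import UploadFile
# register_on_all_workers(client, UploadFile("TF_Simulation_FDA_CNN.py"))
```

Compared with a one-off `client.upload_file(...)`, a registered plugin is remembered by the scheduler, which is what lets late-joining workers receive the module.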

In [7]:
client.upload_file('TF_Simulation_FDA_CNN.py')

{'tcp://127.0.0.1:52561': {'status': 'OK'},
 'tcp://127.0.0.1:52564': {'status': 'OK'},
 'tcp://127.0.0.1:52567': {'status': 'OK'},
 'tcp://127.0.0.1:52570': {'status': 'OK'}}

### Training Simulation

We import `TF_Simulation_FDA_CNN.py`, which corresponds to `10_TF_Simulation_FDA_CNN.ipynb` in the `progress_notebooks` directory.

In [8]:
def worker_single_fda_simulation(data_delayed, fda_name, num_clients, batch_size, num_steps_until_rtc_check, 
                                 theta, num_epochs, sketch_width=-1, sketch_depth=-1, bench_test=False):
    
    import TF_Simulation_FDA_CNN as sim
    import gc
    
    X_train, y_train, X_test, y_test = data_delayed.compute()
    
    train_dataset, test_dataset = sim.convert_to_tf_dataset(X_train, y_train, X_test, y_test)
    
    del X_train, y_train, X_test, y_test
    
    epoch_metrics, round_metrics = sim.single_simulation(
        fda_name, num_clients, train_dataset, test_dataset, batch_size, num_steps_until_rtc_check,
        theta, num_epochs, sketch_width=sketch_width, sketch_depth=sketch_depth, bench_test=bench_test
    )
    
    del train_dataset, test_dataset
    
    gc.collect()  # force garbage collection
    sim.tf.keras.backend.clear_session()  # Clear TensorFlow session
    
    return epoch_metrics, round_metrics

### Dask Tasks

In [9]:
num_clients_list = [5, 20, 4, 11, 15, 7]
batch_size_list = [32]
num_steps_until_rtc_check_list = [1]
theta_list = [1.]
num_epochs = 1

sketch_width = 500
sketch_depth = 7

`TF_Simulation_FDA_CNN.py` provides general methods for testing all combinations of the lists above. In the cluster environment we want to break the tests up into smaller tasks. Given the limited RAM of each worker, we split them down to the finest granularity: a single simulation (*naive*, *linear* or *sketch*) with fixed parameters. This costs time: TensorFlow graphs are recomputed (specifically in `AmsSketch`), a new `AmsSketch` instance is created for each *sketch* test, the `delayed` dataset is recomputed for each test, and so on. It is nevertheless the safest choice given our RAM constraints.
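To make the granularity concrete, the sketch below enumerates the task grid implied by the lists above using `itertools.product`. The names `fda_names` and `task_grid` are illustrative and not part of the simulation module.

```python
from itertools import product

num_clients_list = [5, 20, 4, 11, 15, 7]
batch_size_list = [32]
num_steps_until_rtc_check_list = [1]
theta_list = [1.]
fda_names = ["naive", "linear", "sketch"]

# One tuple per independent simulation: fixed parameters plus one FDA method
task_grid = list(product(
    num_clients_list,
    batch_size_list,
    num_steps_until_rtc_check_list,
    theta_list,
    fda_names,
))

print(len(task_grid))  # 6 * 1 * 1 * 1 * 3 = 18 independent tasks
```

Each tuple maps to exactly one submitted task, so a worker only ever holds the state of one simulation at a time.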

In [12]:
futures = []

for num_clients in num_clients_list:
    for batch_size in batch_size_list:
        for num_steps_until_rtc_check in num_steps_until_rtc_check_list:
            for theta in theta_list:
                
                for fda_name in ["naive", "linear", "sketch"]:
                
                    future = client.submit(
                        worker_single_fda_simulation,
                        data_delayed=data_delayed, 
                        fda_name=fda_name,
                        num_clients=num_clients, 
                        batch_size=batch_size, 
                        num_steps_until_rtc_check=num_steps_until_rtc_check,
                        theta=theta, 
                        num_epochs=num_epochs,
                        sketch_width=sketch_width if fda_name == "sketch" else -1,
                        sketch_depth=sketch_depth if fda_name == "sketch" else -1,
                        bench_test=True,
                        resources={'PROCESS': 1}  # Tell Dask that the resource `PROCESS` is consumed in one task!
                    ) 

                    futures.append(future)

In [13]:
from dask.distributed import progress

progress(*futures)


### Gather and Save results

Because the workers have little RAM, we should not let them cache results in memory until all tasks have completed: if a worker dies, its results are lost and must be recomputed, and the cached results add memory overhead on the workers. Instead, we use `as_completed` to retrieve each result as soon as its task finishes.

Going a step further, we also save each test's metrics to temporary `.parquet` files, which we combine later.
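The streaming pattern can be illustrated without a cluster. The sketch below uses the standard library's `concurrent.futures` as a stand-in for Dask (the two `as_completed` functions are different APIs), and `fake_simulation` is a hypothetical placeholder for a real simulation. Each result is handled the moment its task finishes instead of being collected at the end.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_simulation(num_clients):
    # Hypothetical placeholder: a real task would return training metrics
    return {"num_clients": num_clients}


handled = []
with ThreadPoolExecutor(max_workers=2) as pool:
    toy_futures = [pool.submit(fake_simulation, n) for n in (5, 20, 4)]
    for finished in as_completed(toy_futures):
        # Persist or process each result as soon as it is available
        handled.append(finished.result())
```

Results arrive in completion order rather than submission order, which is why each saved file should carry its own parameters instead of relying on position.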

In [14]:
import os

def save_result_to_parquet(df, directory, file_prefix):
    # Create the output directory on first use
    os.makedirs(directory, exist_ok=True)

    # Number each file by how many files the directory already holds
    file_name = f"{file_prefix}_{len(os.listdir(directory))}.parquet"
    file_path = os.path.join(directory, file_name)

    df.to_parquet(file_path)

In [15]:
from dask.distributed import as_completed
import pandas as pd
import threading

tmp_epoch_metrics_dir = 'results/tmp_epoch_metrics'
tmp_round_metrics_dir = 'results/tmp_round_metrics'

def gather_and_save_results():
    
    total_futures = len(futures)
    num_completed = 0
    
    for future, result in as_completed(futures, with_results=True):
        
        epoch_metrics, round_metrics = result
        
        epoch_metrics_df = pd.DataFrame(epoch_metrics)
        save_result_to_parquet(epoch_metrics_df, tmp_epoch_metrics_dir, 'epoch_metrics')
        
        round_metrics_df = pd.DataFrame(round_metrics)
        save_result_to_parquet(round_metrics_df, tmp_round_metrics_dir, 'round_metrics')
        
        num_completed += 1
        print(f"\rProgress on Gathered-Saved Results: {num_completed} / {total_futures}", end="", flush=True)  # Print progress

        
# Run the function in a separate thread
t = threading.Thread(target=gather_and_save_results)
t.start()

Progress on Gathered-Saved Results: 18 / 18

### Combine & Save Results

In [16]:
# Wait for the background gathering thread to finish before reading the files
t.join()

# Read multiple Parquet files and combine them
all_epoch_metrics_df = pd.read_parquet(tmp_epoch_metrics_dir)
all_round_metrics_df = pd.read_parquet(tmp_round_metrics_dir)

all_epoch_metrics_df.to_parquet('results/epoch_metrics.parquet')
all_round_metrics_df.to_parquet('results/round_metrics.parquet')

In [42]:
# from itertools import chain

# results = client.gather(futures)

# all_tests_epoch_metrics, all_tests_round_metrics = zip(*results)

# all_epoch_metrics = chain.from_iterable(all_tests_epoch_metrics)  # flatten
# all_round_metrics = chain.from_iterable(all_tests_round_metrics)  # flatten

# epoch_metrics_df = pd.DataFrame(all_epoch_metrics)
# round_metrics_df = pd.DataFrame(all_round_metrics)

### Terminate `Client` and `Cluster`

In [15]:
client.close()
cluster.close()