## Using the Simulation Runner

To run different simulations on different types of data (fairly) painlessly, it made sense to put together a simple API for running them.

The goal was to keep this as simple as possible while still being nice to use and easy to extend. The general idea is this:

- Parameters used by the different simulations and data generators are defined in JSON data, which is passed to the Runner object
    - For example: `{"num_samples": 20000, "num_labels": 3}`
- Each simulation and data generator declares which parameters it needs in order to run. If any required parameters are missing from the JSON, the runner stops and prints out which parameters are missing.
- If all required parameters are set, the data generation function runs and the data it generates is passed to the simulation run function.
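The flow described above can be sketched in a few lines. This is a minimal illustration and not the actual `Runner` implementation: `run_pipeline`, `MissingParamsError`, and the toy data generation and simulation functions are all hypothetical names made up for this example.

```python
import json


class MissingParamsError(Exception):
    """Hypothetical stand-in for the runner's own exception type."""


def run_pipeline(json_params, data_gen, simulation):
    """Validate the parameters, generate data, then run the simulation on it."""
    params = json.loads(json_params)
    # Both the data generator and the simulation declare what they need.
    required = data_gen["required"] | simulation["required"]
    missing = sorted(required - set(params))
    if missing:
        raise MissingParamsError(
            "missing required parameters: " + ", ".join(missing)
        )
    data = data_gen["func"](params)
    return simulation["func"](params, data)


def make_data(params):
    # Toy data generator: the integers 0 .. num_samples - 1.
    return list(range(params["num_samples"]))


def count_per_label(params, data):
    # Toy "simulation": assign samples to labels round-robin and count them.
    counts = [0] * params["num_labels"]
    for x in data:
        counts[x % params["num_labels"]] += 1
    return counts


DATA_GEN = {"required": {"num_samples"}, "func": make_data}
SIMULATION = {"required": {"num_samples", "num_labels"}, "func": count_per_label}

result = run_pipeline('{"num_samples": 9, "num_labels": 3}', DATA_GEN, SIMULATION)
```

Calling `run_pipeline('{"num_samples": 9}', DATA_GEN, SIMULATION)` instead would raise `MissingParamsError` naming `num_labels`, before any data is generated.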

Creating the runner object immediately runs the simulation.

### Expanding the Runner

The runner should (hopefully) be easy to extend. When we end up needing new parameters, defining them should be straightforward. For example, to add a parameter called `delay`:

- Create a new parameter key in `runner_keys.py` (`P_KEY_DELAY = "delay"`)
- Set this value in the parameter JSON for the simulations that need it
- Make sure any simulation/data generation that depends on it declares it as required (in `run_func_ltable` in `runner.py`)
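As a rough sketch of those three steps, assuming the lookup table maps each simulation type to a run function and a list of required keys (the real `run_func_ltable` may be organised differently). Everything here except `P_KEY_DELAY = "delay"` is hypothetical, including `example_run_func_ltable`, `run_delayed_sim`, and the `"delayed_sim"` type name.

```python
# Step 1: the new key, as it would appear in runner_keys.py.
P_KEY_NUM_ROUNDS = "num_rounds"
P_KEY_DELAY = "delay"


def run_delayed_sim(params, data):
    # Hypothetical simulation that reads the new parameter.
    return {"rounds": params[P_KEY_NUM_ROUNDS], "delay": params[P_KEY_DELAY]}


# Step 3: an assumed shape for the lookup table. Each entry lists the keys
# the simulation needs so the runner can check them up front.
example_run_func_ltable = {
    "delayed_sim": {
        "run": run_delayed_sim,
        "required": [P_KEY_NUM_ROUNDS, P_KEY_DELAY],
    },
}


def missing_params(sim_type, params):
    """Return the required keys for `sim_type` that `params` does not set."""
    required = example_run_func_ltable[sim_type]["required"]
    return [key for key in required if key not in params]


# Step 2: set the value in the parameters for simulations that need it.
params = {P_KEY_NUM_ROUNDS: 10, P_KEY_DELAY: 2}
output = example_run_func_ltable["delayed_sim"]["run"](params, data=None)
```

With this layout, forgetting step 2 is caught by the required-key check rather than by a `KeyError` deep inside the simulation.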

Hopefully this isn't over-engineered. I tried very hard to keep it minimal.

In [1]:
from mozfldp.runner import Runner
import warnings

# hide the warning message temporarily
warnings.simplefilter("ignore")

# auto-reload the modules every time a cell is run
%load_ext autoreload
%autoreload 2

blob_data_file_path = "../datasets/blob_S20000_L3_F4_U100.csv"

### Overly Simple JSON Generator

Since the runner needs to be initialised with JSON parameter data, here's a simple function that converts a dictionary to a JSON string:

In [2]:
import json

def gen_param_json_from_param_dict(param_dict):
    return json.dumps(param_dict)        
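A quick sanity check for the helper is to round-trip a small dictionary through JSON and confirm nothing changes. The literal key strings below are illustrative; in the notebook the keys come from the `Runner.P_KEY_*` constants.

```python
import json


def gen_param_json_from_param_dict(param_dict):
    return json.dumps(param_dict)


# Illustrative parameter values with plain string keys.
example_params = {"num_samples": 20000, "num_labels": 3, "batch_size": 40}

example_json = gen_param_json_from_param_dict(example_params)

# The helper returns a str, and decoding it gives back the same dictionary.
restored = json.loads(example_json)
```

This holds for strings, numbers, booleans, lists, and nested dictionaries. Values such as tuples come back as lists, and objects that `json.dumps` cannot serialise raise a `TypeError`.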

### Example #1: Running Federated Learning with Different Types of Data

Before we can run a simulation, we need to define the parameters that the simulation requires and convert them to JSON:

In [3]:
blob_data_file_path = "../datasets/blob_S20000_L3_F4_U100.csv"
params_dict = {
    Runner.P_KEY_NUM_SAMPLES: 20000,
    Runner.P_KEY_NUM_LABELS: 3,
    Runner.P_KEY_NUM_FEATURES: 4,
    Runner.P_KEY_NUM_USERS: 100,
    Runner.P_KEY_NUM_ROUNDS: 10,
    Runner.P_KEY_BATCH_SIZE: 40,
    Runner.P_KEY_NUM_EPOCHS: 5,
    Runner.P_KEY_DATA_FILE_PATH: blob_data_file_path,
    Runner.P_KEY_RAND_SEED: 42,
}

json_params = gen_param_json_from_param_dict(params_dict)

With the JSON data, we're ready to run the simulation. Now we just need to construct the runner object and specify the simulation type and the data generation type.

It's fine to run multiple simulations with a single runner object.

In [4]:
_ = Runner(json_params, Runner.SIM_TYPE_FED_LEARNING, Runner.DATA_GEN_TYPE_DATA_FROM_FILE)

Generating "file_data" data...
Running the "fed_learning" simulation...
Training...
Params:  {'batch_size': 40, 'client_fraction': 1, 'epoch': 1, 'init_weight': [array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]]), array([0., 0., 0.])], 'num_rounds': 10}
Weights: [array([[-10.61654909,  16.69856324,  -6.7932592 , -51.4672848 ],
       [ 12.86313335, -16.74530439,  -5.19809286,  -5.20404901],
       [-30.84055385,  34.42050187,  24.12360651,  97.03290455]]), array([ 287.08697397,  -18.21114464, -806.25488941])]
Score: 0.999750


Training...
Params:  {'batch_size': 40, 'client_fraction': 1, 'epoch': 5, 'init_weight': [array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]]), array([0., 0., 0.])], 'num_rounds': 10}
Weights: [array([[-1.16572831e+01,  1.93391441e+01, -8.23903719e+00,
        -5.68271554e+01],
       [ 1.03880622e+00, -1.28290159e+00, -3.01597975e-01,
         2.83275151e-02],
       [-2.53759360e+01,  3.50566124e+01,  2.55224324e+

### Example #2: Running Federated Learning with Differential Privacy

In [5]:
params_dict = {
    Runner.P_KEY_NUM_SAMPLES: 20000,
    Runner.P_KEY_NUM_LABELS: 3,
    Runner.P_KEY_NUM_FEATURES: 4,
    Runner.P_KEY_NUM_USERS: 100,
    Runner.P_KEY_BATCH_SIZE: 40,
    Runner.P_KEY_NUM_EPOCHS: 5,
    Runner.P_KEY_NUM_ROUNDS: 10,
    Runner.P_KEY_WEIGHT_MOD: 1,
    Runner.P_KEY_USER_SEL_PROB: 0.1,
    Runner.P_KEY_SENSITIVITY: 0.5,
    Runner.P_KEY_NOISE_SCALE: 1.0,
    Runner.P_KEY_DATA_FILE_PATH: blob_data_file_path
}

json_params = gen_param_json_from_param_dict(params_dict)

In [6]:
_ = Runner(json_params, Runner.SIM_TYPE_FED_AVG_WITH_DP, Runner.DATA_GEN_TYPE_DATA_FROM_FILE)

Generating "file_data" data...
Running the "fed_avg_with_dp" simulation...
Theta ([coef, inter] for round 0: 
 [-0.01246488  0.15734012  0.11257415 -0.14175231  0.08026086 -0.25686652
  0.04949297 -0.00699434 -0.46252793  0.09015778 -0.10756821  0.16983063
  0.06986314 -0.18457772 -0.36771826]
Theta ([coef, inter] for round 1: 
 [-0.09092099  0.16648902  0.13552367 -0.33574714  0.0739814  -0.29338803
 -0.02531278 -0.05534413 -0.8263421   0.0602155  -0.13997352  0.1146149
  0.12353002 -0.23842829 -0.51434352]
Theta ([coef, inter] for round 2: 
 [-0.19134171  0.49578205  0.19167243 -0.65174216  0.14834252 -0.40575947
 -0.00978062 -0.22510194 -1.35279402  0.03888275 -0.2126846   0.31756821
  0.1612367  -0.26926082 -0.87635599]
Theta ([coef, inter] for round 3: 
 [-2.58801369e-01  5.68927718e-01  3.13059693e-01 -7.43617231e-01
  1.41012351e-03 -3.78718588e-01 -4.18699892e-02 -2.81493307e-01
 -1.55599646e+00  1.38419129e-01 -2.01856268e-01  3.94656370e-01
  1.64634868e-01 -2.41655534e-01 -9

### Missing Parameters for a Run
If the simulation/data generation is missing any required parameters, you will get an exception like this:

In [7]:
params_dict = {
    Runner.P_KEY_NUM_SAMPLES: 20000,
    Runner.P_KEY_NUM_LABELS: 3,
    Runner.P_KEY_NUM_FEATURES: 4,
    Runner.P_KEY_NUM_USERS: 100,
    Runner.P_KEY_RAND_SEED: 42,
    Runner.P_KEY_NUM_ROUNDS: 1,
    Runner.P_KEY_BAT
}

json_params = gen_param_json_from_param_dict(params_dict)
_ = Runner(json_params, Runner.SIM_TYPE_FED_LEARNING, Runner.DATA_GEN_TYPE_DATA_FROM_FILE)

RunnerException: Can not run fed_learning because the following required parameters are missing: 
- rand_seed
- num_epochs
- batch_size
- num_rounds