# Quickstart TensorFlow Tutorial
In this tutorial, you will see how to use flox to run federated learning (FL) experiments with TensorFlow, first with a local executor and then with real physical endpoints. We will train a model to classify images from the Fashion MNIST dataset.

In [1]:
import logging

import tensorflow as tf
from tensorflow import keras

from flox.clients.TensorflowClient import TensorflowClient
from flox.controllers.TensorflowController import TensorflowController
from flox.model_trainers.TensorflowTrainer import TensorflowTrainer

logger = logging.getLogger(__name__)

### Getting Test Data
First, let's load some TensorFlow test data with the ``get_test_data`` function from ``flox/utils``.

In [2]:
from flox.utils import get_test_data
x_test, y_test = get_test_data(keras_dataset="fashion_mnist", num_samples=2000)

### Defining the Model

Next, let's define our TensorFlow model architecture and compile it.

In [3]:
# `fashion_mnist` images are grayscale, 28 x 28 pixels in size
input_shape = (28, 28, 1)
# there are 10 classes in the dataset
num_classes = 10

# define the model architecture
global_model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ]
)

# compile the model
global_model.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
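
As a quick sanity check on the architecture above, the layer output sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults that the model definition relies on (stride-1 convolutions with "valid" padding, and pooling with stride equal to the pool size); the helper names are illustrative and not part of flox or Keras.

```python
def conv2d_valid(size, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size = 28                     # Fashion MNIST images are 28 x 28
size = conv2d_valid(size, 3)  # Conv2D(32, 3x3)
size = max_pool(size, 2)      # MaxPooling2D(2x2)
size = conv2d_valid(size, 3)  # Conv2D(64, 3x3)
size = max_pool(size, 2)      # MaxPooling2D(2x2)

# Flatten: height * width * channels of the last feature map
flat_features = size * size * 64

total_params = (
    conv2d_params(3, 1, 32)
    + conv2d_params(3, 32, 64)
    + dense_params(flat_features, 10)
)

print(f"flattened features: {flat_features}, trainable parameters: {total_params}")
```

Dropout and pooling layers add no parameters, so the total should match what ``global_model.summary()`` reports.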

### Instantiating the Model Trainer and Client

Next, we will create an instance of the TensorFlow Model Trainer and an instance of the TensorFlow Client. You can find their implementations under ``flox/model_trainers`` and ``flox/clients``, respectively. You can also extend or modify these classes to fit your needs.

In [4]:
tf_trainer = TensorflowTrainer()
tf_client = TensorflowClient()

### Instantiating the Controller (Local Execution)

Now, let's define our endpoints and initialize the TensorFlow *Controller* that will do the heavy lifting of deploying tasks to the endpoints. We will run three rounds of FL, with 100 samples and 1 training epoch on each device. Note that we are setting ``executor_type`` to ``"local"``, which uses ``concurrent.futures.ThreadPoolExecutor`` to execute the tasks locally. We also provide the dataset name and the test data. Finally, we launch the experiment.

In [5]:
# since we are first executing the experiment locally, it does not matter what we name the endpoints:
eps = ["simulated_endpoint_1", "simulated_endpoint_2", "simulated_endpoint_3"]
logger.info(f"Endpoints: {eps}")

flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="local",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()

1676799786.807365 2023-02-19 17:43:06 INFO MainProcess-18416 MainThread-18036 __main__:3 <module> Endpoints: ['simulated_endpoint_1', 'simulated_endpoint_2', 'simulated_endpoint_3']
1676799786.809368 2023-02-19 17:43:06 INFO MainProcess-18416 MainThread-18036 __main__:20 <module> STARTING FL FLOW...
1676799786.810364 2023-02-19 17:43:06 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:166 on_model_init No executor was provided, trying to retrieve the provided executor type local from the list of available executors: {'local': <class 'concurrent.futures.thread.ThreadPoolExecutor'>, 'funcx': <class 'funcx.sdk.executor.FuncXExecutor'>}
1676799786.812365 2023-02-19 17:43:06 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:170 on_model_init The selected executor is <class 'concurrent.futures.thread.ThreadPoolExecutor'>
1676799790.106074 2023-02-19 17:43:10 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:209 on_model_br

Train on 100 samples


1676799810.926122 2023-02-19 17:43:30 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676799810.928100 2023-02-19 17:43:30 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676799810.929100 2023-02-19 17:43:30 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676799810.932134 2023-02-19 17:43:30 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:398 run_federated_learning Round 0 evaluation results: 
1676799811.307099 2023-02-19 17:43:31 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'concurrent.futures.thread.ThreadPoolExecutor'> executor
1676799811.308103 2023-02-19 17:43:31 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainCont

{'loss': 2.254750108718872, 'metrics': {'accuracy': 0.20649999380111694}}


1676799827.820720 2023-02-19 17:43:47 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint simulated_endpoint_1


Train on 100 samples


1676799828.677727 2023-02-19 17:43:48 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676799828.678720 2023-02-19 17:43:48 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676799828.679720 2023-02-19 17:43:48 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676799828.683729 2023-02-19 17:43:48 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:398 run_federated_learning Round 1 evaluation results: 
1676799828.933715 2023-02-19 17:43:48 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'concurrent.futures.thread.ThreadPoolExecutor'> executor
1676799828.934716 2023-02-19 17:43:48 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainCont

{'loss': 2.186371862411499, 'metrics': {'accuracy': 0.19249999523162842}}


1676799844.765646 2023-02-19 17:44:04 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint simulated_endpoint_1


Train on 100 samples


1676799845.619435 2023-02-19 17:44:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676799845.620431 2023-02-19 17:44:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676799845.622431 2023-02-19 17:44:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676799845.625432 2023-02-19 17:44:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:398 run_federated_learning Round 2 evaluation results: 


{'loss': 2.131452308654785, 'metrics': {'accuracy': 0.12449999898672104}}
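
To make the round structure in the log above more concrete (broadcast, local training, retrieving results, "Finished aggregating weights"), here is a minimal sketch of a federated round loop driven by ``concurrent.futures.ThreadPoolExecutor``. It assumes a sample-weighted average of the returned weights, which is a common choice but not necessarily what flox does internally. ``simulated_local_training``, ``weighted_average``, and the per-endpoint shifts are hypothetical stand-ins, and the "model" is just a list of floats.

```python
from concurrent.futures import ThreadPoolExecutor


def simulated_local_training(global_weights, shift, num_samples):
    """Hypothetical stand-in for on-device training: shifts every weight by a
    fixed amount and reports how many samples were used."""
    new_weights = [w + shift for w in global_weights]
    return new_weights, num_samples


def weighted_average(results):
    """Average the returned weights, weighting each endpoint by its sample count."""
    total_samples = sum(n for _, n in results)
    num_weights = len(results[0][0])
    return [
        sum(weights[i] * n for weights, n in results) / total_samples
        for i in range(num_weights)
    ]


global_weights = [0.0, 1.0, 2.0]
# Illustrative per-endpoint updates; a real client would train a model instead.
endpoint_shifts = {
    "simulated_endpoint_1": 0.1,
    "simulated_endpoint_2": 0.2,
    "simulated_endpoint_3": 0.3,
}

for round_idx in range(3):
    with ThreadPoolExecutor() as executor:
        # Broadcast: submit one task per endpoint
        futures = [
            executor.submit(simulated_local_training, global_weights, shift, 100)
            for shift in endpoint_shifts.values()
        ]
        # Receive: block until every endpoint has returned
        results = [future.result() for future in futures]
    # Aggregate: the new global model is the weighted average
    global_weights = weighted_average(results)
    print(f"Round {round_idx}: {global_weights}")
```

Because every endpoint reports the same number of samples here, the weighted average reduces to a plain mean of the three updates.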


### Real Endpoint (FuncX) Execution 

Now, let's switch ``executor_type`` to ``"funcx"`` and provide the UUIDs of actual funcX endpoints. Make sure to follow the instructions in this directory's README to set up your clients.

In [6]:
eps = ["c7487b2b-b129-47e2-989b-5a9ac361befc"]
logger.info(f"Endpoints: {eps}")

flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="funcx",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()

1676799859.120141 2023-02-19 17:44:19 INFO MainProcess-18416 MainThread-18036 __main__:2 <module> Endpoints: ['c7487b2b-b129-47e2-989b-5a9ac361befc']
1676799859.121139 2023-02-19 17:44:19 INFO MainProcess-18416 MainThread-18036 __main__:19 <module> STARTING FL FLOW...
1676799859.123143 2023-02-19 17:44:19 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:166 on_model_init No executor was provided, trying to retrieve the provided executor type funcx from the list of available executors: {'local': <class 'concurrent.futures.thread.ThreadPoolExecutor'>, 'funcx': <class 'funcx.sdk.executor.FuncXExecutor'>}
1676799859.124136 2023-02-19 17:44:19 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:170 on_model_init The selected executor is <class 'funcx.sdk.executor.FuncXExecutor'>
1676799860.556572 2023-02-19 17:44:20 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'funcx.sdk.exec

{'loss': 2.019030746459961, 'metrics': {'accuracy': 0.46050000190734863}}


1676799890.953324 2023-02-19 17:44:50 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:215 on_model_broadcast Starting to broadcast a task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676799892.986187 2023-02-19 17:44:52 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676799902.897734 2023-02-19 17:45:02 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676799902.899697 2023-02-19 17:45:02 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676799902.901691 2023-02-19 17:45:02 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676799902.909693 2023-02-19 17:45:02 INFO MainProcess-18416 MainThrea

{'loss': 1.882064624786377, 'metrics': {'accuracy': 0.4880000054836273}}


1676799904.539999 2023-02-19 17:45:04 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:215 on_model_broadcast Starting to broadcast a task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676799906.335784 2023-02-19 17:45:06 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676799914.913979 2023-02-19 17:45:14 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676799914.916040 2023-02-19 17:45:14 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676799914.920983 2023-02-19 17:45:14 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676799914.928975 2023-02-19 17:45:14 INFO MainProcess-18416 MainThrea

{'loss': 1.7488623313903808, 'metrics': {'accuracy': 0.5274999737739563}}
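
Since the funcX run expects endpoint UUIDs rather than arbitrary names like the simulated ones used earlier, it can help to check the IDs before launching an experiment. The following is a small sketch using only the standard library; ``is_valid_uuid`` is a hypothetical helper and not part of flox or funcX.

```python
import uuid


def is_valid_uuid(value):
    """Return True if `value` is a canonically formatted UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts variants such as braces or missing hyphens,
    # so compare against the canonical form to be strict.
    return str(parsed) == value.lower()


candidate_endpoints = [
    "c7487b2b-b129-47e2-989b-5a9ac361befc",
    "simulated_endpoint_1",
]

for endpoint_id in candidate_endpoints:
    status = "ok" if is_valid_uuid(endpoint_id) else "not a UUID"
    print(f"{endpoint_id}: {status}")
```

A check like this only confirms the format; it does not tell you whether the endpoint exists or is online.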


### Real Endpoint (FuncX) Execution with Running Average

When there are many endpoints, aggregating all of their updated model weights at once can be computationally heavy and time-consuming. Instead, we can make use of the waiting time by aggregating the models as they come back from the endpoints. In this example, we set ``running_average=True`` on the controller and run the same experiment again.

In [7]:
flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="funcx",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
    running_average=True,
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()

1676799978.286873 2023-02-19 17:46:18 INFO MainProcess-18416 MainThread-18036 __main__:17 <module> STARTING FL FLOW...
1676799978.287877 2023-02-19 17:46:18 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:166 on_model_init No executor was provided, trying to retrieve the provided executor type funcx from the list of available executors: {'local': <class 'concurrent.futures.thread.ThreadPoolExecutor'>, 'funcx': <class 'funcx.sdk.executor.FuncXExecutor'>}
1676799978.288876 2023-02-19 17:46:18 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:170 on_model_init The selected executor is <class 'funcx.sdk.executor.FuncXExecutor'>
1676799979.793795 2023-02-19 17:46:19 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'funcx.sdk.executor.FuncXExecutor'> executor
1676799981.146563 2023-02-19 17:46:21 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:215 on_mod

{'loss': 1.6583860664367676, 'metrics': {'accuracy': 0.41100001335144043}}


1676800001.818598 2023-02-19 17:46:41 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:215 on_model_broadcast Starting to broadcast a task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676800004.060041 2023-02-19 17:46:44 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676800013.476712 2023-02-19 17:46:53 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:437 tasks_to_running_average Starting to retrieve results from endpoints
1676800013.478765 2023-02-19 17:46:53 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:449 tasks_to_running_average the running average is NONE, instantiating it for the first time
1676800013.481714 2023-02-19 17:46:53 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:475 tasks_to_running_average Finished retrieving all results from the endpoints
1676800013.

{'loss': 1.476961688041687, 'metrics': {'accuracy': 0.6549999713897705}}


1676800015.112986 2023-02-19 17:46:55 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:215 on_model_broadcast Starting to broadcast a task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676800016.985120 2023-02-19 17:46:56 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint c7487b2b-b129-47e2-989b-5a9ac361befc
1676800025.817783 2023-02-19 17:47:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:437 tasks_to_running_average Starting to retrieve results from endpoints
1676800025.821797 2023-02-19 17:47:05 DEBUG MainProcess-18416 MainThread-18036 flox.controllers.MainController:449 tasks_to_running_average the running average is NONE, instantiating it for the first time
1676800025.823785 2023-02-19 17:47:05 INFO MainProcess-18416 MainThread-18036 flox.controllers.MainController:475 tasks_to_running_average Finished retrieving all results from the endpoints
1676800025.

{'loss': 1.4497999477386474, 'metrics': {'accuracy': 0.38749998807907104}}
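
The idea behind aggregating results as they arrive can be sketched with an incremental mean: the first result initializes the average (compare the "instantiating it for the first time" message in the log above), and each later result nudges it toward the new weights. This is an illustrative, unweighted version; ``update_running_average`` is a hypothetical helper, and the flox implementation may differ, for example by weighting endpoints by sample count.

```python
def update_running_average(current_average, num_seen, new_weights):
    """Fold one endpoint's weights into the running average.

    Returns the updated average and the number of results seen so far.
    """
    if current_average is None:
        # First result: the average is just this endpoint's weights
        return list(new_weights), 1
    num_seen += 1
    updated = [
        avg + (w - avg) / num_seen
        for avg, w in zip(current_average, new_weights)
    ]
    return updated, num_seen


# Illustrative weights arriving from three endpoints, one at a time
arrivals = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

running_average, count = None, 0
for weights in arrivals:
    running_average, count = update_running_average(running_average, count, weights)
    print(f"after {count} result(s): {running_average}")
```

The final value equals the ordinary mean of all results, but only one set of weights has to be held in memory at a time in addition to the average.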
