# Quickstart TensorFlow Tutorial
In this tutorial, you will see how to use flox to run federated learning (FL) experiments with TensorFlow, first with a local executor and then on real physical endpoints. We will train a model to classify images from the Fashion MNIST dataset.

```python
import logging
import os

import tensorflow as tf
from funcx import FuncXExecutor
from tensorflow import keras

from flox.clients.TensorflowClient import TensorflowClient
from flox.controllers.TensorflowController import TensorflowController
from flox.model_trainers.TensorflowTrainer import TensorflowTrainer

logger = logging.getLogger(__name__)
```

### Getting Test Data
First, let's load some test data with the ``get_test_data`` function from ``flox.utils``.

```python
from flox.utils import get_test_data
x_test, y_test = get_test_data(keras_dataset="fashion_mnist", num_samples=2000)
```
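The model below is compiled with ``categorical_crossentropy``, which in Keras expects one-hot encoded labels (``sparse_categorical_crossentropy`` is the Keras variant for integer class ids). This tutorial does not show which label format ``get_test_data`` returns, so purely as an illustration, here is a plain-Python sketch of one-hot encoding integer labels. In a real TensorFlow pipeline you would normally call ``tf.keras.utils.to_categorical`` instead. The ``one_hot`` helper is hypothetical and not part of flox.

```python
# Hypothetical helper (not part of flox): one-hot encode integer class labels.
# Assumes every label is an integer class id in range(num_classes).
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Three example labels from a 10-class problem such as Fashion MNIST.
encoded = one_hot([0, 3, 9], num_classes=10)
for row in encoded:
    print(row)
```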

### Defining the Model

Next, let's define our TensorFlow model architecture, a small convolutional neural network, and compile it.

```python
# `fashion_mnist` images are grayscale, 28 x 28 pixels in size
input_shape = (28, 28, 1)
# there are 10 classes in the dataset
num_classes = 10

# define the model architecture
global_model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ]
)

# compile the model
global_model.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
```
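To see where the model's weights come from, the sketch below traces each layer's output shape and trainable parameter count by hand. It assumes the Keras defaults used above (``"valid"`` padding and stride 1 for ``Conv2D``, stride equal to ``pool_size`` for ``MaxPooling2D``) and is plain Python rather than TensorFlow, so it runs without the dataset. Calling ``global_model.summary()`` on the real model should report the same totals.

```python
# Trace output shapes and trainable parameter counts for the CNN defined above.
# Assumes Keras defaults: "valid" padding and stride 1 for Conv2D, and a
# stride equal to pool_size for MaxPooling2D.


def conv2d(shape, filters, kernel):
    """Output shape and parameter count of a square, valid-padded Conv2D."""
    h, w, c = shape
    out_shape = (h - kernel + 1, w - kernel + 1, filters)
    params = kernel * kernel * c * filters + filters  # kernel weights + biases
    return out_shape, params


def max_pool(shape, pool):
    """Output shape of MaxPooling2D; pooling layers have no parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


shape = (28, 28, 1)  # input_shape
total = 0
for name, layer in [
    ("Conv2D(32, 3x3)", lambda s: conv2d(s, 32, 3)),
    ("MaxPooling2D(2x2)", lambda s: max_pool(s, 2)),
    ("Conv2D(64, 3x3)", lambda s: conv2d(s, 64, 3)),
    ("MaxPooling2D(2x2)", lambda s: max_pool(s, 2)),
]:
    shape, params = layer(shape)
    total += params
    print(f"{name:<18} -> {shape}, params={params}")

flat = shape[0] * shape[1] * shape[2]  # Flatten (Dropout adds no parameters)
dense_params = flat * 10 + 10  # Dense(num_classes=10): weights + biases
total += dense_params
print(f"{'Dense(10)':<18} -> (10,), params={dense_params}")
print(f"Total trainable parameters: {total}")  # 34826
```

Nearly all of the parameters sit in the second convolution and the final ``Dense`` layer. These weights are what each endpoint trains locally and returns to the controller for aggregation every round.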

### Instantiating the Model Trainer and Client

Next, we create instances of the TensorFlow model trainer and client. Their implementations live under ``flox/model_trainers`` and ``flox/clients``, respectively, and you can extend or modify these classes to fit your needs.

```python
tf_trainer = TensorflowTrainer()
tf_client = TensorflowClient()
```

### Instantiating the Controller (Local Execution)

Now, let's define our endpoints and initialize the TensorFlow *Controller*, which does the heavy lifting of deploying tasks to the endpoints. We will run three rounds of FL, with each endpoint training for 1 epoch on 100 samples. Note that we set ``executor_type`` to ``"local"``, which uses ``concurrent.futures.ThreadPoolExecutor`` to execute the tasks locally. We also provide the dataset name and the test data. Finally, we launch the experiment.

```python
# since we are first executing the experiment locally, it does not matter what we name the endpoints:
eps = ["simulated_endpoint_1", "simulated_endpoint_2", "simulated_endpoint_3"]
logger.info(f"Endpoints: {eps}")

flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="local",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()
```

```text
1676525788.598553 2023-02-16 13:36:28 INFO MainProcess-17968 MainThread-25440 __main__:3 <module> Endpoints: ['simulated_endpoint_1', 'simulated_endpoint_2', 'simulated_endpoint_3']
1676525788.601554 2023-02-16 13:36:28 INFO MainProcess-17968 MainThread-25440 __main__:20 <module> STARTING FL FLOW...
1676525788.602553 2023-02-16 13:36:28 DEBUG MainProcess-17968 MainThread-25440 flox.controllers.MainController:166 on_model_init No executor was provided, trying to retrieve the provided executor type local from the list of available executors: {'local': <class 'concurrent.futures.thread.ThreadPoolExecutor'>, 'funcx': <class 'funcx.sdk.executor.FuncXExecutor'>}
1676525788.604568 2023-02-16 13:36:28 DEBUG MainProcess-17968 MainThread-25440 flox.controllers.MainController:170 on_model_init The selected executor is <class 'concurrent.futures.thread.ThreadPoolExecutor'>
1676525790.258576 2023-02-16 13:36:30 DEBUG MainProcess-17968 MainThread-25440 flox.controllers.MainController:209 on_model_br
Train on 100 samples
1676525805.639392 2023-02-16 13:36:45 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676525805.640367 2023-02-16 13:36:45 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676525805.641366 2023-02-16 13:36:45 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676525805.643367 2023-02-16 13:36:45 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:398 run_federated_learning Round 0 evaluation results:
1676525806.000365 2023-02-16 13:36:46 DEBUG MainProcess-17968 MainThread-25440 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'concurrent.futures.thread.ThreadPoolExecutor'> executor
1676525806.001364 2023-02-16 13:36:46 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainCont
{'loss': 2.24061008644104, 'metrics': {'accuracy': 0.16899999976158142}}
1676525820.074437 2023-02-16 13:37:00 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint simulated_endpoint_1
Train on 100 samples
1676525820.976432 2023-02-16 13:37:00 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676525820.978425 2023-02-16 13:37:00 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676525820.980426 2023-02-16 13:37:00 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676525820.983428 2023-02-16 13:37:00 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:398 run_federated_learning Round 1 evaluation results:
1676525821.285425 2023-02-16 13:37:01 DEBUG MainProcess-17968 MainThread-25440 flox.controllers.MainController:209 on_model_broadcast Launching the <class 'concurrent.futures.thread.ThreadPoolExecutor'> executor
1676525821.286425 2023-02-16 13:37:01 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainCont
{'loss': 2.1857838592529295, 'metrics': {'accuracy': 0.2224999964237213}}
1676525841.770951 2023-02-16 13:37:21 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:247 on_model_broadcast Deployed the task to endpoint simulated_endpoint_1
Train on 100 samples
1676525842.654240 2023-02-16 13:37:22 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:290 on_model_receive Starting to retrieve results from endpoints
1676525842.655225 2023-02-16 13:37:22 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:305 on_model_receive Finished retrieving all results from the endpoints
1676525842.658227 2023-02-16 13:37:22 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:335 on_model_aggregate Finished aggregating weights
1676525842.660224 2023-02-16 13:37:22 INFO MainProcess-17968 MainThread-25440 flox.controllers.MainController:398 run_federated_learning Round 2 evaluation results:
{'loss': 2.107709077835083, 'metrics': {'accuracy': 0.27549999952316284}}
```
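With ``executor_type="local"``, each simulated endpoint's task runs in a ``concurrent.futures.ThreadPoolExecutor`` thread. Below is a minimal, framework-free sketch of that fan-out/collect/aggregate pattern under our own assumptions: model weights are plain lists of floats, ``local_training`` is a hypothetical stand-in for the client's training step, and results are combined with a sample-weighted average (FedAvg-style). It is not flox's controller code. The real controller also evaluates the aggregated model on ``x_test``/``y_test`` after each round, which produces the per-round evaluation results shown in the log.

```python
# Framework-free sketch of local FL execution with a thread pool.
# Assumptions: weights are lists of floats, `local_training` is a hypothetical
# client step, and aggregation is a sample-weighted (FedAvg-style) average.
from concurrent.futures import ThreadPoolExecutor


def local_training(endpoint_id, global_weights, num_samples):
    """Hypothetical client step: shift every weight by an endpoint-specific offset."""
    offset = 0.1 * int(endpoint_id.rsplit("_", 1)[1])
    new_weights = [w + offset for w in global_weights]
    return new_weights, num_samples


def federated_average(results):
    """Sample-weighted average of the weight vectors returned by the endpoints."""
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    return [sum(w[i] * n for w, n in results) / total for i in range(dim)]


eps = ["simulated_endpoint_1", "simulated_endpoint_2", "simulated_endpoint_3"]
global_weights = [0.0, 0.0]

for round_idx in range(3):
    # Fan out one task per endpoint, then block until every result is back.
    with ThreadPoolExecutor(max_workers=len(eps)) as executor:
        futures = [executor.submit(local_training, ep, global_weights, 100) for ep in eps]
        results = [f.result() for f in futures]
    global_weights = federated_average(results)
    print(f"Round {round_idx}: {global_weights}")
```

Note that this version waits for all endpoints before aggregating, which is the behavior the ``running_average`` option discussed later is meant to relax.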


### Real Endpoint (FuncX) Execution 

Now, let's switch ``executor_type`` to ``"funcx"`` and provide the UUIDs of actual funcX endpoints. Make sure to follow the instructions in this directory's README to set up your clients.

```python
# replace these with the UUIDs of your own funcX endpoints
eps = ["fb93a1c2-a8d7-49f3-ad59-375f4e298784", "c7487b2b-b129-47e2-989b-5a9ac361befc"]
logger.info(f"Endpoints: {eps}")

flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="funcx",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()
```
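funcX identifies endpoints by UUID, so a mistyped ID is an easy mistake when switching from the simulated names. The hypothetical helper below (not part of flox or the funcX SDK) uses Python's standard ``uuid`` module to check that every entry in ``eps`` is a well-formed UUID before you construct the controller. It only validates the format; it cannot tell whether an endpoint is registered or online.

```python
# Hypothetical helper (not part of flox or the funcX SDK): fail fast if an
# endpoint ID is not a well-formed UUID. Checks format only, not reachability.
import uuid


def validate_endpoint_ids(endpoint_ids):
    """Return the IDs in canonical lowercase form, or raise ValueError on the first bad one."""
    canonical = []
    for ep in endpoint_ids:
        try:
            canonical.append(str(uuid.UUID(ep)))
        except ValueError as err:
            raise ValueError(f"{ep!r} is not a valid funcX endpoint UUID") from err
    return canonical


eps = ["fb93a1c2-a8d7-49f3-ad59-375f4e298784", "c7487b2b-b129-47e2-989b-5a9ac361befc"]
print(validate_endpoint_ids(eps))
```

Passing one of the simulated names, such as ``"simulated_endpoint_1"``, raises a ``ValueError``.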

### Real Endpoint (FuncX) Execution with Running Average

When there are many endpoints, aggregating all of their updated model weights at once can be computationally heavy and time-consuming. Instead, we can make use of the waiting time by aggregating each model as soon as it comes back from its endpoint. In this example, we pass ``running_average=True`` to the ``TensorflowController`` and run the same experiment again.

```python
flox_controller = TensorflowController(
    endpoint_ids=eps,
    num_samples=100,
    epochs=1,
    rounds=3,
    client_logic=tf_client,
    global_model=global_model,
    executor_type="funcx",  # choose "funcx" for FuncXExecutor, "local" for ThreadPoolExecutor
    model_trainer=tf_trainer,
    x_test=x_test,
    y_test=y_test,
    data_source="keras",
    dataset_name="fashion_mnist",
    running_average=True,
)

logger.info("STARTING FL FLOW...")
flox_controller.run_federated_learning()
```
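The idea behind a running average can be illustrated with an incremental weighted mean. Keep a single averaged weight vector and fold each endpoint's weights into it as they arrive, using `mean_new = mean_old + (n / N) * (w - mean_old)`, where `n` is the new result's sample count and `N` is the total sample count seen so far. The sketch below assumes sample-count weighting and plain lists of floats for the weights. It illustrates the technique and is not flox's implementation. It also checks that the running result matches averaging all updates at once.

```python
# Sketch: incremental (running) sample-weighted average of model weights,
# compared against averaging all updates at once. Weights are plain lists of
# floats; sample-count weighting is an assumption, not a description of flox.


def batch_average(updates):
    """Average all (weights, num_samples) pairs in one pass after they have all arrived."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


class RunningAverage:
    """Keeps a sample-weighted mean that is updated one result at a time."""

    def __init__(self):
        self.mean = None
        self.count = 0

    def add(self, weights, num_samples):
        self.count += num_samples
        if self.mean is None:
            self.mean = list(weights)
            return
        frac = num_samples / self.count
        # mean_new = mean_old + (n / N_new) * (w - mean_old)
        self.mean = [m + frac * (w - m) for m, w in zip(self.mean, weights)]


updates = [([1.0, 2.0], 100), ([3.0, 0.0], 50), ([2.0, 4.0], 50)]
running = RunningAverage()
for weights, n in updates:  # in practice: in whatever order endpoints finish
    running.add(weights, n)

print(running.mean)
print(batch_average(updates))
```

Only one weight vector is held in memory, and the arrival order does not change the result beyond floating-point rounding. With futures from a ``ThreadPoolExecutor``, you would typically feed results in as they finish via ``concurrent.futures.as_completed``.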