# High-performance simulations with TFF

This tutorial will describe how to set up high-performance simulations with TFF
in a variety of common scenarios.

TODO(b/134543154): Populate the content. Some of the things to cover here:
- using GPUs in a single-machine setup,
- multi-machine setup on GCP/GKE, with and without TPUs,
- interfacing MapReduce-like backends,
- current limitations and when/how they will be relaxed.

## Before we begin

First, make sure your notebook is connected to a backend that has the relevant
components (including gRPC dependencies for multi-machine scenarios) compiled.
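
As a quick sanity check, the sketch below reports whether the Python modules a
TFF simulation relies on can be imported in the current runtime. It uses only
the standard library's `importlib.util.find_spec`. The module list
(`tensorflow`, `tensorflow_federated`, and `grpc`, which is provided by the
`grpcio` package) is an assumption about what "relevant components" covers
here, and a module being importable does not guarantee it was built with every
feature you need.

```python
import importlib.util

# Assumed set of modules for TFF simulations; "grpc" (from the grpcio package)
# matters only for multi-machine scenarios.
REQUIRED_MODULES = ("tensorflow", "tensorflow_federated", "grpc")


def check_modules(names=REQUIRED_MODULES):
  """Maps each top-level module name to whether it can be found for import."""
  return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules().items():
  print("{}: {}".format(name, "found" if found else "MISSING"))
```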

Now, let's start by loading the setup from the MNIST example on the TFF website
(the federated EMNIST dataset and a simple Keras model), and declaring the
Python function that will run a small experiment loop over a group of 10
clients.

```python
!pip install --quiet --upgrade tensorflow_federated
```

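```python
import collections
import time

import tensorflow as tf
tf.compat.v1.enable_v2_behavior()

import tensorflow_federated as tff

# Load the federated EMNIST training data, partitioned by client; the test
# split is not used here.
source, _ = tff.simulation.datasets.emnist.load_data()


def map_fn(example):
  # Flatten each batch of 28x28 images into rows of 784 pixels and pair them
  # with their labels.
  return collections.OrderedDict(
      x=tf.reshape(example['pixels'], [-1, 784]), y=example['label'])


def client_data(n):
  # Build the n-th client's dataset: 10 passes over its examples, shuffled
  # with a buffer of 500 and grouped into batches of 20.
  ds = source.create_tf_dataset_for_client(source.client_ids[n])
  return ds.repeat(10).shuffle(500).batch(20).map(map_fn)


# Use the same 10 clients in every round.
train_data = [client_data(n) for n in range(10)]
# A sample batch converted to NumPy arrays, passed to the model as
# `dummy_batch` below.
batch = tf.nest.map_structure(lambda x: x.numpy(), next(iter(train_data[0])))


def model_fn():
  # A softmax regression model over the 784 flattened pixels.
  model = tf.keras.models.Sequential([
      tf.keras.layers.Input(shape=(784,)),
      tf.keras.layers.Dense(units=10, kernel_initializer='zeros'),
      tf.keras.layers.Softmax(),
  ])
  return tff.learning.from_keras_model(
      model,
      dummy_batch=batch,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])


# Federated Averaging, with clients running SGD at a learning rate of 0.02.
trainer = tff.learning.build_federated_averaging_process(
    model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02))


def evaluate(num_rounds=10):
  # Run `num_rounds` training rounds over `train_data`, printing the training
  # loss and the wall-clock time of each round.
  state = trainer.initialize()
  for _ in range(num_rounds):
    t1 = time.time()
    state, metrics = trainer.next(state, train_data)
    t2 = time.time()
    print('loss {}, round time {}'.format(metrics.loss, t2 - t1))
```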
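
`tff.learning.build_federated_averaging_process` builds an iterative process for
Federated Averaging: on each call to `trainer.next`, every client in
`train_data` trains the model locally, and the server combines the clients'
updates into a new global model. The pure-Python sketch below illustrates only
that combination step, assuming each client is weighted by its number of
training examples, as in the standard Federated Averaging algorithm.
`weighted_average` is a hypothetical helper for illustration, not TFF's
implementation, and it represents model weights as flat lists of floats for
simplicity.

```python
from typing import List, Sequence


def weighted_average(client_values: Sequence[Sequence[float]],
                     num_examples: Sequence[int]) -> List[float]:
  """Averages per-client vectors, weighting each client by its example count.

  Hypothetical helper for illustration only; it is not part of the TFF API.
  """
  if not client_values or len(client_values) != len(num_examples):
    raise ValueError("Need at least one client and one count per client.")
  total = float(sum(num_examples))
  if total <= 0:
    raise ValueError("The total example count must be positive.")
  dim = len(client_values[0])
  averaged = [0.0] * dim
  for values, n in zip(client_values, num_examples):
    if len(values) != dim:
      raise ValueError("All client vectors must have the same length.")
    # Each client contributes in proportion to its share of all examples.
    for i, v in enumerate(values):
      averaged[i] += v * (n / total)
  return averaged


# The client holding 3x as many examples pulls the average toward its values.
print(weighted_average([[1.0, 0.0], [0.0, 1.0]], num_examples=[30, 10]))
# [0.75, 0.25]
```

Weighting by example count means clients that hold only a few examples do not
influence the global model as strongly as clients with much more data.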

## Single-machine simulations

High-performance single-machine simulation is now on by default, so the
experiment can be run directly:

```python
evaluate()
```

```
loss 2.962388753890991, round time 3.759093999862671
loss 2.681041717529297, round time 2.3295602798461914
loss 2.4773595333099365, round time 2.4395787715911865
loss 2.259058713912964, round time 2.2856998443603516
loss 2.085496425628662, round time 2.356675863265991
loss 1.9388854503631592, round time 2.3676540851593018
loss 1.7678184509277344, round time 2.716273784637451
loss 1.6329854726791382, round time 2.3285439014434814
loss 1.5447049140930176, round time 2.20774507522583
loss 1.3957462310791016, round time 2.758118152618408
```
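
The first round above takes noticeably longer than the others. This likely
reflects one-time setup cost rather than steady-state throughput, though that
interpretation is an assumption. When comparing configurations, it can help to
summarize round times while excluding such warm-up rounds. The sketch below
parses lines in the format printed by `evaluate` and returns the number of timed
rounds and their median duration, skipping the first round by default.
`summarize_round_times` is a hypothetical helper, not part of TFF.

```python
import re
import statistics

# Matches lines printed by `evaluate`, e.g. "loss 2.96, round time 3.76".
_ROUND_LINE = re.compile(r"loss ([\d.eE+-]+), round time ([\d.eE+-]+)")


def summarize_round_times(log_text, warmup_rounds=1):
  """Returns (num_timed_rounds, median_seconds), ignoring warm-up rounds."""
  times = [float(m.group(2)) for m in _ROUND_LINE.finditer(log_text)]
  timed = times[warmup_rounds:]
  if not timed:
    raise ValueError("No rounds left after skipping warm-up rounds.")
  return len(timed), statistics.median(timed)


# Demo on the first three lines of the output above.
sample_log = "\n".join([
    "loss 2.962388753890991, round time 3.759093999862671",
    "loss 2.681041717529297, round time 2.3295602798461914",
    "loss 2.4773595333099365, round time 2.4395787715911865",
])
print(summarize_round_times(sample_log))
```

Applied to all ten rounds shown above, this reports 9 timed rounds with a median
of about 2.36 seconds, compared with about 3.76 seconds for the first round. In
the notebook, the output of `evaluate()` could be captured into an `io.StringIO`
buffer with `contextlib.redirect_stdout` and passed to this helper.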


## Multi-machine simulations on GCP/GKE, GPUs, TPUs, and beyond...

Coming very soon.