
In [1]:
!pip install --upgrade tensorflow-gpu


In [0]:
from __future__ import absolute_import, print_function, division, unicode_literals
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()

In [2]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

def make_datasets_unbatched():
  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

  datasets, info = tfds.load(name='mnist', 
                             with_info=True,
                             as_supervised=True)
  return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)

train_datasets = make_datasets_unbatched().batch(BATCH_SIZE)

Downloading and preparing dataset mnist (11.06 MiB) to /root/tensorflow_datasets/mnist/1.0.0...

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/1.0.0. Subsequent calls will reuse this data.


In [0]:
def build_and_compile_cnn_model():
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation=tf.nn.relu, input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation=tf.nn.relu),
      tf.keras.layers.Dense(10, activation="softmax")
  ])
  model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])
  return model

In [4]:
single_worker_model = build_and_compile_cnn_model()
history = single_worker_model.fit(train_datasets, 
                                  epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [0]:
# import os
# import json
# os.environ["TF_CONFIG"] = json.dumps(
#     {
#         'cluster': {
#             'worker': ["localhost:12345", "localhost:23456"]
#         },
#         'task': {
#             'type':'worker', 'index':0
#         }
#     }
# )
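
`TF_CONFIG` must be set on every machine before the strategy is created. The `cluster` section is identical on all workers and only `task.index` changes; by convention the worker with index 0 acts as the chief. The sketch below generates one config string per worker for the same two-address cluster as the commented cell; `make_tf_config` and `WORKERS` are hypothetical names used only for illustration.

```python
import json

def make_tf_config(worker_addresses, index):
    """Build the TF_CONFIG JSON string for worker number `index`.

    Every worker shares the same 'cluster' section; only 'task.index' differs.
    """
    if not 0 <= index < len(worker_addresses):
        raise ValueError("index must point at one of the listed workers")
    return json.dumps({
        "cluster": {"worker": list(worker_addresses)},
        "task": {"type": "worker", "index": index},
    })

# Hypothetical two-worker cluster on one machine, matching the commented cell.
WORKERS = ["localhost:12345", "localhost:23456"]
configs = [make_tf_config(WORKERS, i) for i in range(len(WORKERS))]

for cfg in configs:
    print(cfg)

# On worker i, before creating the strategy, you would then run:
#   os.environ["TF_CONFIG"] = configs[i]
```

Generating all configs from one list keeps the `cluster` sections identical, which avoids a common source of mismatched clusters.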

### Choosing the right strategy

In [6]:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(communication=tf.distribute.experimental.CollectiveCommunication.RING)

INFO:tensorflow:Single-worker CollectiveAllReduceStrategy with local_devices = ('/device:GPU:0',), communication = CollectiveCommunication.RING

In [8]:
print(f"Number of parallel devices: {strategy.num_replicas_in_sync}")

Number of parallel devices: 1


In [10]:
%%time
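NUM_WORKERS = 2
# The batch size passed to batch() is the global batch; each worker's
# effective batch is GLOBAL_BATCH_SIZE / NUM_WORKERS, so scaling by
# NUM_WORKERS keeps the per-worker batch at BATCH_SIZE (64).
GLOBAL_BATCH_SIZE = BATCH_SIZE * NUM_WORKERS
with strategy.scope():
  train_dataset = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)
  multi_worker_model = build_and_compile_cnn_model()
multi_worker_model.fit(train_dataset, epochs=3)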

Epoch 1/3
Epoch 2/3
Epoch 3/3
CPU times: user 32.5 s, sys: 6.52 s, total: 39 s
Wall time: 27.9 s
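
The reason for multiplying by `NUM_WORKERS`: the value passed to `tf.data.Dataset.batch()` is the global batch, and each worker's effective batch is that value divided by the number of workers. The plain-Python arithmetic below is a sketch with hypothetical helper names; the steps-per-epoch figure assumes the 60,000-image MNIST training split is spread evenly, ignoring exactly how auto-sharding divides the input. Note that in this notebook only one replica is active (`num_replicas_in_sync` is 1), so the single worker actually processes all 128 examples per step.

```python
BATCH_SIZE = 64               # per-worker batch size, as in the notebook
MNIST_TRAIN_EXAMPLES = 60000  # size of the MNIST training split

def global_batch_size(per_worker_batch, num_workers):
    # The value passed to Dataset.batch() in multi-worker training.
    return per_worker_batch * num_workers

def per_worker_batch_size(global_batch, num_workers):
    # Effective batch each worker processes per step.
    if global_batch % num_workers:
        raise ValueError("global batch should divide evenly across workers")
    return global_batch // num_workers

def steps_per_epoch(num_examples, global_batch):
    # Ceiling division: the final partial batch still counts as a step.
    return -(-num_examples // global_batch)

for workers in (1, 2, 4):
    gbs = global_batch_size(BATCH_SIZE, workers)
    print(f"workers={workers} global_batch={gbs} "
          f"per_worker={per_worker_batch_size(gbs, workers)} "
          f"steps_per_epoch={steps_per_epoch(MNIST_TRAIN_EXAMPLES, gbs)}")
```

Keeping the per-worker batch fixed means adding workers shortens each epoch instead of enlarging every worker's step.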


### Dataset sharding and batch size

In [0]:
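# Turn off automatic sharding (TF 2.0 API; newer releases use
# `auto_shard_policy` instead). Each worker then reads the full dataset
# unless the input is sharded manually.
options = tf.data.Options()
options.experimental_distribute.auto_shard = False
train_datasets_no_auto_shard = train_datasets.with_options(options)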
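
With auto-sharding off, the usual manual alternative is `tf.data.Dataset.shard(num_shards, index)`, which keeps only the elements whose position modulo `num_shards` equals `index`, so each worker sees a disjoint slice. The sketch below reproduces those semantics on a plain Python list so it runs without TensorFlow; `shard_like_tf` is a hypothetical helper. In a real pipeline you would call something like `dataset.shard(NUM_WORKERS, task_index)` early, before shuffling and batching, with `task_index` taken from `TF_CONFIG`.

```python
def shard_like_tf(elements, num_shards, index):
    """Mimic tf.data.Dataset.shard: keep elements whose position % num_shards == index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    return [x for i, x in enumerate(elements) if i % num_shards == index]

examples = list(range(10))
NUM_WORKERS = 2
shards = [shard_like_tf(examples, NUM_WORKERS, w) for w in range(NUM_WORKERS)]
print(shards)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

The shards are disjoint and together cover every example once, which is exactly what auto-sharding would otherwise arrange for you.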

### Performance

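##### 1. Use different collective communication implementations

The `communication` argument of `MultiWorkerMirroredStrategy` selects how the all-reduce is carried out: `RING` implements ring-based collectives using gRPC as the cross-host transport, `NCCL` uses NVIDIA's NCCL library, and `AUTO` defers the choice to the runtime. Which one is fastest depends on the number and type of GPUs and on the network in the cluster.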

    tf.distribute.experimental.CollectiveCommunication.NCCL
    tf.distribute.experimental.CollectiveCommunication.RING
    tf.distribute.experimental.CollectiveCommunication.AUTO
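
To make the `RING` option concrete, here is a pure-Python simulation of the classic ring all-reduce (sum) algorithm. It illustrates the general technique under simplifying assumptions (in-memory lists stand in for workers, steps are synchronous) and is not TensorFlow's implementation; `ring_all_reduce` and `split_bounds` are hypothetical names. Each worker only talks to its right-hand neighbour: n-1 reduce-scatter steps leave each worker owning one fully summed chunk, then n-1 all-gather steps circulate those chunks to everyone.

```python
def split_bounds(length, n):
    """Return n (start, stop) pairs covering range(length) as evenly as possible."""
    base, extra = divmod(length, n)
    bounds, start = [], 0
    for i in range(n):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def ring_all_reduce(vectors):
    """Simulate a ring all-reduce (sum) over len(vectors) workers."""
    n = len(vectors)
    data = [list(v) for v in vectors]  # each worker's local copy
    bounds = split_bounds(len(data[0]), n)

    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every outgoing chunk first so the step is simultaneous.
        sends = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            lo, _ = bounds[c]
            for k, value in enumerate(payload):
                data[dst][lo + k] += value

    # Phase 2: all-gather. Pass the reduced chunks around the ring,
    # overwriting stale copies, until every worker has all of them.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
    return data

workers = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
result = ring_all_reduce(workers)
print(result[0])  # [111, 222, 333, 444, 555]
```

Each worker sends 2(n-1) chunks of roughly L/n values, so per-worker traffic is about 2(n-1)/n times the vector length L, which barely grows with the number of workers. That bandwidth property is the usual argument for ring-based all-reduce.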

##### 2. Cast the variables to `tf.float32`
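
One way to see why the variable dtype matters: in synchronous training the gradient of every trainable variable is aggregated across replicas at each step, so the bytes exchanged scale with the width of the dtype. The sketch below counts the parameters of the CNN from `build_and_compile_cnn_model()` by hand (assuming Keras defaults: `'valid'` padding for `Conv2D` and 2x2 pooling for `MaxPooling2D`) and compares float32 with float64. It is a rough estimate that ignores protocol overhead and the all-reduce algorithm's own traffic factor; the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    # kernel*kernel*in_channels weights per filter, plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # Weight matrix plus one bias per unit.
    return inputs * units + units

# Layer sizes taken from build_and_compile_cnn_model().
conv_out = 28 - 3 + 1          # 'valid' 3x3 convolution: 26x26
pooled = conv_out // 2         # default 2x2 max pooling: 13x13
flat = pooled * pooled * 32    # Flatten: 5408 features

total_params = (conv2d_params(3, 1, 32)
                + dense_params(flat, 64)
                + dense_params(64, 10))

bytes_per_value = {"float32": 4, "float64": 8}
for name, size in bytes_per_value.items():
    print(f"{name}: {total_params:,} params -> "
          f"{total_params * size:,} bytes of gradients per replica per step")
```

Almost all of the parameters sit in the first `Dense` layer, so for this model halving the dtype width roughly halves the data each step has to move.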

### Fault Tolerance

In synchronous training, the whole cluster fails if one of the workers fails and no fault-tolerance mechanism is in place. Using Keras with `tf.distribute.Strategy` provides fault tolerance for cases where workers die or are otherwise unstable.

It does this by preserving the training state in a distributed file system, so that when the instance that previously failed or was preempted restarts, the training state is recovered and training continues.
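
The checkpoint-and-resume idea can be illustrated with a toy, framework-free loop: save the training state after every epoch, and on start-up resume from whatever state was last saved. This is only a sketch of the concept, not how Keras implements it internally; the JSON state file, the helper names (`load_state`, `save_state`, `train`) and the simulated failure are all hypothetical.

```python
import json
import os
import tempfile

def load_state(path):
    """Return the last saved training state, or a fresh one if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0}

def save_state(path, state):
    # Write to a temporary file, then atomically replace the old checkpoint,
    # so a crash mid-write never leaves a half-written state file.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def train(path, total_epochs, fail_at_epoch=None):
    """Toy loop that checkpoints after each epoch; returns the epochs it ran."""
    state = load_state(path)
    completed = []
    for epoch in range(state["epoch"], total_epochs):
        if epoch == fail_at_epoch:
            raise RuntimeError(f"worker preempted during epoch {epoch}")
        # ... one epoch of real training would happen here ...
        completed.append(epoch)
        save_state(path, {"epoch": epoch + 1})
    return completed

ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    train(ckpt, total_epochs=3, fail_at_epoch=1)  # finishes epoch 0, then "crashes"
except RuntimeError as err:
    print(err)
print(train(ckpt, total_epochs=3))  # the restart resumes at epoch 1: [1, 2]
```

The restarted run skips the epoch that was already saved, which is the same effect the `ModelCheckpoint` callback aims for when a worker is restarted.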

### Model Checkpoint Callback

In [14]:
%%time
# Save the model at the end of every epoch so that training state can be
# recovered from this directory after a restart.
callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath='/tmp/keras-ckpt/')]

with strategy.scope():
  options = tf.data.Options()
  # Automatic sharding is the default; it is set explicitly here for clarity.
  options.experimental_distribute.auto_shard = True
  dataset = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)
  dataset_with_shard = dataset.with_options(options)
  multiworker_model = build_and_compile_cnn_model()
  multiworker_model.fit(dataset_with_shard, epochs=3, callbacks=callbacks)

Epoch 1/3
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets

Epoch 2/3

INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets

Epoch 3/3

INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets

CPU times: user 32.3 s, sys: 6.38 s, total: 38.7 s
Wall time: 29 s
