# Multi GPU Training in TensorFlow

![TensorFlow Logo](https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/tensorflow.png "doc-image")

## Overview

This example extends the [Single GPU TensorFlow](./01-tensorflow-single-gpu.ipynb) example, which trains a ResNet50 architecture on the Birds dataset to identify different species of birds. Here we use multiple GPUs instead. We parallelize training with TensorFlow's Mirrored Strategy. The strategy splits the data in each batch (sometimes called the “global batch”) across the GPUs (making “worker batches”). Each GPU holds a copy of the model, called a “replica”. The replicas learn on different parts of each batch and combine their gradients at the end of each step. This keeps them synchronized, and the result at the end of training is one model that has learned from all the data.
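
To make the "split the batch, then combine gradients" idea concrete, here is a minimal pure-Python sketch. It does not use TensorFlow: the toy model (`y_hat = w * x` with mean squared error), the helper names `mse_gradient` and `split_batch`, and the numbers are all hypothetical. How TensorFlow actually reduces gradients internally may differ in detail. The sketch only shows why combining per-replica gradients on equal worker batches gives the same update as one device processing the whole global batch.

```python
# Pure-Python sketch of one synchronous data-parallel step.
# Hypothetical toy model: y_hat = w * x with mean squared error loss.

def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def split_batch(items, n_replicas):
    """Split a global batch into equally sized worker batches."""
    size = len(items) // n_replicas
    return [items[i * size:(i + 1) * size] for i in range(n_replicas)]


w = 0.5
global_x = [1.0, 2.0, 3.0, 4.0]
global_y = [2.0, 4.0, 6.0, 8.0]
n_replicas = 2

# Each replica computes a gradient on its own worker batch
worker_grads = [
    mse_gradient(w, xs, ys)
    for xs, ys in zip(split_batch(global_x, n_replicas),
                      split_batch(global_y, n_replicas))
]

# The replicas then combine (here: average) their gradients
combined_grad = sum(worker_grads) / n_replicas

# With equal worker batches this matches the single-device gradient
full_grad = mse_gradient(w, global_x, global_y)
print(worker_grads, combined_grad, full_grad)
```

Because every replica applies the same combined gradient, all copies of the model stay identical after each step.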

This dataset contains 40,000+ bird images and comes from [Kaggle](https://www.kaggle.com/gpiosenka/100-bird-species).

We recommend spinning up a bigger EC2 instance now, one with multiple GPUs, so that the training work can be distributed across them and they can work simultaneously, all training the same model.


In [None]:
import tensorflow as tf
import keras
import time
import matplotlib.pyplot as plt

The dataset originally had 285 classes. We have taken a subset of this data that has 61 classes. The data is stored in AWS S3. The first time you run this job, you'll need to download the training and test data using the code chunk below.

In [None]:
import s3fs

s3 = s3fs.S3FileSystem(anon=True)
_ = s3.get(
    rpath="s3://saturn-public-data/100-bird-species/100-bird-species/*/*/*.jpg",
    lpath="dataset/birds/",
)

Our dataset is already neatly separated into training, test, and validation samples. In the code below, we construct Keras dataset objects for the training and validation sets using `tf.keras.preprocessing.image_dataset_from_directory`. We use the Adam optimizer with the learning rate set to 0.02. We train our classifier with the ResNet50 architecture, which has 48 convolution layers along with 1 max-pool and 1 average-pool layer. It learns from each image, updates its weights from the gradients, and gradually improves its performance as it goes. The model is compiled, trained, and saved at the path `model/keras_multi/`. This is very similar to our single GPU approach: we just wrap the model definition and compilation stage in the Mirrored Strategy scope, so that TensorFlow knows this model is to be trained on multiple GPUs.

In [None]:
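def train_multigpu(n_epochs, classes, base_lr, batchsize, scale_batch=False, scale_lr=False):
    # MirroredStrategy keeps one model replica per visible GPU in sync
    strategy = tf.distribute.MirroredStrategy()
    n_replicas = strategy.num_replicas_in_sync
    print('Number of devices: %d' % n_replicas)

    # Optionally grow the global batch size and learning rate with the
    # number of replicas (linear scaling is an assumption here; tune as needed)
    if scale_batch:
        batchsize = batchsize * n_replicas
    if scale_lr:
        base_lr = base_lr * n_replicas
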
    with strategy.scope():
        model = tf.keras.applications.ResNet50(
            include_top=True,
            weights=None,
            classes=classes)

        # `lr` is deprecated; `learning_rate` is the supported argument name
        optimizer = tf.keras.optimizers.Adam(learning_rate=base_lr)
        model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

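    # Data: the batch size given here is the global batch, split across replicas
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        'dataset/birds/train',
        image_size=(224, 224),
        batch_size=batchsize
    ).cache().shuffle(1000).prefetch(2)  # cache first, prefetch last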
    
    # Show nine sample bird images from the first training batch
    for birds, labels in train_ds.take(1):
        plt.figure(figsize=(18, 18))
        for i in range(9):
            plt.subplot(3, 3, i + 1)
            plt.imshow(birds[i].numpy().astype("uint8"))
            plt.axis("off")
    plt.show()
    
    valid_ds = tf.keras.preprocessing.image_dataset_from_directory(
        'dataset/birds/valid',
        image_size=(224,224),
        batch_size=batchsize
    ).prefetch(2)
    
    start = time.time()

    model.fit(
        train_ds, 
        epochs=n_epochs, 
        validation_data=valid_ds,
    )

    elapsed = time.time() - start
    print("model training time (seconds)", elapsed)
    
    tf.keras.models.save_model(model, 'model/keras_multi/')

    # Return the trained model so the caller's assignment holds something useful
    return model

In the code below, we set up the necessary parameters. We are only running a few epochs to save time, but once you've got this working you'll have all the information you need to build and run bigger TensorFlow models on Saturn Cloud. To take full advantage of the processing power of multiple GPUs, increase the batch size so that the GPUs are kept busy (but without running out of memory).
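
As a rough illustration of the batch-size arithmetic, the sketch below computes the global batch, the per-GPU worker batch, and the learning rate for different GPU counts. The helper `plan_batches` is hypothetical, and the linear scaling rule is an assumption rather than something this example prescribes. The flag names simply mirror `scale_batch` and `scale_lr` from the `train_multigpu` signature.

```python
def plan_batches(base_batch, base_lr, n_replicas, scale_batch=True, scale_lr=False):
    """Return (global_batch, per_replica_batch, learning_rate) for a run.

    Hypothetical helper: assumes linear scaling with the replica count.
    """
    global_batch = base_batch * n_replicas if scale_batch else base_batch
    learning_rate = base_lr * n_replicas if scale_lr else base_lr
    # The global batch is divided among the replicas
    per_replica_batch = global_batch // n_replicas
    return global_batch, per_replica_batch, learning_rate


# Compare settings for 1, 2, and 4 GPUs with a base batch of 64
for n in (1, 2, 4):
    print(n, plan_batches(64, 0.02, n))
```

Without batch scaling, a global batch of 64 on 4 GPUs leaves each GPU only 16 images per step, which may not keep it busy. Scaling the global batch keeps the per-GPU workload constant as GPUs are added.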

In [None]:
model_params = {
    'n_epochs': 3,
    'base_lr': 0.02,
    'batchsize': 64,
    'classes': 61,
    'scale_batch': True,
}


The code below runs the model training process and saves your trained model to disk on the Jupyter instance. A folder called `model` will be created and populated for you. While the training function runs, you will also see some beautiful birds of various species displayed below!

In [None]:
tester_plain = train_multigpu( **model_params)