# TF-Slim Walkthrough for multi-GPU setup

This notebook adapts the Flowers training example in slim_walkthrough.ipynb so that training can be split across multiple GPUs.

The tower loss and training loop code is taken from the CIFAR demo at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py

## Defining properties of our system

Here we set how many GPUs we have, the batch size on each GPU, and the number of training steps.

These settings are for two GeForce GTX 1080 Graphics Cards. Depending on the number and spec of your graphics cards you will have to adjust NUM_GPUS and BATCH_SIZE.

In [14]:
BATCH_SIZE = 32  # How many images pass through *a single GPU* per step
                 # (if your GPUs have different specs you'll have to adapt the script)
MAX_STEPS = 1000  # Number of training steps
NUM_GPUS = 2  # Two GPUs, matching the machine described above
url = "http://download.tensorflow.org/data/flowers.tar.gz"
flowers_data_dir = '/tmp/flowers'
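
Because BATCH_SIZE is a per-GPU setting, the number of images consumed by one training step grows with the number of GPUs. The following is a minimal sketch of that arithmetic; the helper names `effective_batch_size` and `images_seen` are hypothetical and not part of the notebook.

```python
def effective_batch_size(batch_size_per_gpu, num_gpus):
    """Images consumed by one training step across all GPUs."""
    return batch_size_per_gpu * num_gpus


def images_seen(batch_size_per_gpu, num_gpus, max_steps):
    """Total images processed after max_steps steps (counting repeats)."""
    return effective_batch_size(batch_size_per_gpu, num_gpus) * max_steps


# Example values: 32 images per GPU, two GPUs, 1000 steps.
print(effective_batch_size(32, 2))
print(images_seen(32, 2, 1000))
```

If you lower BATCH_SIZE to fit a smaller card, the effective batch size shrinks by the same factor on every GPU.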

## Preparing the data

If you've already worked through slim_walkthrough.ipynb you will have this data on your system.

### Download the Flowers Dataset
<a id='DownloadFlowers'></a>

We've made available a tarball of the Flowers dataset which has already been converted to TFRecord format.

In [15]:
import tensorflow as tf
from datasets import dataset_utils

if not tf.gfile.Exists(flowers_data_dir):
    tf.gfile.MakeDirs(flowers_data_dir)

dataset_utils.download_and_uncompress_tarball(url, flowers_data_dir)

>> Downloading flowers.tar.gz 100.0%
Successfully downloaded flowers.tar.gz 228649660 bytes.


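### Download the Inception V1 checkpoint

We also need the pretrained Inception V1 checkpoint, which we will fine-tune on the Flowers dataset.
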
In [16]:
from datasets import dataset_utils

url = "http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz"
checkpoints_dir = '/tmp/checkpoints'

if not tf.gfile.Exists(checkpoints_dir):
    tf.gfile.MakeDirs(checkpoints_dir)

dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)

>> Downloading inception_v1_2016_08_28.tar.gz 100.0%
Successfully downloaded inception_v1_2016_08_28.tar.gz 24642554 bytes.


## Define the methods we'll need during training

The batch method differs slightly from the single-GPU case: instead of a dataset, we pass in a data_provider that streams images from the dataset already located on disk. All towers share this one provider, so we never have two data_provider objects trying to read our TFRecord files concurrently.

In [17]:
from preprocessing import inception_preprocessing
import tensorflow as tf

slim = tf.contrib.slim


def load_batch(data_provider, batch_size=32, height=299, width=299, is_training=False):
    """Loads a single batch of data.
    
    Args:
      data_provider: A data provider (e.g. a slim DatasetDataProvider) that yields 'image' and 'label' tensors.
      batch_size: The number of images in the batch.
      height: The height of each image after preprocessing.
      width: The width of each image after preprocessing.
      is_training: Whether we're currently training (True) or evaluating (False).
    
    Returns:
      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
      labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes - 1.
    """
    
    image_raw, label = data_provider.get(['image', 'label'])
    
    # Preprocess image for usage by Inception.
    image = inception_preprocessing.preprocess_image(image_raw, height, width, is_training=is_training, fast_mode=True)
    
    # Preprocess the image for display purposes.
    image_raw = tf.expand_dims(image_raw, 0)
    image_raw = tf.image.resize_images(image_raw, [height, width])
    image_raw = tf.squeeze(image_raw)

    # Batch it up.
    images, images_raw, labels = tf.train.batch(
          [image, image_raw, label],
          batch_size=batch_size,
          num_threads=1,
          capacity=2 * batch_size)
    
    return images, images_raw, labels

The loss also differs slightly: we call the loss function once for each GPU, so each GPU's losses reside in their own scope and have to be collected by scope.

In [18]:
from tensorflow.python.ops import math_ops

def get_total_loss(scope, name="total_loss"):
    """Sums the losses and regularization losses collected under `scope`."""
    losses = slim.losses.get_losses(scope=scope)
    losses += slim.losses.get_regularization_losses(scope=scope)
    return math_ops.add_n(losses, name=name)
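
To make the scope filtering concrete, here is a plain-Python sketch of the idea, with no TensorFlow required. It assumes that collection lookups keep the items whose name matches the scope at the start of the string, which is how `tf.get_collection` treats its `scope` argument. The helper `losses_in_scope` and the loss names are hypothetical.

```python
import re


def losses_in_scope(named_losses, scope):
    """Return the loss values whose names match `scope` at the start."""
    pattern = re.compile(scope)
    return [value for name, value in named_losses if pattern.match(name)]


# Hypothetical loss names and values for a two-tower graph.
named_losses = [
    ("tower_0/softmax_cross_entropy_loss/value", 1.5),
    ("tower_1/softmax_cross_entropy_loss/value", 2.0),
]

tower_0_total = sum(losses_in_scope(named_losses, "tower_0/"))
tower_1_total = sum(losses_in_scope(named_losses, "tower_1/"))
print(tower_0_total, tower_1_total)
```

The trailing slash matters in this sketch: a scope of "tower_1/" cannot accidentally match a name beginning with "tower_10/".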

Each GPU runs its own copy of the model inside an abstraction called a "tower", which has its own scope. The function below computes the loss for a given tower.

In [19]:
def tower_loss(scope, data_provider):
    """Calculate the total loss on a single tower running the model.

    Args:
      scope: unique prefix string identifying the tower, e.g. 'tower_0'
      data_provider: data provider shared by all towers, passed on to load_batch

    Returns:
      Tensor of shape [] containing the total loss for a batch of data
    """
   
    images, _, labels = load_batch(data_provider, batch_size=BATCH_SIZE, height=image_size, width=image_size, is_training=True)
    
    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)
    
    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes, scope=scope)
    
    slim.losses.softmax_cross_entropy(logits, one_hot_labels, scope=scope)
    
    total_loss = get_total_loss(scope)
    
    return total_loss

Once each tower has calculated its loss, we calculate the gradients for that tower too, then average the gradients across towers before updating the shared model variables.

In [20]:
def average_gradients(tower_grads):
    """Calculate the average gradient for each shared variable across all towers.

    Note that this function provides a synchronization point across all towers.

    Args:
      tower_grads: List of lists of (gradient, variable) tuples. The outer list
        is over towers. The inner list is over the (gradient, variable) pairs
        computed on that tower.
    Returns:
      List of pairs of (gradient, variable) where the gradient has been averaged
      across all towers.
    """
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Note that each grad_and_vars looks like the following:
        #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
        grads = []
        for g, _ in grad_and_vars:
            # Add 0 dimension to the gradients to represent the tower.
            expanded_g = tf.expand_dims(g, 0)

            # Append on a 'tower' dimension which we will average over below.
            grads.append(expanded_g)

        # Average over the 'tower' dimension.
        # This is the pre-1.0 argument order; TensorFlow 1.0+ expects tf.concat(grads, 0).
        grad = tf.concat(0, grads)
        grad = tf.reduce_mean(grad, 0)

        # Keep in mind that the Variables are redundant because they are shared
        # across towers. So .. we will just return the first tower's pointer to
        # the Variable.
        v = grad_and_vars[0][1]
        grad_and_var = (grad, v)
        average_grads.append(grad_and_var)
    return average_grads
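
The regrouping done by `zip(*tower_grads)` is easier to see with plain numbers. The sketch below mirrors the function above using scalar gradients and variable names in place of tensors. The name `average_gradients_plain` and the sample values are made up for illustration.

```python
def average_gradients_plain(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads: one list per tower, each holding (gradient, variable_name)
    pairs in the same variable order.
    """
    averaged = []
    # zip(*tower_grads) regroups the pairs so that each group holds one
    # variable's gradient from every tower.
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        mean_grad = sum(grads) / float(len(grads))
        # The variable is shared, so the first tower's entry stands for all.
        averaged.append((mean_grad, grads_and_vars[0][1]))
    return averaged


# Two hypothetical towers, two shared variables, scalar gradients.
tower_grads = [
    [(1.0, "weights"), (4.0, "biases")],  # tower 0
    [(3.0, "weights"), (8.0, "biases")],  # tower 1
]
result = average_gradients_plain(tower_grads)
print(result)
```

Each output pair carries the mean of that variable's gradients over both towers, which is what the optimizer then applies once to the shared variable.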

### Now fine-tune the model

We will fine-tune the Inception V1 model on the Flowers dataset.

In [21]:
import os

from datasets import flowers
from nets import inception
from preprocessing import inception_preprocessing
import time
import numpy as np
from datetime import datetime

slim = tf.contrib.slim
image_size = inception.inception_v1.default_image_size


def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training."""
    checkpoint_exclude_scopes=["InceptionV1/Logits", "InceptionV1/AuxLogits"]
    
    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]

    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)

    return slim.assign_from_checkpoint_fn(
      os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
      variables_to_restore)
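
The exclusion logic in `get_init_fn` is a prefix filter on variable names: anything under the classifier scopes is left out of the restore, presumably because the pretrained classifier layer was trained for a different label set than the five Flowers classes. A plain-Python sketch of the same filter follows. The helper `filter_restorable` is hypothetical, and the variable names are illustrative rather than read from the real checkpoint.

```python
def filter_restorable(variable_names, exclude_scopes):
    """Keep the names that do not start with any excluded scope prefix."""
    return [name for name in variable_names
            if not any(name.startswith(scope) for scope in exclude_scopes)]


# Illustrative variable names (assumed, not read from the checkpoint).
variable_names = [
    "InceptionV1/Conv2d_1a_7x7/weights",
    "InceptionV1/Logits/Conv2d_0c_1x1/weights",
    "InceptionV1/Logits/Conv2d_0c_1x1/biases",
]
exclude = ["InceptionV1/Logits", "InceptionV1/AuxLogits"]

restorable = filter_restorable(variable_names, exclude)
print(restorable)
```

The variables that are filtered out keep the values given by the initializer, so they are trained from scratch on the new dataset.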


In [22]:
train_dir = '/tmp/inception_finetuned/'

In [23]:
with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)
    
    global_step = tf.get_variable("global_step", [], initializer=tf.constant_initializer(0),
                                 trainable=False)
    
    # Specify the optimizer and create the train op:    
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=0.001)
    
    # Here you can substitute the flowers dataset for your own dataset.
    dataset = flowers.get_split('train', flowers_data_dir)    
    print ("number of classes: ", dataset.num_classes)
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8, shuffle=True)
    
    
    tower_grads = []
    losses = []
    for i in range(NUM_GPUS):
        with tf.device("/gpu:" + str(i)):
            with tf.name_scope("tower_" + str(i)) as scope:
                loss = tower_loss(scope, data_provider)
                losses.append(loss)
                
                tf.get_variable_scope().reuse_variables()
                
                grads = optimizer.compute_gradients(loss)
                
                tower_grads.append(grads)
                
    grads = average_gradients(tower_grads)
    
    apply_gradient_op = optimizer.apply_gradients(grads, global_step=global_step)
    
    train_op = apply_gradient_op
    
    saver = tf.train.Saver(tf.all_variables())
    
    # Build an initialization operation to run below.
    init = tf.initialize_all_variables()

    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=False))
    sess.run(init)
    
    init_fn = get_init_fn()
    init_fn(sess)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.train.SummaryWriter(train_dir, sess.graph)

    for step in range(MAX_STEPS):
        start_time = time.time()
        
        # Run one training step. Note that `loss` here is the loss of the last tower
        # built above, not an average across GPUs.
        _, loss_value = sess.run([train_op, loss])
        duration = time.time() - start_time

        assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

        if step % 10 == 0:
            num_examples_per_step = BATCH_SIZE * NUM_GPUS
            examples_per_sec = num_examples_per_step / duration
            sec_per_batch = float(duration) / NUM_GPUS

            format_str = ('%s: step %d, loss = %.2f  (%.1f examples/sec; %.3f '
                          'sec/batch)')
            print (format_str % (datetime.now(), step, loss_value, 
                                 examples_per_sec, sec_per_batch))

        # Save the model checkpoint periodically.
        if step % 1000 == 0 or (step + 1) == MAX_STEPS:
            checkpoint_path = os.path.join(train_dir, 'model.ckpt')
            saver.save(sess, checkpoint_path, global_step=step)
        
print('Finished training. Last batch loss %f' % loss_value)

('number of classes: ', 5)
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.histogram uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
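
The throughput figures printed every ten steps follow directly from the step duration. Here is a small sketch of the same arithmetic as a standalone function. The name `throughput` and the half-second timing are hypothetical.

```python
def throughput(duration_sec, batch_size_per_gpu, num_gpus):
    """Return (examples per second, seconds per per-GPU batch) for one step."""
    examples_per_step = batch_size_per_gpu * num_gpus
    examples_per_sec = examples_per_step / float(duration_sec)
    sec_per_batch = float(duration_sec) / num_gpus
    return examples_per_sec, sec_per_batch


# Hypothetical timing: one step over two GPUs took half a second.
examples_per_sec, sec_per_batch = throughput(0.5, 32, 2)
print(examples_per_sec, sec_per_batch)
```

Note that sec_per_batch divides the wall-clock step time by the number of GPUs, so it is an amortized per-batch figure rather than the time any single GPU spent.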

### Apply the fine-tuned model to some images

In [24]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from datasets import flowers
from nets import inception

slim = tf.contrib.slim

image_size = inception.inception_v1.default_image_size
batch_size = 3

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)
    
    dataset = flowers.get_split('train', flowers_data_dir)
    # load_batch expects a data provider rather than the dataset itself.
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8)
    images, images_raw, labels = load_batch(data_provider, height=image_size, width=image_size)
    
    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)

    probabilities = tf.nn.softmax(logits)
    
    checkpoint_path = tf.train.latest_checkpoint(train_dir)
    init_fn = slim.assign_from_checkpoint_fn(
      checkpoint_path,
      slim.get_variables_to_restore())
    
    with tf.Session() as sess:
        with slim.queues.QueueRunners(sess):
            sess.run(tf.initialize_local_variables())
            init_fn(sess)
            np_probabilities, np_images_raw, np_labels = sess.run([probabilities, images_raw, labels])
    
            # Show the first batch_size images with their true and predicted labels.
            for i in range(batch_size):
                image = np_images_raw[i, :, :, :]
                true_label = np_labels[i]
                predicted_label = np.argmax(np_probabilities[i, :])
                predicted_name = dataset.labels_to_names[predicted_label]
                true_name = dataset.labels_to_names[true_label]
                
                plt.figure()
                plt.imshow(image.astype(np.uint8))
                plt.title('Ground Truth: [%s], Prediction [%s]' % (true_name, predicted_name))
                plt.axis('off')
                plt.show()
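
The prediction step reduces to taking the index of the largest probability in each row and looking it up in the label map. A plain-Python sketch of that mapping follows. The function `predict_names`, the class names and the probabilities are all placeholders, not values from the Flowers dataset.

```python
def predict_names(probabilities, labels_to_names):
    """Map each row of class probabilities to the name of its top class."""
    names = []
    for row in probabilities:
        # Index of the largest probability in this row (argmax).
        best = max(range(len(row)), key=lambda k: row[k])
        names.append(labels_to_names[best])
    return names


# Placeholder label map and probabilities, for illustration only.
labels_to_names = {0: "class_a", 1: "class_b", 2: "class_c"}
probabilities = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]

predicted = predict_names(probabilities, labels_to_names)
print(predicted)
```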