<a href="https://www.kaggle.com/code/peremartramanonellas/gan-tutorial-3-how-to-use-tpus-to-train-a-gan?scriptVersionId=136918892" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
from IPython.core.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
    horizontal-align: middle;
}
h1 {
    text-align: center;
    background-color: #6bacf5;
    padding: 10px;
    margin: 0;
    font-family: monospace;
    color:DimGray;
    border-radius: 2px;
}

h2 {
    text-align: center;
    background-color: #83c2ff;
    padding: 10px;
    margin: 0;
    font-family: monospace;
    color:DimGray;
    border-radius: 2px
}

h3 {
    text-align: center;
    background-color: pink;
    padding: 10px;
    margin: 0;
    font-family: monospace;
    color:DimGray;
    border-radius: 2px
}

h4 {
    text-align: center;
    background-color: pink;
    padding: 10px;
    margin: 0;
    font-family: monospace;
    color:DimGray;
    border-radius: 2px
}

body, p {
    font-family: monospace;
    font-size: 18px;
    color: charcoal;
}
div {
    font-size: 14px;
    margin: 0;

}


</style>
""")


# Using TPUs and distributed training to accelerate our Notebooks. 

Looking to speed up your GAN training? In this notebook, I'll show you how to use TPUs in Kaggle or Google Colab to drastically reduce training time. 

Follow this step-by-step guide and take your GANs to the next level!

The code in this notebook works in both Kaggle and Colab. If you can't get access to TPUs in Kaggle, just try the same notebook in [Google Colab](https://colab.research.google.com/drive/1p6sQqiu4kWeDpxu91C0MQBX9P6qSwmPG?usp=sharing)

### Feel free to fork or edit the notebook for your own convenience. Please consider UPVOTING IT. It helps others to discover the notebook, and it encourages me to continue publishing.

This is the third notebook in the GANs tutorial. Here are the links to the other notebooks: 
https://www.kaggle.com/code/peremartramanonellas/gan-tutorial-first-dcgan-using-tensorflow
https://www.kaggle.com/code/peremartramanonellas/gan-tutorial-2-generating-color-images

I'm going to use the CelebA Dataset. It's a Dataset composed of more than 200,000 images of celebrity faces. 

### **If you want more explanations than the ones in this Notebook, you can check the full tutorial on Medium:** 
https://medium.com/@peremartra/list/gans-from-zero-to-hero-d8e6cb773f93





## Importing Libraries


In [None]:
#Install MTCNN, a face detection library, so we can detect faces in the notebook. 
!pip install mtcnn

In [None]:
#import libraries
import tensorflow as tf
from tensorflow.keras.utils import plot_model
from tensorflow.keras import layers
import tensorflow.keras as keras
from IPython import display

import os
import zipfile
import glob
import urllib.request
from enum import Enum
from tqdm import tqdm
from functools import partial

import matplotlib.pyplot as plt
import numpy as np

from PIL import Image
from IPython.display import display
from IPython.display import Image as IpyImage
import imageio
import cv2

In [None]:
#Face detection library
import mtcnn
print(mtcnn.__version__)
     

In [None]:
#Support function to plot the images. 
def plot_results(images, n_cols=None, unnorm=True, save_path=""):

    n_cols = n_cols or len(images)
    n_rows = (len(images) - 1) // n_cols + 1

    if images.shape[-1] == 1:
        images = np.squeeze(images, axis=-1)

    plt.figure(figsize=(12, 12))
    
    for index, image in enumerate(images):
        plt.subplot(n_rows, n_cols, index + 1)
        imageorg = image

        #Images normalized to [-1, 1] are mapped back to [0, 1] when unnorm is True. 
        if unnorm:
            imageorg = (image + 1) * 0.5
          
        plt.imshow(imageorg)
        plt.axis("off")

    #Save the complete figure once, after all the subplots are drawn. 
    if save_path:
        plt.savefig(save_path)
    plt.show()

## Activating the TPU
This code sets up the execution strategy. 

The first step is to connect to a TPU. Once connected, we create the strategy with *tf.distribute.TPUStrategy*, which indicates that the code is going to be executed in a distributed way. 


In [None]:
try: 
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
except ValueError as err: 
  # RuntimeError (instead of BaseException) can be caught by normal error handling
  raise RuntimeError("CAN'T CONNECT TO A TPU") from err

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
strategy = tf.distribute.TPUStrategy(tpu)


I'm going to use the CelebA Dataset, downloading it from a Google storage bucket, but there are more sources for the Dataset (just in case mine disappears): 
Google directory: https://storage.googleapis.com/learning-datasets/Resources/archive.zip
Kaggle: https://www.kaggle.com/datasets/jessicali9530/celeba-dataset
Mmlab: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
TensorFlow: https://www.tensorflow.org/datasets/catalog/celeb_a?hl=es-419

The first thing to do is create the directory. 
After that, download the images. 
And finally, unzip the file. 

The images will be in the directory: 
/tmp/celeb/img_align_celeba/img_align_celeba

In [None]:
# make a data directory
download_dir = '/tmp/celeb/'
os.makedirs(download_dir, exist_ok=True)

# download the dataset archive (it is saved in the current working directory)
data_url = "https://storage.googleapis.com/learning-datasets/Resources/archive.zip"
data_file_name = "archive.zip"
urllib.request.urlretrieve(data_url, data_file_name)

# extract the zipped file into the data directory
with zipfile.ZipFile(data_file_name, 'r') as zip_ref:
  zip_ref.extractall(download_dir)

Time to load and transform the images. I have two functions: 

* load_faces: Crops the central area of the image, then adjusts the size. 
* load_faces_MTCNN: Uses the MTCNN library to get the coordinates of the face in the image.

In [None]:
#Function to load the faces. 
#Crop the center of the images because the face is almost always in the center. 
def load_faces(image_paths, resize, max_images):
  crop_size = 128
  if (max_images == 0): 
    max_images =  len(image_paths)
    print(max_images)

  images = np.zeros((max_images, resize, resize, 3), np.uint8)

  for i, path in tqdm(enumerate(image_paths)):
    with Image.open(path) as img:
      left = (img.size[0] - crop_size) // 2
      top = (img.size[1] - crop_size) // 2
      right = left + crop_size
      bottom = top + crop_size
      img = img.crop((left, top, right, bottom))
      img = img.resize((resize, resize), Image.LANCZOS)
      images[i] = np.asarray(img, np.uint8)

      if (i >= max_images-1):
        break

  return images
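
To make the crop in load_faces more concrete, here is a minimal sketch of the box arithmetic on its own, without PIL. The helper name `center_crop_box` is mine (it is not part of the notebook), and the 178x218 size is the size of the aligned CelebA images.

```python
def center_crop_box(width, height, crop_size=128):
    """Return the (left, top, right, bottom) box of a centered square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size

# Aligned CelebA images are 178 pixels wide and 218 pixels high.
box = center_crop_box(178, 218)
print(box)  # (25, 45, 153, 173)
```

The box is what load_faces passes to img.crop before resizing the result to resize x resize pixels.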

In [None]:
#Function to load the faces. Uses the MTCNN library to detect where the face is. 
def load_faces_MTCNN(image_paths, resize, max_images):
  MTCNN_model = mtcnn.MTCNN()
  if (max_images == 0): 
    max_images =  len(image_paths)
    print(max_images)

  images = np.zeros((max_images, resize, resize, 3), np.uint8)
  for i, path in enumerate(image_paths):
    #Stop before reading more images than requested. 
    if (i >= max_images):
      break
    with Image.open(path) as img:

      img = img.convert('RGB')
      #MTCNN expects the image as an array of RGB pixels. 
      img_pixels = np.asarray(img)
      
      face = MTCNN_model.detect_faces(img_pixels)
      #Images without a detected face are skipped, so their slot stays black. 
      if len(face) == 0:
        continue 
      x1, y1, width, height = face[0]['box']
      x1, y1 = abs(x1), abs(y1)
      x2, y2 = x1 + width, y1 + height

      img = img.crop((x1, y1, x2, y2))
      img = img.resize((resize, resize), Image.LANCZOS)
      images[i] = np.asarray(img, np.uint8)
  return images

This is the function responsible for loading the Dataset. I'm calling load_faces. However, it is possible to replace the load_faces call with a load_faces_MTCNN call, and use the MTCNN library to detect where the faces are. 

Each of the images is passed through the preprocess function, which normalizes the pixels to values between -1 and 1.
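
As a quick check of that normalization, here is a small sketch using plain numbers instead of tensors. The function names `normalize` and `unnormalize` are mine. The first one mirrors the formula in preprocess, and the second one mirrors what plot_results does when unnorm is True.

```python
def normalize(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def unnormalize(value):
    """Map a value from [-1, 1] back to [0, 1] so it can be plotted."""
    return (value + 1) * 0.5

for pixel in (0, 127.5, 255):
    x = normalize(pixel)
    print(pixel, "->", x, "->", unnormalize(x))
```

The [-1, 1] range matches the tanh activation used in the last layer of the Generator, so real and generated images share the same scale.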

In [None]:
#Load the pictures in the dataset. You can indicate the max_images. I recommend 
#that for testing you use 1000 images, and when you want to see the final result 
#use all the images, setting max_images to 0. 
def load_celeba(batch_size, resize=80, max_images=0):
  """Creates batches of preprocessed images from the JPG files
  Args:
    batch_size - batch size
    resize - size in pixels to resize the images
    max_images - maximum number of images to load (0 loads all of them)
  
  Returns:
    prepared dataset
  """

  # get the sorted list of image paths
  image_paths = sorted(glob.glob("/tmp/celeb/img_align_celeba/img_align_celeba/*.jpg"))
  
  print("Creating Images")

  # crop and resize the raw images then put them into an array
  #choose which function you want to use. 
  images = load_faces(image_paths, resize, max_images)
  #images = load_faces_MTCNN(image_paths, resize, max_images)
  
  #Plot the 5 first images. 
  plot_results(images[0:5], unnorm=False)

  # split the images array into two
  split_n = images.shape[0] // 2
  images1, images2 = images[:split_n], images[split_n:2 * split_n]
  del images

  # preprocessing function to convert the pixel values into the range [-1,1]
  #Normalizing the pixels of the images is a common GAN hack 
  def preprocess(img):
      x = tf.cast(img, tf.float32) / 127.5 - 1.0
      return x
  
  # use the preprocessing function on the arrays and create batches
  dataset = tf.data.Dataset.from_tensor_slices((images1, images2))
  dataset = dataset.map(
      lambda x1, x2: (preprocess(x1), preprocess(x2))
  ).shuffle(4096).batch(batch_size, drop_remainder=True).prefetch(tf.data.experimental.AUTOTUNE)
  
  return dataset

It's possible to adjust the maximum number of images to load, although I recommend a minimum of 1000. To use all the images in the dataset, we only have to set max_images to 0.

In [None]:
# use the function above to load and prepare the dataset
#Note how the batch_size is multiplied by strategy.num_replicas_in_sync
print(strategy.num_replicas_in_sync)
batch_size = 8
batch_size = batch_size * strategy.num_replicas_in_sync
dataset = load_celeba(batch_size, max_images=10000)
out_dir = "celeba_out"
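
To see what these numbers mean in practice, here is a small sketch of the batch arithmetic. It assumes 8 replicas, which is my assumption (check the value printed by strategy.num_replicas_in_sync in your session), and the helper names are hypothetical.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size seen by the whole strategy in one step."""
    return per_replica_batch * num_replicas

def steps_per_epoch(max_images, global_batch):
    """load_celeba splits the images in two halves and drops the last partial batch."""
    pairs = max_images // 2
    return pairs // global_batch

replicas = 8  # assumption: a TPU with 8 cores
global_batch = global_batch_size(8, replicas)

print(global_batch)                          # 64 pairs of images per step
print(2 * global_batch)                      # 128 real images per step
print(steps_per_epoch(10000, global_batch))  # 78 steps per epoch
```

Each element of the dataset is a pair of images, one from each half, which is why a step sees twice as many real images as the batch size.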

As we are using a distributed strategy to execute the notebook, the Dataset must be prepared for it. 

In [None]:
# create distributed dataset
dataset = strategy.experimental_distribute_dataset(dataset)

# set the loss function
# from_logits=False because the discriminator ends with a sigmoid activation
loss_func = tf.keras.losses.BinaryCrossentropy(
    from_logits=False, 
    reduction=tf.keras.losses.Reduction.NONE
)

## Generator & Discriminator

I have one function to create Generators, and another to create Discriminators. They are easy to use, and they make it easy to adapt the Generator and Discriminator to images of different shapes. 

As you know, a GAN is composed of two models, the Generator and the Discriminator. The Generator creates images. 

The Discriminator tries to figure out whether an image belongs to the original Dataset or is a generated one. 

In [None]:
def adapt_generator(initial_0, nodes, upsamplings, multnodes = 1.0, endnodes = 3, input_noise=100):
  #initial_0: size of the initial mini image (initial_0 x initial_0). 
  #nodes: nodes (filters) in the first transposed convolution. 
  #upsamplings: number of upsampling steps. 
  #multnodes: a multiplier to modify the nodes in each upsampling step. 
  #endnodes: nodes (channels) of the last layer. 
  #input_noise: size of the noise (currently unused: the input shape is fixed at (1, 1, 128)). 

  model = keras.models.Sequential()

  #Input layer: the noise vector. 
  model.add(keras.Input(shape=(1, 1, 128)))

  #Initial mini image and upsampling loops. 
  nodeslayers = nodes
  model.add(keras.layers.Conv2DTranspose(nodeslayers , kernel_size=initial_0, strides=1, padding="valid", 
                                 use_bias=False))

  
  for i in range(upsamplings-1):
    nodeslayers = int(nodeslayers * multnodes)
    model.add(keras.layers.Conv2DTranspose(nodeslayers , kernel_size=4, strides=2, padding="SAME", 
                                 activation=keras.layers.LeakyReLU(alpha=0.2)))
    model.add(keras.layers.BatchNormalization())

  #last upsample and last layer. 
  model.add(keras.layers.Conv2DTranspose(endnodes, kernel_size=4, strides=2, padding="SAME", 
                                 activation='tanh'))
  
  return model

In [None]:
def adapt_discriminator(nodes, downsamples, multnodes = 1.0, in_shape=[32, 32, 3]):
  #nodes: nodes (filters) in the first convolutional layer.
  #downsamples: number of downsampling steps. 
  #multnodes: a multiplier to modify the nodes in each downsampling step.
  #in_shape: Shape of the input image. 

  model = keras.models.Sequential()
  
  #input layer & first downsample 
  model.add(keras.layers.Conv2D(nodes, kernel_size=5, strides=2, padding="SAME",
                        activation=keras.layers.LeakyReLU(0.2),
                        input_shape=in_shape))
  model.add(keras.layers.Dropout(0.4))
  
  #creating downsamples
  nodeslayers = nodes
  for i in range(downsamples - 1):
    nodeslayers = int(nodeslayers * multnodes)
    model.add(keras.layers.Conv2D(nodeslayers, kernel_size=3, strides=2, padding="SAME",
                        activation=keras.layers.LeakyReLU(0.2)))
    model.add(keras.layers.Dropout(0.4))
  
  #ending model
  model.add(keras.layers.Flatten())
  model.add(keras.layers.Dense(1, activation="sigmoid"))
  
  return model

In [None]:
# Settings
resize = 80
shape = (resize, resize, 3)

# Build the GAN
with strategy.scope():
    # create the generator model
    model_G = adapt_generator(5, nodes=128, upsamplings=4, multnodes=1, endnodes=3, input_noise=100)

    # create the discriminator model
    model_D = adapt_discriminator(128, 5, multnodes=1, in_shape=shape)
    
    # print summaries
    model_G.summary()
    model_D.summary()

    # set optimizers
    param_G = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
    param_D = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
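
If you want to adapt the models to another image size, it helps to check the sizes by hand before building anything. This is a minimal sketch of that arithmetic, with helper names that are mine. It assumes what adapt_generator and adapt_discriminator do above: the first transposed convolution turns the 1x1 noise into an initial_0 x initial_0 image, every stride-2 transposed convolution with "SAME" padding doubles the size, and every stride-2 convolution with "SAME" padding halves it, rounding up.

```python
import math

def generator_output_size(initial_0, upsamplings):
    """Side of the generated image: the initial mini image doubled `upsamplings` times."""
    return initial_0 * 2 ** upsamplings

def discriminator_feature_size(image_size, downsamples):
    """Side of the feature map that reaches the Flatten layer."""
    size = image_size
    for _ in range(downsamples):
        size = math.ceil(size / 2)
    return size

# Values used in this notebook: adapt_generator(5, upsamplings=4) and
# adapt_discriminator(128, 5) with 80x80 images.
print(generator_output_size(5, 4))        # 80
print(discriminator_feature_size(80, 5))  # 3
```

The first value must match resize, otherwise the Discriminator receives generated images with a different shape than the real ones. You can compare both values with the shapes printed by model_G.summary() and model_D.summary().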

In [None]:
plot_model(model_G, show_shapes=True, 
           show_layer_names=True)

In [None]:
plot_model(model_D, show_shapes=True, 
           show_layer_names=True)

The distributed function is used to decorate the train function of the GAN. 

The train function is executed on multiple replicas, so we need to combine the values each one returns and get a result that looks as if it came from a single execution.

We can SUM the returned values (a reduction) or concatenate them. 
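
Before looking at the TensorFlow version, this toy sketch shows the difference between the two options using plain Python lists. Each item of the outer list plays the role of the value returned by one replica. The function names and the sample values are made up for the illustration.

```python
def reduce_sum(per_replica_values):
    """SUM: one number per replica becomes a single number."""
    return sum(per_replica_values)

def reduce_concat(per_replica_batches):
    """CONCAT: one batch per replica becomes a single, bigger batch."""
    merged = []
    for batch in per_replica_batches:
        merged.extend(batch)
    return merged

# Three hypothetical replicas, each returning a loss and two generated images.
per_replica_loss = [0.25, 0.5, 0.25]
per_replica_images = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]

print(reduce_sum(per_replica_loss))       # 1.0
print(reduce_concat(per_replica_images))  # ['a0', 'a1', 'b0', 'b1', 'c0', 'c1']
```

SUM fits scalar values such as losses, while CONCAT fits batches such as images, because we want to keep every item.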

In [None]:
class Reduction(Enum):
    SUM = 0
    CONCAT = 1
#This decorator indicates how to combine the values 
#returned by the function running on the different 
#TPU cores. 
#We have two possibilities: return a reduced SUM of the values from each replica, or 
#a concatenation. 
def distributed(*reduction_flags):
    def _decorator(fun):
        def per_replica_reduction(z, flag):
            if flag == Reduction.SUM:
                return strategy.reduce(tf.distribute.ReduceOp.SUM, z, axis=None)
            elif flag == Reduction.CONCAT:
                z_list = strategy.experimental_local_results(z)
                return tf.concat(z_list, axis=0)
            else:
                raise NotImplementedError()

        @tf.function
        def _decorated_fun(*args, **kwargs):
            fun_result = strategy.run(fun, args=args, kwargs=kwargs)

            assert type(fun_result) is tuple
            return tuple((per_replica_reduction(fr, rf) for fr, rf in zip(fun_result, reduction_flags)))
        return _decorated_fun
    return _decorator

Our training function is decorated with the distributed function. As it returns 3 values, we must indicate how to combine each of them. 

In the call to @distributed we indicate a SUM for the first two return values, while a concatenation is applied to the third. It makes sense because the first two are numbers that show the loss of the discriminator and the generator, and the third is a batch of the images generated.


In [None]:
@distributed(Reduction.SUM, Reduction.SUM, Reduction.CONCAT)
def train_on_batch(real_img1, real_img2):
    '''trains the GAN on a given batch'''
    # concatenate the real image inputs
    real_img = tf.concat([real_img1, real_img2], axis=0)

    # PHASE ONE - train the discriminator
    with tf.GradientTape() as d_tape:

        # create noise input
        z = tf.random.normal(shape=(real_img.shape[0], 1, 1, z_dim))

        # generate fake images
        fake_img = model_G(z)

        # feed the fake images to the discriminator
        fake_out = model_D(fake_img)

        # feed the real images to the discriminator
        real_out = model_D(real_img)

        # use the loss function to measure how well the discriminator
        # labels fake or real images
        d_fake_loss = loss_func(tf.zeros_like(fake_out), fake_out)
        d_real_loss = loss_func(tf.ones_like(real_out), real_out)

        # get the total loss
        d_loss = (d_fake_loss + d_real_loss) 
        d_loss = tf.reduce_sum(d_loss) / (batch_size * 2)

    # get the gradients
    gradients = d_tape.gradient(d_loss, model_D.trainable_variables)
    
    # update the weights of the discriminator
    param_D.apply_gradients(zip(gradients, model_D.trainable_variables))
    

    # PHASE TWO - train the generator
    with tf.GradientTape() as g_tape:
        # create noise input
        z = tf.random.normal(shape=(real_img.shape[0], 1, 1, z_dim))
        
        # generate fake images
        fake_img = model_G(z)

        # feed fake images to the discriminator
        fake_out = model_D(fake_img)
        
        # use loss function to measure how well the generator
        # is able to trick the discriminator (i.e. model_D should output 1's)
        g_loss = loss_func(tf.ones_like(fake_out), fake_out)
        g_loss = tf.reduce_sum(g_loss) / (batch_size * 2)
    
    # get the gradients
    gradients = g_tape.gradient(g_loss, model_G.trainable_variables)

    # update the weights of the generator
    param_G.apply_gradients(zip(gradients, model_G.trainable_variables))
    
    # return the losses and fake images for monitoring
    return d_loss, g_loss, fake_img 
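
You may wonder why train_on_batch divides the summed loss by batch_size * 2 (the number of images across all the replicas) instead of taking the mean on each replica. This toy sketch, with two hypothetical replicas and made-up loss values, shows that dividing by the global count and then adding the results with Reduction.SUM gives the mean over the whole batch.

```python
def replica_loss(per_example_losses, global_count):
    """What each replica computes: its summed loss divided by the GLOBAL count."""
    return sum(per_example_losses) / global_count

# Made-up per-example losses on two replicas (two images each).
replica_a = [0.5, 1.5]
replica_b = [1.0, 2.0]
global_count = len(replica_a) + len(replica_b)

# Reduction.SUM adds the values returned by the replicas.
summed = replica_loss(replica_a, global_count) + replica_loss(replica_b, global_count)

# Mean computed as if everything had run on a single device.
single_device_mean = sum(replica_a + replica_b) / global_count

print(summed, single_device_mean)  # 1.25 1.25
```

If each replica divided by its own number of images instead, the SUM would grow with the number of replicas and the loss would no longer be comparable between runs.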

## The Training loop.


In [None]:
NUM_EPOCHS = 100

# generate a batch of noisy input
z_dim = 128
test_z = tf.random.normal(shape=(64, 1, 1, z_dim))

# start loop
tf.keras.backend.clear_session()
for epoch in range(NUM_EPOCHS): 
    with tqdm(dataset) as pbar:
        pbar.set_description(f"[Epoch {epoch}]")
        for step, (X1, X2) in enumerate(pbar):
            # train on the current batch
            d_loss, g_loss, fake = train_on_batch(X1, X2)
    
        # generate fake images
        fake_img = model_G(test_z)

    # save face generated to file. 
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    file_path = out_dir+f"/epoch_{epoch:04}.png"
    
    # display gallery of generated faces
    if epoch % 1 == 0:
        plot_results(fake_img.numpy()[:4], 2, save_path=file_path)

In [None]:
from PIL import Image
imgs = os.listdir('celeba_out')
imgs.sort()
gif_images = []
for img in imgs: 
  #Only the PNG files saved during training are added to the GIF. 
  if img.endswith(".png"):
    with Image.open("celeba_out/" + img) as im:
      im = im.convert("RGB")
      gif_images.append(np.array(im))
gif_path = "faces.gif"
imageio.mimsave(gif_path, gif_images, fps=1)

In [None]:
path="faces.gif"

with open(path,'rb') as f:
    display(IpyImage(data=f.read(), format='gif'))

In [None]:
from IPython.display import Image
gif_path = "faces.gif"
Image(filename=gif_path)

# Conclusions, Fork & Improve. 
## If you liked the notebook, please consider **upvoting it**. I will be very happy, and it encourages me to continue publishing.

Please don't hesitate to fork, improve and play with the notebook at your convenience.

The faces you are looking at have been generated using only 1,000 images from the Dataset. If you use the full Dataset, the quality, and variety of faces will improve. 

We've seen how to use TPUs and distributed execution in a notebook. I used it to train a GAN, but you can use it for any problem you want. 

The main steps are: 

* Connect to the TPU. 
* Create a strategy. 
* Use the strategy scope to create the models. 
* Take care of the values returned by functions executed on multiple TPU cores. 

If you like, try to convert the notebook to run on a GPU without distributed execution, and check the differences in performance. 

I hope you liked the notebook! 

Other GANs tutorials: 

**First DCGAN using TensorFlow:**
https://www.kaggle.com/code/peremartramanonellas/gan-tutorial-first-dcgan-using-tensorflow

**Generating Color Images:**
https://www.kaggle.com/code/peremartramanonellas/gan-tutorial-2-generating-color-images
