# Generative Adversarial Network for Augmentation of Laser-Based Laryngeal Imaging

For the deep-learning-based algorithm to match features via a registration task, it is essential to apply intense data augmentation to the training data set. The training data consists of images $m(x)$ that represent the spatial configuration of laser points projected onto the vocal fold surface.
The images $m(x)$ are built from the x-y coordinates of the individual laser points: $m(x)$ is generated by plotting each laser point and then smoothing the image. To obtain intense augmentation, we want to train a generative adversarial network (GAN) to generate images that are variations of the training images and represent feasible configurations of laser points projected onto a vocal fold.
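
The construction of $m(x)$ from point coordinates can be sketched as follows. This is only an illustration under stated assumptions: the smoothing is assumed to be Gaussian (the notebook imports `gaussian_filter`, but the actual rendering code is not shown here), the Gaussians are evaluated directly with NumPy instead of plotting and filtering, and the function name `render_laser_points`, the value of `sigma`, and the demo points are hypothetical.

```python
import numpy as np

def render_laser_points(points, image_height, image_width, sigma=3.0):
    """Return an image with one Gaussian blob per valid (x, y) laser point."""
    rows = np.arange(image_height)[:, np.newaxis]   # pixel row indices (y)
    cols = np.arange(image_width)[np.newaxis, :]    # pixel column indices (x)
    img = np.zeros((image_height, image_width), dtype=np.float32)
    for px, py in points:
        if px <= 0 or py <= 0:   # assumption: non-positive coordinates mark missing points
            continue
        img += np.exp(-((cols - px) ** 2 + (rows - py) ** 2) / (2.0 * sigma ** 2))
    return np.clip(img, 0.0, 1.0)

# Hypothetical demo points; the last one is treated as missing
demo_points = [(20.0, 30.0), (40.0, 10.0), (0.0, 0.0)]
m = render_laser_points(demo_points, image_height=64, image_width=64)
print(m.shape)
```

Each valid point produces a bright spot centered on its coordinates, while missing points leave the image untouched.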

## Import Statements
The notebook was developed with Keras on the TensorFlow 2.2.0 backend.

In [None]:
import tensorflow.keras as keras
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import os
from tensorflow.keras.preprocessing import image
from matplotlib import pyplot as plt
from scipy.ndimage import gaussian_filter
import glob
import json

## Hardware Configuration
Check for a GPU and enable memory growth so that TensorFlow allocates GPU memory as needed instead of reserving all of it up front.

In [None]:
print(tf.__version__)
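physical_devices = tf.config.experimental.list_physical_devices('GPU')
# Enable memory growth only if a GPU is present; indexing an empty list would raise an IndexError
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
tf.config.experimental.get_visible_devices('GPU')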

## Model parameters
The grid dimensions have to be known, as well as the image dimensions for scaling. The depth of the input and output layers is 2 here, as one channel represents the x-coordinates and a second channel represents the y-coordinates.

In [None]:
height = 18
width = 18
channels = 2

image_width = 728
image_height = 728

## Hyperparameters
The dimension of the latent space can be adapted to optimize the network. In addition, the kernel size and the number of filters of the convolutional layers can be set.

In [None]:
latent_dim = 8
kernel_size = 4
filters = 128

## Generator
The first part of a GAN is a generator network that takes random input vectors from the latent space and decodes the vector to generate a synthetic image.

In [None]:
generator_input = keras.Input(shape=(latent_dim,))

# Leaky ReLU is preferred as it lowers the sparsity of gradients
x = layers.Dense(filters * (height//2) * (width//2))(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape(((height//2), (width//2), filters))(x)

# Kernel size should be divisible by the stride size to prevent checkerboard artifacts
x = layers.Conv2DTranspose(2*filters, kernel_size, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)

# Use tanh activation for improved training
x = layers.Conv2D(channels, kernel_size, activation='tanh', padding='same')(x)
generator = keras.models.Model(generator_input, x)
generator.summary()
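
The generator's `tanh` output lies in [-1, 1], while the data preparation in this notebook scales the coordinates to [0, 1]. One option, which is not part of the original notebook, is to map the training data to [-1, 1] before training and to map generated samples back afterwards. A minimal sketch, where `to_tanh_range` and `from_tanh_range` are hypothetical helper names and the random array is only a stand-in for real data:

```python
import numpy as np

def to_tanh_range(data):
    """Map values from [0, 1] to [-1, 1] to match a tanh output layer."""
    return data * 2.0 - 1.0

def from_tanh_range(data):
    """Map values from [-1, 1] back to [0, 1]."""
    return (data + 1.0) / 2.0

# Stand-in for a batch of scaled coordinate grids (batch, height, width, channels)
demo = np.random.random((4, 18, 18, 2))
scaled = to_tanh_range(demo)
print(scaled.min() >= -1.0, scaled.max() <= 1.0)
print(np.allclose(from_tanh_range(scaled), demo))
```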

## Discriminator
The second part of the GAN is a discriminator network that takes an image as input and decides if the image comes from the training set or was synthetically created by the generator.

In [None]:
height=18
width=18
channels=2

discriminator_input = layers.Input(shape=(height, width, channels))

x = layers.Conv2D(filters, kernel_size)(discriminator_input)
x = layers.LeakyReLU()(x)

# Use strided convolutions instead of max pooling as it lowers the sparsity of gradients
x = layers.Conv2D(filters, kernel_size, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(filters, kernel_size, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)

# Dropout is essential to induce robustness to the GAN
x = layers.Dropout(0.5)(x)

x = layers.Dense(1, activation='sigmoid')(x)

discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()

#discriminator_optimizer = keras.optimizers.Adam(lr=0.0002, beta_1=0.5, clipvalue=1.0, decay=1e-8)
discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0002, clipvalue=1.0, decay=1e-8)

discriminator.compile(optimizer=discriminator_optimizer,
                      loss='binary_crossentropy')
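
The effect of the three convolutions on the 18 x 18 grid can be checked by hand. Keras `Conv2D` uses `'valid'` padding by default, so each layer shrinks the grid. The following sketch reproduces that arithmetic in plain Python; `conv_output_size` is a hypothetical helper, not a Keras function.

```python
def conv_output_size(size, kernel, stride=1, padding='valid'):
    """Spatial output size of a 2D convolution along one axis."""
    if padding == 'same':
        return -(-size // stride)            # ceil(size / stride)
    return (size - kernel) // stride + 1     # 'valid': no padding

height, kernel_size, filters = 18, 4, 128   # values from the cells above
size = height
for stride in (1, 2, 2):                     # strides of the three Conv2D layers
    size = conv_output_size(size, kernel_size, stride)
    print(size)
flattened_units = size * size * filters      # input width of the final Dense layer
print(flattened_units)
```

The grid shrinks from 18 to 15, then 6, then 2 per axis, so the `Flatten` layer passes 2 x 2 x 128 = 512 values to the dropout and dense layers. This should agree with the output of `discriminator.summary()`.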

## GAN
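The GAN itself is composed of the generator and the discriminator. The discriminator is frozen inside the combined model, so training the GAN only updates the generator.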

In [None]:
discriminator.trainable = False

gan_input = keras.Input(shape=(latent_dim,))
x = generator(gan_input)
x = discriminator(x)
gan = keras.models.Model(gan_input, x)

#gan_optimizer = keras.optimizers.Adam(lr=0.0002, beta_1=0.5, clipvalue=1.0, decay=1e-8)
gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)

gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')

## Prepare Data
Load the data and scale it to be between 0.0 and 1.0.

In [None]:
# Define a path from where to load the data
base_path = 'Data/LASTEN/train'

# Load data into numpy arrays
def sort_key(path):
    return int(path.split(os.sep)[-1].split(".")[0].split("_")[0]) 
    
globs = glob.glob(base_path+'/*.json')
files = sorted(globs, key=sort_key)
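file_length = len(files)
# Use a float dtype so that non-integer coordinates are not truncated on assignment
x_position = np.zeros((file_length, height, width), dtype=np.float32)
y_position = np.zeros((file_length, height, width), dtype=np.float32)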
for file_id, file in enumerate(files):
    with open(file) as json_file:
        data = json.load(json_file)
        
        for key, value in data.items():
            # Keys are treated as flat, row-major indices into the laser grid
            key = int(key)
            y = key // width
            x = key % width

            x_position[file_id][y][x] = value[0]
            y_position[file_id][y][x] = value[1]

# Define offset
offset = 0.0

# Map non-existing points (coordinates <= 0) to 0 after the offset is applied
x_position = np.where(x_position<=0, -offset, x_position) + offset
y_position = np.where(y_position<=0, -offset, y_position) + offset

x_position = x_position / (image_width + offset)
y_position = y_position / (image_height + offset)

x_position = x_position[:,:,:,np.newaxis]
y_position = y_position[:,:,:,np.newaxis]

#x_position = x_position[0:20, :, : ,:]
#y_position = y_position[0:20, :, : ,:]

xy_data = np.concatenate((x_position, y_position), axis=3)
print("Shape of 'xy_data': {}".format(xy_data.shape))
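
To use a sample in this representation for augmentation, a grid has to be converted back into pixel coordinates. The following sketch inverts the scaling above under the assumptions that `offset` is 0.0, that zeros mark non-existing points, and that keys are row-major indices as in the loader. The function name `grid_to_points` and the demo grid are hypothetical.

```python
import numpy as np

def grid_to_points(grid, image_width, image_height):
    """Convert an (H, W, 2) grid scaled to [0, 1] into {flat_index: [x, y]} in pixels."""
    grid_height, grid_width, _ = grid.shape
    points = {}
    for row in range(grid_height):
        for col in range(grid_width):
            x, y = grid[row, col]
            if x <= 0 or y <= 0:            # assumption: zeros mark non-existing points
                continue
            key = row * grid_width + col    # row-major index, mirroring the loader
            points[key] = [float(x * image_width), float(y * image_height)]
    return points

demo_grid = np.zeros((18, 18, 2))
demo_grid[1, 2] = [0.5, 0.25]               # one valid point at grid row 1, column 2
print(grid_to_points(demo_grid, image_width=728, image_height=728))
```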

## GAN Training
The training of the DCGAN (Deep Convolutional Generative Adversarial Network) is a dynamic process, where an equilibrium between the capability of the generator to fake images and the capability of the discriminator to recognize faked images should be achieved. The procedure of training is iterative. The following steps are repeated until a sufficient equilibrium is achieved:

1. We randomly draw points from the latent space assuming a Gaussian distribution.
2. The sample points from 1. are used to generate images with the generator.
3. Generated images (fake) are mixed with images from the training set (real).
4. Only the discriminator is trained, where fake images get the label "fake" and real images get the label "real". In that way the discriminator learns to judge whether a provided image is fake or real.
5. Again draw random points from the latent space.
6. The images generated from the points in 5. are labeled as "real" (although they are not). The parameters of the discriminator are fixed and the whole GAN model is trained. In that way the generator learns to fake images.

In [None]:
iterations = 40
batch_size = 5
save_dir = 'weights/gan'

xy_data_orig = xy_data.copy()
np.random.shuffle(xy_data)

start = 0
for step in range(iterations):
    # Get random 'fake' images
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim)) # sample with mean=0 and std=1.0
    generated_images = generator.predict(random_latent_vectors)
    
    # Get 'real' images, merge them with 'fake' and create labels
    stop = start + batch_size
    real_images = xy_data[start:stop]
    combined_images = np.concatenate([generated_images, real_images])
    labels = np.concatenate([np.ones((batch_size, 1)) - 0.1,
                             np.zeros((batch_size, 1))])
    
    labels += 0.05 * np.random.random(labels.shape) # important to introduce some randomness
    
    # Train the discriminator on 'real' and 'fake' images
    d_loss = discriminator.train_on_batch(combined_images, labels)

    # Get random images from generator but treat them as 'real' now
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    misleading_targets = np.zeros((batch_size, 1))
    
    # Train the generator's weights
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    
    # Finalize loop
    start += batch_size
    if start > len(xy_data) - batch_size:
        start = 0
        np.random.shuffle(xy_data)
        
    # Print loss
    print('discriminator loss:', d_loss)
    print('adversarial loss:', a_loss)      
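
The label convention in the loop above is easy to misread: labels near 1 denote fake images and labels near 0 denote real images, which is why `misleading_targets` is all zeros. The sketch below isolates the label construction with NumPy only. The helper name `make_discriminator_batch` is hypothetical, and the random arrays are stand-ins for generator output and training data.

```python
import numpy as np

def make_discriminator_batch(generated_images, real_images):
    """Stack fake and real images and build smoothed, noisy labels (fake ~0.9, real ~0.0)."""
    combined = np.concatenate([generated_images, real_images])
    labels = np.concatenate([np.ones((len(generated_images), 1)) - 0.1,
                             np.zeros((len(real_images), 1))])
    labels += 0.05 * np.random.random(labels.shape)   # same label noise as in the loop
    return combined, labels

fake = np.random.random((5, 18, 18, 2))   # stand-in for generator output
real = np.random.random((5, 18, 18, 2))   # stand-in for a slice of xy_data
combined, labels = make_discriminator_batch(fake, real)
print(combined.shape, labels.shape)
```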

## Predict images and compare with ground truth
The following cells display the results of the prediction.

### Prediction of x-coordinates by generator
A randomly drawn latent vector is used to generate fake images by the generator.

In [None]:
random_latent_vectors = np.random.normal(size=(8, latent_dim))# sample with mean=0 and std=1.0
generated_images = generator.predict(random_latent_vectors)

plt.rcParams["figure.figsize"] = (7,7)
for i in range(8):
    img = generated_images[i,:,:,0]  # channel 0 holds the x-coordinates
    img = img[:,:,np.newaxis]
    img = keras.preprocessing.image.array_to_img(img)
    
    val = 441 + i
    plt.subplot(val)
    plt.imshow(img)
    plt.axis('off')
    plt.tight_layout()

### Ground-truth of x-coordinates
One frame from each of the 8 recordings in the dataset is displayed.

In [None]:
for i in range(8):
    val = 441 + i
    plt.subplot(val)
    x = xy_data_orig[i*20,:,:,0]
    plt.imshow(x)
    plt.axis('off')
    plt.tight_layout()
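
The visual comparison can be complemented by a simple numeric check of the value ranges per channel. This is a minimal sketch, not part of the original notebook: `channel_stats` is a hypothetical helper, and the random arrays are placeholders for `generated_images` and `xy_data_orig`, not results.

```python
import numpy as np

def channel_stats(batch):
    """Return per-channel (min, max, mean) for a batch shaped (N, H, W, C)."""
    return [(float(batch[..., c].min()),
             float(batch[..., c].max()),
             float(batch[..., c].mean()))
            for c in range(batch.shape[-1])]

# Placeholders: tanh-like output in [-1, 1] versus data scaled to [0, 1]
demo_generated = np.random.uniform(-1.0, 1.0, size=(8, 18, 18, 2))
demo_real = np.random.uniform(0.0, 1.0, size=(8, 18, 18, 2))
for name, batch in (("generated", demo_generated), ("real", demo_real)):
    print(name, channel_stats(batch))
```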

Conclusion: Predictions of the GAN are not even close to the ground truth! Sampling data from the GAN is therefore not reasonable.