### Objective
The goal of using a GAN is to reduce overfitting in the CL classification, which can arise from duplicate images when the number of bootstrap iterations is large. The data generation process therefore combines bootstrap sampling with a GAN: half of the required dataset is generated with the bootstrap method, and the other half is generated by a GAN trained on this data.


## Description
GAN stands for "Generative Adversarial Network". It is used to generate fake images that are as close as possible to real images. A GAN consists of two competing convolutional neural networks, the discriminator and the generator. The discriminator takes fake and real images as input and tries to tell them apart. The generator produces fake images that are as close as possible to the real ones in order to fool the discriminator. Using the feedback from the discriminator, the generator updates its weights so that its images become increasingly realistic.
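
For reference, this competition is commonly written as a minimax objective. The notation here is the standard one and is not taken from this notebook: $D(x)$ is the discriminator's estimated probability that image $x$ is real, and $G(z)$ is the image generated from a latent point $z$.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make both terms large, while the generator tries to make the second term small by pushing $D(G(z))$ towards 1.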

## Implementation
The following is a step-by-step implementation of a GAN. It is a modified version of the code from https://machinelearningmastery.com/how-to-develop-a-generative-adversarial-network-for-an-mnist-handwritten-digits-from-scratch-in-keras/. The original was trained to generate fake images from the MNIST dataset; I modified it to generate sample topomap images.

### 1. Define a standalone discriminator network
The discriminator (D) network takes images from the real and fake datasets (the latter from the generator) and tries to discriminate between them. Unlike a typical multi-class CNN classifier, D does not have a softmax layer; instead it uses a sigmoid activation function to return the probability that an image is real. Therefore, the output should be close to 1 for real images and close to 0 for fake images.

In [1]:
import tensorflow as tf
from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy import vstack
from numpy.random import randn
from numpy.random import randint
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Dropout
from matplotlib import pyplot as plt
import os
import cv2
import numpy as np
# define the standalone discriminator model
def define_discriminator(in_shape=(224,224,3)):
    model = Sequential()
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same', input_shape=in_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
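
As a rough check of the discriminator defined above, the sketch below works out the feature-map size after the two strided convolutions and the number of parameters in the final `Dense` layer. It assumes the usual Keras behaviour that `padding='same'` gives an output size of `ceil(input / stride)`; the helper name `same_conv_out` is made up for this illustration.

```python
import math

def same_conv_out(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

size = 224
for _ in range(2):  # two Conv2D layers, each with strides=(2, 2)
    size = same_conv_out(size, 2)

flat_features = size * size * 64  # 64 filters in the last Conv2D
dense_params = flat_features + 1  # one weight per feature plus one bias

print(size, flat_features, dense_params)
```

So the single sigmoid unit sees a flattened 56x56x64 feature map.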


### 2.  Standalone generator model
The generator (G) model is responsible for creating new fake but plausible topomap images. To do so, G takes a point from the latent space as input and upsamples it to a 224x224x3 image.

In [2]:

def define_generator(latent_dim):
    model = Sequential()
    # foundation for 28x28 feature maps
    n_nodes = 128 * 28 * 28
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((28, 28,128)))
    
    # upsample to 56x56
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
  
    # upsample to 112x112
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))

    # upsample to 224x224
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    
    model.add(Conv2D(3, (7,7), activation='sigmoid', padding='same'))
    return model


In [3]:
gen=define_generator(100)
print(gen.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 100352)            10135552  
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 100352)            0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 128)       0         
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 56, 56, 128)       262272    
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 112, 112, 128)     262272    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 112, 112, 128)     0
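
The parameter counts and output shapes in the summary above can be reproduced by hand from the layer definitions in `define_generator`. The following is a small arithmetic sketch, not part of the original notebook; the variable names are chosen for illustration.

```python
latent_dim = 100
n_nodes = 128 * 28 * 28  # units in the first Dense layer

# Dense: one weight per (input, unit) pair plus one bias per unit
dense_params = latent_dim * n_nodes + n_nodes

# Conv2DTranspose(128, (4,4)) on 128 input channels: kernel weights plus biases
deconv_params = 4 * 4 * 128 * 128 + 128

# final Conv2D(3, (7,7)) on 128 input channels
final_conv_params = 7 * 7 * 128 * 3 + 3

# each stride-2 transposed convolution with padding='same' doubles the size
size = 28
sizes = []
for _ in range(3):
    size *= 2
    sizes.append(size)

print(dense_params, deconv_params, final_conv_params, sizes)
```

The first two values match the `dense` and `conv2d_transpose` rows of the summary, and the sizes match the 56, 112 and 224 noted in the code comments.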

### 3. Define the combined generator and discriminator model, for updating the generator
Here we stack our two models together. The discriminator is trained in standalone fashion, since its main role is to classify images as real or fake. In the composite model its weights are therefore frozen, and it only classifies the images produced by G. The weights of G are updated based on the output of D.
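
To see why the composite model can train G with binary cross-entropy, the sketch below evaluates the loss for one made-up discriminator output. When D is trained, a generated image carries label 0. When G is trained through the frozen D, the same image is given label 1, so the loss is large while D still rejects it. The value `p_fake` and the helper function are assumptions for illustration only.

```python
import math

def binary_crossentropy(y_true, p):
    """Binary cross-entropy for a single prediction p = D(x)."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p_fake = 0.1  # hypothetical D output for a generated image

d_loss_fake = binary_crossentropy(0, p_fake)  # D's view: label 0 (fake)
g_loss = binary_crossentropy(1, p_fake)       # G's view: label 1 (inverted)
g_loss_fooled = binary_crossentropy(1, 0.9)   # G's loss once D is fooled

print(d_loss_fake, g_loss, g_loss_fooled)
```

The generator loss falls as D's output for generated images moves towards 1, which is the feedback signal described above.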

In [4]:
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(g_model)
    # add the discriminator
    model.add(d_model)
    # compile model
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model


### 4. Let's prepare our real and fake images

In [None]:
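n_samples=5500
img_shape=(224,224)
path= "/home/kashraf/Research_2021/Audio_topomaps_June21/theta/cl2/"
# draw n_samples file names at random (np.random.choice samples with replacement by default)
filenames=np.random.choice(os.listdir(path),n_samples)
# read each image (OpenCV loads it as BGR), scale pixels to [0, 1] and resize to img_shape
data=np.array([cv2.resize(cv2.imread(os.path.join(path,file))/255.0,img_shape) for file in filenames])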
# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # retrieve selected images
    X = dataset[ix]
    # generate 'real' class labels (1)
    y = ones((n_samples, 1))
    return X, y
 
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input
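
An equivalent way to draw the latent batch, shown here only as a sketch, is NumPy's `Generator` API, which samples directly in the target shape and can be seeded for reproducibility. The seed and the name `rng` are assumptions for illustration and are not used elsewhere in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumed seed, for reproducibility)
latent_dim, n_samples = 100, 32

# one row per sample, one column per latent dimension
z = rng.standard_normal((n_samples, latent_dim))

print(z.shape)
```

Each row of `z` is one point in the latent space and yields one generated image.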
 

In [None]:

 
# use the generator to generate n fake examples, with class labels
def generate_fake_samples(g_model, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = g_model.predict(x_input)
    # create 'fake' class labels (0)
    y = zeros((n_samples, 1))
    return X, y
 
# create and save a plot of generated images (reversed grayscale)
def save_plot(examples, epoch, n=10):
    # plot images
    for i in range(n * n):
        # define subplot
        plt.subplot(n, n, 1 + i)
        # turn off axis
        plt.axis('off')
        # plot raw pixel data
        plt.imshow(examples[i, :, :, 0], cmap='gray_r')
    # save plot to file
    filename = 'generated_topos_e%03d.png' % (epoch+1)
    plt.savefig(filename)
    plt.close()

In [None]:
# data.shape

### 5. Evaluate the Discriminator model

In [None]:
import tqdm

# evaluate the discriminator, plot generated images, save generator model
def summarize_performance(epoch, g_model, d_model, dataset, latent_dim, n_samples=5500):
    # prepare real samples
    X_real, y_real = generate_real_samples(dataset,n_samples)
    # evaluate discriminator on real examples
    _, acc_real = d_model.evaluate(X_real, y_real, verbose=0)
    # prepare fake examples
    x_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_samples)
    # evaluate discriminator on fake examples
    _, acc_fake = d_model.evaluate(x_fake, y_fake, verbose=0)
    # summarize discriminator performance
    print('>Accuracy real topomap: %.0f%%, Accuracy fake topomaps: %.0f%%' % (acc_real*100, acc_fake*100))
    # save plot
    save_plot(x_fake, epoch)
    # save the generator and discriminator models to file
    filename ='/home/kashraf/Research_2021/GAN_topomap/saved_models/generator_theta_cl2.h5'
    filename2='/home/kashraf/Research_2021/GAN_topomap/saved_models/discriminator_theta_cl2.h5'
    g_model.save(filename)
    d_model.save(filename2)
 
# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=64):
    bat_per_epo = int(dataset.shape[0] / n_batch)
    half_batch = int(n_batch / 2)
    # manually enumerate epochs
    for i in range(n_epochs):
        print("-------------------CURRENT EPOCH-----------:",i+1)
        # enumerate batches over the training set
        for j in range(bat_per_epo):
            print("------------EPOCH:{} BATCH :{}/{}--------------".format(i+1,j+1,bat_per_epo))
            # get randomly selected 'real' samples
            X_real, y_real = generate_real_samples(dataset, half_batch)
            # generate 'fake' examples
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # create training set for the discriminator
            X, y = vstack((X_real, X_fake)), vstack((y_real, y_fake))
            # update discriminator model weights
            d_loss, _ = d_model.train_on_batch(X, y)
            # prepare points in latent space as input for the generator
            X_gan = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = ones((n_batch, 1))
            # update the generator via the discriminator's error
            g_loss = gan_model.train_on_batch(X_gan, y_gan)
            # summarize loss on this batch
            print('Discriminator Loss: {}\nGenerator Loss: {}\n'. format(d_loss, g_loss))
        # evaluate the model performance, sometimes
        if (i+1) % 20 == 0:
            summarize_performance(i, g_model, d_model, dataset, latent_dim)
 


In [None]:
# size of the latent space
latent_dim = 100
# create the discriminator
d_model = define_discriminator()
# create the generator
g_model = define_generator(latent_dim)
# create the gan
gan_model = define_gan(g_model, d_model)
# # load image data
dataset = data
# train model
train(g_model, d_model, gan_model, dataset, latent_dim)

-------------------CURRENT EPOCH-----------: 1
------------EPOCH:1 BATCH :1/85--------------
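
The `1/85` in the log above follows from the integer arithmetic in `train`. Below is a small sketch of that arithmetic, assuming the 5500 images loaded earlier and the default `n_batch=64` and `n_epochs=100`.

```python
n_images, n_batch, n_epochs = 5500, 64, 100

bat_per_epo = int(n_images / n_batch)  # batches per epoch
half_batch = int(n_batch / 2)          # real (and fake) images per discriminator update
real_draws_per_epoch = bat_per_epo * half_batch

# 1-based epochs at which summarize_performance is called
checkpoint_epochs = [i + 1 for i in range(n_epochs) if (i + 1) % 20 == 0]

print(bat_per_epo, half_batch, real_draws_per_epoch, checkpoint_epochs)
```

Because `generate_real_samples` draws indices with `randint`, the real images used in each epoch are sampled with replacement rather than forming a full pass over the dataset.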


### Final: Using generator model to generate images

In [None]:
# example of loading the generator model and generating images
from numpy.random import randn
from matplotlib import pyplot as plt


# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

# plot an n x n grid of generated images (first channel only)
def save_plot(examples, n):
    # plot images
    for i in range(n * n):
        # define subplot
        plt.subplot(n, n, 1 + i)
        # turn off axis
        plt.axis('off')
        # plot raw pixel data
        plt.imshow(examples[i, :, :, 0])
    plt.show()
import tensorflow as tf
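# load model
model = tf.keras.models.load_model('/home/kashraf/Research_2021/GAN_topomap/saved_models/Un_GAN_generator_V4_allsamples.h5')
# generate points in latent space
latent_points = generate_latent_points(100,10)
# generate images
X = model.predict(latent_points)
# plot the first 9 of the 10 generated images in a 3x3 grid
save_plot(X, 3)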


# with CustomObjectScope({'GlorotUniform': glorot_uniform()}):
#         model = load_model('/home/kashraf/Research_2021/GAN_topomap/saved_models/Un_GAN_generator_V2.h5')


In [None]:
# path="/home/kashraf/Research_2021/GAN_topomap/generated_images/original_224//"
# # model=tf.keras.models.load_model('/home/kashraf/Research_2021/GAN_topomap/saved_models/Un_GAN_generator_V2.h5')
# # latent_points= generate_latent_points(100,500)
# # X =model.predict(latent_points)
# # generated_img=[((arr - arr.min()) * (1/(arr.max() - arr.min()) * 255)).astype('uint8') for arr in X]
# # generated_img = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in generated_img]
# for i in range(100):
#     arr=data[i]
#     img=((arr - arr.min()) * (1/(arr.max() - arr.min()) * 255)).astype('uint8')
#     plt.imsave(path+"beta_OG"+str(i)+".png",img)

# # # plot the result
# # plt.figure(figsize=(10,10))
# # plt.imshow(generated_img[2])
# # plt.show()
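
The commented-out cell above rescales each array to the 0-255 range before saving it as an image. Below is a minimal sketch of that min-max rescaling as a reusable function, with a guard for constant images that the original expression does not have. The function name `to_uint8` and the demo array are assumptions for illustration.

```python
import numpy as np

def to_uint8(arr):
    """Min-max rescale a float image to the 0-255 uint8 range."""
    lo, hi = arr.min(), arr.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(arr.shape, dtype=np.uint8)
    return ((arr - lo) / (hi - lo) * 255).astype(np.uint8)

demo = np.array([[0.2, 0.4], [0.6, 1.0]])  # stand-in for one generated image
img = to_uint8(demo)
flat = to_uint8(np.full((2, 2), 0.5))      # constant input maps to all zeros

print(img.dtype, img.min(), img.max())
```

Note that this stretches every image to the full range individually, so relative brightness between images is not preserved.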

In [None]:
k=range(10)
for i in range(10):
    if k[i]/2