# Wasserstein Generative Adversarial Network - WGAN
---
The Wasserstein GAN, or WGAN for short, was introduced by Martin Arjovsky, et al. in their 2017 paper titled [Wasserstein GAN](https://arxiv.org/abs/1701.07875).

The Wasserstein GAN (WGAN) extends the traditional GAN by introducing an alternative approach to training the generator. Its goal is to improve the generator's ability to approximate the data distribution observed in the training dataset.

Rather than using a discriminator to classify generated images as real or fake, the WGAN replaces it with a critic. The critic evaluates images by scoring their "realness" or "fakeness," providing a continuous metric instead of binary classification.

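This modification is based on a theoretical principle: training the generator should minimize the distance between the data distribution in the training set and the distribution of generated samples. In the WGAN, that distance is the Wasserstein-1 (earth mover's) distance, which the critic is trained to approximate.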
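
To make the idea of a distance between distributions concrete, here is a minimal sketch for the one-dimensional case. For two samples of equal size, the Wasserstein-1 (earth mover's) distance reduces to the mean absolute difference between the sorted values. The function name `wasserstein_1d` is hypothetical and is not part of the WGAN code in this notebook; a WGAN estimates the distance with the critic rather than by sorting.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equally sized 1-D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    # After sorting, the optimal transport plan pairs the i-th smallest values.
    return float(np.mean(np.abs(a - b)))

real = [0.0, 1.0, 2.0]
fake = [3.0, 1.0, 2.0]
print(wasserstein_1d(real, fake))  # 1.0
```

Every point in `fake` has to move by one unit to match `real`, so the distance is 1.0. As the generated samples approach the real ones, the value shrinks toward zero.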

WGAN offers several advantages. Its training process is more stable, less sensitive to model architecture, and less dependent on hyperparameter configurations. Most importantly, the critic's loss correlates with the quality of images produced by the generator, making it a more reliable indicator of training progress.

## Wasserstein GAN Implementation Details
---
Although the theoretical grounding for the WGAN is dense, implementing a WGAN requires only a few minor changes to the standard Deep Convolutional GAN, or DCGAN.

The image below provides a summary of the main training loop for training a WGAN, taken from the paper. Note the listing of recommended hyperparameters used in the model.


![gan-algorithm](plots/wgan-algorithm.png)

The differences in implementation for the WGAN are as follows:

1. Use a linear activation function in the output layer of the critic model (instead of sigmoid).
2. Use -1 labels for real images and 1 labels for fake images (instead of 1 and 0).
3. Use Wasserstein loss to train the critic and generator models.
4. Constrain critic model weights to a limited range after each mini batch update (e.g. [-0.01,0.01]).
5. Update the critic model more times than the generator each iteration (e.g. 5).
6. Use the RMSProp version of gradient descent with a small learning rate and no momentum (e.g. 0.00005).
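
The example values in the list above can be collected in one place, and the update order from item 5 can be sketched in plain Python. The dictionary `WGAN_DEFAULTS` and the helper `update_schedule` are illustrative names only, not part of Keras or of the code below.

```python
# Example values quoted in the list above; the dictionary is only illustrative.
WGAN_DEFAULTS = {
    "clip_value": 0.01,        # critic weights are kept in [-0.01, 0.01]
    "n_critic": 5,             # critic updates per generator update
    "learning_rate": 0.00005,  # RMSProp learning rate
    "real_label": -1,
    "fake_label": 1,
}

def update_schedule(n_generator_steps, n_critic=WGAN_DEFAULTS["n_critic"]):
    """Return the order of model updates as a list of strings."""
    schedule = []
    for _ in range(n_generator_steps):
        schedule.extend(["critic"] * n_critic)
        schedule.append("generator")
    return schedule

schedule = update_schedule(2)
print(schedule.count("critic"), schedule.count("generator"))  # 10 2
```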

**NOTE:** This code is, to some extent, a modified version of the code in the [DCGAN](https://github.com/sulaiman-shamasna/GANs/blob/main/DCGANs.ipynb) notebook.


In [40]:
from numpy import expand_dims
from numpy import mean
from numpy import ones
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras import backend
from keras.optimizers import RMSprop
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.initializers import RandomNormal
from keras.constraints import Constraint

In [41]:
from matplotlib import pyplot

In [42]:
# clip model weights to a given hypercube
class ClipConstraint(Constraint):
	# set clip value when initialized
	def __init__(self, clip_value):
		self.clip_value = clip_value
 
	# clip model weights to hypercube
	def __call__(self, weights):
		return backend.clip(weights, -self.clip_value, self.clip_value)
 
	# get the config
	def get_config(self):
		return {'clip_value': self.clip_value}
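
The effect of the constraint can be checked without Keras. The sketch below applies the same element-wise clipping with NumPy; `clip_weights` is a hypothetical stand-in for `ClipConstraint.__call__`, assuming `backend.clip` behaves like `numpy.clip`.

```python
import numpy as np

def clip_weights(weights, clip_value):
    """Clip every weight to the range [-clip_value, clip_value]."""
    return np.clip(weights, -clip_value, clip_value)

weights = np.array([-0.5, -0.01, 0.003, 0.2])
clipped = clip_weights(weights, 0.01)
print(clipped)
```

Values already inside the range (here `-0.01` and `0.003`) are unchanged, while larger magnitudes are moved to the nearest bound.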

In [43]:
import tensorflow as tf

# Define the Wasserstein loss function
def wasserstein_loss(y_true, y_pred):
    return tf.reduce_mean(y_true * y_pred)
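
To see what this loss computes with the -1/1 label convention, here is a NumPy re-statement with made-up critic scores. The name `wasserstein_loss_np` and the score values are assumptions for illustration only.

```python
import numpy as np

def wasserstein_loss_np(y_true, y_pred):
    """NumPy equivalent of the loss above: mean of label times score."""
    return float(np.mean(y_true * y_pred))

# Made-up critic scores for a batch of two real and two fake images.
real_scores = np.array([[2.0], [4.0]])
fake_scores = np.array([[-1.0], [-3.0]])

loss_real = wasserstein_loss_np(-np.ones((2, 1)), real_scores)  # -3.0
loss_fake = wasserstein_loss_np(np.ones((2, 1)), fake_scores)   # -2.0
print(loss_real, loss_fake, loss_real + loss_fake)  # -3.0 -2.0 -5.0
```

With label -1, minimizing the loss pushes scores for real images up; with label 1, it pushes scores for fake images down. The sum of the two terms equals `mean(fake_scores) - mean(real_scores)`.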

## How to Train a Wasserstein GAN Model
---
Now that we know the specific implementation details for the WGAN, we can implement the model for image generation.

In this section, we will develop a WGAN to generate a single handwritten digit (‘7’) from the [MNIST dataset](https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-fashion-mnist-clothing-classification/). This is a good test problem for the WGAN as it is a small dataset requiring a modest model that is quick to train.

The first step is to define the models.

The critic model takes as input one 28×28 grayscale image and outputs a score for the realness or fakeness of the image. It is implemented as a modest convolutional neural network using best practices for DCGAN design such as using the [LeakyReLU activation](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/) function with a slope of 0.2, [batch normalization](https://machinelearningmastery.com/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization/), and using a [2×2 stride to downsample](https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/).

The critic model uses the new ClipConstraint weight constraint to clip model weights after mini-batch updates. It is optimized with the custom wasserstein_loss() function and the RMSProp version of stochastic gradient descent with a learning rate of 0.00005.

The ```define_critic()``` function below implements this, defining and compiling the critic model and returning it. The input shape of the image is parameterized as a default function argument to make it clear.

In [44]:
# define the standalone critic model
def define_critic(in_shape=(28,28,1)):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# weight constraint
	const = ClipConstraint(0.01)
	# define model
	model = Sequential()
	# downsample to 14x14
	model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, kernel_constraint=const, input_shape=in_shape))
	model.add(BatchNormalization())
	model.add(LeakyReLU(alpha=0.2))
	# downsample to 7x7
	model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, kernel_constraint=const))
	model.add(BatchNormalization())
	model.add(LeakyReLU(alpha=0.2))
	# scoring, linear activation
	model.add(Flatten())
	model.add(Dense(1))
	# compile model
	opt = RMSprop(learning_rate=0.00005)
	model.compile(loss=wasserstein_loss, optimizer=opt)
	return model

In [45]:
define_critic().summary()
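
The spatial sizes reported by the summary can be verified by hand. This sketch assumes the rule Keras uses for `padding='same'`, where the output size is the input size divided by the stride, rounded up. The helper `conv_same_output` is a hypothetical name.

```python
import math

def conv_same_output(size, stride):
    """Output width/height of a convolution with padding='same'."""
    return math.ceil(size / stride)

size = 28
for _ in range(2):  # two Conv2D layers with a 2x2 stride
    size = conv_same_output(size, stride=2)

n_filters = 64
flat = size * size * n_filters  # number of inputs to the final Dense(1) layer
print(size, flat)  # 7 3136
```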

The generator model takes as input a point in the latent space and outputs a single 28×28 grayscale image.

This is achieved by using a fully connected layer to interpret the point in the latent space and provide sufficient activations that can be reshaped into many copies (in this case, 128) of a low-resolution version of the output image (e.g. 7×7). This is then upsampled two times, doubling the size and quadrupling the area of the activations each time using transpose convolutional layers.

The model uses best practices such as the LeakyReLU activation, a kernel size that is a multiple of the stride size (4×4 kernels with a 2×2 stride), and a hyperbolic tangent (tanh) activation function in the output layer.

The define_generator() function below defines the generator model but intentionally does not compile it as it is not trained directly, then returns the model. The size of the latent space is parameterized as a function argument.

In [46]:
# define the standalone generator model
def define_generator(latent_dim):
	# weight initialization
	init = RandomNormal(stddev=0.02)
	# define model
	model = Sequential()
	# foundation for 7x7 image
	n_nodes = 128 * 7 * 7
	model.add(Dense(n_nodes, kernel_initializer=init, input_dim=latent_dim))
	model.add(LeakyReLU(alpha=0.2))
	model.add(Reshape((7, 7, 128)))
	# upsample to 14x14
	model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
	model.add(BatchNormalization())
	model.add(LeakyReLU(alpha=0.2))
	# upsample to 28x28
	model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
	model.add(BatchNormalization())
	model.add(LeakyReLU(alpha=0.2))
	# output 28x28x1
	model.add(Conv2D(1, (7,7), activation='tanh', padding='same', kernel_initializer=init))
	return model
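
The shape arithmetic behind the generator can be sketched in the same way. This assumes the Keras behavior for `Conv2DTranspose` with `padding='same'`, where the output size is the input size multiplied by the stride. The helper `transpose_conv_same_output` is a hypothetical name.

```python
def transpose_conv_same_output(size, stride):
    """Output width/height of a transpose convolution with padding='same'."""
    return size * stride

# The Dense layer provides 128 feature maps of size 7x7.
n_nodes = 128 * 7 * 7

sizes = [7]
for _ in range(2):  # two Conv2DTranspose layers with a 2x2 stride
    sizes.append(transpose_conv_same_output(sizes[-1], stride=2))

print(n_nodes, sizes)  # 6272 [7, 14, 28]
```

Each upsampling step doubles the width and height, so the area grows by a factor of four, from 49 to 196 to 784 pixels per feature map.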

Next, a GAN model can be defined that combines both the generator model and the critic model into one larger model.

This larger model will be used to train the model weights in the generator, using the output and error calculated by the critic model. The critic model is trained separately, and as such, the model weights are marked as not trainable in this larger GAN model to ensure that only the weights of the generator model are updated. This change to the trainability of the critic weights only has an effect when training the combined GAN model, not when training the critic standalone.

This larger GAN model takes a point in the latent space as input and uses the generator model to generate an image, which is fed to the critic model and scored as real or fake. The model is fit using RMSProp with the custom wasserstein_loss() function.

The define_gan() function below implements this, taking the already defined generator and critic models as input.

In [48]:
# define the combined generator and critic model, for updating the generator
def define_gan(generator, critic):
	# make weights in the critic not trainable
	for layer in critic.layers:
		if not isinstance(layer, BatchNormalization):
			layer.trainable = False
	# connect them
	model = Sequential()
	# add generator
	model.add(generator)
	# add the critic
	model.add(critic)
	# compile model
	opt = RMSprop(learning_rate=0.00005)
	model.compile(loss=wasserstein_loss, optimizer=opt)
	return model

Now that we have defined the GAN model, we need to train it. But, before we can train the model, we require input data.

The first step is to load and [scale the MNIST dataset](https://machinelearningmastery.com/how-to-manually-scale-image-pixel-data-for-deep-learning/). The whole dataset is loaded via a call to the load_data() Keras function, then the subset of images that belongs to class 7 is selected (about 6,000 images), i.e. those that are a handwritten depiction of the number seven. Then the pixel values must be scaled to the range [-1,1] to match the output of the generator model.

The ```load_real_samples()``` function below implements this, returning the loaded and scaled subset of the MNIST training dataset ready for modeling.

In [49]:
# load images
def load_real_samples():
	# load dataset
	(trainX, trainy), (_, _) = load_data()
	# select all of the examples for a given class
	selected_ix = trainy == 7
	X = trainX[selected_ix]
	# expand to 3d, e.g. add channels
	X = expand_dims(X, axis=-1)
	# convert from ints to floats
	X = X.astype('float32')
	# scale from [0,255] to [-1,1]
	X = (X - 127.5) / 127.5
	return X
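
The scaling step and its inverse (used later when plotting generated images) can be checked on a few pixel values. The function names below are hypothetical; the formulas are the ones used in this notebook.

```python
import numpy as np

def scale_to_tanh_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return (pixels.astype('float32') - 127.5) / 127.5

def scale_to_unit_range(x):
    """Map values from [-1, 1] to [0, 1] for plotting."""
    return (x + 1) / 2.0

pixels = np.array([0, 255], dtype='uint8')
scaled = scale_to_tanh_range(pixels)
print(float(scaled.min()), float(scaled.max()))  # -1.0 1.0
print(scale_to_unit_range(scaled))
```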

We will require one batch (or half a batch) of real images from the dataset for each update to the GAN model. A simple way to achieve this is to select a [random sample](https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/) of images from the dataset each time.

The ```generate_real_samples()``` function below implements this, taking the prepared dataset as an argument, selecting and returning a random sample of images and their corresponding label for the critic, specifically target=-1 indicating that they are real images.

In [50]:
# select real samples
def generate_real_samples(dataset, n_samples):
	# choose random instances
	ix = randint(0, dataset.shape[0], n_samples)
	# select images
	X = dataset[ix]
	# generate class labels, -1 for 'real'
	y = -ones((n_samples, 1))
	return X, y
 

In [51]:
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	# generate points in the latent space
	x_input = randn(latent_dim * n_samples)
	# reshape into a batch of inputs for the network
	x_input = x_input.reshape(n_samples, latent_dim)
	return x_input
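
As a quick check of the reshape, the sketch below draws the same kind of Gaussian latent points with NumPy only and inspects the resulting shape. The function name `sample_latent_points` and the fixed seed are assumptions made for a reproducible illustration.

```python
import numpy as np

def sample_latent_points(latent_dim, n_samples):
    """Draw standard-normal values and arrange them as one row per sample."""
    x_input = np.random.randn(latent_dim * n_samples)
    return x_input.reshape(n_samples, latent_dim)

np.random.seed(0)  # only for a reproducible illustration
points = sample_latent_points(latent_dim=50, n_samples=64)
print(points.shape)  # (64, 50)
```

Each row is one input to the generator, so a batch of 64 images needs a `(64, 50)` array when the latent space has 50 dimensions.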

In [52]:
# Use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # directly call the generator
    X = generator(x_input, training=False)
    # create class labels with 1.0 for 'fake'
    y = tf.ones((n_samples, 1))
    return X, y

In [53]:
import os
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, latent_dim, n_samples=100):
	# prepare fake examples
	X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
	# scale from [-1,1] to [0,1]
	X = (X + 1) / 2.0
	# plot images
	for i in range(10 * 10):
		# define subplot
		pyplot.subplot(10, 10, 1 + i)
		# turn off axis
		pyplot.axis('off')
		# plot raw pixel data
		pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
	# save plot to file (create the output directory if it does not exist)
	os.makedirs('wgan_results', exist_ok=True)
	filename1 = 'generated_plot_%04d.png' % (step+1)
	filename1_dir = os.path.join('wgan_results', filename1)
	pyplot.savefig(filename1_dir)
	pyplot.close()
	# save the generator model
	filename2 = 'model_%04d.h5' % (step+1)
	filename2_dir = os.path.join('wgan_results', filename2)
	g_model.save(filename2_dir)
	print('>Saved: %s and %s' % (filename1, filename2_dir))

In [54]:
import os
result_path = os.path.join('wgan_results', 'plot_line_plot_loss.png')
# create a line plot of loss for the gan and save to file
def plot_history(d1_hist, d2_hist, g_hist):
	# plot history
	pyplot.plot(d1_hist, label='crit_real')
	pyplot.plot(d2_hist, label='crit_fake')
	pyplot.plot(g_hist, label='gen')
	pyplot.legend()
	# save plot to file
	pyplot.savefig(result_path)
	pyplot.close()

In [55]:
# Define the optimizers outside of the train_step function
critic_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.00005)
generator_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.00005)

@tf.function
def train_step(g_model, c_model, gan_model, real_images, latent_dim, n_batch, n_critic, critic_optimizer, generator_optimizer):
    c1_losses, c2_losses = [], []

    # Train the critic more frequently than the generator
    for _ in range(n_critic):
        # Generate fake images. Latent points are sampled with TensorFlow ops because
        # NumPy calls inside a tf.function run only when the function is traced
        noise = tf.random.normal((n_batch // 2, latent_dim))
        fake_images = g_model(noise, training=False)

        # Update critic on real and fake images
        with tf.GradientTape() as c_tape:
            real_output = c_model(real_images, training=True)
            fake_output = c_model(fake_images, training=True)
            c_loss = tf.reduce_mean(fake_output) - tf.reduce_mean(real_output)

        # Apply gradients to critic
        c_gradients = c_tape.gradient(c_loss, c_model.trainable_variables)
        critic_optimizer.apply_gradients(zip(c_gradients, c_model.trainable_variables))
        c1_losses.append(c_loss)

    # Sample fresh latent points for the generator update (TensorFlow ops, redrawn on every call)
    latent_points = tf.random.normal((n_batch, latent_dim))

    # Update generator via critic’s loss
    with tf.GradientTape() as g_tape:
        generated_images = g_model(latent_points, training=True)
        fake_output = c_model(generated_images, training=False)
        g_loss = -tf.reduce_mean(fake_output)

    # Apply gradients to generator
    g_gradients = g_tape.gradient(g_loss, g_model.trainable_variables)
    generator_optimizer.apply_gradients(zip(g_gradients, g_model.trainable_variables))
    # NOTE: c2_losses stores the generator loss, so the logged c2 and g values are identical
    c2_losses.append(g_loss)

    return tf.reduce_mean(c1_losses), tf.reduce_mean(c2_losses), g_loss


In [56]:
# Train the generator and critic
def train(g_model, c_model, gan_model, dataset, latent_dim, n_epochs=10, n_batch=64, n_critic=5):
    bat_per_epo = int(dataset.shape[0] / n_batch)
    n_steps = bat_per_epo * n_epochs
    
    # Lists for keeping track of loss
    c1_hist, c2_hist, g_hist = list(), list(), list()
    
    for i in range(n_steps):
        # Get randomly selected 'real' samples
        X_real, y_real = generate_real_samples(dataset, n_batch // 2)
        
        # Perform a training step
        c1_loss, c2_loss, g_loss = train_step(g_model, c_model, gan_model, X_real, latent_dim, n_batch, n_critic, critic_optimizer, generator_optimizer)
        
        # Track losses
        c1_hist.append(c1_loss)
        c2_hist.append(c2_loss)
        g_hist.append(g_loss)
        
        print(f'>{i+1}, c1={c1_loss:.3f}, c2={c2_loss:.3f}, g={g_loss:.3f}')
        
        # Summarize performance
        if (i+1) % bat_per_epo == 0:
            summarize_performance(i, g_model, latent_dim)
    
    plot_history(c1_hist, c2_hist, g_hist)
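
The number of training steps and the steps at which `summarize_performance()` is called follow directly from the arithmetic in `train()`. The sketch below reproduces it for a dataset of 6,265 images, the size printed when the notebook is run. The helper name `training_schedule` is hypothetical.

```python
def training_schedule(n_samples, n_batch=64, n_epochs=10):
    """Return batches per epoch, total steps, and the steps that save a checkpoint."""
    bat_per_epo = int(n_samples / n_batch)
    n_steps = bat_per_epo * n_epochs
    checkpoints = [s for s in range(1, n_steps + 1) if s % bat_per_epo == 0]
    return bat_per_epo, n_steps, checkpoints

bat_per_epo, n_steps, checkpoints = training_schedule(6265)
print(bat_per_epo, n_steps, checkpoints[:3])  # 97 970 [97, 194, 291]
```

This is why the saved files are numbered 0097, 0194, 0291 and so on: one plot and one model are written at the end of each 97-step epoch.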


In [57]:
# size of the latent space
latent_dim = 50
# create the critic
critic = define_critic()
# create the generator
generator = define_generator(latent_dim)
# create the gan
gan_model = define_gan(generator, critic)
# load image data
dataset = load_real_samples()
print(dataset.shape)
# train model
train(generator, critic, gan_model, dataset, latent_dim)

(6265, 28, 28, 1)
>1, c1=1.155, c2=0.027, g=0.027
>2, c1=1.044, c2=0.005, g=0.005
>3, c1=1.314, c2=-0.012, g=-0.012
>4, c1=1.289, c2=-0.031, g=-0.031
>5, c1=1.578, c2=-0.055, g=-0.055
>6, c1=1.317, c2=-0.085, g=-0.085
>7, c1=1.667, c2=-0.123, g=-0.123
>8, c1=1.934, c2=-0.165, g=-0.165
>9, c1=1.904, c2=-0.207, g=-0.207
>10, c1=2.086, c2=-0.252, g=-0.252
>11, c1=1.952, c2=-0.305, g=-0.305
>12, c1=2.197, c2=-0.370, g=-0.370
>13, c1=1.906, c2=-0.438, g=-0.438
>14, c1=2.114, c2=-0.509, g=-0.509
>15, c1=2.229, c2=-0.586, g=-0.586
>16, c1=2.160, c2=-0.683, g=-0.683
>17, c1=2.731, c2=-0.789, g=-0.789
>18, c1=2.837, c2=-0.908, g=-0.908
>19, c1=2.498, c2=-1.027, g=-1.027
>20, c1=2.819, c2=-1.159, g=-1.159
>21, c1=2.756, c2=-1.303, g=-1.303
>22, c1=2.752, c2=-1.453, g=-1.453
>23, c1=2.901, c2=-1.646, g=-1.646
>24, c1=3.150, c2=-1.827, g=-1.827
>25, c1=3.231, c2=-2.029, g=-2.029
>26, c1=3.011, c2=-2.244, g=-2.244
>27, c1=2.902, c2=-2.518, g=-2.518
>28, c1=3.148, c2=-2.826, g=-2.826
>29, c1=2.968, 



>Saved: generated_plot_0097.png and wgan_results\model_0097.h5
>98, c1=5.865, c2=-47.207, g=-47.207
>99, c1=5.731, c2=-47.647, g=-47.647
>100, c1=5.759, c2=-47.910, g=-47.910
>101, c1=5.987, c2=-48.013, g=-48.013
>102, c1=6.169, c2=-48.044, g=-48.044
>103, c1=6.134, c2=-48.092, g=-48.092
>104, c1=5.982, c2=-48.206, g=-48.206
>105, c1=5.861, c2=-48.420, g=-48.420
>106, c1=6.126, c2=-48.600, g=-48.600
>107, c1=6.207, c2=-48.811, g=-48.811
>108, c1=6.084, c2=-48.808, g=-48.808
>109, c1=6.288, c2=-49.008, g=-49.008
>110, c1=6.266, c2=-49.166, g=-49.166
>111, c1=6.362, c2=-49.251, g=-49.251
>112, c1=6.163, c2=-49.310, g=-49.310
>113, c1=6.039, c2=-49.365, g=-49.365
>114, c1=6.255, c2=-49.361, g=-49.361
>115, c1=6.421, c2=-49.465, g=-49.465
>116, c1=6.814, c2=-49.692, g=-49.692
>117, c1=6.426, c2=-49.804, g=-49.804
>118, c1=6.367, c2=-49.900, g=-49.900
>119, c1=6.599, c2=-50.052, g=-50.052
>120, c1=6.335, c2=-50.212, g=-50.212
>121, c1=6.512, c2=-50.155, g=-50.155
>122, c1=6.642, c2=-50.224,



>Saved: generated_plot_0194.png and wgan_results\model_0194.h5
>195, c1=8.362, c2=-51.101, g=-51.101
>196, c1=8.192, c2=-51.072, g=-51.072
>197, c1=8.007, c2=-50.921, g=-50.921
>198, c1=7.848, c2=-50.908, g=-50.908
>199, c1=7.853, c2=-51.053, g=-51.053
>200, c1=7.928, c2=-51.043, g=-51.043
>201, c1=8.236, c2=-50.889, g=-50.889
>202, c1=8.186, c2=-50.720, g=-50.720
>203, c1=8.112, c2=-50.599, g=-50.599
>204, c1=8.092, c2=-50.447, g=-50.447
>205, c1=8.156, c2=-50.533, g=-50.533
>206, c1=7.939, c2=-50.422, g=-50.422
>207, c1=8.114, c2=-50.524, g=-50.524
>208, c1=8.218, c2=-50.413, g=-50.413
>209, c1=8.014, c2=-50.385, g=-50.385
>210, c1=8.261, c2=-50.363, g=-50.363
>211, c1=8.338, c2=-50.261, g=-50.261
>212, c1=8.240, c2=-50.092, g=-50.092
>213, c1=8.341, c2=-49.936, g=-49.936
>214, c1=8.627, c2=-49.831, g=-49.831
>215, c1=8.342, c2=-49.809, g=-49.809
>216, c1=8.241, c2=-49.766, g=-49.766
>217, c1=8.371, c2=-49.725, g=-49.725
>218, c1=8.481, c2=-49.602, g=-49.602
>219, c1=8.253, c2=-49.54



>Saved: generated_plot_0291.png and wgan_results\model_0291.h5
>292, c1=8.888, c2=-41.898, g=-41.898
>293, c1=8.603, c2=-41.698, g=-41.698
>294, c1=8.747, c2=-41.574, g=-41.574
>295, c1=9.236, c2=-41.473, g=-41.473
>296, c1=9.135, c2=-41.351, g=-41.351
>297, c1=9.127, c2=-41.213, g=-41.213
>298, c1=9.101, c2=-41.065, g=-41.065
>299, c1=9.228, c2=-41.029, g=-41.029
>300, c1=9.133, c2=-40.942, g=-40.942
>301, c1=9.258, c2=-40.779, g=-40.779
>302, c1=9.182, c2=-40.670, g=-40.670
>303, c1=9.387, c2=-40.493, g=-40.493
>304, c1=9.070, c2=-40.322, g=-40.322
>305, c1=8.978, c2=-40.243, g=-40.243
>306, c1=8.921, c2=-40.180, g=-40.180
>307, c1=9.076, c2=-39.993, g=-39.993
>308, c1=8.900, c2=-39.890, g=-39.890
>309, c1=9.343, c2=-39.806, g=-39.806
>310, c1=8.989, c2=-39.701, g=-39.701
>311, c1=8.998, c2=-39.642, g=-39.642
>312, c1=8.955, c2=-39.538, g=-39.538
>313, c1=9.089, c2=-39.406, g=-39.406
>314, c1=9.087, c2=-39.208, g=-39.208
>315, c1=9.060, c2=-39.139, g=-39.139
>316, c1=9.014, c2=-39.04



>Saved: generated_plot_0388.png and wgan_results\model_0388.h5
>389, c1=9.194, c2=-32.332, g=-32.332
>390, c1=9.137, c2=-32.205, g=-32.205
>391, c1=9.006, c2=-32.078, g=-32.078
>392, c1=9.001, c2=-31.937, g=-31.937
>393, c1=9.164, c2=-31.853, g=-31.853
>394, c1=9.179, c2=-31.797, g=-31.797
>395, c1=9.122, c2=-31.712, g=-31.712
>396, c1=9.281, c2=-31.680, g=-31.680
>397, c1=9.076, c2=-31.543, g=-31.543
>398, c1=9.290, c2=-31.464, g=-31.464
>399, c1=9.216, c2=-31.412, g=-31.412
>400, c1=9.045, c2=-31.313, g=-31.313
>401, c1=9.156, c2=-31.262, g=-31.262
>402, c1=9.101, c2=-31.225, g=-31.225
>403, c1=8.927, c2=-31.195, g=-31.195
>404, c1=9.268, c2=-31.130, g=-31.130
>405, c1=9.244, c2=-31.059, g=-31.059
>406, c1=9.117, c2=-30.978, g=-30.978
>407, c1=9.225, c2=-30.889, g=-30.889
>408, c1=9.042, c2=-30.827, g=-30.827
>409, c1=9.442, c2=-30.788, g=-30.788
>410, c1=9.141, c2=-30.744, g=-30.744
>411, c1=9.226, c2=-30.643, g=-30.643
>412, c1=9.160, c2=-30.577, g=-30.577
>413, c1=8.947, c2=-30.51



>Saved: generated_plot_0485.png and wgan_results\model_0485.h5
>486, c1=8.913, c2=-26.587, g=-26.587
>487, c1=8.827, c2=-26.540, g=-26.540
>488, c1=8.916, c2=-26.468, g=-26.468
>489, c1=9.006, c2=-26.445, g=-26.445
>490, c1=8.831, c2=-26.421, g=-26.421
>491, c1=8.738, c2=-26.389, g=-26.389
>492, c1=8.953, c2=-26.352, g=-26.352
>493, c1=8.937, c2=-26.298, g=-26.298
>494, c1=8.883, c2=-26.229, g=-26.229
>495, c1=9.210, c2=-26.201, g=-26.201
>496, c1=9.060, c2=-26.204, g=-26.204
>497, c1=8.852, c2=-26.183, g=-26.183
>498, c1=8.848, c2=-26.149, g=-26.149
>499, c1=8.884, c2=-26.066, g=-26.066
>500, c1=8.717, c2=-26.002, g=-26.002
>501, c1=9.072, c2=-26.026, g=-26.026
>502, c1=8.829, c2=-26.022, g=-26.022
>503, c1=8.651, c2=-25.981, g=-25.981
>504, c1=8.873, c2=-25.938, g=-25.938
>505, c1=8.932, c2=-25.958, g=-25.958
>506, c1=8.839, c2=-25.908, g=-25.908
>507, c1=8.715, c2=-25.870, g=-25.870
>508, c1=8.919, c2=-25.787, g=-25.787
>509, c1=8.844, c2=-25.724, g=-25.724
>510, c1=8.903, c2=-25.70

KeyboardInterrupt: 

#### References
1. [gans-in-action](https://github.com/GANs-in-Action/gans-in-action)
2. [Machine Learning Mastery](https://machinelearningmastery.com/how-to-code-a-wasserstein-generative-adversarial-network-wgan-from-scratch/)