GANs learn to model the input distribution by training two competing (and
cooperating) networks, referred to as the generator and the discriminator (the
latter is sometimes known as the critic). The role of the generator is to keep
figuring out how to generate fake data or signals (for example, audio and images)
that can fool the discriminator. Meanwhile, the discriminator is trained to
distinguish between fake and real signals. Ideally, as training progresses, the
discriminator will no longer be able to tell the synthetically generated data apart
from the real data. From there, the discriminator can be discarded, and the generator
can be used to create new, realistic signals that have never been observed before.
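The adversarial game above is commonly written, in the standard formulation from the original GAN paper, as a minimax objective. Here $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for a noise vector $z$, $p_{\text{data}}$ is the real-data distribution, and $p_z$ is the noise distribution:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator maximizes $V$ by pushing $D(x)$ toward 1 for real data and $D(G(z))$ toward 0 for fakes; the generator minimizes $V$ by pushing $D(G(z))$ toward 1.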

One of the most challenging aspects of GANs is achieving stable training of the
generator-discriminator network. There must be a healthy competition between the
generator and the discriminator for both networks to learn simultaneously. Since
the loss function is computed from the output of the discriminator, the
discriminator's parameters tend to update quickly. When the discriminator
converges faster, the generator no longer receives sufficient gradient updates for
its parameters and fails to converge. Besides being hard to train, GANs can also
suffer from partial or total mode collapse, a situation in which the generator
produces nearly identical outputs for different latent encodings.
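To see why a discriminator that converges too fast starves the generator, consider the gradient that reaches the generator through the discriminator's output logit $s$, where $D(G(z)) = \sigma(s)$. The sketch below is a one-scalar illustration in plain Python (an assumption for clarity, not a full network). It compares the generator term of the original minimax loss, $\log(1 - D(G(z)))$, with the alternative $-\log D(G(z))$ that results from labeling fake samples 1.0 during the generator's training step:

```python
import math


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def saturating_grad(logit: float) -> float:
    """d/ds of log(1 - sigmoid(s)), the generator term of the minimax objective."""
    return -sigmoid(logit)


def non_saturating_grad(logit: float) -> float:
    """d/ds of -log(sigmoid(s)), the loss when a fake sample is labeled 1.0."""
    return -(1.0 - sigmoid(logit))


# A discriminator that has already "won": it scores a fake sample at D(G(z)) = 0.001.
logit = math.log(0.001 / 0.999)  # inverse sigmoid of 0.001

print(f"D(G(z))             = {sigmoid(logit):.4f}")
print(f"saturating grad     = {saturating_grad(logit):.4f}")
print(f"non-saturating grad = {non_saturating_grad(logit):.4f}")
```

When the discriminator confidently rejects a fake, the first gradient is about 0.001 in magnitude while the second is about 0.999. This is one commonly cited reason for training the generator against the 1.0 label rather than directly minimizing $\log(1 - D(G(z)))$.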

# Principles of GANs
A GAN is made up of two networks: a generator and a discriminator. The input to
the generator is noise, and the output is a synthesized signal.

Meanwhile, the discriminator's input is either a real or a synthesized signal.
Real signals are sampled from the true data, while fake signals come from the
generator. All real signals are labeled 1.0 (that is, 100% probability of being
real), while all synthesized signals are labeled 0.0 (that is, 0% probability of
being real). Since the labeling process is automated, GANs are still considered
part of the unsupervised learning approach in deep learning.
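The automatic labeling can be sketched as follows: each real sample gets 1.0, each generated sample gets 0.0, and no annotation is involved. Plain Python floats stand in for signals. The function name `make_discriminator_batch`, the hypothetical `toy_generator`, and the uniform noise in [-1, 1] are illustrative assumptions, not the source's implementation:

```python
import random
from typing import Callable, List, Sequence, Tuple


def make_discriminator_batch(
    real_batch: Sequence[float],
    generator: Callable[[List[float]], float],
    latent_dim: int,
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """Pair each real sample (label 1.0) with one generated sample (label 0.0)."""
    n = len(real_batch)
    # Noise vectors z for the generator; uniform in [-1, 1] is an assumption.
    noise = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)] for _ in range(n)]
    fake_batch = [generator(z) for z in noise]
    x = list(real_batch) + fake_batch
    # Labels come for free: real -> 1.0, fake -> 0.0.
    y = [1.0] * n + [0.0] * n
    return x, y


def toy_generator(z: List[float]) -> float:
    """Hypothetical stand-in generator: averages the noise vector into one value."""
    return sum(z) / len(z)


rng = random.Random(0)
real = [0.9, 1.1, 1.0, 0.95]
x, y = make_discriminator_batch(real, toy_generator, latent_dim=8, rng=rng)
print(list(zip(x, y)))
```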

The objective of the discriminator is to learn from this labeled dataset how to
distinguish real signals from fake ones. During this part of GAN training, only
the discriminator parameters are updated. Like a typical binary classifier, the
discriminator is trained to output a confidence value between 0.0 and 1.0
indicating how close a given input signal is to a real one. However, this is only
half of the story.
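Training a binary classifier like this is typically done with binary cross-entropy; assuming that loss here, the discriminator's objective for one batch can be sketched in plain Python. The clipping constant `eps` is an assumption to avoid `log(0)`:

```python
import math
from typing import Sequence


def binary_cross_entropy(
    y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-7
) -> float:
    """Mean binary cross-entropy between labels (1.0 real, 0.0 fake) and predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1.0, 1.0, 0.0, 0.0]  # two real signals, then two fakes
good = binary_cross_entropy(labels, [0.9, 0.8, 0.1, 0.2])  # confident and correct
bad = binary_cross_entropy(labels, [0.1, 0.2, 0.9, 0.8])   # confident and wrong
print(f"correct: {good:.4f}  wrong: {bad:.4f}")
```

Minimizing this loss with respect to the discriminator's parameters pushes its predictions toward 1.0 on real signals and 0.0 on fakes.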

At regular intervals, the generator pretends that its output is a genuine signal
and asks the GAN to label it 1.0. When the fake signal is then presented to the
discriminator, the discriminator will naturally classify it as fake, with a
prediction close to 0.0. The optimizer computes the generator parameter updates
from the presented label (that is, 1.0) together with the discriminator's own
prediction. In other words, the discriminator's doubt about its prediction,
measured against the 1.0 label, is what drives the generator update. This time,
the gradients backpropagate from the last layer of the discriminator down to the
first layer of the generator. However, in most implementations, the discriminator
parameters are temporarily frozen during this phase of training. The generator
uses the gradients to update its parameters and improve its ability to synthesize
fake signals.
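The generator phase can be sketched with a deliberately tiny, hypothetical model: a linear generator $x = a z + b$ and a frozen logistic discriminator $D(x) = \sigma(w x + c)$. Both are assumptions chosen so the chain rule is visible by hand, not the networks the text refers to. The loss is binary cross-entropy against the target 1.0, i.e. $-\log D(G(z))$. Gradients flow back through the discriminator to its input and then into $a$ and $b$, while $w$ and $c$ are never updated:

```python
import math
import random


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def generator_step(a, b, w, c, zs, lr=0.1):
    """One generator update against a frozen discriminator.

    Toy generator:      x = a * z + b
    Toy discriminator:  D(x) = sigmoid(w * x + c)   (w, c are never updated here)
    Loss:               binary cross-entropy against target 1.0, i.e. -log D(x)
    """
    grad_a = grad_b = 0.0
    for z in zs:
        x = a * z + b              # generator forward pass
        d = sigmoid(w * x + c)     # discriminator forward pass (frozen)
        dloss_dx = -(1.0 - d) * w  # gradient flows back through D to its input...
        grad_a += dloss_dx * z     # ...and on into the generator's parameters
        grad_b += dloss_dx
    n = len(zs)
    return a - lr * grad_a / n, b - lr * grad_b / n


def mean_score(a, b, w, c, zs):
    """Average discriminator score D(G(z)) over a batch of noise samples."""
    return sum(sigmoid(w * (a * z + b) + c) for z in zs) / len(zs)


rng = random.Random(0)
zs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
w, c = 2.0, -4.0  # hypothetical frozen discriminator: scores x > 2 as "real"
a, b = 1.0, 0.0   # initial generator parameters

before = mean_score(a, b, w, c, zs)
for _ in range(200):
    a, b = generator_step(a, b, w, c, zs)
after = mean_score(a, b, w, c, zs)
print(f"mean D(G(z)): {before:.3f} -> {after:.3f}")
```

Only the generator's parameters move, yet the frozen discriminator's score on the fakes rises, which is exactly the signal the generator phase exploits.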

# DCGAN
DCGAN (Deep Convolutional GAN) follows these design principles:
• Use convolutions with strides > 1 instead of MaxPooling2D or UpSampling2D.
With strides > 1, the CNN learns how to resize the feature maps.
• Avoid using Dense layers; use CNN layers throughout. The only Dense layer is
the first layer of the generator, which accepts the z-vector. The output of the
Dense layer is reshaped and becomes the input of the succeeding CNN layers.
• Use Batch Normalization (BN) to stabilize learning by normalizing the input
to each layer to have zero mean and unit variance. Do not use BN in the
generator output layer or the discriminator input layer. In the
implementation example to be presented here, no batch normalization
is used in the discriminator.
• Use Rectified Linear Unit (ReLU) activations in all layers of the generator
except the output layer, where tanh is used. In the implementation example to
be presented here, sigmoid is used instead of tanh in the output of the
generator since it generally results in more stable training on MNIST
digits.
• Use Leaky ReLU in all layers of the discriminator. Unlike ReLU, which zeroes
out all outputs when the input is less than zero, Leaky ReLU outputs
alpha × input for negative inputs, so a small gradient equal to alpha still
flows back. In the following example, alpha = 0.2.
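Two of the principles above can be checked with a few lines of arithmetic. The sketch below assumes Keras-style padding rules. For a strided convolution, "same" padding gives $\lceil n / s \rceil$ and "valid" gives $\lfloor (n - k) / s \rfloor + 1$; for a transposed convolution with "same" padding, the output is $n \cdot s$. It traces hypothetical MNIST-sized shapes: a 28×28 input, a 7×7 reshaped Dense output, and a kernel size of 5 are illustrative assumptions. It also shows Leaky ReLU with alpha = 0.2 keeping a nonzero gradient for negative inputs:

```python
import math


def leaky_relu(x: float, alpha: float = 0.2) -> float:
    """Identity for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.2) -> float:
    """Derivative: 1 for x > 0, alpha otherwise (never exactly zero)."""
    return 1.0 if x > 0 else alpha


def conv_output_size(size: int, kernel: int, stride: int, padding: str = "same") -> int:
    """Spatial size after a strided convolution (Keras-style 'same'/'valid')."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid'


def conv_transpose_output_size(size: int, stride: int) -> int:
    """Spatial size after a transposed convolution with 'same' padding."""
    return size * stride


# Discriminator: a 28x28 input downsampled by stride-2 convolutions (no pooling).
disc_sizes = [28]
for _ in range(3):
    disc_sizes.append(conv_output_size(disc_sizes[-1], kernel=5, stride=2))

# Generator: Dense output reshaped to 7x7, then upsampled by stride-2 transposed convs.
gen_sizes = [7]
for _ in range(2):
    gen_sizes.append(conv_transpose_output_size(gen_sizes[-1], stride=2))

print("discriminator:", disc_sizes)
print("generator:    ", gen_sizes)
print("leaky_relu(-1.0) =", leaky_relu(-1.0), "grad =", leaky_relu_grad(-1.0))
```

The strides alone take the discriminator from 28 down to 4 and the generator from 7 up to 28, which is why no MaxPooling2D or UpSampling2D layers are needed.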
