# Lab 10: Generative Adversarial Networks

In today's lab we'll take a quick look at **generative adversarial networks**, or GANs. GANs are a relatively new neural net architecture, first introduced in 2014 by Goodfellow et al. [1]. The main problem that GANs solve is that of generating new pieces of data: given a training set, can a neural network generate completely new data that are similar to those found in the training set? As it turns out, the answer is yes, and today we'll see how exactly these networks work.

## The Main Idea

The big problem that researchers faced when trying to create a neural net that generates new pieces of data was a matter of training: how do we evaluate whether the network is getting better at generating (let's say) images? Supervised learning would work well if we wanted to teach a neural net to reproduce an image (such networks are called autoencoders), but we would lack the means to evaluate new images created by the network. With unsupervised learning, it might be possible to teach the network to generate images that fall into the same cluster as images from the training set, but a simple architecture wouldn't suffice here, either. Reinforcement learning, like supervised learning, is not directly suitable for this problem, since there's once again no ready-made signal that we could use to tell the network how it is performing.

The solution came with a novel architecture that comprises two networks: a generator and a discriminator. These two nets are joined in an eternal adversarial (or zero-sum) game: the good performance of one net is the other net's failure. The general architecture of GANs is shown in the figure below:

![GAN architecture](resources/gan_architecture.png)

As you can see, there are two networks in the system. The generator's task is to generate a new image (or any other type of data) from a random input, or noise. The image is then used as an input to the discriminator. The discriminator has one job: to tell apart real images (from the training set) and images produced by the generator. It is a simple two-class classifier. With this architecture, we don't need to work out how to train the generator and the discriminator separately; we can train them together.
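
To make the data flow concrete, here is a minimal sketch in plain Python (no Keras, so it runs anywhere). Everything in it is an illustrative assumption rather than the lab's actual code: the names `make_layer`, `generator` and `discriminator`, the sizes `NOISE_DIM` and `IMAGE_DIM`, and the single-layer "networks" with random, untrained weights. It only shows the shapes involved: noise goes in, an image comes out, and the discriminator turns that image into one probability.

```python
import math
import random

NOISE_DIM = 4   # length of the random input vector (hypothetical value)
IMAGE_DIM = 9   # a tiny 3x3 "image", flattened (hypothetical value)


def make_layer(n_in, n_out, rng):
    """Create a dense layer as n_out rows of n_in small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def generator(noise, weights):
    """Map a noise vector to a fake image with pixel values in (-1, 1)."""
    return [math.tanh(sum(w * z for w, z in zip(row, noise))) for row in weights]


def discriminator(image, weights):
    """Map an image to one number in (0, 1): the estimated probability it is real."""
    logit = sum(w * p for w, p in zip(weights[0], image))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
g_weights = make_layer(NOISE_DIM, IMAGE_DIM, rng)   # untrained generator
d_weights = make_layer(IMAGE_DIM, 1, rng)           # untrained discriminator

noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
fake_image = generator(noise, g_weights)         # noise -> image
p_real = discriminator(fake_image, d_weights)    # image -> probability of "real"
```

A real generator and discriminator would have several layers and trained weights, but the interfaces stay the same.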

## Training a GAN

The main problem with training the generator on its own is that we don't have the expected inputs and outputs. We can generate sample noise as an input, but we cannot determine if the resulting output is a fake image or just random noise - we don't know the expected output for a given input. We could use a human to label the output of the generator as good enough or not, but that process would be tiresome and impractical. Therefore, we introduce the discriminator that substitutes that human and learns to tell the difference between a real and a fake image. With this setup, we know both the inputs and the expected outputs:

* if the discriminator gets a sample image from the training set, its output should be 1 (real image)
* if the discriminator gets an image generated by the generator, its output should be 0 (fake image).
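
The two rules above translate directly into training targets. The sketch below builds them for a small batch and scores two sets of made-up discriminator outputs with binary cross-entropy, a common loss for two-class problems. The helper names and the example scores are assumptions for illustration; the lab code may organize this differently.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


def discriminator_targets(n_real, n_fake):
    """Expected outputs for n_real training images followed by n_fake generated ones."""
    return [1] * n_real + [0] * n_fake


targets = discriminator_targets(n_real=2, n_fake=2)   # [1, 1, 0, 0]

# Hypothetical discriminator outputs for the same four images.
confident = [0.9, 0.8, 0.1, 0.2]   # mostly right, so the loss is low
clueless = [0.5, 0.5, 0.5, 0.5]    # cannot decide, so the loss equals ln(2)

loss_confident = binary_cross_entropy(targets, confident)
loss_clueless = binary_cross_entropy(targets, clueless)
```

The lower the loss, the better the discriminator separates real images from generated ones.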

Therefore, we don't train the generator independently. Instead, we join it with the discriminator in a single net whose input is random noise; the fake image itself is hidden in the middle of the net as the output of the generator. This architecture is represented as the lower branch of the figure above. When the discriminator learns from this branch, the expected output is 0 (since the image is fake). When the generator learns from it, the discriminator's weights are kept frozen and the target is flipped to 1, because the generator only improves when the discriminator mistakes its output for a real image.

Generally, the GAN training process consists of the following steps:
1. pre-train the discriminator to recognize real images
2. train the generator through the combined net, with random noise as input, the discriminator's weights frozen, and 1 as the target output, for some number of samples
3. train the discriminator on the same number of samples from the training set (expected output 1) and on generated images (expected output 0)
4. repeat steps 2 and 3 until you are satisfied with the result
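
The alternating schedule can be sketched on a deliberately tiny problem. Below, the "images" are single numbers drawn around 4.0, the generator is the line `x = a * z + b`, and the discriminator is a logistic unit `D(x) = sigmoid(w * x + c)`. All names, hyperparameters and the hand-written gradients are assumptions made for this toy; it is not the lab's Keras code. Note that the generator update uses 1 as its target and leaves the discriminator's parameters untouched, which is the common convention in GAN implementations: the generator improves only when the discriminator labels its output as real.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_step(w, c, samples, target, lr):
    """One gradient step on binary cross-entropy for the discriminator."""
    grad_w = grad_c = 0.0
    for x in samples:
        err = sigmoid(w * x + c) - target   # d(loss)/d(logit) for cross-entropy
        grad_w += err * x
        grad_c += err
    n = len(samples)
    return w - lr * grad_w / n, c - lr * grad_c / n


def generator_step(a, b, w, c, noise, lr):
    """One gradient step for the generator; the discriminator (w, c) is held fixed."""
    grad_a = grad_b = 0.0
    for z in noise:
        err = sigmoid(w * (a * z + b) + c) - 1.0   # target 1: "make D call it real"
        grad_a += err * w * z
        grad_b += err * w
    n = len(noise)
    return a - lr * grad_a / n, b - lr * grad_b / n


def train_toy_gan(rounds=200, batch=16, lr=0.05, pretrain_steps=20, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters

    def sample_real():
        return [rng.gauss(4.0, 0.5) for _ in range(batch)]

    def sample_noise():
        return [rng.gauss(0.0, 1.0) for _ in range(batch)]

    # Step 1: pre-train the discriminator (here on real samples and on the
    # output of the still-untrained generator).
    for _ in range(pretrain_steps):
        w, c = discriminator_step(w, c, sample_real(), 1.0, lr)
        fakes = [a * z + b for z in sample_noise()]
        w, c = discriminator_step(w, c, fakes, 0.0, lr)

    for _ in range(rounds):
        # Step 2: update the generator through the (frozen) discriminator.
        a, b = generator_step(a, b, w, c, sample_noise(), lr)
        # Step 3: update the discriminator on real and on generated samples.
        w, c = discriminator_step(w, c, sample_real(), 1.0, lr)
        fakes = [a * z + b for z in sample_noise()]
        w, c = discriminator_step(w, c, fakes, 0.0, lr)
    return (a, b), (w, c)


(a, b), (w, c) = train_toy_gan()
```

Don't expect this toy to converge neatly: a linear discriminator is very weak, and GAN training tends to oscillate. The point is only the order of the updates, which carries over to the Keras version.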

Unlike with simple classification, your goal is not to maximize the accuracy of the discriminator. Ideally, you want to keep it at around 50%: at that point the discriminator cannot tell whether an image is real or fake, and it performs no better than chance.
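
A small helper makes this number easy to track during training. The function name, the 0.5 decision threshold and the example scores below are assumptions for illustration.

```python
def discriminator_accuracy(real_scores, fake_scores, threshold=0.5):
    """Fraction of images the discriminator classifies correctly.

    real_scores: discriminator outputs for training-set images (correct if >= threshold)
    fake_scores: discriminator outputs for generated images (correct if < threshold)
    """
    correct = sum(s >= threshold for s in real_scores)
    correct += sum(s < threshold for s in fake_scores)
    return correct / (len(real_scores) + len(fake_scores))


# Hypothetical scores late in training: the discriminator is guessing.
real_scores = [0.62, 0.41, 0.55, 0.38]
fake_scores = [0.47, 0.58, 0.44, 0.61]
accuracy = discriminator_accuracy(real_scores, fake_scores)   # 4 of 8 correct
```

Computing the accuracy on a balanced batch (as many real as generated images) keeps the 50% reference point meaningful.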

## What can go wrong?

Although the theory of GANs is simple enough, implementing them is not that straightforward. And, just like with any simple system, there are a handful of ways in which GANs can go wrong. Today we'll only mention the two most common problems.

**The discriminator overpowering the generator** happens when, no matter what the generator produces, the discriminator can always tell it apart from real images. In practice, this means that the discriminator's accuracy stays at around 100%. This is a bad sign, since the generator gets no useful information about how it is doing: whatever it does, the discriminator tells it that it's wrong, and so the generator cannot get better at all. The most common causes of this problem are a topology that is too simplistic for the given problem, or a pre-training phase that was too long, so that the discriminator was overtrained on the training set.
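
One simple way to notice this during training is to watch the discriminator's recent accuracy. The check below is a sketch under assumed settings: the function name, the window of 5 measurements and the 0.95 threshold are arbitrary choices, not values from the lab.

```python
def discriminator_dominates(accuracy_history, window=5, threshold=0.95):
    """Return True if the last `window` accuracy values are all at or above `threshold`."""
    if len(accuracy_history) < window:
        return False  # not enough measurements yet
    return all(acc >= threshold for acc in accuracy_history[-window:])


# Hypothetical accuracy curves, one value per training round.
healthy = [0.90, 0.72, 0.61, 0.55, 0.58, 0.52]
stuck = [0.91, 0.97, 0.99, 1.00, 0.99, 1.00]

healthy_flag = discriminator_dominates(healthy)   # False
stuck_flag = discriminator_dominates(stuck)       # True
```

If the flag is raised, shortening the pre-training or revisiting the topology are the first things to try, in line with the causes listed above.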

**Mode collapse** is another common problem that is caused by the nature of the generator. Generators, similarly to some people, are dumb and lazy. This means that they will try to find the easiest way out of their task even if it makes no sense. This is especially easy for them to do since they have no idea what sort of problem they are trying to solve: the only things that exist for them are the noise on the input and a signal from the discriminator that tells them whether they are doing fine or not. And so, it can happen that the generator finds a single image that evades the discriminator's attention and reproduces the same image over and over again. Or it may capitalize on another weakness of the discriminator and use it to produce images that the discriminator recognizes as real. This can also happen when the discriminator focuses only on a small part of the image to determine if it is real or not.
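
A rough way to spot the "same image over and over" symptom is to measure how different the images in a generated batch are from one another. The sketch below uses the mean pairwise Euclidean distance as a diversity score; the function name and the two-pixel example "images" are assumptions for illustration, and what counts as "too low" depends on the dataset.

```python
import math
from itertools import combinations


def mean_pairwise_distance(images):
    """Average Euclidean distance between all pairs of flattened images."""
    pairs = list(combinations(images, 2))
    if not pairs:
        return 0.0  # fewer than two images: no diversity to measure
    total = 0.0
    for first, second in pairs:
        total += math.sqrt(sum((p - q) ** 2 for p, q in zip(first, second)))
    return total / len(pairs)


# Hypothetical batches of tiny two-pixel "images".
varied = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
collapsed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]

diversity_varied = mean_pairwise_distance(varied)
diversity_collapsed = mean_pairwise_distance(collapsed)   # identical images give 0.0
```

A score that drops toward zero while the discriminator is still being fooled is a hint that the generator has collapsed onto a few outputs.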

## GANs in action

Now that we have the theoretical basics figured out, we can look at the implementation of a simple GAN. Of course, nowadays there exists a myriad of GAN variations that combine GANs with other methods. A comprehensive list of them with examples in Keras can be found [here](https://github.com/eriklindernoren/Keras-GAN) [2]. Today we'll look at the original GAN architecture, which will also run relatively quickly on your computers.

We will use the MNIST dataset of hand-written digits from 0 to 9 (the dataset is available through Keras, so there's no need for you to download it separately). You can access the code of the GAN [here](codes/lab10-gan.py). If everything goes according to plan, we will teach our network to generate images of digits (compare the output of the untrained and the trained network):

![Output of an untrained GAN network](resources/gan_untrained.png)

![Output of the trained GAN network](resources/gan_trained.png)

## References
1. Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative adversarial nets." In Advances in Neural Information Processing Systems, pp. 2672-2680. 2014. [available online](http://papers.nips.cc/paper/5423-generative-adversarial-nets)
2. eriklindernoren: Collection of Keras implementations of Generative Adversarial Networks (GANs) suggested in research papers. GitHub [available online](https://github.com/eriklindernoren/Keras-GAN)