# GANs (Generative Adversarial Networks)

Now it's time for a more advanced type of neural network: GANs!
Generative adversarial networks (GANs) are deep learning architectures that pit two neural networks against each other in order to generate new instances of data. They are widely used for image generation.

A very famous application of GANs is the [Face Generator (This Person Does Not Exist)](https://this-person-does-not-exist.com/en).

**TASK:** Generating entirely new images of handwritten digits with GANs


**WHAT YOU WILL LEARN:**

- how to implement a basic GAN
- how to implement the loss function for GANs (the min-max game) using PyTorch's built-in cross-entropy implementation
- how to plot the loss for the Generator and the Discriminator
- why normalization layers matter in more complex architectures like GANs and how they affect training stability
- how to make your code more modular (especially useful for very big and complex architectures)


**TO DO:** Read and understand the following code. Run the cells, analyse the results and, if everything is clear, follow the instructions in the exercises part.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
from torch.utils.tensorboard import SummaryWriter  # to print to tensorboard
import matplotlib.pyplot as plt
from torchvision.utils import make_grid
from tqdm.auto import tqdm
from PIL import Image


## The simplest GAN architecture



### Data 

In this part, we will build a function that displays images from a tensor representing a batch of images (digits).

Then we will load the MNIST dataset, which is later wrapped in a DataLoader. As with Autoencoders, we don't need two separate datasets (train and test), since we are dealing with unsupervised learning!

In [None]:
def show(tensor, ch=1, size=(28,28), num=16):
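  # tensor: batch of flattened images, e.g. 128 x 784
  data=tensor.detach().cpu().view(-1,ch,*size) # 128 x 1 x 28 x 28
  # make_grid tiles the first `num` images (4 per row) into one 3 x H x W image;
  # permute moves the channels last (H x W x 3), as plt.imshow expects
  grid = make_grid(data[:num], nrow=4).permute(1,2,0)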
  plt.imshow(grid)
  plt.show()

In [None]:
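# named `transform` so that the imported `transforms` module is not overwritten
transform = transforms.Compose(
    [transforms.ToTensor(),  # PIL image -> float tensor with values in [0, 1]
     ]
)

dataset = datasets.MNIST(root="dataset/", transform=transform, download=True)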


### Hyperparameters and other settings


In [None]:
EPOCHS = 20
LEARNING_RATE = 0.00001 # learning rate - step size used when updating the network parameters
# LEARNING_RATE = 0.1
BATCH_SIZE = 128


cur_step = 0
info_step = 300 # every info_step steps: print the mean loss values and show generated samples
mean_gen_loss = 0
mean_disc_loss = 0
z_dim = 64 #latent space dimension

#device = "cuda" if torch.cuda.is_available() else "cpu"
device = "cpu"

dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

# steps per epoch: 60000 / 128 = 468.75, rounded up to 469
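As a quick check of the step count above, the sketch below recomputes the number of batches per epoch and the size of the last, incomplete batch. The dataset size of 60000 is the MNIST training split; the variable names are illustrative only.

```python
import math

DATASET_SIZE = 60_000  # MNIST training split
BATCH_SIZE = 128

# DataLoader keeps the last incomplete batch by default (drop_last=False),
# so the number of steps per epoch is rounded up.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
last_batch_size = DATASET_SIZE - (steps_per_epoch - 1) * BATCH_SIZE

print(steps_per_epoch, last_batch_size)  # 469 96
```

The smaller last batch is why the training loop reads the batch size from each batch instead of assuming `BATCH_SIZE`.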


### Generator Model

The key part: implementing the Generator network.

This time, to make the code more modular, we create `genBlock()`, which builds a sequence of layers (`Linear`, `ReLU`), instead of rewriting it several times. This is especially useful for more complex architectures.


Pay special attention to input and output dimensions of each layer!
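To keep track of those dimensions, the layer widths can be listed before building anything. This is a small sketch using the default sizes of the `Generator` in this notebook (`z_dim=64`, `h_dim=128`, `i_dim=784`); the helper name `layer_dims` is hypothetical.

```python
def layer_dims(z_dim=64, h_dim=128, i_dim=784):
    """Return (input, output) sizes of consecutive Linear layers."""
    widths = [z_dim, h_dim, h_dim * 2, h_dim * 4, h_dim * 8, i_dim]
    # Pair every width with the next one: the output of one layer
    # must match the input of the following layer.
    return list(zip(widths[:-1], widths[1:]))

for inp, out in layer_dims():
    print(f"Linear({inp}, {out})")
```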




In [None]:
class Generator(nn.Module):
    def __init__(self, z_dim=64, i_dim=784, h_dim=128):
        super().__init__()
        self.gen = nn.Sequential(
            self.genBlock(z_dim, h_dim), # 64 -> 128
            self.genBlock(h_dim, h_dim*2), # 128 -> 256
            self.genBlock(h_dim*2, h_dim*4), # 256 -> 512
            self.genBlock(h_dim*4, h_dim*8), # 512 -> 1024
            nn.Linear(h_dim*8, i_dim), # 1024 -> 784 (28x28)
            nn.Sigmoid(), # get pixel values between 0 and 1
        )

    def genBlock(self, inp, out):
        return nn.Sequential(
            nn.Linear(inp, out),
            #nn.BatchNorm1d(out),
            nn.ReLU(inplace=True)
        )

    def forward(self, noise):
        return self.gen(noise)

def gen_noise(number, z_dim):
  return torch.randn(number, z_dim).to(device)

gen = Generator(z_dim).to(device)
gen

The Generator takes random noise as its input and transforms it into a meaningful output.


Let's first look at a batch of real digits, then generate some noise and pass it to the Generator. Since the Generator is not trained yet, its output should still look like noise.

In [None]:
x,y = next(iter(dataloader))
print(x.shape, y.shape)
print(y[:10])
show(x)

noise = gen_noise(BATCH_SIZE, z_dim)
fake = gen(noise)
show(fake)
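Why does the untrained output look like grey noise? A rough sketch with a smaller stand-in network (not the `Generator` above; the layer sizes here are arbitrary) suggests the reason: with PyTorch's default initialization the values entering `Sigmoid` stay fairly close to zero, so the pixels scatter around 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible sketch

# Stand-in for an untrained generator: noise -> hidden -> 784 pixels in (0, 1)
toy_gen = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 784),
    nn.Sigmoid(),
)

with torch.no_grad():
    fake = toy_gen(torch.randn(16, 64))

print(fake.shape)  # torch.Size([16, 784])
print(fake.min().item(), fake.max().item(), fake.mean().item())
```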

### Discriminator Model

The second key part: implementing the Discriminator network.


As before, to make the code more modular, we create `discBlock()`, which builds a sequence of layers (`Linear`, `LeakyReLU`), instead of rewriting it several times. This is especially useful for more complex architectures!


Pay special attention to input and output dimensions of each layer!
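One way to check those dimensions is to count parameters by hand: a `Linear(inp, out)` layer holds `inp * out` weights plus `out` biases. The sketch below does this for the default Discriminator sizes used in this notebook (`i_dim=784`, `h_dim=256`); the helper name `linear_params` is hypothetical.

```python
def linear_params(inp, out):
    """Number of trainable parameters in a Linear(inp, out) layer."""
    return inp * out + out  # weight matrix + bias vector

i_dim, h_dim = 784, 256
layers = [
    (i_dim, h_dim * 4),      # 784 -> 1024
    (h_dim * 4, h_dim * 2),  # 1024 -> 512
    (h_dim * 2, h_dim),      # 512 -> 256
    (h_dim, 1),              # 256 -> 1
]
total = sum(linear_params(inp, out) for inp, out in layers)
print(total)  # 1460225
```

Once the model is built, the result can be compared with `sum(p.numel() for p in disc.parameters())`.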

In [None]:
class Discriminator(nn.Module):
    def __init__(self, i_dim=784, h_dim=256):
        super().__init__()
        self.disc = nn.Sequential(
            self.discBlock(i_dim, h_dim*4), # 784 (28x28) -> 1024
            self.discBlock(h_dim*4, h_dim*2), # 1024 -> 512
            self.discBlock(h_dim*2, h_dim), # 512 -> 256
            nn.Linear(h_dim, 1), # 256 -> 1
            nn.Sigmoid() # get values between 0 and 1
        )

    def discBlock(self, inp, out):
        return nn.Sequential(
            nn.Linear(inp, out),
            nn.LeakyReLU(0.2)
        )


    def forward(self, image):
        return self.disc(image)


disc = Discriminator().to(device)
disc

### Optimizer 

Initialize an optimizer for each model with the previously selected learning rate.

In [None]:
gen_opt = torch.optim.Adam(gen.parameters(), lr=LEARNING_RATE)
disc_opt = torch.optim.Adam(disc.parameters(), lr=LEARNING_RATE)

### Generator Loss and Discriminator Loss

The most challenging part of designing GANs is implementing the loss function.
We will use cross entropy here.
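
For reference, the min-max game mentioned in the learning goals is usually written as the value function below, where $D(x)$ is the Discriminator's estimate that $x$ is real and $G(z)$ is the image generated from noise $z$. The Discriminator tries to maximise it, the Generator tries to minimise it. Both terms are log-probabilities of a binary decision (real or fake), which is why a binary cross-entropy loss fits.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```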

**CROSS ENTROPY** - a measure of the difference between two probability distributions.

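Binary cross entropy is implemented in PyTorch as `nn.BCELoss()`. It expects probabilities between 0 and 1, which is why the Discriminator ends with a `Sigmoid` layer.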
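
To make the definition concrete, here is a hand-written version of binary cross entropy for a single prediction, as a sketch only (the function name `bce` is hypothetical; in the notebook `nn.BCELoss()` does this for whole batches and averages the result by default).

```python
import math

def bce(pred, target):
    """Binary cross entropy for one probability `pred` and a 0/1 `target`."""
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# A confident, correct prediction gives a small loss ...
print(round(bce(0.9, 1), 4))  # 0.1054
# ... a confident, wrong one gives a large loss ...
print(round(bce(0.9, 0), 4))  # 2.3026
# ... and an undecided prediction gives log(2).
print(round(bce(0.5, 1), 4))  # 0.6931
```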



In [None]:
loss_func = nn.BCELoss()
#loss_func = nn.BCEWithLogitsLoss()

In [None]:
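#generator loss: instead of minimising log(1 - D(G(z))), we maximise log(D(G(z)))
#(BCE with all-ones targets equals -log(D(G(z))), so minimising it does exactly that)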
def calc_gen_loss(loss_func, gen, disc, number, z_dim):
    noise = gen_noise(number, z_dim) #number = batch size
    fake = gen(noise)
    pred = disc(fake) #predictions = output from discriminator 0-fake, 1-real
    target = torch.ones_like(pred) #vector with ones with dimensionality like pred
    gen_loss = loss_func(pred, target)
    return gen_loss 

#discriminator loss: maximise log(D(x)) + log(1 - D(G(z))),
#i.e. minimise BCE on real images (target 1) and on fake images (target 0)
def calc_disc_loss(loss_func, gen, disc, number, real, z_dim):
    noise = gen_noise(number, z_dim)
    fake = gen(noise)
    #detach fake images from the calculations when the discriminator is trained
    disc_fake = disc(fake.detach()) 
    disc_fake_targets = torch.zeros_like(disc_fake)
    disc_fake_loss = loss_func(disc_fake, disc_fake_targets)

    disc_real = disc(real)
    disc_real_targets = torch.ones_like(disc_real)
    disc_real_loss = loss_func(disc_real, disc_real_targets)
    
    disc_loss = (disc_fake_loss + disc_real_loss)/2     
    return disc_loss
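
The effect of `detach()` in `calc_disc_loss` can be seen on a toy example. This is a sketch with two small stand-in `Linear` layers (`toy_gen` and `toy_disc` are hypothetical names, not the models above): after backpropagating through a detached fake batch, only the discriminator has gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

toy_gen = nn.Linear(4, 3)   # stand-in generator
toy_disc = nn.Linear(3, 1)  # stand-in discriminator

fake = toy_gen(torch.randn(8, 4))

# detach() cuts the graph between generator and discriminator,
# so backward() stops at `fake` and never reaches toy_gen.
loss = toy_disc(fake.detach()).mean()
loss.backward()

gen_has_grad = toy_gen.weight.grad is not None
disc_has_grad = toy_disc.weight.grad is not None
print(gen_has_grad, disc_has_grad)  # False True
```

Without `detach()`, the generator's gradients would also be computed during the discriminator step, which is wasted work because `disc_opt` only updates the discriminator.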

### Training Loop


Time for the main training loop. With GANs you will need to wait for at least 10 epochs in order to see reasonable output.
Every `info_step` steps we will plot samples generated by our Generator, so you can watch the model's output get better and better.
You can start thinking about the exercises in the meantime.

In [None]:
gen_losses = []
disc_losses = []
epochs = EPOCHS # defined in the hyperparameters cell

for epoch in range(epochs):
    for real,_ in dataloader:

        #discriminator
        disc_opt.zero_grad()

        cur_bs = len(real) # real 128 x 1 x 28 x 28
        real = real.view(cur_bs, -1) # 128 x 784
        real = real.to(device)

        disc_loss = calc_disc_loss(loss_func,gen,disc,cur_bs,real,z_dim)

        disc_loss.backward() # retain_graph is not needed: the generator step builds a new graph
        disc_opt.step()
        disc_losses.append(disc_loss.item())

        #generator
        gen_opt.zero_grad()
        gen_loss = calc_gen_loss(loss_func, gen, disc, cur_bs, z_dim)
        gen_loss.backward() # retain_graph is not needed: the next iteration builds a new graph
        gen_opt.step()
        gen_losses.append(gen_loss.item())

        #visualisation & stats
        mean_disc_loss+=disc_loss.item()/info_step
        mean_gen_loss+=gen_loss.item()/info_step

        if cur_step % info_step == 0  and cur_step > 0:
            fake_noise = gen_noise(cur_bs, z_dim)
            fake = gen(fake_noise)
            show(fake)
            #show(real)
            print(f"Epoch: {epoch}: step {cur_step} / Gen loss: {mean_gen_loss} / Disc loss: {mean_disc_loss}")


            mean_gen_loss, mean_disc_loss = 0,0
        cur_step+=1


plt.figure(figsize=(12, 5))
plt.semilogy(gen_losses, label="Generator Loss")
plt.semilogy(disc_losses, label="Discriminator Loss")
plt.xlabel("Training step")
plt.ylabel("Loss (log scale)")
plt.grid(True, which="both", axis="both")
plt.legend()
plt.show()
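
Per-step GAN losses are noisy, so a smoothed curve can be easier to read. A minimal sketch of a moving average follows; `moving_average` and the window size are hypothetical choices, not part of the notebook.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))  # [1.5, 2.5, 3.5]
```

After training, it could be used as `plt.semilogy(moving_average(gen_losses, 100))`.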

### Exercises
1. Look at your results. If you can't see any digits yet, let's try to fix our model. Add `BatchNorm1d` layers to our Generator (just uncomment the commented line). Restart the notebook, run it again and observe the results. At which epoch do you start to see the first shapes of digits instead of pure noise?


2. Run the training for more epochs and try to spot typical "problems" of GANs with basic architectures:

- Non-convergence: the model parameters oscillate, destabilize and never converge,
- Mode collapse: the generator collapses and produces only a limited variety of samples (e.g. only '9' and '0' digits),
- Diminished gradient: the discriminator becomes so successful that the generator's gradient vanishes and the generator learns nothing,
- Imbalance between the generator and the discriminator, causing overfitting,
- High sensitivity to hyperparameter selection.
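
For Exercise 1, it can help to see what `BatchNorm1d` does to a block's activations. The sketch below uses a stand-alone `Linear` + `BatchNorm1d` pair with arbitrary sizes (not the notebook's `genBlock`): in training mode each output feature is rescaled to roughly zero mean and unit variance across the batch, which keeps the inputs of later layers in a stable range.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(64, 128)
norm = nn.BatchNorm1d(128)  # modules are in training mode by default

x = torch.randn(128, 64) * 5 + 3  # deliberately badly scaled input
raw = linear(x)
normed = norm(raw)

# Statistics per feature, computed across the batch dimension
print("before:", raw.mean(dim=0).abs().mean().item(), raw.var(dim=0).mean().item())
print("after: ", normed.mean(dim=0).abs().mean().item(),
      normed.var(dim=0, unbiased=False).mean().item())
```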

[Read more](https://jonathan-hui.medium.com/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b)