# Objective 
**We are going to train a model capable of learning to generate even numbers.**

Source: https://towardsdatascience.com/build-a-super-simple-gan-in-pytorch-54ba349920e4

# Setup

In [31]:
from typing import List
import math
import numpy as np

import torch
import torch.nn as nn

# Data

Before getting into the actual model, let’s build our data set. We are going to represent each integer by its unsigned seven-bit binary representation, so the number 56 is 0111000. We do this because:

1. It is very natural to pass a binary vector to a machine learning algorithm, in this case a neural network.
2. It is easy to see whether the model is generating even numbers by looking at the lowest bit: if it’s a one, the number is odd; if it’s a zero, the number is even.
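
As a quick illustration of both points, the sketch below uses Python’s built-in `format` to produce the zero-padded seven-bit form of 56 and reads the parity from the last bit. The helper name `to_seven_bits` is hypothetical and not part of the original notebook.

```python
def to_seven_bits(number: int) -> list:
    """Return the zero-padded, seven-bit binary representation as a list of ints."""
    return [int(bit) for bit in format(number, "07b")]

bits = to_seven_bits(56)
print(bits)           # [0, 1, 1, 1, 0, 0, 0]
print(bits[-1] == 0)  # True: the lowest bit is 0, so 56 is even
```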

In [24]:
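def create_binary_list_from_int(number: int) -> List[int]:
    # Check the type first so non-integers raise ValueError rather than a TypeError on `<`
    if type(number) is not int or number < 0:
        raise ValueError("Only non-negative integers are allowed")

    # bin(5) == '0b101'; drop the '0b' prefix and convert each character to an int
    return [int(x) for x in bin(number)[2:]]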

In [25]:
for i in range(10):
    print(create_binary_list_from_int(i))

[0]
[1]
[1, 0]
[1, 1]
[1, 0, 0]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]
[1, 0, 0, 0]
[1, 0, 0, 1]


The next function produces two outputs. The first is a list of ones, labeling every example in the batch as real, that is, drawn from our true distribution of even numbers. The second is a batch of random even numbers in binary list form, each left-padded with zeros.

In [26]:
# generate training data
def generate_even_data(max_int, batch_size=16):
    # Number of bits needed to represent every integer below max_int.
    # math.ceil(math.log2(...)) also works when max_int is not a power of two
    # (int(math.log(100, 2)) gives 6, but 98 needs seven bits).
    max_length = math.ceil(math.log2(max_int))

    # Sample batch_size integers in [0, max_int / 2); doubling them below
    # gives even numbers in [0, max_int)
    sampled_integers = np.random.randint(0, int(max_int / 2), batch_size)

    # Labels are all ones because every example is real (even)
    labels = [1] * batch_size

    # Convert each even number to a binary list and left-pad with zeros to max_length
    data = [create_binary_list_from_int(int(x * 2)) for x in sampled_integers]
    data = [([0] * (max_length - len(x))) + x for x in data]

    return labels, data

In [27]:
generate_even_data(100)

([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 [[0, 1, 0, 0, 0, 1, 0],
  [0, 1, 1, 1, 1, 0, 0],
  [0, 0, 1, 1, 0, 1, 0],
  [1, 0, 1, 0, 1, 0, 0],
  [1, 0, 1, 1, 1, 0, 0],
  [0, 0, 1, 1, 0, 0, 0],
  [1, 0, 1, 1, 0, 0, 0],
  [0, 1, 0, 0, 0, 0, 0],
  [0, 1, 1, 0, 0, 0, 0],
  [1, 0, 0, 1, 1, 1, 0],
  [0, 0, 1, 0, 0, 1, 0],
  [0, 0, 0, 1, 0, 0, 0],
  [1, 1, 0, 0, 0, 1, 0],
  [0, 0, 0, 0, 1, 1, 0],
  [1, 0, 0, 1, 0, 1, 0],
  [0, 0, 1, 0, 1, 1, 0]])
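
One way to sanity-check a batch like this is to convert each binary list back to an integer and confirm that it is even. The helper `binary_list_to_int` below is a hypothetical addition, not part of the original notebook; the three rows are copied from the sample output above.

```python
def binary_list_to_int(bits):
    """Interpret a list of 0/1 values (most significant bit first) as an integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

rows = [
    [1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 0],
]
values = [binary_list_to_int(row) for row in rows]
print(values)                           # [84, 92, 98]
print(all(v % 2 == 0 for v in values))  # True
```

Leading zeros do not change the decoded value, so the same helper works on padded and unpadded lists.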

# Building the Generator and Discriminator
## Generator
We need something capable of mapping a random seven-digit binary input to a seven-digit binary output that is even. The simplest possible thing here is a single layer of seven neurons.

In [120]:
class Generator(nn.Module):
    def __init__(self, input_length):
        super(Generator, self).__init__()
        self.fc = nn.Linear(int(input_length), int(input_length))
        self.activation = nn.Sigmoid()
        
    def forward(self, x):
        return self.activation(self.fc(x))

If we were building a GAN to do something more complicated, say on images, we would probably feed it random noise drawn from a normal distribution and gradually upsample and reshape it until it is the same size as the data we are trying to copy. Since our example is so simple, a single linear layer with a logistic (sigmoid) activation should be enough to map ones and zeros in seven positions to other ones and zeros in seven positions. Note that the sigmoid produces values between 0 and 1 rather than exact bits, so we threshold the output at 0.5 when we want to read it as a binary number.
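
To make the input and output shapes concrete, the sketch below pushes a batch of random binary noise through an untrained linear layer followed by a sigmoid, the same structure as the `Generator` class above, and thresholds the result at 0.5. The batch size of 4, the seed, and the variable names are illustrative choices, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the sketch repeatable

input_length = 7
# Same structure as the Generator: one linear layer followed by a sigmoid
layer = nn.Sequential(nn.Linear(input_length, input_length), nn.Sigmoid())

# Random binary noise: each entry is 0.0 or 1.0
noise = torch.randint(0, 2, size=(4, input_length)).float()
output = layer(noise)

print(output.shape)           # torch.Size([4, 7])
bits = (output > 0.5).int()   # threshold to read each row as a binary vector
print(bits.shape)             # torch.Size([4, 7])
```

Because the layer is untrained, the thresholded rows are essentially arbitrary at this point; training is what pushes the last bit toward zero.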


## Discriminator
The Discriminator is no more complicated than the Generator. Here we need a model that takes in a seven-digit binary input and outputs whether it comes from our real data distribution (is even) or not (is odd or not a valid binary number). To accomplish this we use a single-neuron model (logistic regression) with a logistic (sigmoid) activation.

In [121]:
class Discriminator(nn.Module):
    def __init__(self, input_length):
        super(Discriminator, self).__init__()
        self.fc = nn.Linear(int(input_length), 1)
        self.activation = nn.Sigmoid()
        
    def forward(self, x):
        return self.activation(self.fc(x))
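
A single logistic neuron can suffice in principle because evenness depends only on the lowest bit. As a hand-worked illustration, a neuron with a large negative weight on the last input and a positive bias outputs a value near 1 for even inputs and near 0 for odd ones. The weights below are picked by hand for this sketch, not learned, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hand_set_discriminator(bits):
    # Hand-picked parameters: ignore the six high bits, penalize the lowest bit
    weights = [0.0] * 6 + [-10.0]
    bias = 5.0
    z = sum(w * b for w, b in zip(weights, bits)) + bias
    return sigmoid(z)

even_56 = [0, 1, 1, 1, 0, 0, 0]
odd_57 = [0, 1, 1, 1, 0, 0, 1]
print(hand_set_discriminator(even_56) > 0.99)  # True: sigmoid(5) is about 0.993
print(hand_set_discriminator(odd_57) < 0.01)   # True: sigmoid(-5) is about 0.007
```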

Now for the tricky part of building a GAN: the training. We need to link these models up in a way that propagates the gradients correctly.

# Training the model

We need to update two models at every training step, and we need to be careful about how we do that. To break it down, we pass two batches of data through the models at every step. One batch is random noise, which the generator turns into generated data; the second batch is composed solely of data from our true distribution.

In [164]:
def train(max_int=128, batch_size=16, training_steps=500):
    # Number of bits needed to represent integers below max_int
    input_length = math.ceil(math.log2(max_int))
    
    # models
    generator = Generator(input_length)
    discriminator = Discriminator(input_length)
    
    # optimizers
    generator_opt = torch.optim.Adam(generator.parameters(), lr=.001)
    discriminator_opt = torch.optim.Adam(discriminator.parameters(), lr=.001)
    
    # loss
    loss = nn.BCELoss()
    
    for i in range(training_steps):
        # zero gradient on each iteration
        generator_opt.zero_grad()
        
        # create noisy input for generator
        noise = torch.randint(0, 2, size=(batch_size, input_length)).float()
        generated_data = generator(noise)
        
        # generate examples of (even) real data
        true_labels, true_data = generate_even_data(max_int, batch_size=batch_size)
        true_labels = torch.tensor(true_labels).float().reshape(batch_size,-1)
        true_data = torch.tensor(true_data).float()
        
        # train generator
        # Get the predictions from the discriminator on the “fake” data
        generator_discriminator_out = discriminator(generated_data)
        # Calculate the loss from the discriminator’s output using labels 
        # as if the data were “real” instead of fake
        generator_loss = loss(generator_discriminator_out, true_labels)
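        # Backpropagate the error. Gradients also reach the discriminator’s parameters,
        # but only the generator’s optimizer takes a step; the discriminator’s
        # gradients are zeroed below before it is trained.
        generator_loss.backward()
        generator_opt.step()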
        
        
        # train the discriminator on the true/generated data
        discriminator_opt.zero_grad()
        # Pass in a batch of only data from the true data set with a vector of all one labels
        true_discriminator_out = discriminator(true_data)
        true_discriminator_loss = loss(true_discriminator_out, true_labels)

        # Pass the generated data into the discriminator, detached from the generator’s graph
        generator_discriminator_out = discriminator(generated_data.detach())
        # Calculate the loss against all-zero labels, since this data is generated
        generator_discriminator_loss = loss(generator_discriminator_out, torch.zeros(batch_size, 1))
        # Average losses
        discriminator_loss = (true_discriminator_loss + generator_discriminator_loss) / 2
        # Backpropagate the gradients through just the discriminator
        discriminator_loss.backward()
        discriminator_opt.step()
        
        # Every 10 steps, print the generated batch thresholded at 0.5 (True = 1, False = 0)
        if i % 10 == 0:
            print(generated_data > 0.5)

Notice how we use true labels instead of fake labels when calculating the loss of the **generator**. This is because we are training the generator, which should be trying to fool the discriminator. When the discriminator makes a mistake and says the generated output is real (predicts 1), the loss and its gradients should be small; when the discriminator is correct and predicts that the output is generated (predicts 0), they should be large. Because the labels are inverted at this step, only the generator’s optimizer takes a step here. If we updated both models with these labels, either the generator or the discriminator would learn the wrong thing.
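
To see the effect of labeling generated data as real, recall that binary cross-entropy with a target of 1 reduces to `-log(p)`, where `p` is the discriminator’s output. The sketch below evaluates it for two illustrative predictions; the function name `bce` and the values 0.9 and 0.1 are assumptions made for this example.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

# Discriminator is fooled (predicts "real" for generated data): small generator loss
fooled = bce(0.9, 1)
# Discriminator is right (predicts "generated"): large generator loss
caught = bce(0.1, 1)

print(round(fooled, 3))  # 0.105
print(round(caught, 3))  # 2.303
```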

The **discriminator** is trying to learn to distinguish real data from “fake” generated data. The labels used while training the discriminator need to reflect that: one when the data comes from the real data set and zero when it is produced by the generator. We pass in those two batches and average their losses. It’s important to note that when passing in the generated data we detach it from the generator’s computation graph. We do this because we are not training the generator at this step; we are focused only on the discriminator. Once all of that is done, we backpropagate the gradients through only the discriminator and update it.
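
The effect of `.detach()` can be checked directly. In the minimal sketch below, which uses bare `nn.Linear` layers as stand-ins for the two models (an assumption made for brevity), the generator’s weights receive no gradient after the discriminator’s loss is backpropagated.

```python
import torch
import torch.nn as nn

generator = nn.Linear(7, 7)      # stand-in for the Generator’s linear layer
discriminator = nn.Linear(7, 1)  # stand-in for the Discriminator’s linear layer
loss = nn.BCELoss()

noise = torch.randint(0, 2, size=(4, 7)).float()
generated = torch.sigmoid(generator(noise))

# Detached input: the graph back to the generator is cut
out = torch.sigmoid(discriminator(generated.detach()))
loss(out, torch.zeros(4, 1)).backward()

print(generator.weight.grad is None)          # True: no gradient reached the generator
print(discriminator.weight.grad is not None)  # True: the discriminator got gradients
```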

In [165]:
train(max_int=128, batch_size=16, training_steps=500)

tensor([[ True, False,  True,  True, False, False,  True],
        [False, False, False, False, False,  True,  True],
        [ True,  True,  True, False, False, False,  True],
        [ True, False, False, False, False, False,  True],
        [ True, False, False, False, False, False,  True],
        [ True,  True, False, False, False, False,  True],
        [ True, False,  True, False, False, False,  True],
        [ True, False,  True,  True, False, False,  True],
        [ True, False,  True, False, False, False,  True],
        [ True, False,  True, False, False, False,  True],
        [False, False, False, False, False, False,  True],
        [ True,  True,  True, False, False, False, False],
        [ True, False, False, False, False, False,  True],
        [ True,  True,  True, False, False, False, False],
        [ True, False, False, False, False, False,  True],
        [ True, False,  True, False, False, False,  True]])
tensor([[ True, False,  True, False, False, False, Fals

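The last entry of each row is the lowest bit, so it should be 0 (False) when the generated number is even. In the batch printed at step 0 above, only two of the sixteen rows end in False, which shows that the generator has not yet learned to produce even numbers.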
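
A small helper makes this check less error-prone than reading the rows by eye. The sketch below counts how many rows of a thresholded batch end in False; the function name `fraction_even` is hypothetical, and the four rows are taken from the sixteen rows of the step-0 output above, written as plain Python lists.

```python
def fraction_even(rows):
    """Fraction of rows whose last entry (the lowest bit) is False, i.e. even numbers."""
    return sum(1 for row in rows if not row[-1]) / len(rows)

rows = [
    [True, False, True, True, False, False, True],
    [False, False, False, False, False, True, True],
    [True, True, True, False, False, False, False],
    [True, True, True, False, False, False, False],
]
print(fraction_even(rows))  # 0.5
```

Inside `train`, the same quantity could be computed on the tensor directly, for example with `(generated_data[:, -1] <= 0.5).float().mean()`.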