In [None]:
# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline

# MNIST Experimentation

In this notebook I experiment with building a simple image classification network, then develop it into a simple GAN, writing as much as possible from scratch. We will use the MNIST dataset, a collection of 28x28 pixel images of handwritten digits.

## Part I: Data acquisition and EDA

First things first: we will download the data and look through it to see what properties it has.

In [None]:
from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = "https://github.com/pytorch/tutorials/raw/main/_static/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).write_bytes(content)  # write_bytes opens and closes the file for us

In [None]:
import pickle
import gzip
import tqdm

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
        ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

In [None]:
print(f"Training input shape: {x_train.shape}")
print(f"Training output shape: {y_train.shape}")
print(f"Validation input shape: {x_valid.shape}")
print(f"Validation output shape: {y_valid.shape}")

In [None]:
# let's look at our labels

print(f"Label set: {set(y_train)}")

In [None]:
example_image = x_train[0]
print(f"Example image raw: \n{example_image}")

This is a little hard to parse, but the important detail is that it is a sparsely populated vector (mostly zeros) with values $0 \leq x \leq 1$. That points to a greyscale image where 0 is empty background, 1 is the full-intensity pen stroke, and gradations in between are the corresponding shades of grey. Note that with `cmap="gray"`, as used in the next cell, 0 is drawn as black and 1 as white.

In [None]:
from matplotlib import pyplot
import numpy as np

example_image_reshaped = example_image.reshape((28, 28))
print(f"Example image reshaped: \n{example_image_reshaped}")

pyplot.imshow(example_image_reshaped, cmap="gray")
example_label = y_train[0]
print(f"Example label: {example_label}")


Makes sense. Our array is now shaped (28, 28), so how should we interpret it? The first dimension indexes each pixel's row, and the second indexes its column. To see this, let's artificially crop the image.

In [None]:
cropped_along_horizontal_axis = example_image_reshaped[:len(example_image_reshaped)//2]
pyplot.imshow(cropped_along_horizontal_axis, cmap="gray")

In [None]:
cropped_along_vertical_axis = example_image_reshaped[:, :len(example_image_reshaped)//2]
pyplot.imshow(cropped_along_vertical_axis, cmap="gray")

## Part II: First Pass Convolutional Model

First we are going to build, step by step, a convolutional model that classifies the images.

From some online research I found a "general" setup:

> We will stack 3 {convolution + relu + maxpooling} modules. Our convolutions operate on 3x3 windows and our maxpooling layers operate on 2x2 windows. Our first convolution extracts 16 filters, the following one extracts 32 filters, and the last one extracts 64 filters.

That comes from [this tutorial](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/pc/exercises/image_classification_part1.ipynb?utm_source=practicum-IC&utm_campaign=colab-external&utm_medium=referral&hl=en&utm_content=imageexercise1-colab#scrollTo=5oqBkNBJmtUv). First, let's do some package imports.

In [None]:
import torch
from torch import nn
from torchvision import utils as vutils

# set random seed

torch.manual_seed(42)


### The convolution layers.

First, let's see how the convolution layers work. Let's take the reshaped example image and pass it through some example convolutional layers to inspect their behavior.

Convolutional layers expect tensors of the form $(C, H, W)$ (or $(B, C, H, W)$ with a batch dimension), where H and W are the height and width of the image in pixels and C is the number of channels. Most colour images have 3 channels, giving each pixel's colour as an RGB (red-green-blue) value. Our image is grayscale, so it has a single channel. We therefore simply unsqueeze the original 28x28 image to add an extra dimension of size 1, matching the input specification.

In [None]:
example_image_tensor = torch.tensor(example_image_reshaped, dtype=torch.float32).unsqueeze(0)

conv_layer_1_filter = nn.Conv2d(1, 1, 3) # the arguments here are (in_channels, out_channels, kernel_size)
output_1_filter = conv_layer_1_filter(example_image_tensor)

conv_layer_16_filter = nn.Conv2d(1, 16, 3) # 16 output channels, i.e. 16 filters, each with a 3x3 kernel
output_16_filter = conv_layer_16_filter(example_image_tensor)

print(output_1_filter.shape)
print(output_16_filter.shape)
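
The shapes printed by this cell follow the usual output-size rule for a convolution: with no padding and a stride of 1, each spatial side shrinks by `kernel_size - 1`. Here is a minimal sketch of that arithmetic. The helper name `conv2d_output_size` is my own and not part of PyTorch, and dilation is assumed to be 1.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension.

    Hypothetical helper for illustration only; it mirrors the shape
    formula in the nn.Conv2d documentation with dilation fixed at 1.
    """
    return (size + 2 * padding - kernel_size) // stride + 1


# A 3x3 kernel sliding over a 28-pixel side leaves 26 valid positions
side = conv2d_output_size(28, kernel_size=3)
print(side)
```

The number of filters only changes the channel dimension, which is why both outputs share the same height and width.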

Let's quickly see what this would look like if we did have multiple channels.

In [None]:
example_image_tensor_3_channels = example_image_tensor.repeat(3, 1, 1)

print(f"New tensor shape: {example_image_tensor_3_channels.shape}")

conv_layer_1_filter_3_channels = nn.Conv2d(3, 1, 3)

output_1_filter_3_channels = conv_layer_1_filter_3_channels(example_image_tensor_3_channels)

print(f"Output shape: {output_1_filter_3_channels.shape}")

Next, let's observe how max pooling works.

In [None]:
max_pool_layer = nn.MaxPool2d(2)

output_max_pool = max_pool_layer(output_16_filter)

print(output_max_pool.shape)

We've halved the height and width of each filter's output by keeping only the largest value in each 2x2 square. Let's take an example, max pool it, and observe.

In [None]:
example = torch.rand(1, 26, 26)
max_pooled_example = max_pool_layer(example)
print(example[0, :2])
print(max_pooled_example[0, 0])

It's a little hard to read, but if you take the first two elements of the first and second rows of the example, the highest of those four values appears as the first element of the max-pooled output. Let's spell it out a bit more explicitly.

In [None]:
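def explicit_max_pool(j):
  # j is the index of the pooling window along the top two rows; windows do not overlap (stride 2)
  start = 2 * j
  kernel = example[0][0, start:start+2].tolist() + example[0][1, start:start+2].tolist()
  print(f"The current kernel captures values: {kernel}. The max pooled value is {max(kernel)}")

explicit_max_pool(1)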
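
For a fuller picture, here is a plain-Python sketch of 2x2 max pooling over a whole grid. It assumes the same settings as `nn.MaxPool2d(2)`, where the stride defaults to the window size, so windows never overlap. The function `max_pool_2x2` and the toy grid are made up for illustration.

```python
def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a list of equal-length rows."""
    pooled = []
    for r in range(0, len(grid) - 1, 2):         # move down two rows at a time
        row = []
        for c in range(0, len(grid[0]) - 1, 2):  # move right two columns at a time
            window = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


toy = [
    [1, 5, 2, 0],
    [3, 4, 7, 1],
    [0, 0, 9, 8],
    [6, 2, 3, 3],
]
pooled_toy = max_pool_2x2(toy)
print(pooled_toy)  # [[5, 7], [6, 9]]
```

Each output value is the maximum of one non-overlapping 2x2 block, so a 4x4 grid becomes 2x2.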

Now that we understand these processes, let's build our network to match the specification quoted above. Note, by the way, that the flatten step simply stretches out the non-batch dimensions of the tensor into one long array, while ReLU sets all negative values in the tensor to 0.
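
To make those two helper layers concrete, here is a small sketch on toy tensors. The values and shapes are made up for illustration.

```python
import torch
from torch import nn

relu = nn.ReLU()
flatten = nn.Flatten()  # default start_dim=1, so the batch dimension is kept

values = torch.tensor([-1.5, 0.0, 2.0])
relu_out = relu(values)
print(relu_out)  # negatives become 0, everything else passes through unchanged

feature_maps = torch.zeros(2, 3, 4, 4)  # (batch, channels, height, width)
flat = flatten(feature_maps)
print(flat.shape)  # 3 * 4 * 4 = 48 values per item in the batch
```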

In [None]:
class ConvolutionalNetworkV1(nn.Module):
  def __init__(self):
    super().__init__()
    self.sequence = nn.Sequential(
        nn.Conv2d(1, 16, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64, 10)
    )

  def forward(self, x):
    return self.sequence(x)
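
Why does the final `nn.Linear` take 64 inputs? One way to check is to push a dummy image through the layers one at a time and record each shape. This sketch rebuilds the same stack (without the final linear layer) so that it runs on its own. The all-zeros input is only a placeholder.

```python
import torch
from torch import nn

# Same layer stack as ConvolutionalNetworkV1, minus the final Linear layer
layers = nn.Sequential(
    nn.Conv2d(1, 16, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)

x = torch.zeros(1, 1, 28, 28)  # dummy batch: (batch, channels, height, width)
shapes = []
with torch.no_grad():
    for layer in layers:
        x = layer(x)
        shapes.append((layer.__class__.__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:<10} -> {shape}")
```

The spatial side goes 28, 26, 13, 11, 5, 3, 1, so after the last pooling step each of the 64 filters contributes a single value.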

In [None]:
conv_net_v1 = ConvolutionalNetworkV1()
output_v1 = conv_net_v1(example_image_tensor.unsqueeze(0))
output_v1.squeeze()

The model is outputting data in the correct format! Hooray! Let's send our model to the GPU (if one is available) and build a training loop.

In [None]:
if torch.cuda.is_available():
  conv_net_v1.cuda()

In [None]:
optim = torch.optim.Adam(conv_net_v1.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()

x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.int64)

x_valid_tensor = torch.tensor(x_valid, dtype=torch.float32)
y_valid_tensor = torch.tensor(y_valid, dtype=torch.int64)

train_dataset = torch.utils.data.TensorDataset(x_train_tensor, y_train_tensor)
valid_dataset = torch.utils.data.TensorDataset(x_valid_tensor, y_valid_tensor)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
valid_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=64)

In [None]:
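# Assumption: multiclass_f1_score comes from torcheval (pip install torcheval);
# the original cell used it without an import.
from torcheval.metrics.functional import multiclass_f1_score

def calculate_validation_loss_and_f1(model, valid_dataloader):
  model.eval()
  model_device = next(model.parameters()).device  # run on whatever device the model lives on
  total_loss = 0
  predicted_logits = []
  true_labels = []
  with torch.no_grad():  # no gradients are needed for evaluation
    for x, y in valid_dataloader:
      x_reshaped = x.reshape(-1, 1, 28, 28).to(model_device)
      y = y.to(model_device)
      output = model(x_reshaped)
      total_loss += loss_function(output, y).item()
      predicted_logits.append(output)
      true_labels.append(y)
  # compute F1 over the whole validation set, not just the last batch
  f1 = multiclass_f1_score(torch.cat(predicted_logits), torch.cat(true_labels), num_classes=10, average='weighted')
  return total_loss / len(valid_dataloader), f1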

In [None]:
epochs = 10
show_loss = False
train = False

if train:

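  model_device = next(conv_net_v1.parameters()).device  # GPU if the model was moved there, otherwise CPU

  for epoch in range(epochs):
    conv_net_v1.train()
    for i, batch in enumerate(train_dataloader):
      x, y = batch
      optim.zero_grad()
      x_reshaped = x.reshape(-1, 1, 28, 28).to(model_device)
      y = y.to(model_device)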
      output = conv_net_v1(x_reshaped)
      loss = loss_function(output, y)
      loss.backward()
      optim.step()

      if show_loss and i % 100 == 0:
        print(f"Epoch: {epoch}, Batch: {i}, Loss: {loss.item()}")


    validation_loss, f1 = calculate_validation_loss_and_f1(conv_net_v1, valid_dataloader)
    print(f"Epoch: {epoch}, Validation Loss: {validation_loss}, F1: {f1}")

In [None]:
conv_net_v1.eval().to("cpu")

In [None]:
conv_net_v1(example_image_tensor.unsqueeze(0).reshape(-1, 1, 28, 28)).max(1).indices

In [None]:
pyplot.imshow(example_image_reshaped, cmap="gray")

It worked! Let's try another.

In [None]:
example_2 = x_train[1]
example_2_reshaped = example_2.reshape((28, 28))
example_2_tensor = torch.tensor(example_2_reshaped, dtype=torch.float32).unsqueeze(0)

conv_net_v1(example_2_tensor.unsqueeze(0).reshape(-1, 1, 28, 28)).max(1).indices

In [None]:
pyplot.imshow(example_2_reshaped, cmap="gray")

Another success! Time to move on.

## Part III: Developing a GAN

Generative Adversarial Networks are composed of a generator network and a discriminator network. The discriminator is responsible for telling real examples of a category apart from fake ones. We already have a classifier that sorts images by digit, and we can adapt it into a discriminator.

The generator is a model which takes in random noise and generates an output which should look like the training images, at least enough to fool the discriminator. For our purposes, let's just make this a standard feedforward neural network.

First, let's modify our convolutional network to make it into a discriminator which classifies input images as 1 (an image from the training set) or 0 (an image created by the generator).

In [None]:
class Discriminator(nn.Module):
  def __init__(self):
    super().__init__()
    self.sequence = nn.Sequential(
        nn.Conv2d(1, 16, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64, 1) # THIS IS THE ONLY CHANGED LINE FROM BEFORE
    )

  def forward(self, x):
    return self.sequence(x)

discriminator = Discriminator()
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.001)

Now let's create a super simple generator. This will be a small feedforward network with a Sigmoid layer at the end to convert the output into a 28x28 pixel image with all values between 0 and 1.

In [None]:
class Generator(nn.Module):
  def __init__(self):
    super().__init__()
    self.sequence = nn.Sequential(
        nn.Linear(100, 100),
        nn.ReLU(),
        nn.Linear(100, 100),
        nn.ReLU(),
        nn.Linear(100, 28*28),
        nn.Sigmoid()
    )

  def forward(self, x):
    return self.sequence(x)

generator = Generator()
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.001)

Sigmoid functions convert all elements to be between 0 and 1, see below:

In [None]:
sigmoid = nn.Sigmoid()
elements = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0])  # floats, since sigmoid outputs are fractional
sigmoid(elements)

We also need to be able to create our dataset. For GANs, the only data we really care about keeping on hand is 'true' training data. 'False' training data can be generated by sending random noise through the generator. So, let's make a dataloader for the 'true' data.

In [None]:
def create_true_image_dataloader(device):
  true_images = torch.tensor(x_train, dtype=torch.float32).to(device)
  dataset = torch.utils.data.TensorDataset(true_images)
  return torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

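Now that we can create our dataset, we need a loss function to train the discriminator. Binary cross entropy (BCE) loss does the job; we use `nn.BCEWithLogitsLoss`, which applies the sigmoid itself, because the discriminator outputs a raw score.

We also need to store a label for real images and a label for fake ones.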

In [None]:
loss_function = nn.BCEWithLogitsLoss()
real_label = 1.0
fake_label = 0.0
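
To see what this loss rewards and punishes, here is a hand-rolled sketch of binary cross entropy on a single raw score. It is written in plain Python for readability and is not how PyTorch implements it (the library version is vectorised and numerically more stable). The logit value `4.0` is a made-up example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(logit, label):
    """Binary cross entropy for one raw score (logit) and one 0/1 label."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


confident_real = 4.0  # hypothetical discriminator logit meaning "this looks real"

loss_when_right = bce_with_logits(confident_real, 1.0)  # the image really was real
loss_when_wrong = bce_with_logits(confident_real, 0.0)  # the image was actually fake
print(f"right: {loss_when_right:.4f}, wrong: {loss_when_wrong:.4f}")
```

A confident correct answer costs almost nothing, while a confident wrong answer is penalised heavily. A score of 0 (no opinion either way) costs `log(2)` for either label.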

We now define a function which generates random noise for our generator to use to produce an image. Our generator is a simple feedforward network, so the noise just needs to be the size of our input feature vector (100) for each item in the batch. This can then be passed to our generator's forward method.

In [None]:
def get_noise(batch_size):
  return torch.randn(
        batch_size, 100,
        device=device
        )

Now we write the important learning functions. We alternate between training the discriminator and the generator. The important thing here is that the same binary cross entropy loss is backpropagated through **both** networks, one update at a time.

The discriminator is fed both real and fake images as 1-channel, 28x28 tensors. Its task is to classify them correctly as real or fake, which is exactly what binary cross entropy measures: the loss grows the more confidently the model miscategorises the images it sees. This loss gets backpropagated through the discriminator and we use the optimiser to update the **weights of the discriminator only.**

In [None]:
def train_discriminator(
    batch,
    discriminator,
    generator,
    discriminator_optim,
    ):
  
  # Step 1: Train the discriminator on the real images
  discriminator_optim.zero_grad()
  output = discriminator(batch).view(-1)
  label = torch.full((batch.shape[0],), real_label, dtype=torch.float, device=device)

  t_loss = loss_function(output, label)
  t_loss.backward()
  D_x = output.mean().item() # the average raw score (logit) of the discriminator for real images


  # Step 2: Train the discriminator on the fake images
  noise = get_noise(batch.shape[0])
  generated_images = generator(noise).detach() # detach: the generator is not updated here, so skip its gradients
  generated_images = generated_images.reshape(-1, 1, 28, 28)
  output = discriminator(generated_images).view(-1)
  label.fill_(fake_label)

  f_loss = loss_function(output, label)
  f_loss.backward()
  D_G_z1 = output.mean().item() # the average raw score (logit) of the discriminator for fake images

  loss = t_loss + f_loss # total loss for the batch
  discriminator_optim.step()

  return loss.item(), D_G_z1, D_x

The generator's update function inverts the discriminator's objective. We again use BCE loss, but only on the fake images, and this time the target label is 'real'. The loss therefore penalises the generator whenever the discriminator does **not classify its images as true.** This loss gets backpropagated through the discriminator and into the generator, and the optimiser then **updates only the parameters of the generator**.

*NB: this technique can sometimes be vulnerable to the vanishing gradient problem, where the discriminator is extremely confident in classifying the image and does not give a good signal for which way the generator can improve to produce more convincing images.*
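
One place to look at this is the gradient with respect to the discriminator's raw score `z` for a fake image. The sketch below compares two common generator losses. The update described here (BCE against the 'real' label) is usually called the non-saturating form, while the original minimax form is `log(1 - sigmoid(z))`. The derivatives are worked out by hand and the function names are my own.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_non_saturating(z):
    """d/dz of -log(sigmoid(z)): BCE against the 'real' label."""
    return sigmoid(z) - 1.0


def grad_saturating(z):
    """d/dz of log(1 - sigmoid(z)): the original minimax generator loss."""
    return -sigmoid(z)


for z in (-10.0, -2.0, 0.0):  # z is the discriminator's logit for a fake image
    print(f"z={z:6.1f}  non-saturating={grad_non_saturating(z):+.5f}  "
          f"saturating={grad_saturating(z):+.5f}")
```

When the discriminator is very sure an image is fake (large negative `z`), the minimax form gives almost no gradient at the logit, whereas the label-flipped BCE form still does. This only covers the final score; the signal can still fade on its way back through the discriminator's layers.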

In [None]:
def train_generator(
    batch,
    discriminator,
    generator,
    generator_optim,
    ):
  generator_optim.zero_grad()

  noise = get_noise(batch.shape[0])
  generated_images = generator(noise)
  generated_images = generated_images.reshape(-1, 1, 28, 28)
  output = discriminator(generated_images)
  loss = loss_function(output.squeeze(), torch.ones(batch.shape[0], dtype=torch.float32, device=device))

  loss.backward()
  generator_optim.step()
  D_G_z2 = output.mean().item()

  return loss.item(), D_G_z2


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
fixed_noise = get_noise(64)
fixed_noise = fixed_noise.to(device)

In [None]:

generated_image_vectors = []

def train_epoch(epoch, discriminator, generator, discriminator_optim, generator_optim, generated_image_vectors):
  dataloader = create_true_image_dataloader(device)
  discriminator.train().to(device)
  generator.train().to(device)

  epoch_d_loss = 0
  epoch_g_loss = 0

  i = 0

  for batch in tqdm.tqdm(dataloader):

    batch_true_images = batch[0].reshape(-1, 1, 28, 28)
    # train discriminator
    d_loss, D_G_z1, D_x = train_discriminator(
        batch_true_images,
        discriminator,
        generator,
        discriminator_optim,
        )
    # train generator
    g_loss, D_G_z2 = train_generator(
        batch_true_images,
        discriminator,
        generator,
        generator_optim,
        )

    epoch_d_loss += d_loss
    epoch_g_loss += g_loss

    i += 1

  generator.eval() # no effect on the first generator, but it matters once the generator contains BatchNorm layers
  with torch.no_grad():
    fake = generator(fixed_noise).detach().cpu().reshape(-1, 1, 28, 28)

  generated_image_vectors.append(vutils.make_grid(fake, padding=2, normalize=True))


  print(f"Epoch {epoch} discriminator loss: {epoch_d_loss / len(dataloader)}")
  print(f"Epoch {epoch} generator loss: {epoch_g_loss / len(dataloader)}")


In [None]:
for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-1],(1,2,0)))

Our model is learning! The issue, however, is that our generator's output sucks. Let's make a new generator that mirrors our discriminator network in reverse. This follows a [tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html) on GANs released on PyTorch's website.

The core idea here is that where we use Conv2d layers in the discriminator, we use ConvTranspose2d in the generator. ConvTranspose2d layers are 'transposed convolution operations'. This sounds confusing and scary, but you can think of it as a convolution run in reverse as far as shapes are concerned (it is not an exact mathematical inverse). You know how convolution applies lots of filters to an image to create smaller, feature-rich versions of that image? A transposed convolution takes small feature maps and expands them into larger images with fewer features. From end to end, we can transform an n-dimensional array (conceptually a 1-pixel image with n features) into our 28x28 pixel image with a feature depth of 1 (how bright the pixel is).

We implement that below: our 100-dimensional input gets transformed into a 28x28 image with 1 feature, and a final sigmoid squashes the outputs to between 0 and 1 (as our greyscale pixel values must be).
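
The size arithmetic behind that transformation can be checked by hand. This sketch assumes the kernel, stride and padding values used in the generator cell that follows, with dilation 1 and no output padding. The helper name `conv_transpose2d_output_size` is hypothetical, not a PyTorch function.

```python
def conv_transpose2d_output_size(size, kernel_size, stride=1, padding=0):
    """Output side length of a transposed convolution.

    Hypothetical helper following the shape formula in the
    nn.ConvTranspose2d docs (dilation 1, output_padding 0).
    """
    return (size - 1) * stride - 2 * padding + kernel_size


side = 1  # the noise vector is treated as a 1x1 "image" with 100 channels
sides = [side]
for kernel_size, stride, padding in [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]:
    side = conv_transpose2d_output_size(side, kernel_size, stride, padding)
    sides.append(side)

# The closing 5x5 Conv2d (stride 1, no padding) trims 32 down to 28
final_side = side - 5 + 1
print(sides, final_side)  # [1, 4, 8, 16, 32] 28
```

With a kernel of 4, stride 2 and padding 1, each transposed convolution exactly doubles the side length, which is why the stack overshoots to 32 and needs one ordinary convolution to land on 28.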

In [None]:
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(100, 128, 4, stride=1, padding=0),  # Output: (batch, 128, 4, 4)
            nn.BatchNorm2d(128),
            nn.ReLU(True),

            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # Output: (batch, 64, 8, 8)
            nn.BatchNorm2d(64),
            nn.ReLU(True),

            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),    # Output: (batch, 32, 16, 16)
            nn.BatchNorm2d(32),
            nn.ReLU(True),

            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),     # Output: (batch, 1, 32, 32)
            # note we now use a convolutional layer, not a transpose convolutional layer to get to the final image
            nn.Conv2d(1, 1, kernel_size=5),                        # Output: (batch, 1, 28, 28)
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)




The consequence of this new generator architecture is that its inputs are expected to take a different shape, that being $(B, C, H, W)$: respectively batch, channel (the feature dimension), height, and width. So let's alter our noise function that creates input for our generator, and train again.

In [None]:
def get_noise(batch_size):
  return torch.randn(
        batch_size, 100, 1, 1,
        device=device
        )

In [None]:
fixed_noise = get_noise(64)
fixed_noise = fixed_noise.to(device)

In [None]:
generator = Generator()
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.01)
discriminator = Discriminator()
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.0001)

generated_image_vectors = []

for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors,
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-1],(1,2,0)))

The results you'll see from here could be extremely varied. Sometimes the results can look a bit like hand-drawn numbers, and other times it's complete garbage. Let's try some alterations, again following [this tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).

The first thing I'll do is give the generator more layers, and more of a chance to learn. When training, I'll lower the learning rate for the discriminator and raise it for the generator. The hypothesis is that because the discriminator is outperforming the generator, we'll give the generator more of an advantage. The potential downside is that the higher learning rate may stop the generator settling on a good optimum as it bounces around too wildly.

In [None]:
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(50, 128, 4, 1, 0, bias=False),  # Output: (batch, 128, 4, 4)
            nn.BatchNorm2d(128),
            nn.ReLU(True),

            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),  # Output: (batch, 64, 8, 8)
            nn.BatchNorm2d(64),
            nn.ReLU(True),

            nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),   # Output: (batch, 32, 16, 16)
            nn.BatchNorm2d(32),
            nn.ReLU(True),

            nn.ConvTranspose2d(32, 16, 4, 2, 1, bias=False),   # Output: (batch, 16, 32, 32)
            nn.BatchNorm2d(16),
            nn.ReLU(True),

            nn.Conv2d(16, 1, kernel_size=5),   # Output: (batch, 1, 28, 28)
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)



Generator()


In [None]:
def get_noise(batch_size):
  return torch.randn(
        batch_size, 50, 1, 1,
        device=device
        )

In [None]:
fixed_noise = get_noise(64)
fixed_noise = fixed_noise.to(device)

In [None]:
generator = Generator()
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.01)
discriminator = Discriminator()
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.0001)

generated_image_vectors = []

for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors,
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-8],(1,2,0)))

This doesn't seem to result in a huge change. Let's try something else. One technique commonly used in these models is LeakyReLU. This is a variation on ReLU.

ReLU simply sets all negative activations in a layer to 0. LeakyReLU allows negative signals to "leak through" by giving negative values a small slope rather than setting them to 0. This lets more information propagate through the network. Given we're propagating gradients back through the discriminator into the generator, we're giving the generator **more information** from the discriminator for it to use to improve.
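
Here is a minimal plain-Python sketch of the difference, for a single activation value. The slope of 0.2 matches the `nn.LeakyReLU(0.2)` layers used in this notebook; the function names are my own.

```python
def relu(x):
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.2):
    """Negative inputs are scaled down instead of being zeroed."""
    return x if x > 0 else negative_slope * x


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, negative_slope=0.2):
    return 1.0 if x > 0 else negative_slope


for x in (-3.0, -0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.2f} (grad {relu_grad(x)})  "
          f"leaky={leaky_relu(x):+.2f} (grad {leaky_relu_grad(x)})")
```

For negative inputs ReLU has a gradient of exactly 0, so nothing flows back through that unit, while LeakyReLU still passes 20% of the gradient.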

In [None]:
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            # input is 1 x 28 x 28
            nn.Conv2d(1, 16, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. 16 x 14 x 14
            nn.Conv2d(16, 32, 4, 2, 1, bias=False),
            nn.BatchNorm2d(32),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. 32 x 7 x 7
            nn.Conv2d(32, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. 64 x 3 x 3
            nn.Conv2d(64, 1, 4, 2, 1, bias=False),  # output: 1 x 1 x 1, a single logit per image
        )

    def forward(self, input):
        return self.main(input)

In [None]:
generator = Generator()
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.01)
discriminator = Discriminator()
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.0001)

generated_image_vectors = []

for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors,
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-1],(1,2,0)))

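We're still not seeing a large change! Let's add beta parameters for the Adam optimizers and modify the learning rate as in the tutorial.

Adam keeps two running averages for every parameter: one of the gradient (controlled by the first beta) and one of the squared gradient (controlled by the second). Each beta is the decay rate of its average, so a value closer to 1 means a longer memory. The PyTorch default is `betas=(0.9, 0.999)`; the tutorial lowers the first to 0.5, which makes the momentum term forget old gradients faster.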
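
To get a feel for the first beta, here is a plain-Python sketch of the exponential moving average that Adam keeps of the gradient (its first moment), leaving out bias correction and everything else Adam does. The gradient sequence is made up: it points one way for 10 steps and then flips sign, roughly as might happen when the opposing network changes its behaviour.

```python
def running_average(gradients, beta):
    """Exponential moving average, as in Adam's first moment (no bias correction)."""
    m = 0.0
    history = []
    for g in gradients:
        m = beta * m + (1.0 - beta) * g
        history.append(m)
    return history


# Hypothetical gradient sequence: +1 for 10 steps, then -1 for 5 steps
gradients = [1.0] * 10 + [-1.0] * 5

slow = running_average(gradients, beta=0.9)  # PyTorch's default first beta
fast = running_average(gradients, beta=0.5)  # the value used in this notebook

print(f"5 steps after the flip, beta1=0.9: {slow[-1]:+.3f}")
print(f"5 steps after the flip, beta1=0.5: {fast[-1]:+.3f}")
```

With a first beta of 0.9 the average has barely crossed zero five steps after the flip, while with 0.5 it has almost fully caught up with the new direction. One common intuition is that this faster reaction suits GANs, where each network's target keeps moving, though that is a rule of thumb rather than a guarantee.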

In [None]:
generator = Generator()
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
discriminator = Discriminator()
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

generated_image_vectors = []

for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors,
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-1],(1,2,0)))

We (finally) see a big improvement. We went from very junky output to something that basically fits the bill of hand-drawn digits. It does still look quite artificial (a human could probably tell most of these are generated). Let's try one more trick implemented in the tutorial: custom weight initialization.

In [None]:
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
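
As a quick sanity check of what this function does, the sketch below applies it to a small toy block and inspects the resulting statistics. The function is repeated so the example runs on its own, and the block's layer sizes are arbitrary choices of mine.

```python
import torch
from torch import nn


def weights_init(m):
    # Same rule as the cell above, repeated so this sketch is self-contained
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)


# Hypothetical toy block, large enough to give stable statistics
block = nn.Sequential(
    nn.Conv2d(16, 32, 4, bias=False),
    nn.BatchNorm2d(32),
)
block.apply(weights_init)  # apply() calls weights_init on every submodule

conv_std = block[0].weight.std().item()
bn_bias_abs_max = block[1].bias.abs().max().item()
print(f"conv weight std: {conv_std:.4f}, largest BatchNorm bias: {bn_bias_abs_max}")
```

The convolution weights end up tightly clustered around 0 (standard deviation close to 0.02), and the BatchNorm layers start out close to an identity scaling with zero shift.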

In [None]:
generator = Generator()
generator.apply(weights_init)
generator_optim = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
discriminator = Discriminator()
discriminator.apply(weights_init)
discriminator_optim = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

generated_image_vectors = []

for epoch in range(10):
  train_epoch(
      epoch,
      generator=generator,
      generator_optim=generator_optim,
      discriminator=discriminator,
      discriminator_optim=discriminator_optim,
      generated_image_vectors=generated_image_vectors,
      )

In [None]:
pyplot.imshow(np.transpose(generated_image_vectors[-1],(1,2,0)))

This definitely results in some more improvement. The numbers look a lot more natural. By far, however, we can see the biggest improvement came from setting the learning rate and beta parameters.

## Closing thoughts

We've now created a functional GAN that can draw digits pretty convincingly. We have learned about:

- image data
- convolutional operations, both normal and transposed
- GAN architectures and learning algorithms
- improvements on GANs that take their output from garbage to realistic images

One thing you can play around with in this notebook is the order in which you apply the improvements I've talked about. What I noticed, for example, was that setting the right learning rate and beta values was much more important than modifying the architecture of the model. In rough order of importance, I would rank the improvements as follows:

1. Beta and LR values
2. Weight initialization
3. Discriminator leaky ReLU
4. Generator additional layers

You might find different results! I hope this tutorial was helpful 😁 thank you for following along!