# Generating and Translating Emojis with Generative Adversarial Networks

# Introduction

In this project we explore generative adversarial networks (GANs) by generating emojis using two different methods and GAN model architectures.

In the first part, we implement a specific type of GAN designed to
process images, called a Deep Convolutional GAN (DCGAN). We train the DCGAN to generate emojis from samples of random noise.

In the second part, we look at a more complex GAN
architecture called CycleGAN, which was designed for the task of image-to-image translation. We train the CycleGAN to convert between Apple-style and
Windows-style emojis.

This project can also be found on my [GitHub](https://github.com/rizkmena/Generating-and-Translating-Emojis-with-Generative-Adversarial-Networks).

# 0. Setup and Helper Code
First we get some setup and helper code out of the way.

## 0.1. PyTorch Setup


In [None]:
######################################################################
# Setup python environment and changing the current working directory
######################################################################
!pip install torch torchvision
!pip install imageio

!pip install matplotlib

%mkdir -p /content/temp/
%cd /content/temp


## 0.2. Utility Functions

In [None]:
import os

import numpy as np
import matplotlib.pyplot as plt
import cv2

import torch
from torch import nn
from torch.nn import Parameter
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms

from urllib.request import urlretrieve
import tarfile

import imageio
from IPython.display import Image
from urllib.error import URLError
from urllib.error import HTTPError

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

                
def to_var(tensor, cuda=True):
    """Wraps a Tensor in a Variable, optionally placing it on the GPU.

        Arguments:
            tensor: A Tensor object.
            cuda: A boolean flag indicating whether to use the GPU.

        Returns:
            A Variable object, on the GPU if cuda==True.
    """
    if cuda:
        return Variable(tensor.cuda())
    else:
        return Variable(tensor)

    
def to_data(x):
    """Converts variable to numpy."""
    if torch.cuda.is_available():
        x = x.cpu()
    return x.data.numpy()


def create_dir(directory):
    """Creates a directory if it doesn't already exist.
    """
    if not os.path.exists(directory):
        os.makedirs(directory)


def gan_checkpoint(iteration, G, D, opts):
    """Saves the parameters of the generator G and discriminator D.
    """
    G_path = os.path.join(opts.checkpoint_dir, 'G.pkl')
    D_path = os.path.join(opts.checkpoint_dir, 'D.pkl')
    torch.save(G.state_dict(), G_path)
    torch.save(D.state_dict(), D_path)


def cyclegan_checkpoint(iteration, G_XtoY, G_YtoX, D_X, D_Y, opts):
    """Saves the parameters of both generators G_YtoX, G_XtoY and discriminators D_X, D_Y.
    """
    G_XtoY_path = os.path.join(opts.checkpoint_dir, 'G_XtoY.pkl')
    G_YtoX_path = os.path.join(opts.checkpoint_dir, 'G_YtoX.pkl')
    D_X_path = os.path.join(opts.checkpoint_dir, 'D_X.pkl')
    D_Y_path = os.path.join(opts.checkpoint_dir, 'D_Y.pkl')
    torch.save(G_XtoY.state_dict(), G_XtoY_path)
    torch.save(G_YtoX.state_dict(), G_YtoX_path)
    torch.save(D_X.state_dict(), D_X_path)
    torch.save(D_Y.state_dict(), D_Y_path)


def load_checkpoint(opts):
    """Loads the generator and discriminator models from checkpoints.
    """
    G_XtoY_path = os.path.join(opts.load, 'G_XtoY.pkl')
    G_YtoX_path = os.path.join(opts.load, 'G_YtoX.pkl')
    D_X_path = os.path.join(opts.load, 'D_X.pkl')
    D_Y_path = os.path.join(opts.load, 'D_Y.pkl')

    G_XtoY = CycleGenerator(conv_dim=opts.g_conv_dim, init_zero_weights=opts.init_zero_weights)
    G_YtoX = CycleGenerator(conv_dim=opts.g_conv_dim, init_zero_weights=opts.init_zero_weights)
    D_X = DCDiscriminator(conv_dim=opts.d_conv_dim)
    D_Y = DCDiscriminator(conv_dim=opts.d_conv_dim)

    G_XtoY.load_state_dict(torch.load(G_XtoY_path, map_location=lambda storage, loc: storage))
    G_YtoX.load_state_dict(torch.load(G_YtoX_path, map_location=lambda storage, loc: storage))
    D_X.load_state_dict(torch.load(D_X_path, map_location=lambda storage, loc: storage))
    D_Y.load_state_dict(torch.load(D_Y_path, map_location=lambda storage, loc: storage))

    if torch.cuda.is_available():
        G_XtoY.cuda()
        G_YtoX.cuda()
        D_X.cuda()
        D_Y.cuda()
        print('Models moved to GPU.')

    return G_XtoY, G_YtoX, D_X, D_Y


def merge_images(sources, targets, opts):
    """Creates a grid consisting of pairs of columns, where the first column in
    each pair contains source images and the second column in each pair
    contains images generated by the CycleGAN from the corresponding images in
    the first column.
    """
    _, _, h, w = sources.shape
    row = int(np.sqrt(opts.batch_size))
    merged = np.zeros([3, row * h, row * w * 2])
    for (idx, s, t) in (zip(range(row ** 2), sources, targets, )):
        i = idx // row
        j = idx % row
        # Rows are indexed by height h, columns by width w (identical for square emojis).
        merged[:, i * h:(i + 1) * h, (j * 2) * w:(j * 2 + 1) * w] = s
        merged[:, i * h:(i + 1) * h, (j * 2 + 1) * w:(j * 2 + 2) * w] = t
    return merged.transpose(1, 2, 0)


def generate_gif(directory_path, keyword=None):
    images = []
    for filename in sorted(os.listdir(directory_path)):
        if filename.endswith(".png") and (keyword is None or keyword in filename):
            img_path = os.path.join(directory_path, filename)
            print("adding image {}".format(img_path))
            images.append(imageio.imread(img_path))

    if keyword:
        imageio.mimsave(
            os.path.join(directory_path, 'anim_{}.gif'.format(keyword)), images)
    else:
        imageio.mimsave(os.path.join(directory_path, 'anim.gif'), images)


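def create_image_grid(array, ncols=None):
    """Tiles a batch of images of shape (N, C, H, W) into one (H * nrows, W * ncols, C) grid.

    If ncols is not given, the grid is made as close to square as possible.
    Images that do not fill a complete row are dropped.
    """
    num_images, channels, cell_h, cell_w = array.shape
    if not ncols:
        ncols = int(np.sqrt(num_images))
    # Number of complete rows; leftover images are dropped.
    nrows = num_images // ncols
    result = np.zeros((cell_h * nrows, cell_w * ncols, channels), dtype=array.dtype)
    for i in range(0, nrows):
        for j in range(0, ncols):
            cell = array[i * ncols + j].transpose(1, 2, 0)  # (C, H, W) -> (H, W, C)
            result[i * cell_h:(i + 1) * cell_h, j * cell_w:(j + 1) * cell_w, :] = cell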

    if channels == 1:
        result = result.squeeze()
    return result


def gan_save_samples(G, fixed_noise, iteration, opts):
    generated_images = G(fixed_noise)
    generated_images = to_data(generated_images)

    grid = create_image_grid(generated_images)

    # merged = merge_images(X, fake_Y, opts)
    path = os.path.join(opts.sample_dir, 'sample-{:06d}.png'.format(iteration))
    imageio.imwrite(path, grid)
    print('Saved {}'.format(path))


def cyclegan_save_samples(iteration, fixed_Y, fixed_X, G_YtoX, G_XtoY, opts):
    """Saves samples from both generators X->Y and Y->X.
    """
    fake_X = G_YtoX(fixed_Y)
    fake_Y = G_XtoY(fixed_X)

    X, fake_X = to_data(fixed_X), to_data(fake_X)
    Y, fake_Y = to_data(fixed_Y), to_data(fake_Y)

    merged = merge_images(X, fake_Y, opts)
    path = os.path.join(opts.sample_dir, 'sample-{:06d}-X-Y.png'.format(iteration))
    imageio.imwrite(path, merged)
    print('Saved {}'.format(path))

    merged = merge_images(Y, fake_X, opts)
    path = os.path.join(opts.sample_dir, 'sample-{:06d}-Y-X.png'.format(iteration))
    imageio.imwrite(path, merged)
    print('Saved {}'.format(path))

## 0.3. Data Loader
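
The loader below normalizes every channel with a mean and standard deviation of 0.5. Since `ToTensor` produces pixel values in [0, 1], this maps the images to [-1, 1], the same range as the `tanh` output of the generators. The following is a small sketch of that arithmetic in plain PyTorch; the helper names `normalize` and `denormalize` are illustrative and are not used elsewhere in this notebook.

```python
import torch


def normalize(x, mean=0.5, std=0.5):
    """Per-channel arithmetic applied by Normalize: (x - mean) / std."""
    return (x - mean) / std


def denormalize(x, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying or saving generated images."""
    return x * std + mean


pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])  # values as produced by ToTensor
scaled = normalize(pixels)                    # now in [-1, 1]
restored = denormalize(scaled)                # back in [0, 1]
print(scaled, restored)
```

Mapping generated images back to [0, 1] with the inverse transform is one way to avoid clipping when writing samples to disk.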

In [None]:
def get_emoji_loader(emoji_type, opts):
    """Creates training and test data loaders.
    """
    transform = transforms.Compose([
                    transforms.Resize(opts.image_size),
                    transforms.ToTensor(),
                    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                ])

    train_path = os.path.join('data/emojis', emoji_type)
    test_path = os.path.join('data/emojis', 'Test_{}'.format(emoji_type))

    train_dataset = datasets.ImageFolder(train_path, transform)
    test_dataset = datasets.ImageFolder(test_path, transform)

    train_dloader = DataLoader(dataset=train_dataset, batch_size=opts.batch_size, shuffle=True, num_workers=opts.num_workers)
    test_dloader = DataLoader(dataset=test_dataset, batch_size=opts.batch_size, shuffle=False, num_workers=opts.num_workers)

    return train_dloader, test_dloader

In [None]:
datadir = os.path.join('data')
if not os.path.exists(datadir):
    os.makedirs(datadir)
with tarfile.open('emojis.tar.gz') as archive:
    archive.extractall(datadir)

## 0.4. Training and Evaluation Code

In [None]:
def print_models(G_XtoY, G_YtoX, D_X, D_Y):
    """Prints model information for the generators and discriminators.
    """
    if G_YtoX:
        print("                 G_XtoY                ")
        print("---------------------------------------")
        print(G_XtoY)
        print("---------------------------------------")

        print("                 G_YtoX                ")
        print("---------------------------------------")
        print(G_YtoX)
        print("---------------------------------------")

        print("                  D_X                  ")
        print("---------------------------------------")
        print(D_X)
        print("---------------------------------------")

        print("                  D_Y                  ")
        print("---------------------------------------")
        print(D_Y)
        print("---------------------------------------")
    else:
        print("                 G                     ")
        print("---------------------------------------")
        print(G_XtoY)
        print("---------------------------------------")

        print("                  D                    ")
        print("---------------------------------------")
        print(D_X)
        print("---------------------------------------")


def create_model(opts):
    """Builds the generators and discriminators.
    """
    if opts.Y is None:
        ### GAN
        G = DCGenerator(noise_size=opts.noise_size, conv_dim=opts.g_conv_dim, spectral_norm=opts.spectral_norm)
        D = DCDiscriminator(conv_dim=opts.d_conv_dim, spectral_norm=opts.spectral_norm)

        print_models(G, None, D, None)

        if torch.cuda.is_available():
            G.cuda()
            D.cuda()
            print('Models moved to GPU.')
        return G, D
          
    else:
        ### CycleGAN
        G_XtoY = CycleGenerator(conv_dim=opts.g_conv_dim, init_zero_weights=opts.init_zero_weights)
        G_YtoX = CycleGenerator(conv_dim=opts.g_conv_dim, init_zero_weights=opts.init_zero_weights)
        D_X = DCDiscriminator(conv_dim=opts.d_conv_dim)
        D_Y = DCDiscriminator(conv_dim=opts.d_conv_dim)

        print_models(G_XtoY, G_YtoX, D_X, D_Y)

        if torch.cuda.is_available():
            G_XtoY.cuda()
            G_YtoX.cuda()
            D_X.cuda()
            D_Y.cuda()
            print('Models moved to GPU.')
        return G_XtoY, G_YtoX, D_X, D_Y


def train(opts):
    """Loads the data, creates checkpoint and sample directories, and starts the training loop.
    """

    # Create train and test dataloaders for images from the two domains X and Y
    dataloader_X, test_dataloader_X = get_emoji_loader(emoji_type=opts.X, opts=opts)
    if opts.Y:
        dataloader_Y, test_dataloader_Y = get_emoji_loader(emoji_type=opts.Y, opts=opts)

    # Create checkpoint and sample directories
    create_dir(opts.checkpoint_dir)
    create_dir(opts.sample_dir)

    # Start training
    if opts.Y is None:
        G, D = gan_training_loop(dataloader_X, test_dataloader_X, opts)
        return G, D
    else:
        G_XtoY, G_YtoX, D_X, D_Y = cyclegan_training_loop(dataloader_X, dataloader_Y, test_dataloader_X, test_dataloader_Y, opts)
        return G_XtoY, G_YtoX, D_X, D_Y


def print_opts(opts):
    """Prints the values of all command-line arguments.
    """
    print('=' * 80)
    print('Opts'.center(80))
    print('-' * 80)
    for key in opts.__dict__:
        if opts.__dict__[key]:
            print('{:>30}: {:<30}'.format(key, opts.__dict__[key]).center(80))
    print('=' * 80)


## 0.5. Additional Helper Modules

In [None]:
def sample_noise(batch_size, dim):
    """
    Generate a PyTorch Tensor of uniform random noise.

    Input:
    - batch_size: Integer giving the batch size of noise to generate.
    - dim: Integer giving the dimension of noise to generate.

    Output:
    - A PyTorch Tensor of shape (batch_size, dim, 1, 1) containing uniform
      random noise in the range (-1, 1).
    """
    return to_var(torch.rand(batch_size, dim) * 2 - 1).unsqueeze(2).unsqueeze(3)
  

def upconv(in_channels, out_channels, kernel_size, stride=2, padding=2, batch_norm=True, spectral_norm=False):
    """Creates an upsample-and-convolution layer, with optional batch normalization.
    """
    layers = []
    if stride>1:
        layers.append(nn.Upsample(scale_factor=stride))
    conv_layer = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=1, padding=padding, bias=False)
    if spectral_norm:
        layers.append(SpectralNorm(conv_layer))
    else:
        layers.append(conv_layer)
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)


def conv(in_channels, out_channels, kernel_size, stride=2, padding=2, batch_norm=True, init_zero_weights=False, spectral_norm=False):
    """Creates a convolutional layer, with optional batch normalization.
    """
    layers = []
    conv_layer = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=False)
    if init_zero_weights:
        conv_layer.weight.data = torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.001
            
    if spectral_norm:
        layers.append(SpectralNorm(conv_layer))
    else:
        layers.append(conv_layer)

    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)
  

class ResnetBlock(nn.Module):
    def __init__(self, conv_dim):
        super(ResnetBlock, self).__init__()
        self.conv_layer = conv(in_channels=conv_dim, out_channels=conv_dim, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        out = x + self.conv_layer(x)
        return out

# Part 1: DCGAN
In this section, we implement a *Deep Convolutional GAN* (DCGAN). A
DCGAN is simply a GAN that uses a convolutional neural network as the discriminator and a
network composed of *transposed convolutions* as the generator. (In this notebook, the `upconv` helper stands in for a transposed convolution by upsampling and then applying an ordinary convolution.) To implement the DCGAN, we need
to specify three things: 1) the generator, 2) the discriminator, and 3) the training procedure. We briefly go over each of these components in its own section before implementing it.

## 1.1. Spectral Norm Class
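
The class in this section rescales a layer's weight by an estimate of its largest singular value, obtained by power iteration. As a rough illustration of that estimate on its own, the sketch below runs many iterations at once on a random tensor and compares the result with the exact value from an SVD. The function name `estimate_spectral_norm` and the tensor shape are assumptions made for this example; the class itself runs `power_iterations` steps per forward pass and keeps its `u` vector between calls.

```python
import torch


def estimate_spectral_norm(weight, power_iterations=100):
    """Estimate the largest singular value of a weight tensor by power iteration."""
    height = weight.shape[0]
    w = weight.reshape(height, -1)          # flatten to a (height, width) matrix
    u = torch.randn(height)
    u = u / (u.norm() + 1e-12)
    for _ in range(power_iterations):
        v = torch.mv(w.t(), u)              # v <- normalize(W^T u)
        v = v / (v.norm() + 1e-12)
        u = torch.mv(w, v)                  # u <- normalize(W v)
        u = u / (u.norm() + 1e-12)
    return torch.dot(u, torch.mv(w, v))     # sigma ~ u^T W v


torch.manual_seed(0)
conv_weight = torch.randn(8, 3, 5, 5)       # hypothetical conv weight
sigma_est = estimate_spectral_norm(conv_weight)
sigma_true = torch.linalg.svdvals(conv_weight.reshape(8, -1))[0]
print(float(sigma_est), float(sigma_true))
```

Dividing the weight by this estimate gives a matrix whose largest singular value is approximately 1, which is what the `w / sigma` step in the class does.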

In [None]:
def l2normalize(v, eps=1e-12):
    return v / (v.norm() + eps)


class SpectralNorm(nn.Module):
    def __init__(self, module, name='weight', power_iterations=1):
        super(SpectralNorm, self).__init__()
        self.module = module
        self.name = name
        self.power_iterations = power_iterations
        if not self._made_params():
            self._make_params()

    def _update_u_v(self):
        u = getattr(self.module, self.name + "_u")
        v = getattr(self.module, self.name + "_v")
        w = getattr(self.module, self.name + "_bar")

        height = w.data.shape[0]
        for _ in range(self.power_iterations):
            v.data = l2normalize(torch.mv(torch.t(w.view(height,-1).data), u.data))
            u.data = l2normalize(torch.mv(w.view(height,-1).data, v.data))

        sigma = u.dot(w.view(height, -1).mv(v))
        setattr(self.module, self.name, w / sigma.expand_as(w))

    def _made_params(self):
        try:
            u = getattr(self.module, self.name + "_u")
            v = getattr(self.module, self.name + "_v")
            w = getattr(self.module, self.name + "_bar")
            return True
        except AttributeError:
            return False

    def _make_params(self):
        w = getattr(self.module, self.name)

        height = w.data.shape[0]
        width = w.view(height, -1).data.shape[1]

        u = Parameter(w.data.new(height).normal_(0, 1), requires_grad=False)
        v = Parameter(w.data.new(width).normal_(0, 1), requires_grad=False)
        u.data = l2normalize(u.data)
        v.data = l2normalize(v.data)
        w_bar = Parameter(w.data)

        del self.module._parameters[self.name]

        self.module.register_parameter(self.name + "_u", u)
        self.module.register_parameter(self.name + "_v", v)
        self.module.register_parameter(self.name + "_bar", w_bar)

    def forward(self, *args):
        self._update_u_v()
        return self.module.forward(*args)

## 1.2. GAN Generator
The generator of the DCGAN consists of a sequence of transpose convolutional layers that progressively upsample the input noise sample to generate a fake image. The generator's architecture is depicted in the diagram below:

![Fig1](https://drive.google.com/uc?id=1zUSTpqoEUkvxoQCHxGjm-ZvEJpUsr9_p)
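
To make the progressive upsampling concrete, the sketch below traces tensor shapes through four upsample-then-convolve stages. It assumes `noise_size=100` and `conv_dim=32`, the values implied by the shape comments in the generator code, and it leaves out batch normalization and activations because they do not change shapes. The helper `upsample_conv` is a stand-in written for this example.

```python
import torch
from torch import nn


def upsample_conv(in_channels, out_channels, scale):
    """Nearest-neighbour upsampling followed by a size-preserving 5x5 convolution."""
    return nn.Sequential(
        nn.Upsample(scale_factor=scale),
        nn.Conv2d(in_channels, out_channels, kernel_size=5, stride=1, padding=2, bias=False),
    )


noise_size, conv_dim = 100, 32  # assumed values, taken from the shape comments
stages = [
    upsample_conv(noise_size, conv_dim * 4, scale=4),    # 1x1   -> 4x4
    upsample_conv(conv_dim * 4, conv_dim * 2, scale=2),  # 4x4   -> 8x8
    upsample_conv(conv_dim * 2, conv_dim, scale=2),      # 8x8   -> 16x16
    upsample_conv(conv_dim, 3, scale=2),                 # 16x16 -> 32x32
]

x = torch.randn(2, noise_size, 1, 1)  # a batch of two noise samples
shapes = []
with torch.no_grad():
    for stage in stages:
        x = stage(x)
        shapes.append(tuple(x.shape))
print(shapes)
```

Each stage changes the spatial size only through the upsampling factor, because a 5x5 convolution with stride 1 and padding 2 preserves height and width.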

In [None]:
class DCGenerator(nn.Module):
    def __init__(self, noise_size, conv_dim, spectral_norm=False):
        super(DCGenerator, self).__init__()

        self.conv_dim = conv_dim
        self.linear_bn = upconv(noise_size, conv_dim*4, 5, stride=4)
        self.upconv1 = upconv(conv_dim*4, conv_dim*2, 5)
        self.upconv2 = upconv(conv_dim*2, conv_dim, 5)
        self.upconv3 = upconv(conv_dim, 3, 5, batch_norm=False)

    def forward(self, z):
        """Generates an image given a sample of random noise.

            Input
            -----
                z: BS x noise_size x 1 x 1   -->  BSx100x1x1 (during training)

            Output
            ------
                out: BS x channels x image_width x image_height  -->  BSx3x32x32 (during training)
        """
        batch_size = z.size(0)
        
        out = F.relu(self.linear_bn(z)).view(-1, self.conv_dim*4, 4, 4)    # BS x 128 x 4 x 4
        out = F.relu(self.upconv1(out))  # BS x 64 x 8 x 8
        out = F.relu(self.upconv2(out))  # BS x 32 x 16 x 16
        out = torch.tanh(self.upconv3(out))  # BS x 3 x 32 x 32
        
        out_size = out.size()
        if out_size != torch.Size([batch_size, 3, 32, 32]):
            raise ValueError("expect {} x 3 x 32 x 32, but get {}".format(batch_size, out_size))
        return out

## 1.3. GAN Discriminator
The DCGAN discriminator is a stack of strided convolutions that reduces an image to a single score per example. Its architecture is depicted below.

![Fig2](https://drive.google.com/uc?id=1PU0NDxtnNLGCqflRtwqYe9t3YTO7cRec)
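
The spatial sizes in the discriminator follow from the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. As a quick check, the sketch below applies it to the four layers defined in the next cell (kernel size 5, stride 2, padding 2 for the first three layers and padding 1 for the last), starting from a 32x32 input. The helper name `conv_output_size` is an assumption for this example.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Output height/width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32
sizes = [size]
for padding in (2, 2, 2, 1):  # conv1, conv2, conv3, conv4
    size = conv_output_size(size, kernel_size=5, stride=2, padding=padding)
    sizes.append(size)
print(sizes)  # [32, 16, 8, 4, 1]
```

The final 1x1 map is what allows the discriminator to squeeze its output down to one value per image.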

In [None]:
class DCDiscriminator(nn.Module):
    """Defines the architecture of the discriminator network.
       Note: Both discriminators D_X and D_Y have the same architecture in this assignment.
    """
    def __init__(self, conv_dim=64, spectral_norm=False):
        super(DCDiscriminator, self).__init__()

        self.conv1 = conv(in_channels=3, out_channels=conv_dim, kernel_size=5, stride=2, spectral_norm=spectral_norm)
        self.conv2 = conv(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=5, stride=2, spectral_norm=spectral_norm)
        self.conv3 = conv(in_channels=conv_dim*2, out_channels=conv_dim*4, kernel_size=5, stride=2, spectral_norm=spectral_norm)
        self.conv4 = conv(in_channels=conv_dim*4, out_channels=1, kernel_size=5, stride=2, padding=1, batch_norm=False, spectral_norm=spectral_norm)

    def forward(self, x):
        batch_size = x.size(0)

        out = F.relu(self.conv1(x))    # BS x conv_dim x 16 x 16
        out = F.relu(self.conv2(out))    # BS x (conv_dim*2) x 8 x 8
        out = F.relu(self.conv3(out))    # BS x (conv_dim*4) x 4 x 4

        out = self.conv4(out).squeeze()
        out_size = out.size()
        if out_size != torch.Size([batch_size,]):
            raise ValueError("expect {} x 1, but get {}".format(batch_size, out_size))
        return out

## 1.4. GAN Training Loop


We now implement the training loop for the DCGAN, following the pseudo-code below. Note that in our implementation of the discriminator update, we optionally add a gradient penalty term to the discriminator loss. This is a popular technique for stabilizing GAN training; it can take different forms, and its effect on GAN training is an active area of research [Gulrajani et al., 2017] [Kodali et al., 2017] [Mescheder et al., 2018].

![Fig3](https://drive.google.com/uc?id=1oEHohkMzMjDQMs1iwCuAImkRBcP9tVks)
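
The gradient penalty used in the loop below measures how steeply the discriminator's output changes at points interpolated between real and fake images. The following is a device-agnostic sketch of the same computation, assuming a toy linear discriminator (`toy_D`) and random tensors in place of emoji batches. For a linear model the gradient with respect to the input is the weight vector itself, which makes the result easy to verify.

```python
import torch
from torch import nn


def gradient_penalty(D, real, fake, gp_weight=10.0):
    """Mean norm of dD/dx at random interpolations of real and fake, scaled by gp_weight."""
    batch_size = real.shape[0]
    alpha = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interp = (alpha * real.detach() + (1 - alpha) * fake.detach()).requires_grad_(True)
    d_out = D(interp)
    grads = torch.autograd.grad(
        outputs=d_out,
        inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )[0]
    grads = grads.view(batch_size, -1)
    grad_norm = torch.sqrt(torch.sum(grads ** 2, dim=1) + 1e-12)
    return gp_weight * grad_norm.mean()


torch.manual_seed(0)
toy_D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1, bias=False))  # hypothetical discriminator
real = torch.randn(4, 3, 32, 32)
fake = torch.randn(4, 3, 32, 32)
gp = gradient_penalty(toy_D, real, fake)
print(float(gp), float(10.0 * toy_D[1].weight.norm()))
```

Note that this form penalizes the gradient norm itself, as the notebook's loop does, rather than its squared distance from 1 as in some other formulations.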

In [None]:
def gan_training_loop(dataloader, test_dataloader, opts):
    """Runs the training loop.
        * Saves checkpoint every opts.checkpoint_every iterations
        * Saves generated samples every opts.sample_every iterations
    """

    # Create generators and discriminators
    G, D = create_model(opts)

    g_params = G.parameters()  # Get generator parameters
    d_params = D.parameters()  # Get discriminator parameters

    # Create optimizers for the generators and discriminators
    g_optimizer = optim.Adam(g_params, opts.lr, [opts.beta1, opts.beta2])
    d_optimizer = optim.Adam(d_params, opts.lr * 2., [opts.beta1, opts.beta2])

    train_iter = iter(dataloader)

    test_iter = iter(test_dataloader)

    # Sample a fixed batch of noise vectors. These are held constant throughout training,
    # so the saved samples show how the generator's output for the same inputs evolves.
    fixed_noise = sample_noise(100, opts.noise_size)  # 100 x noise_size x 1 x 1

    iter_per_epoch = len(train_iter)
    total_train_iters = opts.train_iters

    losses = {"iteration": [], "D_fake_loss": [], "D_real_loss": [], "G_loss": []}

    gp_weight = 10

    try:
        for iteration in range(1, opts.train_iters + 1):

            # Reset data_iter for each epoch
            if iteration % iter_per_epoch == 0:
                train_iter = iter(dataloader)

            real_images, real_labels = next(train_iter)
            real_images, real_labels = to_var(real_images), to_var(real_labels).long().squeeze()

            # ones = Variable(torch.Tensor(real_images.shape[0]).float().cuda().fill_(1.0), requires_grad=False)

            for d_i in range(opts.d_train_iters):
                d_optimizer.zero_grad()

                # 1. Compute the discriminator loss on real images
                D_real_loss = (1/2)*torch.mean((D(real_images) - 1)**2)

                # 2. Sample noise
                noise = sample_noise(real_images.shape[0], opts.noise_size)

                # 3. Generate fake images from the noise
                fake_images = G(noise)
                
                # 4. Compute the discriminator loss on the fake images
                D_fake_loss = (1/2)*torch.mean((D(fake_images))**2)

                # ---- Gradient Penalty ----
                if opts.gradient_penalty:
                    alpha = torch.rand(real_images.shape[0], 1, 1, 1)
                    alpha = alpha.expand_as(real_images).cuda()
                    interp_images = Variable(alpha * real_images.data + (1-alpha) * fake_images.data, requires_grad=True).cuda()
                    D_interp_output = D(interp_images)

                    gradients = torch.autograd.grad(outputs=D_interp_output, inputs=interp_images,
                                                    grad_outputs=torch.ones(D_interp_output.size()).cuda(),
                                                    create_graph=True, retain_graph=True)[0]
                    gradients = gradients.view(real_images.shape[0], -1)
                    gradients_norm = torch.sqrt(torch.sum(gradients ** 2, dim=1) + 1e-12)

                    gp = gp_weight * gradients_norm.mean()
                else:
                    gp = 0.0
                # --------------------------
                
                # 5. Compute the total discriminator loss (gp is 0.0 when the penalty is disabled)
                D_total_loss = D_real_loss + D_fake_loss + gp

                D_total_loss.backward()
                d_optimizer.step()

            g_optimizer.zero_grad()
            
            # 1. Sample noise
            noise = sample_noise(real_images.shape[0], opts.noise_size)

            # 2. Generate fake images from the noise
            fake_images = G(noise)

            # 3. Compute the generator loss
            G_loss = torch.mean((D(fake_images) - 1)**2)

            G_loss.backward()
            g_optimizer.step()

            # Print the log info
            if iteration % opts.log_step == 0:
                losses['iteration'].append(iteration)
                losses['D_real_loss'].append(D_real_loss.item())
                losses['D_fake_loss'].append(D_fake_loss.item())
                losses['G_loss'].append(G_loss.item())
                print('Iteration [{:4d}/{:4d}] | D_real_loss: {:6.4f} | D_fake_loss: {:6.4f} | G_loss: {:6.4f}'.format(
                    iteration, total_train_iters, D_real_loss.item(), D_fake_loss.item(), G_loss.item()))

            # Save the generated samples
            if iteration % opts.sample_every == 0:
                gan_save_samples(G, fixed_noise, iteration, opts)

            # Save the model parameters
            if iteration % opts.checkpoint_every == 0:
                gan_checkpoint(iteration, G, D, opts)

    except KeyboardInterrupt:
        print('Exiting early from training.')
        return G, D

    plt.figure()
    plt.plot(losses['iteration'], losses['D_real_loss'], label='D_real')
    plt.plot(losses['iteration'], losses['D_fake_loss'], label='D_fake')
    plt.plot(losses['iteration'], losses['G_loss'], label='G')
    plt.legend()
    plt.savefig(os.path.join(opts.sample_dir, 'losses.png'))
    plt.close()
    return G, D

# Part 2: CycleGAN

Now, instead of developing a model for creating brand-new emojis, we shift our attention to a GAN model that takes in as input either an Apple or Windows version of an emoji and outputs the equivalent emoji in the other's style.

But first, we take a moment to discuss the broader context of this class of models. The core concept behind CycleGAN is image-to-image translation: using a conditional GAN to learn a mapping from input images to output images. The loss functions of these approaches generally include extra terms to constrain the types of images that are generated.

CycleGAN is a recently introduced method for image-to-image translation that enables us to use unpaired training data, meaning that we can learn to translate images from one domain to another without having an exact correspondence between the individual images in both domains. The diagram below outlines the core CycleGAN components.

![Fig4](https://drive.google.com/uc?id=1ja7EC1IrP67a2FIFQuTO2Fqff9211CNn)

## 2.1. CycleGAN Generator

The generator in the CycleGAN has layers that implement three stages of computation: 1) the first
stage *encodes* the input via a series of convolutional layers that extract the image features; 2) the
second stage then *transforms* the features by passing them through one or more *residual blocks*;
and 3) the third stage *decodes* the transformed features using a series of transpose convolutional
layers, to build an output image of the same size as the input.

The residual block used in the transformation stage consists of a convolutional layer, where the
input is added to the output of the convolution. This is done so that the characteristics of the
output image (e.g., the shapes of objects) do not differ too much from the input. 

The diagram below depicts the generator's architecture. Note that we implement two generators in the CycleGAN model ($G_{X \rightarrow Y}$ and $G_{Y \rightarrow X}$) so that we can produce both Apple → Windows and Windows → Apple translations. Both generators have the same architecture.


![Fig5](https://drive.google.com/uc?id=1zwiQxTMTs0gIL-patIG8mIoVkPGN9TpN)
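
To illustrate why a residual block keeps the output close to its input, the sketch below builds a small block of the same form (a 3x3 convolution with batch normalization, added back onto the input) and checks two properties: the block preserves the feature-map shape, and it reduces to the identity when the convolution contributes nothing. `TinyResidualBlock` is a stand-in written for this example, not the notebook's `ResnetBlock`.

```python
import torch
from torch import nn


class TinyResidualBlock(nn.Module):
    """Computes x + BatchNorm(Conv3x3(x)), so the block learns a correction to its input."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)


torch.manual_seed(0)
block = TinyResidualBlock(channels=64)
features = torch.randn(2, 64, 8, 8)  # hypothetical encoder output
out = block(features)                # same shape as the input

# With the convolution weights set to zero, the block passes its input through unchanged.
nn.init.zeros_(block.body[0].weight)
identity_out = block(features)
print(out.shape, torch.allclose(identity_out, features))
```

Because the skip connection carries the input forward directly, the convolution only has to model how the features should change during translation.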

In [None]:
class CycleGenerator(nn.Module):
    """Defines the architecture of the generator network.
       Note: Both generators G_XtoY and G_YtoX have the same architecture in this assignment.
    """
    def __init__(self, conv_dim=64, init_zero_weights=False):
        super(CycleGenerator, self).__init__()

        # 1. Define the encoder part of the generator (that extracts features from the input image)
        self.conv1 = conv(in_channels=3, out_channels=conv_dim, kernel_size=5, init_zero_weights=init_zero_weights)
        self.conv2 = conv(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=5, init_zero_weights=init_zero_weights)

        # 2. Define the transformation part of the generator
        self.resnet_block  = ResnetBlock(conv_dim*2)

        # 3. Define the decoder part of the generator (that builds up the output image from features)
        self.upconv1 = upconv(in_channels=conv_dim*2, out_channels=conv_dim, kernel_size=5)
        self.upconv2 = upconv(in_channels=conv_dim, out_channels=3, kernel_size=5, batch_norm=False)

    def forward(self, x):
        """Generates an image conditioned on an input image.

            Input
            -----
                x: BS x 3 x 32 x 32

            Output
            ------
                out: BS x 3 x 32 x 32
        """
        batch_size = x.size(0)
        out = F.relu(self.conv1(x))            # BS x 32 x 16 x 16
        out = F.relu(self.conv2(out))          # BS x 64 x 8 x 8
        out = F.relu(self.resnet_block(out))   # BS x 64 x 8 x 8
        out = F.relu(self.upconv1(out))        # BS x 32 x 16 x 16
        out = torch.tanh(self.upconv2(out))    # BS x 3 x 32 x 32
        out_size = out.size()
        
        if out_size != torch.Size([batch_size, 3, 32, 32]):
            raise ValueError("expect {} x 3 x 32 x 32, but get {}".format(batch_size, out_size))

        return out

## 2.2. CycleGAN Training Loop

We now implement the training loop for the CycleGAN model as per the pseudo-code below. Training the CycleGAN involves using a *cycle consistency loss* (which gives this model its name) to constrain the model. The idea is that when we
translate an image from domain $X$ to domain $Y$, and then translate the generated image *back* to
domain $X$, the result should look like the original image that we started with.

The cycle consistency component of the loss is the L1 distance between the input images and
their *reconstructions* obtained by passing through both generators in sequence (i.e., from domain
$X$ to $Y$ via the $X \rightarrow Y$ generator, and then from domain $Y$ back to $X$ via the $Y \rightarrow X$ generator).
The cycle consistency loss for the $Y \rightarrow X \rightarrow Y$ cycle is expressed as follows:

<center>$\lambda_{\text{cycle}} \mathcal{J}_{\text{cycle}}^{(Y \rightarrow X \rightarrow Y)}=\lambda_{\text{cycle}} \frac{1}{m} \sum_{i=1}^{m}\left\|y^{(i)}-G_{X \rightarrow Y}\left(G_{Y \rightarrow X}\left(y^{(i)}\right)\right)\right\|_{1}$,</center>

where $\lambda_{\text{cycle}}$ is a scalar hyper-parameter balancing the two loss terms: the cycle consistency loss and
the GAN loss. The loss for the $X \rightarrow Y \rightarrow X$ cycle is analogous.

![Fig6](https://drive.google.com/uc?id=1z7C0Gk0vS86MIuShSIqX4mMEKe0AhwEp)
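
The cycle consistency formula above can be sketched directly in PyTorch. The example below is illustrative only: it uses simple functions (`identity` and `shift`) in place of trained generators, and it follows the formula literally by summing absolute differences within each image and averaging over the batch. An actual training loop may normalize the term differently, for example by also averaging over pixels.

```python
import torch


def cycle_consistency_loss(G_XtoY, G_YtoX, y, lambda_cycle=1.0):
    """lambda_cycle * (1/m) * sum_i || y_i - G_XtoY(G_YtoX(y_i)) ||_1"""
    reconstructed_y = G_XtoY(G_YtoX(y))
    per_image_l1 = (y - reconstructed_y).abs().flatten(start_dim=1).sum(dim=1)
    return lambda_cycle * per_image_l1.mean()


def identity(images):
    """Stand-in generator that returns its input unchanged."""
    return images


def shift(images):
    """Stand-in generator that adds a constant error of 0.5 to every pixel."""
    return images + 0.5


y = torch.zeros(2, 3, 4, 4)  # two tiny "images" with 48 values each
perfect = cycle_consistency_loss(identity, identity, y)
imperfect = cycle_consistency_loss(identity, shift, y, lambda_cycle=10.0)
print(float(perfect), float(imperfect))  # 0.0 240.0
```

A perfect reconstruction gives zero loss, while the constant error of 0.5 on each of the 48 values gives 24 per image, or 240 after scaling by `lambda_cycle=10`.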

In [None]:
def cyclegan_training_loop(dataloader_X, dataloader_Y, test_dataloader_X, test_dataloader_Y, opts):
    """Runs the training loop.
        * Saves checkpoint every opts.checkpoint_every iterations
        * Saves generated samples every opts.sample_every iterations
    """

    # Create generators and discriminators
    G_XtoY, G_YtoX, D_X, D_Y = create_model(opts)

    g_params = list(G_XtoY.parameters()) + list(G_YtoX.parameters())  # Get generator parameters
    d_params = list(D_X.parameters()) + list(D_Y.parameters())  # Get discriminator parameters

    # Create optimizers for the generators and discriminators
    g_optimizer = optim.Adam(g_params, opts.lr, [opts.beta1, opts.beta2])
    d_optimizer = optim.Adam(d_params, opts.lr, [opts.beta1, opts.beta2])

    iter_X = iter(dataloader_X)
    iter_Y = iter(dataloader_Y)

    test_iter_X = iter(test_dataloader_X)
    test_iter_Y = iter(test_dataloader_Y)

    # Get some fixed data from domains X and Y for sampling. These are images that are held
    # constant throughout training, that allow us to inspect the model's performance.
    fixed_X = to_var(next(test_iter_X)[0])
    fixed_Y = to_var(next(test_iter_Y)[0])

    iter_per_epoch = min(len(iter_X), len(iter_Y))

    try:
        for iteration in range(1, opts.train_iters+1):

            # Reset data_iter for each epoch
            if iteration % iter_per_epoch == 0:
                iter_X = iter(dataloader_X)
                iter_Y = iter(dataloader_Y)

            # Use the built-in next(); the iterator's .next() method is not available in recent PyTorch versions
            images_X, labels_X = next(iter_X)
            images_X, labels_X = to_var(images_X), to_var(labels_X).long().squeeze()

            images_Y, labels_Y = next(iter_Y)
            images_Y, labels_Y = to_var(images_Y), to_var(labels_Y).long().squeeze()


            # ============================================
            #            TRAIN THE DISCRIMINATORS
            # ============================================

            # Train with real images
            d_optimizer.zero_grad()

            # 1. Compute the discriminator losses on real images
            D_X_loss = torch.mean((D_X(images_X) - 1)**2)
            D_Y_loss = torch.mean((D_Y(images_Y) - 1)**2)

            d_real_loss = D_X_loss + D_Y_loss
            d_real_loss.backward()
            d_optimizer.step()

            # Train with fake images
            d_optimizer.zero_grad()

            # 2. Generate fake images that look like domain X based on real images in domain Y
            fake_X = G_YtoX(images_Y)

            # 3. Compute the loss for D_X
            D_X_loss = torch.mean(D_X(fake_X)**2)

            # 4. Generate fake images that look like domain Y based on real images in domain X
            fake_Y = G_XtoY(images_X)

            # 5. Compute the loss for D_Y
            D_Y_loss = torch.mean(D_Y(fake_Y)**2)

            d_fake_loss = D_X_loss + D_Y_loss
            d_fake_loss.backward()
            d_optimizer.step()

            # =========================================
            #            TRAIN THE GENERATORS
            # =========================================

            g_optimizer.zero_grad()

            # 1. Generate fake images that look like domain X based on real images in domain Y
            fake_X = G_YtoX(images_Y)

            # 2. Compute the generator loss based on domain X
            g_loss = torch.mean((D_X(fake_X) - 1)**2)

            reconstructed_Y = G_XtoY(fake_X)
            # 3. Compute the cycle consistency loss (the reconstruction loss)
            cycle_consistency_loss = torch.mean(torch.sum(torch.abs(images_Y - reconstructed_Y), [1,2,3]))

            g_loss += opts.lambda_cycle * cycle_consistency_loss

            g_loss.backward()
            g_optimizer.step()
            g_optimizer.zero_grad()

            # 1. Generate fake images that look like domain Y based on real images in domain X
            fake_Y = G_XtoY(images_X)

            # 2. Compute the generator loss based on domain Y
            g_loss = torch.mean((D_Y(fake_Y) - 1)**2)

            reconstructed_X = G_YtoX(fake_Y)
            # 3. Compute the cycle consistency loss (the reconstruction loss)
            cycle_consistency_loss = torch.mean(torch.sum(torch.abs(images_X - reconstructed_X), [1,2,3]))

            g_loss += opts.lambda_cycle * cycle_consistency_loss

            g_loss.backward()
            g_optimizer.step()


            # Print the log info
            if iteration % opts.log_step == 0:
                print('Iteration [{:5d}/{:5d}] | d_real_loss: {:6.4f} | d_Y_loss: {:6.4f} | d_X_loss: {:6.4f} | '
                    'd_fake_loss: {:6.4f} | g_loss: {:6.4f}'.format(
                      iteration, opts.train_iters, d_real_loss.item(), D_Y_loss.item(),
                      D_X_loss.item(), d_fake_loss.item(), g_loss.item()))


            # Save the generated samples
            if iteration % opts.sample_every == 0:
                cyclegan_save_samples(iteration, fixed_Y, fixed_X, G_YtoX, G_XtoY, opts)


            # Save the model parameters
            if iteration % opts.checkpoint_every == 0:
                cyclegan_checkpoint(iteration, G_XtoY, G_YtoX, D_X, D_Y, opts)

    except KeyboardInterrupt:
        print('Exiting early from training.')
        return G_XtoY, G_YtoX, D_X, D_Y

    return G_XtoY, G_YtoX, D_X, D_Y
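
The adversarial terms in the training loop are squared-error (least-squares) losses: the discriminators are pushed towards an output of 1 on real images and 0 on generated images, while each generator is pushed to make its discriminator output 1 on generated images. The sketch below reproduces these three expressions in pure Python. The discriminator scores are hypothetical numbers chosen for illustration, not outputs of the trained models.

```python
def mean(values):
    return sum(values) / len(values)


def d_real_loss(scores_on_real):
    """Mirrors torch.mean((D(real) - 1)**2)."""
    return mean([(s - 1.0) ** 2 for s in scores_on_real])


def d_fake_loss(scores_on_fake):
    """Mirrors torch.mean(D(fake)**2)."""
    return mean([s ** 2 for s in scores_on_fake])


def g_adversarial_loss(scores_on_fake):
    """Mirrors torch.mean((D(fake) - 1)**2)."""
    return mean([(s - 1.0) ** 2 for s in scores_on_fake])


# Hypothetical discriminator outputs for a batch of two images.
scores_real = [1.0, 0.5]
scores_fake = [0.0, 0.5]

print(d_real_loss(scores_real))         # discriminator loss on real images
print(d_fake_loss(scores_fake))         # discriminator loss on generated images
print(g_adversarial_loss(scores_fake))  # generator loss on the same generated images
```

Note that the same scores on generated images give a low discriminator loss and a high generator loss, which is the adversarial tension the loop alternates between.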


# Part 3: Training Our GANs


In this section we train and observe the results of the DCGAN and CycleGAN models that we implemented above.

## 3.1. Training DCGAN

### 3.1.1. Training DCGAN without Gradient Penalty
We first train a DCGAN without a gradient penalty term in the discriminator loss.

In [None]:
SEED = 11

# Set the random seed manually for reproducibility.
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)


args = AttrDict()
args_dict = {
              'image_size':32, 
              'g_conv_dim':32, 
              'd_conv_dim':64,
              'noise_size':100,
              'num_workers': 0,
              'train_iters':20000,
              'X':'Windows',  # options: 'Windows' / 'Apple'
              'Y': None,
              'lr':0.00025,
              'beta1':0.5,
              'beta2':0.999,
              'batch_size':32, 
              'checkpoint_dir': 'results/checkpoints_gan',
              'sample_dir': 'results/samples_gan',
              'load': None,
              'log_step':200,
              'sample_every':200,
              'checkpoint_every':9999,
              'spectral_norm': False,
              'gradient_penalty': False,
              'd_train_iters': 1
}
args.update(args_dict)

print_opts(args)
G, D = train(args)

Now that we've finished training our model, let's take a look at the results!

We first generate a gif that illustrates samples of our model's generated emojis over the course of its training every 200 iterations (total of 20000 iterations).

We then depict side-by-side samples of the generated emojis after the $200^{th}$, $1000^{th}$, $5000^{th}$, $15000^{th}$, and $20000^{th}$ iterations for a closer look at the progression of our model's emoji-generating capabilities.

In [None]:
generate_gif("results/samples_gan")

In [None]:
Image(open('results/samples_gan/anim.gif','rb').read())

In [None]:
img1 = 'results/samples_gan/sample-000200'
img2 = 'results/samples_gan/sample-001000'
img3 = 'results/samples_gan/sample-005000'
img4 = 'results/samples_gan/sample-015000'
img5 = 'results/samples_gan/sample-020000'

images = [img1, img2, img3, img4, img5]

# Five samples are shown, so the grid needs five columns to match plt.subplot(1, 5, ...) below
f, axs = plt.subplots(1,5,figsize=(40,40))

for i in range(len(images)):
  img = cv2.imread(images[i] +".png")
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  plt.subplot(1, 5, i+1)
  plt.imshow(img)
  plt.title(images[i][-6:])
  plt.axis('off')


We can clearly see that our DCGAN model made significant progress towards generating new emojis from scratch over the course of its training. After about 200 iterations, the model's outputs are barely more than indiscernible noise: we can see clear (albeit meaningless) patterns of horizontal and vertical lines of different colours. After 1000 iterations, the model begins to output variations in the shape of emojis; however, the emojis are very noisy and visual details are not discernible. The output after 5000 iterations is the most interesting from the perspective of emoji clarity, uniqueness of outputs, and colour. While we begin to see clear emoji details, we still note the presence of artifacts. After both 15000 and 20000 iterations, the quality of the generated emojis appears to have decreased: the same handful of emojis keep reappearing, and they appear less detailed and riddled with more artifacts than the output after 5000 iterations.

In the next section, we attempt to train another DCGAN model in hopes of higher stability in the training output (if all goes well, we should expect to see higher quality emojis after 15000 and 20000 iterations).

### 3.1.2. Training DCGAN with Gradient Penalty
We now train a new DCGAN model, but this time with the gradient penalty term added to the discriminator network's loss function. The motivation for this is to promote stability in the model's training. After we train the model, we observe the generated emojis like the previous section and then compare the stability of training between both models.

In [None]:
SEED = 11

# Set the random seed manually for reproducibility.
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)


args = AttrDict()
args_dict = {
              'image_size':32, 
              'g_conv_dim':32, 
              'd_conv_dim':64,
              'noise_size':100,
              'num_workers': 0,
              'train_iters':20000,
              'X':'Windows',  # options: 'Windows' / 'Apple'
              'Y': None,
              'lr':0.00025,
              'beta1':0.5,
              'beta2':0.999,
              'batch_size':32, 
              'checkpoint_dir': 'results/checkpoints_gan',
              'sample_dir': 'results/samples_gan_with_gradpenalty',
              'load': None,
              'log_step':200,
              'sample_every':200,
              'checkpoint_every':9999,
              'spectral_norm': False,
              'gradient_penalty': True,
              'd_train_iters': 1
}
args.update(args_dict)

print_opts(args)
G, D = train(args)



In [None]:
generate_gif("results/samples_gan_with_gradpenalty")

In [None]:
Image(open('results/samples_gan_with_gradpenalty/anim.gif','rb').read())

In [None]:
img1 = 'results/samples_gan_with_gradpenalty/sample-000200'
img2 = 'results/samples_gan_with_gradpenalty/sample-001000'
img3 = 'results/samples_gan_with_gradpenalty/sample-005000'
img4 = 'results/samples_gan_with_gradpenalty/sample-015000'
img5 = 'results/samples_gan_with_gradpenalty/sample-020000'

images = [img1, img2, img3, img4, img5]

# Five samples are shown, so the grid needs five columns to match plt.subplot(1, 5, ...) below
f, axs = plt.subplots(1,5,figsize=(20,20))

for i in range(len(images)):
  img = cv2.imread(images[i] +".png")
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  plt.subplot(1, 5, i+1)
  plt.imshow(img)
  plt.title(images[i][-6:])
  plt.axis('off')

As with the first DCGAN, the generated emojis after 200 iterations are barely a step above random noise. We begin to see some variation in emoji shapes and details by 1000 iterations. Some facial features such as eyes and mouths can even be observed at this point, but there are still quite a few artifacts. By 5000 iterations, we begin to observe some highly distinguished emojis. Like the DCGAN without a gradient penalty term, we also observe the recurrence of some emojis in the 15000 and 20000 iteration samples, although with slightly higher quality. In all cases, while our models have clearly "learned" to create emojis in terms of colour and style, the results bear little resemblance to meaningful symbols. Perhaps if we trained our models for a few thousand more iterations, we could begin to observe some emojis of symbolic value.

We analyze one more thing before moving on to CycleGAN. The motivation for adding a gradient penalty term is to stabilize training (i.e. reduce large oscillations in the training loss over training iterations). We now observe if our gradient penalty implementation succeeded in this.

In [None]:
loss_gan = 'results/samples_gan/losses'
loss_gan_with_gradpenalty = 'results/samples_gan_with_gradpenalty/losses'

plt1, plt2 = cv2.imread(loss_gan +".png"), cv2.imread(loss_gan_with_gradpenalty+".png")
plt1, plt2 = cv2.cvtColor(plt1, cv2.COLOR_BGR2RGB), cv2.cvtColor(plt2, cv2.COLOR_BGR2RGB)

f, (ax1, ax2) = plt.subplots(1,2,figsize=(20,20))

plt.subplot(1, 2, 1)
plt.imshow(plt1)
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title('GAN without Grad Penalty')
plt.subplot(1, 2, 2)
plt.imshow(plt2)
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title('GAN with Grad Penalty')
plt.show()


We focus on the training loss of the generator network (green line). While the loss for the DCGAN model with a gradient penalty had much larger oscillations early in training, its oscillations were clearly lower in amplitude than those of the model without a gradient penalty as training continued. This suggests that adding a gradient penalty did help stabilize training in this instance.
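
The comparison above is visual. One possible way to put a number on "oscillation amplitude" is to compute the standard deviation of the loss inside consecutive windows of iterations. The sketch below assumes the per-iteration generator losses are available as Python lists, which this notebook does not show (only the saved plots are loaded). The two loss curves are made-up numbers that mimic the qualitative pattern described above; they are not results from this project.

```python
from statistics import pstdev


def windowed_oscillation(losses, window):
    """Standard deviation of the loss in each consecutive, non-overlapping window."""
    return [
        pstdev(losses[start:start + window])
        for start in range(0, len(losses) - window + 1, window)
    ]


# Hypothetical generator-loss curves (illustrative values only).
loss_without_gp = [1.0, 3.0, 1.0, 3.0, 0.5, 3.5, 0.5, 3.5]
loss_with_gp = [0.0, 6.0, 0.0, 6.0, 1.9, 2.1, 1.9, 2.1]

early_without, late_without = windowed_oscillation(loss_without_gp, window=4)
early_with, late_with = windowed_oscillation(loss_with_gp, window=4)

print("without penalty:", early_without, late_without)
print("with penalty:   ", early_with, late_with)
```

In this toy data the curve with the penalty oscillates more in the first window and less in the second, which is the pattern read off the plots above.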

## 3.2. Training CycleGAN

We now move on to training our CycleGAN model:

In [None]:
SEED = 4

np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)


args = AttrDict()
args_dict = {
              'image_size':32, 
              'g_conv_dim':32, 
              'd_conv_dim':32,
              'init_zero_weights': False,
              'num_workers': 0,
              'train_iters':5000,
              'X':'Apple',
              'Y':'Windows',
              'lambda_cycle': 0.75,
              'lr':0.0003,
              'beta1':0.3,
              'beta2':0.999,
              'batch_size':32, 
              'checkpoint_dir': 'results/checkpoints_cyclegan',
              'sample_dir': 'results/samples_cyclegan',
              'load': None,
              'log_step':200,
              'sample_every':200,
              'checkpoint_every':1000
}
args.update(args_dict)


print_opts(args)
G_XtoY, G_YtoX, D_X, D_Y = train(args)


Now that we've finished training, let's take a look at the results!

In [None]:
generate_gif("results/samples_cyclegan", keyword='X-Y')
generate_gif("results/samples_cyclegan", keyword='Y-X')

First, we look at our CycleGAN's generated Windows-style emoji given an Apple equivalent:

In [None]:
Image(open('results/samples_cyclegan/anim_X-Y.gif','rb').read())

Second, we look at our CycleGAN's generated Apple-style emoji given a Windows equivalent:

In [None]:
Image(open('results/samples_cyclegan/anim_Y-X.gif','rb').read())

We can clearly see from the two gifs above that our CycleGAN model learned quite effectively how to generate decent-quality emojis (although we do still note some artifacts). While there are some minor discernible style differences between any given pair of emojis, they are largely very similar in style, so this model primarily learned to reproduce rather than translate emojis. This model was trained with a `lambda_cycle` of 0.75. We note that when we decreased `lambda_cycle` to about 0.015, the generated emojis were closer to the target style, although lower in quality (less distinguished features and more artifacts). One possible interpretation is that increasing `lambda_cycle` also increases the penalty for an incorrect translation of an emoji back to the original. So with a `lambda_cycle` of 0.75, we are able to generate higher-fidelity emojis because of the higher penalty for a mismatch with the original. However, this came at the expense of a diminished distinction between the two styles of emojis.
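
A rough back-of-the-envelope calculation supports this interpretation. In the training loop, the cycle consistency loss is summed over all $3 \times 32 \times 32$ pixel values of each image before being averaged over the batch, whereas the adversarial term is a mean of squared differences. The sketch below shows how large the weighted cycle term becomes for the two `lambda_cycle` values discussed above. The average per-pixel reconstruction error of 0.1 is an assumed, illustrative number, not a measurement from this project.

```python
CHANNELS, HEIGHT, WIDTH = 3, 32, 32
PIXELS_PER_IMAGE = CHANNELS * HEIGHT * WIDTH  # values summed per image in the L1 term


def weighted_cycle_term(mean_abs_pixel_error, lambda_cycle):
    """lambda_cycle times the per-image L1 reconstruction error (summed over pixels)."""
    return lambda_cycle * mean_abs_pixel_error * PIXELS_PER_IMAGE


assumed_pixel_error = 0.1  # hypothetical average absolute error per pixel value

for lambda_cycle in (0.75, 0.015):
    term = weighted_cycle_term(assumed_pixel_error, lambda_cycle)
    print(f"lambda_cycle={lambda_cycle}: weighted cycle term = {term:.3f}")
```

Under this assumption, the weighted cycle term is roughly 230 for a `lambda_cycle` of 0.75 and roughly 4.6 for 0.015, so at the larger setting the reconstruction penalty would carry far more weight in the generator loss.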

# References

* Jimmy Ba. Attention-Based Neural Machine Translation. *University of Toronto, CSC413*, 2020.

* Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville.
Improved training of Wasserstein GANs. In *Advances in Neural Information Processing Systems*,
pages 5767–5777, 2017.

* Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. On convergence and stability of
GANs. *arXiv preprint arXiv:1705.07215*, 2017.

* Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. Which training methods for GANs do
actually converge? *arXiv preprint arXiv:1801.04406*, 2018.

* Hoang Thanh-Tung, Truyen Tran, and Svetha Venkatesh. Improving generalization and stability
of generative adversarial networks. *arXiv preprint arXiv:1902.03984*, 2019.
