# TOC

__Chapter 7 - Generative Networks__

1. [Import](#Import)
1. [Neural style transfer](#Neural-style-transfer)
    1. [Loading the data](#Loading-the-data)
    1. [Creating the VGG model](#Creating-the-VGG-model)
    1. [Content loss](#Content-loss)
    1. [Style loss](#Style-loss)
    1. [Extracting the losses](#Extracting-the-losses)
    1. [Creating loss function for each layers](#Creating-loss-function-for-each-layers)
    1. [Creating the optimizer](#Creating-the-optimizer)
    1. [Training](#Training)
1. [Generative adversarial networks](#Generative-adversarial-networks)
    1. [Deep convolutional GAN](#Deep-convolutional-GAN)
        1. [Defining the generator network](#Defining-the-generator-network)
        1. [Defining the discriminator network](#Defining-the-discriminator-network)


# Import

<a id = 'Import'></a>

In [None]:
# standard library and settings
import os
import sys
import importlib
import itertools
from PIL import Image
from glob import glob
import warnings

warnings.simplefilter("ignore")
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.options.display.float_format = "{:,.6f}".format

# pytorch tools
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.autograd import Variable
from torchvision import datasets, models, transforms

# visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
sns.set_style("whitegrid")

# Neural style transfer

Given a content image and a style image, generate a new image that combines the content of the content image and the style of the style image.

The style of an image is captured across multiple layers of a CNN using a gram matrix, which measures the correlation between the feature maps produced at each layer. Similarly styled images have similar gram matrix values.



<a id = 'Neural-style-transfer'></a>

## Loading the data



<a id = 'Loading-the-data'></a>

In [None]:
# fix the image size
imsize = 512
is_cuda = torch.cuda.is_available()

# convert image for training with VGG model
prep = transforms.Compose(
    [
        transforms.Resize(imsize),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x[torch.LongTensor([2, 1, 0])]),  # turn to BGR
        transforms.Normalize(
            mean=[0.40760392, 0.45795686, 0.48501961], std=[1, 1, 1]
        ),  # subtract imagenet mean
        transforms.Lambda(lambda x: x.mul_(255)),
    ]
)

# convert the generated image back to a format that can be visualized
postpa = transforms.Compose(
    [
        transforms.Lambda(lambda x: x.mul_(1.0 / 255)),
        transforms.Normalize(
            mean=[-0.40760392, -0.45795686, -0.48501961], std=[1, 1, 1]
        ),  # add the imagenet mean back
        transforms.Lambda(lambda x: x[torch.LongTensor([2, 1, 0])]),  # turn to RGB
    ]
)
postpb = transforms.Compose([transforms.ToPILImage()])

# ensure data in the image does not cross the permissible range of values
def postp(tensor):
    t = postpa(tensor)
    t[t > 1] = 1
    t[t < 0] = 0
    img = postpb(t)
    return img


# ease data loading
def image_loader(image_name):
    image = Image.open(image_name)
    image = Variable(prep(image))
    image = image.unsqueeze(0)
    return image
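
The forward and inverse transforms are meant to cancel each other out. As a rough check of that arithmetic, the sketch below applies both directions to a single pixel in plain Python, without torchvision. The helper names `prep_pixel` and `postp_pixel` and the pixel value are made up for this illustration; only the channel means come from the cell above.

```python
# imagenet channel means in BGR order, copied from the transforms above
MEAN_BGR = [0.40760392, 0.45795686, 0.48501961]


def prep_pixel(rgb):
    """Forward direction for one pixel: RGB -> BGR, subtract mean, scale by 255."""
    bgr = [rgb[2], rgb[1], rgb[0]]
    return [(v - m) * 255 for v, m in zip(bgr, MEAN_BGR)]


def postp_pixel(bgr_scaled):
    """Inverse direction: scale by 1/255, add the mean, BGR -> RGB, clamp to [0, 1]."""
    bgr = [v / 255 + m for v, m in zip(bgr_scaled, MEAN_BGR)]
    rgb = [bgr[2], bgr[1], bgr[0]]
    return [min(1.0, max(0.0, v)) for v in rgb]


pixel = [0.2, 0.5, 0.9]  # hypothetical RGB pixel with values in [0, 1]
restored = postp_pixel(prep_pixel(pixel))
print(restored)
```

The restored values match the original pixel up to floating point error, which only holds when the inverse step adds the mean back rather than subtracting it a second time.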

In [None]:
# load style and content image
style_img = image_loader("images/vangogh_starry_night.jpg")
content_img = image_loader("images/tuebinge_neckarfront.jpg")

# start the optimization from a copy of the content image
opt_img = Variable(content_img.data.clone(), requires_grad=True)

## Creating the VGG model



<a id = 'Creating-the-VGG-model'></a>

In [None]:
# create a VGG model, keeping only the convolutional block (features), and freeze its parameters
vgg = models.vgg19(pretrained=True).features

for param in vgg.parameters():
    param.requires_grad = False

## Content loss


<a id = 'Content-loss'></a>

In [None]:
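# content loss: MSE between the activations of the content image and of the generated image
# at a chosen layer; dummy_fn and noise_img are placeholders that are not defined in this notebook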
target_layer = dummy_fn(content_img)
noise_layer = dummy_fn(noise_img)
criterion = nn.MSELoss()
content_loss = criterion(target_layer, noise_layer)

## Style loss

Style loss is the MSE between the gram matrices of the generated image and the style image, computed for each selected feature map. Envision a feature map with dimensions batch_size by channels by height by width, where in this example each channel is a 3 by 3 window. To calculate the gram matrix, the 9 values in each channel are flattened into a 9-value vector, and the matrix of flattened vectors is multiplied by its transpose. Each entry is the dot product of two channels, which acts as an unnormalized measure of how strongly those channels are correlated.
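
The flatten-and-multiply step can be made concrete with plain Python lists. The sketch below is an illustration only, not the notebook's implementation: the helper name `gram_matrix` and the tiny 2-channel, 2 by 2 feature map are made up for the example, and it handles a single image rather than a batch.

```python
def gram_matrix(feature_map):
    """Gram matrix of one image's feature map, given as channels x height x width lists."""
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    # flatten each channel into a single vector of h * w values
    flat = [[v for row in channel for v in row] for channel in feature_map]
    # entry (i, j) is the dot product of channel i and channel j, divided by h * w
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (h * w) for fj in flat]
        for fi in flat
    ]


# hypothetical feature map: 2 channels, each a 2 by 2 window
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
gram = gram_matrix(feature_map)
print(gram)
```

The result is a channels by channels matrix that is symmetric, since the dot product of channel i with channel j equals that of channel j with channel i.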

The class below is written so that it can be used like any other PyTorch layer. First, the batch, channel, height, and width dimensions are read from the input, and then the features are reshaped so that the batch and channel dimensions remain intact while the values are flattened along the height and width dimensions. The gram matrix is calculated using the PyTorch batch matrix multiplication function torch.bmm(), which multiplies the flattened features by their transpose. The final step normalizes the gram matrix by dividing it by the number of values in each channel (height times width). Without this, a feature map with an especially high number of values would tend to dominate the score.

The second class below calculates the style loss, which is also implemented as a PyTorch layer. It calculates the MSE between the input image's gram matrix and the style image's gram matrix.

<a id = 'Style-loss'></a>

In [None]:
# create gram matrix class
class GramMatrix(nn.Module):
    def forward(self, input):
        b, c, h, w = input.size()
        # keep batch and channel dimensions, flatten height and width
        features = input.view(b, c, h * w)
        gram_matrix = torch.bmm(features, features.transpose(1, 2))
        gram_matrix.div_(h * w)
        return gram_matrix


# create style loss class
class StyleLoss(nn.Module):
    def forward(self, inputs, targets):
        out = nn.MSELoss()(GramMatrix()(inputs), targets)
        return out

## Extracting the losses

Just as activations can be extracted from convolution layers using register_forward_hook(), we can extract the outputs of the different convolutional layers required to calculate style loss and content loss. The key difference is that rather than extracting from a single layer, we need to extract outputs from several layers.

In the class below, the init method takes in the model on which we will call register_forward_hook() as well as the layer ID numbers of the layers from which we will extract the outputs. The init method's for loop iterates through the layer IDs and registers the forward hook required to pull the outputs.

hook_fn is passed to register_forward_hook() and is called by PyTorch each time a layer on which it is registered finishes its forward pass. Inside the function, the output is captured and stored in the features list.

Lastly, the remove function detaches the registered hooks. Otherwise the hooks would keep capturing outputs on every forward pass, which may result in memory issues.

The extract_layers function extracts the outputs for the style and content images. Inside this function, we create a LayerActivations object, passing in the model and the layer numbers, and then make sure the features list is empty. The image is then passed through the model, the hooks are removed, and the outputs collected in the features list are returned.
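
The hook mechanism can be tried in isolation before applying it to VGG. The sketch below is a minimal illustration under assumptions: the small four-layer `model`, the `captured` list, and the layer indices are stand-ins invented for the example, not part of the notebook.

```python
import torch
import torch.nn as nn

# small stand-in model (hypothetical); the notebook uses the VGG features block instead
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),  # index 0
    nn.ReLU(),  # index 1
    nn.Conv2d(4, 8, kernel_size=3, padding=1),  # index 2
    nn.ReLU(),  # index 3
)

captured = []


def hook_fn(module, inputs, output):
    # called by PyTorch right after `module` finishes its forward pass
    captured.append(output)


# register the hook on the layers of interest and keep the returned handles
handles = [model[i].register_forward_hook(hook_fn) for i in (1, 3)]

with torch.no_grad():
    model(torch.randn(1, 3, 8, 8))

# remove the hooks so later forward passes no longer append outputs
for handle in handles:
    handle.remove()

shapes = [tuple(t.shape) for t in captured]
print(shapes)
```

One forward pass yields one captured tensor per hooked layer, in the order the layers run. After the handles are removed, further forward passes leave `captured` unchanged.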


<a id = 'Extracting-the-losses'></a>

In [None]:
# create layer activations class for capturing outputs at various model layers
class LayerActivations:
    def __init__(self, model, layer_nums):
        # per-instance list, so captured outputs are not shared between instances
        self.features = []
        self.hooks = []
        for layer_num in layer_nums:
            self.hooks.append(model[layer_num].register_forward_hook(self.hook_fn))

    def hook_fn(self, module, input, output):
        self.features.append(output)

    def remove(self):
        for hook in self.hooks:
            hook.remove()


# function for extracting outputs from the images
def extract_layers(layers, img, model=None):
    la = LayerActivations(model, layers)
    la.features = []
    out = model(img)
    la.remove()
    return la.features

In [None]:
# specify layers to be extracted
style_layers = [1, 6, 11, 20, 25]
content_layers = [21]
loss_layers = style_layers + content_layers

# extract the target outputs for the content and style images
content_targets = extract_layers(content_layers, content_img, model=vgg)
style_targets = extract_layers(style_layers, style_img, model=vgg)

# the outputs need to be detached from the graphs that created them
content_targets = [t.detach() for t in content_targets]
style_targets = [GramMatrix()(t).detach() for t in style_targets]

# add all targets into one list, in the same order as loss_layers
targets = style_targets + content_targets


In [None]:
# the optimizer needs a single scalar to minimize, so the losses from each layer are weighted and summed
style_weights = [1e3 / n ** 2 for n in [64, 128, 256, 512, 512]]
content_weights = [1e0]
weights = style_weights + content_weights

In [None]:
# review layers selected
print(vgg)

## Creating loss function for each layers

We need to create a loss function for each style layer and each content layer. The variable loss_fns is a list containing one StyleLoss object per entry in style_layers, followed by one MSELoss object per entry in content_layers, so its order lines up with loss_layers.



<a id = 'Creating-loss-function-for-each-layers'></a>

In [None]:
# create style loss and content loss objects
loss_fns = [StyleLoss()] * len(style_layers) + [nn.MSELoss()] * len(content_layers)

## Creating the optimizer

An optimizer typically receives the parameters of the model, but in this case we are using the VGG model only as a feature extractor with frozen parameters, so we do not pass the VGG parameters. Instead, we provide the opt_img variable. Its pixel values are what will be optimized so that the image has the required content and style.
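
Optimizing an input tensor instead of model weights works the same way on a toy problem. The sketch below is an illustration under assumptions: the tensors `x` and `target` and the squared-error loss are invented for the example and stand in for the image and the style/content losses. It uses `optim.LBFGS`, which expects a closure that recomputes the loss.

```python
import torch
import torch.optim as optim

target = torch.tensor([1.0, 2.0, 3.0])  # hypothetical values to reach
x = torch.zeros(3, requires_grad=True)  # the tensor being optimized, like opt_img

# the optimizer receives the tensor itself rather than model parameters
optimizer = optim.LBFGS([x])


def closure():
    # LBFGS may call this several times per step to re-evaluate the loss
    optimizer.zero_grad()
    loss = ((x - target) ** 2).sum()
    loss.backward()
    return loss


for _ in range(5):
    optimizer.step(closure)

print(x.detach())
```

After a few steps, `x` ends up close to `target`. No network weights are involved, only the values of the tensor handed to the optimizer.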



<a id = 'Creating-the-optimizer'></a>

In [None]:
# create optimizer object
optimizer = optim.LBFGS([opt_img])

## Training

This training method calculates the loss for multiple layers. Each time the optimizer is called, it changes the input image so that its content and style move closer to the target content and style.

In the function below, each iteration calculates the outputs from different layers of the VGG model using extract_layers. The only values that change here are those of the generated image (opt_img). Once the outputs are calculated, we calculate the losses by iterating through the outputs and passing them to the associated loss functions along with the targets. The weighted losses are summed and the backward function is called.
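
The weighted sum formed in each iteration can be illustrated with plain numbers. In the sketch below the weights are the ones defined earlier in this notebook, while the per-layer loss values are made up purely for illustration.

```python
# weights as defined earlier: five style layers followed by one content layer
style_weights = [1e3 / n ** 2 for n in [64, 128, 256, 512, 512]]
content_weights = [1e0]
weights = style_weights + content_weights

# hypothetical per-layer loss values, in the same order as the weights
layer_losses = [4.0, 8.0, 16.0, 32.0, 32.0, 2.0]

# each loss is scaled by its weight, and the optimizer sees only the total
weighted = [w * l for w, l in zip(weights, layer_losses)]
total = sum(weighted)
print(weighted)
print(total)
```

Because the style weights shrink with the square of the channel count, losses from the deeper, wider layers contribute less per unit than those from the early layers.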


<a id = 'Training'></a>

In [None]:
# train process
max_iter = 500
show_iter = 50
n_iter = [0]

while n_iter[0] <= max_iter:

    def closure():
        optimizer.zero_grad()

        # forward pass of the generated image through the selected layers
        out = extract_layers(loss_layers, opt_img, model=vgg)
        layer_losses = [
            weights[a] * loss_fns[a](A, targets[a]) for a, A in enumerate(out)
        ]
        loss = sum(layer_losses)
        loss.backward()
        n_iter[0] += 1

        if n_iter[0] % show_iter == (show_iter - 1):
            print("Iteration: {}, loss: {}".format(n_iter[0] + 1, loss.item()))

        return loss

    # LBFGS re-evaluates the loss through the closure
    optimizer.step(closure)

# Generative adversarial networks

In a general sense, GANs address the problem of unsupervised learning by training two deep neural networks: one is called the generator and the other the discriminator. The networks compete with each other, and through that competition both become better at the tasks they perform.

The generator network can also be thought of as the counterfeiter, and the discriminator as the police. The counterfeiter shows the police fake money, and the police identifies it as fake and explains to the counterfeiter why it's fake. With that information in hand, the counterfeiter makes more fake money based on the police feedback. The police again finds it to be fake and explains why. This process repeats until the police is unable to recognize the money as fake. The end result is a generator that creates fake images which are quite similar to the real images, and a classifier that is good at distinguishing a fake image from a real image.



<a id = 'Generative-adversarial-networks'></a>

## Deep convolutional GAN

Some of the key components of a DCGAN include:

- A generator network, which maps a vector of some fixed dimension to images of a certain shape (3 by 64 by 64 in our case)

- A discriminator network, which takes an input image, either created by the generator or drawn from the actual dataset, and maps it to a score estimating whether the image is real or fake

- Loss functions for the generator and discriminator

- An optimizer

- A training pipeline
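
To make the loss component concrete, the sketch below computes one common choice, binary cross-entropy, on single scores in plain Python. This is an assumption for illustration: the notebook does not state which loss it uses, and the helper `bce` and the two scores are made up.

```python
import math


def bce(prediction, label):
    """Binary cross-entropy for one score in (0, 1) against a 0/1 label."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


d_real = 0.9  # hypothetical discriminator score for a real image
d_fake = 0.2  # hypothetical discriminator score for a generated image

# the discriminator wants real images scored as 1 and generated images as 0
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# the generator wants its generated image to be scored as real
g_loss = bce(d_fake, 1)

print(d_loss, g_loss)
```

With these scores the discriminator is doing well, so its loss is small while the generator's is large. As the generator improves, `d_fake` rises and the balance shifts.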


<a id = 'Deep-convolutional-GAN'></a>

### Defining the generator network

The generator receives a random vector of a fixed dimension as an input and applies a sequence of transposed convolutions, batch normalization and ReLU activations. This results in an image of the required size.

In the implementation below, the model takes an input of tensor size nz and then passes it on to a transposed convolution which maps the input to the image size that it needs to generate. The forward function moves the input through the sequential module and returns the output.
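
How the transposed convolutions grow the input to 64 by 64 can be checked with the standard output-size formula for a transposed convolution. The sketch below is plain arithmetic under assumptions: it takes the input to be a vector of shape nz by 1 by 1, uses dilation 1 and no output padding, and the helper name `conv_transpose_out` is made up for the example.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output height/width of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) of the five transposed convolutions in the generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # assumed spatial size of the input noise vector
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

The first layer expands the 1 by 1 input to 4 by 4, and each later layer doubles the spatial size, which matches the state size comments in the generator code.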

The last layer is a tanh layer, which limits the range of values that the network can generate to between -1 and 1.

The model is initialized with the weights defined in the paper from which this chapter draws its inspiration, but the weights can also be randomly initialized. In this example, the weight function is passed to the generator object's apply method. The weights are initialized differently in the convolution and BatchNorm layers.

<a id = 'Defining-the-generator-network'></a>

In [None]:
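# hyperparameters; nc = 3 follows the 3 by 64 by 64 image shape described above, while
# nz and ngf are assumed values that are not given in this notebook
nz = 100  # size of the input noise vector (assumed)
ngf = 64  # base number of generator feature maps (assumed)
nc = 3  # number of image channels

# function that sets weights in accordance with the reference paper
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)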


# class defining generator network
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        self.main = nn.Sequential(
            # input is Z, going into a convolution step
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True)
            # state size = (ngf * 8) by 4 by 4
            ,
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True)
            # state size = (ngf * 4) by 8 by 8
            ,
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True)
            # state size = (ngf * 2) by 16 by 16
            ,
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True)
            # state size = (ngf) by 32 by 32
            ,
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size = (nc) by 64 by 64
        )

    def forward(self, input):
        output = self.main(input)
        return output


netG = Generator()
netG.apply(weights_init)
print(netG)

### Defining the discriminator network



<a id = 'Defining-the-discriminator-network'></a>

In [None]:
# class defining discriminator network
# assumed layout: mirrors the generator with strided convolutions, as in the usual DCGAN design
ndf = 64  # base number of discriminator feature maps (assumed, not given in this notebook)


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.main = nn.Sequential(
            # input is an image of size (nc) by 64 by 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size = (ndf) by 32 by 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size = (ndf * 2) by 16 by 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size = (ndf * 4) by 8 by 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size = (ndf * 8) by 4 by 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
            # output is a single real/fake score per image
        )

    def forward(self, input):
        output = self.main(input)
        return output.view(-1, 1).squeeze(1)


netD = Discriminator()
netD.apply(weights_init)
print(netD)