
# Introduction

In this lab we will explore neural style transfer in artwork!

This lab is adapted from Section 8.3 of Chollet.  The code is also available at https://keras.io/examples/neural_style_transfer/.

# Set Runtime Type

This lab will run extremely slowly unless you set your runtime type to GPU or TPU, so go ahead and do that now :)

# Imports

In [1]:
%tensorflow_version 1.x

import os
from google.colab import drive

from __future__ import print_function
from keras.preprocessing.image import load_img, save_img, img_to_array
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse

from keras.applications import vgg19
from keras import backend as K

import matplotlib.pyplot as plt

TensorFlow 1.x selected.


Using TensorFlow backend.


# Set up google drive

To apply neural style transfer, we need a content image and a style image.  Of course our content image will be a picture of Benedict.  Our style image will be a painting by Georgia O'Keeffe.  (You can also experiment with other content and style images if you prefer!)

As usual, I have shared the necessary files with you in a google drive folder.  To get the data into colab, do these steps:

1. Sign into drive.google.com
2. Click on "Shared with me" on the left side of the screen
3. Right click on the stat344ne_style_transfer folder and select "Add Shortcut to Drive"
4. Run the code cell below and click on the link that is displayed.  It will pop up a new browser tab where you have to authorize Colab to access your google drive.  Then, copy the sequence of numbers and letters that is displayed and paste it in the space that shows up in the code cell below.


In [2]:
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


# Utility functions to deal with images:

You don't need to modify these functions.

In [0]:
# util function to open, resize and format pictures into appropriate tensors
# Note: relies on the globals n_h and n_w (image height and width), which are
# set in the "Read in Images" section before this function is first called.
def preprocess_image(image_path):
    img = load_img(image_path, target_size=(n_h, n_w))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

# util function to convert a tensor into a valid image
def deprocess_image(x):
    x = x.reshape((n_h, n_w, 3))
    # Remove zero-center by mean pixel.  This is a transformation that was done
    # by VGG19, so we need to replicate it here.
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'.  Again, dealing with encoding used by VGG19
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x
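
To see what these two functions do to the pixel values, here is a minimal NumPy sketch of the round trip.  It assumes, based on the comments in `deprocess_image`, that the VGG19 preprocessing reorders channels from RGB to BGR and subtracts a fixed mean from each channel.  The names `toy_preprocess` and `toy_deprocess` are made up for this illustration and are not part of Keras.

```python
import numpy as np

# Per-channel means in BGR order, copied from deprocess_image above
BGR_MEANS = np.array([103.939, 116.779, 123.68])

def toy_preprocess(rgb):
    """Assumed VGG19-style preprocessing for an (h, w, 3) RGB array."""
    bgr = rgb[:, :, ::-1].astype('float64')  # RGB -> BGR
    return bgr - BGR_MEANS                   # zero-center each channel

def toy_deprocess(x):
    """Undo toy_preprocess, following the same steps as deprocess_image."""
    x = x + BGR_MEANS                        # add the channel means back
    x = x[:, :, ::-1]                        # BGR -> RGB
    return np.clip(x, 0, 255).astype('uint8')

# A small random "image" standing in for a real photo
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3)).astype('uint8')
round_trip = toy_deprocess(toy_preprocess(rgb))

# Largest difference between the original and recovered pixel values
max_diff = np.abs(round_trip.astype(int) - rgb.astype(int)).max()
```

The recovered image should match the original up to a possible off-by-one, because the cast to `uint8` truncates rather than rounds.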

# Read in Images

The code below reads in the content and style images using the utility functions defined above.  The results are numpy arrays.

**Please set your name in the first line below.**

In [0]:
my_name = 'evan'

content_image_path = '/content/drive/My Drive/stat344ne_style_transfer/benedict.jpg'
style_image_path = '/content/drive/My Drive/stat344ne_style_transfer/okeeffe.jpg'
#style_image_path = '/content/drive/My Drive/stat344ne_style_transfer/scream.jpg'
result_prefix = '/content/drive/My Drive/stat344ne_style_transfer/result_' + my_name + '_'

# dimensions of the generated picture.
width, height = load_img(content_image_path).size
n_h = 400
n_w = int(width * n_h / height)

# get tensor representations of our images
content_image_np = preprocess_image(content_image_path)
style_image_np = preprocess_image(style_image_path)

# Individual Loss Functions

We'll define functions to calculate all the contributions to the loss for neural style transfer.

#### 1. Gram matrix

Write a function below to calculate the Gram matrix.  Recall that the Gram matrix is obtained in two stages:

 * Find an "unfurled" features matrix `F`.  In this matrix, each row should correspond to a channel, and each column to one spatial position (the rows and columns of the image flattened into a single axis).  There are many ways to achieve this.  Here is one:
    * Reshape `x` to keep its current number of channels as the last dimension and collapse all remaining dimensions into one.  You will want to use `K.reshape`.  One trick is that if you use a shape like (-1, 5), the dimension given as -1 is inferred as the product of all remaining dimensions.  For example, if your input has shape (2, 3, 5), then after reshaping to (-1, 5) the result has shape (2*3, 5).
    * Transpose the result of the previous step.  You will want to use `K.transpose`.
 * Compute the Gram matrix as the dot product of `F` and its transpose.  You will want to use `K.dot` and `K.transpose`.
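
The shapes involved in the steps above can be checked with a small NumPy sketch.  This is only an analogue of the Keras backend version you are asked to write; `gram_matrix_np` is a made-up name, and the input is a toy array of shape (2, 3, 5).

```python
import numpy as np

def gram_matrix_np(x):
    """NumPy analogue of the Gram matrix for activations of shape (n_h, n_w, n_c)."""
    n_c = x.shape[-1]
    F = x.reshape(-1, n_c).T   # shape (n_c, n_h * n_w): one row per channel
    return F @ F.T             # shape (n_c, n_c)

x = np.arange(2 * 3 * 5, dtype='float64').reshape(2, 3, 5)
F = x.reshape(-1, 5).T
G = gram_matrix_np(x)

print(F.shape)  # (5, 6)
print(G.shape)  # (5, 5)
```

Entry `G[i, j]` is the dot product of channel `i` with channel `j` across all spatial positions, so `G` is symmetric and its size depends only on the number of channels.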

In [0]:
def gram_matrix(x):
    '''
    Calculate the Gram matrix summary of style

    Arguments:
     - x: a Keras backend tensor
    
    Returns:
     - Gram matrix
    '''
    # K.int_shape gives the static shape of a backend tensor; [-1] is n_c
    F = K.transpose(K.reshape(x, (-1, K.int_shape(x)[-1])))
    gram = K.dot(F, K.transpose(F))
    return gram

#### 2. Style Loss

If $G^S$ and $G^X$ are the Gram matrices for the style and combination (X) images respectively, then the style loss is

$$\frac{1}{4 \, n_h^2 \, n_w^2 \, n_c^2} \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} (G^S_{i,j} - G^X_{i,j})^2$$

You will want to use `K.square` and `K.sum`.  The division by the leading constants can be done with the usual python arithmetic operators.  Remember that in python, $4^2$ is calculated as `4**2`.
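
As a sanity check on the formula, here is a NumPy sketch of the same calculation on small random activations.  The helper names `gram_np` and `style_loss_np` are made up for this illustration.  It also shows one property worth knowing: the Gram matrix sums over spatial positions, so shuffling the positions of an activation array leaves its Gram matrix unchanged.

```python
import numpy as np

def gram_np(a):
    """Gram matrix of activations with shape (n_h, n_w, n_c)."""
    F = a.reshape(-1, a.shape[-1]).T
    return F @ F.T

def style_loss_np(a_style, a_combination):
    """Style loss from the formula above, using NumPy arrays."""
    n_h, n_w, n_c = a_style.shape
    diff = gram_np(a_style) - gram_np(a_combination)
    return np.sum(diff ** 2) / (4.0 * n_h**2 * n_w**2 * n_c**2)

rng = np.random.default_rng(0)
a_s = rng.normal(size=(4, 3, 2))
a_x = rng.normal(size=(4, 3, 2))

loss_same = style_loss_np(a_s, a_s)   # identical activations
loss_diff = style_loss_np(a_s, a_x)   # different activations

# Shuffle the 12 spatial positions of a_x, keeping each position's channels together
perm = rng.permutation(4 * 3)
a_x_shuffled = a_x.reshape(-1, 2)[perm].reshape(4, 3, 2)
loss_shuffled = style_loss_np(a_x, a_x_shuffled)
```

Here `loss_same` is exactly zero, `loss_diff` is positive, and `loss_shuffled` is zero up to floating-point error, which is one way to see that the style loss ignores where in the image a feature occurs.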

In [0]:
def style_loss(a_style, a_combination):
    '''
    Calculate style loss

    Arguments:
     - a_style: activation output from a given layer for style image (S)
     - a_combination: activation output from the same layer for combination image (X)
    
    Return:
     - style loss
    '''
    # extract height, width, and number of channels in this layer's activations
    n_h, n_w, n_c = K.int_shape(a_style)

    # create Gram matrices for style and combination images using gram_matrix
    style_gram = gram_matrix(a_style)
    combination_gram = gram_matrix(a_combination)

    # calculate loss
    loss = K.sum(K.square(style_gram - combination_gram)) / (4.0 * (n_c**2) * (n_w**2) * (n_h**2))

    return loss

#### 3. Content Loss

If $a^C$ and $a^X$ are the activation outputs for the content image and combination image respectively, then the content loss is

$$0.5\sum_{i = 1}^{n_h} \sum_{j = 1}^{n_w} \sum_{c=1}^{n_c} (a^C_{i,j,c} - a^X_{i,j,c})^2$$

You will want to use the functions `K.square` and `K.sum`.
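
A tiny worked example may help confirm the arithmetic.  This NumPy sketch (with the made-up name `content_loss_np`) uses activations of shape (2, 2, 1) that differ by exactly 1 in each of their 4 entries, so the loss is 0.5 * 4 = 2.

```python
import numpy as np

def content_loss_np(a_content, a_combination):
    """Content loss from the formula above, using NumPy arrays."""
    return 0.5 * np.sum((a_combination - a_content) ** 2)

a_c = np.zeros((2, 2, 1))
a_x = np.ones((2, 2, 1))

loss = content_loss_np(a_c, a_x)
print(loss)  # 2.0
```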

In [0]:
def content_loss(a_content, a_combination):
    '''
    Calculate content loss

    Arguments:
     - a_content: activation output from given layer for content image
     - a_combination: activation output from given layer for combination image

    Returns:
     - content loss
    '''
    return 0.5 * K.sum(K.square(a_combination - a_content))

#### 4. Total Variation Loss

We will use the version of total variation loss in our book.  If `x` is the combination image, the total variation loss is

$$\sum_{i=1}^{n_h-1} \sum_{j=1}^{n_w-1} \sum_{c = 1}^{n_c}\left\{ (x_{i,j,c} - x_{i+1,j,c})^2  + (x_{i,j,c} - x_{i,j+1,c})^2\right\}^{1.25}$$

We will organize this calculation into five steps:

1. Create an $(n_h - 1) \times (n_w - 1)$ array `a` with the differences between elements in adjacent rows and the same column.
2. Create an $(n_h - 1) \times (n_w - 1)$ array `b` with the differences between elements in adjacent columns and the same row.
3. Calculate $a^2 + b^2$, where the squares are elementwise
4. Raise each element of the array created in step 3 to the power of 1.25
5. Add all the numbers in the array from step 4

The indexing for steps 1 and 2 is a little tricky so I'll give you that.  It's worth thinking through what's happening with them though; this trick has come in useful for me many times.
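
To think through the slicing, it can help to try it on a tiny array where the answer is easy to work out by hand.  The NumPy sketch below (with the made-up name `total_variation_np`) uses a 3 by 3 single-channel "image" whose entries are 0 through 8, so every vertical neighbor difference is -3 and every horizontal neighbor difference is -1.

```python
import numpy as np

def total_variation_np(x, n_h, n_w):
    """NumPy analogue of the five steps above for x of shape (1, n_h, n_w, n_c)."""
    # Step 1: pixel (i, j) minus the pixel one row below it, (i+1, j)
    a = x[:, :n_h-1, :n_w-1, :] - x[:, 1:, :n_w-1, :]
    # Step 2: pixel (i, j) minus the pixel one column to the right, (i, j+1)
    b = x[:, :n_h-1, :n_w-1, :] - x[:, :n_h-1, 1:, :]
    # Steps 3 to 5: elementwise squares, power of 1.25, then sum
    return np.sum((a ** 2 + b ** 2) ** 1.25)

# Entry (i, j) of this image equals 3*i + j
x = np.arange(9, dtype='float64').reshape(1, 3, 3, 1)

a = x[:, :2, :2, :] - x[:, 1:, :2, :]
b = x[:, :2, :2, :] - x[:, :2, 1:, :]
tv = total_variation_np(x, 3, 3)
```

Both `a` and `b` have shape (1, 2, 2, 1).  Each of the 4 positions contributes ((-3)^2 + (-1)^2)^1.25 = 10^1.25, so `tv` equals 4 * 10^1.25.  A constant image would give a total variation loss of 0.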

In [0]:
def total_variation_loss(x, n_h, n_w):
    '''
    Calculate total variation loss

    Arguments:
     - x: a Keras backend tensor representing the combination image.
       It has shape (1, n_h, n_w, 3)
     - n_h, n_w: height and width of x
    
    Returns:
     - total variation loss
    '''
    a = x[:,:n_h-1,:n_w-1,:] - x[:,1:,:n_w-1,:]
    b = x[:,:n_h-1,:n_w-1,:] - x[:,:n_h-1,1:,:]
    step_3 = K.square(a) + K.square(b)
    step_4 = K.pow(step_3, 1.25)
    step_5 = K.sum(step_4)

    return step_5

#### 5. Assemble network inputs

We will need to obtain activation outputs from the network for the content, style, and combination images.  The easiest way to do this is to assemble them into a single tensor of shape $(3, n_h, n_w, 3)$, where the first 3 is because we have 3 images and the last 3 is for the three channels of the RGB representation of images.  To do this, we'll:

1. Create tensor `variable`s with the content and style images.  These images are currently in the numpy arrays `content_image_np` and `style_image_np`.
2. Create a tensor `placeholder` for the combination image.
3. Concatenate the three items above into a single array.
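
The effect of the concatenation on shapes can be previewed with plain NumPy arrays.  This is only an illustration: the small sizes `n_h = 4` and `n_w = 3` are made up, and the real code uses Keras backend tensors rather than NumPy arrays.

```python
import numpy as np

n_h, n_w = 4, 3  # made-up sizes, much smaller than the real images

# Stand-ins for the three images, each with a leading batch dimension of 1
content = np.zeros((1, n_h, n_w, 3))
style = np.ones((1, n_h, n_w, 3))
combination = np.full((1, n_h, n_w, 3), 2.0)

# Stack along axis 0 so that each image becomes one element of the batch
batch = np.concatenate([content, style, combination], axis=0)
print(batch.shape)  # (3, 4, 3, 3)
```

With this ordering, index 0 along the first axis is the content image, index 1 is the style image, and index 2 is the combination image.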

In [0]:
# Create content and style images using K.variable
content_image = K.variable(content_image_np)
style_image = K.variable(style_image_np)

# Create combination image using K.placeholder with shape (1, n_h, n_w, 3)
combination_image = K.placeholder((1, n_h, n_w, 3))

# Concatenate the tensors above using K.concatenate
# For the code below to work, please use the order
# [content_image, style_image, combination_image]
input_tensor = K.concatenate([content_image,
                              style_image,
                              combination_image], axis=0)

#### Model Set Up (nothing for you to do)

The code below loads the VGG19 model with its pre-trained ImageNet weights and creates a dictionary mapping each layer's name to that layer's activation output.

In [0]:
# build the VGG19 network with our 3 images as input
# the model will be loaded with pre-trained ImageNet weights
model = vgg19.VGG19(input_tensor=input_tensor,
                    weights='imagenet', include_top=False)

# get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

Here's a look at the dictionary of layer output activations:

In [77]:
outputs_dict

{'block1_conv1': <tf.Tensor 'block1_conv1_21/Relu:0' shape=(3, 400, 299, 64) dtype=float32>,
 'block1_conv2': <tf.Tensor 'block1_conv2_21/Relu:0' shape=(3, 400, 299, 64) dtype=float32>,
 'block1_pool': <tf.Tensor 'block1_pool_21/MaxPool:0' shape=(3, 200, 149, 64) dtype=float32>,
 'block2_conv1': <tf.Tensor 'block2_conv1_21/Relu:0' shape=(3, 200, 149, 128) dtype=float32>,
 'block2_conv2': <tf.Tensor 'block2_conv2_21/Relu:0' shape=(3, 200, 149, 128) dtype=float32>,
 'block2_pool': <tf.Tensor 'block2_pool_21/MaxPool:0' shape=(3, 100, 74, 128) dtype=float32>,
 'block3_conv1': <tf.Tensor 'block3_conv1_21/Relu:0' shape=(3, 100, 74, 256) dtype=float32>,
 'block3_conv2': <tf.Tensor 'block3_conv2_21/Relu:0' shape=(3, 100, 74, 256) dtype=float32>,
 'block3_conv3': <tf.Tensor 'block3_conv3_21/Relu:0' shape=(3, 100, 74, 256) dtype=float32>,
 'block3_conv4': <tf.Tensor 'block3_conv4_21/Relu:0' shape=(3, 100, 74, 256) dtype=float32>,
 'block3_pool': <tf.Tensor 'block3_pool_21/MaxPool:0' shape=(3, 50

#### 6. Assemble the final combined loss (a scalar tensor)

In [0]:
# These are the weights of the different loss components
total_variation_weight = 1.0
style_weight = 0.2
content_weight = 0.025

# Define a Keras tensor variable initialized with the value 0.0 using K.variable
loss = K.variable(0.0)

# Extract activation features for the layer used to define content loss
# You don't need to edit these 3 lines
layer_features = outputs_dict['block5_conv2']
content_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]

# calculate the content loss and store in the variable cl
# You will need to call the content_loss function you defined in part 3 above,
# providing content_image_features and combination_features as arguments
cl = content_loss(content_image_features, combination_features) # call content_loss

# Add the content_weight * cl to the loss.
loss = loss + content_weight * cl

# Add the style_weight * style_loss to the loss for each layer used for style
# feature_layers is a list of layers that are used for measuring style
feature_layers = ['block1_conv1', 'block2_conv1',
                  'block3_conv1', 'block4_conv1',
                  'block5_conv1']
for layer_name in feature_layers:
    # Extract activation features for the layer used to define style loss
    # You don't need to edit these 3 lines
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]

    # calculate the style loss and store in the variable sl
    # you will need to call the style_loss function you defined in part 2 above
    sl = style_loss(style_reference_features, combination_features)

    # Add the style_weight * sl to the loss
    loss = loss + style_weight * sl

# calculate the total variation loss based on the combination image using the
# total_variation_loss function you defined in part 4 above
tvl = total_variation_loss(combination_image, n_h, n_w)

# add total_variation_weight * tvl to the loss
loss = loss + total_variation_weight * tvl

#### 7. Create a function of the combination_image that returns the gradient of the loss with respect to the pixel values in the combination image.

In [0]:
# get the gradients of the loss with respect to the combination image
# You will need to use K.gradients
grads = K.gradients(loss, [combination_image])

# define a function that takes the combination image as an input and returns the
# grads.  You will need to use K.function.  Note that grads is already a list so
# you don't need to put it in another list.
f_grad = K.function([combination_image], grads)

#### 8. Perform estimation by gradient descent

This will take a while to run, so go get a cup of coffee :)
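
The update rule used in the loop below can be tried out first on a problem small enough to check by hand.  This NumPy sketch minimizes a toy loss, 0.5 times the sum of squared differences from a fixed `target`, whose gradient is simply `x - target`.  The `target` values, the step size of 0.1, and the 200 iterations are all made up for the illustration and are unrelated to the settings used for the images.

```python
import numpy as np

target = np.array([1.0, -2.0, 3.0])

def loss_fn(x):
    """Toy loss: half the sum of squared differences from target."""
    return 0.5 * np.sum((x - target) ** 2)

def grad_fn(x):
    """Gradient of loss_fn with respect to x."""
    return x - target

x = np.zeros(3)   # starting point
alpha = 0.1       # step size (made up for this toy problem)

losses = []
for i in range(200):
    # Gradient descent update: step against the gradient
    x = x - alpha * grad_fn(x)
    losses.append(loss_fn(x))
```

Each step shrinks the distance to `target` by a factor of 0.9, so `x` ends up very close to `target` and the recorded losses decrease toward 0.  The style transfer loop follows the same pattern, with the gradient supplied by `f_grad` instead of a hand-written formula.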

In [83]:
# initialize optimization at the original content image
x = preprocess_image(content_image_path)

# We'll use a learning rate of 0.00005
alpha = 0.00005

start_time = time.time()
for i in range(6001):
    # Calculate the gradient vector by calling f_grad
    # Remember that you'll need to extract component 0 of the list it returns
    grad_value = f_grad([x])[0]

    # Perform a gradient descent update step on x with step size alpha
    x = x - alpha * grad_value

    # every 500 iterations, save current generated image
    # you don't need to change this code
    if i % 500 == 0:
        img = deprocess_image(x.copy())
        fname = result_prefix + 'at_iteration_%d.png' % i
        save_img(fname, img)
        end_time = time.time()
        print('Iteration', i, 'image saved as', fname)
        print('Iterations completed in %ds' % (end_time - start_time))
        start_time = time.time()


Iteration 0 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_0.png
Iterations completed in 1s
Iteration 500 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_500.png
Iterations completed in 37s
Iteration 1000 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_1000.png
Iterations completed in 36s
Iteration 1500 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_1500.png
Iterations completed in 36s
Iteration 2000 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_2000.png
Iterations completed in 37s
Iteration 2500 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_2500.png
Iterations completed in 36s
Iteration 3000 image saved as /content/drive/My Drive/stat344ne_style_transfer/result_evan_at_iteration_3000.png
Iterations completed in 36s
Iteration 3500 image s

Congratulations on finishing a long lab!  Check out your results in the google drive folder!

If you want, you can go back to the "Read in Images" section and uncomment the line that changes the style image from the O'Keeffe painting to The Scream by Edvard Munch.