Deep Learning & Art: Neural Style Transfer
- Implement the neural style transfer algorithm
- Generate novel artistic images using your algorithm
- Define the style cost function for Neural Style Transfer
- Define the content cost function for Neural Style Transfer

1 - Packages

In [None]:
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import numpy as np
import tensorflow as tf
import pprint
from public_tests import *   # assignment test helpers, e.g. train_step_test
%matplotlib inline

Neural Style Transfer (NST) is one of the most fun and interesting optimization techniques in deep learning. It merges two images, a "content" image (C) and a "style" image (S), to create a "generated" image (G) that combines the "content" of image C with the "style" of image S.

Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

Following the original NST paper, we use the VGG network, named after the Visual Geometry Group at the University of Oxford, which published it in 2014. Specifically, we use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and has learned to recognize a variety of low-level features (at the shallower layers) and high-level features (at the deeper layers).


Load parameters from the VGG model

In [None]:
tf.random.set_seed(272) # DO NOT CHANGE THIS VALUE
pp = pprint.PrettyPrinter(indent=4)
img_size = 400
vgg = tf.keras.applications.VGG19(include_top=False,
                                  input_shape=(img_size, img_size, 3),
                                  weights='pretrained-model/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')

vgg.trainable = False
pp.pprint(vgg)

Neural Style Transfer (NST)
Next, we build the NST algorithm in three steps:

First, we build the content cost function $J_{content}(C,G)$.

Second, we build the style cost function $J_{style}(S,G)$.

Finally, we put it all together to get $J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$.

***Computing the Content Cost***

One goal you should aim for when performing NST is for the content in the generated image G to match the content of image C. Which hidden layer you compare matters:
* The shallower layers of a ConvNet tend to detect lower-level features such as edges and simple textures.
* The deeper layers tend to detect higher-level features such as more complex textures and object classes.

A method to achieve this is to calculate the content cost function, defined as:

$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C}\sum_{\text{all entries}} \left(a^{(C)} - a^{(G)}\right)^2 \tag{1}$$

* Here, $n_H$, $n_W$ and $n_C$ are the height, width and number of channels of the hidden layer you have chosen, and appear in a normalization term in the cost.
* For clarity, note that $a^{(C)}$ and $a^{(G)}$ are the 3D volumes corresponding to a hidden layer's activations.
* In order to compute the cost $J_{content}(C,G)$, it might also be convenient to unroll these 3D volumes into a 2D matrix, as the code below does.
* Technically this unrolling step isn't needed to compute $J_{content}$, because it only rearranges the entries being summed, but it is good practice for when you do need to carry out a similar operation later for computing the style cost $J_{style}$.

In [None]:

def compute_content_cost(content_output, generated_output):
    """
    Computes the content cost

    Arguments:
    content_output -- list of VGG layer outputs for the content image C; the last element is
                      the content layer activation a_C, of dimension (1, n_H, n_W, n_C)
    generated_output -- list of VGG layer outputs for the generated image G; the last element is
                        the content layer activation a_G, of dimension (1, n_H, n_W, n_C)

    Returns:
    J_content -- scalar that you compute using equation 1 above.
    """
    # The content layer is the last output of the model
    a_C = content_output[-1]
    a_G = generated_output[-1]

    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Unroll 'a_C' and 'a_G' from (m, n_H, n_W, n_C) to (m, n_H * n_W, n_C)
    a_C_unrolled = tf.reshape(a_C, shape=[m, -1, n_C])
    a_G_unrolled = tf.reshape(a_G, shape=[m, -1, n_C])

    # Compute the cost with equation (1)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled))) / (4 * n_H * n_W * n_C)

    return J_content

* The content cost takes a hidden layer activation of the neural network, and measures how different $a^{(C)}$ and $a^{(G)}$ are.
* When you minimize the content cost later, this will help make sure $G$ has content similar to that of $C$.
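Equation (1) can be checked by hand on a tiny example. The sketch below is a NumPy stand-in for compute_content_cost; the helper name content_cost_np and the toy activations are illustrative assumptions, not part of the assignment. It also shows that unrolling the volume leaves the sum unchanged.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """Equation (1) for NumPy arrays of shape (1, n_H, n_W, n_C) (hypothetical helper)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

# Toy activations: a 2x2 spatial map with 3 channels
a_C = np.ones((1, 2, 2, 3))
a_G = np.zeros((1, 2, 2, 3))

# 12 entries differ by 1, so the sum is 12 and the normalizer is 4*2*2*3 = 48
print(content_cost_np(a_C, a_G))  # 0.25

# Unrolling to (1, n_H*n_W, n_C) only rearranges entries, so the sum is unchanged
unrolled_sum = np.sum((a_C.reshape(1, -1, 3) - a_G.reshape(1, -1, 3)) ** 2)
print(unrolled_sum)  # 12.0
```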

**Computing the Style Cost**

Style Matrix

Gram matrix

* The style matrix is also called a "Gram matrix."
* In linear algebra, the Gram matrix $G$ of a set of vectors $(v_{1},\dots ,v_{n})$ is the matrix of dot products, whose entries are $G_{ij} = v_{i}^T v_{j}$, computed in NumPy as np.dot(v_i, v_j).
* In other words, $G_{ij}$ compares how similar $v_i$ is to $v_j$: if they are highly similar, you would expect them to have a large dot product, and thus for $G_{ij}$ to be large.
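A small NumPy example makes this concrete: stacking the vectors as the rows of a matrix A, the whole Gram matrix is A @ A.T, and each entry equals the pairwise np.dot. The vectors are made up for illustration; v_1 and v_3 are identical, so their entry is large, while v_1 and v_2 point in more different directions.

```python
import numpy as np

# Three made-up vectors v_1, v_2, v_3, stacked as the rows of A
A = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])

# Gram matrix: all pairwise dot products in one matrix product
G_gram = A @ A.T

# The same entries one at a time: G_ij = np.dot(v_i, v_j)
n = A.shape[0]
G_loop = np.array([[np.dot(A[i], A[j]) for j in range(n)] for i in range(n)])

print(G_gram)
# [[5. 2. 5.]
#  [2. 5. 2.]
#  [5. 2. 5.]]
```

In NST the rows are the unrolled activations of each channel, so A has shape (n_C, n_H*n_W) and the Gram matrix is (n_C, n_C). A diagonal entry $G_{ii}$ is the sum of squared activations of channel i, i.e. how strongly that filter responds over the whole image.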

***Two meanings of the variable $G$***

* Note that there is an unfortunate collision in the variable names used here. Following the common terminology used in the literature:
   * $G$ is used to denote the Style matrix (or Gram matrix)
   * $G$ also denotes the generated image.
* For the sake of clarity, in this assignment $G_{gram}$ will be used to refer to the Gram matrix, and $G$ to denote the generated image.

In [None]:
# Gram matrix
def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    # G_gram = A A^T: entry (i, j) is the dot product of rows i and j of A
    GA = tf.matmul(A, A, transpose_b=True)
    return GA

***Layer style cost***
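For a single layer $l$, the style cost compares the Gram matrices of the style image $S$ and the generated image $G$. The docstrings below refer to it as equation (2); the form given here is reconstructed to match the normalization used in compute_layer_style_cost, since $(2 n_C n_H n_W)^2 = 4 n_C^2 (n_H n_W)^2$:

$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times n_C^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C}\sum_{j=1}^{n_C}\left(G^{(S)}_{gram,ij} - G^{(G)}_{gram,ij}\right)^2 \tag{2}$$

where $G^{(S)}_{gram}$ and $G^{(G)}_{gram}$ are the $n_C \times n_C$ Gram matrices of the unrolled activations of $S$ and $G$ at layer $l$.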

In [None]:
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost for a single layer defined by equation (2)
    """
    # Retrieve dimensions from a_G
    _, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape the tensors from (1, n_H, n_W, n_C) to (n_C, n_H * n_W)
    a_S = tf.transpose(tf.reshape(a_S, shape=[-1, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, shape=[-1, n_C]))

    # Compute the Gram matrices for both images S and G, each of shape (n_C, n_C)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Compute the loss; (2 * n_C * n_H * n_W)**2 == 4 * n_C**2 * (n_H * n_W)**2
    J_style_layer = tf.reduce_sum(tf.square(GS - GG)) / (2 * n_C * n_H * n_W) ** 2

    return J_style_layer

In [None]:
# Choose the layers used to represent the style of the image, and assign each one a weight in the overall style cost:
STYLE_LAYERS = [
    ('block1_conv1', 0.2),
    ('block2_conv1', 0.2),
    ('block3_conv1', 0.2),
    ('block4_conv1', 0.2),
    ('block5_conv1', 0.2)]

Description of compute_style_cost

For each layer:

* Select the activation (the output tensor) of the current layer.
* Get the style of the style image "S" from the current layer.
* Get the style of the generated image "G" from the current layer.
* Compute the "style cost" for the current layer.
* Add the weighted style cost to the overall style cost (J_style).

Once all layers are processed, return the overall style cost.
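Written as a formula, the loop computes $J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)$, where $\lambda^{[l]}$ are the weights in STYLE_LAYERS. The NumPy sketch below mirrors compute_layer_style_cost and compute_style_cost on toy arrays so the arithmetic can be followed by hand; the function names and the 1x1x1 activations are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def layer_style_cost_np(a_S, a_G):
    """Hypothetical NumPy mirror of compute_layer_style_cost, inputs of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H * n_W): one row per channel (filter)
    S = a_S.reshape(-1, n_C).T
    G = a_G.reshape(-1, n_C).T
    # Gram matrices of shape (n_C, n_C)
    GS = S @ S.T
    GG = G @ G.T
    return np.sum((GS - GG) ** 2) / (2 * n_C * n_H * n_W) ** 2

def style_cost_np(acts_S, acts_G, style_layers):
    """Weighted sum of per-layer style costs; style_layers holds (name, weight) pairs."""
    J_style = 0.0
    for a_S, a_G, (_, weight) in zip(acts_S, acts_G, style_layers):
        J_style += weight * layer_style_cost_np(a_S, a_G)
    return J_style

# Two toy "layers", each with a single 1x1x1 activation
acts_S = [np.ones((1, 1, 1, 1)), np.ones((1, 1, 1, 1))]
acts_G = [np.zeros((1, 1, 1, 1)), np.zeros((1, 1, 1, 1))]
layers = [("layer_a", 0.5), ("layer_b", 0.5)]

# Per layer: (1 - 0)^2 / (2*1*1*1)^2 = 0.25; weighted sum: 0.5*0.25 + 0.5*0.25
print(style_cost_np(acts_S, acts_G, layers))  # 0.25
```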

In [None]:

def compute_style_cost(style_image_output, generated_image_output, STYLE_LAYERS=STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    style_image_output -- list of VGG layer outputs for the style image S (style layers first, content layer last)
    generated_image_output -- list of VGG layer outputs for the generated image G, in the same order
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

    Returns:
    J_style -- tensor representing a scalar value, the weighted sum of the per-layer style costs
    """
    # Initialize the overall style cost
    J_style = 0

    # Set a_S to be the hidden layer activations from the layers we have selected.
    # The last element of the list contains the content layer activation, which must not be used.
    a_S = style_image_output[:-1]

    # Set a_G to be the output of the chosen hidden layers.
    # The last element of the list contains the content layer activation, which must not be used.
    a_G = generated_image_output[:-1]

    for a_S_layer, a_G_layer, (_, weight) in zip(a_S, a_G, STYLE_LAYERS):
        # Compute the style cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S_layer, a_G_layer)

        # Add weight * J_style_layer of this layer to the overall style cost
        J_style += weight * J_style_layer

    return J_style

* The style of an image can be represented using the Gram matrix of a hidden layer's activations.
* We get even better results by combining this representation from multiple different layers.
* This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
* Minimizing the style cost will cause the image $G$ to follow the style of the image $S$.

***Defining the Total Cost to Optimize***

Finally, we create a cost function that minimizes both the style and the content cost. The formula is:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

In [None]:
# Total cost
@tf.function()
def total_cost(J_content, J_style, alpha=10, beta=40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """
    J = alpha * J_content + beta * J_style

    return J

* The total cost is a linear combination of the content cost $J_{content}(C,G)$ and the style cost $J_{style}(S,G)$.
* $\alpha$ and $\beta$ are hyperparameters that control the relative weighting between content and style.
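The weighting is plain arithmetic, so a quick plain-Python check shows how beta shifts the balance, first with the default alpha = 10 and beta = 40 and then with beta raised to 400. The cost values are made up, and total_cost_py is a hypothetical stand-in for total_cost without tf.function.

```python
def total_cost_py(J_content, J_style, alpha=10, beta=40):
    """Hypothetical plain-Python version of total_cost: J = alpha * J_content + beta * J_style."""
    return alpha * J_content + beta * J_style

# Made-up cost values, chosen only to illustrate the weighting
J_content, J_style = 2.0, 0.5

print(total_cost_py(J_content, J_style))                      # 10*2.0 + 40*0.5 = 40.0
print(total_cost_py(J_content, J_style, alpha=10, beta=400))  # 20.0 + 200.0 = 220.0
```

Scaling J by a constant does not move its minimum, so what matters is the ratio beta / alpha: raising it makes the style term dominate and pushes the optimizer to trade content fidelity for style.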

Solving the Optimization Problem

1.   Load the content image
2.   Load the style image
3.   Randomly initialize the image to be generated
4.   Load the VGG19 model
5.   Compute the content cost
6.   Compute the style cost
7.   Compute the total cost
8.   Define the optimizer and learning rate

Load the Content Image

In [None]:
content_image = np.array(Image.open("images/louvre_small.jpg").resize((img_size, img_size)))
content_image = tf.constant(np.reshape(content_image, ((1,) + content_image.shape)))

print(content_image.shape)
imshow(content_image[0])
plt.show()

Load the Style Image

In [None]:
style_image = np.array(Image.open("images/monet.jpg").resize((img_size, img_size)))
style_image = tf.constant(np.reshape(style_image, ((1,) + style_image.shape)))

print(style_image.shape)
imshow(style_image[0])
plt.show()

**Randomly Initialize the Image to be Generated**
Now, we initialize the "generated" image as a noisy image created from the content_image.

* The code below adds uniform noise in [-0.25, 0.25] to the content image and clips the result to [0, 1], so the generated image starts out correlated with the content image.
* Starting from a noisy copy of the content image, rather than from pure noise, helps the content of the "generated" image match the content of the "content" image more rapidly.
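The initialization can be mirrored in NumPy to check how strongly the starting image is correlated with the content image. This sketch uses a random stand-in for the content image (an assumption; the notebook uses louvre_small.jpg) that is already scaled to [0, 1], as tf.image.convert_image_dtype does for uint8 input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical content image, already scaled to [0, 1]
content = rng.uniform(0.0, 1.0, size=(1, 64, 64, 3))

# Same recipe as the notebook cell: add uniform noise in [-0.25, 0.25], then clip to [0, 1]
noise = rng.uniform(-0.25, 0.25, size=content.shape)
generated = np.clip(content + noise, 0.0, 1.0)

# Pearson correlation between the flattened content and generated images
corr = np.corrcoef(content.ravel(), generated.ravel())[0, 1]
print(round(corr, 2))
```

Because clipping can only move a pixel back toward [0, 1], no pixel of the generated image ends up more than 0.25 away from the corresponding content pixel.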



In [None]:
generated_image = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
noise = tf.random.uniform(tf.shape(generated_image), -0.25, 0.25)
generated_image = tf.add(generated_image, noise)
generated_image = tf.clip_by_value(generated_image, clip_value_min=0.0, clip_value_max=1.0)

print(generated_image.shape)
imshow(generated_image.numpy()[0])
plt.show()

**Load Pre-trained VGG19 Model**

In [None]:
def get_layer_outputs(vgg, layer_names):
    """ Creates a vgg model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]

    model = tf.keras.Model([vgg.input], outputs)
    return model

In [None]:
# define the content layer and build the model
content_layer = [('block5_conv4', 1)]

vgg_model_outputs = get_layer_outputs(vgg, STYLE_LAYERS + content_layer)
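Because the model is built from STYLE_LAYERS + content_layer, its outputs come back as a list in that order: the first five are style activations and the last is the content activation. That ordering is why compute_content_cost takes `[-1]` and compute_style_cost takes `[:-1]`. The plain-Python sketch below traces the ordering without TensorFlow, using copies of the two lists from the cells above.

```python
# Copies of the notebook's layer lists, used only to show the ordering
STYLE_LAYERS = [
    ('block1_conv1', 0.2),
    ('block2_conv1', 0.2),
    ('block3_conv1', 0.2),
    ('block4_conv1', 0.2),
    ('block5_conv1', 0.2)]
content_layer = [('block5_conv4', 1)]

# get_layer_outputs builds one output per (name, weight) pair, in list order
layer_names = [name for name, _ in STYLE_LAYERS + content_layer]

style_names = layer_names[:-1]   # the outputs compute_style_cost iterates over
content_name = layer_names[-1]   # the output compute_content_cost uses

print(style_names)
print(content_name)  # block5_conv4
```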

In [None]:
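# Save the outputs of the content and style layers for the raw (uint8) images in separate variables.
# These targets are not used below: a_C and a_S are recomputed from images rescaled to [0, 1].
content_target = vgg_model_outputs(content_image)  # Content encoder
style_targets = vgg_model_outputs(style_image)     # Style encoder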

**Compute Total Cost**
Compute the Content image Encoding (a_C)

In [None]:
# Assign the content image to be the input of the VGG model.
# convert_image_dtype rescales the uint8 pixels to floats in [0, 1].
# Set a_C to be the hidden layer activations from the layers we have selected.
preprocessed_content = tf.Variable(tf.image.convert_image_dtype(content_image, tf.float32))
a_C = vgg_model_outputs(preprocessed_content)

Compute the Style image Encoding (a_S)

The code below sets a_S to be the tensor giving the hidden layer activation for STYLE_LAYERS using our style image.

In [None]:
# Assign the input of the model to be the "style" image, rescaled to [0, 1]
preprocessed_style = tf.Variable(tf.image.convert_image_dtype(style_image, tf.float32))
a_S = vgg_model_outputs(preprocessed_style)

Helper functions to clip and display the images generated by the style transfer model.

In [None]:
def clip_0_1(image):
    """
    Truncate all the pixels in the tensor to be between 0 and 1

    Arguments:
    image -- Tensor

    Returns:
    Tensor
    """
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

def tensor_to_image(tensor):
    """
    Converts the given tensor into a PIL image

    Arguments:
    tensor -- Tensor

    Returns:
    Image: A PIL image
    """
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return Image.fromarray(tensor)

Training the model

Implement the train_step() function, which runs one optimization step on the generated image:

* Use the Adam optimizer with a learning rate of 0.01 to minimize the total cost J.
* Use tf.GradientTape to record the forward computation, so that the gradient of J with respect to the generated image can be computed.
* Within the tf.GradientTape():

1. Compute the encoding of the generated image using vgg_model_outputs. Assign the result to a_G.
2. Compute the total cost J, using the global variables a_C, a_S and the local a_G.
3. Use alpha = 10 and beta = 40.
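To make the pattern inside train_step concrete without the VGG model, here is a NumPy toy that follows the same loop: compute a cost, take its gradient, apply an Adam update with learning rate 0.01, and clip the pixels to [0, 1]. The quadratic cost, the random target image and the adam_step helper are assumptions for illustration. The update is the textbook Adam rule (Kingma and Ba); Keras' implementation may differ in small details such as where epsilon enters. In the real code, tf.GradientTape computes the gradient automatically, while here it is written by hand.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update (hypothetical helper); returns the new x, m, v."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the zero init
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=(4, 4, 3))  # hypothetical "ideal" image
x = np.full_like(target, 0.5)                    # image being optimized
m, v = np.zeros_like(x), np.zeros_like(x)

def cost(x):
    return np.sum((x - target) ** 2)            # toy stand-in for the total cost J

initial_cost = cost(x)
for t in range(1, 301):
    grad = 2 * (x - target)                      # analytic gradient of the toy cost
    x, m, v = adam_step(x, grad, m, v, t)
    x = np.clip(x, 0.0, 1.0)                     # same role as clip_0_1 in train_step

print(initial_cost, cost(x))
```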

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

@tf.function()
def train_step(generated_image):
    with tf.GradientTape() as tape:
        # In this function you must use the precomputed encoded images a_S and a_C

        # Compute a_G as the vgg_model_outputs for the current generated image
        a_G = vgg_model_outputs(generated_image)

        # Compute the style cost
        J_style = compute_style_cost(a_S, a_G)
        # Compute the content cost
        J_content = compute_content_cost(a_C, a_G)
        # Compute the total cost
        J = total_cost(J_content, J_style, alpha=10, beta=40)

    # Gradient of the total cost with respect to the pixels of the generated image
    grad = tape.gradient(J, generated_image)

    # Take one Adam step on the image, then keep its pixels in [0, 1]
    optimizer.apply_gradients([(grad, generated_image)])
    generated_image.assign(clip_0_1(generated_image))
    return J

# The image must be a tf.Variable so that the optimizer can update it in place
generated_image = tf.Variable(generated_image)

train_step_test(train_step, generated_image)

In [None]:
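# Make sure the folder for intermediate results exists
os.makedirs("output", exist_ok=True)

epochs = 20000
for i in range(epochs):
    train_step(generated_image)
    # Every 250 steps, report progress and show/save the current generated image
    if i % 250 == 0:
        print(f"Epoch {i} ")
        image = tensor_to_image(generated_image)
        imshow(image)
        image.save(f"output/image_{i}.jpg")
        plt.show()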

In [None]:
# Show the 3 images in a row
fig = plt.figure(figsize=(16, 4))
ax = fig.add_subplot(1, 3, 1)
imshow(content_image[0])
ax.title.set_text('Content image')
ax = fig.add_subplot(1, 3, 2)
imshow(style_image[0])
ax.title.set_text('Style image')
ax = fig.add_subplot(1, 3, 3)
imshow(generated_image[0])
ax.title.set_text('Generated image')
plt.show()

**References**

The Neural Style Transfer algorithm is due to Gatys et al. (2015). Harish Narayanan and GitHub user "log0" also have highly readable write-ups that inspired this lab. The pre-trained network used in this implementation is a VGG network, which is due to Simonyan and Zisserman (2015). Pre-trained weights were from the work of the MatConvNet team.


* Leon A. Gatys, Alexander S. Ecker, Matthias Bethge (2015). A Neural Algorithm of Artistic Style.
* Harish Narayanan. Convolutional neural networks for artistic style transfer.
* Log0. TensorFlow Implementation of "A Neural Algorithm of Artistic Style".
* Karen Simonyan and Andrew Zisserman (2015). Very deep convolutional networks for large-scale image recognition.
* MatConvNet.