# Transfer learning

## Neural style transfer

- content image (C), style image (S), generated image (G)
- cost function $J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$
- initialize $G$ randomly
- use gradient descent to minimize $J(G)$
    - $G := G - \eta\dfrac{\partial}{\partial G}J(G)$, where $\eta$ is the step size (learning rate)
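
The update loop can be illustrated with a toy NumPy sketch. The cost here is only a stand-in, $J(G) = \frac{1}{2}||G - T||^{2}$ for a hypothetical target `T`, chosen because its gradient $G - T$ is known in closed form. The names `T`, `J`, `grad_J` and the step size `lr` are assumptions of this sketch, not part of the real style transfer cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target standing in for "the image that minimizes J".
# The real NST cost has no such explicit target.
T = rng.uniform(0.0, 1.0, size=(4, 4, 3))

def J(G):
    """Toy cost: half the squared distance between G and T."""
    return 0.5 * np.sum((G - T) ** 2)

def grad_J(G):
    """Gradient of the toy cost with respect to the pixels of G."""
    return G - T

G = rng.uniform(0.0, 1.0, size=T.shape)  # initialize G randomly
lr = 0.1                                 # step size (assumed value)
initial_cost = J(G)
for _ in range(200):
    G = G - lr * grad_J(G)               # gradient descent on the pixels
final_cost = J(G)
```

The point is that the "parameters" being updated are the pixels of `G` itself; the network weights stay fixed.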
    
## Content cost function

- say you use hidden layer $l$ to compute the content cost
- use a pre-trained ConvNet (e.g. the VGG network)
- let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer $l$ on the two images
- if $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images have similar content
    - $J_{content}(C,G) = \dfrac{1}{2}||a^{[l](C)} - a^{[l](G)}||^{2}$
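
As a minimal sketch of this formula, the cost can be computed on small hand-written arrays. The function name `content_cost` and the toy activations are assumptions made for illustration; real activations would come from the chosen layer of the pre-trained network.

```python
import numpy as np

def content_cost(a_C, a_G):
    """J_content = 1/2 * squared L2 distance between the two activation volumes."""
    return 0.5 * np.sum((a_C - a_G) ** 2)

# Toy "activations" of shape (n_H, n_W, n_C) = (1, 2, 2).
a_C = np.array([[[1.0, 2.0], [3.0, 4.0]]])
a_G = np.array([[[1.0, 0.0], [3.0, 2.0]]])

cost_same = content_cost(a_C, a_C)  # identical activations give zero cost
cost_diff = content_cost(a_C, a_G)  # two entries differ by 2: 0.5 * (4 + 4) = 4.0
```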
    
## Style cost function

- style matrix
    - let $a_{i,j,k}^{[l]}$ = activation at $(i,j,k)$ (height, width, channel)
    - $G^{[l]}$ is $n_{c}^{[l]} \times n_{c}^{[l]}$
    - $G_{kk'}^{[l]} = \displaystyle\sum_{i=1}^{n_{H}^{[l]}}\displaystyle\sum_{j=1}^{n_{W}^{[l]}}a_{ijk}^{[l]}a_{ijk'}^{[l]}$ (do this for both the style and the generated image)
    - $J_{style}^{[l]}(S,G) = ||G^{[l](S)} - G^{[l](G)}||^{2}_{F} = \displaystyle\sum_{k}\displaystyle\sum_{k'}(G_{kk'}^{[l](S)} - G_{kk'}^{[l](G)})^{2}$ 
    - $J_{style}(S,G) = \displaystyle\sum_{l}\lambda^{[l]}J_{style}^{[l]}(S,G)$
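
The double sum over positions maps directly onto `np.einsum`. The sketch below uses random arrays in place of real activations, and the helper names `style_matrix` and `layer_style_cost` are hypothetical.

```python
import numpy as np

def style_matrix(a):
    """a has shape (n_H, n_W, n_C). Returns G of shape (n_C, n_C) with
    G[k, k'] = sum over i, j of a[i, j, k] * a[i, j, k']."""
    return np.einsum("ijk,ijl->kl", a, a)

def layer_style_cost(a_S, a_G):
    """Squared Frobenius norm of the difference between the two style matrices."""
    diff = style_matrix(a_S) - style_matrix(a_G)
    return np.sum(diff ** 2)

rng = np.random.default_rng(1)
a_S = rng.normal(size=(5, 6, 3))  # stand-in for style activations
a_G = rng.normal(size=(5, 6, 3))  # stand-in for generated-image activations

G_S = style_matrix(a_S)
cost_same = layer_style_cost(a_S, a_S)
cost_diff = layer_style_cost(a_S, a_G)
```

The style matrix is symmetric by construction, since swapping $k$ and $k'$ leaves each product unchanged.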

## Example

- Merges two images (content and style) to create a new, generated image.

<img src="img/louvre_generated.png" style="width:750px;height:200px;">

<img src="img/perspolis_vangogh.png" style="width:750px;height:300px;">

<img src="img/pasargad_kashi.png" style="width:750px;height:300px;">

<img src="img/circle_abstract.png" style="width:750px;height:300px;">

Neural style transfer in three steps:
- Build the content cost function $J_{content}(C,G)$
- Build the style cost function $J_{style}(S,G)$
- Put it together to get $J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$. 

### Packages

In [None]:
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import numpy as np
import tensorflow as tf
import pprint
%matplotlib inline

In [None]:
class CONFIG:
    IMAGE_WIDTH = 400
    IMAGE_HEIGHT = 300
    COLOR_CHANNELS = 3
    NOISE_RATIO = 0.6
    MEANS = np.array([123.68, 116.779, 103.939]).reshape((1,1,1,3)) 
    VGG_MODEL = 'pretrained-model/imagenet-vgg-verydeep-19.mat' # The 19-layer VGG model from the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
    STYLE_IMAGE = 'images/stone_style.jpg' # Style image to use.
    CONTENT_IMAGE = 'images/content300.jpg' # Content image to use.
    OUTPUT_DIR = 'output/'

In [None]:
# The model is stored in a Python dictionary.
# The dictionary contains one key-value pair per layer:
# the key is the layer's variable name and the value is the tensor for that layer.
# Note: `load_vgg_model` is not defined in this notebook; it is assumed to come from a helper module.
pp = pprint.PrettyPrinter(indent=4)
model = load_vgg_model("data/imagenet-vgg-verydeep-19.mat")
pp.pprint(model)

### Compute content cost

- We want "G" to have content similar to "C".
- Choosing a layer in the middle of the network tends to give the best results in practice.

#### Forward prop "C"

- Set "C" as the input to the pretrained VGG network, and run forward prop.
- Let $a^{(C)}$ be the activation in the chosen layer (an $n_H \times n_W \times n_C$ tensor).

#### Forward prop "G"

- Set "G" as the input to pretrained VGG, and run forward prop.
- Let $a^{(G)}$ be the corresponding activation. 

$$J_{content}(C,G) =  \frac{1}{4 \times n_H \times n_W \times n_C}\sum _{ \text{all entries}} (a^{(C)} - a^{(G)})^2$$

<img src="img/NST_LOSS.png" style="width:800px;height:400px;">

In [5]:
def compute_content_cost(a_C, a_G):
    """
    Computes the content cost
    
    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G
    
    Returns: 
    J_content -- scalar content cost, computed with the formula above.
    """
    
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    
    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.reshape(a_C, [m, n_H * n_W, n_C])
    a_G_unrolled = tf.reshape(a_G, [m, n_H * n_W, n_C])
    
    # compute the cost with tensorflow (≈1 line)
    J_content = tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled))) / (4 * n_H * n_W * n_C)
    
    return J_content
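
A NumPy reference for the same formula can help sanity-check the TensorFlow version without building a graph. This is a sketch on made-up activations; `content_cost_np` is a hypothetical name. It also shows that unrolling does not change the value, since the sum runs over all entries either way.

```python
import numpy as np

def content_cost_np(a_C, a_G):
    """NumPy version of the normalized content cost for inputs of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)

a_C = np.ones((1, 2, 2, 3))
a_G = np.zeros((1, 2, 2, 3))

# 12 entries, each with squared difference 1: 12 / (4 * 2 * 2 * 3) = 0.25
J_content = content_cost_np(a_C, a_G)

# Same sum computed on the unrolled (1, n_H*n_W, n_C) tensors.
a_C_unrolled = a_C.reshape(1, 4, 3)
a_G_unrolled = a_G.reshape(1, 4, 3)
J_unrolled = np.sum((a_C_unrolled - a_G_unrolled) ** 2) / (4 * 2 * 2 * 3)
```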

### Compute style cost

#### Style matrix (Gram matrix)

- ${\displaystyle G_{ij} = v_{i}^T v_{j}}$, i.e. `np.dot(v_i, v_j)`.
- $G_{ij}$ measures how similar $v_i$ is to $v_j$: the more closely aligned the two vectors, the larger their dot product.

<img src="img/NST_GM.png" style="width:900px;height:300px;">
$$\mathbf{G}_{gram} = \mathbf{A}_{unrolled} \mathbf{A}_{unrolled}^T$$

In [1]:
def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)
    
    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    
    GA = tf.matmul(A, tf.transpose(A))
    
    return GA
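
A small hand-checkable case, written in NumPy rather than TensorFlow so it runs without a session. The matrix `A` is made up for illustration: two channels (rows) over three spatial positions (columns).

```python
import numpy as np

# n_C = 2 channels, n_H * n_W = 3 positions.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

GA = A @ A.T  # same product as tf.matmul(A, tf.transpose(A))

# Diagonal entries are the squared norms of each channel: 1 + 0 + 4 = 5 and 9.
# The off-diagonal entries are 0 because the two channels are never active
# at the same position.
```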

#### Style cost

$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times {n_C}^2 \times (n_H \times n_W)^2} \sum _{i=1}^{n_C}\sum_{j=1}^{n_C}(G^{(S)}_{(gram)i,j} - G^{(G)}_{(gram)i,j})^2$$

In [2]:
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G
    
    Returns: 
    J_style_layer -- tensor representing a scalar value, the style cost for one layer as defined by the formula above
    """
    
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    
    # Reshape to (n_H*n_W, n_C), then transpose to get shape (n_C, n_H*n_W) (≈2 lines).
    # Reshaping directly to (n_C, n_H*n_W) would mix values from different channels,
    # because the channel axis is the last (fastest-varying) axis of the activations.
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Computing the loss (≈1 line)
    J_style_layer = tf.reduce_sum(tf.square(tf.subtract(GS, GG))) / (4 * n_C**2 * (n_H * n_W)**2)
    
    return J_style_layer
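
The order of reshape and transpose matters here. The NumPy sketch below uses a made-up activation volume in which channel `k` is filled with the constant `k + 1`, so it is easy to see which layout keeps each channel in its own row. NumPy's default reshape uses the same row-major order as `tf.reshape`.

```python
import numpy as np

n_H, n_W, n_C = 2, 2, 3

# Activation volume of shape (1, n_H, n_W, n_C); channel k holds the value k + 1 everywhere.
a = np.broadcast_to(np.array([1.0, 2.0, 3.0]), (1, n_H, n_W, n_C)).copy()

# Reshape to (n_H*n_W, n_C), then transpose: each row is one channel.
good = a.reshape(n_H * n_W, n_C).T

# Reshape directly to (n_C, n_H*n_W): rows mix values from different channels.
bad = a.reshape(n_C, n_H * n_W)

first_bad_row = bad[0]  # array([1., 2., 3., 1.]), not a single channel
```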

#### Style weights

$$J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)$$

where the weights $\lambda^{[l]}$ are given, one per layer, in `STYLE_LAYERS`.

In [3]:
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]

In [4]:
def compute_style_cost(model, STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers
    
    Arguments:
    model -- our tensorflow model
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them
    
    Returns: 
    J_style -- tensor representing a scalar value, the weighted sum of the per-layer style costs
    """
    
    # initialize the overall style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:

        # Select the output tensor of the currently selected layer
        out = model[layer_name]

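        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out.
        # `sess` is assumed to be an active session defined at notebook level, with the style image set as the model input.
        a_S = sess.run(out)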

        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name] 
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out
        
        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer

    return J_style
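
The weighting step on its own is plain arithmetic. The sketch below uses hypothetical per-layer costs (made up for illustration, not measured from any image) together with the same layer names and weights as above; `STYLE_LAYERS_DEMO` and `layer_costs` are names introduced only for this example.

```python
# Same layer names and coefficients as STYLE_LAYERS.
STYLE_LAYERS_DEMO = [
    ("conv1_1", 0.2),
    ("conv2_1", 0.2),
    ("conv3_1", 0.2),
    ("conv4_1", 0.2),
    ("conv5_1", 0.2),
]

# Hypothetical per-layer style costs, for illustration only.
layer_costs = {
    "conv1_1": 1.0,
    "conv2_1": 2.0,
    "conv3_1": 3.0,
    "conv4_1": 4.0,
    "conv5_1": 5.0,
}

# J_style = sum over layers of lambda^[l] * J_style^[l]
J_style = sum(coeff * layer_costs[name] for name, coeff in STYLE_LAYERS_DEMO)
total_weight = sum(coeff for _, coeff in STYLE_LAYERS_DEMO)
```

With equal weights that sum to 1, the overall style cost is simply the mean of the per-layer costs (here 3.0, up to floating-point rounding).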

### Total cost

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$

In [5]:
def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Computes the total cost function
    
    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost
    
    Returns:
    J -- total cost as defined by the formula above.
    """
    
    J = alpha * J_content + beta * J_style
    
    return J