# Style Transfer using VGG-16 Model

<hr>

## Introduction

We can maximize the feature activations inside a neural network so as to amplify patterns in the input image. This is called DeepDreaming.

This notebook uses a similar idea but takes two images as input: A content-image and a style-image. We then wish to create a mixed-image which has the contours of the content-image and the colours and texture of the style-image.

# Step 0 - what are we doing here?

- the basic idea, introduced in 2015 by German researchers, was to repurpose a fully trained convolutional network to help transfer the style between two images.
- we slice off the top layer, since it's only used for classification, and then
- use the feature activations at given layers to minimize the difference between two images
- requirements: TensorFlow 1.0, Python 3, NumPy and Matplotlib

# Step 1 - show style transfer examples for pics and videos

- video style transfer paper http://genekogan.com/works/style-transfer/
- introduces another loss function, based on optical flow, to keep consecutive frames consistent
- Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene
- show video style transfer and gradual transform video at the bottom

- Neural Doodle (https://github.com/alexjc/neural-doodle) builds on this and creates a semantic map using image segmentation

# Step 2 - what is this process?

- we want contour lines from the content image
- we want colours and textures from the style image
- what's in a filter? https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
- each conv layer has as many 3D filters as we set. could be 1. could be 20.
- filters are normally used to identify whether features exist, for classification
- i.e. training minimizes the loss between the label of an image and the prediction computed from its feature maps
- each filter performs an operation on its input
- the layer outputs a 3D feature map / activation map
- start with random noise for the mixed image
- calculate different loss functions at different layers
- weight these loss functions relative to each other
- use the gradient of the combined loss functions to update the mixed image
- we do this 100-1000 times until the image is mixed
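
As a concrete picture of the filter bullets above, here is a minimal NumPy sketch of one convolutional layer applied to a toy image. The helper names `conv2d_single_filter` and `conv_layer` are made up for this illustration, and the loop-based implementation (no padding, stride 1, no bias or activation) is a simplification, not how VGG-16 or TensorFlow compute it.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one (kh, kw, C) filter over an (H, W, C) image, without padding."""
    h, w, _ = image.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the sum of an image patch times the filter.
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.sum(patch * kernel)
    return out

def conv_layer(image, kernels):
    """Apply every filter and stack the 2D maps into one 3D feature map."""
    maps = [conv2d_single_filter(image, k) for k in kernels]
    return np.stack(maps, axis=-1)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3)).astype(np.float32)                  # toy RGB image
kernels = rng.standard_normal((20, 3, 3, 3)).astype(np.float32)   # 20 filters, each 3x3x3

feature_map = conv_layer(image, kernels)
print(feature_map.shape)  # (6, 6, 20): one 6x6 activation map per filter
```

The last dimension of the output equals the number of filters, which is why a layer with 20 filters yields a feature map that is 20 channels deep.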


content loss
--------
- calculate features/values at a higher layer
- minimize the difference between the feature activations of the content image and the mixed image. the mean squared error between them is the loss function we want to minimize.

- we cache the features of the content image because they never change, so we don't need to recalculate them.

style loss
--------
- multiple layers 
- minimize the difference between the Gram matrices of the style image and the mixed image at each chosen layer (e.g. layers 1 and 2)



## Flowchart

This flowchart shows roughly the idea of the Style Transfer algorithm, although we use the VGG-16 model which has many more layers than shown here.

Two images are input to the neural network: A content-image and a style-image. We wish to generate the mixed-image which has the contours of the content-image and the colours and texture of the style-image.
We do this by creating several loss-functions that can be optimized.

The loss-function for the content-image tries to minimize the difference between the features that are activated for the content-image and for the mixed-image, at one or more layers in the network. This causes the contours of the mixed-image to resemble those of the content-image.

The loss-function for the style-image is slightly more complicated, because it instead tries to minimize the difference between the so-called Gram-matrices for the style-image and the mixed-image. This is done at one or more layers in the network. The Gram-matrix measures which features are activated simultaneously in a given layer. Changing the mixed-image so that it mimics the activation patterns of the style-image causes the colour and texture to be transferred.
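
To make the Gram-matrix idea more tangible, the following is a small NumPy sketch under simplifying assumptions: the activations are random stand-ins rather than real VGG-16 outputs, the helper names `gram_matrix` and `style_loss_single_layer` are chosen only for this example, and no normalization is applied.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of an (H, W, C) feature map: a (C, C) matrix of channel dot products."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)   # one row per spatial position
    return flat.T @ flat                # entry (i, j): how strongly channels i and j co-activate

def style_loss_single_layer(style_features, mixed_features):
    """Mean squared error between the Gram matrices of two feature maps."""
    diff = gram_matrix(mixed_features) - gram_matrix(style_features)
    return np.mean(diff ** 2)

rng = np.random.default_rng(0)
features = rng.random((5, 5, 4))   # stand-in for one layer's activations
other = rng.random((5, 5, 4))      # stand-in for a different image

gram = gram_matrix(features)

# Shuffle the spatial positions: the same activations, in different places.
perm = rng.permutation(5 * 5)
shuffled = features.reshape(-1, 4)[perm].reshape(5, 5, 4)

print(gram.shape)
print(style_loss_single_layer(features, shuffled))  # zero up to rounding error
print(style_loss_single_layer(features, other))     # clearly positive
```

Because the Gram matrix sums over all spatial positions, shuffling where the activations occur leaves it unchanged. That is one way to see why this loss captures texture and colour statistics rather than contours.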

We use TensorFlow to automatically derive the gradient for these loss-functions. The gradient is then used to update the mixed-image. This procedure is repeated a number of times until we are satisfied with the resulting image.
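
The update procedure can be sketched on a toy problem. In the sketch below the two losses are plain pixel-wise mean squared errors standing in for the real content- and style-losses, and the gradient is written out by hand instead of being derived by TensorFlow. The weights, the step-size and all function names are assumptions made for this illustration.

```python
import numpy as np

def weighted_loss(mixed, content, style, weight_content, weight_style):
    """Weighted sum of two stand-in losses (pixel-wise MSE to each image)."""
    loss_content = np.mean((mixed - content) ** 2)
    loss_style = np.mean((mixed - style) ** 2)
    return weight_content * loss_content + weight_style * loss_style

def weighted_loss_gradient(mixed, content, style, weight_content, weight_style):
    """Hand-derived gradient of weighted_loss with respect to the mixed-image."""
    return 2.0 * (weight_content * (mixed - content)
                  + weight_style * (mixed - style)) / mixed.size

rng = np.random.default_rng(0)
content = rng.random((4, 4, 3))
style = rng.random((4, 4, 3))
mixed = rng.random((4, 4, 3))          # start from random noise

weight_content, weight_style = 1.5, 0.5
# The mean divides the gradient by the number of pixels,
# so the step-size is scaled back up by mixed.size.
step_size = 0.2 * mixed.size

losses = []
for _ in range(100):
    losses.append(weighted_loss(mixed, content, style, weight_content, weight_style))
    grad = weighted_loss_gradient(mixed, content, style, weight_content, weight_style)
    mixed -= step_size * grad          # gradient descent on the image itself

print(losses[0], losses[-1])
```

In this toy setting the mixed-image converges to the weighted average of the two inputs, which shows how the relative weights steer the result. With the real feature-based losses there is no such closed form, which is why the notebook relies on TensorFlow for the gradient.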

There are some details of the Style Transfer algorithm not shown in this flowchart, e.g. regarding calculation of the Gram-matrices, calculation and storage of intermediate values for efficiency, a loss-function for denoising the mixed-image, and normalization of the loss-functions so they are easier to scale relative to each other.



<img src='images/15_style_transfer_flowchart.png' height='70%' width='80%'>

In [1]:
import matplotlib.pyplot as plt
import tensorflow as tf 
import numpy as np 
import PIL.Image
from IPython.display import Image, display
%matplotlib inline

## VGG-16 Model

After having spent 2 days trying to get the style-transfer algorithm to work with the Inception 5h model that we used for DeepDreaming in Tutorial #14, I could not produce images that looked any good. This seems strange because the images that were produced in Tutorial #14 looked quite nice. But recall that we also used a few tricks to achieve that quality, such as smoothing the gradient and recursively downscaling and processing the image.

The [original paper](https://arxiv.org/abs/1508.06576) on style transfer used the VGG-19 convolutional neural network. But the pre-trained VGG-19 models for TensorFlow did not seem suitable for this tutorial, for various reasons. Instead we will use the VGG-16 model, which someone else has made available and which can easily be loaded in TensorFlow. We have wrapped it in a class for convenience.

In [2]:
import vgg16
## download vgg model 
vgg16.maybe_download()

Downloading VGG16 Model ...
Data has apparently already been downloaded and unpacked.


## Helper-functions for image manipulation

This function loads an image and returns it as a NumPy array of floats. The image can optionally be resized so that the larger of its height and width equals `max_size`.

In [3]:
def load_image(filename, max_size=None):
    image = PIL.Image.open(filename)

    if max_size is not None:
        # Calculate the appropriate rescale-factor for ensuring a max height and width, while keeping
        # the proportion between them.
        factor = max_size / np.max(image.size)
        # Scale the image's width and height (PIL's size is (width, height)).
        size = np.array(image.size) * factor
        size = size.astype(int)
        # PIL expects the new size as a (width, height) tuple.
        image = image.resize(tuple(size), PIL.Image.LANCZOS)
    # Convert to a numpy floating-point array.
    return np.float32(image)

# Save an image as a jpeg-file. The image is given as a numpy array 
# with pixel-values between 0 and 255.
def save_image(image, filename):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)
    # Convert to bytes.
    image = image.astype(np.uint8)
    # Write the image-file in jpeg-format.
    with open(filename, 'wb') as file:
        PIL.Image.fromarray(image).save(file, 'jpeg')

# Plot an image at full size.
def plot_image_big(image):
    # Ensure the pixel-values are between 0 and 255.
    image = np.clip(image, 0.0, 255.0)
    # Convert pixels to bytes.
    image = image.astype(np.uint8)
    # Convert to a PIL-image and display it.
    display(PIL.Image.fromarray(image))

###############################################################
### This function plots the content-, mixed- and style-images.
###############################################################
def plot_images(content_image, style_image, mixed_image):
    # Create figure with sub-plots.
    fig, axes = plt.subplots(1, 3, figsize=(10, 10))
    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)
    # Use interpolation to smooth pixels?
    smooth = True
    # Interpolation type.
    if smooth:
        interpolation = 'sinc'
    else:
        interpolation = 'nearest'

    # Plot the content-image.
    # Note that the pixel-values are normalized to
    # the [0.0, 1.0] range by dividing with 255.
    ax = axes.flat[0]
    ax.imshow(content_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Content")
    # Plot the mixed-image.
    ax = axes.flat[1]
    ax.imshow(mixed_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Mixed")
    # Plot the style-image
    ax = axes.flat[2]
    ax.imshow(style_image / 255.0, interpolation=interpolation)
    ax.set_xlabel("Style")
    # Remove ticks from all the plots.
    for ax in axes.flat:
        ax.set_xticks([])
        ax.set_yticks([])
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
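
The rescaling arithmetic used by the image-loading helper above can be checked in isolation. This is a small sketch with a hypothetical helper name, `rescaled_size`, and it only mirrors the size calculation, not the loading or the resizing.

```python
import numpy as np

def rescaled_size(size, max_size):
    """Return (width, height) scaled so that the larger side equals max_size."""
    factor = max_size / np.max(size)
    # Truncate to whole pixels, as astype(int) does in the helper.
    return tuple(int(v) for v in np.array(size) * factor)

print(rescaled_size((1600, 1200), max_size=400))  # (400, 300)
```

Because the values are truncated rather than rounded, a side can come out one pixel short when the scaled value is not a whole number.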


### Define loss functions
This function creates a TensorFlow operation for calculating the Mean Squared Error between the two input tensors.

In [4]:
## mean squared error
def mean_squared_error(a, b):
    return tf.reduce_mean(tf.square(a - b))

This function creates the loss-function for the content-image. It is the Mean Squared Error of the feature activations in the given layers in the model, between the content-image and the mixed-image. When this content-loss is minimized, it therefore means that the mixed-image has feature activations in the given layers that are very similar to the activations of the content-image. Depending on which layers you select, this should transfer the contours from the content-image to the mixed-image.

In [5]:
# Step 3 - content_image is a 3-D numpy array and layer_ids are the
# indices of the layers we want to use for the content loss.
# You should experiment to see which layers look good.
# There is no single best layer: we have no way to quantify
# "beauty", so there is no loss we could minimize for it.

def create_content_loss(session,model,content_image,layer_ids):
    '''
    Create the loss-function for the content-image.
    
    Parameters:
    session: An open TensorFlow session for running the model's graph.
    model: The model, e.g. an instance of the VGG16-class.
    content_image: Numpy float array with the content-image.
    layer_ids: List of integer id's for the layers to use in the model.
    '''
    # A python dictionary object is generated with the 
    # placeholders as keys and the representative feed 
    # tensors as values.
    # Create a feed-dict with the content-image.
    feed_dict = model.create_feed_dict(image=content_image)
    # Get references to the tensors for the given layers.
    # Note: these are the layers' output tensors (feature activations), not the filter weights.
    layers = model.get_layer_tensors(layer_ids)
    # Calculate the output values of those layers when
    # feeding the content-image to the model.
    values = session.run(layers,feed_dict=feed_dict)
    with model.graph.as_default():
        # Initialize an empty list of loss-functions,
        # because we calculate one loss per layer.
        layer_losses = []
        # For each layer and its corresponding values
        # for the content-image.
        for value, layer in zip(values, layers):
            # These are the values that are calculated
            # for this layer in the model when inputting
            # the content-image. Wrap it to ensure it
            # is a const - although this may be done
            # automatically by TensorFlow.
            value_const = tf.constant(value)
            # Take the mean squared error between the layer's
            # output and the cached content-image values.
            loss = mean_squared_error(layer, value_const)
            # Add the loss-function for this layer to the list.
            layer_losses.append(loss)

        # The combined loss for all layers is just the average.
        total_loss = tf.reduce_mean(layer_losses)
    
    return total_loss
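
To see what number the graph built above evaluates to, the same calculation can be mirrored in plain NumPy. This is only a sketch: `content_loss_numpy` is a hypothetical helper, and the small arrays stand in for real layer activations.

```python
import numpy as np

def content_loss_numpy(content_activations, mixed_activations):
    """Average over layers of the per-layer mean squared error."""
    layer_losses = [np.mean((mixed - content) ** 2)
                    for content, mixed in zip(content_activations, mixed_activations)]
    return float(np.mean(layer_losses))

# Two "layers" with different shapes, as in a real network.
content_activations = [np.zeros((4, 4, 8)), np.zeros((2, 2, 16))]
mixed_activations = [np.ones((4, 4, 8)), np.full((2, 2, 16), 3.0)]

# Layer 1 has MSE 1.0 and layer 2 has MSE 9.0, so the average is 5.0.
print(content_loss_numpy(content_activations, mixed_activations))  # 5.0
```

Each layer contributes equally to the average regardless of how many activations it has, because the mean is taken per layer first.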
