# [Machine Learning with CoreML](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-core-ml)
**By:** Joshua Newnham (Author)  
**Publisher:** [Packt Publishing](https://www.packtpub.com/) 

## Chapter 7 - Fast Neural Style Transfer 
This notebook is concerned with extracting the **content** from an image and using it to *steer* the network (via the loss function). 

At a high level, this is achieved by using a model ([VGG16](https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3)) that has been trained to perform object recognition. The features it learns in order to classify the object within an image are what we use for both style and content.  

The model is made up of a series of convolutional layers, which establish **feature maps** that can be seen as the model's internal representation of the image content. Typically, the shallow layers represent basic shapes, while deeper layers represent more abstract features (as they operate on a larger scale and thus have a higher-level representation of the image). The image below illustrates the *features* of an image that are activated at each of the layers. 

<img src="images/layer_activations.png" />

Therefore, to compare our generated image to the content image, we can extract feature vectors from the deeper layers and calculate the distance between them (with the goal of bringing it close to 0). The image below illustrates this process, which is the purpose of this notebook. 

<img src="images/content_loss.png" width="80%" />
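
As a rough illustration of the distance described above, the sketch below computes a mean squared difference between two feature maps using plain NumPy. The function name `content_loss` and the random arrays standing in for deep-layer activations are assumptions made for this example; they are not taken from the notebook.

```python
import numpy as np

def content_loss(generated_features, content_features):
    """Mean squared distance between two feature maps of equal shape."""
    return float(np.mean(np.square(generated_features - content_features)))

rng = np.random.default_rng(0)
# Stand-ins for the activations of a deep layer, shaped (H, W, C)
content = rng.standard_normal((4, 4, 8))
generated = content + 0.1 * rng.standard_normal((4, 4, 8))

print(content_loss(content, content))    # 0.0
print(content_loss(generated, content))  # small positive value
```

The loss is exactly 0 only when the two feature maps match, which is what the optimisation tries to approach.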

In [1]:
import warnings
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

In [2]:
from builtins import range, input 

In [3]:
from keras.layers import Input, Lambda, Dense, Flatten
from keras.layers import AveragePooling2D, MaxPooling2D
from keras.layers.convolutional import Conv2D
from keras.models import Model, Sequential
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
from keras.preprocessing import image

Using TensorFlow backend.


In [4]:
import keras.backend as K
import numpy as np 
import matplotlib.pyplot as plt

In [5]:
from scipy.optimize import fmin_l_bfgs_b

In [6]:
from datetime import datetime

In [7]:
### Re-create VGG16; replacing MaxPooling with AveragePooling 

In [8]:
def VGG16_AvgPool(shape):
    vgg16 = VGG16(input_shape=shape, weights='imagenet', include_top=False)
    
    avg_vgg16 = Sequential() 
    for layer in vgg16.layers:
        if layer.__class__ == MaxPooling2D:
            avg_vgg16.add(AveragePooling2D())
        else:
            avg_vgg16.add(layer) 
            
    return avg_vgg16
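
To make the swap concrete, here is a small NumPy sketch (not part of the notebook) comparing 2x2 max pooling with 2x2 average pooling on a toy array. The helper `pool2x2` is a hypothetical name used only for this illustration.

```python
import numpy as np

def pool2x2(x, reduce_fn):
    """Apply a non-overlapping 2x2 pooling window to an (H, W) array."""
    h, w = x.shape
    # Split rows and columns into (block index, position within block)
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return reduce_fn(blocks, axis=(1, 3))

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 8.],
              [5., 5., 1., 1.],
              [5., 5., 1., 1.]])

max_pooled = pool2x2(x, np.max)   # keeps only the largest value per window
avg_pooled = pool2x2(x, np.mean)  # blends all four values per window

print(max_pooled)
print(avg_pooled)
```

Max pooling keeps a single value per window, whereas average pooling lets every pixel in the window contribute to the output.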

In [9]:
def VGG16_Cutoff(shape, num_convs):
    """
    There are 13 convolutions in total, 
    we can choose any of them for our output 
    """
    vgg = VGG16_AvgPool(shape)
    vgg16_cutoff = Sequential() 
    n = 0 
    for layer in vgg.layers:                
        vgg16_cutoff.add(layer)
        
        if layer.__class__ == Conv2D:
            n += 1
            if n >= num_convs:
                break 
                
    return vgg16_cutoff
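
The counting logic in `VGG16_Cutoff` can be checked without loading any weights. The sketch below replays it over a plain list of layer kinds; the list `VGG16_LAYER_KINDS` and the function `cutoff` are hypothetical stand-ins, and the input layer is omitted for simplicity.

```python
# Convolution/pooling order of VGG16 without the top: blocks of 2, 2, 3, 3, 3 convs,
# each followed by a pooling layer
VGG16_LAYER_KINDS = (
    ["conv"] * 2 + ["pool"]
    + ["conv"] * 2 + ["pool"]
    + (["conv"] * 3 + ["pool"]) * 3
)

def cutoff(layer_kinds, num_convs):
    """Keep layers up to and including the num_convs-th convolution."""
    kept, n = [], 0
    for kind in layer_kinds:
        kept.append(kind)
        if kind == "conv":
            n += 1
            if n >= num_convs:
                break
    return kept

print(VGG16_LAYER_KINDS.count("conv"))  # 13
print(cutoff(VGG16_LAYER_KINDS, 3))     # ['conv', 'conv', 'pool', 'conv']
```

Because the loop breaks immediately after the requested convolution, the truncated model always ends on a convolution rather than on a pooling layer.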

In [10]:
def unpreprocess(img):
    # add back the ImageNet channel means (BGR order) removed by preprocess_input
    img[...,0] += 103.939
    img[...,1] += 116.779
    img[...,2] += 123.68
    
    img = img[...,::-1]
    
    return img 
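
The sketch below shows the round trip that `unpreprocess` is meant to undo, written in plain NumPy so it runs without Keras. It assumes the "caffe"-style preprocessing that Keras applies for VGG16 (RGB to BGR, then subtraction of the ImageNet channel means 103.939, 116.779 and 123.68); the names `preprocess_sketch` and `unpreprocess_sketch` are hypothetical.

```python
import numpy as np

# ImageNet per-channel means in BGR order
BGR_MEANS = np.array([103.939, 116.779, 123.68])

def preprocess_sketch(rgb):
    """RGB -> BGR, then zero-centre each channel."""
    bgr = rgb[..., ::-1].astype(np.float64)
    return bgr - BGR_MEANS

def unpreprocess_sketch(bgr_centered):
    """Add the means back, then BGR -> RGB."""
    bgr = bgr_centered + BGR_MEANS
    return bgr[..., ::-1]

rgb = np.random.default_rng(0).uniform(0, 255, size=(1, 2, 2, 3))
restored = unpreprocess_sketch(preprocess_sketch(rgb))
print(np.allclose(rgb, restored))  # True
```

Unlike the in-place version in the notebook, this sketch returns new arrays, so the caller's input is left untouched.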

In [11]:
def scale_img(img):
    img = img - img.min() 
    img = img / img.max() 
    return img 
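
`scale_img` is a min-max normalisation that maps pixel values into the range [0, 1] for display. The variant below is a hypothetical sketch (the name `scale_img_safe` is not from the notebook) that additionally guards against a constant image, where the maximum after shifting is 0.

```python
import numpy as np

def scale_img_safe(img):
    """Min-max scale to [0, 1]; returns zeros for a constant image."""
    img = img - img.min()
    peak = img.max()
    return img / peak if peak > 0 else img

img = np.array([[-50.0, 0.0], [100.0, 250.0]])
scaled = scale_img_safe(img)
print(scaled.min(), scaled.max())  # 0.0 1.0
```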

In [12]:
def gram_matrix(img):
    """
    Input is (H, W, C), where C is the number of feature maps;
    we first need to reshape it to (C, H*W)
    """
    X = K.batch_flatten(K.permute_dimensions(img, (2, 0, 1)))
    
    # Now calculate the gram matrix 
    # gram = XX^T / N
    # The constant is not important since we'll be weighting these 
    G = K.dot(X, K.transpose(X)) / img.get_shape().num_elements() 
    return G
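
For readers who want to inspect the Gram matrix outside a Keras graph, the following is a NumPy sketch of the same computation. The name `gram_matrix_np` and the random input are assumptions for illustration.

```python
import numpy as np

def gram_matrix_np(features):
    """Gram matrix of an (H, W, C) feature map, normalised by H*W*C."""
    h, w, c = features.shape
    # Move channels first, then flatten the spatial dimensions: (C, H*W)
    X = features.transpose(2, 0, 1).reshape(c, h * w)
    return X @ X.T / features.size

features = np.random.default_rng(0).standard_normal((5, 6, 3))
G = gram_matrix_np(features)
print(G.shape)  # (3, 3)
```

Entry `G[i, j]` is the normalised dot product between feature maps `i` and `j`, so the matrix records which features tend to fire together, regardless of where in the image they do so.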

In [13]:
def style_loss(y, t):
    """
    y: generated image 
    t: target image 
    """
    return K.mean(K.square(gram_matrix(y) - gram_matrix(t)))
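
One property worth seeing in isolation is that this loss ignores spatial layout: because the Gram matrix sums over all positions, shuffling the positions of a feature map leaves it unchanged. The NumPy sketch below demonstrates this; `gram` and `style_loss_np` are hypothetical stand-ins for the Keras functions above.

```python
import numpy as np

def gram(features):
    """Gram matrix of an (H, W, C) feature map, normalised by H*W*C."""
    c = features.shape[-1]
    X = features.reshape(-1, c)          # (H*W, C)
    return X.T @ X / features.size

def style_loss_np(y, t):
    """Mean squared difference between the two Gram matrices."""
    return float(np.mean(np.square(gram(y) - gram(t))))

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 4, 3))

# Same feature vectors, placed at shuffled spatial positions
flat = target.reshape(-1, 3)
shuffled = flat[rng.permutation(len(flat))].reshape(4, 4, 3)

# An unrelated feature map
other = rng.standard_normal((4, 4, 3))

print(style_loss_np(target, shuffled))  # effectively zero
print(style_loss_np(target, other))     # clearly positive
```

This is why the style loss captures texture-like statistics rather than the arrangement of objects in the image.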

In [14]:
def minimize(fn, epochs, batch_shape):
    t0 = datetime.now() 
    losses = [] 
    # initialise our generated image with random values 
    x = np.random.randn(np.prod(batch_shape))
    for i in range(epochs):
        x, l, _ = fmin_l_bfgs_b(
            func=fn, 
            x0=x, 
            maxfun=20)
        x = np.clip(x, -127, 127)

        print("iteration {} loss {}".format(i, l))
        losses.append(l)
        
    t1 = datetime.now() 
    print("duration: {}".format(t1-t0))
    plt.plot(losses)
    plt.show() 
    
    output_img = x.reshape(*batch_shape)
    output_img = unpreprocess(output_img)
    return output_img[0] 
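
The calling pattern used in `minimize` can be tried on a toy problem first. The sketch below minimises a simple quadratic with `fmin_l_bfgs_b`, using a function that returns the loss and its gradient together, and restarts the optimiser from the previous result in a loop as the notebook does. The quadratic and the names `target` and `loss_and_grad` are assumptions for this example.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

target = np.array([1.0, -2.0, 3.0])

def loss_and_grad(x):
    """Return f(x) = ||x - target||^2 and its gradient, both as float64."""
    diff = x - target
    loss = np.sum(diff ** 2)
    grad = 2.0 * diff
    return loss.astype(np.float64), grad.astype(np.float64)

x = np.zeros(3)
for i in range(3):
    # With fprime omitted, func is expected to return (loss, gradient)
    x, loss, info = fmin_l_bfgs_b(func=loss_and_grad, x0=x, maxfun=20)
    print("iteration {} loss {}".format(i, loss))

print(np.round(x, 4))
```

Capping `maxfun` limits the number of function evaluations per call, which is what lets the outer loop report the loss at regular intervals.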

In [15]:
def process(img_path):
    img = image.load_img(img_path)
    
    # convert image to array and preprocess for vgg 
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    
    # grab the shape 
    batch_shape = x.shape 
    shape = x.shape[1:]
        
    # let's take the first convolution of each block 
    # to be the target outputs     
    vgg = VGG16_AvgPool(shape)
    
    # Note: you need to select the output at index 1, since the 
    # output at index 0 corresponds to the original vgg with maxpool 
    symbloic_conv_outputs = [
        layer.get_output_at(1) for layer in vgg.layers if layer.name.endswith('conv1')
    ]
    
    # Pick the earlier layers for a more "localised" representation; 
    # this is the opposite of the content model, where the 
    # later layers represent a more "global" structure 
    
    # symbloic_conv_outputs = symbloic_conv_outputs[:2] # example of a subset 
    
    # Make a big model that outputs multiple output layers 
    multi_output_model = Model(vgg.input, symbloic_conv_outputs)
    
    # calculate the targets, i.e. the outputs of each selected layer 
    style_layer_outputs = [K.variable(y) for y in multi_output_model.predict(x)]
    
    # calculate the total style loss 
    loss = 0 
    for symbolic, actual in zip(symbloic_conv_outputs, style_layer_outputs):
        # gram_matrix() expects a (H, W, C) as input 
        loss += style_loss(symbolic[0], actual[0])
        
    grads = K.gradients(loss, multi_output_model.input)
    
    get_loss_and_grads = K.function(
        inputs=[multi_output_model.input], 
        outputs=[loss] + grads)        
    
    def get_loss_and_grads_wrapper(x_vec):
        """
        scipy's minimizer allows us to pass back the 
        function value f(x) and its gradient f'(x) 
        simultaneously rather than using the fprime arg 
        
        We cannot use get_loss_and_grads() directly: 
        the minimizer passes x as a 1-D array, while the 
        input to get_loss_and_grads must be [batch_of_images]
        
        The gradient must also be a 1-D array, and both 
        loss and gradient must be np.float64, otherwise we 
        will get an error
        """
        
        l, g = get_loss_and_grads([x_vec.reshape(*batch_shape)])
        return l.astype(np.float64), g.flatten().astype(np.float64)
        
    final_img = minimize(get_loss_and_grads_wrapper, 10, batch_shape)        
    plt.imshow(scale_img(final_img))
    plt.show() 
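
The shape and dtype contract handled by `get_loss_and_grads_wrapper` can be illustrated without building a model. In the sketch below, `loss_and_grad_batched` is a hypothetical stand-in for the Keras function: it takes a batch-shaped array and returns a float32 loss and a float32 gradient of the same shape, which the wrapper converts to what scipy expects.

```python
import numpy as np

batch_shape = (1, 2, 2, 3)

def loss_and_grad_batched(batch):
    """Stand-in for the Keras function: sum of squares and its gradient."""
    loss = np.float32(np.sum(batch ** 2))
    grad = (2 * batch).astype(np.float32)
    return loss, grad

def wrapper(x_vec):
    """Adapt a flat float64 vector to the batch-shaped function and back."""
    l, g = loss_and_grad_batched(x_vec.reshape(*batch_shape))
    return l.astype(np.float64), g.flatten().astype(np.float64)

x_vec = np.arange(12, dtype=np.float64)
loss, grad = wrapper(x_vec)
print(loss, grad.shape, grad.dtype)  # 506.0 (12,) float64
```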

In [None]:
STYLE_IMAGE = "../images/Van_Gogh-Starry_Night.jpg"
process(STYLE_IMAGE)