## Neural style transfer

This is another major development in deep-learning-driven image modification, introduced by Leon Gatys et al. in the summer of 2015 [[ref]](https://arxiv.org/abs/1508.06576). This section focuses on the formulation described in the original paper.

Neural style transfer consists of applying the style of a reference image to a target image while conserving the content of the target image.

In this context, *style* means textures, colors, and visual patterns in the image, at various spatial scales; and the content is the higher-level macrostructure of the image. 

This idea has had a long history in the image-processing community prior to the development of neural style transfer in 2015. 

The key notion behind implementing style transfer is the same idea that's central to all deep-learning algorithms: you define a loss function to specify what you want to achieve, and you minimize this loss. What do we want to achieve? Conserving the content of the original image while adopting the style of the reference image. If we were able to mathematically define *content* and *style*, then an appropriate loss function to minimize would be the following:

```python
loss = (distance(style(reference_image) - style(generated_image)) +
        distance(content(original_image) - content(generated_image)))
```

Here `distance` is a norm function such as the L2 norm, `content` is a function that takes an image and computes a representation of its content, and `style` is a function that takes an image and computes a representation of its style. Minimizing this loss causes `style(generated_image)` to be close to `style(reference_image)`, and the same for `content`, thus achieving style transfer as we define it. 

A very important observation made by Gatys et al. was that deep convolutional neural networks offer a way to mathematically define the `style` and `content` functions. Let's see how:

## 8.3.1 The content loss

As you already know, activations from earlier layers in a network contain *local* information about the image, whereas activations from higher layers contain increasingly *global*, *abstract* information. Formulated in a different way, the activations of the different layers of a convnet provide a decomposition of the contents of an image over different spatial scales. Therefore, you’d expect the content of an image, which is more global and abstract, to be captured by the representations of the upper layers in a convnet.

A good candidate for content loss is thus the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image. This guarantees that, as seen from the upper layer, the generated image will look similar to the original target image. Assuming that what the upper layers of a convnet see is really the content of their input images, then this works as a way to preserve image content.
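
As a rough illustration of this idea, the following NumPy sketch computes the squared L2 distance between two activation tensors. The arrays are random stand-ins for real convnet activations, and the 25 x 25 x 512 shape and the name `content_distance` are assumptions made for this example only.

```python
import numpy as np

def content_distance(base_activations, generated_activations):
    """Sum of squared differences between two activation tensors."""
    return float(np.sum(np.square(generated_activations - base_activations)))

rng = np.random.default_rng(0)
# Stand-in for upper-layer activations of the target image (height, width, channels).
target_acts = rng.normal(size=(25, 25, 512))
# Stand-in for a generated image whose activations are close to the target's.
near_acts = target_acts + 0.01 * rng.normal(size=target_acts.shape)
# Stand-in for an unrelated image.
far_acts = rng.normal(size=target_acts.shape)

print(content_distance(target_acts, target_acts))  # identical inputs give zero
print(content_distance(target_acts, near_acts) < content_distance(target_acts, far_acts))
```

The smaller this distance, the more the upper layer "sees" the same thing in both images.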


## 8.3.2 The style loss

The content loss only uses a single upper layer, but the style loss as defined by Gatys et al. uses multiple layers of a convnet: you try to capture the appearance of the style-reference image at all spatial scales extracted by the convnet, not just a single scale.

For the style loss, Gatys et al. use the *Gram matrix* of a layer’s activations: the inner product of the feature maps of a given layer. This inner product can be understood as representing a map of the correlations between the layer’s features. These feature correlations capture the statistics of the patterns of a particular spatial scale, which empirically correspond to the appearance of the textures found at this scale.

Hence, the style loss aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image. In turn, this guarantees that the textures found at different spatial scales look similar across the style-reference image and the generated image.
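
To make the Gram matrix more concrete, here is a minimal NumPy sketch. The 8 x 8 x 4 random tensor is a hypothetical stand-in for a layer's activations, and `gram_matrix_np` is a name chosen for this example. The sketch also shuffles the spatial positions to show that the Gram matrix ignores *where* features occur and keeps only how they co-occur, which is why it suits textures rather than layout.

```python
import numpy as np

def gram_matrix_np(feature_maps):
    """Gram matrix of a (height, width, channels) activation tensor.

    Each channel is flattened into a vector; entry (i, j) is the inner
    product of channel i with channel j.
    """
    height, width, channels = feature_maps.shape
    flat = feature_maps.reshape(height * width, channels).T  # (channels, H*W)
    return flat @ flat.T                                     # (channels, channels)

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 8, 4))
gram = gram_matrix_np(acts)

# Shuffle spatial positions: the layout changes, the channel statistics do not.
perm = rng.permutation(8 * 8)
shuffled = acts.reshape(64, 4)[perm].reshape(8, 8, 4)

print(gram.shape)                                   # (4, 4)
print(np.allclose(gram, gram.T))                    # the Gram matrix is symmetric
print(np.allclose(gram, gram_matrix_np(shuffled)))  # unchanged by the shuffle
```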

In short, you can use a pretrained convnet to define a loss that will do the following:
- Preserve content by maintaining similar high-level layer activations between the target content image and the generated image. The convnet should “see” both the target image and the generated image as containing the same things.
- Preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers. Feature correlations capture textures: the generated image and the style-reference image should share the same textures at different spatial scales.

Let's look at an implementation of the original 2015 algorithm. It shares many similarities with the DeepDream implementation developed in section 8.2.

## 8.3.3 Neural style transfer in Keras

We'll use the VGG19 network used by Gatys et al. VGG19 is a simple variant of the VGG16 network introduced in chapter 5, with three more convolutional layers.

This is the general process:
1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image and the generated image at the same time. 
2. Use the layer activations computed over these three images to define the loss function described earlier, which we'll minimize in order to achieve style transfer.
3. Set up a gradient-descent process to minimize this loss function.

Let's start by defining the paths to the images. 

### L8.14 Defining initial variables


In [0]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [0]:
!pip install keras==2.0.8


In [0]:
cd /content/drive/My Drive/kaggle/

/content/drive/My Drive/kaggle


In [0]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path = 'm1.jpg'
# style_reference_image_path = 'sn_vg.jpg'
style_reference_image_path = 'tlor0.jpg'

width, height = load_img(target_image_path).size
img_height = 400
img_width = int(width * img_height / height)
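
The width computation above keeps the aspect ratio of the target image while fixing its height. A small sketch of the same arithmetic, using a hypothetical helper name and a made-up 1200 x 800 photo:

```python
def scaled_width(width, height, target_height=400):
    """Width that keeps the aspect ratio when resizing to target_height."""
    return int(width * target_height / height)

# A hypothetical photo that is 1200 pixels wide and 800 pixels high.
print(scaled_width(1200, 800))  # 600
```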

We need to define some functions for loading, preprocessing, and postprocessing the images that go in and out of the VGG19 convnet.

### L8.15 Auxiliary functions

In [0]:
import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
  img = load_img(image_path, target_size=(img_height, img_width))
  img = img_to_array(img)
  img = np.expand_dims(img, axis=0)
  img = vgg19.preprocess_input(img)
  return img


# Undoes the zero-centering applied by vgg19.preprocess_input by adding back
# the mean ImageNet pixel value of each channel (in BGR order).
def deprocess_image(x):
  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  # converts images from 'BGR' to 'RGB'. This is also part of the reversal of 
  # vgg19.preprocess_input
  x = x[:, :, ::-1] 
  x = np.clip(x, 0, 255).astype('uint8')
  return x
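
To check that the two helpers are inverses of each other, the following NumPy-only sketch reimplements the two steps that `deprocess_image` reverses (RGB to BGR, then mean subtraction) and runs a round trip on a random image. `caffe_style_preprocess` and `undo_preprocess` are hypothetical stand-ins written for this example, not the Keras functions themselves, and the rounding step is an addition that guards against floating-point error.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, matching the constants used above.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    """RGB -> BGR, then subtract the per-channel mean (illustrative stand-in)."""
    bgr = rgb[..., ::-1].astype('float64')
    return bgr - IMAGENET_BGR_MEAN

def undo_preprocess(x):
    """Add the mean back, flip BGR -> RGB, round, and clip to 8-bit pixels."""
    x = x + IMAGENET_BGR_MEAN  # creates a new array; the input is left untouched
    x = x[..., ::-1]
    return np.clip(np.rint(x), 0, 255).astype('uint8')

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
restored = undo_preprocess(caffe_style_preprocess(rgb))
print(np.array_equal(rgb, restored))
```

Note that `deprocess_image` above modifies its argument in place, which is why the style-transfer loop passes it a copy.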

Let’s set up the VGG19 network. It takes as input a batch of three images: the style-reference image, the target image, and a placeholder that will contain the generated image. A placeholder is a symbolic tensor, the values of which are provided externally via Numpy arrays. The style-reference and target images are static and thus defined using `K.constant`, whereas the values contained in the placeholder of the generated image will change over time.

### L8.16 Loading the pretrained VGG19 network and applying it to the three images

In [0]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))

style_reference_image = K.constant(preprocess_image(style_reference_image_path))

combination_image = K.placeholder((1, img_height, img_width, 3))

# combining the 3 images in a single batch
input_tensor = K.concatenate([target_image, 
                              style_reference_image, 
                              combination_image], 
                             axis=0)

model = vgg19.VGG19(input_tensor=input_tensor, 
                    weights='imagenet',
                    include_top=False)

print('Model loaded.')

Model loaded.


Let's define the content loss, which will make sure the top layer of the VGG19 convnet has a similar view of the target image and the generated image. 

### L8.17 Content loss

In [0]:
def content_loss(base, combination):
  return K.sum(K.square(combination - base))

Now the style loss. It uses an auxiliary function to compute the Gram matrix of an input matrix: a map of the correlations found in the original feature matrix.

### L8.18 Style loss

In [0]:
def gram_matrix(x):
  features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
  gram = K.dot(features, K.transpose(features))
  return gram

def style_loss(style, combination):
  S = gram_matrix(style)
  C = gram_matrix(combination)
  channels = 3
  size = img_height * img_width
  return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

To these 2 loss components we add a third one: the *total variation loss*, which operates on the pixels of the generated combination image. It encourages spatial continuity in the generated image, thus avoiding overly pixelated results. You can interpret it as a regularization loss. 

### L8.19 Total variation loss

In [0]:
def total_variation_loss(x):
  a = K.square(
      x[:, :img_height - 1, :img_width - 1, :] - 
      x[:, 1:, :img_width - 1, :])

  b = K.square(
      x[:, :img_height - 1, :img_width - 1, :] - 
      x[:, :img_height - 1, 1:, :])

  return K.sum(K.pow(a + b, 1.25))
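
The effect of this loss is easier to see on toy data. The NumPy sketch below mirrors the computation above (the name `total_variation_np` and the 32 x 32 test images are assumptions for illustration) and compares a smooth gradient image with a noisy copy of it.

```python
import numpy as np

def total_variation_np(x):
    """Total variation of a (1, height, width, channels) image batch."""
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])  # vertical neighbors
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])  # horizontal neighbors
    return float(np.sum(np.power(a + b, 1.25)))

rng = np.random.default_rng(0)
# A smooth horizontal gradient, and the same image with added pixel noise.
smooth = np.tile(np.linspace(0, 1, 32).reshape(1, 1, 32, 1), (1, 32, 1, 3))
noisy = smooth + 0.2 * rng.normal(size=smooth.shape)

print(total_variation_np(smooth) < total_variation_np(noisy))
```

Because noisy, pixelated images score higher, adding this term to the loss pushes the optimizer toward spatially smoother results.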

We will minimize a weighted sum of these three losses.

To compute the content loss, we use only one upper layer (the `block5_conv2` layer), whereas for the style loss, we use a list of layers that spans both low-level and high-level layers.

We add the total variation loss at the end.

We can tune the `content_weight` parameter.
A higher `content_weight` means the target content will be more recognizable in the generated image.

### L8.20 Defining the final loss that you'll minimize

In [0]:
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

content_layer = 'block5_conv2'

style_layers = ['block1_conv1', 
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']


total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025

loss = K.variable(0.)
layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(target_image_features, combination_features)

for layer_name in style_layers:
  layer_features = outputs_dict[layer_name]
  style_reference_features = layer_features[1, :, :, :]
  combination_features = layer_features[2, :, :, :]
  sl = style_loss(style_reference_features, combination_features)
  loss += (style_weight / len(style_layers)) * sl

loss += total_variation_weight * total_variation_loss(combination_image)



Now we’ll set up the gradient-descent process. In the original Gatys et al. paper, optimization is performed using the L-BFGS algorithm, so that’s what you’ll use here. This is a key difference from the DeepDream example in section 8.2. The L-BFGS algorithm comes packaged with SciPy, but there are two slight limitations with the SciPy implementation:
- It requires that you pass the value of the loss function and the value of the gradients as two separate functions.
- It can only be applied to flat vectors, whereas you have a 3D image array.

It would be inefficient to compute the value of the loss function and the value of the gradients independently, because doing so would lead to a lot of redundant computation between the two; the process would be almost twice as slow as computing them jointly. To bypass this, you’ll set up a Python class named Evaluator that computes both the loss value and the gradients value at once, returns the loss value when called the first time, and caches the gradients for the next call.

### L8.21 Setting up the gradient-descent process

In [0]:
grads = K.gradients(loss, combination_image)[0]

fetch_loss_and_grads = K.function([combination_image], [loss, grads])

class Evaluator(object):
  
  def __init__(self):
    self.loss_value = None
    self.grad_values = None

  def loss(self, x):
    assert self.loss_value is None
    x = x.reshape((1, img_height, img_width, 3))
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1].flatten().astype('float64')
    self.loss_value = loss_value
    self.grad_values = grad_values
    return self.loss_value

  def grads(self, x):
    assert self.loss_value is not None
    grad_values = np.copy(self.grad_values)
    self.loss_value = None
    self.grad_values = None
    return grad_values

evaluator = Evaluator()
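
The caching pattern can be tried in isolation, without Keras. The sketch below is a variation on the class above, not the same code: instead of asserting a strict loss-then-gradient call order, it remembers the point at which the gradient was computed and recomputes only if SciPy asks for a different point. The quadratic `toy_loss_and_grad` is a hypothetical stand-in for the style-transfer loss.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

class CachingEvaluator:
    """Computes loss and gradient together, serving them via two methods."""

    def __init__(self, loss_and_grad_fn):
        self.loss_and_grad_fn = loss_and_grad_fn
        self.joint_calls = 0  # how many times the expensive function ran
        self._cached_x = None
        self._cached_grad = None

    def loss(self, x):
        loss_value, grad = self.loss_and_grad_fn(x)
        self.joint_calls += 1
        self._cached_x = np.copy(x)
        self._cached_grad = grad
        return loss_value

    def grads(self, x):
        # Reuse the cached gradient only if it was computed at this exact point.
        if self._cached_x is None or not np.array_equal(x, self._cached_x):
            self.loss(x)
        return np.copy(self._cached_grad)

def toy_loss_and_grad(x):
    """Toy objective ||x - 3||^2 and its gradient, computed in one pass."""
    diff = x - 3.0
    return float(np.sum(diff ** 2)), 2.0 * diff

toy_evaluator = CachingEvaluator(toy_loss_and_grad)
x_opt, min_val, info = fmin_l_bfgs_b(toy_evaluator.loss, np.zeros(5),
                                     fprime=toy_evaluator.grads, maxfun=20)
print(np.round(x_opt, 3), min_val, toy_evaluator.joint_calls)
```

The optimizer works on a flat vector here as well, which is why the style-transfer code flattens the image before optimization and reshapes it inside `loss`.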

Finally, you can run the optimization process using SciPy’s L-BFGS algorithm, saving the current generated image at each iteration of the algorithm (here, a single iteration allows up to 20 evaluations of the loss, as set by `maxfun=20`).

### L8.22 Style-transfer loop

In [0]:
from scipy.optimize import fmin_l_bfgs_b
# from scipy.misc import imsave
import imageio
import time

result_prefix = 'my_result'
iterations = 20

x = preprocess_image(target_image_path)
x = x.flatten()

for i in range(iterations):
  print('Start of iteration', i)
  start_time = time.time()
  x, min_val, info = fmin_l_bfgs_b(evaluator.loss,
                                   x,
                                   fprime=evaluator.grads,
                                   maxfun=20)
  
  print('Current loss value:', min_val)
  img = x.copy().reshape((img_height, img_width, 3))
  img = deprocess_image(img)
  fname = result_prefix + '_at_iteration_%d.png' % i
  # imsave(fname, img)
  imageio.imwrite(fname, img)
  # print('Image saved as', fname)
  end_time = time.time()
  # print('Iteration %d completed in %ds' % (i, end_time - start_time))

Start of iteration 0
Current loss value: 271311070.0
Start of iteration 1
Current loss value: 146308420.0
Start of iteration 2
Current loss value: 113658760.0
Start of iteration 3
Current loss value: 97613230.0
Start of iteration 4
Current loss value: 86635784.0
Start of iteration 5
Current loss value: 78297190.0
Start of iteration 6
Current loss value: 72437260.0
Start of iteration 7
Current loss value: 67907240.0
Start of iteration 8
Current loss value: 64273824.0
Start of iteration 9
Current loss value: 61370540.0
Start of iteration 10
Current loss value: 58472676.0
Start of iteration 11
Current loss value: 56075296.0
Start of iteration 12
Current loss value: 53882150.0
Start of iteration 13
Current loss value: 52020732.0
Start of iteration 14
Current loss value: 50177236.0
Start of iteration 15
Current loss value: 48594600.0
Start of iteration 16
Current loss value: 47076148.0
Start of iteration 17
Current loss value: 45704596.0
Start of iteration 18
Current loss value: 44278616.0


Note that running this style-transfer algorithm is slow. But the transformation operated by the setup is simple enough that it can be learned by a small, fast feedforward convnet as well, as long as you have appropriate training data available.

Fast style transfer can thus be achieved by first spending a lot of compute cycles to generate input-output training examples for a fixed style-reference image, using the method outlined here, and then training a simple convnet to learn this style-specific transformation. Once that’s done, stylizing a given image is instantaneous: it’s just a forward pass of this small convnet.

## 8.3.4 Wrapping up

- Style transfer consists of creating a new image that preserves the contents of a target image while also capturing the style of a reference image.
- Content can be captured by the high-level activations of a convnet.
- Style can be captured by the internal correlations of the activations of different layers of a convnet.
- Hence, deep learning allows style transfer to be formulated as an optimization process using a loss defined with a pretrained convnet.
- Starting from this basic idea, many variants and refinements are possible.