# Introduction


# Neural style transfer

### Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a category of neural network that has proven very effective in areas such as image recognition and classification. CNNs have been successful in computer-vision problems such as identifying faces, objects and traffic signs, as well as powering vision in robots and self-driving cars.

Rather than chaining separately designed steps (hand-crafted feature extraction followed by classification), a CNN learns them jointly in a unified framework, building hierarchical representations directly from raw images. As a result, a convolutional neural network that has already been trained to recognize objects within images has developed internal representations from which the content and the style of a given image can be extracted separately.

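In 2014, the Visual Geometry Group (VGG) at Oxford University entered the ImageNet challenge (ILSVRC 2014) with a network named after the group, taking first place in the localization track and second place in the classification track. Gatys et al. later used this network as the basis for extracting content and style representations from images.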

In the VGG network, shallow layers learn low-level features, and as we go deeper into the network the convolutional layers represent much larger-scale features, thus providing a higher-level representation of the image content.

Neural style transfer can be implemented using any pre-trained convnet. Here we will use the VGG19 and VGG16 networks. VGG19 is a simple variant of the VGG16 network, with three more convolutional layers.

Using several stacked convolution layers with small (3×3) kernels instead of a single convolution layer with a larger kernel reduces the number of parameters on the one hand; on the other, the authors argue that it is equivalent to applying more non-linear mappings, which increases the expressive power of the model.
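To make the parameter argument concrete, the sketch below counts weights (ignoring biases) for a stack of 3×3 convolutions versus a single convolution with the same receptive field. It assumes stride-1 convolutions with the same number of channels `C` at the input and output of every layer, the setting used in the VGG paper's own comparison; the helper names are illustrative, not part of any library.

```python
def stacked_3x3_params(num_layers, channels):
    # Each 3x3 layer maps `channels` -> `channels` feature maps: 9 * C * C weights.
    return num_layers * 3 * 3 * channels * channels


def single_kernel_params(kernel_size, channels):
    # One k x k layer mapping `channels` -> `channels` feature maps.
    return kernel_size * kernel_size * channels * channels


def receptive_field(num_layers, kernel_size=3):
    # Each stride-1 convolution grows the receptive field by (k - 1).
    return 1 + num_layers * (kernel_size - 1)


C = 256
for n in (2, 3):
    k = receptive_field(n)
    print(f"{n} stacked 3x3 layers: receptive field {k}x{k}, "
          f"{stacked_3x3_params(n, C):,} weights vs "
          f"{single_kernel_params(k, C):,} for one {k}x{k} layer")
```

Three stacked 3×3 layers see a 7×7 region with 27C² weights instead of 49C² (about 45% fewer), and they insert three ReLUs instead of one.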

In VGG16, each block applies two or three 3×3 convolutions followed by a max pooling layer. After each pooling stage the number of kernels doubles (64, 128, 256, 512), staying at 512 in the last block, so that deeper layers extract more and more features. At the end of the network, three fully connected layers act as the classifier, with dropout between them to reduce overfitting; style transfer does not need this classifier, which is why the code below loads the networks with `include_top=False`.

In our case, we want to account for features across the entire image, so we get rid of the max pooling layers, which throw away information, and replace them with layers that compute average pooling instead.

 ![VGG16](/ImagenesNotebook/vgg16-architecture.png)
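The difference between the two pooling operations can be seen on a small single-channel array. `pool2x2` is an illustrative NumPy helper (not a Keras function); like Keras' default 2×2 pooling with stride 2 and `'valid'` padding, it drops a trailing odd row or column.

```python
import numpy as np


def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2 on a 2D array (trailing odd row/col dropped)."""
    h, w = x.shape
    # Group the array into non-overlapping 2x2 windows: blocks[i, a, j, b] = x[2i+a, 2j+b]
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 9., 1., 1.]])

print(pool2x2(x, "max"))   # values: [[4, 8], [9, 1]]
print(pool2x2(x, "mean"))  # values: [[2.5, 6.5], [2.25, 1.0]]
```

In the bottom-left window, max pooling keeps only the 9 and discards the three zeros, while average pooling (2.25) reflects every activation in the window.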

The VGG19 network has 16 convolutional layers with ReLUs between them and five max pooling layers, which we will also replace with average pooling. The number of feature maps starts at 64 and grows to 512. After the convolutions comes a classifier made up of three fully connected (fc) layers with dropout [SHK+14] between them; the first two have 4096 features and the last one has 1000. The last fc layer feeds a softmax, which maps its outputs to the probabilities of belonging to each of the 1000 classes of the ImageNet competition.

 ![VGG19](/ImagenesNotebook/vgg19-architecture.png)
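The two architectures can be compared by counting the layers and parameters of their convolutional parts. The lists below are transcribed from the VGG paper's configurations D (VGG16) and E (VGG19), restricted to the convolution and pooling layers, i.e. the part that `include_top=False` loads; `summarize` is an illustrative helper.

```python
# Integers are 3x3 conv output channels; "P" marks a 2x2 pooling layer.
VGG16_CFG = [64, 64, "P", 128, 128, "P", 256, 256, 256, "P",
             512, 512, 512, "P", 512, 512, 512, "P"]
VGG19_CFG = [64, 64, "P", 128, 128, "P", 256, 256, 256, 256, "P",
             512, 512, 512, 512, "P", 512, 512, 512, 512, "P"]


def summarize(cfg, in_channels=3):
    """Return (#conv layers, #pooling layers, #conv parameters incl. biases)."""
    convs = pools = params = 0
    for item in cfg:
        if item == "P":
            pools += 1
        else:
            params += 3 * 3 * in_channels * item + item  # weights + biases
            in_channels = item
            convs += 1
    return convs, pools, params


for name, cfg in (("VGG16", VGG16_CFG), ("VGG19", VGG19_CFG)):
    convs, pools, params = summarize(cfg)
    print(f"{name}: {convs} conv layers, {pools} pooling layers, {params:,} conv parameters")
```

The totals (14,714,688 and 20,024,384) should match what `model.summary()` reports for the Keras models loaded with `include_top=False`; the three extra 3×3 convolutions of VGG19 account for the roughly 5.3 million additional parameters.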

### Style Transfer

Style transfer is a technique for modifying one image in the style of another image. We implement the style transfer algorithm introduced by Gatys et al. in 2015; the neural style transfer algorithm has since undergone many refinements and spawned many variations. Neural style transfer consists of applying the "style" of a reference image to a target image while preserving the "content" of the target image.

The "style" refers to the textures, colors, and visual patterns in an image, while the "content" is the higher-level macrostructure of the image.

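The key idea behind style transfer is the same idea that is core to all deep learning algorithms: we define a loss function that specifies what we want to achieve, and we minimize this loss. Here we want to preserve the "content" of the original image while adopting the "style" of the reference image. Writing $C$ for the content image, $S$ for the style reference image and $Y$ for the generated image, the theoretical loss function would be the following:

$$\mathcal{L}(Y) = \text{distance}\big(\text{style}(S), \text{style}(Y)\big) + \text{distance}\big(\text{content}(C), \text{content}(Y)\big)$$

where $\text{style}(\cdot)$ and $\text{content}(\cdot)$ extract the style and content representations of an image, and $\text{distance}$ is a norm such as the L2 norm.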

We can construct images whose feature maps at a chosen convolution layer match the corresponding feature maps of a given content image. We expect the two images to contain the same content — but not necessarily the same texture and style.

#### Loss

##### The content loss

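Given a chosen content layer $l$, the content loss measures the squared error between the feature map $F$ of our content image $C$ and the feature map $P$ of our generated image $Y$ at that layer:

$$\mathcal{L}_{content} = \sum_{i,j} \left(P^l_{ij} - F^l_{ij}\right)^2$$

This is often described as the Mean Squared Error; the code below uses the plain sum of squared differences, which differs from the mean only by a constant factor that is absorbed into the content weight.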

When this content-loss is minimized, it means that the mixed-image has feature activation in the given layers that are very similar to the activation of the content-image. Depending on which layers we select, this should transfer the contours from the content-image to the mixed-image.
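A minimal NumPy sketch of the content loss, mirroring the Keras `content_loss` function defined later in this notebook. The feature maps are made-up stand-ins (4×4 positions, 8 features), chosen only to show that the sum of squared errors and the MSE differ by the constant factor H·W·C.

```python
import numpy as np


def content_loss_np(content_features, generated_features):
    """Sum of squared differences between two (H, W, C) feature maps."""
    return np.sum(np.square(generated_features - content_features))


F = np.ones((4, 4, 8))    # stand-in for the content image's activations
P = np.zeros((4, 4, 8))   # stand-in for the generated image's activations

sse = content_loss_np(F, P)
mse = sse / F.size        # mean over all 4 * 4 * 8 = 128 entries
print(sse, mse)           # 128.0 1.0
```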

As you already know, *activations from earlier layers in a network contain local information about the image*, while *activations from higher layers contain increasingly global and abstract information*. Therefore we expect the "content" of an image, which is more global and more abstract, to be captured by the representations of a top layer of a convnet.

##### The style loss

Now we want to measure which features in the style-layers activate simultaneously for the style-image, and then copy this activation-pattern to the mixed-image.

One way of doing this is to calculate the Gram matrix (a matrix of feature correlations) of the tensors output by the style layers. The Gram matrix is essentially a matrix of dot products between the vectors of feature activations of a style layer: if $F^l_{ik}$ denotes the activation of feature $i$ at position $k$ in layer $l$ for a given image, then $G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$. This inner product can be understood as a map of the correlations between the features of a layer. These feature correlations capture the statistics of the patterns at a particular spatial scale, which empirically correspond to the appearance of the textures found at that scale.

If an entry in the Gram matrix is close to zero, the two corresponding features in the given layer do not activate simultaneously for the style image. Conversely, if an entry has a large value, the two features do activate simultaneously for the style image. We then try to create a mixed image that replicates this activation pattern of the style image.

Hence the style loss aims to preserve similar internal correlations within the activations of different layers, across the style reference image and the generated image. In turn, this encourages the textures found at different spatial scales to look similar across the style reference image and the generated image. The loss function for style is quite similar to our content loss, except that we compute the (normalized) squared error between the Gram matrices instead of between the raw tensor outputs of the layers.
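A NumPy sketch of the Gram matrix and of a per-layer style loss. It assumes the normalization from Gatys et al., $\frac{1}{4N_l^2M_l^2}\sum_{i,j}(G^l_{ij}-A^l_{ij})^2$ with $N_l$ the number of feature maps and $M_l = H \cdot W$ of that layer; the notebook's own `gram_matrix` and `style_loss` use different constants (division by the tensor size, and `channels = 3`, `size = h * w` of the image), which only rescale each layer's term. The toy activation map is made up to show a zero entry for features that never fire together.

```python
import numpy as np


def gram_matrix_np(features):
    """Gram matrix of an (H, W, C) activation tensor -> (C, C)."""
    h, w, c = features.shape
    X = features.reshape(h * w, c).T   # (C, H*W): one row per feature map
    return X @ X.T                     # G[i, j] = sum_k X[i, k] * X[j, k]


def style_loss_np(style_features, generated_features):
    """Squared error between Gram matrices, with Gatys-style normalization."""
    h, w, c = style_features.shape
    A = gram_matrix_np(style_features)
    G = gram_matrix_np(generated_features)
    return np.sum((G - A) ** 2) / (4.0 * c ** 2 * (h * w) ** 2)


# Toy 2x2 activation map with 3 features
f = np.zeros((2, 2, 3))
f[0, 0, 0] = f[0, 1, 0] = 1.0   # feature 0 fires at (0,0) and (0,1)
f[0, 0, 1] = 2.0                # feature 1 fires at (0,0), together with feature 0
f[1, 1, 2] = 3.0                # feature 2 fires alone at (1,1)

print(gram_matrix_np(f))
# [[2. 2. 0.]
#  [2. 4. 0.]
#  [0. 0. 9.]]
print(style_loss_np(f, f))      # 0.0: identical feature correlations
```

Entry (0, 2) is zero because features 0 and 2 never activate at the same position, while entry (0, 1) is positive because features 0 and 1 co-activate at (0, 0).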

### TL;DR

In short, with the content image being the one we wish to modify and the style reference image the one we take the style from, we can use a pre-trained convnet to define a loss that will:

* Preserve content by maintaining similar high-level layer activations between the target content image and the generated image. The convnet should "see" both the target image and the generated image as "containing the same things".
* Preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers. Indeed, feature correlations capture textures: the generated and the style reference image should share the same textures at different spatial scales.

In [69]:
from __future__ import print_function, division
from builtins import range, input

from keras.layers import Input, Lambda, Dense, Flatten
from keras.layers import AveragePooling2D, MaxPooling2D
from keras.layers import Conv2D
from keras.models import Model, Sequential
from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
# VGG16 and VGG19 use the same ('caffe'-style) preprocessing, so one import is enough
from keras.applications.vgg19 import preprocess_input

from keras.preprocessing import image
from skimage.transform import resize
import tensorflow as tf

import keras.backend as K
import numpy as np
import matplotlib.pyplot as plt

from glob import glob
import itertools
from datetime import datetime

import warnings
warnings.filterwarnings("ignore")

from scipy.optimize import fmin_l_bfgs_b
from keras.preprocessing.image import save_img

import time
import imageio
from PIL import Image

tf.compat.v1.disable_eager_execution()

## Functions

In [70]:
def load_img_and_preprocess(path, shape=None):
    img = image.load_img(path, target_size=shape)
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x

In [96]:
def unpreprocess(img):
    # Undo the zero-centering: add back the ImageNet mean pixel (BGR order)
    img[..., 0] += 103.939
    img[..., 1] += 116.779
    img[..., 2] += 123.68
    # 'BGR'->'RGB'
    img = img[:, :, ::-1]
    img = np.clip(img, 0, 255).astype('uint8')
    return img
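As a sanity check on the constants above, the sketch below re-implements, in plain NumPy, what Keras' VGG `preprocess_input` does in its 'caffe' mode (flip RGB to BGR, then subtract the ImageNet mean pixel `[103.939, 116.779, 123.68]`) together with its inverse, and verifies the round trip on a random image. Both helpers are hypothetical re-implementations, not Keras functions; unlike `unpreprocess`, the inverse rounds before casting so that floating-point error cannot turn, say, 4.9999 into 4.

```python
import numpy as np

# ImageNet mean pixel in BGR order, as used by Keras' VGG 'caffe' preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])


def caffe_preprocess(rgb):
    """Sketch of VGG preprocessing: RGB -> BGR, then subtract the mean pixel."""
    return rgb[..., ::-1].astype("float64") - IMAGENET_MEAN_BGR


def caffe_unpreprocess(x):
    """Inverse: add the mean back, BGR -> RGB, round and clip to valid pixels."""
    rgb = (x + IMAGENET_MEAN_BGR)[..., ::-1]
    return np.clip(np.rint(rgb), 0, 255).astype("uint8")


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
restored = caffe_unpreprocess(caffe_preprocess(img))
print(np.array_equal(restored, img))  # True
```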

In [72]:
def VGG16_AvgPool(shape):
    vgg = VGG16(input_shape=shape, weights='imagenet', include_top=False)
    i = vgg.input
    x = i
    for layer in vgg.layers:
        if isinstance(layer, MaxPooling2D):
            # replace max pooling with average pooling
            x = AveragePooling2D()(x)
        else:
            x = layer(x)

    return Model(i, x)

In [73]:
def VGG19_AvgPool(shape):
    vgg = VGG19(input_shape=shape, weights='imagenet', include_top=False)
    i = vgg.input
    x = i
    for layer in vgg.layers:
        if isinstance(layer, MaxPooling2D):
            # replace max pooling with average pooling
            x = AveragePooling2D()(x)
        else:
            x = layer(x)

    return Model(i, x)

In [74]:
def VGG16_AvgPool_concatenate(tensor):
    # we want to account for features across the entire image
    # so get rid of the maxpool which throws away information
    vgg = VGG16(input_tensor=tensor, weights='imagenet', include_top=False)
    i = vgg.input
    x = i
    for layer in vgg.layers:
        if isinstance(layer, MaxPooling2D):
            # replace max pooling with average pooling
            x = AveragePooling2D()(x)
        else:
            x = layer(x)

    return Model(i, x)

In [75]:
def VGG19_AvgPool_concatenate(tensor):
    # we want to account for features across the entire image
    # so get rid of the maxpool which throws away information
    vgg = VGG19(input_tensor=tensor, weights='imagenet', include_top=False)
    i = vgg.input
    x = i
    for layer in vgg.layers:
        if isinstance(layer, MaxPooling2D):
            # replace max pooling with average pooling
            x = AveragePooling2D()(x)
        else:
            x = layer(x)

    return Model(i, x)

### Loss functions

In [76]:
def content_loss(base, combination):
    return K.sum(K.square(combination - base))

In [77]:
def gram_matrix(img):
    # input is (H, W, C) (C = # feature maps)
    # we first need to convert it to (C, H*W)
    X = K.batch_flatten(K.permute_dimensions(img, (2, 0, 1)))

    # now, calculate the gram matrix
    # gram = XX^T / N
    # the constant is not important since we'll be weighting these
    G = K.dot(X, K.transpose(X)) / img.get_shape().num_elements()
    return G

In [99]:
def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
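    # Fixed normalization constants (same for every layer); their overall
    # scale is absorbed by style_weight
    channels = 3
    size = h * w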
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

def total_variation_loss(x):
    a = K.square(
        x[:, :h - 1, :w - 1, :] - x[:, 1:, :w - 1, :])
    b = K.square(
        x[:, :h - 1, :w - 1, :] - x[:, :h - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))


def minimize(epochs, batch_shape):
    path = 'C:/Users/USER/Desktop/Comillas/Análisis datos no estructurados/IMAGEN/Neural-Transfer/'
    results_folder = 'Results/'
    results_prefix = 'style_transfer_result'
    t0 = datetime.now()
    losses = []
    x = content_img
    x = x.flatten()
    
    for i in range(epochs):
        print('Start of iteration', i)
        start_time = time.time()
        x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x,
                                     fprime=evaluator.grads, maxfun=5)
        print('Current loss value:', min_val)
        losses.append(min_val)
        # Save current generated image
        img = x.copy().reshape((h, w, 3))
        img = unpreprocess(img)
        fname = path + results_folder + results_prefix + '_at_iteration_%d.png' % i
        imageio.imwrite(fname, img)
        end_time = time.time()
        print('Image saved as', fname)
        print('Iteration %d completed in %ds' % (i, end_time - start_time))
    return img
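The `total_variation_loss` term is not discussed in the text above. It acts as a regularizer: it penalizes differences between neighboring pixels of the generated image, which encourages spatially smooth results and suppresses high-frequency noise. The NumPy mirror below is a sketch that reads H and W from the array itself instead of the global `h`, `w`; it shows that a constant image costs nothing while a checkerboard, where every neighbor differs, is penalized.

```python
import numpy as np


def total_variation_loss_np(x):
    """NumPy mirror of total_variation_loss for a (1, H, W, C) batch."""
    h, w = x.shape[1], x.shape[2]
    a = np.square(x[:, :h - 1, :w - 1, :] - x[:, 1:, :w - 1, :])   # vertical differences
    b = np.square(x[:, :h - 1, :w - 1, :] - x[:, :h - 1, 1:, :])   # horizontal differences
    return np.sum((a + b) ** 1.25)


flat = np.full((1, 3, 3, 1), 0.5)   # constant image
checker = (np.indices((3, 3)).sum(axis=0) % 2).astype(float).reshape(1, 3, 3, 1)

print(total_variation_loss_np(flat))     # 0.0
print(total_variation_loss_np(checker))  # 4 * 2**1.25, about 9.51
```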

### Scale Image

In [79]:
def scale_img(x):
    x = x - x.min()
    x = x / x.max()
    return x

## Code

In [80]:
# folder names with content and style images
style_path = 'StyleImages'
content_path = 'ContentImages'

In [81]:
# to get multiple files
style_files = glob(style_path + '/*.jp*g')
content_files = glob(content_path + '/*.jp*g')

In [82]:
cont_img = np.random.choice(content_files)
sty_img = np.random.choice(style_files)

In [83]:
# load content image
content_img = load_img_and_preprocess(cont_img)

# height and width for the generated picture
h, w = content_img.shape[1:3]

In [84]:
# load style image
style_img = load_img_and_preprocess(sty_img, (h, w))

# batch_shape is (1, h, w, 3); shape drops the batch dimension: (h, w, 3)
batch_shape = content_img.shape
shape = content_img.shape[1:]

In [85]:
# content and style images are static so we use K.constant
content_image = K.constant(content_img)
style_image = K.constant(style_img)

# placeholder that will contain our generated image
generated_image = K.placeholder((1, h, w, 3))

# we combine the 3 images into a single batch
input_tensor = K.concatenate([content_image,
                              style_image,
                              generated_image], axis=0)

# We build the VGG19 network with our batch of 3 images as input.
# The model will be loaded with pre-trained ImageNet weights.
model = VGG19_AvgPool_concatenate(input_tensor)
print('Model loaded.')

Note that input tensors are instantiated via `tensor = tf.keras.Input(shape)`.
The tensor that caused the issue was: Tensor("concat_3:0", shape=(3, 480, 910, 3), dtype=float32)
Model loaded.


In [86]:
# Dict mapping layer names to activation tensors
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
# Name of layer used for content loss
content_layer = 'block5_conv2'
# Name of layers used for style loss
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
# Weights in the weighted average of the loss components
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025

In [87]:
# Define the loss by adding all components to a `loss` variable
loss = K.variable(0.)
# Loss of content image
layer_features = outputs_dict[content_layer]
content_image_features = layer_features[0, :, :, :]
generated_features = layer_features[2, :, :, :]
loss = loss + (content_weight * content_loss(content_image_features,
                                      generated_features))
# Loss of style image
for layer_name in style_layers:
    layer_features = outputs_dict[layer_name]
    style_image_features = layer_features[1, :, :, :]
    generated_features = layer_features[2, :, :, :]
    sl = style_loss(style_image_features, generated_features)
    loss += (style_weight / len(style_layers)) * sl

# Total loss
loss += total_variation_weight * total_variation_loss(generated_image)

In [None]:
loss

In [88]:
# Finally, we set up the gradient descent process using the L-BFGS algorithm

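The L-BFGS algorithm comes packaged with SciPy (`scipy.optimize.fmin_l_bfgs_b`). However, there are two slight limitations with the SciPy implementation:

* When the gradient is supplied through the `fprime` argument, the value of the loss function and the value of the gradients must be passed as two separate functions. (If `fprime` is omitted, SciPy instead accepts a single function that returns both.)
* It can only be applied to flat vectors, whereas we have a 3D image array.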

It would be very inefficient to compute the value of the loss function and the value of the gradients independently, since that would lead to a lot of redundant computation between the two; the process would be almost twice as slow as computing them jointly. To bypass this, we set up a Python class named `Evaluator` that computes the loss value and the gradient values at once, returns the loss value when called the first time, and caches the gradients for the next call.
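The caching idea can be tried on a toy problem without Keras. In the sketch below, `toy_loss_and_grads` is a hypothetical stand-in for `fetch_loss_and_grads`: one call returns both the loss (a simple quadratic with its minimum at `target`) and its gradient. `CachingEvaluator` is an illustrative variant of the `Evaluator` pattern that, instead of asserting a call order, recomputes jointly whenever `grads` is asked about a point it has not cached. The last call shows the alternative form SciPy accepts when `fprime` is omitted.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

target = np.array([1.0, -2.0, 3.0])


def toy_loss_and_grads(x):
    """Hypothetical joint evaluation: returns (loss, gradient) in one call."""
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff


class CachingEvaluator:
    def __init__(self, loss_and_grads):
        self.loss_and_grads = loss_and_grads
        self._x = None
        self._grads = None

    def _evaluate(self, x):
        # One joint evaluation per point; the gradients are cached for grads().
        loss_value, grad_values = self.loss_and_grads(x)
        self._x = np.array(x, copy=True)
        self._grads = np.asarray(grad_values, dtype="float64")
        return loss_value

    def loss(self, x):
        return self._evaluate(x)

    def grads(self, x):
        if self._x is None or not np.array_equal(x, self._x):
            self._evaluate(x)   # cache miss: recompute jointly
        return self._grads


evaluator_demo = CachingEvaluator(toy_loss_and_grads)
x_opt, f_opt, info = fmin_l_bfgs_b(evaluator_demo.loss, np.zeros(3),
                                   fprime=evaluator_demo.grads)

# Alternative accepted by SciPy when fprime is omitted: func returns (f, g).
x_alt, f_alt, _ = fmin_l_bfgs_b(toy_loss_and_grads, np.zeros(3))
print(np.round(x_opt, 4), np.round(x_alt, 4))
```

The single-function form avoids the cache entirely, which is what the `get_loss_and_grads_wrapper` functions further down rely on.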

In [89]:
# Get the gradients of the generated image wrt the loss
grads = K.gradients(loss, generated_image)[0]

# Function to fetch the values of the current loss and the current gradients
fetch_loss_and_grads = K.function([generated_image], [loss, grads])


class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, h, w, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()


In [100]:
final_img = minimize(5, batch_shape)

Start of iteration 0
Current loss value: 1509184.0
Image saved as C:/Users/USER/Desktop/Comillas/Análisis datos no estructurados/IMAGEN/Neural-Transfer/Results/style_transfer_result_at_iteration_0.png
Iteration 0 completed in 37s
Start of iteration 1
Current loss value: 1502255.9
Image saved as C:/Users/USER/Desktop/Comillas/Análisis datos no estructurados/IMAGEN/Neural-Transfer/Results/style_transfer_result_at_iteration_1.png
Iteration 1 completed in 37s
Start of iteration 2
Current loss value: 1497775.2
Image saved as C:/Users/USER/Desktop/Comillas/Análisis datos no estructurados/IMAGEN/Neural-Transfer/Results/style_transfer_result_at_iteration_2.png
Iteration 2 completed in 37s
Start of iteration 3
Current loss value: 1494187.2
Image saved as C:/Users/USER/Desktop/Comillas/Análisis datos no estructurados/IMAGEN/Neural-Transfer/Results/style_transfer_result_at_iteration_3.png
Iteration 3 completed in 37s
Start of iteration 4


KeyboardInterrupt: 

In [40]:
def get_loss_and_grads_wrapper(x_vec):
    l, g = get_loss_and_grads([x_vec.reshape(*batch_shape)])
    return l.astype(np.float64), g.flatten().astype(np.float64)

In [41]:
def get_loss_and_grads_wrapper_vgg19(x_vec):
    l, g = get_loss_and_grads_vgg19([x_vec.reshape(*batch_shape)])
    return l.astype(np.float64), g.flatten().astype(np.float64)

In [None]:
final_img = minimize(get_loss_and_grads_wrapper, 10, batch_shape)

iter=0, loss=3075.550048828125
Iteration 0 completed in 191s


In [None]:
final_img_vgg19 = minimize(get_loss_and_grads_wrapper_vgg19, 10, batch_shape)

# Plot the images

In [None]:
# Content image
plt.imshow(image.load_img(cont_img))
plt.figure()

# Style image
plt.imshow(image.load_img(sty_img))
plt.figure()

# Generated image VGG16
plt.imshow(scale_img(final_img))
plt.show()

# Generated image VGG19
plt.imshow(scale_img(final_img_vgg19))
plt.show()