# Neural Style Transfer

## Introduction

Neural style transfer was introduced by [Leon Gatys et al.](https://www.researchgate.net/publication/281312423_A_Neural_Algorithm_of_Artistic_Style) in 2015. The algorithm has undergone many refinements and spawned many variations since then. It consists of applying the style of a reference image to a target image while conserving the content of the target image.

![example](example.jpg)

In this context, style essentially means *textures*, *colors*, and *visual patterns* in the image, at various *spatial scales*, and the content is the **higher-level macrostructure of the image**.

We want to conserve the content of the original image while adopting the style of the reference image. If we were able to mathematically define content and style, then an appropriate loss function to minimize would be the following:

`loss = (distance(style(reference_image) - style(combination_image)) + distance(content(original_image) - content(combination_image)))`

Here, `distance` is a norm function such as the $L2$ norm, `content` is a function that takes an image and computes a representation of its content, and `style` is a function that takes an image and computes a representation of its style. Minimizing this loss drives `style(combination_image)` toward `style(reference_image)` and `content(combination_image)` toward `content(original_image)`, thus achieving style transfer as we defined it.
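To make the formula concrete, here is a minimal NumPy sketch that assumes the style and content representations have already been computed as arrays. The arrays below are hypothetical stand-ins for `style(...)` and `content(...)`, and `distance` is taken to be the L2 norm of the difference.

```python
import numpy as np

def distance(a, b):
    # L2 norm of the difference between two representations
    return np.linalg.norm(a - b)

def transfer_loss(style_ref, style_comb, content_orig, content_comb):
    # Style term plus content term, as in the loss formula above
    return distance(style_ref, style_comb) + distance(content_orig, content_comb)

# Hypothetical precomputed representations
style_ref = np.array([1.0, 2.0])
style_comb = np.array([1.0, 2.0])      # style already matches
content_orig = np.array([0.0, 0.0])
content_comb = np.array([3.0, 4.0])    # content differs by (3, 4)

print(transfer_loss(style_ref, style_comb, content_orig, content_comb))  # 5.0
```

Only the content term contributes here, because the two style representations are identical.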

## The content loss

As you already know, activations from earlier layers in a network contain local information about the image, whereas activations from higher layers contain increasingly global, abstract information. Formulated in a different way, the activations of the different layers of a convnet provide a decomposition of the contents of an image over different spatial scales. Therefore, you’d expect the content of an image, which is more global and abstract, to be captured by the representations of the upper layers in a convnet. A good candidate for content loss is thus the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image. Minimizing this loss makes the generated image look similar to the original target image as seen from that upper layer. Assuming that what the upper layers of a convnet see is really the content of their input images, this works as a way to preserve image content.
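A minimal NumPy sketch of this content loss, assuming the upper-layer activations have already been extracted as `(height, width, channels)` arrays. The random arrays are placeholders, not real VGG19 activations. It uses the squared L2 distance (the sum of squared differences), which is what the `content_loss` function in the Keras code below computes.

```python
import numpy as np

def content_loss(base_features, combination_features):
    # Squared L2 distance between two activation maps of the same layer
    return np.sum(np.square(combination_features - base_features))

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4, 8))   # placeholder activations
combo = base + 0.1                  # slightly perturbed copy

print(content_loss(base, base))     # 0.0
print(content_loss(base, combo))    # approximately 4 * 4 * 8 * 0.1**2 = 1.28
```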

## The style loss

The content loss only uses a single upper layer, but the style loss as defined by Gatys et al. uses multiple layers of a convnet: you try to capture the appearance of the style-reference image at all spatial scales extracted by the convnet, not just a single scale.

For the style loss, Gatys et al. use the Gram matrix of a layer’s activations: the inner product of the feature maps of a given layer. This inner product can be understood as representing a map of the correlations between the layer’s features. These feature correlations capture the statistics of the patterns of a particular spatial scale, which empirically correspond to the appearance of the textures found at this scale. Hence, the style loss aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image. In turn, this encourages the textures found at different spatial scales to look similar across the style-reference image and the generated image.
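The Gram matrix can be illustrated in plain NumPy. This sketch assumes activations shaped `(height, width, channels)`, mirroring the TensorFlow `gram_matrix` used later. It also checks one property behind the texture interpretation: because the Gram matrix sums over all spatial positions, shuffling the positions of a feature map leaves it unchanged, so it records which features co-occur rather than where they occur.

```python
import numpy as np

def gram_matrix(x):
    # (height, width, channels) -> (channels, height * width)
    features = x.reshape(-1, x.shape[-1]).T
    # Entry (i, j) is the inner product of feature maps i and j
    return features @ features.T

rng = np.random.default_rng(0)
activations = rng.normal(size=(5, 6, 3))
g = gram_matrix(activations)
print(g.shape)  # (3, 3)

# Shuffle spatial positions: same per-pixel feature vectors, different layout
flat = activations.reshape(-1, 3)
shuffled = rng.permutation(flat).reshape(5, 6, 3)
print(np.allclose(g, gram_matrix(shuffled)))  # True
```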

In short, you can use a pretrained convnet to define a loss that will do the following:
* Preserve content by maintaining similar high-level layer activations between the original image and the generated image. The convnet should “see” both the original image and the generated image as containing the same things.
* Preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers. Feature correlations capture textures: the generated image and the style-reference image should share the same textures at different spatial scales.
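Combining several layers into one style loss can be sketched in NumPy as follows. The sketch assumes per-layer activations are given as dicts of `(height, width, channels)` arrays keyed by hypothetical layer names. Each layer uses the normalization from Gatys et al., $1/(4N^2M^2)$ with $N$ the number of feature maps and $M$ the number of spatial positions, and the layers are weighted equally. The Keras code below uses a fixed normalizer shared by all layers instead.

```python
import numpy as np

def gram_matrix(x):
    features = x.reshape(-1, x.shape[-1]).T  # (channels, positions)
    return features @ features.T

def layer_style_loss(style_features, combination_features):
    n = style_features.shape[-1]                           # feature maps
    m = style_features.shape[0] * style_features.shape[1]  # spatial positions
    diff = gram_matrix(style_features) - gram_matrix(combination_features)
    return np.sum(np.square(diff)) / (4.0 * n**2 * m**2)

def style_loss(style_layers, combination_layers):
    # Average of the per-layer losses, one equal weight per layer
    names = list(style_layers)
    total = sum(layer_style_loss(style_layers[k], combination_layers[k]) for k in names)
    return total / len(names)

rng = np.random.default_rng(1)
# Hypothetical activations: a large shallow layer and a small deep one
style = {"low": rng.normal(size=(16, 16, 4)), "high": rng.normal(size=(4, 4, 16))}
print(style_loss(style, style))  # 0.0
```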

## Neural style transfer in Keras

Neural style transfer can be implemented using any pretrained convnet. Here, we’ll use the VGG19 network used by Gatys et al. Here’s the general process:
* Set up a network that computes VGG19 layer activations for the style-reference image, the base image, and the generated image at the same time.
* Use the layer activations computed over these three images to define the loss function described earlier, which we’ll minimize in order to achieve style transfer.
* Set up a gradient-descent process to minimize this loss function.

```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import warnings

warnings.filterwarnings("ignore")
```


```python
base_image_path = "./random_face_1.png"
style_reference_image_path = "./style_klimt.jpg"
```

```python
original_width, original_height = keras.utils.load_img(base_image_path).size
# Generate images 700 pixels tall, keeping the base image's aspect ratio
img_height = 700
img_width = round(original_width * img_height / original_height)
```

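```python
def preprocess_image(image_path):
    # Load and resize the image, then convert it to a float array
    img = keras.utils.load_img(image_path,
                               target_size=(img_height, img_width))
    img = keras.utils.img_to_array(img)
    # Add a batch dimension: (1, height, width, 3)
    img = np.expand_dims(img, axis=0)
    # VGG19 preprocessing: RGB -> BGR and ImageNet mean subtraction
    img = keras.applications.vgg19.preprocess_input(img)
    return img
```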

```python
def deprocess_image(img):
    # Copy so the caller's array is not modified in place
    img = img.reshape((img_height, img_width, 3)).copy()
    # Undo the zero-centering done by vgg19.preprocess_input
    # by adding back the ImageNet mean pixel values (BGR order)
    img[:, :, 0] += 103.939
    img[:, :, 1] += 116.779
    img[:, :, 2] += 123.68
    # Convert from "BGR" back to "RGB", which also reverses
    # part of vgg19.preprocess_input
    img = img[:, :, ::-1]
    img = np.clip(img, 0, 255).astype("uint8")
    return img
```
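As a sanity check on `deprocess_image`, the sketch below reimplements the relevant preprocessing in NumPy and verifies that it round-trips. It assumes VGG19's "caffe"-style preprocessing: flip RGB to BGR, then subtract the ImageNet channel means (103.939, 116.779, 123.68) in BGR order. `fake_preprocess` and `undo_preprocess` are hypothetical helpers, not Keras functions, and the comparison is done on floats before the clip and `uint8` cast that `deprocess_image` also applies.

```python
import numpy as np

MEAN_BGR = np.array([103.939, 116.779, 123.68])

def fake_preprocess(img_rgb):
    # RGB -> BGR, then zero-center each channel with the ImageNet mean
    return img_rgb[..., ::-1] - MEAN_BGR

def undo_preprocess(img_bgr):
    img = img_bgr + MEAN_BGR   # undo the zero-centering
    return img[..., ::-1]      # BGR -> RGB

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(4, 5, 3)).astype("float32")
print(np.allclose(undo_preprocess(fake_preprocess(original)), original))  # True
```

Both steps matter: undoing only the mean subtraction leaves the red and blue channels swapped.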

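```python
# VGG19 with ImageNet weights, without the classifier head
model = keras.applications.vgg19.VGG19(weights="imagenet", include_top=False)
# Map each layer name to its output so any activation can be looked up by name
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)
```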

```python
def content_loss(base_img, combination_img):
    return tf.reduce_sum(tf.square(combination_img - base_img))
```

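```python
def gram_matrix(x):
    # (height, width, channels) -> (channels, height * width)
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    # Inner products between every pair of feature maps
    gram = tf.matmul(features, tf.transpose(features))
    return gram

def style_loss(style_img, combination_img):
    S = gram_matrix(style_img)
    C = gram_matrix(combination_img)
    # Fixed normalization constant shared by all layers; Gatys et al.
    # instead use each layer's own number of feature maps and spatial size
    channels = 3
    size = img_height * img_width
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))
```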

```python
def total_variation_loss(x):
    a = tf.square(
        x[:, : img_height - 1, : img_width - 1, :] - x[:, 1:, : img_width - 1, :]
    )
    b = tf.square(
        x[:, : img_height - 1, : img_width - 1, :] - x[:, : img_height - 1, 1:, :]
    )
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```
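The `total_variation_loss` term acts as a regularizer: it penalizes differences between each pixel and its neighbours below and to the right, which encourages spatial smoothness in the generated image. The NumPy port below follows the same formula, assuming a `(1, height, width, channels)` batch, with height and width read from the array instead of the global variables. It compares a smooth horizontal gradient against uniform noise.

```python
import numpy as np

def total_variation_loss(x):
    h, w = x.shape[1], x.shape[2]
    # Squared differences with the pixel below (a) and to the right (b)
    a = np.square(x[:, : h - 1, : w - 1, :] - x[:, 1:, : w - 1, :])
    b = np.square(x[:, : h - 1, : w - 1, :] - x[:, : h - 1, 1:, :])
    return np.sum(np.power(a + b, 1.25))

h, w = 32, 32
ramp = np.linspace(0, 255, w)
# Shape (1, 32, 32, 3): every row is the same left-to-right gradient
smooth = np.tile(ramp, (h, 1))[None, :, :, None].repeat(3, axis=-1)
rng = np.random.default_rng(0)
noise = rng.uniform(0, 255, size=(1, h, w, 3))
print(total_variation_loss(smooth) < total_variation_loss(noise))  # True
```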

```python
style_layer_names = ["block1_conv1",
                     "block2_conv1",
                     "block3_conv1",
                     "block4_conv1",
                     "block5_conv1",]

content_layer_name = "block5_conv2"

total_variation_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8
```

```python
def compute_loss(combination_image, base_image, style_reference_image):
    # Batch order: 0 = base image, 1 = style reference, 2 = combination image
    input_tensor = tf.concat([base_image,
                              style_reference_image,
                              combination_image],
                             axis=0)
    features = feature_extractor(input_tensor)
    loss = tf.zeros(shape=())
    # Content loss on a single upper layer
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(base_image_features,
                                                combination_features)
    # Style loss averaged over several layers
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        style_loss_value = style_loss(style_reference_features,
                                      combination_features)
        loss += (style_weight / len(style_layer_names)) * style_loss_value

    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss
```

```python
@tf.function
def compute_loss_and_grads(
    combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image,
                            base_image,
                            style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads
```

```python
optimizer = keras.optimizers.SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0,
        decay_steps=100,
        decay_rate=0.96)
)
```
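With the default `staircase=False`, Keras's `ExponentialDecay` computes `initial_learning_rate * decay_rate ** (step / decay_steps)`. A plain-Python sketch of that formula, using a hypothetical helper `decayed_lr` with the parameters above, shows how the step size shrinks over the 2000 iterations used below.

```python
def decayed_lr(step, initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96):
    # Smooth (non-staircase) exponential decay
    return initial_learning_rate * decay_rate ** (step / decay_steps)

for step in (0, 100, 1000, 2000):
    print(step, round(decayed_lr(step), 2))
```

The learning rate starts at 100 and falls to roughly 44 by step 2000, so later iterations make progressively smaller pixel updates.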

```python
base_image = preprocess_image(base_image_path)
style_reference_image = preprocess_image(style_reference_image_path)
# Start the optimization from a copy of the base image
combination_image = tf.Variable(preprocess_image(base_image_path))
```

```python
iterations = 2000
# Runs on the CPU; remove the tf.device block to let TensorFlow use a GPU
with tf.device("/cpu:0"):
    for i in range(1, iterations + 1):
        loss, grads = compute_loss_and_grads(
            combination_image,
            base_image,
            style_reference_image
        )
        # One gradient-descent step on the pixels of the combination image
        optimizer.apply_gradients([(grads, combination_image)])
        if i % 100 == 0:
            print(f"Iteration {i}: loss={float(loss):.2f}")
            # Save a snapshot of the current combination image
            img = deprocess_image(combination_image.numpy())
            fname = f"combination_image_at_iteration_{i}.png"
            keras.utils.save_img(fname, img)
```

```text
Iteration 100: loss=2154.66
Iteration 200: loss=1560.62
Iteration 300: loss=1312.12
Iteration 400: loss=1168.43
Iteration 500: loss=1072.89
Iteration 600: loss=1003.91
Iteration 700: loss=951.24
Iteration 800: loss=909.61
Iteration 900: loss=875.64
Iteration 1000: loss=847.34
Iteration 1100: loss=823.39
Iteration 1200: loss=802.88
Iteration 1300: loss=785.01
Iteration 1400: loss=769.34
Iteration 1500: loss=755.41
Iteration 1600: loss=742.97
Iteration 1700: loss=731.80
Iteration 1800: loss=721.77
Iteration 1900: loss=712.65
Iteration 2000: loss=704.34
```
