# Neural Style Transfer


---


In this Colab, we implement neural style transfer: recreating a given image in the style of another. We use the [VGG19](https://arxiv.org/abs/1409.1556) model to extract features from specific layers, compute style and content losses from those features, and iteratively update the generated image to reduce them.

![style](images/style.jpg)

# Imports and Data Preprocessing

First, we import the necessary libraries.

In [None]:
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

We define utility functions for loading, displaying, and preprocessing the style and content images before they are fed into the model.

In [None]:
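def load_img(img_path, max_dim=512):
  # Read and decode the JPEG as a 3-channel RGB image
  image = tf.io.read_file(img_path)
  image = tf.io.decode_jpeg(image, channels=3)
  # Convert to float32 in [0, 1] so it can be resized smoothly
  image = tf.image.convert_image_dtype(image, tf.float32)

  # Rescale so the longest side equals max_dim, preserving the aspect ratio
  img_shape = tf.cast(tf.shape(image)[:-1], tf.float32)
  long_dim = tf.reduce_max(img_shape)
  scale_factor = max_dim / long_dim

  new_shape = tf.cast(img_shape * scale_factor, dtype=tf.int32)
  image = tf.image.resize(image, new_shape)
  # Add a batch dimension and convert back to uint8 pixels in [0, 255]
  image = image[tf.newaxis, :]
  image = tf.image.convert_image_dtype(image, tf.uint8)

  return image

def display_imgs(images, titles=None):
  # Default to blank titles; zip() with an empty list would skip every image
  if titles is None:
    titles = [''] * len(images)

  plt.figure(figsize=(18, 10))
  for idx, (image, title) in enumerate(zip(images, titles)):
    plt.subplot(1, len(images), idx + 1)
    plt.axis('off')
    # Drop the batch dimension before plotting
    if len(image.shape) > 3:
      image = tf.squeeze(image, axis=0)
    plt.imshow(image)
    plt.title(title)

def preprocess_img(image):
  # VGG19 expects float RGB in [0, 255]; preprocess_input converts it to
  # BGR and subtracts the ImageNet channel means
  image = tf.cast(image, tf.float32)
  image = tf.keras.applications.vgg19.preprocess_input(image)

  return image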
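`tf.keras.applications.vgg19.preprocess_input` uses Keras's "caffe" preprocessing mode: it flips the channels from RGB to BGR and subtracts the per-channel ImageNet means, without rescaling. This is why `load_img` returns pixels in [0, 255] rather than [0, 1]. The NumPy sketch below re-implements that behaviour for illustration only; `caffe_preprocess` is a hypothetical helper, and the mean values are the ones used by Keras's implementation (worth double-checking against your installed version).

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras's "caffe" mode
# (assumed here; see keras.applications.imagenet_utils for your version).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(image_rgb):
    """Hypothetical NumPy re-implementation of VGG19 preprocessing.

    image_rgb: float array of shape (..., 3) with RGB values in [0, 255].
    Returns BGR values with the ImageNet channel means subtracted.
    """
    image_bgr = image_rgb[..., ::-1]          # RGB -> BGR
    return image_bgr - IMAGENET_BGR_MEANS     # zero-center each channel

# A pure-red pixel: RGB (255, 0, 0) becomes BGR (0, 0, 255) minus the means
pixel = np.array([[255.0, 0.0, 0.0]], dtype=np.float32)
print(caffe_preprocess(pixel))
```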

# Downloading Images

The cells below download a photo of a puppy to use as the content image and Van Gogh's famous painting *The Starry Night* to use as the style image, then display both.


In [None]:
img_dir = 'images'
if not os.path.exists(img_dir): os.makedirs(img_dir)

!wget -q -O ./images/dog.jpg https://cdn.pixabay.com/photo/2020/06/30/22/34/dog-5357794__340.jpg
!wget -q -O ./images/night.jpg https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg/1024px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg

content_path = f'{img_dir}/dog.jpg'
style_path = f'{img_dir}/night.jpg'

In [None]:
content_img = load_img(content_path); style_img = load_img(style_path)
display_imgs([content_img, style_img], ["content img", "style img"])

# Creating the Model

The image below shows the architecture of VGG-19, which we use as a feature extractor. We only need the activations of selected intermediate layers to compute the losses, so we drop the fully connected layers at the end of the network.

![vgg](images/vgg.png)


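Deeper layers of a convolutional network respond to more complex, higher-level features (such as object parts and their arrangement) rather than exact pixel values, so we take the content representation from a deep layer, `block5_conv2`. For style, we use the first convolutional layer of each of the five blocks, so that the style representation covers features from fine to coarse; this choice of layers was arrived at through experimentation.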

In [None]:
content_layers = ['block5_conv2']
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1','block5_conv1']

combined_layers = style_layers + content_layers
num_style_layers = len(style_layers); num_content_layers = len(content_layers)
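As a quick sanity check on the layer names, the sketch below enumerates VGG19's 16 convolutional layers using the Keras naming scheme (`block{b}_conv{c}`, with 2, 2, 4, 4 and 4 conv layers per block) and marks which ones this notebook uses. It is plain Python, and the names `all_conv_layers`, `expected_style_layers` and `expected_content_layers` are illustrative only; the list should match what `vgg.summary()` reports in your environment.

```python
# Keras names VGG19's conv layers "block{b}_conv{c}"; the five blocks hold
# 2, 2, 4, 4 and 4 convolutional layers respectively (16 in total).
CONVS_PER_BLOCK = [2, 2, 4, 4, 4]

all_conv_layers = [
    f"block{b}_conv{c}"
    for b, n in enumerate(CONVS_PER_BLOCK, start=1)
    for c in range(1, n + 1)
]

# The notebook's choices: first conv of every block for style, one deep layer for content
expected_style_layers = [f"block{b}_conv1" for b in range(1, len(CONVS_PER_BLOCK) + 1)]
expected_content_layers = ['block5_conv2']

for name in all_conv_layers:
    if name in expected_style_layers:
        role = 'style'
    elif name in expected_content_layers:
        role = 'content'
    else:
        role = '-'
    print(f"{name:<14}{role}")
```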

Now we can define and instantiate the model: a VGG-19 network with frozen ImageNet weights whose only outputs are the activations of the chosen style and content layers, in that order.

In [None]:
def vgg_model(layers):
  vgg19 = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg19.trainable = False

  output_layers = [vgg19.get_layer(layer).output for layer in layers]
  model = tf.keras.Model(inputs=vgg19.input, outputs=output_layers)

  return model

vgg = vgg_model(combined_layers)
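Because `include_top=False` removes the fully connected layers, the model accepts images of arbitrary size, and the shape of each output depends on the input. The sketch below estimates the output shape of each chosen layer for a hypothetical 384 × 512 input with a batch of one. It assumes the standard Keras VGG19 layout: "same"-padded 3 × 3 convolutions keep the spatial size, each block after the first is preceded by a 2 × 2, stride-2 max pool, and the blocks have 64, 128, 256, 512 and 512 channels. `layer_output_shape` is a hypothetical helper.

```python
# Sketch: output shape of a VGG19 conv layer for a given input size.
# Assumes 'same'-padded convs and one 2x2 stride-2 max pool (floor) per
# preceding block, as in Keras's VGG19.
CHANNELS_PER_BLOCK = [64, 128, 256, 512, 512]

def layer_output_shape(layer_name, height, width):
    block = int(layer_name.split('_')[0].replace('block', ''))
    for _ in range(block - 1):              # one pooling step per preceding block
        height, width = height // 2, width // 2
    return (1, height, width, CHANNELS_PER_BLOCK[block - 1])

layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
          'block4_conv1', 'block5_conv1', 'block5_conv2']
for name in layers:
    print(name, layer_output_shape(name, 384, 512))
```

A Gram matrix computed from a layer with C channels is C × C no matter how large the feature map is, so the style image does not need to match the content image's size. The content loss, by contrast, compares the feature maps of the generated and content images directly; these match because the generated image is initialized from the content image.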

# Loss Functions and Features

Next we define the loss functions used to improve the generated image. The style loss is the mean (`tf.reduce_mean`) of the squared element-wise difference between the features and the targets, while the content loss is the sum (`tf.reduce_sum`) of the same quantity.

In [None]:
def get_style_loss(features, targets):
  style_loss = tf.reduce_mean(tf.square(features - targets))

  return style_loss

def get_content_loss(feature, target):
  content_loss = tf.reduce_sum(tf.square(feature - target))

  return content_loss
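To make the mean-versus-sum difference concrete, here is a tiny worked example in NumPy that mirrors `get_style_loss` and `get_content_loss` on made-up 2 × 2 arrays (in the notebook the inputs are Gram matrices or VGG19 feature maps, and the functions use TensorFlow ops rather than NumPy).

```python
import numpy as np

# Toy "features" and "targets"; in the notebook these are Gram matrices
# (style) or feature maps (content) produced by VGG19.
features = np.array([[1.0, 2.0], [3.0, 4.0]])
targets = np.zeros_like(features)

squared_error = np.square(features - targets)   # [[1, 4], [9, 16]]
style_like = squared_error.mean()               # mirrors get_style_loss   -> 7.5
content_like = squared_error.sum()              # mirrors get_content_loss -> 30.0
print(style_like, content_like)
```

The two differ by a factor equal to the number of elements. Because the sum grows with the size of the feature map while the mean does not, the two losses sit on different scales, and `style_weight` and `content_weight` have to be tuned with that in mind.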

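The original neural style transfer paper (Gatys et al., *A Neural Algorithm of Artistic Style*) represents style by the Gram matrices of the feature maps, i.e. the inner products between every pair of channels at a layer. Here each Gram matrix is also divided by the number of spatial positions, and we compute it with `tf.linalg.einsum`.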

In [None]:
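def gram_matrix(input_tensor):
  # Channel-by-channel inner products, summed over all spatial positions (i, j)
  matrix = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
  # Average over the number of spatial positions H * W
  num_locations = tf.cast(tf.shape(input_tensor)[1] * tf.shape(input_tensor)[2], tf.float32)

  return matrix / num_locations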
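For a feature map $F$ of shape $H \times W \times C$, the function above computes $G_{cd} = \frac{1}{HW}\sum_{i,j} F_{ijc}\,F_{ijd}$. An equivalent way to see this is to flatten the spatial positions into rows and compute $F^\top F / (HW)$. The NumPy sketch below, using random made-up features, checks that the einsum contraction and the matrix-product form agree, and that the result is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((1, 4, 5, 3))   # (batch, H, W, C), made-up values
_, h, w, c = features.shape

# Same contraction as gram_matrix: sum over spatial positions i, j
gram_einsum = np.einsum('bijc,bijd->bcd', features, features) / (h * w)

# Equivalent formulation: flatten positions, then F^T F / (H*W)
flat = features.reshape(1, h * w, c)                  # (batch, H*W, C)
gram_matmul = np.transpose(flat, (0, 2, 1)) @ flat / (h * w)

print(gram_einsum.shape)                              # (1, 3, 3)
print(np.allclose(gram_einsum, gram_matmul))          # True
print(np.allclose(gram_einsum, np.transpose(gram_einsum, (0, 2, 1))))  # True: symmetric
```

Summing over $i$ and $j$ discards where in the image each feature occurs and keeps only which channels tend to fire together, which is why Gram matrices capture texture and style rather than layout.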

The two functions below compute the image features that are passed to the loss functions: the Gram matrices of the style-layer outputs and the raw feature maps of the content layer.

In [None]:
def style_img_features(image):
  preprocessed_image = preprocess_img(image)
  model_output = vgg(preprocessed_image)
  style_features = model_output[0:num_style_layers]
  gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]

  return gram_style_features

def content_img_features(image):
  preprocessed_image = preprocess_img(image)
  model_output = vgg(preprocessed_image)
  content_features = model_output[num_style_layers:]

  return content_features
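`style_img_features` and `content_img_features` each run a full VGG19 forward pass, so `train_step` pushes the generated image through the network twice per step. Because `combined_layers` lists the style layers first, the outputs of a single forward pass could instead be split by position. The NumPy sketch below illustrates that split on fake outputs; `split_features` and `gram_matrix_np` are hypothetical names, and in the notebook you would pass `vgg(preprocess_img(image))` and use the TensorFlow `gram_matrix`.

```python
import numpy as np

def gram_matrix_np(features):
    # NumPy stand-in for gram_matrix: (batch, H, W, C) -> (batch, C, C)
    _, h, w, _ = features.shape
    return np.einsum('bijc,bijd->bcd', features, features) / (h * w)

def split_features(model_output, num_style_layers):
    """Hypothetical helper: split ONE forward pass into style and content parts.

    Assumes outputs are ordered style layers first, then content layers,
    as in combined_layers = style_layers + content_layers.
    """
    style = [gram_matrix_np(f) for f in model_output[:num_style_layers]]
    content = list(model_output[num_style_layers:])
    return style, content

# Fake outputs standing in for the five style layers on a 64x64 input
channels = [64, 128, 256, 512, 512]
sizes = [64, 32, 16, 8, 4]
fake_output = [np.ones((1, s, s, c)) for s, c in zip(sizes, channels)]
fake_output.append(np.ones((1, 4, 4, 512)))   # stand-in for block5_conv2 (content)

style, content = split_features(fake_output, num_style_layers=5)
print([g.shape for g in style])
print(content[0].shape)
```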

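The total loss is a weighted sum of the style and content losses. Each one is multiplied by its weight and divided by the number of layers it is summed over (`num_style_layers` or `num_content_layers`), so each term is an average over its layers.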

In [None]:
def calculate_total_loss(style_features, content_features, style_targets, content_targets, style_weight, content_weight):
  style_loss = tf.math.add_n([get_style_loss(feature, target) for (feature, target) in zip(style_features, style_targets)])
  content_loss = tf.math.add_n([get_content_loss(feature, target) for (feature, target) in zip(content_features, content_targets)])

  style_loss = style_loss * style_weight / num_style_layers
  content_loss = content_loss * content_weight / num_content_layers
  total_loss = style_loss + content_loss

  return total_loss
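For reference, here is the quantity that `calculate_total_loss` computes, written as equations. The notation is ours: $G^\ell$ and $A^\ell$ are the Gram matrices of the generated and style images at style layer $\ell$ (each $C_\ell \times C_\ell$), $F^\ell$ and $P^\ell$ are the feature maps of the generated and content images at content layer $\ell$, $w_s$ and $w_c$ are `style_weight` and `content_weight`, and $N_s$ and $N_c$ are `num_style_layers` and `num_content_layers`.

$$
\begin{aligned}
\mathcal{L}_{\text{style}} &= \frac{w_s}{N_s} \sum_{\ell=1}^{N_s} \frac{1}{C_\ell^{2}} \sum_{c,d} \left(G^{\ell}_{cd} - A^{\ell}_{cd}\right)^2 \\
\mathcal{L}_{\text{content}} &= \frac{w_c}{N_c} \sum_{\ell=1}^{N_c} \sum_{i,j,c} \left(F^{\ell}_{ijc} - P^{\ell}_{ijc}\right)^2 \\
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{style}} + \mathcal{L}_{\text{content}}
\end{aligned}
$$

The $1/C_\ell^2$ factor comes from `tf.reduce_mean` over a batch of one $C_\ell \times C_\ell$ Gram matrix; the content term has no such factor because `get_content_loss` uses `tf.reduce_sum`.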

# Compile and Fitting

We are now ready to put the finishing touches on the model. The function below defines a single training step: given the style and content targets and weights, it computes the total loss for the generated image, takes the gradient of the loss with respect to the image's pixels, applies an optimizer update to reduce the loss, and clips the pixel values to [0, 255].

In [None]:
def train_step(image, style_targets, content_targets, style_weight, content_weight, optimizer):
  with tf.GradientTape() as tape:
    style_features = style_img_features(image)
    content_features = content_img_features(image)

    loss = calculate_total_loss(style_features, content_features, style_targets, content_targets, style_weight, content_weight)

  gradients = tape.gradient(loss, image)
  optimizer.apply_gradients([(gradients, image)])
  image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0))
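Note that the VGG19 weights are frozen: the only thing being optimized is the pixels of the generated image, and the clip keeps them in the [0, 255] range that `preprocess_img` expects. The toy NumPy loop below shows the same mechanics on a made-up three-pixel "image" with a simple quadratic loss. It uses plain gradient descent instead of Adam, so it illustrates the update-then-clip pattern rather than reproducing `train_step` exactly.

```python
import numpy as np

# Toy stand-in for train_step: the "image" is the only thing being optimized.
# Loss = sum((x - target)^2); plain gradient descent replaces Adam here.
pixels = np.array([250.0, 10.0, 100.0])    # current pixel values (made up)
target = np.array([300.0, -20.0, 120.0])   # two targets lie outside [0, 255]
learning_rate = 0.1

for step in range(200):
    grad = 2.0 * (pixels - target)          # gradient of sum((x - t)^2)
    pixels = pixels - learning_rate * grad  # gradient step on the pixels
    pixels = np.clip(pixels, 0.0, 255.0)    # same role as tf.clip_by_value

print(pixels)   # first two pinned at 255 and 0, third converges to 120
```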

Now we can run the training step in a loop for a given number of epochs and steps per epoch, starting from a copy of the content image and gradually updating it into the final stylized image.

In [None]:
def model_fit(style_image, content_image, optimizer, epochs, steps_per_epoch, style_weight=1e-2, content_weight=1e-4):
  style_targets = style_img_features(style_image)
  content_targets = content_img_features(content_image)
  generated_image = tf.Variable(tf.cast(content_image, dtype=tf.float32))

  for i in range(epochs):
    for j in range(steps_per_epoch):
      train_step(generated_image, style_targets, content_targets, style_weight, content_weight, optimizer)

    print(f"Epoch {i + 1}/{epochs}")

  generated_image = tf.cast(generated_image, dtype=tf.uint8)

  return generated_image

All that's left to do now is run `model_fit` and display the results! With hardware acceleration, this takes about 8 minutes.

In [None]:
style_weight = 3e-2; content_weight = 1e-2
# Adam with a learning rate that starts at 20 and halves every 100 steps
opt = tf.keras.optimizers.Adam(
    learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=20.0, decay_steps=100, decay_rate=0.5
    )
)

neural_style_transfer = model_fit(style_img, content_img, opt, 10, 100, style_weight, content_weight)
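The learning-rate schedule is what keeps such a large initial step size workable. Following the formula in the `ExponentialDecay` documentation (with the default `staircase=False`), the sketch below prints the learning rate over the 1,000 updates (10 epochs × 100 steps) made by the call above. It assumes the optimizer's step counter starts at 0 and advances once per `apply_gradients` call; `lr_at` is a hypothetical helper.

```python
# Learning rate produced by ExponentialDecay (staircase=False):
#   lr(step) = initial_learning_rate * decay_rate ** (step / decay_steps)
initial_learning_rate, decay_steps, decay_rate = 20.0, 100, 0.5

def lr_at(step):
    return initial_learning_rate * decay_rate ** (step / decay_steps)

total_steps = 10 * 100   # epochs * steps_per_epoch in the model_fit call
for step in [0, 100, 250, 500, total_steps]:
    print(f"step {step:5d}: lr = {lr_at(step):.5f}")
```

On a 0 to 255 pixel scale, the rate starts at 20 and has fallen to about 0.02 by the final step, so later updates make progressively smaller changes to the image.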

In [None]:
display_imgs([content_img, style_img, neural_style_transfer], ["content image", "style image", "final image"])