# Neural Style Transfer with tf.keras

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/models/blob/master/research/nst_blogpost/4_Neural_Style_Transfer_with_Eager_Execution.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/models/blob/master/research/nst_blogpost/4_Neural_Style_Transfer_with_Eager_Execution.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Problem Statement

The aim of this assignment is to build a deep learning model that restyles an existing image to match the aesthetic of a chosen artwork. The model should analyze the artistic style of the selected artwork and apply similar stylistic features to a new image, producing a piece that looks as though the same artist could have created it.


## Approach
Our work is based on [Image Style Transfer Using Convolutional Neural Networks](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf) by Leon Gatys, Alexander Ecker, and Matthias Bethge.

We use a VGG19 model, pretrained on the ImageNet dataset, to transfer the style of one image to another.

## Setup: Download and Resize Images

We download a few photographs of natural landscapes (mountains and rivers) as content images, plus one watercolor landscape as the style image.

In [None]:
import os
img_dir = '/tmp/nst'
if not os.path.exists(img_dir):
    os.makedirs(img_dir)
style_img_dir= '/tmp/nst/style'
if not os.path.exists(style_img_dir):
    os.makedirs(style_img_dir)
!wget --quiet -P /tmp/nst/style https://img.freepik.com/free-vector/watercolor-lake-scenery_23-2149159406.jpg   #style image
!wget --quiet -P /tmp/nst/ https://img.freepik.com/free-photo/cascade-boat-clean-china-natural-rural_1417-1356.jpg
!wget --quiet -P /tmp/nst/ https://img.freepik.com/free-photo/nature-chalal-trek-trail-sosan-india_181624-29503.jpg
!wget --quiet -P /tmp/nst/ https://img.freepik.com/free-photo/landscape-lake-surrounded-by-mountains_23-2148215266.jpg
!wget --quiet -P /tmp/nst/ https://img.freepik.com/free-photo/river-flowing-through-trees-mountains-scotland_181624-24054.jpg
!wget --quiet -P /tmp/nst/ https://img.freepik.com/free-photo/shallow-stream-midst-alpine-trees-rolling-hills-mountain_181624-14513.jpg

In [None]:
import os
from PIL import Image

#function to resize the downloaded images to 512x512 for our style transfer function to work
def resize_images(input_folder, output_folder, size):
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    for filename in os.listdir(input_folder):
        if filename.endswith(('.jpg', '.png', '.jpeg')):  # Adjust for other image formats if needed
            input_path = os.path.join(input_folder, filename)
            output_path = os.path.join(output_folder, filename)

            original_image = Image.open(input_path)
            resized_image = original_image.resize(size)
            resized_image.save(output_path)


input_directory = "/tmp/nst/"
output_directory = "tmp/nst/resized/"  # relative output path: resolves to /content/tmp/nst/resized/ from Colab's default working directory
if not os.path.exists(output_directory):
    os.makedirs(output_directory)
target_size = (512, 512)

resize_images(input_directory, output_directory, target_size)
resize_images(style_img_dir, 'tmp/nst/resized/style_img', target_size)

### Import and configure modules

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10,10)
mpl.rcParams['axes.grid'] = False

import time
import functools

import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow.keras.preprocessing import image

# Import from the public tf.keras namespace; tensorflow.python.keras is a
# private module and is not guaranteed to exist in newer TensorFlow releases
from tensorflow.keras import models
from tensorflow.keras import losses
from tensorflow.keras import layers
from tensorflow.keras import backend as K

## Visualize the input
The following functions are to load the image and display it

In [None]:
def load_img(path_to_img):
    max_dim = 512
    img = Image.open(path_to_img)
    long = max(img.size)
    scale = max_dim / long
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
    img = img.resize((round(img.size[0] * scale), round(img.size[1] * scale)), Image.LANCZOS)

    img = image.img_to_array(img)  # Use image.img_to_array from Keras

    # Add a batch dimension: the model expects [batch, height, width, channels]
    img = np.expand_dims(img, axis=0)
    return img

def imshow(img, title=None):
    # Remove the batch dimension
    out = np.squeeze(img, axis=0)
    # Cast to uint8 so matplotlib treats the values as 0-255 pixel intensities
    out = out.astype('uint8')
    plt.imshow(out)
    if title is not None:
        plt.title(title)
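
The scaling step in `load_img` keeps the aspect ratio: the longer side is mapped to `max_dim` and the shorter side shrinks by the same factor. A minimal sketch of just that arithmetic, where `scaled_size` is a hypothetical helper that does not appear elsewhere in this notebook:

```python
def scaled_size(size, max_dim=512):
    """Return (width, height) scaled so that the longer side equals max_dim."""
    width, height = size
    scale = max_dim / max(width, height)
    return round(width * scale), round(height * scale)


print(scaled_size((1024, 768)))  # (512, 384)
print(scaled_size((300, 600)))   # (256, 512)
```

For the images in this notebook, which were already resized to 512x512, the scale factor is simply 1.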


## Creating required functions
The following functions let us load and preprocess images easily. We apply the same preprocessing that the VGG training process expects: VGG networks are trained on images with channels in BGR order and with each channel centered by subtracting `mean = [103.939, 116.779, 123.68]`.

In [None]:
def load_and_process_img(path_to_img):
  img = load_img(path_to_img)
  img = tf.keras.applications.vgg19.preprocess_input(img)
  return img

### Inverse preprocessing
In order to view the outputs of our optimization, we need to invert the preprocessing step. Furthermore, since our optimized image may take values anywhere between $-\infty$ and $\infty$, we must clip it to keep the values within the 0-255 range.

In [None]:
def deprocess_img(processed_img):
  x = processed_img.copy()
  if len(x.shape) == 4:
    x = np.squeeze(x, 0)
  # A single explicit check replaces the earlier assert plus redundant raise
  if len(x.shape) != 3:
    raise ValueError("Input to deprocess image must be an image of "
                     "dimension [1, height, width, channel] or [height, width, channel]")

  # perform the inverse of the preprocessing step
  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  x = x[:, :, ::-1]

  x = np.clip(x, 0, 255).astype('uint8')
  return x
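
To make the preprocessing and its inverse concrete, here is a NumPy-only sketch of the round trip. It assumes the VGG19 preprocessing amounts to an RGB to BGR flip followed by subtracting the channel means quoted above. The names `preprocess_sketch` and `deprocess_sketch` are illustrative stand-ins, not the Keras functions, and the sketch rounds before casting, whereas `deprocess_img` truncates.

```python
import numpy as np

# Per-channel means in BGR order, as quoted in the text above
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_sketch(rgb):
    """RGB -> BGR, then subtract the per-channel means."""
    bgr = rgb[..., ::-1].astype(np.float32)
    return bgr - BGR_MEANS


def deprocess_sketch(x):
    """Add the means back, BGR -> RGB, then clip to valid pixel values."""
    bgr = x + BGR_MEANS
    return np.clip(bgr[..., ::-1], 0, 255).round().astype(np.uint8)


rgb = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3)).astype(np.uint8)
restored = deprocess_sketch(preprocess_sketch(rgb))
print(np.array_equal(rgb, restored))  # True
```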

### Define content and style representations
In neural style transfer, the layers in a pre-trained model like VGG19 help us understand both the content and style of images.

### Content Representation:
These layers help identify the content within an image. They break down the picture into different parts, recognizing objects and their details.

### Style Representation:
They also capture the image's artistic style by recognizing patterns, textures, and colors.

This understanding is possible because these layers in the model are trained to recognize various aspects of images, allowing us to separate content from style.


Specifically, we’ll pull out these intermediate layers from our network:


In [None]:
# Content layer from which we will pull our feature maps
content_layers = ['block5_conv2']

# Style layers we are interested in
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1'
               ]

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

## Build the Model
In this case, we load [VGG19](https://keras.io/applications/#vgg19), and feed in our input tensor to the model. This will allow us to extract the feature maps (and subsequently the content and style representations) of the content, style, and generated images.

We use VGG19, as suggested in the paper. In addition, VGG19 is a relatively simple model (compared with ResNet, Inception, etc.), and its feature maps tend to work well for style transfer.

To access the intermediate layers corresponding to our style and content feature maps, we collect their outputs and use the Keras [**Functional API**](https://keras.io/getting-started/functional-api-guide/) to define a model with the desired output activations.

With the Functional API, defining a model simply involves defining the input and output:

`model = Model(inputs, outputs)`

In [None]:
def get_model():
  """ Creates our model with access to intermediate layers.

  This function will load the VGG19 model and access the intermediate layers.
  These layers will then be used to create a new model that will take input image
  and return the outputs from these intermediate layers from the VGG model.

  Returns:
    returns a keras model that takes image inputs and outputs the style and
      content intermediate layers.
  """
  # Load our model. We load pretrained VGG, trained on imagenet data
  vgg = tf.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False
  # Get output layers corresponding to style and content layers
  style_outputs = [vgg.get_layer(name).output for name in style_layers]
  content_outputs = [vgg.get_layer(name).output for name in content_layers]
  model_outputs = style_outputs + content_outputs
  # Build model
  # Use the public tf.keras.Model so it matches the tf.keras VGG19 layers
  return tf.keras.Model(vgg.input, model_outputs)

In the above code snippet, we have loaded our pretrained image classification network. Then we grabbed the layers of interest. Then we defined a Model by setting the model’s inputs to an image and the outputs to the outputs of the style and content layers. In other words, we created a model that will take an input image and output the content and style intermediate layers!
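
Because the model's outputs are the style outputs followed by the content outputs, later code separates them by slicing at `num_style_layers`. The sketch below illustrates only that ordering convention, using placeholder strings in place of real feature tensors:

```python
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
content_layers = ['block5_conv2']
num_style_layers = len(style_layers)

# Stand-in for model(image): one placeholder per requested layer, in order
model_outputs = [f"<features of {name}>" for name in style_layers + content_layers]

style_part = model_outputs[:num_style_layers]    # first five entries
content_part = model_outputs[num_style_layers:]  # everything after them

print(len(style_part), len(content_part))  # 5 1
print(content_part[0])                     # <features of block5_conv2>
```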


## Define and create our loss functions (content and style distances)

### Content Loss

The content loss is determined by measuring the difference between the feature representations of two images at specific layers in a neural network.

To find this loss, we first pass both the desired content image and the image we're working on (our "output image") through the network. Then, we compare their intermediate feature representations at predefined layers. The content loss is computed by squaring the differences between these features at those layers and combining them across all the features, i.e. by measuring their squared Euclidean distance (the implementation below averages rather than sums, which only rescales the value). This quantifies how dissimilar the content is between the two images at those specific layers. The aim is to minimize this loss during the image generation process, making the output image resemble the original content more closely at those layers.



### Computing content loss
We add up the content losses at each desired layer. Each iteration, when we feed our input image through the model (which in eager mode is simply `model(input_image)`!), all the content losses are computed, and because we are executing eagerly, the gradients can be computed as well.

In [None]:
def get_content_loss(base_content, target):
  return tf.reduce_mean(tf.square(base_content - target))
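
As a small worked example, the same quantity can be computed with NumPy on two tiny "feature maps". This sketch (the name `content_loss_np` is illustrative) shows that the mean of squared differences equals the squared Euclidean distance divided by the number of elements:

```python
import numpy as np


def content_loss_np(base, target):
    """Mean of squared element-wise differences."""
    return np.mean(np.square(base - target))


base = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [3.0, 2.0]])

loss = content_loss_np(base, target)
squared_distance = np.sum((base - target) ** 2)

print(loss)                          # 2.0
print(squared_distance / base.size)  # 2.0
```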

### Style Loss

Computing style loss is a bit more involved, but follows the same principle, this time feeding our network the base input image and the style image. However, instead of comparing the raw intermediate outputs of the base input image and the style image, we instead compare the Gram matrices of the two outputs.

Mathematically, we describe the style loss of the base input image, $x$, and the style image, $a$, as the distance between the style representations (the Gram matrices) of these images. We describe the style representation of an image as the correlation between different filter responses given by the Gram matrix $G^l$, where $G^l_{ij}$ is the inner product between the vectorized feature maps $i$ and $j$ in layer $l$. We can see that $G^l_{ij}$ generated over the feature map for a given image represents the correlation between feature maps $i$ and $j$.

To generate a style for our base input image, we perform gradient descent from the content image to transform it into an image that matches the style representation of the style image. We do so by minimizing the mean squared distance between the feature correlation map of the style image and the input image. The contribution of each layer to the total style loss is described by
$$E_l = \frac{1}{4N_l^2M_l^2} \sum_{i,j}(G^l_{ij} - A^l_{ij})^2$$

where $G^l_{ij}$ and $A^l_{ij}$ are the respective style representations in layer $l$ of $x$ and $a$. $N_l$ describes the number of feature maps, each of size $M_l = height * width$. Thus, the total style loss across all layers is
$$L_{style}(a, x) = \sum_{l \in L} w_l E_l$$
where we weight the contribution of each layer's loss by some factor $w_l$. In our case, we weight each layer equally ($w_l =\frac{1}{|L|}$).

### Computing style loss
Again, we implement our loss as a distance metric.

In [None]:
def gram_matrix(input_tensor):
  # Flatten the spatial dimensions: one row per position, one column per channel
  channels = int(input_tensor.shape[-1])
  a = tf.reshape(input_tensor, [-1, channels])
  n = tf.shape(a)[0]
  gram = tf.matmul(a, a, transpose_a=True)
  return gram / tf.cast(n, tf.float32)

def get_style_loss(base_style, gram_target):
  """Expects base_style of dimension h, w, c and a precomputed c x c Gram matrix"""
  gram_style = gram_matrix(base_style)

  # gram_matrix already divides by the number of spatial positions, so the
  # paper's 1 / (4 * channels**2 * (width * height)**2) factor is not applied here
  return tf.reduce_mean(tf.square(gram_style - gram_target))
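
The code normalizes differently from the formula for $E_l$: the Gram matrix is divided by the number of spatial positions, and the squared differences are averaged rather than scaled by $\frac{1}{4N_l^2M_l^2}$. The NumPy sketch below (with illustrative names and random features) suggests that, under these definitions, the notebook's per-layer loss is exactly $4E_l$, so the two differ only by a constant factor:

```python
import numpy as np


def gram_np(features):
    """(height, width, channels) -> (channels, channels), divided by height * width."""
    channels = features.shape[-1]
    flat = features.reshape(-1, channels)
    return flat.T @ flat / flat.shape[0]


rng = np.random.default_rng(0)
x_feat = rng.normal(size=(8, 8, 4))  # stand-in features of the generated image
a_feat = rng.normal(size=(8, 8, 4))  # stand-in features of the style image

# Loss as computed in this notebook: mean squared difference of normalized Grams
notebook_loss = np.mean((gram_np(x_feat) - gram_np(a_feat)) ** 2)

# Paper form: unnormalized Gram matrices and the 1 / (4 N^2 M^2) factor
n_l, m_l = 4, 8 * 8
g_x = gram_np(x_feat) * m_l
g_a = gram_np(a_feat) * m_l
e_l = np.sum((g_x - g_a) ** 2) / (4 * n_l**2 * m_l**2)

print(np.isclose(notebook_loss, 4 * e_l))  # True
```

A constant factor like this can be absorbed into `style_weight`, which is one reason the exact normalization matters less than the relative weighting of style and content.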

## Apply style transfer to our images


### Run Gradient Descent
In this case, we use the [Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam) optimizer to minimize our loss. We iteratively update our output image so that it minimizes the loss: we don't update the weights of the network, but instead train the input image itself. To do this, we must know how the loss and gradients are calculated.
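
The idea of optimizing the input rather than the weights can be sketched without any network at all. The toy example below uses plain gradient descent (not Adam, which is swapped out for brevity) on a three-value "image" so that it approaches a fixed target. The target, learning rate, and step count are arbitrary choices for illustration.

```python
import numpy as np

target = np.array([0.2, 0.8, 0.5])  # fixed "features" we want to match
image = np.zeros(3)                 # the only thing being updated
learning_rate = 0.1

for step in range(200):
    # Gradient of mean((image - target)^2) with respect to the image
    grad = 2.0 * (image - target) / image.size
    image -= learning_rate * grad

print(np.allclose(image, target, atol=1e-3))  # True
```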

We also defined a little helper function that loads our content and style image, feeds them forward through our network, which then outputs the content and style feature representations from our model.

In [None]:
def get_feature_representations(model, content_path, style_path):
  """Helper function to compute our content and style feature representations.

  This function will simply load and preprocess both the content and style
  images from their path. Then it will feed them through the network to obtain
  the outputs of the intermediate layers.

  Arguments:
    model: The model that we are using.
    content_path: The path to the content image.
    style_path: The path to the style image

  Returns:
    returns the style features and the content features.
  """
  # Load our images in
  content_image = load_and_process_img(content_path)
  style_image = load_and_process_img(style_path)

  # batch compute content and style features
  style_outputs = model(style_image)
  content_outputs = model(content_image)


  # Get the style and content feature representations from our model
  style_features = [style_layer[0] for style_layer in style_outputs[:num_style_layers]]
  content_features = [content_layer[0] for content_layer in content_outputs[num_style_layers:]]
  return style_features, content_features

### Computing the loss and gradients
Here we use [**tf.GradientTape**](https://www.tensorflow.org/programmers_guide/eager#computing_gradients) to compute the gradient. It allows us to take advantage of the automatic differentiation available by tracing operations for computing the gradient later. It records the operations during the forward pass and then is able to compute the gradient of our loss function with respect to our input image for the backwards pass.

In [None]:
def compute_loss(model, loss_weights, init_image, gram_style_features, content_features):
  """This function will compute the total loss.

  Arguments:
    model: The model that will give us access to the intermediate layers
    loss_weights: The weights of each contribution of each loss function.
      (style weight and content weight)
    init_image: Our initial base image. This image is what we are updating with
      our optimization process. We apply the gradients wrt the loss we are
      calculating to this image.
    gram_style_features: Precomputed gram matrices corresponding to the
      defined style layers of interest.
    content_features: Precomputed outputs from defined content layers of
      interest.

  Returns:
    returns the total loss, style loss, and content loss
  """
  style_weight, content_weight = loss_weights

  # Feed our init image through our model. This will give us the content and
  # style representations at our desired layers. Since we're using eager
  # our model is callable just like any other function!
  model_outputs = model(init_image)

  style_output_features = model_outputs[:num_style_layers]
  content_output_features = model_outputs[num_style_layers:]

  style_score = 0
  content_score = 0

  # Accumulate style losses from all layers
  # Here, we equally weight each contribution of each loss layer
  weight_per_style_layer = 1.0 / float(num_style_layers)
  for target_style, comb_style in zip(gram_style_features, style_output_features):
    style_score += weight_per_style_layer * get_style_loss(comb_style[0], target_style)

  # Accumulate content losses from all layers
  weight_per_content_layer = 1.0 / float(num_content_layers)
  for target_content, comb_content in zip(content_features, content_output_features):
    content_score += weight_per_content_layer* get_content_loss(comb_content[0], target_content)

  style_score *= style_weight
  content_score *= content_weight

  # Get total loss
  loss = style_score + content_score
  return loss, style_score, content_score

Then computing the gradients is easy:

In [None]:
def compute_grads(cfg):
  with tf.GradientTape() as tape:
    all_loss = compute_loss(**cfg)
  # Compute gradients wrt input image
  total_loss = all_loss[0]
  return tape.gradient(total_loss, cfg['init_image']), all_loss
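
The gradient returned here has the same shape as the image, with one partial derivative per pixel value. As a sanity check of what such a gradient means, the NumPy sketch below compares the analytic gradient of a mean-squared-error loss with a finite-difference estimate. This is an illustration under a simplified loss, not what `tf.GradientTape` does internally.

```python
import numpy as np


def loss_fn(image, target):
    return np.mean((image - target) ** 2)


def analytic_grad(image, target):
    return 2.0 * (image - target) / image.size


def numeric_grad(image, target, eps=1e-6):
    """Central finite differences, one entry of the image at a time."""
    grad = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        up, down = image.copy(), image.copy()
        up[idx] += eps
        down[idx] -= eps
        grad[idx] = (loss_fn(up, target) - loss_fn(down, target)) / (2 * eps)
    return grad


rng = np.random.default_rng(1)
image, target = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))

print(analytic_grad(image, target).shape)  # (2, 3)
print(np.allclose(analytic_grad(image, target),
                  numeric_grad(image, target), atol=1e-6))  # True
```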

### Optimization loop

In [None]:
import IPython.display

def run_style_transfer(content_path,
                       style_path,
                       num_iterations=1000,
                       content_weight=1e3,
                       style_weight=1e-2):
  # We don't need to (or want to) train any layers of our model, so we set their
  # trainable to false.
  model = get_model()
  for layer in model.layers:
    layer.trainable = False

  # Get the style and content feature representations (from our specified intermediate layers)
  style_features, content_features = get_feature_representations(model, content_path, style_path)
  gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]

  # Set initial image
  init_image = load_and_process_img(content_path)
  init_image = tf.Variable(init_image, dtype=tf.float32)
  # Create our optimizer
  opt = tf.optimizers.Adam(learning_rate=5, epsilon=1e-1)

  # For displaying intermediate images
  iter_count = 1

  # Store our best result
  best_loss, best_img = float('inf'), None

  # Create a nice config
  loss_weights = (style_weight, content_weight)
  cfg = {
      'model': model,
      'loss_weights': loss_weights,
      'init_image': init_image,
      'gram_style_features': gram_style_features,
      'content_features': content_features
  }

  # For displaying
  num_rows = 2
  num_cols = 5
  display_interval = num_iterations/(num_rows*num_cols)
  start_time = time.time()
  global_start = time.time()

  norm_means = np.array([103.939, 116.779, 123.68])  # mean values of BGR colors for ImageNet Dataset
  min_vals = -norm_means
  max_vals = 255 - norm_means

  imgs = []
  for i in range(num_iterations):
    grads, all_loss = compute_grads(cfg)
    loss, style_score, content_score = all_loss
    opt.apply_gradients([(grads, init_image)])
    clipped = tf.clip_by_value(init_image, min_vals, max_vals)
    init_image.assign(clipped)
    end_time = time.time()

    if loss < best_loss:
      # Update best loss and best image from total loss.
      best_loss = loss
      best_img = deprocess_img(init_image.numpy())


  return best_img, best_loss
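
The clipping bounds in the loop are chosen so that, once the channel means are added back, every value lands in the displayable 0-255 range. A small NumPy sketch of that reasoning, using a made-up out-of-range pixel:

```python
import numpy as np

norm_means = np.array([103.939, 116.779, 123.68])
min_vals = -norm_means       # corresponds to pixel value 0 in each channel
max_vals = 255 - norm_means  # corresponds to pixel value 255 in each channel

# A preprocessed-space pixel whose values have drifted far out of range
drifted = np.array([[[-500.0, 0.0, 500.0]]])
clipped = np.clip(drifted, min_vals, max_vals)

# Undo the mean subtraction (channel order does not matter for this check)
pixels = clipped + norm_means
in_range = pixels.min() >= -1e-9 and pixels.max() <= 255 + 1e-9
print(in_range)  # True
```

The small tolerance only guards against floating-point rounding when the means are subtracted and added back.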

To download an image from Colab, uncomment the following code. It assumes `best` holds the image array returned by `run_style_transfer`:

In [None]:
#from google.colab import files
#final_img = Image.fromarray(best)
#final_img.save('wave_turtle.png')
#files.download('wave_turtle.png')

## Visualize outputs
We "deprocess" the output image in order to remove the processing that was applied to it.

In [None]:
def show_results(best_img, content_path, style_path, show_large_final=True):
  plt.figure(figsize=(5, 5))
  content = load_img(content_path)
  style = load_img(style_path)

  plt.subplot(1, 2, 1)
  imshow(content, 'Content Image')

  plt.subplot(1, 2, 2)
  imshow(style, 'Style Image')

  if show_large_final:
    plt.figure(figsize=(5, 5))

    plt.imshow(best_img)
    plt.title('Output Image')
    plt.show()

## Reading and processing all images in the folder
The following function takes the path of the content image folder and the path of the style image. It reads each image in the folder, performs style transfer on it, and displays the result.

In [None]:
import os
def save_image(image, path):
    image.save(path)

def style_transfer_folder(content_folder, style_image, output_folder, n_iterations=1000):
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    content_images = os.listdir(content_folder)

    for content_image in content_images:
        content_path = os.path.join(content_folder, content_image)
        if os.path.isfile(content_path) and content_image.lower().endswith(('.png', '.jpg', '.jpeg')):
            # Proceed with style transfer for valid image files
            styled_img, _ = run_style_transfer(content_path, style_image, num_iterations=n_iterations)

            # Save the stylized result (a uint8 array) into the output folder
            styled_image_name = f"styled_{content_image}"
            save_image(Image.fromarray(styled_img), os.path.join(output_folder, styled_image_name))

            # Display content, style, and styled images
            show_results(styled_img, content_path, style_image)
        else:
            print(f"Ignoring non-image file: {content_image}")


    return output_folder


### Styling all images of the content folder

In [None]:
result_path = style_transfer_folder('/content/tmp/nst/resized', '/content/tmp/nst/resized/style_img/watercolor-lake-scenery_23-2149159406.jpg', '/content/tmp/nst/results')

### What we covered:

We used various loss functions and backpropagation to transform the input image. A pretrained model helped define content and style representations through its learned feature maps. This was implemented using eager execution and a custom model via the Functional API, enabling dynamic tensor work and simplifying debugging. The image was iteratively updated by optimizing the loss with respect to the input image using `tf.GradientTape`.

Images taken from: https://www.freepik.com/free-photo



### Limitations:
* A pretrained model is used. A model trained on images similar to the style images might have worked better.
* The model does not pick the features of the style image in an intelligent manner, i.e. it does not identify and label the color patterns, edges, and other special features so that they can be applied at similar places in the content image.

### Potentials:
* Gives reasonable styling to the content images when the objects in the images are similar. For example, we used a watercolor image of a scenery as the style image and transferred its style to real photographs of scenery.
* It produces results quickly.