<a href="https://colab.research.google.com/github/rafm1101/deeplear/blob/main/2108_nst.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural style transfer -- the art of transferring art
Transferring the style of one picture to another picture or photograph is the idea behind this application of neural networks, and this notebook is my personal way of exploring the technique. It does far more than apply some impressionism filter to an image: it extracts stylistic features from one (style) image and applies them to another (content) image. The technique builds upon the way a convolutional neural network reads pictures for a classification task. Its main feature is that no network is trained for the job: a pre-trained network is used with all of its layers frozen. Instead, the only object that is optimised is the image itself, which is initialised from the content image.

The inspiration for this notebook is the related exercise of the Deep Learning Specialization by *DeepLearning.AI* on *Coursera*. Also worth mentioning are the [original paper](https://arxiv.org/abs/1508.06576) (Gatys et al.) and the [tutorial](https://www.tensorflow.org/tutorials/generative/style_transfer?hl=en) at *tensorflow.org*. The latter mentions a newer approach to the style transfer problem, which is left for later exploration.

Personally, I want to understand the idea, experiment with the code, and get a better understanding of coding. The notebook concentrates on three main parts: exploring a layer's response, the Gatys et al. algorithm for transferring style, and reconstructing style images.

What is the idea behind the Gatys et al. algorithm? When an image is passed to a CNN trained on image classification, it is processed by a sequence of convolutional and pooling layers, each of which produces a representation of the image. Along this sequence, the representations increasingly capture information about the content, whereas the early layers mostly carry textural information. The style of a picture is extracted from the filter responses of the different layers. More precisely, it is given by the correlations between the filter responses, taken over the spatial extent of the feature maps.
The key to understanding the magic of transferring the style of one picture to some input image is that the representations of content and style are to some extent separable. Therefore the style can be manipulated without changing the content representation: after reconstruction, the global properties, i.e. the composition of objects, are preserved, while local features are modified. This is coded in the [second main part](#4).
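To make the notion of "correlations between filter responses" concrete, here is a minimal NumPy sketch. It is not part of the notebook's pipeline: the array sizes and the helper name `gram` are chosen for illustration only. It shows that the channel-by-channel correlation (Gram) matrix does not change when the spatial positions are shuffled, which is one way to see why this statistic describes style rather than the arrangement of objects.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy "feature map": height 4, width 5, 3 channels (hypothetical sizes)
h, w, c = 4, 5, 3
features = rng.normal(size=(h, w, c))

def gram(feature_map):
    """Channel-by-channel correlations, summed over all spatial positions."""
    a = feature_map.reshape(-1, feature_map.shape[-1])  # (h*w, c)
    return a.T @ a                                      # (c, c)

# shuffle the spatial positions: the layout of the "content" is destroyed ...
flat = features.reshape(-1, c)
shuffled = flat[rng.permutation(h * w)].reshape(h, w, c)

# ... but the Gram matrix, i.e. the style statistic, is unchanged
print(np.allclose(gram(features), gram(shuffled)))  # True
```

The content representation, in contrast, compares activations position by position and therefore does depend on the layout.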

In the [first main part](#2), the feature representations are extracted from the filters of chosen hidden layers and shown as images. The [third and final main part](#5) contains reconstructions of the style from the low-level feature representations, as they appeared in the original paper.

<a name="1"></a>
## 1 Preparations
### 1.1 Packages
Import the packages.


In [None]:
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import numpy as np
import tensorflow as tf
from tensorflow.python.framework.ops import EagerTensor
import pprint
%matplotlib inline

When working on Colab, data are loaded from and saved to Google Drive, so mount Google Drive to get access to the pretrained model and the images.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
path = 'drive/MyDrive/2108-transferstyle/'
os.chdir( path )
print( os.getcwd() )
print( os.listdir() )

### 1.2 Load pre-trained nn
The model used in the original paper (VGG19) is loaded here. The image size is defined, and the model is set to be untrainable.

In [None]:
# to get reproducible random numbers, set the seed manually
#tf.random.set_seed( 272 ) 

# image dimensions
IMG_WIDTH = 600
IMG_HEIGHT = 400

pp = pprint.PrettyPrinter( indent=4 )

# load pre-trained vgg19 without the dense layers on top
# change weights argument to 'imagenet' to download weights
vgg = tf.keras.applications.VGG19(
        include_top=False, input_shape=( IMG_HEIGHT, IMG_WIDTH, 3),
        weights='vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5')

vgg.trainable = False
pp.pprint(vgg)

Retrieve the model's architecture details. The names of the layers are needed as reference later to be able to extract the stylistic features of the images.

In [None]:
#for layer in vgg.layers:
#    print( layer.name )
vgg.summary()

### 1.3 Load images
Images are loaded and preprocessed: their shape needs to agree with the model set up above, and they are converted into tensors.

In [None]:
def preprocess_image( file_name, height, width ):
  """ 
  preprocess images: here loading and scaling only
  Arguments:
    file_name - including path
    height    - height after scaling
    width     - width after scaling
  Returns:
    image -- a tf constant of shape (1, height, width, 3)
  """
  image = np.array( Image.open( file_name ).resize( ( width, height ) ) )
  image = tf.constant( np.reshape( image, ( (1,) + image.shape ) ) )

  return image

def plot_row( images, titles, num=3, n=3, J=np.zeros((1,0)) ):
  """ 
  plot images next to each other with titles
  Arguments:
    images - tuple of images or tensor of images
    titles - tuple of titles
    num - number of images to display
    n - maximum number of images in a row
    J - array of values of a function to display together with the images
        (typically the loss function at the end of the loops through the epochs)
  """
  fig = plt.figure( figsize=(16, 4) )
  #print( images.shape )
  for i in range( np.minimum( num, images.shape[0] ) ):
    ax = fig.add_subplot( 1, n, i+1 )
    if not tf.is_tensor( images ):
      imshow( images[i] )
    else:
      imshow( images[i].numpy() )
    ax.title.set_text( titles[i] )
  
  if J.size:
    if num == 3:
      plt.show()
      fig = plt.figure( figsize=(16, 4) )
    ax = fig.add_subplot( 1, n, num % 3 + 1 )
    plt.plot(J)
    ax.title.set_text( 'style loss' )

  plt.show()
  

In [None]:
# images located in the ggl drive directory
#content_image_name = 'SAM_0016_math.jpg'
content_image_name = 'SAM_0868.JPG'
#content_image_name = 'SAM_7044kl.jpg'
#style_image_name = 'monet-mohn.jpg'
style_image_name = 'van-gogh-noche-estrellada.jpg'

# preprocess images
content_image = preprocess_image( content_image_name, IMG_HEIGHT, IMG_WIDTH )
style_image = preprocess_image( style_image_name, IMG_HEIGHT, IMG_WIDTH )
print( type( content_image ) )

print(f"shapes of images:\n  content - {content_image.shape}\n    style - {style_image.shape}\n")
print( content_image.shape )
image_list = np.concatenate( [ content_image, style_image ], axis=0 )
print( type(image_list) )
plot_row( image_list, 
              [ 'Content_image', 'Style_image' ], 2 )

<a name="2"></a>
## 2 Exploring a layer's response to an image
The objective of this first main part is to visualise the activations that an image produces in different hidden layers. These feature maps are 4d tensors of shape (1, h, w, c): the very first dimension carries no information, since only a batch of size one is processed, and the remainder is an image of height h and width w with c channels, i.e. as many channels as filters are used to process the input signal. They are shown as c single-channel images arranged side by side as one panorama image.
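As a small illustration of the panorama layout, the following NumPy sketch arranges the channels of a made-up activation tensor side by side. The tensor values and sizes are hypothetical, and the standardisation applied later in the notebook is left out here.

```python
import numpy as np

# hypothetical activation of one layer: batch of 1, 4 x 6 pixels, 3 channels
feature_map = np.arange(1 * 4 * 6 * 3, dtype="float32").reshape(1, 4, 6, 3)

_, n_H, n_W, n_C = feature_map.shape

# drop the batch axis and put the channels side by side:
# (n_H, n_W, n_C) -> (n_H, n_C, n_W) -> (n_H, n_C * n_W)
panorama = feature_map[0].transpose(0, 2, 1).reshape(n_H, n_C * n_W)

print(panorama.shape)  # (4, 18)

# the block of columns belonging to channel 1 equals that channel's image
print(np.array_equal(panorama[:, n_W:2 * n_W], feature_map[0, :, :, 1]))  # True
```

The transpose-and-reshape is just a compact alternative to filling the grid channel by channel in a loop; both produce the same layout.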

First, define helper functions which extract any named layer from the pre-trained model. These layers are used to access the hidden layer's activations.

In [None]:
def get_layer_outputs( model, layer_names ):
  """ little helper function
  retrieve list of layer outputs 

  Arguments:
    model -- a tensorflow model, supposed to be the pre-trained one
    layer_names -- a list of layer names, supposed to be contained in the model

  Returns:
    layer_outputs -- list of tensors
  """
  if len( layer_names[0] ) == 2:
    layer_names = [ x for (x,_) in layer_names ]
  layer_outputs = [ layer.output for layer in model.layers if layer.name in layer_names ] 
  assert len( layer_names ) == len( layer_outputs ), "Did not find all layers in the model. Compare layer names and model's summary."

  return layer_outputs

def create_layer_output_model( model, outputs ):
  """ little helper function
  creates model that returns a list of intermediate output values. (not used)
    
  Arguments:
    model -- a tensorflow model, supposed to be the pre-trained one
    outputs -- a list of tensors representing the desired hidden layers to watch
  Returns:
     model with the desired outputs
  """
    
  return tf.keras.Model( inputs = [ model.input ], outputs = outputs)

Choose the layers by their names and define a helper model that outputs exactly the chosen layers.

In [None]:
# choose the layers of interest (check the summary of the model)
VISUALIZED_LAYERS = [ 'block1_conv1', 'block1_conv2', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1' ]
#VISUALIZED_LAYERS = [ 'block1_conv2', 'block2_conv2', 'block3_conv4', 'block4_conv4', 'block5_conv4' ]
#VISUALIZED_LAYERS = [ 'block1_conv1', 'block1_conv2', 'block2_conv1', 'block2_conv2' ]

# create list of the outputs of these layers
layer_outputs = get_layer_outputs( vgg, VISUALIZED_LAYERS )
pp.pprint( layer_outputs )

# create model
vgg_vis = tf.keras.Model( inputs = vgg.input, outputs = layer_outputs) #create_layer_output_model( vgg.input, layer_outputs )

Next comes the helper function for displaying the feature maps, which are assumed to be computed already. Observe that each single feature map is standardised.
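Why standardise each map? A minimal sketch, assuming the same recipe as in the notebook (zero mean, unit standard deviation, then scale by 64 and shift by 128): two maps with the same pattern but very different magnitudes end up as the same displayable image. The function name `standardize_to_uint8` and the toy arrays are made up for this illustration.

```python
import numpy as np

def standardize_to_uint8(channel):
    """Return a displayable copy; the input array is left untouched."""
    z = channel.astype("float64")      # astype returns a copy by default
    z -= z.mean()
    s = z.std()
    if s > 0:                          # a constant map cannot be rescaled
        z /= s
    return np.clip(64 * z + 128, 0, 255).astype("uint8")

weak = np.array([[0.0, 0.1], [0.2, 0.3]])   # small activations
strong = 1000 * weak                        # same pattern, larger scale

# both maps are rendered identically after standardisation
print(np.array_equal(standardize_to_uint8(weak), standardize_to_uint8(strong)))  # True
```

Without this step, layers with small activations would appear almost uniformly dark next to layers with large ones.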

In [None]:
def show_layer_representations( feature_maps, layer_names ):
  """
  Display the feature maps

  Arguments:
    feature_maps -- predictions of the model's hidden layers
    layer_names  -- names of the layers (list of str)
  """
  for layer_name, feature_map in zip( layer_names, feature_maps ):
    
    # show features of convolution and pooling layers only
    if len( feature_map.shape ) == 4:
      # shape of feature map is ( 1, h, w, n )
      n_C = feature_map.shape[-1]
      n_H, n_W = feature_map.shape[ 1:3 ]
      # display features in matrix
      display_grid = np.zeros( ( n_H, n_W * n_C) )
      # fill in processed features
      for i in range( n_C ):
        display_grid[ :, i * n_W : (i + 1) * n_W ] = feature_map_standardization( feature_map[0, :, :, i] )
      # Display the grid
      scale = 200. / n_C
      plt.figure( figsize=( scale * n_C, scale ) )
      plt.title( layer_name )
      #plt.grid( False )
      plt.imshow(display_grid, aspect='auto', cmap='magma')
  
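def feature_map_standardization( feature_slice ):
  """ little helper's helper function
  Standardize and scale a feature map

  Arguments:
    feature_slice -- slice of a feature map,
                     np.ndarray ( h, w )
  
  Returns:
    clipped np.ndarray ( h, w ) dtype uint8
  """
  # work on a float copy: the in-place operations below must not alter
  # the caller's feature map (and the name `slice` would shadow the builtin)
  feature_slice = np.array( feature_slice, dtype='float32' )
  feature_slice -= feature_slice.mean()
  s = feature_slice.std()
  if s > 0: # avoid dividing by 0 when the map is constant
    feature_slice /= s
  feature_slice *= 64
  feature_slice += 128
  return np.clip( feature_slice, 0, 255 ).astype( 'uint8' )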


In [None]:
feature_maps = vgg_vis.predict( content_image )
show_layer_representations( feature_maps=feature_maps, layer_names=VISUALIZED_LAYERS )


<a name="3"></a>
## 3 Little helpers: Cost function
Define the cost functions given in the original paper. The computations are done in several steps:
0.   Define style layers and their weights
1.   Compute cost due to changes of the content
2.   Compute cost due to the differences in the Gram matrices of a given layer
3.   Compute Gram matrix
4.   Compute cost due to different styles
5.   Compute total cost
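
Written out, the steps listed above correspond to the following formulas. They are a transcription of the code in this section rather than a quotation from the paper. The symbols $\lambda_l$ (the layer weights in `STYLE_LAYERS`), $A$ (the unrolled activations of one layer), $\mathcal{G}$ (Gram matrix) and the layer superscript $[l]$ are notation chosen for this summary; $n_H$, $n_W$, $n_C$ are the height, width and number of channels of the respective layer.

$$
J_{\text{content}} = \frac{1}{4\, n_H n_W n_C} \sum_{i,j,k} \left( a^{(C)}_{ijk} - a^{(G)}_{ijk} \right)^2
$$

$$
\mathcal{G} = A A^\top, \qquad A \in \mathbb{R}^{n_C \times n_H n_W}
$$

$$
J^{[l]}_{\text{style}} = \frac{1}{\left( 2\, n_H n_W n_C \right)^2} \sum_{i,j=1}^{n_C} \left( \mathcal{G}^{(S)}_{ij} - \mathcal{G}^{(G)}_{ij} \right)^2
$$

$$
J_{\text{style}} = \sum_{l} \lambda_l \, J^{[l]}_{\text{style}}, \qquad
J = \alpha \, J_{\text{content}} + \beta \, J_{\text{style}}
$$

With the defaults used in the code, $\lambda_l = 0.2$ for each of the five style layers, $\alpha = 10$ and $\beta = 40$.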




In [None]:
# choose style layers and their corresponding weights
STYLE_LAYERS = [
    ('block1_conv1', 0.2),
    ('block2_conv1', 0.2),
    ('block3_conv1', 0.2),
    ('block4_conv1', 0.2),
    ('block5_conv1', 0.2)]

@tf.function()
def compute_content_cost( content_output, generated_output ):
  """ little helper function
  Computes the content cost
    
  Arguments:
    content_output   -- list of layer activations of the content image C;
                        the last entry is the content layer,
                        a tensor of shape (1, n_H, n_W, n_C)
    generated_output -- list of layer activations of the generated image G,
                        same layout as content_output
    
  Returns: 
    J_content -- scalar, loss due to content
  """
  a_C = content_output[-1]
  a_G = generated_output[-1]
    
  # dimensions from a_G
  m, n_H, n_W, n_C = a_G.get_shape().as_list()
    
  # reshape a_C and a_G
  a_C_unrolled = tf.transpose( tf.reshape( a_C, shape=[m, -1, n_C] ), perm=[0,2,1] )
  a_G_unrolled = tf.transpose( tf.reshape( a_G, shape=[m, -1, n_C] ), perm=[0,2,1] )
    
  # compute the cost with tensorflow
  J_content =  tf.reduce_sum( tf.square( tf.subtract( a_C_unrolled, a_G_unrolled ) ) ) / ( 4 * n_H*n_W*n_C )
    
  return J_content

def compute_layer_style_cost(a_S, a_G):
  """ little helper function
  Compute style cost of a given layer

  Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S 
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G
    
  Returns: 
    J_style_layer -- tensor representing a scalar value, style cost
  """
  m, n_H, n_W, n_C = a_G.get_shape().as_list()
  #print( a_G.get_shape().as_list() )
    
  # reshape images into shape ( n_C, n_H * n_W )
  a_S = tf.squeeze( tf.transpose( tf.reshape( a_S, shape=[-1, n_H*n_W, n_C] ), perm=[2,1,0] ), axis=2 )
  a_G = tf.squeeze( tf.transpose( tf.reshape( a_G, shape=[-1, n_H*n_W, n_C] ), perm=[2,1,0] ), axis=2 )
    
  # computing gram_matrices
  GS = gram_matrix( a_S )
  GG = gram_matrix( a_G )
    
  # computing the loss
  J_style_layer = tf.reduce_sum( tf.square( tf.subtract( GS, GG ) ) ) / ( 2 * n_H*n_W*n_C )**2
    
  return J_style_layer

def gram_matrix(A):
  """ little helper's helper function
    
  Arguments:
    A -- matrix of shape (n_C, n_H*n_W)
    
  Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
  """  
  GA = tf.matmul( A, tf.transpose( A ) )

  return GA

@tf.function
def compute_style_cost( style_image_output, generated_image_output, style_layers=STYLE_LAYERS ):
  """ little helper function
  Compute the overall style cost from several chosen layers
    
  Arguments:
    style_image_output -- our tensorflow model's hidden layers representations
    generated_image_output -- our tensorflow model's hidden layers representations
    style_layers -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them
    
  Returns: 
    J_style -- tensor representing a scalar value, style cost

  Additional remark:
    the observed layers are the style layers and the content layer (whose feature map is removed first)
  """
  # initialize
  J_style = 0

  # hidden layer activations of the style layers (the last element belongs to the content layer and is not used)
  a_S = style_image_output[ :-1 ]

  # hidden layers activation (same here)
  a_G = generated_image_output[ :-1 ]

  # collect single layer's costs and sum up
  for i, weight in zip( range( len(a_S) ), style_layers ):  
      J_style_layer = compute_layer_style_cost( a_S[i], a_G[i] )
      J_style += weight[1] * J_style_layer

  return J_style

@tf.function
def total_cost( J_content, J_style, alpha = 10, beta = 40 ):
  """ little helper function
  Compute the total cost function
    
  Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost
    
  Returns:
    J -- total cost, J = alpha * J_content + beta * J_style
  """
  J = alpha * J_content + beta * J_style
  return J

<a name="4"></a>
## 4 Optimisation problem
First, helper functions to clip and convert images are defined, then the model is set up and the images are preprocessed. After that, a training step is defined as a function, and finally the iterations are performed.
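
The structure of such an optimisation, in which the pixels themselves are the trainable parameters, can be sketched in a few lines of NumPy. This is a toy stand-in and not the notebook's method: the loss is a simple squared difference to a random target instead of the VGG-based cost, plain gradient descent replaces Adam, and all names and the learning rate are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for "what the image should look like" (toy target, not a VGG loss)
target = rng.uniform(0.0, 1.0, size=(8, 8, 3))

# start from a noisy version of the target, clipped to the valid pixel range
image = np.clip(target + rng.uniform(0.0, 0.5, size=target.shape), 0.0, 1.0)

def loss_and_grad(img):
    """Toy loss: mean squared difference to the target, with its gradient."""
    diff = img - target
    return np.mean(diff ** 2), 2.0 * diff / diff.size

learning_rate = 50.0   # arbitrary value that suits this toy problem
losses = []
for step in range(20):
    loss, grad = loss_and_grad(image)
    losses.append(loss)
    # gradient step on the pixels, then clip back to [0, 1]
    image = np.clip(image - learning_rate * grad, 0.0, 1.0)

print(losses[0] > losses[-1])  # True
```

The two ingredients that carry over to the real training step are the gradient with respect to the image and the clipping to the unit interval after every update.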

In [None]:
def clip_0_1( image ):
  """
  truncate pixels in the tensor to be between 0 and 1
    
  Arguments:
    image -- tensor
    
  Returns:
    tensor
  """
  return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

def tensor_to_image( tensor, keepdims=False ):
  """
  converts the given tensor into a PIL image
    
  Arguments:
    tensor -- tensor
    keepdims -- boolean
    
  Returns:
    Image -- PIL image
  """
  tensor = tensor * 255
  tensor = np.array(tensor, dtype=np.uint8)
  if np.ndim(tensor) > 3:
      assert tensor.shape[0] == 1
      if not keepdims:
        tensor = tensor[0]
  return Image.fromarray(tensor)

The content representation is taken from the last convolutional layer (block5_conv4), and a helper model is created whose outputs are the previously defined style layers together with the content layer. For the computations, images are converted into float32 tensors. The initial generated image is simply a noisy content image.
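
The conversion and the noisy initialisation can be mimicked in NumPy as follows. This is a sketch under the assumption that the conversion of uint8 images to float32 amounts to a division by 255, so that pixel values lie in [0, 1]; the tiny image and the seed are made up.

```python
import numpy as np

rng = np.random.default_rng(272)

# hypothetical 2 x 2 RGB image with uint8 pixel values
content_uint8 = np.array([[[0, 128, 255], [10, 20, 30]],
                          [[200, 100, 50], [255, 255, 255]]], dtype="uint8")

# uint8 -> float32 in [0, 1]
content = content_uint8.astype("float32") / 255.0

# uniform noise in [0, 0.5), added and clipped back to [0, 1]
noise = rng.uniform(0.0, 0.5, size=content.shape).astype("float32")
generated = np.clip(content + noise, 0.0, 1.0)

print(float(content.min()), float(content.max()))  # 0.0 1.0
print(bool(np.all(generated >= content)))          # True
```

Since the noise is non-negative, it only brightens the image, and pixels that are already bright saturate at 1.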

Compare the content loss to the style loss. Since the generated image starts as a noisy version of the original content image, its content loss is quite small, whereas its style loss is quite large. The objective of interest is the weighted sum of these two losses. During optimisation, the style loss decreases at the cost of an increasing content loss, which is exactly what is desired.

In [None]:
# build model to retrieve the activations of the style layers and the content layer
CONTENT_LAYER = [('block5_conv4', 1)]
layer_outputs = get_layer_outputs( vgg, STYLE_LAYERS + CONTENT_LAYER )
pp.pprint( layer_outputs )
vgg_model_outputs = tf.keras.Model( vgg.input, layer_outputs )
print( vgg_model_outputs )

# preprocess content image and retrieve hidden layer's responses
print( type( content_image ) ) 
preprocessed_content =  tf.Variable(tf.image.convert_image_dtype( content_image, tf.float32 ) )
a_C = vgg_model_outputs( preprocessed_content )
print(type(a_C))

# add noise to content image and retrieve hidden layer's responses
noise = tf.random.uniform( tf.shape( preprocessed_content ), 0, 0.5 )
generated_image = tf.add( preprocessed_content, noise )
generated_image = tf.Variable( clip_0_1( generated_image ) )
print(type(generated_image))
a_G = vgg_model_outputs( generated_image )

# compute the content cost
J_content = compute_content_cost( a_C, a_G )
print(J_content)

# preprocess style image and retrieve hidden layer's responses
preprocessed_style =  tf.Variable( tf.image.convert_image_dtype( style_image, tf.float32 ) )
a_S = vgg_model_outputs( preprocessed_style )

# Compute the style cost
J_style = compute_style_cost( a_S, a_G, STYLE_LAYERS )
print(J_style)


Here, the training step function is defined. Note that the generated image needs to be a tf.Variable. Note also the following: the tf.function decorator lets the function be executed in graph mode, which allows a much faster run. On the first call, the Python body is traced once and translated into a TensorFlow computation graph. On each following call with compatible arguments, only the TensorFlow operations recorded in that graph are executed, but not the Python statements. As a consequence, changing Python-side state after the first call, for example handing a newly created tf.Variable to the already traced function, can yield an error.

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)

@tf.function()
def train_step( generated_image, a_S, a_C ):
  """
  train step for generating the style transferred image

  Arguments:
    generated_image -- tensor, the image to be optimised
    a_S -- list of tensors, model outputs of the style image
    a_C -- list of tensors, model outputs of the content image
  
  Returns:
    J -- loss
  """
  # let tf compute the gradient along computations for loss
  with tf.GradientTape() as tape:
    tape.watch( generated_image )

    # get hidden layer's responses of generated image
    a_G = vgg_model_outputs( generated_image )

    # style cost (compute_style_cost drops the content layer itself,
    # so the complete lists are passed)
    J_style = compute_style_cost( a_S, a_G )
    # content cost
    J_content = compute_content_cost( a_C, a_G )
    # total cost
    J = total_cost( J_content, J_style )

  grad = tape.gradient( J, generated_image )

  optimizer.apply_gradients( [ ( grad, generated_image ) ] )
  generated_image.assign( clip_0_1( generated_image ) )
  #print(type(generated_image))
  
  return J

A single train step, for test purposes only. Note that, for the curious reason just mentioned, after executing the following cell the cell defining the train step must be executed again before running the optimisation loop.

In [None]:
train_step( generated_image, a_S, a_C )

Optimisation loop. Make sure to execute the cell defining the train step right before.

In [None]:
def get_train_loop():
  #@tf.function()
  def train_loop( generated_image, a_S, a_C, epochs=1, intermediate=7, save_image=False ):
    """
    training loop for generating the style transferred image

    Arguments:
      generated_image -- tensor, the image to be optimised
      a_S -- list of tensors, model outputs of the style image
      a_C -- list of tensors, model outputs of the content image
      epochs -- number of epochs to run the optimisation procedure
      intermediate -- show intermediate results every ... steps
      save_image -- save intermediate images to disk
  
    Returns:
      nothing; the intermediate images are plotted
    """
    intermediate_images = np.zeros( (3, IMG_HEIGHT, IMG_WIDTH, 3), dtype='uint8' )
    intermediate_titles = ['0','0','0']
    pos = 0

    for i in range( epochs ):
      train_step( generated_image, a_S, a_C )
      if i % intermediate == 0:
          # convert once, then reuse for saving and plotting
          image = tensor_to_image( generated_image )
          if save_image:
            image.save(f"{content_image_name[:-4]}_{i}.jpg")
          intermediate_images[pos,:,:,:] = np.asarray( image )
          intermediate_titles[pos] = str(i)
          pos += 1
          if pos==3:
            pos = 0
            plot_row( intermediate_images, intermediate_titles, 3 )
    if pos in [1, 2]:
      plot_row( intermediate_images, intermediate_titles, num=pos )
    print( 'done' )
  return train_loop  

In [None]:
# run the optimisation and show the generated image every few epochs
# to restart the style transfer, re-create generated_image and re-run the train_step cell first
EPOCHS = 59

train = get_train_loop()
train( generated_image, a_S, a_C, epochs=EPOCHS, intermediate=7 )

Plot the result next to the style image and the original content image.

In [None]:
# Show the 3 images in a row
plot_row( np.concatenate([ content_image, style_image, 
                          tf.cast( generated_image*255, dtype='uint8') ] , axis=0), 
         [ 'Content_image', 'Style_image', 'Generated image' ], 3 )


<a name="5"></a>
## 5 Reconstruction of the style
Here the optimisation problem is tackled for the style loss only, more precisely for an increasing number of the chosen style layers, starting from the first one (similar to the original paper: at first the first chosen layer, then the first two chosen layers, and so on). The image on which the optimisation is run is initialised with pure noise, and once the given number of epochs is done, the result is handed over to be optimised with one additional layer.
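
The resulting schedule of layers can be listed with a few lines of plain Python. The sketch only reproduces the slicing over the layer list and does not run any optimisation; the names `schedule`, `active` and `total_weight` are introduced for this illustration.

```python
# layer names and weights as chosen for the style cost in this notebook
STYLE_LAYERS = [
    ('block1_conv1', 0.2),
    ('block2_conv1', 0.2),
    ('block3_conv1', 0.2),
    ('block4_conv1', 0.2),
    ('block5_conv1', 0.2)]

schedule = []
for n_L in range(len(STYLE_LAYERS)):
    # run n_L + 1 uses the first n_L + 1 layers
    active = STYLE_LAYERS[:n_L + 1]
    names = [name for name, _ in active]
    total_weight = sum(weight for _, weight in active)
    schedule.append(names)
    print(f"run {n_L + 1}: {len(names)} layer(s), total weight {total_weight:.1f}")
```

Note that the weights are used as they are and are not renormalised, so the total weight of the style loss grows from 0.2 in the first run to 1.0 in the last.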

The images show how structures emerge and in which way the layers contribute. At first, noise stays noise, but changes its colour. During the second run with two layers, grains emerge from the noise, giving the image some granularity. After the third run, the characteristic brushwork becomes visible. Finally, abstract structures appear in the style of the style image, but without showing the shape of any object.

First, the loss function is defined. It is almost identical to the compute_style_cost function apart from extended functionality (and could be coded more elegantly).

Up to now, this code runs in eager mode only; none of my attempts to run it in graph mode succeeded.

In [None]:
def get_compute_style_loss():
  #@tf.function
  def compute_style_loss( a_S, a_G, layers_with_weights ):
    """ little helper function
    compute the style loss of a given number of style layers

    Arguments:
      a_S -- response of vgg on the style image
      a_G -- response of vgg on the generated image
      layers_with_weights -- python list containing:
                            - the names of the layers the style is extracted from
                            - a coefficient for each of them
                            its length determines how many layers, counted from
                            the beginning of the model, are considered

    Returns:
      J_style -- loss of the style corresponding to the layers in layers_with_weights
    """
    # initialize as a tensorflow scalar so that the sum stays a tensor throughout
    J_style = tf.zeros( (), dtype=tf.float32 )
    
    # collect single layer's costs and sum up
    for i, weight in enumerate( layers_with_weights ):  
      J_style_layer = compute_layer_style_cost( a_S[i], a_G[i] )
      J_style += weight[1] * J_style_layer

    return J_style

  return compute_style_loss

Next, the train step function and the train loop function are created.

In [None]:
def get_style_train_step():  
  #@tf.function
  def style_train_step( generated_image, a_S, style_layers, loss ):
    """
    single optimization step for the style extraction

    Arguments:
    generated_image -- tensor, image that is optimized
    a_S -- list of tensors, model's response to style image
    style_layers -- python list containing:
                          - the names of the layers the style is extracted from
                          - a coefficient for each of them
    loss -- function that computes the loss

    Returns:
    J -- computed loss
    """
    
    with tf.GradientTape() as tape:
      # retrieve activations
      a_G = vgg_model_outputs( generated_image )

      # compute loss
      J = loss( a_S, a_G, style_layers )

    grad = tape.gradient( J, generated_image )

    # perform descent step and clip to unit interval
    optimizer.apply_gradients( [ ( grad, generated_image ) ] )
    generated_image.assign( clip_0_1( generated_image ) )
  
    return J
  return style_train_step

def get_style_train_loop():
  def style_train_loop( generated_image, a_S, style_layers, epochs=1, intermediate=7, save_image=False ):
    """
    optimization loop for the style extraction

    Arguments:
      generated_image -- tensor, image that is optimized
      a_S -- list of tensors, model's response to style image
      style_layers -- python list containing:
                            - the names of the layers the style is extracted from
                            - a coefficient for each of them
      epochs -- number of epochs to run
      intermediate -- after this number of steps the current generated image is shown
      save_image -- whether the intermediate images are saved to disk
    """

    # initialise the variables that hold the intermediate images and their titles
    intermediate_images = np.zeros( (3, IMG_HEIGHT, IMG_WIDTH, 3), dtype='uint8' )
    intermediate_titles = ['0','0','0']
    pos = 0
    J = np.zeros( (epochs,) )
    train = get_style_train_step()
    compute_style_loss = get_compute_style_loss()

    for i in range( epochs ):
      # perform a training step
      J[i] = train( generated_image, a_S, style_layers, compute_style_loss )
      # image handling
      if i % intermediate == 0:
          # convert once, then reuse for saving and plotting
          image = tensor_to_image( generated_image )
          if save_image:
            image.save(f"{style_image_name[:-4]}_extractedlayers{len(style_layers)}_{i}.jpg")
          intermediate_images[pos,:,:,:] = np.asarray( image )
          intermediate_titles[pos] = str(i)
          pos += 1
          if pos==3:
            pos = 0
            plot_row( intermediate_images, intermediate_titles, 3 )
    if pos in [1, 2]:
      plot_row( intermediate_images, intermediate_titles, num=pos, J=J )
    else:
      plot_row( intermediate_images, intermediate_titles, num=0, J=J )
    #print( f'loss in each step {J}' )
    print( 'done' )
  return style_train_loop  


Finally, the outer loop follows, which runs the training loop once for each number of style layers. Note that with 5 style layers more epochs are needed than with one or two style layers.

In [None]:
# run the style reconstruction and show the generated image every few epochs
EPOCHS = 43
INTERMEDIATE = 7

# prepare initial image to be purely noisy
noise = tf.random.uniform( tf.shape( preprocessed_style ), 0, 0.5 )
generated_image = tf.Variable( clip_0_1( noise ), trainable=True, dtype=tf.float32 )

# loop: optimise for the first n_L layers
for n_L in range( len( STYLE_LAYERS ) ):

  print( f"Optimize against {n_L+1} layer(s)" )
  style_train = get_style_train_loop()
  style_train( generated_image, a_S, STYLE_LAYERS[ :n_L+1 ], epochs=EPOCHS, intermediate=INTERMEDIATE )


That's it. The following blocks are kept for reference only and are not intended to be run.

In [None]:
class StyleTrain():
  def __init__( self, model_outputs, a_S, style_layers, n_L, cmp_layer_style_cost ):
    self._n_L = n_L
    self._optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)
    self._model_outputs = model_outputs
    self._a_S = a_S
    self._style_layers = style_layers
    #self.J_style = 0
    self.cmp_layer_style_cost = cmp_layer_style_cost

  #training step method
  @tf.function
  def __call__(self, generated_image ): #style_train_step( generated_image, n_L=1 ):
    print( "Inputs to train step: ", type( generated_image ), generated_image )
    #print("trainables ", tf.trainable_variables )
    with tf.GradientTape( ) as tape: #watch_accessed_variables=False ) as tape:
      tape.watch( generated_image )
      #tf.print( type( generated_image ) )
      # retrieve activations
      a_G = self._model_outputs( generated_image )
    
      # compute loss
      J = self.compute_style_loss( self._a_S, a_G )

    grad = tape.gradient( J, generated_image )
    tf.print( "after GradientTape: ", type( grad ) )
    self._optimizer.apply_gradients( [ ( grad, generated_image ) ] )
    #generated_image.assign( clip_0_1( generated_image ) )
  
    return J

  @tf.function
  def compute_style_loss( self, a_S, a_G ):
    """
    compute the style loss of the first n_L style layers

    Inputs:
    a_S -- response of vgg on the style image
    a_G -- response of vgg on the generated image

    Output:
    J_style -- loss of the style corresponding to the number of layers n_L
    """
    # initialize
    J_style = 0

    # collect single layer's costs and sum up
    # (the parameters now match the call in __call__ above)
    for i in range( self._n_L ):  
      J_style_layer = self.cmp_layer_style_cost( a_S[i], a_G[i] )
      J_style += self._style_layers[i][1] * J_style_layer

    return J_style

In [None]:
print( tf.autograph.to_code(style_train_step.python_function))

Training loop.

In [None]:
# run the style reconstruction with the StyleTrain class and show the generated image every few epochs
EPOCHS = 13
pos = 1
fig = plt.figure(figsize=(16, 4))
J = np.zeros( (EPOCHS,) )

noise = tf.random.uniform( tf.shape( preprocessed_style ), 0, 0.5 )
generated_image = tf.Variable( clip_0_1( noise ), trainable=True )

print(type(generated_image))
print( generated_image.get_shape().as_list())
print(type(generated_image[0]))

for n_L in range( 2 ): #len( STYLE_LAYERS ) ):
  #n = tf.Variable( n_L )
  print( f"Optimize against {n_L+1} layer(s)" )
  style_train_step = StyleTrain( vgg_model_outputs, a_S, STYLE_LAYERS, n_L+1, compute_layer_style_cost )
  for i in range( EPOCHS ):
    J[i] = style_train_step( generated_image )
    if i % 3 == 0:
        ax = fig.add_subplot( 1, 3, pos )
        image = tensor_to_image( generated_image )
        imshow( image )
        ax.title.set_text( f"{i}" )
        #image.save(f"{content_image_name[:-4]}_{i}.jpg")
        if pos==1:
          pos = 2
        elif pos==2:
          pos = 3
        else:
          pos = 1
          plt.show() 
          fig = plt.figure(figsize=(16, 4))

  ax = fig.add_subplot( 1, 3, pos )
  plt.plot( J )
  plt.show()