# Art Style Transfer

This notebook is a re-implementation of the algorithm described in "A Neural Algorithm of Artistic Style" (http://arxiv.org/abs/1508.06576) by Gatys, Ecker and Bethge. Additional details of their method are available at http://arxiv.org/abs/1505.07376 and http://bethgelab.org/deepneuralart/.

An image is generated which combines the content of a photograph with the "style" of a painting. This is accomplished by jointly minimizing the squared difference between feature activation maps of the photo and generated image, and the squared difference of feature correlation between painting and generated image. A total variation penalty is also applied to reduce high frequency noise. 
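
Written out, the objective described above looks roughly as follows. The notation is an assumption made here for illustration (it follows the Gatys et al. paper rather than anything defined in this notebook): $F^l$ and $P^l$ are the layer-$l$ feature maps of the generated image and of the photo, $G^l$ and $A^l$ are the Gram matrices of the generated image and of the painting, $N_l$ is the number of feature channels, $M_l$ the number of spatial positions, $x$ is the generated image, and $\alpha$, $\beta_l$, $\gamma$ are hand-tuned weights.

$$
\mathcal{L}_{total} = \alpha \, \mathcal{L}_{content} + \sum_l \beta_l \, \mathcal{L}^{l}_{style} + \gamma \, \mathcal{L}_{TV}
$$

$$
\mathcal{L}_{content} = \frac{1}{2} \sum_{i,k} \left( F^{l}_{ik} - P^{l}_{ik} \right)^2
\qquad
\mathcal{L}^{l}_{style} = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^2
\qquad
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}
$$

$$
\mathcal{L}_{TV} = \sum_{h,w,c} \left| x_{h+1,w,c} - x_{h,w,c} \right| + \left| x_{h,w+1,c} - x_{h,w,c} \right|
$$

The TensorFlow cells further down use means instead of sums for the content and style terms, which only rescales the effective weights.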

This notebook was originally sourced from [Lasagne Recipes](https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer), but has been modified to use a GoogLeNet network (pre-trained and pre-loaded) in TensorFlow, and given some features that make it easier to experiment with.

Other implementations : 
  *  https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/15_Style_Transfer.ipynb (with [video](https://www.youtube.com/watch?v=LoePx3QC5Js))
  *  https://github.com/cysmith/neural-style-tf
  *  https://github.com/anishathalye/neural-style

In [None]:
import tensorflow as tf

import numpy as np
import scipy
import scipy.misc  # for imresize
import scipy.optimize  # for fmin_l_bfgs_b

import matplotlib.pyplot as plt
%matplotlib inline

import time

from urllib.request import urlopen  # Python 3+ version (instead of urllib2)

import os # for directory listings
import pickle

AS_PATH='./images/art-style'

### Add TensorFlow Slim Model Zoo to path

In [None]:
import os, sys

tf_zoo_models_dir = './models/tensorflow_zoo'

if not os.path.exists(tf_zoo_models_dir):
    print("Creating %s directory" % (tf_zoo_models_dir,))
    os.makedirs(tf_zoo_models_dir)
if not os.path.isfile( os.path.join(tf_zoo_models_dir, 'models', 'README.md') ):
    print("Cloning tensorflow model zoo under %s" % (tf_zoo_models_dir, ))
    !cd {tf_zoo_models_dir}; git clone https://github.com/tensorflow/models.git

sys.path.append(tf_zoo_models_dir + "/models/slim")

print("Model Zoo model code installed")

### The Inception v1 (GoogLeNet) Architecture

![GoogLeNet Architecture](../../images/presentation/googlenet-arch_1228x573.jpg)

### Download the Inception V1 checkpoint

Functions for building the GoogLeNet model with TensorFlow / slim and preprocessing the images are defined in ```model.inception_v1_tf``` - which was downloaded from the TensorFlow / slim [Model Zoo](https://github.com/tensorflow/models/tree/master/slim).

Once the model zoo has been cloned by the cell above, the actual code for the ```slim``` model will be <a href="models/tensorflow_zoo/models/slim/nets/inception_v1.py" target=_blank>here</a>.

In [None]:
from datasets import dataset_utils

targz = "inception_v1_2016_08_28.tar.gz"
url = "http://download.tensorflow.org/models/"+targz
checkpoints_dir = './data/tensorflow_zoo/checkpoints'

if not os.path.exists(checkpoints_dir):
    os.makedirs(checkpoints_dir)

if not os.path.isfile( os.path.join(checkpoints_dir, 'inception_v1.ckpt') ):
    tarfilepath = os.path.join(checkpoints_dir, targz)
    if os.path.isfile(tarfilepath):
        import tarfile
        tarfile.open(tarfilepath, 'r:gz').extractall(checkpoints_dir)
    else:
        dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)
        
    # Get rid of tarfile source (the checkpoint itself will remain)
    os.unlink(tarfilepath)
        
print("Checkpoint available locally")
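
The "only unpack if the checkpoint is missing" pattern used above can be tried out in isolation. The following is a minimal sketch using only the standard library; `ensure_extracted` and the `demo.*` file names are hypothetical and made up for this illustration (they are not part of the model zoo).

```python
import os
import tarfile
import tempfile

def ensure_extracted(tar_path, target_dir, marker_name):
    """Extract tar_path into target_dir unless marker_name already exists there.

    Returns True if an extraction was performed, False if it was skipped.
    (Hypothetical helper, for illustration only.)
    """
    if os.path.isfile(os.path.join(target_dir, marker_name)):
        return False
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(tar_path, 'r:gz') as tar:
        tar.extractall(target_dir)
    return True

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for the checkpoint tarball
    source = os.path.join(tmp, 'demo.ckpt')
    with open(source, 'w') as f:
        f.write('weights')
    archive = os.path.join(tmp, 'demo.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(source, arcname='demo.ckpt')

    out_dir = os.path.join(tmp, 'checkpoints')
    first = ensure_extracted(archive, out_dir, 'demo.ckpt')   # unpacks
    second = ensure_extracted(archive, out_dir, 'demo.ckpt')  # skipped
    print(first, second)
```

Checking for the extracted file rather than the tarball is what lets the notebook delete the tarball afterwards without triggering another download.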

In [None]:
slim = tf.contrib.slim

from nets import inception
from preprocessing import inception_preprocessing

image_size = inception.inception_v1.default_image_size

IMAGE_W=224
image_size

In [None]:
#from model import googlenet
#
#net = googlenet.build_model()
#net_input_var = net['input'].input_var
#net_output_layer = net['prob']

In [None]:
#params = pickle.load(open('./data/googlenet/blvc_googlenet.pkl', 'rb'), encoding='iso-8859-1')
#model_param_values = params['param values']
##classes = params['synset words']
#lasagne.layers.set_all_param_values(net_output_layer, model_param_values)
#print("Loaded Model parameters")

In [None]:
def prep_image(im):
    if len(im.shape) == 2:
        im = im[:, :, np.newaxis]
        im = np.repeat(im, 3, axis=2)
        
    # Resize so smallest dim = 224, preserving aspect ratio
    h, w, _ = im.shape
    if h < w:
        im = scipy.misc.imresize(im, (224, int(w*224/h)))
    else:
        im = scipy.misc.imresize(im, (int(h*224/w), 224))

    # Central crop to 224x224
    h, w, _ = im.shape
    im = im[h//2-112:h//2+112, w//2-112:w//2+112]
    
    rawim = np.copy(im).astype('uint8')
    return rawim, im
    
    # Shuffle axes to c01
    #im = np.swapaxes(np.swapaxes(im, 1, 2), 0, 1)
    
    # Convert to BGR
    #im = im[::-1, :, :]

    #MEAN_VALUES = np.array([104, 117, 123]).reshape((3,1,1))
    #im = im - MEAN_VALUES
    #return rawim, floatX(im[np.newaxis])
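
To check the resize-then-crop arithmetic without depending on `scipy.misc.imresize` (which has been removed from recent SciPy releases), here is a NumPy-only sketch. Note the swap: `resize_nearest` is a hypothetical nearest-neighbour stand-in written for this example, so the pixel values will differ from what `imresize` produces; only the shape logic mirrors `prep_image`.

```python
import numpy as np

def resize_nearest(im, new_h, new_w):
    """Nearest-neighbour resize via index arithmetic (stand-in for imresize)."""
    h, w = im.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return im[rows][:, cols]

def prep_image_sketch(im, size=224):
    # Greyscale -> 3 identical channels
    if im.ndim == 2:
        im = np.repeat(im[:, :, np.newaxis], 3, axis=2)

    # Resize so the smallest dimension equals `size`, preserving aspect ratio
    h, w = im.shape[:2]
    if h < w:
        im = resize_nearest(im, size, int(w * size / h))
    else:
        im = resize_nearest(im, int(h * size / w), size)

    # Central crop to size x size
    h, w = im.shape[:2]
    half = size // 2
    return im[h // 2 - half:h // 2 + half, w // 2 - half:w // 2 + half]

landscape = np.random.randint(0, 256, size=(300, 500, 3), dtype=np.uint8)
portrait_grey = np.zeros((100, 80), dtype=np.uint8)

out_landscape = prep_image_sketch(landscape)
out_portrait = prep_image_sketch(portrait_grey)
print(out_landscape.shape, out_portrait.shape)
```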


### Choose the Photo to be *Enhanced*


In [None]:
photos = [ '%s/photos/%s' % (AS_PATH, f) for f in os.listdir('%s/photos/' % AS_PATH) if not f.startswith('.')]
photo_i=-1 # will be incremented in next cell (i.e. to start at [0])

Each execution of the cell below advances to the next image in the ```./images/art-style/photos``` directory, so re-run it until the one you want is shown.

In [None]:
photo_i += 1
photo = plt.imread(photos[photo_i % len(photos)])
photo_rawim, photo = prep_image(photo)
plt.imshow(photo_rawim)

### Choose the photo with the required 'Style'

In [None]:
styles = [ '%s/styles/%s' % (AS_PATH, f) for f in os.listdir('%s/styles/' % AS_PATH) if not f.startswith('.')]
style_i=-1 # will be incremented in next cell (i.e. to start at [0])

Each execution of the cell below advances to the next image in the ```./images/art-style/styles``` directory, so re-run it until the one you want is shown.

In [None]:
style_i += 1
style = plt.imread(styles[style_i % len(styles)])
style_rawim, style = prep_image(style)
plt.imshow(style_rawim)

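The next cell defines a helper that plots the photo, the style image and the current combined image side by side. The cell after it keeps (commented out) the original Theano versions of the measures of difference that we'll use to compare the current output image with the original sources; the TensorFlow versions are defined further down.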

In [None]:
def plot_layout(combined):
    def no_axes():
        plt.gca().xaxis.set_visible(False)    
        plt.gca().yaxis.set_visible(False)    
        
    plt.figure(figsize=(9,6))

    plt.subplot2grid( (2,3), (0,0) )
    no_axes()
    plt.imshow(photo_rawim)

    plt.subplot2grid( (2,3), (1,0) )
    no_axes()
    plt.imshow(style_rawim)

    plt.subplot2grid( (2,3), (0,1), colspan=2, rowspan=2 )
    no_axes()
    plt.imshow(combined, interpolation='nearest')

    plt.tight_layout()

In [None]:
#def gram_matrix(x):
#    x = x.flatten(ndim=3)
#    g = T.tensordot(x, x, axes=([2], [2]))
#    return g

#def content_loss(P, X, layer):
#    p = P[layer]
#    x = X[layer]
#    
#    loss = 1./2 * ((x - p)**2).sum()
#    return loss

#def style_loss(A, X, layer):
#    a = A[layer]
#    x = X[layer]
#    
#    A = gram_matrix(a)
#    G = gram_matrix(x)
#    
#    N = a.shape[1]
#    M = a.shape[2] * a.shape[3]
#    
#    loss = 1./(4 * N**2 * M**2) * ((G - A)**2).sum()
#    return loss

#def total_variation_loss_l125(x):
#    return (((x[:,:,:-1,:-1] - x[:,:,1:,:-1])**2 + (x[:,:,:-1,:-1] - x[:,:,:-1,1:])**2)**1.25).sum()

### Precompute layer activations for photo and artwork 
This takes ~ 20 seconds

In [None]:
tf.reset_default_graph()

# This creates an image 'placeholder'
input_image = tf.placeholder(tf.uint8, shape=[None, None, 3], name='input_image')
#input_image_var = tf.Variable(tf.zeros([image_size,image_size,3], dtype=tf.uint8), name='input_image_var' )

# This will convert uint8(0...255) to float32(0.0...1.0)
input_image_float = tf.cast( input_image, tf.float32 ) / 255.0

# Define the pre-processing chain within the graph - based on the input 'image' above
#processed_image = inception_preprocessing.preprocess_image(input_image, image_size, image_size, is_training=False)
processed_image = inception_preprocessing.preprocess_for_eval(input_image_float, 0, 0, central_fraction=None)
processed_images = tf.expand_dims(processed_image, 0)

# Reverse out some of the transforms, so we can see the area/scaling of the inception input
numpyish_image = tf.multiply(processed_image, 0.5)
numpyish_image = tf.add(numpyish_image, 0.5)
numpyish_image = tf.multiply(numpyish_image, 255.0)

# Create the model - which uses the above pre-processing on image
#   it also uses the default arg scope to configure the batch norm parameters.
print("Model builder starting")

# Here is the actual model zoo model being instantiated :
with slim.arg_scope(inception.inception_v1_arg_scope()):
    logits, end_points = inception.inception_v1(processed_images, num_classes=1001, is_training=False)
#probabilities = tf.nn.softmax(logits)

# Create an operation that loads the pre-trained model from the checkpoint
init_fn = slim.assign_from_checkpoint_fn(
    os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
    slim.get_model_variables('InceptionV1')
)
#init_image = input_image_var.assign( input_image )

print("Model defined")
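
The scaling that the graph applies to the image, and the `numpyish_image` reversal, can be mirrored in plain NumPy. This is a sketch under the assumption that `preprocess_for_eval`, when called as above (height and width of 0, `central_fraction=None`), reduces to subtracting 0.5 and multiplying by 2; the function names below are made up for the illustration.

```python
import numpy as np

def to_inception_range(image_uint8):
    """uint8 [0, 255] -> float32 [-1, 1] (assumed Inception input range)."""
    unit = image_uint8.astype(np.float32) / 255.0   # [0, 1]
    return (unit - 0.5) * 2.0                       # [-1, 1]

def from_inception_range(x):
    """Reverse the scaling, as the numpyish_image ops do: *0.5, +0.5, *255."""
    return (x * 0.5 + 0.5) * 255.0

pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
scaled = to_inception_range(pixels)
restored = from_inception_range(scaled)
print(scaled, restored)
```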

In [None]:
#dir(slim.get_model_variables('InceptionV1')[10])
#[ v.name for v in slim.get_model_variables('InceptionV1') ]
sorted(end_points.keys())
#dir(end_points['Mixed_4b'])
#end_points['Mixed_4b'].name

So that gives us a palette of GoogLeNet layers from which we can choose the ones to pay attention to :

In [None]:
layer_names = [
    # used for 'content' in photo - a mid-tier convolutional layer 
    'Mixed_4b', #Theano : 'inception_4b/output', 
    
    # used for 'style' - conv layers throughout model (not same as content one)
    'Conv2d_1a_7x7', #Theano : 'conv1/7x7_s2',        
    'Conv2d_2c_3x3', #Theano : 'conv2/3x3', 
    'Mixed_3b', #Theano : 'inception_3b/output',  
    'Mixed_4d', #Theano : 'inception_4d/output',
]

# Different set of layers, for experimentation
#layers = [  
#    # used for 'content' in photo - a mid-tier convolutional layer 
#    'pool4/3x3_s2', 
#    
#    # used for 'style' - conv layers throughout model (not same as content one)
#    'conv1/7x7_s2', 'conv2/3x3', 'pool3/3x3_s2', 'inception_5b/output',
#]

#layers = { k: end_points[k] for k in layer_names }

Let's grab (constant) values for all the ```layer_names``` for the original photo, and the style image :

In [None]:
# Now let's run the pre-trained model on the photo and the style
style_features={}
photo_features={}
#layer_shapes={}

with tf.Session() as sess:
    # This is the loader 'op' we defined above
    init_fn(sess)  
    #init_image(sess)
    
    #style_layers_np = sess.run([ end_points[k] for k in layer_names ], feed_dict={input_image: style})
    #with tf.variable_scope('style', reuse=True):
    #    for i,l in enumerate(style_layers_np):
    #        style_features[ layer_names[i] ] = tf.get_variable(layer_names[i], l.shape, 
    #                                                       initializer=tf.constant_initializer(l))
    
    # This is two ops : one merely loads the image from numpy, 
    #   the other runs the network to get the layer outputs
    
    #sess.run( input_image_var.assign(style) ) 
    style_layers_np = sess.run([ end_points[k] for k in layer_names ], feed_dict={input_image: style})
    
    for i,l in enumerate(style_layers_np):
        style_features[ layer_names[i] ] = l


    #sess.run( input_image_var.assign(photo) ) 
    photo_layers_np = sess.run([ end_points[k] for k in layer_names ], feed_dict={input_image: photo})
    
    for i,l in enumerate(photo_layers_np):
        photo_features[ layer_names[i] ] = l
    
    
    # Print summary of the layers we're capturing
    print("Number of layers to capture as constants : %d" % (len(style_layers_np),))
    for i,l in enumerate(style_layers_np):
        print("  Layer[%d].shape=%s, .name = '%s'" % (i, str(l.shape), layer_names[i],))


Here is what each of the layers sees (photo on the top, style on the bottom for each set) :

In [None]:
for name in layer_names:
    print("Layer Name : '%s'" % (name,))
    plt.figure(figsize=(12,6))
    for i in range(4):
        plt.subplot(2, 4, i+1)
        plt.imshow(photo_features[ name ][0, :, :, i], interpolation='nearest') # , cmap='gray'
        plt.axis('off')
        
        plt.subplot(2, 4, 4+i+1)
        plt.imshow(style_features[ name ][0, :, :, i], interpolation='nearest') #, cmap='gray'
        plt.axis('off')
    plt.show()

### Define the overall loss / badness function

Let's now create the model losses, which compare the ```end_points``` evaluated from the generated image with the corresponding constant layer values captured above : 

In [None]:
# Initialise the initial 'art' image to being the photo (quicker convergence)
#art_image = np.random.uniform(-128, 128, (1, 3, IMAGE_W, IMAGE_W) ) 
#art_image = photo

art_features = {}
for name in layer_names:  
    art_features[name] = end_points[name]

In [None]:
def gram_matrix(tensor):
    shape = tensor.get_shape()
    
    # Get the number of feature channels for the input tensor,
    # which is assumed to be from a convolutional layer with 4-dim.
    num_channels = int(shape[3])

    # Reshape the tensor so it is a 2-dim matrix. This essentially
    # flattens the contents of each feature-channel.
    matrix = tf.reshape(tensor, shape=[-1, num_channels])
    
    # Calculate the Gram-matrix as the matrix-product of
    # the 2-dim matrix with itself. This calculates the
    # dot-products of all combinations of the feature-channels.
    gram = tf.matmul(tf.transpose(matrix), matrix)
    return gram

def content_loss(P, X, layer):
    p = tf.constant( P[layer] )
    x = X[layer]
    
    loss = 1./2 * tf.reduce_mean(tf.square(x - p))
    return loss

def style_loss(S, X, layer):
    s = tf.constant( S[layer] )
    x = X[layer]
    
    S_gram = gram_matrix(s)
    X_gram = gram_matrix(x)
    
    #layer_shape = layer_shapes[layer]
    layer_shape = s.get_shape()
    N = layer_shape[1]
    M = layer_shape[2] * layer_shape[3]
    
    loss = tf.reduce_mean(tf.square(X_gram - S_gram)) / (4. * tf.cast( tf.square(N) * tf.square(M), tf.float32))
    return loss

#def create_denoise_loss(model):
def total_variation_loss_l1(x):
    loss = tf.add( 
            tf.reduce_sum(tf.abs(x[1:,:,:] - x[:-1,:,:])), 
            tf.reduce_sum(tf.abs(x[:,1:,:] - x[:,:-1,:]))
           )
    return loss
    #return tf.cast(loss, dtype=tf.float32)
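
Why does the Gram matrix capture 'style' rather than layout? One way to see it is that it sums over all spatial positions, so shuffling those positions leaves it unchanged. Here is a small NumPy sketch of that property; `gram_matrix_np` is a stand-in written for this example that mirrors the reshape-and-matmul in `gram_matrix` above, and the random 'features' are made up.

```python
import numpy as np

def gram_matrix_np(features):
    """features: array of shape [1, H, W, C]; returns the [C, C] Gram matrix."""
    channels = features.shape[3]
    matrix = features.reshape(-1, channels)   # one row per spatial position
    return matrix.T @ matrix                  # channel-by-channel dot products

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 4, 5, 3))

# Shuffle the spatial positions, keeping each position's channel vector intact
flat = features.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(features.shape)

gram_original = gram_matrix_np(features)
gram_shuffled = gram_matrix_np(shuffled)

print(gram_original.shape)
print(np.allclose(gram_original, gram_shuffled))
```

The content loss, by contrast, compares activations position by position, so it does change when the layout changes.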

In [None]:
# And here are some more TF nodes, to compute the losses using the layer values 'saved off' earlier
losses = []

# content loss
cl = 10.
losses.append( cl *100. * content_loss(photo_features, art_features, 'Mixed_4b'))

# style loss
sl = 20.
losses.append(sl *         style_loss(style_features, art_features, 'Conv2d_1a_7x7'))
losses.append(sl *100.   * style_loss(style_features, art_features, 'Conv2d_2c_3x3'))
losses.append(sl *10000. * style_loss(style_features, art_features, 'Mixed_3b'))
losses.append(sl *10000. * style_loss(style_features, art_features, 'Mixed_4d'))

# total variation penalty
vp = 0.01 /1000.
losses.append(vp *10.    * total_variation_loss_l1(input_image_float))

#total_loss = content_loss(photo_features, art_features, 'Mixed_4b')
#total_loss = style_loss(style_features, art_features, 'Conv2d_2c_3x3')
#total_loss = total_variation_loss_l1(input_image_float)

#['0.000372', '0.681764', '0.002374', '0.000064', '0.000004', '0.074681']
#['2.715543', '0.701646', '0.223764', '0.492285', '0.031124', '0.749490']

total_loss = tf.reduce_sum(losses)
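
To get a feel for what the total variation term in the cell above penalises, here is a NumPy sketch of the same L1 formula applied to three made-up images: a flat one, a smooth ramp and a noisy one. `total_variation_l1_np` is a stand-in written for this example, not the TensorFlow op itself.

```python
import numpy as np

def total_variation_l1_np(x):
    """x: array [H, W, C]; sum of absolute differences between neighbours."""
    vertical = np.abs(x[1:, :, :] - x[:-1, :, :]).sum()
    horizontal = np.abs(x[:, 1:, :] - x[:, :-1, :]).sum()
    return vertical + horizontal

flat_image = np.full((8, 8, 3), 0.5)

# Horizontal ramp: each pixel is 1.0 brighter than its left neighbour
ramp_image = np.tile(np.arange(8.0)[np.newaxis, :, np.newaxis], (8, 1, 3))

rng = np.random.default_rng(0)
noisy_image = flat_image + rng.normal(scale=0.1, size=flat_image.shape)

tv_flat = total_variation_l1_np(flat_image)
tv_ramp = total_variation_l1_np(ramp_image)
tv_noisy = total_variation_l1_np(noisy_image)
print(tv_flat, tv_ramp, tv_noisy)
```

The flat image scores zero, and the ramp scores exactly its total rise (7 steps x 8 rows x 3 channels), while pixel-level noise adds a difference at almost every neighbouring pair, which is why this term discourages high frequency noise.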

### The *Famous* Symbolic Gradient operation

In [None]:
total_grad = tf.gradients(total_loss, [input_image_float])[0] / 255.0  # Needs scaling due to initial /255.0
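
The `/ 255.0` above is the chain rule at work: if the loss is defined on `x / 255`, then its gradient with respect to the raw pixel values `x` is 255 times smaller. Below is a sketch that checks this numerically on a toy quadratic loss (the loss and the pixel values are made up for the illustration; this is not the style-transfer loss).

```python
import numpy as np

def loss_on_unit_range(u):
    # Toy loss on an 'image' scaled to [0, 1]
    return 0.5 * np.sum((u - 0.25) ** 2)

def grad_on_unit_range(u):
    return u - 0.25

def loss_on_pixels(x):
    # Same loss, but as a function of raw pixel values in [0, 255]
    return loss_on_unit_range(x / 255.0)

def grad_on_pixels(x):
    # Chain rule : d/dx f(x/255) = f'(x/255) / 255
    return grad_on_unit_range(x / 255.0) / 255.0

x = np.array([0.0, 64.0, 128.0, 255.0])

# Central finite differences, one coordinate at a time
eps = 1e-3
numeric = np.array([
    (loss_on_pixels(x + eps * e) - loss_on_pixels(x - eps * e)) / (2 * eps)
    for e in np.eye(len(x))
])
analytic = grad_on_pixels(x)
print(np.allclose(numeric, analytic))
```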

### Get Ready for Optimisation by SciPy

This uses the L-BFGS-B routine (```scipy.optimize.fmin_l_bfgs_b```) : 
  *  R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound Constrained Optimization, (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190-1208.

Initialize with the original ```photo```, since going from noise (the code that's commented out) takes many more iterations.

In [None]:
art_image = photo
#art_image = np.random.uniform(0, 255, (image_size, image_size, 3))

x0 = art_image.flatten().astype('float64')
iteration=0

### Optimize all those losses, and show the image

To refine the result, just keep hitting 'run' on this cell (each iteration is about 60 seconds) :

In [None]:
t0 = time.time()

with tf.Session() as sess:
    init_fn(sess)
    
    # These helper functions (to interface with scipy.optimize) must close over sess
    def eval_loss(x):  # x is a 3*image_size*image_size float64 vector
        #print("Eval Loss @ ", x)
        x_image = x.reshape(image_size,image_size,3).astype('uint8')
        x_loss = sess.run( total_loss, feed_dict={input_image: x_image} )
        #print("Eval Loss = ", x_loss)
        losses_ = sess.run( losses, feed_dict={input_image: x_image} )
        print("Eval loss components = ", [ "%.6f" % l for l in losses_])
        return x_loss.astype('float64')

    def eval_grad(x):
        #print("Eval Grad @ ", x)
        x_image = x.reshape(image_size,image_size,3).astype('uint8')
        x_grad = sess.run( total_grad, feed_dict={input_image: x_image} )
        #print("Eval Grad.shape = ", x_grad.shape)
        #print("Eval Grad = ", x_grad.flatten()[100:106])
        return x_grad.flatten().astype('float64')

    x0, x0_loss, state = scipy.optimize.fmin_l_bfgs_b( eval_loss, x0, fprime=eval_grad, maxfun=40) 

    iteration += 1

print("Iteration %d, in %.1fsec, Current loss : %.4f" % (iteration, float(time.time() - t0), x0_loss))
plot_layout(x0.reshape(image_size,image_size,3).astype('uint8'))
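
The way `fmin_l_bfgs_b` is driven above (separate loss and gradient callbacks, float64 vectors, a cap on function evaluations) can be tried on a toy problem. The sketch below also passes `bounds`, which the cell above does not use: it is shown here as one possible way to keep values inside 0..255, since casting out-of-range floats to `uint8` is not well defined. The target vector is made up for the illustration.

```python
import numpy as np
import scipy.optimize

# Hypothetical target : two of the three entries lie outside [0, 255]
target = np.array([10.0, 300.0, -20.0])

def toy_loss(x):
    return float(0.5 * np.sum((x - target) ** 2))

def toy_grad(x):
    return (x - target).astype('float64')

x_start = np.full(3, 128.0)
bounds = [(0.0, 255.0)] * len(x_start)   # one (min, max) pair per variable

x_opt, loss_opt, info = scipy.optimize.fmin_l_bfgs_b(
    toy_loss, x_start, fprime=toy_grad, bounds=bounds, maxfun=40)

print(x_opt)
print(loss_opt)
```

With the bounds in place, the out-of-range entries should end up clipped to the nearest bound instead of following the target outside the valid pixel range.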

In [None]:
for v in tf.trainable_variables():
    pass
    #print(v.name)