# Neural style transfer

**Objective: preserve the content of the original image while adopting the style of the reference image.**

The objective can be defined mathematically as follows:

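$$
\begin{aligned}
\text{Loss}(\text{image to transfer})
={}& \text{distance}(\text{style}(\text{reference image}) - \text{style}(\text{generated image})) \\
&+ \text{distance}(\text{content}(\text{original image}) - \text{content}(\text{generated image}))
\end{aligned}
$$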

* Distance
  - A norm function, such as the $L2$ norm, that the optimization minimizes.

* Content loss: consistency between the original image and the generated one

  - Content is captured by the upper layers of the convnet.
  - The loss is the $L2$ norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image.
  - Only a single layer is used.

* Style

  - Uses multiple layers.
  - Uses the Gram matrix of a layer's activations, which is the inner product of the feature maps of that layer. It represents a map of the correlations between the layer's features. These feature correlations capture the statistics of the patterns at a particular spatial scale, which empirically correspond to the appearance of the textures found at that scale.
  - The style loss aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image.
  - This encourages the textures found at different spatial scales to look similar across the style-reference image and the generated one.
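
The two distances described above can be sketched in plain NumPy on random arrays that stand in for layer activations. This is a minimal illustration, not the notebook's Keras code: the function names `content_distance` and `style_distance` are hypothetical, and the style term here is normalized by the feature map's own channel count and spatial size, which is an assumption of this sketch.

```python
import numpy as np

def gram_matrix(features):
    """Inner product between the flattened feature maps of one layer.

    features has shape (height, width, channels); the result has shape
    (channels, channels) and holds channel-to-channel correlations.
    """
    channels = features.shape[2]
    flat = features.transpose(2, 0, 1).reshape(channels, -1)
    return flat @ flat.T

def content_distance(base, combination):
    # Squared L2 distance between two activation tensors of one layer.
    return np.sum(np.square(combination - base))

def style_distance(style, combination):
    # Squared distance between Gram matrices, scaled by the feature-map size
    # (assumption of this sketch: normalize by this layer's own dimensions).
    height, width, channels = style.shape
    s = gram_matrix(style)
    c = gram_matrix(combination)
    return np.sum(np.square(s - c)) / (4.0 * channels ** 2 * (height * width) ** 2)

rng = np.random.default_rng(0)
style_act = rng.normal(size=(6, 5, 4))      # stand-in for style activations
generated_act = rng.normal(size=(6, 5, 4))  # stand-in for generated activations

print(gram_matrix(style_act).shape)
print(style_distance(style_act, generated_act))
print(content_distance(style_act, generated_act))
```

Both distances are zero when the two activation tensors are identical and grow as they diverge, which is what makes them usable as loss terms.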

### What I am doing 

- Preserve content by maintaining similar high-level layer activations between the target content image and the generated image. The convnet should "see" both the target image and the generated image as containing the same things.

- Preserve style by maintaining similar correlations within activations for both low-level and high-level layers.
> Feature correlations capture textures: the generated image and the style-reference image should share the same textures at different spatial scales.
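
One way to see why feature correlations describe texture rather than layout: the Gram matrix sums over all spatial positions, so shuffling those positions leaves it unchanged, while a position-by-position content distance changes a lot. The sketch below checks this on random stand-in activations; the array sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, channels = 8, 8, 4
features = rng.normal(size=(height, width, channels))

# Flatten to (positions, channels) and shuffle the spatial positions.
flat = features.reshape(-1, channels)
shuffled_flat = flat[rng.permutation(height * width)]
shuffled = shuffled_flat.reshape(height, width, channels)

# Gram matrices: channel-to-channel correlations summed over positions.
gram_original = flat.T @ flat
gram_shuffled = shuffled_flat.T @ shuffled_flat

# Position-wise squared distance, as used by the content loss.
content_gap = np.sum(np.square(features - shuffled))

print(np.allclose(gram_original, gram_shuffled))
print(content_gap > 0)
```

This is why the style term can match textures from the reference image without copying where things are placed in it.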

### Implementing 

* Pretrained model - VGG19

  1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and the generated image at the same time.

  2. Use the layer activations computed over these three images to define the loss function described earlier, which you'll minimize in order to achieve style transfer.
  
  3. Set up a gradient-descent process to minimize this loss function.
  
  
- The images are resized to a fixed height of 400 px, since widely different sizes make style transfer more difficult.
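
The fixed-height resizing keeps the aspect ratio by scaling the width by the same factor as the height. A small sketch of that arithmetic, using a hypothetical helper `scaled_width` that is not part of the notebook:

```python
def scaled_width(width, height, target_height=400):
    """Width that preserves the aspect ratio at the given target height."""
    return int(width * target_height / height)

# A 1200x800 image resized to a height of 400 px keeps its 3:2 ratio.
print(scaled_width(1200, 800))  # 600
```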

In [6]:
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications import vgg19
target_image_path = 'target_image.png'
style_reference_iamge_path = 'style_reference.png'

width, height = load_img(target_image_path).size

img_height = 400 
img_width = int(width * img_height / height)

In [4]:
def preprocess_image(image_path):
    # this line to be refactored 
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img 

def deprocess_image(x):
    
    # Add back the ImageNet mean pixel values (BGR order) that vgg19.preprocess_input subtracted
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    
    x = x[:, :, ::-1] # BGR to RGB
    x = np.clip(x, 0, 255).astype('uint8')
    return x 
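
The constants in `deprocess_image` are the per-channel ImageNet mean pixel values in BGR order. The sketch below mimics the round trip in plain NumPy, assuming that `vgg19.preprocess_input` applies the "caffe"-style preprocessing (RGB to BGR, then mean subtraction without scaling). The helper names are hypothetical, and unlike the notebook's version this one rounds before casting so the round trip is exact.

```python
import numpy as np

# ImageNet mean pixel values, in BGR order.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    # Assumed behaviour of vgg19.preprocess_input: flip RGB to BGR,
    # then subtract the per-channel mean.
    bgr = rgb[..., ::-1].astype("float64")
    return bgr - IMAGENET_BGR_MEAN

def caffe_style_deprocess(x):
    # Inverse: add the mean back, flip BGR to RGB, clip to valid pixels.
    bgr = x + IMAGENET_BGR_MEAN
    rgb = bgr[..., ::-1]
    return np.clip(np.rint(rgb), 0, 255).astype("uint8")

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
restored = caffe_style_deprocess(caffe_style_preprocess(image))
print(np.array_equal(image, restored))
```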

In [8]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_iamge = K.constant(preprocess_image(style_reference_iamge_path))

# placeholder that will contain the generated image 
combination_image = K.placeholder((1, img_height, img_width, 3))

# combines the three images in a single batch 
input_tensor = K.concatenate([target_image, 
                             style_reference_iamge,
                             combination_image], axis=0)


model = vgg19.VGG19(input_tensor=input_tensor,
                   weights='imagenet', include_top=False)

### input_tensor 
print("model loaded successfully")

model loaded successfully


In [15]:
def content_loss(base, combination):
    return K.sum(K.square(combination - base))

def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3 
    size = img_height * img_width
    
    return K.sum(K.square(S - C)) / (4. * (channels**2) * (size**2))

In [10]:
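def total_variation_loss(x):
    """
    Operates on the pixels of the generated combination image.
    It encourages spatial continuity in the generated image,
    which avoids overly pixelated results.
    """
    # squared change along the y axis (between vertically adjacent pixels)
    a = K.square(
        x[:, :img_height - 1, :img_width - 1, :] - x[:, 1:, :img_width - 1, :])

    # squared change along the x axis (between horizontally adjacent pixels)
    b = K.square(
        x[:, :img_height - 1, :img_width - 1, :] - x[:, :img_height - 1, 1:, :])

    return K.sum(K.pow(a + b, 1.25))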
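
To get a feel for what the total variation term penalizes, here is a NumPy version of the same formula applied to two toy images. It is a sketch for illustration only: the function name `total_variation` and the image sizes are assumptions, and it infers the height and width from the array instead of using the notebook's globals.

```python
import numpy as np

def total_variation(x):
    """Total variation of a batch of images with shape (1, height, width, channels)."""
    # Squared differences between vertically adjacent pixels.
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])
    # Squared differences between horizontally adjacent pixels.
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])
    return np.sum(np.power(a + b, 1.25))

rng = np.random.default_rng(0)
flat_image = np.full((1, 8, 8, 3), 0.5)      # perfectly smooth image
noisy_image = rng.normal(size=(1, 8, 8, 3))  # pixel-level noise

print(total_variation(flat_image))
print(total_variation(noisy_image))
```

A constant image scores zero, while noise scores high, so adding this term with a small weight nudges the optimizer toward locally smooth results.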

In [11]:
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# layer for content loss
content_layer = 'block5_conv2'

# layers for style loss
style_layers = ['block1_conv1', 
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025

In [13]:
loss = K.variable(0.) 

layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]

loss += content_weight * content_loss(target_image_features, combination_features)



In [16]:
for layer_name in style_layers:
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    s1 = style_loss(style_reference_features, combination_features)
    loss += (style_weight / len(style_layers)) * s1
    
loss += total_variation_weight * total_variation_loss(combination_image)
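
The indexing in the two cells above relies on the order in which the images were concatenated into one batch: index 0 is the target image, index 1 the style reference, and index 2 the generated image. The NumPy sketch below uses random arrays as stand-ins for one layer's activations (the shapes are hypothetical) to show how the slices line up and how the content term is weighted.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c = 4, 6, 8  # hypothetical feature-map size for one layer

# Stand-ins for the activations of the three images at the same layer.
target_features = rng.normal(size=(1, h, w, c))
style_features = rng.normal(size=(1, h, w, c))
generated_features = rng.normal(size=(1, h, w, c))

# Same batch order as in the notebook: target, style reference, generated.
layer_features = np.concatenate(
    [target_features, style_features, generated_features], axis=0)

target_slice = layer_features[0]
style_slice = layer_features[1]
combination_slice = layer_features[2]

content_weight = 0.025
content_term = content_weight * np.sum(
    np.square(combination_slice - target_slice))

print(layer_features.shape)  # (3, 4, 6, 8)
print(content_term)
```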

In [17]:
grads = K.gradients(loss, combination_image)[0]

fetch_loss_and_grads = K.function([combination_image], [loss, grads])

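class Evaluator(object):
    """Caches the loss and gradients from a single call to fetch_loss_and_grads.

    fmin_l_bfgs_b asks for the loss and the gradients through two separate
    functions, so loss() computes both and grads() returns the cached gradients.
    """

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype("float64")
        self.loss_value = loss_value
        # Store under the same attribute name that grads() reads.
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values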
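
The same caching pattern can be tried on a toy problem without Keras. The sketch below minimizes a simple quadratic with `scipy.optimize.fmin_l_bfgs_b`; the class name `CachingEvaluator`, the `quadratic` function, and the target vector are hypothetical stand-ins for the style-transfer loss, and the strict call-order assertions are replaced by a fallback that recomputes when no gradients are cached.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

class CachingEvaluator:
    """Computes loss and gradients together, then serves them separately."""

    def __init__(self, loss_and_grad):
        self.loss_and_grad = loss_and_grad
        self.grad_values = None

    def loss(self, x):
        # One call produces both values; the gradients are cached.
        loss_value, self.grad_values = self.loss_and_grad(x)
        return loss_value

    def grads(self, x):
        # Fall back to a fresh evaluation if nothing has been cached yet.
        if self.grad_values is None:
            self.loss(x)
        return np.copy(self.grad_values)

target = np.array([1.0, -2.0, 3.0])

def quadratic(x):
    # Toy loss: squared distance to the target, with its exact gradient.
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff

toy_evaluator = CachingEvaluator(quadratic)
x_opt, min_val, info = fmin_l_bfgs_b(
    toy_evaluator.loss, np.zeros(3), fprime=toy_evaluator.grads, maxfun=20)

print(x_opt)
print(min_val)
```

Computing the loss and gradients in one pass matters for style transfer because each evaluation is a full forward and backward pass through VGG19.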

In [18]:
evaluator = Evaluator()

In [None]:
from scipy.optimize import fmin_l_bfgs_b
# Note: scipy.misc.imsave was removed in SciPy 1.2, so this import needs an older SciPy.
from scipy.misc import imsave 
import tensorflow as tf
import time 

K.get_session().run(tf.global_variables_initializer())

result_prefix = 'result_NST'
iterations = 20 

x = preprocess_image(target_image_path)

x = x.flatten()
for i in range(iterations):
    print("iter : {}".format(i))
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss,
                                    x,
                                    fprime=evaluator.grads,
                                    maxfun=20)
    
    print("Loss {} : {}".format(i ,min_val))
    img = x.copy().reshape((img_height, img_width, 3))
    img = deprocess_image(img)
    fname = result_prefix + '_at_iteration_%d.png' % i
    
    imsave(fname, img)
    print('image saved as {}'.format(fname))
    end_time = time.time()
    print('Iter : {} completed in {}'.format(i, end_time - start_time))

iter : 0
