# Chapter 8

### Reweighting a probability distribution to a different temperature

In [None]:
import numpy as np

# original_distribution is a 1D NumPy array of probability values that must sum to 1.
# temperature is a factor quantifying the entropy of the output distribution.

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)

    # The exponentiated values may no longer sum to 1, so you divide
    # them by their sum to obtain the reweighted distribution.
    return distribution / np.sum(distribution)
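
To see what the temperature does in practice, here is a small sketch that reweights a made-up four-value distribution at several temperatures and reports the entropy of each result. The example distribution and the `entropy` helper are assumptions introduced for illustration only.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(p):
    # Shannon entropy in nats (hypothetical helper for this illustration)
    return float(-np.sum(p * np.log(p)))

# Made-up distribution used only for illustration
original = np.array([0.5, 0.3, 0.15, 0.05])

for temperature in [0.2, 0.5, 1.0, 1.5]:
    reweighted = reweight_distribution(original, temperature)
    print(temperature, np.round(reweighted, 3), round(entropy(reweighted), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Lower temperatures concentrate the probability mass on the most likely entry (lower entropy), and higher temperatures flatten it (higher entropy).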

### Implementing character-level LSTM text generation

In [2]:
# Downloading and parsing the initial text file
import keras
import numpy as np

path = keras.utils.get_file('nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')

text = open(path).read().lower()
print('Corpus Length', len(text))

Using TensorFlow backend.



Next, you’ll extract partially overlapping sequences of length maxlen, one-hot encode them, and pack them in a 3D Numpy array x of shape (sequences, maxlen, unique_characters). Simultaneously, you’ll prepare an array y containing the corresponding targets: the one-hot-encoded characters that come after each extracted sequence.

In [3]:
# Vectorizing sequences of characters

# You'll extract sequences of 60 characters
maxlen= 60

# You'll sample a new sequence every 3 characters
step = 3

# Holds the extracted sequences
sentences = []

# Holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i+maxlen])
    next_chars.append(text[i+maxlen])
    
print('Number of sequences:',len(sentences))

chars = sorted(list(set(text)))
print('Unique characters:', len(chars))

# Dictionary that maps each unique character to its index in the list "chars"
char_indices = dict((char, chars.index(char)) for char in chars)

print('Vectorization...')
# np.bool was removed in NumPy 1.24; the built-in bool is equivalent here
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

# One-hot encodes the characters into binary arrays
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
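
To make the shapes concrete, the following sketch runs the same windowing and one-hot encoding on a tiny made-up corpus. The `toy_` names and the values of `toy_maxlen` and `toy_step` are assumptions chosen so the result is small enough to inspect by eye.

```python
import numpy as np

toy_text = "hello world"  # made-up corpus for illustration
toy_maxlen = 4
toy_step = 2

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i:i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = True

print(toy_sentences)
print(toy_next_chars)
print(toy_x.shape, toy_y.shape)
```

Each window overlaps the previous one by `toy_maxlen - toy_step` characters, and every timestep of `toy_x` contains exactly one `True` entry.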


### Building The Network

In [None]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

### Compile the Model
Because your targets are one-hot encoded, you’ll use categorical_crossentropy as
the loss to train the model.


In [None]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Given a trained model and a seed text snippet, you can generate new text by doing the following repeatedly:

1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.

This is the code you use to reweight the original probability distribution coming out of the model and draw a character index from it (the sampling function).

### Function to sample the next character given the model's predictions

In [None]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
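
As a quick sanity check of the sampling function, this sketch draws many indices from a made-up three-way prediction at different temperatures and compares the empirical frequencies. The prediction vector, the number of draws, and the fixed seed are assumptions for illustration.

```python
import numpy as np

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)  # fixed seed so repeated runs behave the same
preds = [0.6, 0.3, 0.1]  # made-up model output

frequencies = {}
for temperature in [0.2, 1.0, 2.0]:
    draws = [sample(preds, temperature) for _ in range(2000)]
    frequencies[temperature] = np.bincount(draws, minlength=len(preds)) / len(draws)
    print(temperature, frequencies[temperature])
```

At a low temperature the most likely index is picked almost every time, which approaches greedy decoding. At a high temperature the less likely indices are chosen much more often.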

Finally, the following loop repeatedly trains and generates text. You begin generating text using a range of different temperatures after every epoch. This allows you to see how the generated text evolves as the model begins to converge, as well as the impact of temperature in the sampling strategy.

### Text Generation Loop

In [None]:
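import random
import sys

# Trains the model for 60 epochs
for epoch in range(1, 61):
    print('epoch', epoch)
    # Fits the model for one iteration on the data
    model.fit(x, y, batch_size=128, epochs=1)

    # Selects a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('----Generating with seed:"' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('--------- temperature:', temperature)
        sys.stdout.write(generated_text)

        # Generates 400 characters (an arbitrary length) by repeating
        # the four steps described earlier
        for i in range(400):
            # One-hot encodes the characters generated so far
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            # Samples the next character and slides the window by one
            preds = model.predict(sampled, verbose=0)[0]
            next_char = chars[sample(preds, temperature)]
            generated_text = generated_text[1:] + next_char
            sys.stdout.write(next_char)
        print()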

### Implementing DeepDream in Keras

In [None]:
# Loading the pretrained Inception V3 model

from keras.applications import inception_v3
from keras import backend as K

# You won't be training the model, so this command disables all training-specific operations
K.set_learning_phase(0)

# Builds the Inception V3 network without its densely connected classifier
# (include_top=False). The model will be loaded with pretrained ImageNet weights.
model = inception_v3.InceptionV3(weights='imagenet',
                                 include_top=False)

Next, you’ll compute the loss: the quantity you’ll seek to maximize during the gradient-ascent process. In chapter 5, for filter visualization, you tried to maximize the value of a specific filter in a specific layer. Here, you’ll simultaneously maximize the activation of all filters in a number of layers. Specifically, you’ll maximize a weighted sum of the L2 norm of the activations of a set of high-level layers. The exact set of layers you choose (as well as their contribution to the final loss) has a major influence on the visuals you’ll be able to produce, so you want to make these parameters easily configurable. Lower layers result in geometric patterns, whereas higher layers result in visuals in which you can recognize some classes from ImageNet (for example, birds or dogs). You’ll start from a somewhat arbitrary configuration involving four layers—but you’ll definitely want to explore many different configurations later.


### Setting Up the DeepDream Configuration 

In [None]:
# Dictionary mapping layer names to a coefficient quantifying how much the layer's
# activation contributes to the loss you'll seek to maximize. Note that the layer
# names are hardcoded in the built-in Inception V3 application. You can list all
# layer names using model.summary().

layer_contributions = {
    'mixed2': 0.2,
    'mixed3': 3.,
    'mixed4': 2.,
    'mixed5': 1.5,
}

### Defining the loss to be maximized
Now, let’s define a tensor that contains the loss: the weighted sum of the L2 norm of
the activations of the layers in listing 8.9.

In [None]:
# Creates a dictionary that maps layer names to layer instances
layer_dict = dict([(layer.name, layer) for layer in model.layers])

# You'll define the loss by adding layer contributions to this scalar variable
loss = K.variable(0.)
for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    # Retrieves the layer's output
    activation = layer_dict[layer_name].output

    # Adds the L2 norm of the features of a layer to the loss. You avoid border
    # artifacts by only involving nonborder pixels in the loss.
    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    loss = loss + coeff * K.sum(K.square(activation[:, 2:-2, 2:-2, :])) / scaling
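
The per-layer term can be checked outside Keras. Below is a NumPy sketch of the same quantity, computed on random arrays standing in for layer activations. The `layer_loss` helper, the fake activation shapes, and the two-layer configuration are assumptions for illustration, not the shapes Inception V3 actually produces.

```python
import numpy as np

def layer_loss(activation, coeff):
    # activation: array of shape (batch, height, width, channels)
    scaling = float(np.prod(activation.shape))
    # Only nonborder pixels contribute, mirroring the 2:-2 slices
    inner = activation[:, 2:-2, 2:-2, :]
    return coeff * float(np.sum(np.square(inner))) / scaling

rng = np.random.RandomState(0)
# Random stand-ins for layer outputs (hypothetical shapes)
fake_activations = {
    'mixed2': rng.rand(1, 8, 8, 4),
    'mixed3': rng.rand(1, 6, 6, 4),
}
fake_contributions = {'mixed2': 0.2, 'mixed3': 3.0}

total = sum(layer_loss(fake_activations[name], coeff)
            for name, coeff in fake_contributions.items())
print(total)
```

Dividing by the total number of elements keeps layers of different sizes on a comparable scale, so the coefficients alone control how much each layer contributes.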

### Gradient-ascent Process

In [None]:
# This tensor holds the generated image: the dream
dream = model.input

# Computes the gradients of the loss with regard to the dream
grads = K.gradients(loss, dream)[0]

# Normalizes the gradients (important trick)
grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)

# Sets up a Keras function to retrieve the value of the loss and gradients,
# given an input image
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)

def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

# Runs gradient ascent for a number of iterations, stopping early
# if the loss exceeds max_loss
def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('...Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x
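
The loop structure is easier to follow on a toy problem. This NumPy sketch applies the same normalized gradient ascent with an optional `max_loss` cutoff to a simple objective, half the squared norm of `x`. The objective and the `toy_` function names are assumptions for illustration and stand in for the Keras function above.

```python
import numpy as np

def toy_loss_and_grads(x):
    # Toy objective (an assumption for illustration): half the squared norm of x
    loss_value = 0.5 * float(np.sum(np.square(x)))
    grad_values = x.copy()
    # Same normalization trick: divide by the mean absolute gradient
    grad_values /= max(float(np.mean(np.abs(grad_values))), 1e-7)
    return loss_value, grad_values

def toy_gradient_ascent(x, iterations, step, max_loss=None):
    x = np.array(x, dtype='float64')
    for i in range(iterations):
        loss_value, grad_values = toy_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        x += step * grad_values
    return x

result = toy_gradient_ascent([1.0, -2.0], iterations=3, step=0.1)
print(result)

capped = toy_gradient_ascent([1.0, -2.0], iterations=100, step=0.1, max_loss=5.0)
print(capped, toy_loss_and_grads(capped)[0])
```

Because the gradient is normalized, every update has the same average magnitude regardless of how large the raw gradient is, which makes the step size easier to choose. The `max_loss` cutoff stops the ascent soon after the loss passes the threshold.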

Finally: the actual DeepDream algorithm. First, you define a list of scales (also called octaves) at which to process the images. Each successive scale is larger than the previous one by a factor of 1.4 (it’s 40% larger): you start by processing a small image and then increasingly scale it up (see figure 8.4).
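
As a worked example of the scale arithmetic, this sketch computes the list of shapes for a hypothetical 400 × 600 image. The `octave_shapes` helper and the image size are assumptions for illustration.

```python
def octave_shapes(original_shape, num_octave=3, octave_scale=1.4):
    # Starts from the full-size shape and divides by octave_scale for each smaller octave
    shapes = [tuple(original_shape)]
    for i in range(1, num_octave):
        shapes.append(tuple(int(dim / (octave_scale ** i)) for dim in original_shape))
    # Reverses the list so the smallest scale comes first
    return shapes[::-1]

print(octave_shapes((400, 600)))
```

With three octaves, this yields (204, 306), then (285, 428), then the original (400, 600), so processing starts at roughly half the original resolution.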

### Running Gradient Ascent over Different Successive Scales

In [None]:
import numpy as np

step = 0.01  # Gradient ascent step size
num_octave = 3  # Number of scales at which to run gradient ascent
octave_scale = 1.4  # Size ratio between scales
iterations = 20  # Number of ascent steps to run at each scale

# If the loss grows larger than 10, you'll interrupt the gradient-ascent process to avoid
# ugly artifacts
max_loss = 10 
base_image_path = '...'

# Load the base image into a Numpy array
img = preprocess_image(base_image_path)

# Prepares a list of shape tuples defining the different scales at which to run 
# gradient ascent
original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i))
        for dim in original_shape])
    successive_shapes.append(shape)
    
# Reverses the list of shapes so they’re in increasing order
successive_shapes = successive_shapes[::-1]

# Resizes the Numpy array of the image to the smallest scale
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])

for shape in successive_shapes:
    print('Processing image shape', shape)
    # Scales up the dream image
    img = resize_img(img, shape)
    
    # Runs gradient ascent altering the dream
    img = gradient_ascent(img, iterations=iterations,
                      step=step,
                      max_loss=max_loss)
    
    # Scales up the smaller version of the original image: it will be pixellated
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    # Computes the high-quality version of the original image at this size
    same_size_original = resize_img(original_img, shape)
    # The difference between the two is the detail that was lost when scaling up
    lost_detail = same_size_original - upscaled_shrunk_original_img
    
    # Reinjects lost detail into the dream
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png')
    
save_img(img, fname='final_dream.png')

Note that this code uses the following straightforward auxiliary Numpy functions, which all do as their names suggest. They require that you have SciPy installed.

### Auxiliary functions

In [None]:
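from scipy import ndimage
from PIL import Image
from keras.preprocessing import image

def resize_img(img, size):
    img = np.copy(img)
    factors = (1,
               float(size[0]) / img.shape[1],
               float(size[1]) / img.shape[2],
               1)
    return ndimage.zoom(img, factors, order=1)

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    # scipy.misc.imsave was removed in SciPy 1.2; Pillow (already required by
    # keras.preprocessing.image.load_img) is used here instead
    Image.fromarray(pil_img).save(fname)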
    
# Util function to open, resize, and format pictures into tensors that Inception V3 can process
def preprocess_image(image_path):
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img

# Util function to convert a tensor into a valid image
def deprocess_image(x):
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))
    # Undoes the preprocessing that was performed by inception_v3.preprocess_input
    x /= 2.
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x
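
The three arithmetic steps in `deprocess_image` invert a scaling of pixel values from [0, 255] to [-1, 1]. The NumPy sketch below checks that round trip on every possible pixel value. Both helper names are hypothetical, the forward mapping is assumed to match what `inception_v3.preprocess_input` does, and the explicit rounding is an addition that guards against floating-point error before the cast.

```python
import numpy as np

def to_inception_range(pixels):
    # Maps [0, 255] to [-1, 1] (assumed to mirror inception_v3.preprocess_input)
    return pixels / 127.5 - 1.0

def from_inception_range(x):
    # Same steps as deprocess_image: halve, shift, rescale, clip
    x = x / 2.0 + 0.5
    x = x * 255.0
    # Rounding before the cast is an addition in this sketch
    return np.clip(np.rint(x), 0, 255).astype('uint8')

pixels = np.arange(256, dtype='float64')
scaled = to_inception_range(pixels)
recovered = from_inception_range(scaled)
print(scaled.min(), scaled.max())
print(np.array_equal(recovered, pixels.astype('uint8')))
```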

### Neural Style Transfer Loss Function

In [None]:
loss = (distance(style(reference_image) - style(generated_image)) +
        distance(content(original_image) - content(generated_image)))

Here, distance is a norm function such as the L2 norm, content is a function that takes an image and computes a representation of its content, and style is a function that takes an image and computes a representation of its style. Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image) to be close to content(original_image), thus achieving style transfer as we defined it.
A fundamental observation made by Gatys et al. was that deep convolutional neural networks offer a way to mathematically define the style and content functions. Let’s see how.

### Neural Style Transfer in Keras 


Neural style transfer can be implemented using any pretrained convnet. Here, you’ll use the VGG19 network used by Gatys et al. VGG19 is a simple variant of the VGG16 network introduced in chapter 5, with three more convolutional layers.
This is the general process:

1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and the generated image at the same time.
2. Use the layer activations computed over these three images to define the loss function described earlier, which you’ll minimize in order to achieve style transfer.
3. Set up a gradient-descent process to minimize this loss function.

Let’s start by defining the paths to the style-reference image and the target image. To make sure that the processed images are a similar size (widely different sizes make style transfer more difficult), you’ll later resize them all to a shared height of 400 px.

### Defining Initial Variables

In [None]:
from keras.preprocessing.image import load_img, img_to_array

# Path to the image that you want to transform
target_image_path = 'img/portrait.jpg'
# Path to the style image
style_reference_image_path = 'img/transfer_style_reference.jpg'

# Dimensions of the generated picture
width, height = load_img(target_image_path).size
img_height = 400
img_width = int(width * img_height / height)

# Auxiliary function
import numpy as np
from keras.applications import vgg19

# Opens, resizes, and formats a picture into a tensor that VGG19 can process
def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

### Loading the Pretrained VGG19 Network and applying it to three images

In [None]:
from keras import backend as K
target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
# Placeholder that will contain the generated image
combination_image = K.placeholder((1, img_height, img_width, 3))

# Combines the three images in a single batch
input_tensor = K.concatenate([target_image, style_reference_image, combination_image], axis=0)

model = vgg19.VGG19(input_tensor=input_tensor, 
                   weights='imagenet',
                   include_top=False)

print('Model loaded.')

### Content Loss
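
No listing for the content loss appears under this heading. A minimal NumPy sketch follows, assuming the distance is the squared L2 norm between the activations of the target image and those of the generated image, in line with the L2 norm mentioned earlier. The function name `content_loss_np` and the toy arrays are hypothetical.

```python
import numpy as np

def content_loss_np(base, combination):
    # Sum of squared differences between two activation arrays of the same shape
    return float(np.sum(np.square(combination - base)))

# Made-up activations standing in for the output of an upper layer
base_features = np.zeros((2, 2, 1))
combination_features = np.ones((2, 2, 1))

print(content_loss_np(base_features, base_features))
print(content_loss_np(base_features, combination_features))
```

The loss is zero when the two activation arrays are identical and grows as the generated image's high-level features drift away from those of the target.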

Next is the style loss. It uses an auxiliary function to compute the Gram matrix of an input matrix: a map of the correlations found in the original feature matrix.

In [3]:
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2,0,1)))
    gram = K.dot(features, K.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_height * img_width
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
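
To see what the Gram matrix contains, here is a NumPy sketch of the same computation on a tiny two-channel feature map. The function name `gram_matrix_np` and the input values are assumptions for illustration.

```python
import numpy as np

def gram_matrix_np(x):
    # x: feature map of shape (height, width, channels)
    # Moves channels first, then flattens each channel into a row vector
    features = np.transpose(x, (2, 0, 1)).reshape(x.shape[2], -1)
    # Entry (i, j) is the inner product of channel i with channel j
    return features @ features.T

# Made-up 2 x 2 feature map with two channels
channel_a = np.ones((2, 2))
channel_b = np.array([[1.0, 2.0], [3.0, 4.0]])
feature_map = np.stack([channel_a, channel_b], axis=-1)

gram = gram_matrix_np(feature_map)
print(gram)
```

The result is a symmetric (channels × channels) matrix. Spatial positions are summed out, which is why it captures which features co-occur rather than where they occur.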

To these two loss components, you add a third: the total variation loss, which operates on the pixels of the generated combination image. It encourages spatial continuity in the generated image, thus avoiding overly pixelated results. You can interpret it as a regularization loss.

### Total Variation loss

In [None]:
def total_variation_loss(x):
    a = K.square(
        x[:, :img_height - 1, :img_width - 1, :] -
        x[:, 1:, :img_width - 1, :])
    b = K.square(
        x[:, :img_height - 1, :img_width - 1, :] -
        x[:, :img_height - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
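
The regularizing effect is easy to verify numerically. This NumPy sketch computes the same quantity for a flat image and for a checkerboard. The function name `total_variation_loss_np` and the 4 × 4 test images are assumptions for illustration.

```python
import numpy as np

def total_variation_loss_np(x):
    # x: image batch of shape (1, height, width, channels)
    # Squared differences between vertically adjacent pixels
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])
    # Squared differences between horizontally adjacent pixels
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])
    return float(np.sum(np.power(a + b, 1.25)))

# Flat image: every pixel has the same value
flat = np.ones((1, 4, 4, 3))

# Checkerboard: neighboring pixels alternate between 0 and 1
pattern = (np.indices((4, 4)).sum(axis=0) % 2).astype('float64')
checker = pattern[np.newaxis, :, :, np.newaxis] * np.ones((1, 1, 1, 3))

print(total_variation_loss_np(flat))
print(total_variation_loss_np(checker))
```

A perfectly smooth image has zero total variation, whereas an image whose neighboring pixels differ sharply is penalized heavily.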


The loss that you minimize is a weighted average of these three losses. To compute the content loss, you use only one upper layer (the block5_conv2 layer), whereas for the style loss, you use a list of layers that spans both low-level and high-level layers. You add the total variation loss at the end.
Depending on the style-reference image and content image you’re using, you’ll likely want to tune the content_weight coefficient (the contribution of the content loss to the total loss). A higher content_weight means the target content will be more recognizable in the generated image.
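
The weighting can be sketched with plain numbers. In the snippet below, the `total_loss` helper, the default weights, and the even split of the style weight across layers are all assumptions for illustration. The inputs stand for loss values that have already been computed.

```python
def total_loss(content, style_per_layer, variation,
               content_weight=0.025, style_weight=1.0,
               total_variation_weight=1e-4):
    # Content term, taken from a single upper layer
    loss = content_weight * content
    # Style term: the style weight is split evenly across the style layers
    # (an assumption made for this sketch)
    for layer_style_loss in style_per_layer:
        loss += (style_weight / len(style_per_layer)) * layer_style_loss
    # Total variation term, added at the end
    loss += total_variation_weight * variation
    return loss

# Made-up loss values for illustration
low = total_loss(10.0, [2.0, 4.0], 100.0, content_weight=0.025)
high = total_loss(10.0, [2.0, 4.0], 100.0, content_weight=0.5)
print(low, high)
```

Raising `content_weight` increases the share of the total loss that comes from the content term, so the optimizer is pushed harder to keep the target content recognizable.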

### Setting up the gradient Descent Process