In neural style transfer, we take a content image and a style image, and generate a new image that has the content of the former and the artistic style of the latter.

## Loading the data

In [None]:
from PIL import Image
path2content= "./data/content.jpg"
path2style= "./data/style.jpg"
content_img = Image.open(path2content)
style_img = Image.open(path2style)

In [None]:
content_img

In [None]:
style_img

In [None]:
import torchvision.transforms as transforms

h, w = 256, 384 
mean_rgb = (0.485, 0.456, 0.406)
std_rgb = (0.229, 0.224, 0.225)

transformer = transforms.Compose([
                    transforms.Resize((h,w)),  
                    transforms.ToTensor(), # scale the pixels' values to the range of [0, 1]
                    transforms.Normalize(mean_rgb, std_rgb)])  # depend on the pretrained model

In [None]:
content_tensor = transformer(content_img)
# Neither the content nor the style image is optimized, so requires_grad should be False
print(content_tensor.shape, content_tensor.requires_grad)

In [None]:
style_tensor = transformer(style_img)
print(style_tensor.shape, style_tensor.requires_grad)

In [None]:
input_tensor = content_tensor.clone().requires_grad_(True)
print(input_tensor.shape, input_tensor.requires_grad)

In [None]:
import torch
from torchvision.transforms.functional import to_pil_image

def imgtensor2pil(img_tensor):
    # Work on a detached copy so the original tensor is left untouched
    img_tensor_c = img_tensor.clone().detach()

    # Undo the zero-mean, unit-variance normalization to recover the original values
    img_tensor_c *= torch.tensor(std_rgb).view(3, 1, 1)
    img_tensor_c += torch.tensor(mean_rgb).view(3, 1, 1)

    # Clamp to [0, 1] before converting to a PIL image
    img_tensor_c = img_tensor_c.clamp(0, 1)
    img_pil = to_pil_image(img_tensor_c)
    return img_pil
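
The normalization applied by the transform and the inverse step used in `imgtensor2pil` can be checked on a tiny synthetic tensor. The snippet below is a minimal sketch: the 3×2×2 mid-gray image `img` is a made-up stand-in for a real photo, and the check only confirms that normalizing and then de-normalizing recovers the original values.

```python
import torch

mean_rgb = (0.485, 0.456, 0.406)
std_rgb = (0.229, 0.224, 0.225)

# Hypothetical 3x2x2 image with every pixel at mid-gray (0.5), already in [0, 1]
img = torch.full((3, 2, 2), 0.5)

# Reshape the per-channel statistics so they broadcast over height and width
mean = torch.tensor(mean_rgb).view(3, 1, 1)
std = torch.tensor(std_rgb).view(3, 1, 1)

# Forward step: what transforms.Normalize computes per channel
normalized = (img - mean) / std

# Inverse step: the same arithmetic as in imgtensor2pil
restored = (normalized * std + mean).clamp(0, 1)

round_trip_ok = torch.allclose(restored, img, atol=1e-6)
print(normalized[:, 0, 0])  # one normalized value per channel
print(round_trip_ok)
```

The `.view(3, 1, 1)` reshape is what lets a length-3 vector of channel statistics broadcast across every pixel of a `[3, h, w]` tensor.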

As you have seen throughout this book, we have followed a few standard steps: we loaded the input and target data, defined a model, an objective function and optimizer, and then trained the model by updating the model parameters using the gradient-descent algorithm. In all of these past cases, the input to the model was kept unchanged during the training process and the model was updated.

Now, imagine a situation in which we keep the model parameters fixed and instead update the input to the model during training. This twist is the intuition behind the neural style transfer algorithm.

Specifically, the neural style transfer algorithm works as follows:

1. Take a pretrained classification model (for example, VGG19), remove the last layers, and keep the remaining layers to serve as a feature extractor.

2. Feed the content image to the model and get selected features to serve as the target content.

3. Feed the style image to the model and get the Gram matrix of selected features to serve as the target style.

4. Feed the input to the model and get the features and the Gram matrix of selected features to serve as the predicted content and style, respectively.

5. Compute the content and style errors, and use this information to update the input and reduce the error.

6. Repeat steps 4 and 5 until the error is minimized.
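
The core idea in the steps above, a frozen model and a trainable input, can be sketched on a toy problem. This is only an illustration under stated assumptions: the small `nn.Linear` layer is a hypothetical stand-in for the VGG19 feature extractor, and `reference` plays the role of the content or style image.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Stand-in "feature extractor": a small frozen linear layer (hypothetical, not VGG19)
model = nn.Linear(4, 3).eval()
for p in model.parameters():
    p.requires_grad_(False)
weights_before = model.weight.clone()

# Target features come from a fixed reference input (steps 2 and 3)
reference = torch.randn(1, 4)
target = model(reference)

# The input is the only tensor handed to the optimizer
x = torch.zeros(1, 4, requires_grad=True)
optimizer = optim.Adam([x], lr=0.1)

initial_loss = nn.functional.mse_loss(model(x), target).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)  # step 4: predicted features
    loss.backward()                                   # step 5: gradient w.r.t. the input
    optimizer.step()                                  # step 5: update the input
final_loss = nn.functional.mse_loss(model(x), target).item()

print(initial_loss, final_loss)
print(torch.equal(model.weight, weights_before))  # the model itself is unchanged
```

Because only `x` is passed to the optimizer and the model parameters have `requires_grad` set to `False`, the loss can only be reduced by changing the input.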


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
from torchvision.transforms.functional import to_pil_image

plt.imshow(imgtensor2pil(content_tensor))
plt.title("content image");

In [None]:
plt.imshow(imgtensor2pil(style_tensor))
plt.title("style image");

In [None]:
import torchvision.models as models

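# Use the default CUDA device if available; pick a specific index (e.g. "cuda:3") only if your machine has it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

"""
The model expects mini-batches of shape (N, 3, Height, Width), normalized with the 
ImageNet mean and standard deviation. This is why we transformed the content and 
style images in the Loading the data section.
"""
# On torchvision >= 0.13, weights=models.VGG19_Weights.DEFAULT replaces pretrained=True
model_vgg = models.vgg19(pretrained=True).features.to(device).eval()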

# Freeze the model parameters
"""
We freeze the model parameters using the requires_grad_ method so that the model 
is not changed while the input is being optimized.
"""
for param in model_vgg.parameters():
    param.requires_grad_(False)   
print(model_vgg)

In [None]:
"""
Get the intermediate features of the pretrained model. These features will be used in 
calculating the style and content loss values.
"""
def get_features(x, model, layers):
    features = {}
    for name, layer in enumerate(model.children()):
        x = layer(x)
        if str(name) in layers:
            features[layers[str(name)]] = x
    return features
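
To see how `get_features` maps layer indices to named outputs without downloading VGG19, it can be run on a small stand-in network. The sketch below assumes a hypothetical four-layer `nn.Sequential` called `toy_model` and made-up layer names `conv_a` and `conv_b`; only the helper itself comes from the cell above.

```python
import torch
from torch import nn

def get_features(x, model, layers):
    # Same helper as above: run x through the layers in order and keep selected outputs
    features = {}
    for name, layer in enumerate(model.children()):
        x = layer(x)
        if str(name) in layers:
            features[layers[str(name)]] = x
    return features

# Hypothetical stand-in for model_vgg; the indices mirror enumerate(model.children())
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),  # index 0
    nn.ReLU(),                                  # index 1
    nn.MaxPool2d(2),                            # index 2
    nn.Conv2d(4, 8, kernel_size=3, padding=1),  # index 3
)
toy_layers = {'0': 'conv_a', '3': 'conv_b'}

toy_features = get_features(torch.randn(1, 3, 8, 8), toy_model, toy_layers)
shapes = {k: tuple(v.shape) for k, v in toy_features.items()}
print(shapes)  # → {'conv_a': (1, 4, 8, 8), 'conv_b': (1, 8, 4, 4)}
```

The dictionary keys are string indices because they are compared against `str(name)`, where `name` is the position of each layer in the sequential model.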

In [None]:
"""
Compute the Gram matrix of a tensor, which will be used to calculate the style loss value.
The input tensor x comes from the intermediate features of the model. The helper function 
reshapes x from a 4D tensor to a 2D tensor and then computes the Gram matrix. For a batch 
of one image, the output is a tensor of shape [c, c].
"""
def gram_matrix(x):
    # x: A tensor of shape [1, c, h, w], where c, h, w are the number of channels, height, and width of x
    n, c, h, w = x.size() 
    x = x.view(n*c, h * w) 
    gram = torch.mm(x, x.t()) 
    return gram
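
A small worked example may help make the Gram matrix concrete. The sketch below uses a hand-written 2-channel, 2×2 feature map (`feats`, a made-up tensor) so the entries can be checked by hand, and it also shows that flipping the spatial layout leaves the Gram matrix unchanged, which is why it works as a style descriptor rather than a content descriptor.

```python
import torch

def gram_matrix(x):
    # Same helper as above: flatten each channel, then take channel-by-channel dot products
    n, c, h, w = x.size()
    x = x.view(n * c, h * w)
    return torch.mm(x, x.t())

# Hypothetical feature map: 1 image, 2 channels, 2x2 spatial grid
feats = torch.tensor([[[[1., 2.], [3., 4.]],
                       [[0., 1.], [0., 1.]]]])

gram = gram_matrix(feats)
# Channel 0 flattens to [1, 2, 3, 4] and channel 1 to [0, 1, 0, 1], so
# G[0][0] = 1 + 4 + 9 + 16 = 30, G[0][1] = 2 + 4 = 6, G[1][1] = 1 + 1 = 2
print(gram.tolist())  # → [[30.0, 6.0], [6.0, 2.0]]

# Mirroring the image left-to-right moves every pixel but keeps channel correlations
gram_flipped = gram_matrix(feats.flip(-1))
print(torch.equal(gram, gram_flipped))  # → True
```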

In [None]:
import torch.nn.functional as F
"""
Compute the content loss. We extracted the target and predicted tensors for the layer 
specified by the argument layer. Then, we calculated the mean squared error (MSE) between 
the two tensors and returned its value.
"""
def get_content_loss(pred_features, target_features, layer):
    """
    pred_features: A Python dictionary containing the intermediate features of the model given the input tensor
    target_features: A Python dictionary containing the intermediate features of the model given the content tensor
    layer: A string containing the layer name
    """
    target= target_features[layer]
    pred = pred_features [layer]
    loss = F.mse_loss(pred, target)
    return loss

In [None]:
"""
We iterated over the style layers and extracted the predicted and target tensors per layer. 
Then, we calculated the Gram matrix for the two tensors and used them to compute the MSE. 
The loss value was calculated per layer, multiplied by the layer weight, normalized and then 
added all together. The function returned the accumulated loss value for all the layers 
included in the style loss.
"""
def get_style_loss(pred_features, target_features, style_layers_dict):  
    """
    pred_features: A Python dictionary containing intermediate features of the model given the input tensor
    target_features: A Python dictionary containing intermediate features of the model given the style tensor
    style_layers_dict: A Python dictionary containing the name and weight of the layers included in the style loss
    """
    loss = 0
    for layer in style_layers_dict:
        pred_fea = pred_features[layer]
        pred_gram = gram_matrix(pred_fea)
        n, c, h, w = pred_fea.shape
        target_gram = gram_matrix (target_features[layer])
        layer_loss = style_layers_dict[layer] *  F.mse_loss(pred_gram, target_gram)
        loss += layer_loss/ (n* c * h * w)
    return loss
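
As a quick sanity check of the style loss, it should be exactly zero when the predicted and target features are identical and positive when they differ. The sketch below repeats the two helpers so it runs on its own; the random feature dictionaries `target` and `other` and the two-layer `weights` dictionary are made-up stand-ins for real VGG19 features.

```python
import torch
import torch.nn.functional as F

def gram_matrix(x):
    n, c, h, w = x.size()
    x = x.view(n * c, h * w)
    return torch.mm(x, x.t())

def get_style_loss(pred_features, target_features, style_layers_dict):
    # Weighted, size-normalized MSE between Gram matrices, summed over layers
    loss = 0
    for layer in style_layers_dict:
        pred_fea = pred_features[layer]
        n, c, h, w = pred_fea.shape
        layer_loss = style_layers_dict[layer] * F.mse_loss(
            gram_matrix(pred_fea), gram_matrix(target_features[layer]))
        loss += layer_loss / (n * c * h * w)
    return loss

torch.manual_seed(0)
# Hypothetical features for two layers; the shapes are arbitrary
target = {'conv1_1': torch.randn(1, 4, 8, 8), 'conv2_1': torch.randn(1, 8, 4, 4)}
other = {'conv1_1': torch.randn(1, 4, 8, 8), 'conv2_1': torch.randn(1, 8, 4, 4)}
weights = {'conv1_1': 0.75, 'conv2_1': 0.5}

same_loss = get_style_loss(target, target, weights)
diff_loss = get_style_loss(other, target, weights)
print(same_loss.item(), diff_loss.item())
```

Dividing each layer's loss by `n * c * h * w` keeps layers with large feature maps from dominating the sum.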

In [None]:
# Get the features for the content and style images
"""
We called the get_features helper function to get the content and style features. 
To get the features, we passed the content tensor and style tensor to the helper function. 
Notice that we added a dimension to the tensors using the .unsqueeze method since the model 
input shape is [1, 3, height, width]. The name and number of the layers were defined in the 
Python dictionary, feature_layers. 
"""
feature_layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  
                  '28': 'conv5_1'}

con_tensor = content_tensor.unsqueeze(0).to(device)
sty_tensor = style_tensor.unsqueeze(0).to(device)

content_features = get_features(con_tensor, model_vgg, feature_layers)
style_features = get_features(sty_tensor, model_vgg, feature_layers)

In [None]:
# for debugging purposes only
for key in content_features.keys():
    print(content_features[key].shape)

In [None]:
from torch import optim
"""
We define the input tensor. Recall that the goal of the neural style transfer 
algorithm is to update the input to minimize the loss function. The input can be 
initialized randomly or with the content image; here we clone the content tensor 
as the input tensor. Notice that requires_grad must be set to True (via the 
requires_grad_ method) since we want to be able to update the input tensor.
"""
input_tensor = con_tensor.clone().requires_grad_(True)  # initialize the input with the content tensor
optimizer = optim.Adam([input_tensor], lr=0.01)

In [None]:
# Set the hyperparameters
"""
These parameters define the contributions of the content loss and style loss in the 
overall loss value. A higher style_weight parameter is usually desirable compared to 
content_weight, but you can play around with the values to see their contributions.
"""
num_epochs = 300
content_weight = 1e1
style_weight = 1e4
# We use conv5_1 as the content layer; try a different layer to see its impact on the outcome
content_layer = "conv5_1"
# Name and weight of each layer included in the style loss; adjust these as desired
style_layers_dict = { 'conv1_1': 0.75,
                      'conv2_1': 0.5,
                      'conv3_1': 0.25,
                      'conv4_1': 0.25,
                      'conv5_1': 0.25}

for epoch in range(num_epochs+1):
    optimizer.zero_grad()
    input_features = get_features(input_tensor, model_vgg, feature_layers)
    content_loss = get_content_loss (input_features, content_features, content_layer)
    style_loss = get_style_loss(input_features, style_features, style_layers_dict)
    neural_loss = content_weight * content_loss + style_weight * style_loss
    neural_loss.backward()  # the graph is rebuilt every iteration, so it does not need to be retained
    optimizer.step()
    
    if epoch % 100 == 0:
        print('epoch {}, content loss: {:.2}, style loss: {:.2}'.format(
            epoch, content_loss.item(), style_loss.item()))


In [None]:
plt.imshow(imgtensor2pil(input_tensor[0].cpu()));