# Style Transfer

Style transfer is an image-processing technique that applies the style of one image to the content of another. For example, it can take the artistic style of a painting and apply it to a photograph, producing a new image that combines the photograph's content with the painting's style.

The VGG-19 model is a convolutional neural network (CNN) developed by the Visual Geometry Group (VGG) at the University of Oxford. It is one of the best-known and most widely used deep learning models for image classification and feature extraction.

The VGG-19 model is described in the research paper titled "Very Deep Convolutional Networks for Large-Scale Image Recognition" by Karen Simonyan and Andrew Zisserman. This paper was published in 2014 and is available on arXiv (https://arxiv.org/abs/1409.1556).

The seminal paper that introduced neural style transfer using the VGG-19 model is "A Neural Algorithm of Artistic Style" by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. This paper was published in 2015 and is available on arXiv (https://arxiv.org/abs/1508.06576).

The VGG-19 model is popular in style transfer because the activations of its pre-trained convolutional layers represent both the content and the texture of an image, which makes them well suited to measuring how closely a generated image matches the content of one image and the style of another.

Here are some key points about the VGG-19 model:

1. Architecture: VGG-19 has 19 weight layers: 16 convolutional layers and 3 fully connected layers, with ReLU activations and max-pooling layers in between. The convolutional layers use small 3x3 filters, which helps in capturing fine details in images.

2. Pre-trained Model: In the context of style transfer, the VGG-19 model is often used in its pre-trained form. This means it has already been trained on a large dataset (such as ImageNet) and has learned to extract useful features from images.

3. Feature Extraction: For style transfer, the VGG-19 model is used to extract features from both the content and style images. These features are then used to compute the content and style losses, which guide the optimization process to generate the final stylized image.

4. Layer Selection: Different layers of the VGG-19 model capture different levels of abstraction. Lower layers capture basic features like edges and textures, while higher layers capture more complex patterns. In style transfer, specific layers are chosen to compute the content and style representations. This notebook takes the content representation from conv4_2 and the style representation from conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1.
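
The layer names above correspond to numeric positions in the convolutional part of the network. The following is a minimal sketch of how those positions can be derived, assuming the torchvision layout in which every convolution is followed by a ReLU and every block ends with a max-pool. `VGG19_CFG` and `conv_layer_indices` are illustrative names introduced here, not part of torchvision.

```python
# VGG-19 configuration: numbers are conv output channels, 'M' is a max-pool.
VGG19_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
             512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

def conv_layer_indices(cfg):
    """Map names like 'conv4_2' to positions in a conv/ReLU/pool sequence."""
    indices = {}
    position = 0          # index within the flattened layer sequence
    block, conv = 1, 1    # current block number and conv number inside it
    for item in cfg:
        if item == 'M':
            position += 1                  # the pooling layer takes one slot
            block, conv = block + 1, 1     # next block starts at conv 1
        else:
            indices[f'conv{block}_{conv}'] = position
            position += 2                  # one slot for conv, one for ReLU
            conv += 1
    return indices

indices = conv_layer_indices(VGG19_CFG)
print(len(indices))          # number of convolutional layers
print(indices['conv4_2'])    # position of the layer used for content
```

Under these assumptions the sketch yields 16 convolutional layers, and the positions of conv1_1, conv2_1, conv3_1, conv4_1, conv4_2, and conv5_1 are 0, 5, 10, 19, 21, and 28.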

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, models
from PIL import Image
import copy

In [2]:
# Define the device to use (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
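# Load an image and convert it to a normalized 4-D tensor (1 x 3 x H x W)
def load_image(image_path, max_size=400, shape=None):
    image = Image.open(image_path).convert('RGB')

    # Choose the resize target. An explicit (height, width) shape takes
    # priority; otherwise cap the size at max_size. Note that Resize with
    # an int matches the image's smaller edge to that value.
    if shape is not None:
        size = tuple(shape)
    elif max_size:
        size = min(max(image.size), max_size)
    else:
        size = min(image.size)  # no limit given: keep the original size

    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        # ImageNet channel means and standard deviations expected by VGG
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])

    # Keep the three RGB channels and add a batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)

    return image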
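
One detail of the resizing step is easy to miss: when `transforms.Resize` receives a single integer, the torchvision documentation describes it as matching the image's smaller edge to that value while preserving the aspect ratio. The sketch below approximates that rule in plain Python; `resized_shape` is a hypothetical helper written for illustration, and its rounding may differ slightly from torchvision's.

```python
def resized_shape(width, height, size):
    """Approximate (width, height) after Resize(size) with an int size."""
    if width <= height:
        # width is the smaller edge, so it becomes `size`
        return size, int(size * height / width)
    # height is the smaller edge, so it becomes `size`
    return int(size * width / height), size

# An 800x600 photo with size=400: the shorter edge becomes 400
print(resized_shape(800, 600, 400))
# A 300x200 photo with size=300: the shorter edge grows from 200 to 300
print(resized_shape(300, 200, 300))
```

So with `max_size=400`, the longer edge of the loaded image can end up larger than 400 pixels. If a strict cap on the longer edge is needed, an explicit `shape` can be passed instead.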

In [4]:
# Load content and style images from data folder and send to the device for model inference
content = load_image('data/santosh-sharma-content.jpg').to(device)
style = load_image('data/radha-krishna-style.jpg', shape=content.shape[-2:]).to(device)

In [None]:
# Display the content and style images
import matplotlib.pyplot as plt

def displayImage(image_path):
    image = Image.open(image_path)
    plt.imshow(image)
    plt.axis('off')  # Hide the axes
    plt.show()

displayImage('data/santosh-sharma-content.jpg')
displayImage('data/radha-krishna-style.jpg')

In [None]:
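# Load VGG-19 with ImageNet-pretrained weights and keep only the feature
# layers (convolution, ReLU, and pooling); the classifier head is not needed.
# Assumes torchvision >= 0.13; older versions use models.vgg19(pretrained=True).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features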

In [None]:
# Freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# Move the model to the device and switch it to inference mode
vgg.to(device).eval()
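
Freezing the parameters does not stop gradients from flowing back to the input image, which is what the optimization relies on. Here is a small sketch of that behavior, using a toy one-layer network in place of VGG-19 (the network and tensor sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in for the frozen feature extractor
net = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
for param in net.parameters():
    param.requires_grad_(False)

# The input image is the only tensor that requires gradients
image = torch.rand(1, 3, 8, 8, requires_grad=True)
loss = net(image).sum()
loss.backward()

# No parameter received a gradient, but the image did
params_have_grad = any(p.grad is not None for p in net.parameters())
print(params_have_grad)
print(image.grad.shape)
```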

In [8]:
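# Run an image through the model and collect the activations of selected layers
def get_features(image, model, layers=None):
    # Default mapping from positions in vgg19().features to layer names.
    # conv4_2 is used for content; the other layers are used for style.
    if layers is None:
        layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
                  '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}
    features = {}
    x = image
    # Pass the image through each layer in order, keeping the requested outputs
    for name, layer in model.named_children():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features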
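
To see the layer-by-layer collection at work without downloading VGG-19, the same pattern can be tried on a toy network. This is only a sketch: `toy`, `collect_features`, and the layer names `'first'` and `'second'` are made up for the example.

```python
import torch
import torch.nn as nn

# Children of nn.Sequential are named '0', '1', '2', ... in order
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),   # '0'
    nn.ReLU(),                                   # '1'
    nn.MaxPool2d(2),                             # '2'
    nn.Conv2d(4, 8, kernel_size=3, padding=1),   # '3'
)

def collect_features(image, model, layers):
    """Return {label: activation} for the child modules named in `layers`."""
    features = {}
    x = image
    for name, layer in model.named_children():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features

with torch.no_grad():
    feats = collect_features(torch.rand(1, 3, 16, 16), toy,
                             {'0': 'first', '3': 'second'})

shapes = {label: tuple(t.shape) for label, t in feats.items()}
print(shapes)
```

The first activation keeps the 16x16 resolution with 4 channels, while the second has 8 channels at 8x8 because it sits after the pooling layer.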

In [9]:
# Get content and style features
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

In [10]:
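def gram_matrix(tensor):
    # Gram matrix of a feature map: inner products between every pair of
    # channels, taken over all spatial positions. Assumes a batch size of 1.
    _, d, h, w = tensor.size()
    tensor = tensor.view(d, h * w)  # one row per channel
    gram = torch.mm(tensor, tensor.t())
    return gram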
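
A small worked example may help show what the Gram matrix measures. The sketch below uses NumPy rather than PyTorch to keep it short, with a hand-written 2-channel feature map chosen for illustration. It also checks that shuffling the spatial positions leaves the Gram matrix unchanged, which is why it describes texture rather than layout.

```python
import numpy as np

# A toy feature map: 2 channels on a 2x2 grid (batch dimension omitted)
features = np.array([[[1.0, 2.0],
                      [3.0, 4.0]],
                     [[0.0, 1.0],
                      [0.0, 1.0]]])

channels, height, width = features.shape
flat = features.reshape(channels, height * width)   # one row per channel
gram = flat @ flat.T                                 # channel-by-channel inner products
print(gram)

# Reordering the spatial positions (same order in every channel)
# does not change the result
perm = [3, 1, 0, 2]
shuffled = flat[:, perm]
same_after_shuffle = np.allclose(shuffled @ shuffled.T, gram)
print(same_after_shuffle)
```

Here the diagonal entries (30 and 2) are each channel's squared magnitude, and the off-diagonal entry (6) measures how strongly the two channels fire together.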

In [11]:
# Calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

In [12]:
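# Create the target image as a copy of the content image. Move it to the
# device first and then enable gradients, so that it stays a leaf tensor
# the optimizer can update.
target = content.clone().to(device).requires_grad_(True)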

In [13]:
# Define weights for each style layer
style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75, 'conv3_1': 0.2, 'conv4_1': 0.2, 'conv5_1': 0.2}

In [14]:
# Define weights for content and style loss
content_weight = 1  # alpha
style_weight = 1e6  # beta
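
These two weights combine the losses into a single objective. As a sketch of what the training loop in this notebook computes, write $x$ for the target image, $c$ for the content image, $s$ for the style image, $F^l(\cdot)$ for the layer-$l$ feature map reshaped to $d_l \times h_l w_l$, and $w_l$ for the per-layer style weights. This notation is introduced here for illustration and does not appear in the code.

$$
\begin{aligned}
G^l(x) &= F^l(x)\,F^l(x)^{\top} \\
\mathcal{L}_{\text{content}} &= \frac{1}{d\,h\,w} \sum_{i,j} \left(F^{\text{conv4\_2}}_{ij}(x) - F^{\text{conv4\_2}}_{ij}(c)\right)^2 \\
\mathcal{L}_{\text{style}} &= \sum_{l} w_l \cdot \frac{1}{d_l^2} \sum_{i,j} \left(G^l_{ij}(x) - G^l_{ij}(s)\right)^2 \\
\mathcal{L}_{\text{total}} &= \alpha\,\mathcal{L}_{\text{content}} + \beta\,\mathcal{L}_{\text{style}}
\end{aligned}
$$

Here $d$, $h$, and $w$ in the content term are the dimensions of the conv4_2 feature map. Both terms use plain means, so the normalization constants differ from those in the Gatys et al. paper. Because the Gram matrices are not divided by the number of spatial positions, the style term depends on the image size, and suitable values of $\beta$ may need tuning.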

In [15]:
# Set up optimizer
optimizer = optim.Adam([target], lr=0.003)
steps = 5000  # number of iterations
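
Unlike ordinary training, the optimizer here is given the image tensor itself rather than a model's weights. Below is a toy sketch of that idea, where a random `goal` tensor stands in for the fixed content and style targets and the loss is a plain mean squared error. The tensor sizes, learning rate, and step count are arbitrary choices for the example.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)

goal = torch.rand(1, 3, 4, 4)                          # fixed target, never updated
pixels = torch.zeros(1, 3, 4, 4, requires_grad=True)   # the "image" being optimized

# The optimizer's only parameter is the image tensor
optimizer = optim.Adam([pixels], lr=0.05)

losses = []
for _ in range(200):
    loss = torch.mean((pixels - goal) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The loss should be lower at the end than at the start
print(losses[0], losses[-1])
```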

In [16]:
# Convert a normalized image tensor back to a PIL image
import numpy as np
def im_convert(tensor):
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)  # C x H x W -> H x W x C
    # Undo the ImageNet normalization applied in load_image
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)
    return Image.fromarray((image * 255).astype('uint8'))
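
The multiplication and addition in `im_convert` invert the per-channel normalization applied when the image was loaded. A short NumPy sketch of that round trip, using a small random array in place of a real image:

```python
import numpy as np

# ImageNet channel statistics, as used in load_image and im_convert
mean = np.array((0.485, 0.456, 0.406))
std = np.array((0.229, 0.224, 0.225))

rng = np.random.default_rng(0)
pixels = rng.random((4, 4, 3))          # H x W x C image with values in [0, 1)

normalized = (pixels - mean) / std      # per-channel normalization
restored = normalized * std + mean      # the inverse step from im_convert

round_trip_ok = np.allclose(restored, pixels)
print(round_trip_ok)
```

During optimization the target's values can drift outside the range of a valid image, which is why `im_convert` clips to [0, 1] after undoing the normalization.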

In [None]:
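# Optimization loop: update the target image's pixels to reduce the combined loss
for ii in range(1, steps+1):
    # Features of the current target image at the selected layers
    target_features = get_features(target, vgg)

    # Content loss: mean squared difference of the conv4_2 activations
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # Style loss: weighted sum over the style layers of the mean squared
    # difference between the target and style Gram matrices
    style_loss = 0
    for layer in style_weights:
        target_feature = target_features[layer]
        target_gram = gram_matrix(target_feature)
        style_gram = style_grams[layer]
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)
        style_loss += layer_style_loss

    total_loss = content_weight * content_loss + style_weight * style_loss

    # Backpropagate to the target image and take one optimizer step
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Every 500 steps, report the loss and save a snapshot of the target
    if ii % 500 == 0:
        print('Step', ii, 'total loss:', total_loss.item())

        intermediate_image = im_convert(target)
        intermediate_image.save('data/output_image_' + str(content_weight) + '_' + str(style_weight) + '_' + str(ii) + '.jpg')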


In [18]:
final_image = im_convert(target)

In [19]:
final_image.save('data/output_image_final_' + str(content_weight) + '_' + str(style_weight) + '_' + str(ii) + '.jpg')

In [None]:
# Display the final output image
plt.imshow(final_image)
plt.axis('off')  # Hide the axes
plt.show()