# Neural Style Transfer and Creative Applications

## Introduction

Neural Style Transfer is an exciting application of deep learning that allows us to create artistic images by combining the content of one image with the style of another. This technique leverages convolutional neural networks (CNNs) to extract and recombine image content and style representations.

In this tutorial, we'll explore how neural networks can create artistic images through style transfer. We'll delve into the underlying mathematics, provide example code using PyTorch, and explain the processes involved. We'll reference key papers and discuss some of the latest developments in this field. Relevant imagery (generated by code) will be included to enhance understanding.

## Table of Contents

1. [Understanding Neural Style Transfer](#1)
   - [What is Neural Style Transfer?](#1.1)
   - [Underlying Mathematics](#1.2)
2. [Implementing Neural Style Transfer](#2)
   - [Prerequisites](#2.1)
   - [Code Implementation with PyTorch](#2.2)
3. [Creative Applications](#3)
   - [Advanced Techniques](#3.1)
   - [Latest Developments](#3.2)
4. [Conclusion](#4)
5. [References](#5)

<a id="1"></a>
# 1. Understanding Neural Style Transfer

<a id="1.1"></a>
## 1.1 What is Neural Style Transfer?

Neural Style Transfer (NST) is a technique that takes two images—a **content image** and a **style image**—and blends them together so that the output image looks like the content image painted in the style of the style image.

**Example:**

- **Content Image**: A photograph of a cityscape.
- **Style Image**: A painting by Vincent van Gogh.
- **Output Image**: The cityscape rendered in the style of van Gogh's painting.

This technique was introduced by [Gatys et al., 2015](#ref1), who demonstrated that deep CNNs can be used to separate and recombine content and style of natural images.

<a id="1.2"></a>
## 1.2 Underlying Mathematics

Neural Style Transfer relies on the representations learned by a pre-trained CNN, typically the VGG network. The process involves minimizing a loss function that captures the differences in content and style between the generated image and the input images.

### Content Representation

The content loss measures the difference in high-level features between the generated image $\mathbf{G}$ and the content image $\mathbf{C}$. This is computed using the activations from a specific layer $l$ of the CNN:

$$
\mathcal{L}_{\text{content}}(\mathbf{G}, \mathbf{C}) = \frac{1}{2} \sum_{i,j} (F_{ij}^{l} - P_{ij}^{l})^{2}
$$

- $F^{l}$: Feature representation of $\mathbf{G}$ at layer $l$.
- $P^{l}$: Feature representation of $\mathbf{C}$ at layer $l$.
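As a quick sanity check of the content loss, here is a minimal sketch in plain Python (no PyTorch needed). The function name `content_loss` and the tiny 2×3 feature matrices are made up for illustration; each row stands for one feature map flattened over its spatial positions.

```python
def content_loss(F, P):
    """0.5 * sum of squared differences between two feature matrices (lists of rows)."""
    return 0.5 * sum(
        (f - p) ** 2
        for row_f, row_p in zip(F, P)
        for f, p in zip(row_f, row_p)
    )

# Hypothetical features: two feature maps, three spatial positions each.
F_gen = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0]]
P_content = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 1.0]]

# Only two entries differ, each by 2, so the loss is 0.5 * (4 + 4).
print(content_loss(F_gen, P_content))  # 4.0
```

The PyTorch implementation later in this tutorial expresses the same idea with `F.mse_loss`, which averages instead of summing, so the two differ only by a constant factor for a fixed feature size.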

### Style Representation

The style loss measures the difference in style between $\mathbf{G}$ and the style image $\mathbf{S}$. This is done by comparing the **Gram matrices** of their feature maps:

$$
\mathcal{L}_{\text{style}}(\mathbf{G}, \mathbf{S}) = \sum_{l} w_{l} E_{l}
$$

where $E_{l}$ is the style loss for layer $l$:

$$
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} (G_{ij}^{l} - A_{ij}^{l})^{2}
$$

- $G^{l}$: Gram matrix of $\mathbf{G}$ at layer $l$.
- $A^{l}$: Gram matrix of $\mathbf{S}$ at layer $l$.
- $N_{l}$: Number of feature maps at layer $l$.
- $M_{l}$: Height $\times$ width of the feature map at layer $l$.
- $w_{l}$: Weight assigned to layer $l$.

The Gram matrix $G^{l}$ is computed as:

$$
G_{ij}^{l} = \sum_{k} F_{ik}^{l} F_{jk}^{l}
$$
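To make the Gram matrix and the per-layer style loss concrete, the following is a small sketch in plain Python. The names `gram` and `layer_style_loss` and the 2×2 feature matrices are illustrative assumptions; rows are feature maps and columns are spatial positions, so N = 2 and M = 2 here.

```python
def gram(F):
    """Gram matrix: entry (i, j) is the dot product of feature maps i and j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in F] for fi in F]

def layer_style_loss(F_gen, F_style):
    """Squared Gram-matrix difference scaled by 1 / (4 * N^2 * M^2)."""
    n, m = len(F_gen), len(F_gen[0])  # N feature maps, M spatial positions
    G, A = gram(F_gen), gram(F_style)
    sq = sum((g - a) ** 2 for rg, ra in zip(G, A) for g, a in zip(rg, ra))
    return sq / (4 * n ** 2 * m ** 2)

# Hypothetical features for the generated and style images at one layer.
F_gen = [[1.0, 0.0], [0.0, 2.0]]
F_style = [[1.0, 1.0], [0.0, 2.0]]

print(gram(F_gen))    # [[1.0, 0.0], [0.0, 4.0]]
print(gram(F_style))  # [[2.0, 2.0], [2.0, 4.0]]
# Squared differences sum to 1 + 4 + 4 + 0 = 9; the scale factor is 4 * 4 * 4 = 64.
print(layer_style_loss(F_gen, F_style))  # 0.140625
```

Because the Gram matrix sums over spatial positions, it records which feature maps fire together but not where, which is why it captures texture rather than layout.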

### Total Variation Loss (Optional)

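To encourage spatial smoothness in the generated image, a total variation loss can be added. Writing $x_{i,j}$ for the pixel value of $\mathbf{G}$ at row $i$ and column $j$ (to avoid confusion with the Gram matrix $G^{l}$):

$$
\mathcal{L}_{\text{tv}}(\mathbf{G}) = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^{2} + (x_{i+1,j} - x_{i,j})^{2} \right)
$$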
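The total variation formula above can be checked on a tiny single-channel image. This is a sketch in plain Python; the function name `tv_loss` is an assumption, and it only sums over pixel pairs that actually exist (no wrap-around or padding at the border), which is one reasonable reading of the formula.

```python
def tv_loss(img):
    """Sum of squared differences between horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    horiz = sum((img[i][j + 1] - img[i][j]) ** 2
                for i in range(h) for j in range(w - 1))
    vert = sum((img[i + 1][j] - img[i][j]) ** 2
               for i in range(h - 1) for j in range(w))
    return horiz + vert

flat = [[0.5, 0.5],
        [0.5, 0.5]]      # perfectly smooth image
checker = [[0.0, 1.0],
           [1.0, 0.0]]   # every neighboring pair differs by 1

print(tv_loss(flat))     # 0.0
print(tv_loss(checker))  # 4.0
```

A smooth image scores zero while a noisy one scores high, so adding this term to the objective discourages high-frequency artifacts.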

### Total Loss

The total loss is a weighted sum of the content loss, style loss, and total variation loss:

$$
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}} + \gamma \, \mathcal{L}_{\text{tv}}
$$

- $\alpha$: Weight for content loss.
- $\beta$: Weight for style loss.
- $\gamma$: Weight for total variation loss.

<a id="2"></a>
# 2. Implementing Neural Style Transfer

<a id="2.1"></a>
## 2.1 Prerequisites

We'll implement Neural Style Transfer using PyTorch. Before we begin, ensure you have the following libraries installed:

- **PyTorch**
- **Torchvision**
- **Pillow** (the maintained fork of PIL, the Python Imaging Library)
- **Matplotlib** (used to display the images)

**Note:** GPU acceleration is highly recommended for faster computation.

In [ ]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import copy

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

<a id="2.2"></a>
## 2.2 Code Implementation with PyTorch

### 2.2.1 Loading and Preprocessing Images

We'll start by loading the content and style images. We'll define functions to load and preprocess the images.

In [ ]:
# Image loading and preprocessing functions
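def image_loader(image_name, imsize):
    loader = transforms.Compose([
        # Resize to a fixed square so content and style tensors have the same
        # shape (note: this ignores the original aspect ratio)
        transforms.Resize((imsize, imsize)),
        transforms.ToTensor()
    ])
    # Convert to RGB so grayscale or RGBA files yield the three channels VGG expects
    image = Image.open(image_name).convert("RGB")
    image = loader(image).unsqueeze(0)  # add a batch dimension
    return image.to(device, torch.float)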

# Define image size
imsize = 512 if torch.cuda.is_available() else 128  # Use small size if no GPU

# Load content and style images
content_img = image_loader("path_to_content_image.jpg", imsize)
style_img = image_loader("path_to_style_image.jpg", imsize)

assert content_img.size() == style_img.size(), "Content and style images must be the same size."

# Display images
def imshow(tensor, title=None):
    image = tensor.cpu().clone()
    image = image.squeeze(0)
    unloader = transforms.ToPILImage()
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
imshow(content_img, title='Content Image')

plt.figure()
imshow(style_img, title='Style Image')

**Note:** Replace `"path_to_content_image.jpg"` and `"path_to_style_image.jpg"` with the actual file paths to your images.

### 2.2.2 Defining Content and Style Loss Modules

We'll create custom modules to compute the content and style losses.

In [ ]:
# Content Loss Module
class ContentLoss(nn.Module):
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        self.target = target.detach()
    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

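# Gram matrix helper and Style Loss Module
def gram_matrix(input):
    batch_size, feature_maps, h, w = input.size()
    # Flatten each feature map into a row vector
    features = input.view(batch_size * feature_maps, h * w)
    # Inner products between all pairs of feature maps
    G = torch.mm(features, features.t())
    # Normalize by the number of elements. Together with the mse_loss used in
    # StyleLoss, this scales the loss differently from the 1 / (4 N^2 M^2)
    # factor in section 1.2; the difference is absorbed by style_weight.
    return G.div(batch_size * feature_maps * h * w)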

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()
    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input

### 2.2.3 Loading the Pre-trained Model

We'll use a pre-trained VGG19 model from `torchvision.models`.

In [ ]:
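# Load pre-trained VGG19 model (convolutional feature extractor only, in eval mode).
# torchvision >= 0.13 uses the `weights` argument; on older versions use `pretrained=True`.
cnn = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()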

# Normalization module
cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        self.mean = mean.view(-1, 1, 1)
        self.std = std.view(-1, 1, 1)
    def forward(self, img):
        return (img - self.mean) / self.std

### 2.2.4 Building the Model

We'll build a new model that incorporates the content and style loss modules.

In [ ]:
def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=['conv_4'],
                               style_layers=['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']):
    cnn = copy.deepcopy(cnn)
    
    normalization = Normalization(normalization_mean, normalization_std).to(device)
    
    content_losses = []
    style_losses = []
    
    model = nn.Sequential(normalization)
    
    i = 0  # Increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))
        
        model.add_module(name, layer)
        
        if name in content_layers:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)
        
        if name in style_layers:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)
    
    # Trim off the layers after the last content and style losses
    for i in range(len(model) -1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break
    model = model[:(i+1)]
    
    return model, style_losses, content_losses

### 2.2.5 Setting Up the Input Image

We'll use the content image as the starting point for the generated image.

In [ ]:
# Input image
image = content_img.clone().requires_grad_(True)

# If you want to start from white noise instead, replace the line above with:
# image = torch.randn(content_img.size(), device=device).requires_grad_(True)

### 2.2.6 Defining the Optimization Algorithm

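We'll use the L-BFGS optimizer, a common choice for this kind of direct image optimization. Unlike most PyTorch optimizers, it may evaluate the model several times per step, so it must be given a closure that recomputes the loss.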

In [ ]:
def get_input_optimizer(input_img):
    optimizer = optim.LBFGS([input_img])
    return optimizer

### 2.2.7 Running the Style Transfer

We'll define the `run_style_transfer` function to perform the optimization.

In [ ]:
def run_style_transfer(cnn, normalization_mean, normalization_std,
                        content_img, style_img, input_img, num_steps=300,
                        style_weight=1000000, content_weight=1):
    print('Building the style transfer model...')
    model, style_losses, content_losses = get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                                                                     style_img, content_img)
    optimizer = get_input_optimizer(input_img)
    
    print('Optimizing...')
    run = [0]
    while run[0] <= num_steps:
        def closure():
            input_img.data.clamp_(0, 1)
            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0
            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss
            
            style_score *= style_weight
            content_score *= content_weight
            
            loss = style_score + content_score
            loss.backward()
            
            run[0] += 1
            if run[0] % 50 == 0:
                print("Run {}:".format(run[0]))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()
            
            return style_score + content_score
        optimizer.step(closure)
    
    # Clamp the final image
    input_img.data.clamp_(0, 1)
    return input_img

### 2.2.8 Performing Style Transfer

In [ ]:
# Run the style transfer
output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, image, num_steps=300)

### 2.2.9 Displaying the Output Image

In [ ]:
# Display the output image
plt.figure()
imshow(output, title='Output Image')
plt.ioff()
plt.show()

**Result:** The output image should display the content of the content image rendered in the style of the style image.

<a id="3"></a>
# 3. Creative Applications

<a id="3.1"></a>
## 3.1 Advanced Techniques

### 3.1.1 Adjusting Style and Content Weights

By modifying the `style_weight` and `content_weight` parameters, you can control the emphasis on style versus content in the generated image.

- **Higher `style_weight`**: The output image will more closely mimic the style image.
- **Higher `content_weight`**: The output image will retain more of the content image's structure.

### 3.1.2 Multi-Style Transfer

You can blend multiple styles by combining their style losses. This involves computing the style loss for each style image and summing them with appropriate weights.
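The weighted sum described above can be sketched in a few lines of plain Python. The helper names `gram_mse` and `blended_style_loss`, the 2×2 Gram matrices, and the 0.7 / 0.3 weights are all illustrative assumptions, not part of the implementation in section 2.

```python
def gram_mse(G, A):
    """Mean squared difference between two equally sized Gram matrices."""
    diffs = [(g - a) ** 2 for rg, ra in zip(G, A) for g, a in zip(rg, ra)]
    return sum(diffs) / len(diffs)

def blended_style_loss(gen_gram, style_grams, style_weights):
    """Weighted sum of one style-loss term per style image."""
    return sum(w * gram_mse(gen_gram, A)
               for A, w in zip(style_grams, style_weights))

# Hypothetical Gram matrices at a single layer.
gen_gram = [[1.0, 0.0], [0.0, 1.0]]
gram_a = [[1.0, 0.0], [0.0, 1.0]]  # style A already matches the generated image
gram_b = [[3.0, 0.0], [0.0, 1.0]]  # style B differs in one entry

# Style A contributes 0; style B contributes 0.3 * (4 / 4).
print(blended_style_loss(gen_gram, [gram_a, gram_b], [0.7, 0.3]))  # 0.3
```

In the PyTorch code from section 2, one way to get the same effect would be to add one `StyleLoss` module per style image at each style layer and weight their `.loss` values before summing.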

### 3.1.3 Style Transfer on Videos

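Applying neural style transfer to each frame of a video produces a stylized video. Stylizing frames independently tends to cause flicker, because small differences between consecutive frames can lead to visibly different results, so temporal consistency techniques are used to keep transitions between frames smooth.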

<a id="3.2"></a>
## 3.2 Latest Developments

### 3.2.1 Real-Time Style Transfer

Johnson et al. [[2]](#ref2) introduced real-time style transfer: a feed-forward network is trained once per style using perceptual losses, so stylizing a new image requires only a single forward pass instead of an iterative optimization.

### 3.2.2 Adaptive Instance Normalization (AdaIN)

Huang and Belongie [[3]](#ref3) proposed AdaIN, which aligns the mean and variance of the content features with those of the style features, enabling arbitrary style transfer in real-time.
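The mean and variance alignment can be illustrated on a single channel with plain Python. This is a sketch under stated assumptions: the function name `adain_channel` is hypothetical, each list stands for one feature channel flattened over its spatial positions, and the small `eps` added to the content standard deviation is a numerical-stability choice made here. In a real network the same operation is applied per channel to feature tensors.

```python
from statistics import fmean, pstdev

def adain_channel(content, style, eps=1e-5):
    """Normalize one content channel, then rescale it to the style channel's statistics."""
    c_mean, c_std = fmean(content), pstdev(content)
    s_mean, s_std = fmean(style), pstdev(style)
    # Remove the content statistics, then apply the style statistics.
    return [s_std * (x - c_mean) / (c_std + eps) + s_mean for x in content]

# Hypothetical channel activations.
content = [0.0, 1.0, 2.0, 3.0]
style = [10.0, 10.0, 14.0, 14.0]   # mean 12, standard deviation 2

out = adain_channel(content, style)
# The output keeps the ordering of the content values but takes on
# (approximately, because of eps) the mean and spread of the style values.
print(fmean(out), pstdev(out))
```

Since the operation has no learned parameters tied to a particular style, any style image can be supplied at inference time.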

### 3.2.3 StyleGAN and GAN-based Methods

Generative Adversarial Networks (GANs) have also been used to generate high-quality stylized images. StyleGAN [[4]](#ref4) feeds a learned style vector into every layer of its generator, which allows control over image attributes from coarse structure to fine detail.

<a id="4"></a>
# 4. Conclusion

Neural Style Transfer is a fascinating intersection of art and deep learning, demonstrating the creative capabilities of neural networks. By understanding the underlying mathematics and implementation details, you can experiment with creating your own stylized images and explore further advancements in this field.

<a id="5"></a>
# 5. References

1. <a id="ref1"></a>Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). *A Neural Algorithm of Artistic Style*. [arXiv:1508.06576](https://arxiv.org/abs/1508.06576)
2. <a id="ref2"></a>Johnson, J., Alahi, A., & Fei-Fei, L. (2016). *Perceptual Losses for Real-Time Style Transfer and Super-Resolution*. [arXiv:1603.08155](https://arxiv.org/abs/1603.08155)
3. <a id="ref3"></a>Huang, X., & Belongie, S. (2017). *Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization*. [arXiv:1703.06868](https://arxiv.org/abs/1703.06868)
4. <a id="ref4"></a>Karras, T., Laine, S., & Aila, T. (2019). *A Style-Based Generator Architecture for Generative Adversarial Networks*. [arXiv:1812.04948](https://arxiv.org/abs/1812.04948)

---

This notebook provides an in-depth exploration of Neural Style Transfer and its creative applications. You can run the code cells to see how the style transfer is implemented and experiment with different images and parameters.