# Neural Style Transfer using Deep Learning

In this notebook, we implement the Neural Style Transfer algorithm based on the paper "A Neural Algorithm of Artistic Style" by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.

## Introduction
Neural Style Transfer (NST) is a technique that takes two images—a content image and a style image—and blends them together so that the output image looks like the content image but "painted" in the style of the style image.

The algorithm uses a pretrained Convolutional Neural Network (CNN), typically VGG19, to extract features from both the content and style images. The core idea is to match the content representation of the output image with that of the content image and the style representation of the output image with that of the style image.

### Loss Functions
The NST algorithm optimizes the output image by minimizing a loss function that has two components:
- **Content Loss**: Measures the difference in content between the output image and the content image.
- **Style Loss**: Measures the difference in style between the output image and the style image using the Gram matrix.
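
As a compact summary, the objective can be written in symbols. The notation is introduced here for illustration only: $F^l(x)$ stands for the feature map of chosen layer $l$ for image $x$, flattened to shape $C_l \times H_l W_l$, and $g$, $c$, $s$ are the generated, content, and style images. The form follows what the code later in this notebook computes, which is not necessarily the exact normalization used in the paper.

$$
\mathcal{L}_{total} = \alpha \, \mathcal{L}_{content} + \beta \, \mathcal{L}_{style}
$$

$$
\mathcal{L}_{content} = \sum_l \operatorname{mean}\left[\left(F^l(g) - F^l(c)\right)^2\right], \qquad
\mathcal{L}_{style} = \sum_l \operatorname{mean}\left[\left(G^l(g) - G^l(s)\right)^2\right], \qquad
G^l(x) = F^l(x)\, F^l(x)^{\top}
$$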


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.utils import save_image

### Defining the VGG Model
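We use a pretrained VGG19 model for feature extraction. The `features` module of torchvision's VGG19 contains only the convolutional part of the network (convolution, ReLU, and max-pooling layers); the fully connected layers live in a separate `classifier` module and are never touched here. We take activations at indices 0, 5, 10, 19, and 28 (`conv1_1` through `conv5_1`), so the layers after index 28 are not needed and we slice them off with `features[:29]`.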

In [2]:
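class VGG(nn.Module):
    def __init__(self):
        super().__init__()
        # Indices of conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in vgg19().features
        self.chosen_features = ['0', '5', '10', '19', '28']
        # Keep layers 0-28 only; on torchvision < 0.13 use pretrained=True instead
        self.model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:29]
        # Only the image is optimized, so the network weights stay frozen
        for param in self.model.parameters():
            param.requires_grad_(False)

    def forward(self, x):
        # Run the image through the network and collect the chosen activations
        features = []
        for layer_num, layer in enumerate(self.model):
            x = layer(x)
            if str(layer_num) in self.chosen_features:
                features.append(x)
        return features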
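
The index-based collection in `forward` can be tried out without downloading any weights. The sketch below uses a small, randomly initialized `nn.Sequential` as a hypothetical stand-in for `vgg19().features`; the layer sizes and the `chosen` indices are made up for illustration. It shows that slicing with `[:k]` keeps indices `0` to `k - 1` and that the loop returns one tensor per chosen index.

```python
import torch
import torch.nn as nn

# Toy stand-in for vgg19().features (hypothetical layer sizes)
layers = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),  # index 0
    nn.ReLU(),                                  # index 1
    nn.Conv2d(4, 8, kernel_size=3, padding=1),  # index 2
    nn.ReLU(),                                  # index 3
    nn.MaxPool2d(2),                            # index 4
    nn.Conv2d(8, 8, kernel_size=3, padding=1),  # index 5
)

truncated = layers[:3]   # keeps indices 0, 1, 2 and drops the rest
chosen = {"0", "2"}      # layers whose outputs we want to keep

x = torch.rand(1, 3, 16, 16)
features = []
with torch.no_grad():
    for idx, layer in enumerate(truncated):
        x = layer(x)
        if str(idx) in chosen:
            features.append(x)

shapes = [tuple(f.shape) for f in features]
print(len(truncated), shapes)
```

The two collected tensors have 4 and 8 channels respectively, matching the output channels of the convolutions at indices 0 and 2.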

### Loading and Preprocessing Images
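We define a function to load and preprocess the images. Each image is converted to RGB, resized to a fixed square size, and turned into a tensor with values in `[0, 1]` and a leading batch dimension. No mean/std normalization is applied in this notebook. The `loader` transform and `device` used here are defined in the next cell, so that cell must run before `load_image` is called.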

In [3]:
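def load_image(image_name):
    # Force three channels: PNG files may carry an alpha channel that VGG cannot accept
    image = Image.open(image_name).convert("RGB")
    # Apply the transforms and add a batch dimension: (C, H, W) -> (1, C, H, W)
    image = loader(image).unsqueeze(0)
    return image.to(device)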
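
The `loader` used by `load_image` only resizes the image and converts it to a tensor. torchvision documents its ImageNet-pretrained models as expecting inputs normalized with the ImageNet channel mean and standard deviation, so one optional variation is to normalize inside the forward pass. The sketch below is an assumption about how this could be done and is not part of the original notebook; the `Normalization` class name is hypothetical, and adding it would change the loss values reported later.

```python
import torch
import torch.nn as nn

# Channel statistics documented by torchvision for ImageNet-pretrained models
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

class Normalization(nn.Module):
    """Hypothetical helper: normalizes a (N, 3, H, W) batch with values in [0, 1]."""

    def __init__(self, mean=IMAGENET_MEAN, std=IMAGENET_STD):
        super().__init__()
        # Buffers move with .to(device) but are not trainable parameters
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, img):
        return (img - self.mean) / self.std

normalize = Normalization()
img = torch.rand(1, 3, 4, 4)
out = normalize(img)
print(out.shape)
```

Normalizing inside the model rather than in the loader keeps the optimized image itself in `[0, 1]`, so it can still be written out directly with `save_image`.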

### Loading Images and Initializing the Model

In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
image_size = 512
loader = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor()
])

original_img = load_image("input.png")
style_img = load_image("style.jpg")

model = VGG().to(device).eval()
generated = original_img.clone().requires_grad_(True)


### Defining the Loss Functions
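The content loss and style loss functions follow the definitions in the paper. The content loss is the mean squared error between the feature maps of the generated and content images. The style loss is the mean squared error between the Gram matrices of the feature maps of the generated and style images. The `gram_matrix` helper assumes a batch containing a single image and divides the result by the number of elements in the feature map (`n_channels * height * width`).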

In [5]:
def content_loss(generated_feature, content_feature):
    return torch.mean((generated_feature - content_feature) ** 2)

def gram_matrix(feature):
    _, n_channels, height, width = feature.size()
    feature = feature.view(n_channels, height * width)
    G = torch.mm(feature, feature.t())
    return G / (n_channels * height * width)

def style_loss(generated_feature, style_feature):
    G = gram_matrix(generated_feature)
    A = gram_matrix(style_feature)
    return torch.mean((G - A) ** 2)
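
To make the Gram matrix concrete, here is a small worked example on a hand-written feature map with two channels and two spatial positions. The helper is repeated so the block runs on its own (using `reshape` instead of `view`, which is equivalent for contiguous tensors). The second half illustrates why the Gram matrix captures style rather than layout: reordering the spatial positions leaves it unchanged.

```python
import torch

def gram_matrix(feature):
    # Same computation as the notebook's helper; assumes a batch of one image
    _, n_channels, height, width = feature.size()
    flat = feature.reshape(n_channels, height * width)
    return flat @ flat.t() / (n_channels * height * width)

# Shape (1, 2, 1, 2): channel 0 is [1, 2], channel 1 is [3, 4]
feature = torch.tensor([[[[1.0, 2.0]], [[3.0, 4.0]]]])
G = gram_matrix(feature)
# Unnormalized products are [[5, 11], [11, 25]]; dividing by 2 * 1 * 2 = 4
# gives [[1.25, 2.75], [2.75, 6.25]]
print(G)

# Reverse the order of the spatial positions: the Gram matrix does not change
flipped = feature.flip(-1)
G_flipped = gram_matrix(flipped)
print(torch.allclose(G, G_flipped))
```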

### Optimizing the Generated Image
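We optimize the pixels of the generated image directly, using the Adam optimizer to minimize the combined content and style loss. The network weights are not updated. The total loss is a weighted sum of the two terms, `alpha * c_loss + beta * s_loss`, where `alpha` weights content and `beta` weights style.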
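
The pattern of handing an image tensor, rather than model weights, to the optimizer can be seen in isolation in the toy sketch below. It is an illustration under simplifying assumptions: a random `target` tensor stands in for the feature targets, the loss is a plain mean squared error, and the learning rate and step count are arbitrary.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for the targets that the real losses compare against
target = torch.rand(1, 3, 8, 8)
# The "image" is the only thing being optimized
image = torch.zeros(1, 3, 8, 8, requires_grad=True)
optimizer = optim.Adam([image], lr=0.05)

initial_loss = torch.mean((image - target) ** 2).item()
for _ in range(200):
    loss = torch.mean((image - target) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
final_loss = torch.mean((image - target) ** 2).item()

print(initial_loss, final_loss)
```

The loss after optimization is lower than at the start because the pixel values themselves have moved toward the target.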

In [None]:
total_steps = 10000
learning_rate = 0.001
alpha = 1      # content weight
beta = 0.01    # style weight
optimizer = optim.Adam([generated], lr=learning_rate)

# The content and style images never change, so extract their features once
with torch.no_grad():
    original_img_features = model(original_img)
    style_img_features = model(style_img)

for step in range(total_steps):
    # Features of the image being optimized
    generated_features = model(generated)

    c_loss = 0
    s_loss = 0

    for gen_feature, orig_feature, style_feature in zip(generated_features, original_img_features, style_img_features):
        _, channel, height, width = gen_feature.shape
        c_loss += torch.mean((gen_feature - orig_feature) ** 2)

        # Unnormalized Gram matrices (batch size is assumed to be 1)
        gen_flat = gen_feature.view(channel, height * width)
        style_flat = style_feature.view(channel, height * width)
        G = gen_flat.mm(gen_flat.t())
        A = style_flat.mm(style_flat.t())

        s_loss += torch.mean((G - A) ** 2)

    total_loss = alpha * c_loss + beta * s_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    if step % 200 == 0:
        print(f'Step [{step}/{total_steps}], Content Loss: {c_loss.item():.4f}, Style Loss: {s_loss.item():.4f}, Total Loss: {total_loss.item():.4f}')
        save_image(generated, f"generated_{step}.png")

Step [0/10000], Content Loss: 0.0001, Style Loss: 323228064.0000, Total Loss: 3232280.5000
Step [200/10000], Content Loss: 12.5102, Style Loss: 27729052.0000, Total Loss: 277303.0000
Step [400/10000], Content Loss: 13.0549, Style Loss: 16320842.0000, Total Loss: 163221.4844
Step [600/10000], Content Loss: 13.2986, Style Loss: 8429695.0000, Total Loss: 84310.2422
Step [800/10000], Content Loss: 13.4644, Style Loss: 3846746.2500, Total Loss: 38480.9258
Step [1000/10000], Content Loss: 13.5957, Style Loss: 2248197.5000, Total Loss: 22495.5703
Step [1200/10000], Content Loss: 13.7000, Style Loss: 1721329.0000, Total Loss: 17226.9883
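
The style loss in this log is several orders of magnitude larger than the content loss. One contributing factor is that the training loop builds its Gram matrices inline without the `1 / (n_channels * height * width)` factor used by the `gram_matrix` helper. The sketch below, using random tensors with hypothetical sizes as stand-ins for feature maps, shows that dropping this factor scales the mean squared error by `(n_channels * height * width) ** 2`.

```python
import torch

torch.manual_seed(0)
c, h, w = 8, 16, 16               # hypothetical feature-map size
gen = torch.rand(1, c, h, w)      # stand-in for a generated-image feature map
sty = torch.rand(1, c, h, w)      # stand-in for a style-image feature map

def raw_gram(feature):
    # Gram matrix without any normalization, as in the training loop
    flat = feature.reshape(c, h * w)
    return flat @ flat.t()

norm = c * h * w
raw_loss = torch.mean((raw_gram(gen) - raw_gram(sty)) ** 2)
scaled_loss = torch.mean((raw_gram(gen) / norm - raw_gram(sty) / norm) ** 2)

ratio = (raw_loss / scaled_loss).item()
print(ratio, norm ** 2)
```

The two printed numbers agree up to floating-point error, which is why the choice of `beta` depends on whether the Gram matrices are normalized: a value tuned for one convention will not transfer to the other.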
