# Artistic Neural Style Transfer using PyTorch

In this kernel, we’ll implement in PyTorch the style transfer method outlined in the paper [Image Style Transfer Using Convolutional Neural Networks](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf) by Gatys et al.

The paper performs style transfer with features from the 19-layer VGG network (VGG19), which is composed of a series of convolutional and pooling layers followed by three fully connected layers. In the image below, the convolutional layers are named by stack and by their order within the stack. Conv_1_1 is the first convolutional layer that an image passes through, in the first stack. Conv_2_1 is the first convolutional layer in the *second* stack. The deepest convolutional layer in the network is conv_5_4.

<img src="https://github.com/udacity/deep-learning-v2-pytorch/raw/master/style-transfer/notebook_ims/vgg19_convlayers.png" width=80% />

### Separating Style and Content

Style transfer relies on separating the content and style of an image. Given one content image and one style image, we aim to create a new, _target_ image which should contain our desired content and style components:
* objects and their arrangement are similar to that of the **content image**
* style, colors, and textures are similar to that of the **style image**

An example is shown below, where the content image is of a cat, and the style image is of [Hokusai's Great Wave](https://en.wikipedia.org/wiki/The_Great_Wave_off_Kanagawa). The generated target image still contains the cat but is stylized with the waves, blue and beige colors, and block print textures of the style image!

<img src='https://github.com/udacity/deep-learning-v2-pytorch/raw/master/style-transfer/notebook_ims/style_tx_cat.png' width=80% />

In [None]:
import torch
import torchvision as tv 
import torch.optim as optim
from torchvision import transforms, models 
import IPython.display as display
from tqdm import tqdm 
import numpy as np 
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline
mean = np.asarray([ 0.485, 0.456, 0.406 ])
std = np.asarray([ 0.229, 0.224, 0.225 ])

## Load in VGG19 (features)

VGG19 is split into two portions:
* `vgg19.features`, which are all the convolutional and pooling layers
* `vgg19.classifier`, which are the three linear, classifier layers at the end

We only need the `features` portion. Below, we load it and "freeze" its weights.

In [None]:
# get the "features" portion of VGG19 (we will not need the "classifier" portion)
model = models.vgg19(pretrained=True).features
# without freezing the weights for each layer, training will be very slow. 
for param in model.parameters():
    param.requires_grad=False
model.eval()
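
Freezing works here because style transfer optimizes the pixels of the target image, not the network weights. The following is a minimal sketch of that idea, assuming a single small `nn.Conv2d` as a hypothetical stand-in for the frozen VGG19 layers: the image still receives a gradient, while the frozen parameters do not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for one frozen VGG19 layer (not the real network).
layer = nn.Conv2d(3, 4, kernel_size=3, padding=1)
for param in layer.parameters():
    param.requires_grad = False

# The image is the only tensor that asks for gradients.
image = torch.rand(1, 3, 8, 8, requires_grad=True)

# Any scalar loss computed from the layer output will do for this demo.
loss = layer(image).pow(2).sum()
loss.backward()

print(image.grad.shape)                              # gradient with respect to the pixels
print([p.grad is None for p in layer.parameters()])  # frozen weights received no gradient
```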

In [None]:
# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
model.to(device)

### Load in Content and Style Images

You can load in any images you want! The `load_image_as_tensor` helper below loads an image of any type and size and converts it to a normalized tensor.

Smaller images are faster to process, so the helper scales down any image whose longer side exceeds `max_size`. The content and style images are also easier to work with when they have the same size.

Further down, I load the images by file name and force the style image to the same size as the content image.

In [None]:
def load_image_as_tensor(img_path, max_size=1024, shape=None):
    image = Image.open(img_path).convert('RGB')
    w, h = image.size  # PIL reports (width, height)

    # large images will slow down processing, so cap the longer side at max_size
    if max(image.size) > max_size:
        scale = max_size / max(image.size)
        size = (int(h * scale), int(w * scale))
    else:
        size = (h, w)

    # an explicit (height, width) overrides the computed size
    if shape is not None:
        size = shape

    tfm = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        # normalize image based on mean and std of ImageNet dataset
        transforms.Normalize(mean, std, inplace=True),
        # scale the normalized values by 255
        transforms.Lambda(lambda x: x.mul(255.))
        ])

    # convert('RGB') already dropped any alpha channel, so [:3] is only a safeguard;
    # unsqueeze(0) adds the batch dimension
    return tfm(image)[:3, :, :].unsqueeze(0)

## Specify the content, style images and other settings

In [None]:
content='louvre2.jpg'
style='style2.jpg'
useLayers='relu' # conv or relu layers in vgg
optimizer='adam' # adam or lbfgs
stylized_filename=f'{optimizer} {useLayers} {style[:-4] } stylized {content[:-4]}.jpg'
print(stylized_filename)

In [None]:
# load in content and style image
content = load_image_as_tensor(content).to(device)
print(f'content shape={content.shape} ')
# Resize style to match content, makes code easier
style = load_image_as_tensor (style, shape=content.shape[-2:]).to(device)
print(f'style shape={style.shape} ')

In [None]:
def showTensorImage(img, filename=None):    
    tfm=transforms.Compose([
        # reverse the normalization
        transforms.Lambda(lambda x: x.div(255.) ), 
        transforms.Normalize((-1 * mean / std), (1.0 / std),inplace=True) 
        ])
    img2=tfm(img.cpu().squeeze(0))
    if filename is not None:  
        tv.utils.save_image(img2, filename)
    return img2.permute(1, 2, 0).detach().numpy()    
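
`showTensorImage` undoes the normalization by calling `Normalize` a second time with mean `-mean / std` and standard deviation `1 / std`. Since `Normalize` computes `(x - m) / s`, the second call evaluates to `y * std + mean`, which inverts the first. The sketch below checks this round trip with plain tensor arithmetic; the `normalize` helper is a hypothetical stand-in for `transforms.Normalize`, not the torchvision implementation.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(x, m, s):
    # Same per-channel arithmetic as transforms.Normalize: (x - m) / s
    return (x - m) / s

torch.manual_seed(0)
pixels = torch.rand(3, 4, 4)  # a fake RGB image with values in [0, 1)

# Forward path used when loading: normalize, then scale by 255.
encoded = normalize(pixels, mean, std) * 255.0

# Reverse path used when displaying: divide by 255, then "normalize" with the inverse statistics.
decoded = normalize(encoded / 255.0, -mean / std, 1.0 / std)

print(torch.allclose(decoded, pixels, atol=1e-4))
```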

In [None]:
plt.figure(figsize=(20,20) )
plt.subplot(1, 2, 1)
plt.imshow(showTensorImage(content))
plt.title('content image')
plt.axis('off')
plt.subplot(1,2, 2)
plt.imshow(showTensorImage(style) ) 
plt.title('style image')
plt.axis('off')
plt.show()

## Content and Style Features

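Style transfer compares the target image with the content and style images in feature space. The `getModelOutputs` helper below passes an image through the network and collects the activations of the requested layers, keyed by their index in `vgg19.features`. The `unrollTensor` helper flattens a layer's activations into a 2D matrix with one row per channel.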

In [None]:
def getModelOutputs(image, layers):
    outputs = {}
    x = image
    cnt = len(layers)
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            # clone so that a following in-place ReLU (torchvision's VGG uses
            # ReLU(inplace=True)) cannot overwrite a stored conv output
            outputs[name] = x.clone()
            cnt -= 1
            if cnt == 0:  # outputs from required layers obtained, break out
                break
    return outputs
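
To see what `getModelOutputs` returns, the same loop can be run on a much smaller network. This is a sketch only: the `toy` stack and the `collect` function are hypothetical stand-ins for `vgg19.features` and `getModelOutputs`, and the early exit is left out.

```python
import torch
import torch.nn as nn

# Hypothetical miniature "features" stack; modules are keyed '0', '1', ... like in VGG19.
toy = nn.Sequential(
    nn.Conv2d(3, 4, 3, padding=1),  # index '0'
    nn.ReLU(),                      # index '1'
    nn.MaxPool2d(2),                # index '2'
    nn.Conv2d(4, 8, 3, padding=1),  # index '3'
    nn.ReLU(),                      # index '4'
)

def collect(net, image, layers):
    # Feed the image through the modules in order and keep the requested outputs.
    outputs = {}
    x = image
    for name, layer in net.named_children():
        x = layer(x)
        if name in layers:
            outputs[name] = x
    return outputs

feats = collect(toy, torch.rand(1, 3, 16, 16), {'1', '3'})
for name, value in feats.items():
    print(name, tuple(value.shape))
```

The result is a dictionary from module index to activation tensor, which is the structure the cost functions index into.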

In [None]:
def unrollTensor(A):
    if len(A.shape)==4:
        A=A.squeeze(0)
    c, h, w=A.shape
    return A.reshape(c, -1), c*h*w

## Content Cost Function $J_{content}(C,G)$

One goal of neural style transfer (NST) is for the content of the generated image G to match the content of image C. To achieve this, we minimize the content cost function, defined as:

$$J_{content}(C,G) =  \frac{1}{4 \times n_H \times n_W \times n_C}\sum _{ \text{all entries}} (a^{(C)} - a^{(G)})^2\tag{1} $$

* Here, $n_H, n_W$ and $n_C$ are the height, width and number of channels of the hidden layer you have chosen, and appear in a normalization term in the cost. 
* For clarity, note that $a^{(C)}$ and $a^{(G)}$ are the 3D volumes corresponding to a hidden layer's activations. 
* In order to compute the cost $J_{content}(C,G)$, it might also be convenient to unroll these 3D volumes into a 2D matrix, as shown below.
* Technically this unrolling step isn't needed to compute $J_{content}$, but it will be good practice for when you do need to carry out a similar operation later for computing the style cost $J_{style}$.
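
As a worked instance of equation (1), the sketch below uses made-up activations with $n_C = 2$ and $n_H = n_W = 2$ (hypothetical values chosen so the arithmetic is easy to follow). The squared differences sum to 32 and the normalization term is $4 \times 2 \times 2 \times 2 = 32$.

```python
import torch

# Hypothetical activations for one layer: 2 channels on a 2 x 2 spatial grid.
a_C = torch.tensor([[[1., 2.], [3., 4.]],
                    [[0., 1.], [0., 1.]]])
a_G = torch.zeros_like(a_C)

n_C, n_H, n_W = a_C.shape

# Unroll each 3D volume into a (n_C, n_H * n_W) matrix.
a_C_unrolled = a_C.reshape(n_C, -1)
a_G_unrolled = a_G.reshape(n_C, -1)

# Equation (1): sum of squared differences divided by 4 * n_H * n_W * n_C.
J_content = torch.sum((a_C_unrolled - a_G_unrolled) ** 2) / (4 * n_H * n_W * n_C)
print(J_content.item())  # 32 / 32 = 1.0
```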

In [None]:
def getContentCost(a_C_unrolled, a_G):
    # unroll the generated activations to shape (c, h*w)
    a_G_unrolled, chw = unrollTensor(a_G)
    return torch.sum((a_C_unrolled - a_G_unrolled)**2) / (4*chw)

## GRAM Matrix

In [None]:
def getGRAMmatrix(A):    
    return torch.mm(A, A.t())
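
For an unrolled activation matrix $A$ of shape $(n_C, n_H \times n_W)$, the Gram matrix is $A A^T$: entry $(i, j)$ is the dot product of channels $i$ and $j$ over all spatial positions. The small sketch below uses hypothetical values and also shows that shuffling the spatial positions leaves the Gram matrix unchanged, which is why it captures texture statistics rather than layout.

```python
import torch

# Hypothetical unrolled activations: 2 channels, 3 spatial positions.
A = torch.tensor([[1., 0., 2.],
                  [0., 3., 1.]])

G = torch.mm(A, A.t())
print(G)  # [[5, 2], [2, 10]]

# Reordering the columns (spatial positions) does not change the Gram matrix.
A_shuffled = A[:, [2, 0, 1]]
G_shuffled = torch.mm(A_shuffled, A_shuffled.t())
print(torch.equal(G, G_shuffled))  # True
```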

## Style Cost

$$J_{style}^{[l]}(S,G) = \frac{1}{(2 \times n_C \times n_H \times n_W)^2} \sum _{i=1}^{n_C}\sum_{j=1}^{n_C}(G^{(S)}_{(gram)i,j} - G^{(G)}_{(gram)i,j})^2\tag{2} $$

* $G_{gram}^{(S)}$ is the Gram matrix of the "style" image.
* $G_{gram}^{(G)}$ is the Gram matrix of the "generated" image.
* Make sure you remember that this cost is computed using the hidden layer activations for a particular hidden layer in the network $a^{[l]}$
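
Equation (2) can be checked by hand on a tiny case. The sketch below assumes hypothetical unrolled activations with $n_C = 2$ and $n_H \times n_W = 2$, so the normalization term is $(2 \times 4)^2 = 64$.

```python
import torch

# Hypothetical unrolled activations for one layer: 2 channels, 2 spatial positions.
a_S = torch.tensor([[1., 0.],
                    [0., 1.]])
a_G = torch.tensor([[1., 1.],
                    [0., 0.]])

gram_S = torch.mm(a_S, a_S.t())  # [[1, 0], [0, 1]]
gram_G = torch.mm(a_G, a_G.t())  # [[2, 0], [0, 0]]

chw = a_S.numel()                # n_C * n_H * n_W = 4
layer_cost = torch.sum((gram_S - gram_G) ** 2) / ((2 * chw) ** 2)
print(layer_cost.item())         # 2 / 64 = 0.03125
```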

In [None]:
def getStyCost(styleGRAMs, target_outputs, style_weights):
    stylecost=0.
    for k, v in style_weights.items():
        a_G_unrolled, chw = unrollTensor(target_outputs[k])    
        G_G=getGRAMmatrix(a_G_unrolled)                           
        cost=torch.sum((styleGRAMs[k] - G_G)**2) / ((2*chw)**2)     
        stylecost+= v * cost
    return stylecost 

## Total Variation Cost

In [None]:
def getTotalVariationCost(y):
    return torch.sum(torch.abs(y[:, :, :, :-1] - y[:, :, :, 1:])) + \
           torch.sum(torch.abs(y[:, :, :-1, :] - y[:, :, 1:, :]))
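
Read off the code above, the total variation cost sums the absolute differences between horizontally adjacent pixels and between vertically adjacent pixels, so smooth images score low and noisy ones score high. The sketch below repeats that computation under a hypothetical name, `total_variation`, on two tiny single-channel images.

```python
import torch

def total_variation(y):
    # Absolute differences between neighbours along the width, then along the height.
    horizontal = torch.sum(torch.abs(y[:, :, :, :-1] - y[:, :, :, 1:]))
    vertical = torch.sum(torch.abs(y[:, :, :-1, :] - y[:, :, 1:, :]))
    return horizontal + vertical

# A constant image has no variation at all.
flat = torch.ones(1, 1, 2, 2)

# A vertical edge: each of the two rows jumps from 0 to 1, the columns are constant.
edge = torch.tensor([[[[0., 1.],
                       [0., 1.]]]])

print(total_variation(flat).item(), total_variation(edge).item())  # 0.0 2.0
```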

## Total Cost

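$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G) + J_{TV}(G)$$

Here $J_{TV}(G)$ is the total variation cost defined above, which the code adds with a fixed weight of 1.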

In [None]:
def getTotalCost(J_content, J_style,tvcost, alpha = 2, beta = 40):
    total=(J_content * alpha) + (beta*J_style) + tvcost
    return total 

## Initialize the generated image with random noise and content

In [None]:
noise = torch.randn(content.shape).to(device)
# start from the content image plus Gaussian noise; the target is the only tensor we optimize
target = (content.clone() + noise).requires_grad_(True)
print(f'target has shape: {target.shape} ')

## Weights for each style layer

In [None]:
style_weights={}
content_layer=''

if useLayers=='conv':
    content_layer='21'
    style_weights = {'0':0.55, '5':0.65, '10':.75, '19':0.85, '28':0.95}
elif useLayers=='relu':
    content_layer='22'
    style_weights = {'1':0.55, '6':0.65, '11':0.75, '20':0.85, '29':0.95}
featureLayers=set(style_weights.keys())
featureLayers.add(content_layer)
print(f'featureLayers has {len(featureLayers)} layers: {featureLayers} ')
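
The keys above are module indices in `vgg19.features`, not the paper's layer names. The sketch below rebuilds the index-to-name mapping from the VGG19 layer configuration, assuming torchvision's non-batch-norm layout in which every convolution is followed by a ReLU. The `conv_s_p`, `relu_s_p` and `pool_s` names are a naming convention chosen for this illustration.

```python
# VGG19 layer configuration: numbers are conv output channels, 'M' is a max-pooling layer.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
       512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

names = {}                 # module index (as a string) -> paper-style name
idx, stack, pos = 0, 1, 1
for item in cfg:
    if item == 'M':
        names[str(idx)] = f'pool_{stack}'
        idx += 1
        stack, pos = stack + 1, 1
    else:
        # each conv entry expands to a conv module followed by a ReLU module
        names[str(idx)] = f'conv_{stack}_{pos}'
        names[str(idx + 1)] = f'relu_{stack}_{pos}'
        idx += 2
        pos += 1

for key in ['0', '5', '10', '19', '21', '28']:
    print(key, names[key])
```

Under this mapping the `conv` settings use conv_1_1 through conv_5_1 for style and conv_4_2 for content, and the `relu` settings use the ReLU that follows each of those layers.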

## Get content and Style features

Before training, we compute the features of the content and style images once. These features do not change during training and are only used to compute the losses for the output image, so caching them speeds up training considerably: we don't have to recompute them at every epoch.

In [None]:
content_outputs = getModelOutputs(content, [content_layer])
a_C=content_outputs[content_layer]
a_Cunrolled, _ =unrollTensor(a_C)

style_outputs = getModelOutputs(style, featureLayers)    
styleGRAMs={}
for k, v in style_outputs.items():    
    if k in style_weights.keys(): # skip the content layer
        unrolled, _ =unrollTensor(v)
        styleGRAMs[k] = getGRAMmatrix(unrolled)

In [None]:
def getTrainingCosts(target_img):
    
    target_outputs = getModelOutputs(target_img, featureLayers)
    
    contentCost= getContentCost(a_Cunrolled, target_outputs[content_layer])            
    styleCost = getStyCost(styleGRAMs, target_outputs, style_weights)           
    tvCost=getTotalVariationCost(target_img)
    
    totalCost =getTotalCost(contentCost, styleCost, tvCost)

    return totalCost, contentCost, styleCost, tvCost

## Train with Adam optimizer

In [None]:
def trainAdam(totallosses, contentlosses, stylelosses,tvlosses,  epochs=3000, lr=1.25, show_step=500):
    opt = optim.Adam([target], lr=lr)    
    sched = torch.optim.lr_scheduler.OneCycleLR(opt,max_lr=lr*5., epochs=epochs,steps_per_epoch=1)

    for ep in tqdm(range(epochs)):        
        opt.zero_grad()
        Tloss, CLoss, SLoss, TVLoss =getTrainingCosts(target)           

        totallosses +=[Tloss.item() ]
        stylelosses +=[SLoss.item()]
        contentlosses +=[CLoss.item()]
        tvlosses += [TVLoss.item()]
        
        # update your target image    
        Tloss.backward()
        opt.step()
        sched.step()    
        
        step=ep+1
        if  step % show_step == 0 or step==1:        
            display.clear_output(wait=True)        
            print(f'epoch {step}/{epochs}: loss={totallosses[-1]}')
            if step < epochs:
                plt.imshow(showTensorImage(target))
            else: 
                plt.imshow(showTensorImage(target, stylized_filename))
            plt.axis('off')
            plt.show()

## Train with LBFGS optimizer

L-BFGS belongs to a class of optimizers called quasi-Newton methods. With `line_search_fn='strong_wolfe'`, a line search picks the step size at every iteration, so the learning rate does not need tuning. Its main drawback is that it has no mini-batch support: every evaluation must use the full batch. NST optimizes a single target image against one content image and one style image, so there are no mini-batches to begin with, which makes L-BFGS a good fit here. Note the unusual calling convention: the loss computation is wrapped in a closure that is passed to `opt.step`, and the optimizer calls that closure repeatedly.
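
The closure pattern is easier to see on a toy problem. The sketch below is a minimal illustration, not the training loop of this kernel: it minimizes the hypothetical objective $(x - 3)^2$ and counts how often `torch.optim.LBFGS` invokes the closure during a single `step` call.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.LBFGS([x], max_iter=20, line_search_fn='strong_wolfe')

calls = [0]  # a one-element list so the closure can update the counter

def closure():
    # LBFGS re-evaluates the loss and gradient whenever it needs them.
    calls[0] += 1
    opt.zero_grad()
    loss = ((x - 3.0) ** 2).sum()
    loss.backward()
    return loss

# A single step() call runs the whole optimization, up to max_iter iterations.
opt.step(closure)
print(calls[0], x.item())
```

One call to `step` evaluates the closure several times, which is why the training function for L-BFGS has no explicit epoch loop.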

In [None]:
def trainLBFGS(totallosses, contentlosses, stylelosses, tvlosses, epochs=1000):
    opt = optim.LBFGS((target,), max_iter=epochs, line_search_fn='strong_wolfe')
    pbar = tqdm(total=epochs)

    # LBFGS calls this closure every time it needs the loss and gradients;
    # the line search may call it more than once per iteration
    def closure():
        if torch.is_grad_enabled():
            opt.zero_grad()

        Tloss, CLoss, SLoss, tvloss = getTrainingCosts(target)

        # append() mutates the caller's lists, so no nonlocal is needed
        totallosses.append(Tloss.item())
        stylelosses.append(SLoss.item())
        contentlosses.append(CLoss.item())
        tvlosses.append(tvloss.item())

        if Tloss.requires_grad:
            Tloss.backward()

        pbar.update(1)
        return Tloss

    opt.step(closure)
    pbar.close()

    # show and save the result once the optimizer has stopped; it may stop
    # before `epochs` closure calls if it converges early
    display.clear_output(wait=True)
    plt.imshow(showTensorImage(target, stylized_filename))
    plt.axis('off')
    plt.show()

## Start Training

In [None]:
total_losses=[]
contentLosses=[]
styleLosses=[]
tvLosses=[]

if optimizer=='adam':
    trainAdam(total_losses, contentLosses, styleLosses, tvLosses)
elif optimizer=='lbfgs':
    trainLBFGS(total_losses, contentLosses, styleLosses,tvLosses)
else:
    print('unknown optimizer')

## Display the Target Image

In [None]:
plt.figure(figsize=(20,20) )
plt.subplot(1, 3, 1)
plt.imshow(showTensorImage(content))
plt.title('content')
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(showTensorImage(style))
plt.title('style image')
plt.axis('off')
plt.subplot(1, 3, 3)
plt.imshow(showTensorImage(target))
plt.title('stylized image')
plt.axis('off')
plt.show()

## Loss graphs

In [None]:
plt.figure(figsize=(20,15) )
plt.subplot(2, 2, 1)
plt.plot(total_losses)
plt.title("total losses")
plt.subplot(2, 2, 2)
plt.plot(contentLosses)
plt.title("content losses")
plt.subplot(2, 2, 3)
plt.plot(styleLosses)
plt.title("style losses")
plt.subplot(2, 2, 4)
plt.plot(tvLosses)
plt.title("Total Variation losses")
plt.show()

## References

- A very good pytorch neural style transfer [git repo](https://github.com/gordicaleksa/pytorch-neural-style-transfer)
- Andrew Ng's video lecture on [neural style transfer](https://www.youtube.com/watch?v=ChoV5h7tw5A)
- A good video on [pytorch neural style transfer](https://www.youtube.com/watch?v=imX4kSKDY7s&t=214s)
- Good article on BFGS: [A very gentle introduction to BFGS optimizing algorithm](https://machinelearningmastery.com/bfgs-optimization-in-python/)