# Image style transfer with CNN

The goal is to generate a new image from two input images such that the output image contains the content (spatial features) of the first input image and the style (texture) of the second one. The generator is implemented using a convolutional neural network (CNN).

In other words, the style transfer process modifies the style of the content image while keeping its content close to the original. You can also think of it as merging two images into an output image that contains aspects of both input images.

This notebook implements the method for extracting and transferring the artistic style of an image described in the paper "A Neural Algorithm of Artistic Style" by Gatys, Ecker and Bethge (https://arxiv.org/pdf/1508.06576.pdf).

## Overview

#### Initialization
- Two images (content and style) from which the feature maps are extracted
    - Content image: the one containing the spatial features (objects and layout)
    - Style image: contains the overall style / texture
- The feature maps are used to adjust the features of a randomly generated initial image so that they are as close as possible to the extracted content and style features
- The randomly generated image is passed as input to the model
- During the style transfer process, the pixel values of this input image are adjusted step by step, so the output image is the input image painted over with the content and style of the two source images
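The initialization step can be sketched with NumPy as follows. The uniform noise in [0, 255], the (height, width, channels) layout and the name `random_initial_image` are assumptions for illustration, not taken from the notebook. The returned array is the only thing that changes during the optimization.

```python
import numpy as np

def random_initial_image(height, width, channels=3, seed=None):
    """Hypothetical helper: white-noise starting image for style transfer."""
    rng = np.random.default_rng(seed)
    # Uniform noise over the assumed 8-bit pixel range, stored as float32
    # so that gradient updates can change the pixels by fractional amounts.
    return rng.uniform(0.0, 255.0, size=(height, width, channels)).astype(np.float32)
```

For example, `random_initial_image(256, 256, seed=0)` would give a 256x256 RGB noise image matching a content image of that size.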

#### Base model
- To extract the right feature maps from the input images (spatial features from the content image, style features from the style image), a pretrained CNN is needed
- Several models can be used as the base model for extracting the feature maps
    - VGG16 is arguably the most popular one, at least for this purpose (the original paper uses the closely related VGG19)
- The model is only used to extract the features; it is not trained during the process (in other words, the model weights are not updated)
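A frozen VGG16 feature extractor could look like the sketch below, assuming TensorFlow/Keras is the framework (the notebook does not say). The layer names follow Keras' VGG16 naming; which layers serve as content and style layers, and the helper name `build_feature_extractor`, are assumptions for illustration.

```python
import tensorflow as tf

# Assumed layer choice, using Keras' VGG16 layer names (block{n}_conv{m}).
CONTENT_LAYERS = ["block5_conv2"]
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]

def build_feature_extractor(weights="imagenet"):
    """Hypothetical helper: returns a model mapping images to feature maps."""
    # Only the convolutional part of VGG16 is needed; the classifier head is dropped.
    vgg = tf.keras.applications.VGG16(include_top=False, weights=weights)
    # Freeze the network: its weights are never updated during style transfer.
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in STYLE_LAYERS + CONTENT_LAYERS]
    # Output order: one feature map per style layer, then the content layer(s).
    return tf.keras.Model(inputs=vgg.input, outputs=outputs)
```

With the ImageNet weights, input images would normally be passed through `tf.keras.applications.vgg16.preprocess_input` before calling the extractor, since VGG16 was trained on images preprocessed that way.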

#### The problem definition
The style transfer process can be framed as an optimization problem: minimize a total loss defined as follows:

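Total loss = content loss + style loss (in the paper, the two terms are weighted as α · content loss + β · style loss to control the trade-off between content and style)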

- Content loss = the difference in content (feature maps) between the generated image and the content image
- Style loss = the difference in style (correlations between feature maps, i.e. Gram matrices) between the generated image and the style image

The smaller the total loss, the closer the features of the output image are to those of the input images, i.e. the output matches the content of the content image and the style of the style image.
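The loss terms above can be sketched in NumPy following the definitions in Gatys et al.: a squared error on the raw feature maps for content, and a normalized squared error between Gram matrices for style, averaged with equal weights over the style layers. Feature maps are assumed to be (H, W, C) arrays for a single image; the function names are hypothetical.

```python
import numpy as np

def gram_matrix(features):
    # features: (H, W, C) -> (C, C) correlations between the C filter responses
    h, w, c = features.shape
    f = features.reshape(h * w, c)
    return f.T @ f

def content_loss(gen_feats, content_feats):
    # 1/2 * sum of squared differences between feature maps
    return 0.5 * float(np.sum((gen_feats - content_feats) ** 2))

def style_layer_loss(gen_feats, style_feats):
    # Squared Gram difference normalized by 4 * N^2 * M^2
    # (N = number of filters, M = number of spatial positions)
    h, w, c = gen_feats.shape
    n, m = c, h * w
    diff = gram_matrix(gen_feats) - gram_matrix(style_feats)
    return float(np.sum(diff ** 2)) / (4.0 * n ** 2 * m ** 2)

def total_loss(gen_content, content_target, gen_styles, style_targets,
               alpha=1.0, beta=1.0):
    # Equal weight for every style layer; alpha/beta trade content against style
    style = np.mean([style_layer_loss(g, s) for g, s in zip(gen_styles, style_targets)])
    return alpha * content_loss(gen_content, content_target) + beta * float(style)
```

With `alpha = beta = 1.0` this reduces to the unweighted "content loss + style loss" form; in practice β is usually much larger than α so that style is not drowned out.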

#### The process description
- Start from a random initial image
- Match the initial image's feature maps to the extracted feature maps at the chosen convolutional layers; the gradients are backpropagated to the input image pixels instead of to the model weights
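The key idea of the process above (optimize the pixels, keep the network frozen) can be shown with a toy example. A real implementation would differentiate through VGG16 with an autodiff framework; here the CNN is replaced by a fixed random linear map `W`, an assumption made only so the gradient can be written by hand. The function name and sizes are hypothetical.

```python
import numpy as np

def optimize_pixels(steps=1000, n_pixels=16, n_features=8, seed=0):
    rng = np.random.default_rng(seed)
    # Frozen stand-in for the CNN: a fixed linear map from pixels to features.
    W = rng.normal(size=(n_features, n_pixels))
    W_before = W.copy()
    # Target features, "extracted" from a reference image.
    target = W @ rng.uniform(size=n_pixels)
    # Random initial image, flattened to a vector.
    x = rng.uniform(size=n_pixels)
    # Step size 1/L (L = largest eigenvalue of W^T W) keeps gradient descent stable.
    lr = 1.0 / np.linalg.norm(W, 2) ** 2
    losses = []
    for _ in range(steps):
        residual = W @ x - target            # feature mismatch
        losses.append(0.5 * float(residual @ residual))
        grad_x = W.T @ residual              # gradient of the loss w.r.t. the pixels
        x -= lr * grad_x                     # update the image, never W
    return x, losses, np.array_equal(W, W_before)
```

The loss shrinks while `W` stays bit-for-bit unchanged, which is exactly the "backpropagate to the pixels, not the weights" behavior described above.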

#### TODOs
- Resize the input images
- Add the total variation loss
    - Total variation loss = a penalty on differences between neighboring pixels; minimizing it makes the image smoother
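The total variation term from the TODO list could be sketched as follows, assuming an (H, W, C) image array and the squared-difference variant (an absolute-difference variant is also common). The function name is hypothetical; in the total loss it would be added with its own small weight.

```python
import numpy as np

def total_variation_loss(image):
    # image: (H, W, C); differences between vertically and horizontally adjacent pixels
    dy = image[1:, :, :] - image[:-1, :, :]
    dx = image[:, 1:, :] - image[:, :-1, :]
    # Large for noisy images, zero for a perfectly flat image
    return float(np.sum(dy ** 2) + np.sum(dx ** 2))
```

For example, a constant image gives 0.0, while a 2x2 single-channel image with one pixel set to 1 gives 2.0 (one vertical and one horizontal jump).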