### What is Style Transfer?

Style transfer is a technique that applies the style of one image to the content of another image of your choice, using a convolutional neural network (CNN).

#### Style and content separation

To perform style transfer, we combine the content of one image with the style of another and merge the two into a new target image.

**Where does the content come from? (Keywords: content, object)**

Essentially, this means we use the content representation (features) that a model produces for an input image in its later layers. It is known that the later layers focus on content, such as objects and their arrangement, rather than on details of texture or color. Pretrained models such as VGG-19 have been quite successful at this task.

![Content representation](part3_images/content-representation.png)

**What about the style? (Keywords: texture, color)** 

We can think of style as being captured by the correlations between the feature maps within a layer. For each feature map (each depth level of a conv layer), we can measure how strongly its detected features relate to those of the other feature maps in that layer. These similarities and differences give us information about **texture and color**, but not about the placement or identity of objects.

Both the style image and the content image are passed forward through the network until they reach a certain conv stack, which acts as a feature extractor for both content and style.

**How do we create the target image (combination)?**

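The target image is created by combining what the feature extractors capture from the content image and the style image. For content, the target image's representation is compared with that of the content image. To formalize the comparison, we define a loss function that measures how far apart the two representations are. The formula below uses a squared-error loss, closely related to the MSE, where $T_c$ is the content representation of the target image and $C_c$ is that of the content image.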

**How to calculate content loss?**

$$ \zeta_{content} = \frac{1}{2}\sum(T_{c}-C_{c})^2$$
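As a minimal sketch of the formula above, the content loss can be written in NumPy as follows. The function name `content_loss` and the toy feature arrays are assumptions made for illustration; in practice the features would come from a layer of a pretrained CNN.

```python
import numpy as np

def content_loss(target_features, content_features):
    """Half the sum of squared differences between two feature arrays."""
    diff = (np.asarray(target_features, dtype=float)
            - np.asarray(content_features, dtype=float))
    return 0.5 * np.sum(diff ** 2)

# Toy feature maps with shape (depth, height, width); the values are made up.
content_feats = np.zeros((2, 2, 2))
target_feats = np.ones((2, 2, 2))

# Eight entries, each differing by 1: 0.5 * 8 * 1^2
print(content_loss(target_feats, content_feats))  # 4.0
```

The loss is zero exactly when the two feature arrays are identical.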

The goal is to **minimize this loss**, much as a loss is minimized when training a network. There is one big distinction: we are not training the CNN at all. The model is used only as a fixed feature extractor, and backpropagation is used to update the appearance of the target image until its content representation matches that of the content image.
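The idea of updating the target image while the extractor stays fixed can be illustrated with a toy example. This sketch assumes a fixed linear map `W` as a stand-in for the frozen CNN and uses plain gradient descent with a hand-written gradient; a real implementation would rely on a deep learning framework's automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor: a fixed linear map.
# (Assumption for illustration only; a real setup would use CNN layers.)
W = rng.normal(size=(6, 4))
W_before = W.copy()

def extract(image):
    return W @ image

content = rng.normal(size=4)  # flattened toy "content image"
target = rng.normal(size=4)   # flattened toy "target image": the only thing updated

def content_loss(t):
    return 0.5 * np.sum((extract(t) - extract(content)) ** 2)

initial_loss = content_loss(target)

step = 0.01
for _ in range(500):
    # Gradient of the loss with respect to the target values: W^T (W t - W c).
    grad = W.T @ (extract(target) - extract(content))
    target = target - step * grad

final_loss = content_loss(target)
print(final_loss < initial_loss)      # the target moved toward the content features
print(np.array_equal(W, W_before))    # the extractor itself was never changed
```

Only `target` is modified inside the loop; `W` plays the role of the network weights and is left untouched.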

This way we are making sure that the target image has the same content as the content image. Let's do the same for style.

The style representation of an image relies on the correlations between the features within the individual layers of the model. For each layer, these correlations are given by a **Gram matrix**, a common mathematical tool that is computed in two steps:

1. Vectorize the values in each feature map of the layer. This converts the 3D conv layer output into a 2D matrix, with one vector per feature map.

![Gram matrix step 1](part3_images/gram-matrix1.png)

2. Multiply this matrix by its transpose, which multiplies the features in each map with those of every other map, to get the Gram matrix. The resulting Gram matrix contains non-localized information about the layer, because each value in a map is treated as an individual sample regardless of its position.

![Gram matrix step 2](part3_images/gram-matrix2.png)

**Note:** the dimensions of this matrix depend only on the number of feature maps in the convolutional layer; they do not depend on the dimensions of the input image.
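The two steps above can be sketched in NumPy as follows. The `(depth, height, width)` layout and the function name `gram_matrix` are assumptions for this example; frameworks often add a batch dimension as well.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Gram matrix of a (depth, height, width) activation array."""
    depth, height, width = feature_maps.shape
    # Step 1: flatten each feature map into a row -> shape (depth, height * width).
    flat = feature_maps.reshape(depth, height * width)
    # Step 2: multiply by the transpose -> shape (depth, depth).
    return flat @ flat.T

rng = np.random.default_rng(0)
small = rng.normal(size=(8, 4, 4))     # 8 feature maps from a small input
large = rng.normal(size=(8, 32, 32))   # 8 feature maps from a larger input

gram_small = gram_matrix(small)
gram_large = gram_matrix(large)
print(gram_small.shape, gram_large.shape)  # (8, 8) (8, 8)
```

Both results are 8 x 8 because the layer has 8 feature maps in each case, regardless of the spatial size of the activations.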

**Why does it work?**

The Gram matrix contains the dot products between the feature maps of a layer, which is a correlation operation. Its entries encode which activations co-occur. Style exhibits strong repeating patterns, so capturing the activations that frequently co-occur captures those patterns.

**How to calculate style loss?**

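$$ \zeta_{style} = a\sum_{i}w_i(T_{s,i} - S_{s,i})^2$$

Here $T_{s,i}$ is the Gram matrix of the target image at layer $i$ and $S_{s,i}$ is the Gram matrix of the style image at the same layer, so $T_{s}$ and $S_{s}$ are both lists of Gram matrices. The $w_i$ are per-layer style weights and $a$ is a constant scaling factor. $T_{s}$ is the only value that changes during optimization.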
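A minimal sketch of this style loss, assuming the Gram matrices are stored in dictionaries keyed by layer name and that the squared difference is summed over all entries of each matrix. The layer names, weights, and toy matrices below are illustrative placeholders, not values from the source.

```python
import numpy as np

def style_loss(target_grams, style_grams, layer_weights, a=1.0):
    """Weighted sum over layers of squared Gram-matrix differences."""
    total = 0.0
    for name, weight in layer_weights.items():
        diff = target_grams[name] - style_grams[name]
        # Assumption: reduce the squared difference by summing all entries.
        total += weight * np.sum(diff ** 2)
    return a * total

# Hypothetical layer names and toy 2x2 Gram matrices.
style_grams = {"conv1_1": np.eye(2), "conv2_1": np.zeros((2, 2))}
target_grams = {"conv1_1": 2 * np.eye(2), "conv2_1": np.ones((2, 2))}
layer_weights = {"conv1_1": 1.0, "conv2_1": 0.5}

# conv1_1: 1.0 * 2 = 2, conv2_1: 0.5 * 4 = 2, scaled by a = 0.5
print(style_loss(target_grams, style_grams, layer_weights, a=0.5))  # 2.0
```

Giving a layer a larger weight makes mismatches in that layer's Gram matrix count more toward the loss.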

We also need to calculate the total loss, so how do we do that?

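$$ \zeta_{total} = \alpha\,\zeta_{content} + \beta\,\zeta_{style} = \alpha\,\frac{1}{2}\sum(T_{c}-C_{c})^2 + \beta\,a\sum_{i}w_i(T_{s,i} - S_{s,i})^2$$

Here $\alpha$ is the content weight and $\beta$ is the style weight.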

Standard backpropagation and optimization are used to reduce this loss iteratively.

What matters most in style transfer is the ratio of the content weight to the style weight ($\alpha/\beta$); it controls how much style the target image takes on.

![Loss weights](part3_images/loss-weights.png)

The larger $\beta$ (the style weight) is, the more style the image gets. Equivalently, the smaller the content-to-style ratio, the more stylized the resulting image. These two rules hold in general, but there may be cases where they do not.
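The weighting can be sketched as a weighted sum of the two losses. The names `alpha` (content weight) and `beta` (style weight), the helper `style_share`, and the loss values are assumptions chosen only to show how the ratio shifts the balance.

```python
def total_loss(content_loss_value, style_loss_value, alpha=1.0, beta=1.0):
    """Weighted sum of the content loss and the style loss."""
    return alpha * content_loss_value + beta * style_loss_value

def style_share(content_loss_value, style_loss_value, alpha, beta):
    """Fraction of the total loss contributed by the style term."""
    total = total_loss(content_loss_value, style_loss_value, alpha, beta)
    return beta * style_loss_value / total

# Made-up loss values for illustration.
c_loss, s_loss = 4.0, 2.0

print(total_loss(c_loss, s_loss, alpha=1.0, beta=10.0))  # 24.0

# Lowering alpha / beta makes the style term dominate the total loss.
share_equal = style_share(c_loss, s_loss, alpha=1.0, beta=1.0)   # ratio 1
share_style = style_share(c_loss, s_loss, alpha=1.0, beta=10.0)  # ratio 0.1
print(share_equal < share_style)  # True
```

Because the optimizer reduces the total, a larger style share means more of each update goes toward matching the style image's Gram matrices.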

