#### Demo

We're going to build a Neural Style Transfer model, train it in the cloud, then create an API that accepts images and applies an artistic filter to them!

- Step 1 - Building our Model
- Step 2 - Training the model on FloydHub
- Step 3 - Serving the model via an API

![alt text](https://blog.paperspace.com/content/images/2017/02/PaperspaceTransferLearningTutorial01.png "Logo Title Text 1")

#### Building the Model

![alt text](https://cdn-images-1.medium.com/max/1600/1*btAtU_VrgmKBbG1gakXV2w.png "Logo Title Text 1")


![alt text](https://raw.githubusercontent.com/sunshineatnoon/Paper-Collection/master/images/RTNS.png "Logo Title Text 1")


- There are 3 parts to the workflow: a content extractor, a style extractor, and a merger.

![alt text](https://blog.paperspace.com/content/images/2017/02/Untitled-Diagram-4-.png "Logo Title Text 1")

###### Part 1 - The Content Extractor

![alt text](https://blog.paperspace.com/content/images/2017/02/Untitled-Diagram-10-.png "Logo Title Text 1")

- They separated out the semantic content of an image!
- They used a convolutional neural network called VGG 19. 

![alt text](https://cdn-images-1.medium.com/max/1600/1*f3wRS2crHnQ7Pu0x6FYuIQ.gif "Logo Title Text 1")

- ConvNets are neural networks that are well-suited for image classification tasks
- VGG 19 was trained on more than a million ImageNet images and is capable of classifying images out of the box.
- It looks like they used the output of one of the hidden layers as a content extractor. 
- That's because the hidden layers of a ConvNet extract features of an image, and the deeper the layer, the more high-level the features it identifies.

![alt text](https://cdn-images-1.medium.com/max/1600/1*eg2MRfNFxIqjG_UK3vo0DQ.png "Logo Title Text 1")

![alt text](https://cdn-images-1.medium.com/max/1600/1*GksqN5XY8HPpIddm5wzm7A.jpeg "Logo Title Text 1")

- Between taking an image as input and outputting a guess as to what it is, a CNN is doing transformations to turn the image pixels into an internal understanding of the content of the image. 

![alt text](https://cdn-images-1.medium.com/max/1600/1*EvBcni8o_O3v4RUl640TZQ@2x.png "Logo Title Text 1")

- We can use one of the intermediate semantic representations in a ConvNet to compare the content of two images. 
- If we pass two different images with similar content through a ConvNet, their representations after a few hidden layers will be very close in raw value.
- If we pass both the final image and the content image through the network and find the distance between their intermediate representations, we have the content loss.
- We make a list of layers at which we want to compute content loss.
- We pass both images through the network until a particular layer in the list, take the output of that layer, square the difference between each pair of corresponding values in the two outputs, and sum them all up.
- We do this for every layer in the list and sum those up.
- After finding the differences and squaring them, we also multiply the result by some value alpha, called the content weight.
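The content loss steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the original implementation: the random arrays stand in for real VGG 19 activations, and the names `content_loss`, `generated_feats`, `content_feats`, and `content_weight` are hypothetical.

```python
import numpy as np


def content_loss(generated_feats, content_feats, content_weight=1.0):
    """Sum of squared differences over every chosen layer, scaled by alpha."""
    total = 0.0
    for gen, ref in zip(generated_feats, content_feats):
        # Square the difference between corresponding values, then sum them.
        total += np.sum((gen - ref) ** 2)
    return content_weight * total


# Stand-in activations for two hypothetical layers, shaped (channels, height, width).
# In a real pipeline these would come from the hidden layers of the ConvNet.
rng = np.random.default_rng(0)
content_feats = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]
generated_feats = [feats + 0.1 for feats in content_feats]

loss = content_loss(generated_feats, content_feats, content_weight=2.0)
print(loss)
```

If the generated image produces exactly the same activations as the content image, the loss is zero; the further apart the activations are, the larger it grows.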

![alt text](https://cdn-images-1.medium.com/max/1600/1*Sbis79TMJ7f7qIetlEAqqA.png "Logo Title Text 1")

###### Part 2 - The Style Extractor

![alt text](https://blog.paperspace.com/content/images/2017/02/Untitled-Diagram-11-.png "Logo Title Text 1")

- Same idea as the content extractor, meaning they used the output of a hidden layer, but they added an additional step.
- They used a correlation estimator based on the Gram matrix of the filters of a given hidden layer.
- This destroys the semantics of the image but preserves its basic components, making an excellent texture extractor.
- A Gram matrix results from multiplying a matrix with the transpose of itself.
- And because every column is multiplied with every row in the matrix, we can think of the spatial information that was contained in the original representations as having been distributed.
- This Gram matrix contains all sorts of information about the image: the texture, shapes, and style.
- Once we have the Gram matrix, we can find the distance between the Gram matrices of the intermediate representations of both our image and the style image to find how similar they are in style.
- And it's all multiplied by some value beta, known as the style weight.
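Here is a rough NumPy sketch of the Gram matrix and the style loss described above. It assumes activations shaped `(channels, height, width)` and uses random arrays in place of real network outputs; the function names are hypothetical, and the normalization constants that many implementations apply are left out.

```python
import numpy as np


def gram_matrix(features):
    """Filter-by-filter correlations for activations shaped (channels, height, width)."""
    channels = features.shape[0]
    flat = features.reshape(channels, -1)  # one row per filter
    return flat @ flat.T                   # matrix times its own transpose


def style_loss(generated_feats, style_feats, style_weight=1.0):
    """Squared distance between Gram matrices, summed over layers, scaled by beta."""
    total = 0.0
    for gen, ref in zip(generated_feats, style_feats):
        total += np.sum((gram_matrix(gen) - gram_matrix(ref)) ** 2)
    return style_weight * total


rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 4))

# Flipping the spatial grid moves every activation to a new position,
# yet the Gram matrix stays the same: the spatial layout is discarded.
flipped = feats[:, ::-1, ::-1]
print(np.allclose(gram_matrix(feats), gram_matrix(flipped)))
```

The flip check is a small demonstration of why the Gram matrix works as a texture descriptor: it records which filters fire together, not where they fire.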

![alt text](https://cdn-images-1.medium.com/max/1600/1*ZJjUFPPqLZ1z48maIfInBA.png "Logo Title Text 1")

![alt text](https://cdn-images-1.medium.com/max/1600/1*R3Ler_uVVldfdRSYmeLKjw.png "Logo Title Text 1")


###### Part 3 - Blending Content + Style

![alt text](https://blog.paperspace.com/content/images/2017/02/Untitled-Diagram-1-.jpg "Logo Title Text 1")

- They of course framed it as an optimization problem as machine learning papers tend to do. 
- And in an optimization problem, some cost function is minimized iteratively during training to achieve a goal. 
- Their cost function penalized the synthesized image if its content was not equal to the desired content and its style was not equal to the desired style.  

![alt text](https://cdn-images-1.medium.com/max/1600/1*3-60SfuOkU0LMoAspntCSA.png "Logo Title Text 1")

- Both the content and style loss were added together to get the cost function. 
- They then performed backpropagation to minimize the cost by getting the gradient of the cost with respect to the final image and iteratively changing the image to look more and more like a stylized content image.
- They used an optimization technique that's terribly named L-BFGS, which isn't as popular as, say, stochastic gradient descent.
- If you do a bit of research, it looks like it's a quasi-Newton optimization scheme, meaning it approximates second-order information (the derivative of the derivative). It tends to need fewer iterations to converge, but the cost of each iteration is also bigger.
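To make the "iteratively change the image" idea concrete, here is a toy NumPy sketch. It makes two loud simplifications: the "features" are just the pixel values themselves rather than VGG 19 activations, and it uses plain gradient descent with a hand-derived gradient instead of backpropagation plus L-BFGS. All names, shapes, weights, and the learning rate are assumptions chosen for illustration.

```python
import numpy as np


def gram(features):
    """Gram matrix for features shaped (channels, pixels)."""
    return features @ features.T


def total_loss_and_grad(image, content, style_gram, alpha, beta):
    """Return alpha * content loss + beta * style loss and its gradient w.r.t. the image."""
    content_diff = image - content
    gram_diff = gram(image) - style_gram
    loss = alpha * np.sum(content_diff ** 2) + beta * np.sum(gram_diff ** 2)
    # d/dX sum((X - C)^2) = 2 (X - C)
    # d/dX sum((X X^T - A)^2) = 4 (X X^T - A) X, for symmetric A
    grad = 2.0 * alpha * content_diff + 4.0 * beta * gram_diff @ image
    return loss, grad


rng = np.random.default_rng(0)
content = rng.normal(size=(3, 16))           # toy content "features"
style_gram = gram(rng.normal(size=(3, 16)))  # Gram matrix of a toy style image
image = rng.normal(size=(3, 16))             # start from random noise

losses = []
for step in range(200):
    loss, grad = total_loss_and_grad(image, content, style_gram, alpha=1.0, beta=1e-3)
    losses.append(loss)
    image -= 0.01 * grad                     # nudge the image itself, not any weights

print(losses[0], losses[-1])
```

Note that the thing being updated is the image, not the network. A quasi-Newton method such as L-BFGS would replace the fixed-step update with a step that also uses approximate curvature information.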

![alt text](https://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs40537-017-0084-5/MediaObjects/40537_2017_84_Figa_HTML.gif "Logo Title Text 1")

#### FloydHub

![alt text](https://cdn-images-1.medium.com/max/1600/1*fD0KwPktgmf1zn8pSrrtUg.jpeg "Logo Title Text 1")

![alt text](https://cdn-images-1.medium.com/max/1203/0*DgHKkUPnT9Qa9K8L. "Logo Title Text 1")
