
# Real-Time-Style-Transfer

This style transfer algorithm is based on the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution. The algorithm allows us to train a deep residual convolutional neural network to learn how to apply an artistic style to any arbitrary input image. The resulting network produces visually appealing images in a small fraction of a second. Our implementation deviates from the paper in three respects. First, we use instance normalization, which requires far fewer iterations to achieve the same results as the original paper. Second, our process uses a single image in each batch. Finally, we employ the total variation norm, introduced in the paper Understanding Deep Image Representations by Inverting Them, to encourage piecewise consistency. Otherwise we follow the same specifications as the paper.
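Instance normalization is simple to express. Below is a minimal NumPy sketch of the operation, for illustration only; the layers actually used during training live in the TensorFlow generator code (Generator.py), and the function name here is our own.

```python
import numpy as np

def instance_norm(feature_map, epsilon=1e-5):
    """Normalize each channel of a single image independently.

    feature_map: NumPy array of shape (height, width, channels) for one image.
    Unlike batch normalization, the statistics are computed per image and per
    channel, which is part of what makes a batch size of 1 practical.
    """
    mean = feature_map.mean(axis=(0, 1), keepdims=True)  # per-channel means
    var = feature_map.var(axis=(0, 1), keepdims=True)    # per-channel variances
    return (feature_map - mean) / np.sqrt(var + epsilon)
```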


The example to the right contains the original style and content side by side. Beneath them is the result of applying the style transfer. Please see the results section below for more examples.


# Results

# Architecture

### Overview

Our architecture is largely the same as the one used by Johnson et al. In both implementations, a generator network produces the stylized image and a discriminator network provides feedback to the generator.

#### Loss Function

The loss function used to train the generator has two parts. The first is a content loss: the distance between the discriminator's activations when it is given a real image and when it is given the output of the generator network. The second part uses global statistics to create an error metric for the style. We obtain these statistics by computing the inner products between all pairs of feature maps (the Gram matrix). The style loss is the difference between the Gram matrices computed from the generator's output and from the style image.
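For concreteness, here is a rough NumPy sketch of the Gram matrix and the two loss terms described above. It is illustrative only: the training loss is computed on TensorFlow tensors in Loss.py, and the function names and normalization constant here are our own choices rather than the repository's exact ones.

```python
import numpy as np

def gram_matrix(activations):
    """Inner products between all pairs of feature maps from one VGG layer.

    activations: array of shape (height, width, channels).
    Returns a (channels, channels) matrix of global style statistics.
    """
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)   # each column is one flattened feature map
    return flat.T @ flat / (h * w * c)     # normalized inner products

def content_loss(generated_feats, content_feats):
    """Squared distance between discriminator activations for the
    generator output and for the real content image."""
    return np.mean((generated_feats - content_feats) ** 2)

def style_loss(generated_feats, style_feats):
    """Squared distance between the Gram matrices of the generator
    output and the style image."""
    return np.mean((gram_matrix(generated_feats) - gram_matrix(style_feats)) ** 2)
```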

The gradients are applied only to the generator weights, not to the discriminator network, because we only care about the output of the generator. Furthermore, the prior knowledge encoded in the discriminator weights is what allows it to effectively capture image statistics.

#### Network Architectures

##### Discriminator

The discriminator is a pretrained VGG network. It takes an image as input and produces the statistics we use to build our loss function. VGG was created to classify a wide variety of images; we use a stock version (see the credits section below) and modify it to accept images of any size. Unlike a classification network, which requires a fixed number of outputs, our style network only needs to produce statistics, so we remove that constraint. We also replace max pooling with average pooling, which yields smoother results (a short sketch of this substitution follows below).

##### Generator

The generator network takes an unstylized image as input and produces a stylized output. The input is compressed by downsampling with strided convolutions, and fractionally strided convolutions return the result to its original size. Downsampling with strided convolutions produces better results than pooling operations. Residual connections are used because of the benefits they provide when training deep networks. Finally, we use instance normalization, which leads to faster training.
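The average-pooling substitution mentioned in the discriminator description is conceptually a one-window change. The sketch below shows the operation on a single (height, width, channels) array; it is a NumPy illustration, not the TensorFlow pooling op used in net.py.

```python
import numpy as np

def average_pool_2x2(feature_map):
    """2x2 average pooling with stride 2 over a (height, width, channels) array.

    Swapping VGG's max pooling for averaging means every pixel in each window
    contributes to the pooled value, which tends to give smoother stylized
    results than keeping only the maximum.
    """
    h, w, c = feature_map.shape
    h, w = h - h % 2, w - w % 2                  # crop to even dimensions
    trimmed = feature_map[:h, :w, :]
    return trimmed.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
```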

# Findings

We did not need nearly as many iterations as the paper used. 20,000 iterations with a batch size of 1 yielded visually pleasing results.

The hyperparameters are extremely sensitive and have to be adjusted for each input style. The results were not visually appealing until the parameters were fine-tuned, and we found that different styles require significantly different weight ratios.

The total variation norm ended up being less effective here than when the gradient is used to update the pixels of the original image directly. It made the images look dull and didn't provide enough of a smoothing effect to be worthwhile.
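For reference, the total variation norm discussed above is just the summed difference between neighboring pixels. Below is a minimal NumPy sketch; the weight applied to this term in the overall loss is a tunable hyperparameter and is not shown here.

```python
import numpy as np

def total_variation(image):
    """Sum of absolute differences between neighboring pixels.

    image: array of shape (height, width, channels). Penalizing this term
    encourages piecewise-smooth output; in our experiments the weight needed
    to visibly smooth the image also made the result look dull.
    """
    dy = np.abs(image[1:, :, :] - image[:-1, :, :]).sum()  # vertical neighbors
    dx = np.abs(image[:, 1:, :] - image[:, :-1, :]).sum()  # horizontal neighbors
    return dx + dy
```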

## Data Requirements

The following files must be downloaded to your machine and placed in the savedNets folder:

- Microsoft COCO dataset (required for training): Download Here
- VGG network weights (must be named vgg19.npy): Download Here

## Software Requirements

- Python 3.5
- TensorFlow 0.12
- NumPy
- Pillow
- SciPy

# Project Structure

- images: Contains images used for visualizing progress during training.
- results: The folder where output images are written by default.
- savedNets: Where trained network weights are written to and read from.
- sourceImages: Contains the style images.
- transferSource: Images to be stylized are placed here.
- Generator.py: Contains the generator network class.
- Loss.py: Used to calculate the loss function.
- Runner.py: The main Python file for the project. Run this file to train or perform style transfer; a variety of flags can be used to change settings.
- Test.py: Contains the code for performing style transfer after a network has already been trained.
- Train.py: Provides logic for the training operation.
- net.py: The VGG network implemented in TensorFlow.
- utilities.py: Provides functions for file system and mathematical operations.

# How To Use

Make sure that the VGG network weights and the COCO dataset are in the savedNets folder.

#### To Train

python3 Runner.py -train True

#### To Perform Style Transfer

python3 Runner.py

##### To Find Out About All Flags

python3 Runner.py -h

## Additional Credits

#### Photography
Patrick Mullen, Alejandro Lazare (RedLaz Media)

#### VGG Network
https://github.com/machrisaa/tensorflow-vgg
