Interested in understanding neural style transfer, I set out to implement it myself. This is adapted from the TensorFlow implementation. Obviously, I am going to use a photo of my beautiful dog, Bacchus.
The results are pretty great...
- Metzinger's Two Nudes
- Gleizes's The Bridges of Paris
- Delaunay's Window on the City
- Kandinsky's Composition VII
- Monet's The Water-Lily Pond
- Bruegel's Tower of Babel
- Bruegel's The Triumph of Death
- Bosch Follower's Christ in Limbo
- Bosch Follower's Tondal's Vision
The basic idea behind neural style transfer is to take an image and transfer the artistic style of another image onto it. This can be done by the following basic process:
- Take a CNN (convolutional neural network) trained for general-purpose image recognition (such as VGG19, used here)
- Extract features from several layers of the network for both a "content" image and a "style" image
- Create a new image, "combo," that minimizes both the loss from its deviation from the content image and the loss from its stylistic deviation from the style image
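The content side of this process can be sketched in a few lines of numpy. This is a minimal, illustrative sketch (the arrays here are random stand-ins for CNN activations, and the function name is mine, not the repo's):

```python
import numpy as np

# Content loss: squared-error distance between the combo and content
# feature maps at a single deep layer (shapes are h x w x n_f).
def content_loss(content_feats, combo_feats):
    return np.sum((combo_feats - content_feats) ** 2)

rng = np.random.default_rng(0)
content = rng.standard_normal((32, 32, 64))            # stand-in activations
combo = content + 0.1 * rng.standard_normal((32, 32, 64))

print(content_loss(content, content) == 0.0)  # True -- identical, no loss
print(content_loss(content, combo) > 0)       # True -- deviating costs loss
```

During optimization, the gradient of this loss with respect to the combo image nudges it toward the content's deep-layer features.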
In their paper (https://arxiv.org/pdf/1508.06576.pdf), Gatys et al. construct this method using two loss functions:
- Content loss: The distance between the "combo" and "content" activations at a deep layer of the network.
- Style loss: The difference between the Gram matrices of the "combo" and "style" images across several layers of the network. In essence, the Gram matrix takes a layer's feature map (h x w x n_f, where h and w are the height and width and n_f is the number of filters, i.e. features) and converts it into an n_f x n_f matrix measuring how strongly each pair of the layer's features is represented in that image. The difference between the Gram matrices of the style image and the combo image tracks how well those features have been captured. Importantly, this doesn't care where the features are located, only that they are present. This is pooled over multiple layers of the CNN.
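A numpy sketch of the Gram matrix and the resulting style loss (names are illustrative; the normalization constant follows Gatys et al.). The last two lines demonstrate the location-invariance point: shifting the feature map spatially leaves the Gram matrix unchanged.

```python
import numpy as np

def gram_matrix(features):
    """Turn an (h, w, n_f) feature map into an (n_f, n_f) Gram matrix."""
    h, w, n_f = features.shape
    flat = features.reshape(h * w, n_f)   # one row per spatial location
    return flat.T @ flat                  # feature-feature co-occurrence

def style_loss(style_feats, combo_feats):
    h, w, n_f = style_feats.shape
    diff = gram_matrix(style_feats) - gram_matrix(combo_feats)
    # Normalization from Gatys et al.
    return np.sum(diff ** 2) / (4.0 * n_f ** 2 * (h * w) ** 2)

# The Gram matrix ignores *where* features occur: cyclically shifting
# the feature map only permutes the rows of `flat`, so flat.T @ flat
# is unchanged.
rng = np.random.default_rng(1)
feats = rng.standard_normal((16, 16, 8))
shifted = np.roll(feats, shift=5, axis=0)
print(np.allclose(gram_matrix(feats), gram_matrix(shifted)))  # True
```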
- Adam optimizer: Many implementations use L-BFGS-B to minimize the loss, which is included in scipy as a wrapper around a Fortran routine. Here, I use Adam because it is easier to implement and seems to work just as well.
- Variational loss: This demands that adjacent pixels in the combo image do not differ too much, i.e. that the image is somewhat smooth.
- High tunability: The three sources of loss are weighted by scalable parameters and raised to variable powers, allowing for arbitrary customization of the transfer. (Up until now, I have mostly been following in the footsteps of others, but here I start to deviate.)
- Common weighting: I normalize the style losses so that every included layer contributes equally for the style image. This means that features that are present are rewarded, but, more significantly, features that appear in the combo image but are absent in the style are penalized. The effect is sizable, and I found that it tends to create a more interesting combined image.
- Removing content loss: Surprisingly, I found that starting with the content image but giving the content loss no weight produces the most interesting images (i.e., I typically set the content weight to 0).
- Option to start with "combo" = "content" or "combo" = noise: The code allows either option, which was heavily used in the Exploration/ folder. For the noise, I simply construct an image from uniformly distributed random values for each pixel channel.
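The variational term above is a standard total-variation penalty; a minimal numpy sketch, assuming the combo image is an (h, w, c) array (the function name is illustrative):

```python
import numpy as np

def variation_loss(image):
    """Penalize differences between adjacent pixels of an (h, w, c) image."""
    dx = image[1:, :-1, :] - image[:-1, :-1, :]   # vertical neighbor diffs
    dy = image[:-1, 1:, :] - image[:-1, :-1, :]   # horizontal neighbor diffs
    return np.sum(dx ** 2 + dy ** 2)

flat = np.ones((8, 8, 3))                         # perfectly smooth image
noisy = np.random.default_rng(2).random((8, 8, 3))
print(variation_loss(flat))       # 0.0 -- no neighbor differences at all
print(variation_loss(noisy) > 0)  # True -- noise is penalized
```

Turning up the weight on this term trades texture detail for smoothness in the final image.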
Instead of transferring one style, we could try to transfer two: a fine style, which captures colors and textures from the early layers, and a coarse style, which captures more global stylistic features of the image. This is implemented in DualNST.ipynb. Below is a grid of images. The top left is the content source (Bacchus), while the rest of the top row are the coarse style sources (Composition VII, Tondal's Vision, The Bridges of Paris). The left column holds the fine style sources (Tower of Babel, The Water-Lily Pond, The Triumph of Death). The remaining images are the fine + coarse styles applied as indicated by their position.
Note how the colors and textures are mostly preserved from the images on the left (fine style), while larger features (music symbols, creepy faces, sharp angles) are largely preserved from the top row. All look very "Bacchus," even though the content (his image) is only the initial condition. We could also use noise as the initial condition and apply fine and coarse features to create chaotic and beautiful art (Fine: The Water-Lily Pond; Coarse: Composition VII):
As can be observed from the Exploration folder, there is a lot of difference between the 1st, 3rd, and 5th blocks of the network. We could instead try to transfer three styles to the image - roughly as colors, small features, and large features (A, B, C respectively). This is in TriNST.ipynb. These style transfers are definitely more temperamental - if adjacent styles are too discordant, there tend to be a lot of artifacts produced (e.g. dots of red or green on a black surface). Turning up the smoothing (v_w) can help a lot, but it isn't always enough without completely blurring out the image.
In order, these are:
# | Colors | Fine Style | Coarse Style |
---|---|---|---|
1 | Bridges of Paris | Window on the City | Christ in Limbo |
2 | Two Nudes | Bridges of Paris | Window on the City |
3 | Coast of Northumberland[1] | Christ in Limbo | Composition VII |
4 | Christ in Limbo | Triumph of Death | Composition VII |
5 | Window on the City | Triumph of Death | Composition VII |
6 | Bridges of Paris | Tower of Babel | Christ in Limbo |
7 | Triumph of Death | Tondal's Vision | Christ in Limbo |
8 | Bridges of Paris | Window on the City | Two Nudes |
[1] J.M.W. Turner's Wreckers - Coast of Northumberland
In addition to enabling three-style transfers, this method gives finer control over a dual transfer: for instance, one image can supply A & B while the second supplies C, or one supplies A and the other B & C (where A = colors; B = small features; C = large features). Below, from left to right, we have The Water-Lily Pond + Composition VII using 1) the dual method above; 2) A & B - Water-Lily, C - Composition; 3) A - Water-Lily, B & C - Composition. It is clear how much of a difference these middle layers can make in the resulting image.
Similarly, we don't have to transfer all levels. By setting some weights to 0, we can transfer only some of the feature scales. Here we transfer only (left to right) 1) A & B; 2) A & C; 3) B & C from Albert Gleizes's The Bridges of Paris.
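A sketch of how such per-scale weights might be wired up. The VGG19 layer names below are the standard ones, but the grouping and the helper are illustrative, not the repo's actual code:

```python
# Hypothetical grouping of VGG19 style layers into the three feature scales.
STYLE_A = ["block1_conv1", "block1_conv2"]   # A: colors
STYLE_B = ["block3_conv1", "block3_conv2"]   # B: small features
STYLE_C = ["block5_conv1", "block5_conv2"]   # C: large features

def layer_weights(w_a, w_b, w_c):
    """Map each style layer to the weight of its feature scale."""
    weights = {}
    for layers, w in ((STYLE_A, w_a), (STYLE_B, w_b), (STYLE_C, w_c)):
        for name in layers:
            weights[name] = w
    return weights

# Transfer only A & C by zeroing out B's weight:
w = layer_weights(1.0, 0.0, 0.5)
print(w["block3_conv1"])   # 0.0 -- the B scale is dropped entirely
```

Pointing different scales at different style images (A at one painting, B & C at another) is what makes the multi-style variations below possible.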
We can also use the content image as one of the style images in order to preserve more of its features. For example, here is Tower of Babel as content, with Monet's Blue Water Lilies applied at levels A & B, while C again takes the Tower of Babel style:
There are multiple ways to apply this. As an example, here is Tower of Babel as content with The Bridges of Paris / Babel as styles, in order: 1) ABC - Babel (no Paris applied, only smoothing); 2) AB - Babel, C - Paris; 3) AC - Babel, B - Paris; 4) BC - Babel, A - Paris; 5) A - Babel, BC - Paris; 6) B - Babel, AC - Paris; 7) C - Babel, AB - Paris; 8) ABC - Paris.
Notice how A really governs the color scheme (images 1, 2, 3, 5 vs. 4, 6, 7, 8); B captures small details, e.g., tower windows and the foreground (1, 2, 4, 6 vs. 3, 5, 7, 8); and C captures the largest features, e.g., the shape of the tower and the surrounding hills (1, 3, 4, 7 vs. 2, 5, 6, 8).
Neural style transfer can be customized quite a bit: transferring styles from different depths of the network can produce very different images. This technique can generate some fascinating visual art, and it also helps us better understand the layers of a convnet.
The Jupyter notebooks should work out of the box once the requisite files are properly linked. With a GPU, each image takes about a minute to run (a bit less for a single transfer). Without a GPU, you are in for a long haul (e.g. half an hour). If you want to experiment and don't have your own, I'd encourage you to use the GPUs on Google Colab or Kaggle.