
prisma seems to preserve more detail #58

Open
xlvector opened this issue Oct 30, 2016 · 16 comments

@xlvector commented Oct 30, 2016

I find that Prisma seems to preserve more detail when doing style transfer. Here is an example:

original image

free

udnie style

udnie

Prisma result using udnie style

free_udnie

fast-neural-style result using udnie style

th_free_udnie

There are the following differences:

  1. The udnie style has many red, yellow, and orange colors, which also appear in the fast-neural-style result but are absent from the Prisma result
  2. The Prisma result has a smooth sky
  3. Prisma preserves more detail in the bridge and the trees

I do not know how Prisma achieves this; I have already tuned many hyperparameters during training.

@htoyryla (Contributor) commented Oct 30, 2016

I guess Prisma is copying color from the original image here, which could explain some of the differences.

Somehow I feel that while both results are interesting, neither really captures much of the original style. Fast-neural-style fills the picture with a colored mesh which indeed captures the forms in the content image, using colors from the style image but not really resembling the shapes, their scale, or the feeling of the original style.

I may be mistaken, but I think the iterative neural-style was better at real style transfer. Fast-neural-style is a great tool for creating styles, but these styles tend to look very different from the originals. I have had similar experiences with texture_nets, with which I experimented for days trying to get the style reproduced at more or less the original scale, until I gave up and moved on to something else. I have not yet tried the same with fast-neural-style.

By the way, it looks to me like the new style transfer methods can easily fill the canvas with decorative stylistic details, but the opposite, simplifying, is difficult. And yet, much of art is about simplifying what you see and capturing it in an image. Prisma, in this example I think, is closer, but not exactly what I am after.

PS. I think that the fast_neural_style result uses quite a high style weight. I am training just now with content weight 1 and style weight 5, and the result looks much more like Prisma's, without the mesh in the sky, but simpler, without detail. Actually I am quite pleased with the result.

I am not using the MSCOCO dataset but a set of 2500 of my own photos, mainly places and landscapes. The dataset seems to matter; it may be worthwhile trying a dataset with the kind of images one intends to use with the style.

This is after 8000 iterations, so still quite early. What I wrote above was based on even earlier iterations. I wonder if the mesh in the sky is growing with the iterations. The earlier snapshots were simpler, with a clear sky with some clouds. Now there are already signs of the colored mesh.
undie_ny-test000

@htoyryla (Contributor) commented Oct 30, 2016

Here's the result from my newly trained model after 40k iterations. The "colored mesh" did not spread throughout the sky as I feared, but in fact retreated, and the overall look improved. But still, it has almost no similarity to the original style.

In this image I'd point out the white "ghosts" behind the bridge and the buildings on the left. Here they blend quite well into the background but in my experiments I've seen images almost totally dominated by such "ghost" shapes, especially in the sky, and especially with higher style weights.

undie_ny-test000h

@htoyryla (Contributor) commented Oct 30, 2016

I repeated the training with content_weight=1, style_weight=3, up to 40k iterations. The full command was:

th train.lua -h5_file /work/hplaces256.h5 -style_image /home/hannu/Downloads/udnie.jpg -checkpoint_name udnie-hplaces256b -style_weights 3.0 -content_weights 1.0 -gpu 0 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3

Here's the resulting image. I then wrote a script to copy the original colors into an image (https://gist.github.com/htoyryla/147f641f2203ad01b040f4b568e98260) and used it to make the second image.

I think it would be possible to get even closer to the Prisma look by further fine-tuning the weights, and, for a touch of detail, by blending the different channels of the original image into the resulting image in a suitable mix instead of simply copying the color information.
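The core of that color-copy step is small. Roughly, assuming the torch 'image' package (this is only a sketch of the idea, not the gist verbatim, and the file names are placeholders):

require 'image'

-- Placeholder file names: the original content image and the stylized output.
local content = image.load('content.jpg', 3, 'float')
local styled  = image.load('styled.png', 3, 'float')

-- Work at the content image's resolution (size(3) = width, size(2) = height).
styled = image.scale(styled, content:size(3), content:size(2))

-- Keep the stylized luminance (Y) and take the chrominance (U, V) from the original.
local content_yuv = image.rgb2yuv(content)
local out_yuv = image.rgb2yuv(styled)
out_yuv[2]:copy(content_yuv[2])   -- U from the original
out_yuv[3]:copy(content_yuv[3])   -- V from the original
image.save('styled_original_colors.png', image.yuv2rgb(out_yuv):clamp(0, 1))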

Output from fast_neural_style.lua:
undie_ny-test001z

Output from original_colors.lua:
t001z

@jcjohnson (Owner) commented

Wow, nice work @htoyryla! I think your Udnie model is better than mine :)

In general I don't think that Prisma is doing anything fundamentally different from fast-neural-style; I think they have just spent a lot of time and effort carefully tuning the hyperparameters of their models to give nice effects. I think they also do some post-processing to blend the raw neural-net output with the content image; there are a lot of different ways to blend images, and I think they also tune the post-processing per style to make sure their results are nice.
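The simplest such blend would be a per-style weighted average of the two images; a rough sketch, assuming the torch 'image' package (the 0.8 weight and the file names are placeholders, and this is just one of many possible blends, not anything Prisma is known to use):

require 'image'

local content = image.load('content.jpg', 3, 'float')
local styled  = image.scale(image.load('styled.png', 3, 'float'),
                            content:size(3), content:size(2))

-- out = alpha * stylized + (1 - alpha) * content; alpha would be tuned per style.
local alpha = 0.8
local blended = styled * alpha + content * (1 - alpha)
image.save('blended.png', blended:clamp(0, 1))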

@htoyryla (Contributor) commented

"I think they also do some post-processing to blend the raw neural-net output with the content image; there are a lot of different ways to blend images, and I think they also tune the post-processing per style to make sure their results are nice."

That's exactly what I was thinking when I wrote about blending. I simply copied the Y channel, but if one wants a touch of detail, I think one could blend in some of the other channels as well. And the optimal way to do this is likely to be specific to the style.
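For example, one could keep the original chrominance as before but also mix a small fraction of the original luminance back into the stylized Y channel; a sketch of that idea (the 0.2 mix and the file names are placeholders):

require 'image'

local content = image.load('content.jpg', 3, 'float')
local styled  = image.scale(image.load('styled.png', 3, 'float'),
                            content:size(3), content:size(2))
local content_yuv = image.rgb2yuv(content)
local out_yuv = image.rgb2yuv(styled)
-- Color (U, V) from the original, as before...
out_yuv[2]:copy(content_yuv[2])
out_yuv[3]:copy(content_yuv[3])
-- ...plus a small amount of the original luminance for a touch of detail.
local detail = 0.2
out_yuv[1]:mul(1 - detail):add(content_yuv[1] * detail)
image.save('styled_detail.png', image.yuv2rgb(out_yuv):clamp(0, 1))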

I used my own dataset consisting of 2500 photos, places and landscapes. I have noticed that using it can give quite different results from MSCOCO. I'll now run the same training using MSCOCO.

@jcjohnson (Owner) commented

"I used my own dataset consisting of 2500 photos, places and landscapes. I have noticed that using it can give quite different results from MSCOCO. I'll now run the same training using MSCOCO."

Interesting; I have only tried training with COCO, but I'm pretty sure the training images are important. I think the number of training images is also important; in Dmitry's Instance Normalization paper (https://arxiv.org/abs/1607.08022) he mentions that his best results were trained with only 16 content images. I haven't done much experimentation with training sets, but this seems to be an important area for exploration.
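For context, instance normalization normalizes each feature map per image rather than across the batch. Stripped of the learned scale and shift, the computation amounts to roughly the following sketch (not the repo's InstanceNormalization module):

require 'torch'

-- x: activations of shape batch x channels x height x width (assumed contiguous)
local function instance_norm(x, eps)
  eps = eps or 1e-5
  local N, C, H, W = x:size(1), x:size(2), x:size(3), x:size(4)
  local flat = x:view(N, C, H * W)
  local mean = flat:mean(3):expandAs(flat)   -- per-image, per-channel mean
  local std  = flat:std(3):expandAs(flat)    -- per-image, per-channel std
  return (flat - mean):cdiv(std + eps):view(N, C, H, W)
end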

@htoyryla (Contributor) commented

I have now trained using COCO but otherwise with the same parameters. The result is different, but not too different; it looks almost as if I had used a slightly higher style weight.

From fast_neural_style.lua:

undie_ny-mscoco001z

After original_colors.lua:

t002z

@xlvector (Author) commented

The results look awesome! Thanks!

I will try to do more work on post-processing.

@xlvector (Author) commented

@jcjohnson I have tried training with 200K images from COCO, MIT Places, and ImageNet. It does not seem to give better results. I will try again later.

@xlvector (Author) commented Nov 1, 2016

@jcjohnson what are your parameters for the_wave style? I cannot seem to reproduce your results with the parameters in print_options.lua.

Here is my result:

free_coco_wave7

Your result:

th_free_wave

Prisma result:

free_wave

@jcjohnson (Owner) commented

My wave model does not use instance norm, so you should set -use_instance_norm 0 if you want to duplicate my results.
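For example, a training command along these lines (the paths and other flags are placeholders; the relevant change is only -use_instance_norm 0):

th train.lua -h5_file /path/to/coco.h5 -style_image /path/to/wave.jpg -checkpoint_name wave -use_instance_norm 0 -gpu 0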

@piteight commented Nov 2, 2016

Is there a way to change parameters manually inside a model? Something like selecting the weights that interest us and changing their values? This script uses the VGG-16 model, but I also have the VGG-19 model, which I have not trained with here and which gave me better results in the slow neural-style from fzliu. Can I switch to training with the VGG-19 model using your script?
Here are my results with VGG-16 :)
style image used for training:
stasio

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 256 -content_weights 1.0 -style_weights 5.0 -checkpoint_name stasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100

ay

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 3.0 -style_weights 1.0 -checkpoint_name ztasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3
az
th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 3.0 -style_weights 5.0 -checkpoint_name Xtasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3 -max_train 1
azz

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 0.5 -style_weights 8.0 -checkpoint_name Ytasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3
azzz

The first model is the best, but it has some glitches that I want to erase in future training sessions.
Everything was trained on 40k COCO images.

@htoyryla (Contributor) commented Nov 2, 2016

I see you have copied the arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3 from my comment. As @jcjohnson commented in another thread, that might be a poor choice, and perhaps one should add a conv layer between the two U2 layers (even though for me it worked without one).
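For example, something along these lines (untested; c3s1-64 is just one guess at a light conv between the two upsampling steps):

-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,c3s1-64,U2,c9s1-3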

@piteight commented Nov 2, 2016

Yes, I thought I would give it a try, to see methods other than only changing the style and content weights. The second version is quite good because of the sky, except for this noise pattern. It worked well for chicago.jpg,
chicagoz
but in the example with a person, the result was very poor:
ztkaw

The first example gave me this output:
out
tkaw

I will try adding the conv layer as you mentioned :)

@universewill commented

@htoyryla can you make your dataset and pretrained models' parameters available?

@htoyryla (Contributor) commented

"@htoyryla can you make your dataset and pretrained models' parameters available?"

After almost three years of doing other things, no, sorry.
