
prisma seems to preserve more detail #58

Open
xlvector opened this issue Oct 30, 2016 · 16 comments

@xlvector commented Oct 30, 2016

I find that Prisma seems to preserve more detail when doing style transfer. Here is an example:

original image

free

udnie style

udnie

Prisma result using udnie style

free_udnie

fast-neural-style result using udnie style

th_free_udnie

There are the following differences:

  1. The udnie style has many red, yellow, and orange colors, which also appear in the fast-neural-style result but are absent from the Prisma result
  2. The Prisma result has a smooth sky
  3. Prisma preserves more detail in the bridge and the trees

I do not know how Prisma achieves this; I have already tuned many hyperparameters during training.

@htoyryla (Contributor) commented Oct 30, 2016

I guess Prisma is copying color from the original image here, which could explain some of the differences.

Somehow I feel that while both results are interesting, neither really captures much of the original style. Fast-neural-style fills the picture with a colored mesh which indeed captures the forms in the content image, using colors from the style image but not really resembling the shapes, their scale, or the feeling of the original style.

I may be mistaken, but I think the iterative neural-style was better at real style transfer. Fast-neural-style is a great tool for creating styles, but these styles tend to look very different from the originals. I have had similar experiences with texture_nets, with which I experimented for days trying to get the style reproduced at more or less the original scale, until I gave up and moved on to something else. I have not yet tried the same with fast-neural-style.

By the way, it looks to me like the new style transfer methods can easily fill the canvas with decorative stylistic details, but the opposite, simplifying, is difficult. And yet, much of art is about simplifying what you see and capturing it in an image. Prisma, in this example I think, is closer, but not exactly what I am after.

PS. I think that the fast_neural_style result uses quite a high style weight. I am training just now with content weight 1 and style weight 5, and the result looks much more like Prisma's, without the mesh in the sky, but simpler, without detail. Actually I am quite pleased with the result.

I am not using the MSCOCO dataset but a set of 2500 of my own photos, mainly places and landscapes. The dataset seems to matter; it may be worthwhile trying a dataset with the kind of images one intends to use with the style.

This is after 8000 iterations, so still quite early. What I wrote above was based on even earlier iterations. I wonder if the mesh in the sky is growing with the iterations. The earlier snapshots were simpler, with a clear sky with some clouds. Now there are already signs of the colored mesh.
undie_ny-test000

@htoyryla (Contributor) commented Oct 30, 2016

Here's the result from my newly trained model after 40k iterations. The "colored mesh" did not spread throughout the sky as I feared, but in fact retreated, and the overall look improved. But still, it has almost no similarity to the original style.

In this image I'd point out the white "ghosts" behind the bridge and the buildings on the left. Here they blend quite well into the background but in my experiments I've seen images almost totally dominated by such "ghost" shapes, especially in the sky, and especially with higher style weights.

undie_ny-test000h

@htoyryla (Contributor) commented Oct 30, 2016

I repeated the training with content_weight=1, style_weight=3, up to 40k iterations. The full command was:

th train.lua -h5_file /work/hplaces256.h5 -style_image /home/hannu/Downloads/udnie.jpg -checkpoint_name udnie-hplaces256b -style_weights 3.0 -content_weights 1.0 -gpu 0 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3

Here's the resulting image. I then wrote a script to copy the original colors into an image (https://gist.github.com/htoyryla/147f641f2203ad01b040f4b568e98260) and used it to make the second image.

I think it would be possible to get even closer to the Prisma look by further fine-tuning the weights, and, for a touch of detail, by blending the different channels of the original image into the resulting image in a suitable mix instead of simply copying the color information.
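The core of that color-copy step is small. Roughly, assuming the torch 'image' package (this is only a sketch of the idea, not the gist verbatim, and the file names are placeholders):

require 'image'

-- Placeholder file names: the original content image and the stylized output.
local content = image.load('content.jpg', 3, 'float')
local styled  = image.load('styled.png', 3, 'float')

-- Work at the content image's resolution (size(3) = width, size(2) = height).
styled = image.scale(styled, content:size(3), content:size(2))

-- Keep the stylized luminance (Y) and take the chrominance (U, V) from the original.
local content_yuv = image.rgb2yuv(content)
local out_yuv = image.rgb2yuv(styled)
out_yuv[2]:copy(content_yuv[2])   -- U from the original
out_yuv[3]:copy(content_yuv[3])   -- V from the original
image.save('styled_original_colors.png', image.yuv2rgb(out_yuv):clamp(0, 1))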

Output from fast_neural_style.lua:
undie_ny-test001z

Output from original_colors.lua:
t001z

@jcjohnson (Owner) commented

Wow, nice work @htoyryla! I think your Udnie model is better than mine :)

In general I don't think that Prisma is doing anything fundamentally different from fast-neural-style; I think they have just spent a lot of time and effort carefully tuning the hyperparameters of their models to give nice effects. I think they also do some post-processing to blend the raw neural-net output with the content image; there are a lot of different ways to blend images, and I think they also tune the post-processing per style to make sure their results are nice.
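The simplest such blend would be a per-style weighted average of the two images; a rough sketch, assuming the torch 'image' package (the 0.8 weight and the file names are placeholders, and this is just one of many possible blends, not anything Prisma is known to use):

require 'image'

local content = image.load('content.jpg', 3, 'float')
local styled  = image.scale(image.load('styled.png', 3, 'float'),
                            content:size(3), content:size(2))

-- out = alpha * stylized + (1 - alpha) * content; alpha would be tuned per style.
local alpha = 0.8
local blended = styled * alpha + content * (1 - alpha)
image.save('blended.png', blended:clamp(0, 1))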

@htoyryla (Contributor) commented

"I think they also do some post-processing to blend the raw neural-net output with the content image; there are a lot of different ways to blend images, and I think they also tune the post-processing per style to make sure their results are nice."

That's exactly what I was thinking when I wrote about blending. I simply copied the Y channel, but if one wants a touch of detail, I think one could blend in some of the other channels as well. And the optimal way to do this is likely to be specific to the style.
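For example, one could keep the original chrominance as before but also mix a small fraction of the original luminance back into the stylized Y channel; a sketch of that idea (the 0.2 mix and the file names are placeholders):

require 'image'

local content = image.load('content.jpg', 3, 'float')
local styled  = image.scale(image.load('styled.png', 3, 'float'),
                            content:size(3), content:size(2))
local content_yuv = image.rgb2yuv(content)
local out_yuv = image.rgb2yuv(styled)
-- Color (U, V) from the original, as before...
out_yuv[2]:copy(content_yuv[2])
out_yuv[3]:copy(content_yuv[3])
-- ...plus a small amount of the original luminance for a touch of detail.
local detail = 0.2
out_yuv[1]:mul(1 - detail):add(content_yuv[1] * detail)
image.save('styled_detail.png', image.yuv2rgb(out_yuv):clamp(0, 1))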

I used my own dataset consisting of 2500 photos, places and landscapes. I have noticed that using it can give quite different results from MSCOCO. I'll now run the same training using MSCOCO.

@jcjohnson (Owner) commented

"I used my own dataset consisting of 2500 photos, places and landscapes. I have noticed that using it can give quite different results from MSCOCO. I'll now run the same training using MSCOCO."

Interesting; I have only tried training with COCO, but I'm pretty sure the training images are important. I think the number of training images is also important; in Dmitry's Instance Normalization paper (https://arxiv.org/abs/1607.08022) he mentions that his best results were trained with only 16 content images. I haven't done much experimentation with training sets, but this seems to be an important area for exploration.
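For context, instance normalization normalizes each feature map per image rather than across the batch. Stripped of the learned scale and shift, the computation amounts to roughly the following sketch (not the repo's InstanceNormalization module):

require 'torch'

-- x: activations of shape batch x channels x height x width (assumed contiguous)
local function instance_norm(x, eps)
  eps = eps or 1e-5
  local N, C, H, W = x:size(1), x:size(2), x:size(3), x:size(4)
  local flat = x:view(N, C, H * W)
  local mean = flat:mean(3):expandAs(flat)   -- per-image, per-channel mean
  local std  = flat:std(3):expandAs(flat)    -- per-image, per-channel std
  return (flat - mean):cdiv(std + eps):view(N, C, H, W)
end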

@htoyryla (Contributor) commented

I have now trained using COCO but otherwise with the same parameters. The result is different, but not too different; it looks almost as if I had used a slightly higher style weight.

From fast_neural_style.lua:

undie_ny-mscoco001z

After original_colors.lua:

t002z

@xlvector (Author) commented

The results look awesome! Thanks!

I will try to do more work on post-processing.

@xlvector (Author) commented

@jcjohnson I have tried training with 200K images from COCO, MIT Places, and ImageNet. It does not seem to give better results. I will try again later.

@xlvector (Author) commented Nov 1, 2016

@jcjohnson what are your parameters for the_wave style? I cannot seem to reproduce your results with the parameters in print_options.lua.

Here is my result:

free_coco_wave7

Your result:

th_free_wave

Prisma result:

free_wave

@jcjohnson (Owner) commented

My wave model does not use instance norm, so you should set -use_instance_norm 0 if you want to duplicate my results.
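For example, a training command along these lines (the paths and other flags are placeholders; the relevant change is only -use_instance_norm 0):

th train.lua -h5_file /path/to/coco.h5 -style_image /path/to/wave.jpg -checkpoint_name wave -use_instance_norm 0 -gpu 0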

@piteight commented Nov 2, 2016

Is there a way to change parameters manually inside a model? Something like selecting the weights that interest us and changing their values? This script uses the VGG-16 model, but I also have the VGG-19 model, which I have not trained with here and which gave me better results in the slow neural-style from fzliu. Can I switch to training with the VGG-19 model using your script?
Here are my results with VGG-16 :)
style image used for training:
stasio

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 256 -content_weights 1.0 -style_weights 5.0 -checkpoint_name stasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100

ay

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 3.0 -style_weights 1.0 -checkpoint_name ztasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3
az
th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 3.0 -style_weights 5.0 -checkpoint_name Xtasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3 -max_train 1
azz

th train.lua -h5_file ../Baz.h5 -style_image ../stasio.jpg -style_image_size 300 -content_weights 0.5 -style_weights 8.0 -checkpoint_name Ytasio -gpu 0 -use_cudnn 1 -backend cuda -batch_size 2 -checkpoint_every 100 -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3
azzz

The first model is the best, but it has some glitches that I want to erase in future training sessions.
Everything was trained on 40k COCO images.

@htoyryla (Contributor) commented Nov 2, 2016

I see you have copied the arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,U2,c9s1-3 from my comment. As @jcjohnson commented in another thread, that might be a poor choice, and perhaps one should add a conv layer between the two U2 layers (even though for me it worked without one).
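For example, something along these lines (untested; c3s1-64 is just one guess at a light conv between the two upsampling steps):

-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,U2,c3s1-64,U2,c9s1-3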

@piteight commented Nov 2, 2016

Yes, I thought I would give it a try, to see methods other than only changing the style and content weights. The second version is quite good because of the sky, except for this noise pattern. It worked well for chicago.jpg,
chicagoz
but in the example with a person, the result was very poor:
ztkaw

The first example gave me this output:
out
tkaw

I will try adding the conv layer as you mentioned :)

@universewill commented

@htoyryla can you make your dataset and pretrained models' parameters available?

@htoyryla (Contributor) commented

"@htoyryla can you make your dataset and pretrained models' parameters available?"

After almost three years of doing other things, no, sorry.
