Implementing features from the "Controlling Perceptual Factors in Neural Style Transfer" research paper #376

ProGamerGov opened this Issue Feb 10, 2017 · 228 comments

ProGamerGov commented Feb 10, 2017

I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.

The code used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis

The code from Leon Gatys' NeuralImageSynthesis is written in Lua and operated through an IPython notebook interface.


So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?

Looking at the code, I think that:

In order to run NeuralImageSynthesis alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.

ProGamerGov commented Feb 11, 2017

Ok, I think I have gotten the new -reflectance parameter working, though I don't know what it does: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua

It does seem to alter the output, though.

ProGamerGov commented Feb 12, 2017

Multires without -reflectance: https://i.imgur.com/LvpXgaW.png

Multires with -reflectance: https://i.imgur.com/YIiqsOx.png

The -reflectance command increases the GPU usage.

Content image: https://i.imgur.com/sgLtFDi.png

Style image: https://i.imgur.com/PsXIJLM.jpg

htoyryla commented Feb 12, 2017

It seems to me that your code inserts the new padding layer after the convolution layer, which has already done its own padding, so that padding is done twice (first with zeroes in nn.SpatialConvolution and then by reflection in nn.SpatialReflectionPadding). It is like first adding an empty border and then another one which acts as a mirror. It would seem to me that the mirror then only reflects the empty border that was added first.

If you look closely at Gatys' code in https://github.com/leongatys/NeuralImageSynthesis/blob/master/ImageSynthesis.lua#L85-L94 you'll notice that the new padding layer is inserted first, and then the convolution layer without padding.

Your code also increases the size of the layer output, as padding is done twice, which might give size mismatch errors.
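To make the ordering point concrete, here is a toy sketch in plain Python (not Torch; the helper names are made up for illustration) of what each ordering does to a 1-D row of pixels:

```python
# Toy illustration (plain Python, not Torch) of why padding order matters.
# zero_pad mimics the zero padding inside nn.SpatialConvolution;
# reflect_pad mimics nn.SpatialReflectionPadding, which mirrors starting
# one pixel in from the edge.

def zero_pad(row, n=1):
    return [0] * n + row + [0] * n

def reflect_pad(row, n=1):
    # mirror without repeating the edge pixel
    return row[1:n + 1][::-1] + row + row[-n - 1:-1][::-1]

row = [5, 6, 7]

print(reflect_pad(row))            # reflection only: [6, 5, 6, 7, 6]
print(reflect_pad(zero_pad(row)))  # zeros first: [5, 0, 5, 6, 7, 0, 7]
```

In the second case the zero border is baked into the result, so reflecting afterwards cannot recover real pixel values at the edge.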

htoyryla commented Feb 12, 2017

In my previous comment, I overlooked the fact that it is possible to change the layer parameters after the layer has been added to the model. Thus the lines https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L140-L141 in fact remove the padding from the already inserted convolution layer, so the double padding does not happen and the size of the output is not changed.

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.

ProGamerGov commented Feb 12, 2017

@htoyryla

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.

So the reflectance padding works correctly, though I have placed it in the wrong location?

This code here is the convolution: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L131-L142 ?

ProGamerGov commented Feb 12, 2017

And for implementing the masks, Gatys' implementation uses hdf5 files, though Neural-Style does not:

cmd:option('-mask_file', 'path/to/HDF5file', 'Spatial mask to constrain the gradient descent to specific region')

    -- Load mask if specified
    local mask = nil
    if params.mask_file ~= 'path/to/HDF5file' then
        local f = hdf5.open(params.mask_file, 'r')
        mask = f:all()['mask']
        f:close()
        mask = set_datatype(mask, params.gpu)
    end

I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?

htoyryla commented Feb 12, 2017

The code you now linked looks better, now the padding is inserted (line #127) before the convolution (line #141). Most of what you have highlighted is NOT the convolution but related to selecting between max and avg pooling. But if you follow the if logic, if the layer is convolution it will be inserted to the model in line 141 of your present code.

I cannot guarantee that it now works but now the padding and convolution come in the correct order.

htoyryla commented Feb 12, 2017

"I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?"

The code you cited does not implement any mask functionality, it only loads a mask from an existing hdf5 file.

ProGamerGov commented Feb 12, 2017

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change. More testing with different parameter combinations may be needed to further understand its effect on artistic outputs.

On the left is the control test with -reflectance false, and on the right is -reflectance true:

Direct link to the comparison: https://i.imgur.com/YGCOCiu.png

False: https://i.imgur.com/0oQNsxl.png

True: https://i.imgur.com/a7fQTLb.png

Command used:

th neural_style.lua -seed 876 -reflectance -num_iterations 1500 -init image -image_size 640 -print_iter 50 -save_iter 50 -content_image examples/inputs/hoovertowernight.jpg -style_image examples/inputs/starry_night.jpg -backend cudnn -cudnn_autotune

ProGamerGov commented Feb 12, 2017

Are Gatys' gradient-related functions different from Neural-Style's? I'm looking for where the style masks come into play. Or should I be looking at different functions for implementing features like masks?

ProGamerGov commented Feb 12, 2017

From what I can see, luminance style transfer requires the LUV color space which, unlike YUV, has no easy-to-use conversion function in the image library.
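For reference, the luminance component that LUV shares with XYZ can be computed by hand; a minimal sketch in plain Python (illustrative only, assuming linear sRGB values in [0, 1]; not part of any of the codebases discussed here):

```python
# Hedged sketch of the luminance part of LUV. Torch's image library has
# image.rgb2yuv but no rgb2luv, so the conversion would have to be done
# manually, starting from the relative luminance Y.

def relative_luminance(r, g, b):
    # Y of CIE XYZ for sRGB/Rec.709 primaries (linear RGB values)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def cie_lightness(y):
    # L* component of CIELUV/CIELAB, with y relative to white (Yn = 1)
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y

# White maps to L* = 100, black to L* = 0
print(cie_lightness(relative_luminance(1.0, 1.0, 1.0)))
print(cie_lightness(0.0))
```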

Style masks seem to require modifying deeper levels of the Neural-Style code.


For the independent style_scale control with multiple style images, it seems like we only need a way to disable content loss:

From the research paper:

We initialise the optimisation procedure with the coarse-scale image and omit the content loss entirely, so that the fine-scale texture from the coarse-style image will be fully replaced.

A simple sh script similar to multires.sh, one that runs your style images through Neural-Style first, should then do the trick, but such a script needs a way to disable content loss.

I am thinking that a parameter like:

cmd:option('-content_loss', true, 'if set to false, content loss will be disabled')

if params.content_loss then
    -- content loss code
end

@htoyryla Which part of the content loss code should this wrap around to achieve the desired effect?

https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L461-L497

Or: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L109

Edit: I figured it out and now the content loss module can be disabled.

Currently testing different parameters alongside the new -content_loss parameter: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31

I edited this part of the neural_style.lua script: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31#file-neural_style-lua-L148-L151

Though I think that I need to find a way to transfer the color from the intended content image to this first Neural-Style run with the two style images. Seeing as -init image involves the content as well, maybe I need to add another new parameter, or maybe using -original_colors 1 on step two will solve this problem?

Second Edit:

It seems that -content_layers relu1_1,relu2_1 and the default style layers work best, though the research paper only specified layers relu1_1 and relu2_1, not whether you should use those values for content or style layers.

ProGamerGov commented Feb 13, 2017

I must be missing something when trying to replicate the "Naive scale combination" from here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb

Following the steps in the research paper should result in something like this output that I made by running Gatys' IPython code: https://i.imgur.com/boz8PhW.jpg

And the styled style image from his code: https://i.imgur.com/6xEumk0.jpg


But instead I get this:

The styled style image: https://i.imgur.com/30HUeOH.png

And here is the final output: https://i.imgur.com/SWhzMn0.png

I tried this code to create the styled style image: https://gist.github.com/ProGamerGov/53979447d09fe6098d4b00fc8e924109

And then ran:

th neural_style_c.lua -original_colors 1 -output_image out.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out7.png -image_size 640 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


The final content image: https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_content.jpg

The two style images:

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style3.jpg

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style2.jpg


What am I doing wrong here?

ProGamerGov commented Feb 13, 2017

Ok, so analyzing the styled style image from Gatys' code:

The outputs have the parameters used, and the values used, in the name:

[scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05]

I think this was used to make: https://i.imgur.com/6xEumk0.jpg


From another experiment using his code:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_Amazing-Nature_3840x2160.jpg_simg_raime.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

The enlarged version (I think 1 step multires?):

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg.filepart


And:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg


The layers used are: relu2_1 and relu4_1

Style weight is: sw_2.0E+08

Content weight is: cw_1.0E+05

The Normalized VGG-19 model is used: model_norm

Not sure what this is: ptw_1.0E+05

Naive Scale mix is the best version, and also the styled style image: naive_scalemix.jpg

Not sure if pt_layer refers to both style_layers and content_layers, or just one of them?

ProGamerGov commented Feb 13, 2017

On the subject of Gram Matrices (Leon Gatys said this would be important for transferring features to Neural-Style):

Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code.
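A small plain-Python sketch of the quantities being compared (illustrative only, with dummy activations; the real code of course computes the Gram matrix on GPU tensors):

```python
# Illustrative sketch of the Gram matrix and the two normalization
# factors under discussion. For C feature maps of size H x W, the Gram
# matrix G = F * F^T is C x C, where F is the C x (H*W) matrix of
# flattened feature maps.

def gram(feats):
    # feats: list of C feature maps, each flattened to length H*W
    C = len(feats)
    return [[sum(a * b for a, b in zip(feats[i], feats[j]))
             for j in range(C)] for i in range(C)]

C, H, W = 64, 16, 16
feats = [[1.0] * (H * W) for _ in range(C)]  # dummy activations

G = gram(feats)
print(len(G), len(G[0]))  # the Gram matrix is C x C
print(C * H * W)          # Neural-Style's divisor (number of input elements)
print(C * C)              # the 1/C^2 divisor attributed to Gatys' code above
```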

htoyryla commented Feb 13, 2017

"Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code."

I am not familiar with Gatys's code, but what you wrote is confusing. First you say that Neural_style divides the Gram matrix by the number of features, but in your example you don't do this division.

If Gatys' normalizes by 1/C^2 where C is the number of features, it makes sense to me as the size of the Gram matrix is CxC.

In neural_style, the gram matrix is normalized for style loss in the line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534
Here, input:nElement() is not C but CxHxW, where C,H,W are the dimensions of the layer to which the Gram matrix is added, so that in practice neural-style ends up with a smaller value for the normalized style loss than 1/C^2.

Dividing instead by self.G:nElement() would implement division by C^2, so if that's what you want, try it.

I don't know if this use of input:nElement() instead of self.G:nElement() here is intentional or an accident. @jcjohnson ?

There has been an earlier discussion about this division but there was nothing on this in particular: #90

PS. I checked the corresponding code in fast-neural-style https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/GramMatrix.lua#L46-L49 which also normalizes the Gram matrix by 1/(CHW), so I guess this is done on purpose. After all, normalizing by 1/C^2 would favor the lower layers too much.

htoyryla commented Feb 13, 2017

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change.

As padding only means adding a few pixels around the image I wouldn't expect large changes. Mostly this should be visible close to the edges, and indeed there appears to be a difference along the left hand side.

htoyryla commented Feb 13, 2017

Changing line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 to divide by self.G:nElement(), I ran neural-style with defaults and got this.

[image: outcxc]

whereas with the original the resulting image was

[image: outchw]

Now, they are obviously different but as the style weight has been effectively increased, we should not read too much into this difference. Anyway, this is worth more testing and the idea of normalizing this way makes intuitively sense to me.

htoyryla commented Feb 13, 2017

Concerning YUV... I was under the impression that Y is the luminance.

When you want to disable content_loss, why not simply set content_weight to 0?

htoyryla commented Feb 13, 2017

It looks like the 1/C^2 style normalization favors the lowest layers which have smaller C (64 for conv1 as opposed to 512 for conv5). The original neural-style behavior 1/(CxHxW) penalizes less the higher layers because H and W decrease when going to higher layers.
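The arithmetic behind this can be tallied in a few lines of plain Python (a rough sketch, assuming a 224 x 224 input to VGG-19 so that H and W halve at each pooling stage; the numbers are not from the thread):

```python
# Compare the two style-loss divisors per layer. C*H*W shrinks toward
# the top of the network (H and W halve at each pool), while C^2 grows,
# so the 1/C^2 scheme makes lower-layer losses relatively larger.

layers = [                      # (name, C, H, W) assuming a 224x224 input
    ("conv1_1", 64, 224, 224),
    ("conv2_1", 128, 112, 112),
    ("conv3_1", 256, 56, 56),
    ("conv4_1", 512, 28, 28),
    ("conv5_1", 512, 14, 14),
]

for name, C, H, W in layers:
    print(name, "C*H*W =", C * H * W, " C^2 =", C * C)
```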

ProGamerGov commented Feb 13, 2017

When you want to disable content_loss, why not simply set content_weight to 0?

I will try that as well later today. I think my settings from before were too different from Gatys' settings.

The other issue is that I think transferring the color from a third image might be needed, as I would imagine that Gatys would have used something similar to -original_colors 1 if it were the better solution.

ProGamerGov commented Feb 14, 2017

I think I figured out the style combination:

The styled style image: https://i.imgur.com/G1eZerW.png

This was used to produce the final image:

th neural_style.lua -original_colors 1 -style_weight 10000 -output_image out3.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out1_200.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

And this was used to produce the styled style image:

th neural_style_c.lua -content_weight 0 -style_weight 10000 -output_image out1.png -num_iterations 200 -content_image fig4_style3.jpg -style_image fig4_style1.jpg -image_size 2800 -content_layers relu2_1 -style_layers relu2_1 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


I wonder if something similar could be accomplished by being able to control the layers each style image uses?


I am unable to produce a larger version like Gatys was able to do. Any larger images seem to be blurry, and the shapes begin to fade. The darkness of Seated Nude seems to make this harder, as the dark areas seem to take over areas of the new style image in my experiments.

htoyryla commented Feb 14, 2017

A note on 1/C^2 gram matrix normalization: this line also needs to be changed https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L553 so that the backward pass too will use the normalized matrix.

This will require quite different weights, like content_weight 1e3 and style_weight 1, and it can take some 300 iterations before the image really starts to develop, but to me the results look good. I am talking about plain neural_style with modified Gram matrix normalization. I haven't really looked deeper into the Gatys project.

VaKonS commented Feb 14, 2017

ProGamerGov, just a little suggestion: since GPU handling is already implemented in "function setup_gpu(params)" (line 324), maybe it's possible to use that function instead of the new "set_datatype(data, gpu)"?

It could make the code more maintainable – in case of any changes someone will have to modify only one function instead of two.

For example: pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
(see how nn.SpatialAveragePooling(kW, kH, dW, dH):type(dtype) is added in line 136).

Currently I can not test it on GPU, but I can confirm that it does work on CPU.

ProGamerGov commented Feb 14, 2017

@VaKonS

I'll take a look. I originally pasted in Gatys' GPU handling code at the time because I couldn't get the reflection function to work with this line of code:

pad_layer = set_datatype(pad_layer, params.gpu)

As I couldn't figure out how to use function setup_gpu with the code.

Are you saying to change this line:

https://github.com/ProGamerGov/neural-style/blob/6814479c8ebcc11498b7c123ee2ba7ef9f0fe09f/neural_style.lua#L125

to this:

local pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)

And then delete this line:

pad_layer = set_datatype(pad_layer, params.gpu)

?

VaKonS commented Feb 14, 2017

@ProGamerGov, yes.
And to delete function set_datatype(data, gpu) at line 611, as it will not be needed anymore.

ProGamerGov commented Feb 15, 2017

@VaKonS , I made a version that contains other padding types: https://gist.github.com/ProGamerGov/0e7523e221935442a6a899bdfee033a8

When using -padding, you can try 5 different types of padding: default, reflect, zero, replication, or pad. In my testing, the pad option seems to leave untouched edges on either side of the image.

Edit: Modified version with htoyryla's suggestions: https://gist.github.com/ProGamerGov/5b9c9f133cfb14cf926ca7b580ea3cc8

The modified version has only three options: default, reflect, or replicate.

htoyryla commented Feb 15, 2017

Types 'reflect' and 'replication' make sense, although with the typical padding width = 1 as in VGG19 the result is identical.

Type 'zero' is superfluous as the convolution layer already pads with zeroes.

Type 'pad' only pads in one dimension so it hardly makes sense.

You should read nn documentation when using the nn layers. The nn.Spatial.... layers are meant to work with two-dimensional data like images. nn.Padding provides a lower level access for padding of tensors, you need to specify which dimension, which side, which value, and if one wants to use it to pad an image one needs to apply it several times with different settings.

But frankly, with the 1-pixel padding in VGG there are not so many ways to pad. We should also remember that the main reason for padding in the convolution layers is to get the correct output size. Without padding convolution tends to shrink the size.

htoyryla commented Feb 15, 2017

The code could also be structured like this (to avoid duplicating code and making the same checks several times). Here I used 'reflect' and 'replicate' as they are shorter, you may prefer 'replication' and 'reflection' as in the layer names. But having one as a verb and the other as a noun is maybe not a good idea.

local is_convolution = (layer_type == 'cudnn.SpatialConvolution' or layer_type == 'nn.SpatialConvolution')
if is_convolution and params.padding ~= 'default' then
    local padW, padH = layer.padW, layer.padH
    local pad_layer  -- declare outside the branches so net:add can see it
    if params.padding == 'reflect' then
        pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
    elseif params.padding == 'replicate' then
        pad_layer = nn.SpatialReplicationPadding(padW, padW, padH, padH):type(dtype)
    else
        error('Unknown padding type')
    end
    net:add(pad_layer)
    layer.padW = 0
    layer.padH = 0
end
VaKonS commented Feb 15, 2017

@htoyryla, reflective padding probably takes pixels starting from 1 pixel distance: [ x-2, x-1, x ] [ x-1, x-2 ].
And replication duplicates the edge: [ x-2, x-1, x ] [ x, x ].
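The two border rules can be sketched in a few lines of plain Python (hypothetical helpers, padding only the right edge by 2 for brevity):

```python
# Sketch of the two border rules: reflection mirrors starting one pixel
# in from the edge; replication repeats the edge pixel itself.

def reflect_right(row, n):
    # [.., x-1, x] -> [.., x-1, x, x-1, x-2]
    return row + row[-n - 1:-1][::-1]

def replicate_right(row, n):
    # [.., x-1, x] -> [.., x-1, x, x, x]
    return row + [row[-1]] * n

row = ["x-2", "x-1", "x"]
print(reflect_right(row, 2))    # ['x-2', 'x-1', 'x', 'x-1', 'x-2']
print(replicate_right(row, 2))  # ['x-2', 'x-1', 'x', 'x', 'x']
```

With padding width 1, as in VGG-19, the two differ only in whether the new border pixel is the neighbor one pixel in or a copy of the edge pixel.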

htoyryla commented May 14, 2017

I made a test to use only two channels per layer, zeroing all the others. This did not work well, style losses were very small, style did not have much effect and increasing style weight did not really help.

This worked better. Instead of zeroing channels, multiply them by e.g. 0.2. Then multiply the rest of the channels by a factor greater than 1. In effect we are now applying channel specific weights.

I am now using a modified input for style only; for hallucinations one might try the same method for content, too.

function inputMask(C, H, W)
    local t = torch.Tensor(C, H, W):fill(1)
    local m = torch.Tensor(C, H, W):fill(1)
    for i = 1, C do
        if i < 3 then
            m[i] = m[i] * 3
        else
            m[i] = m[i] * 0.2
        end
    end
    return t:cmul(m):cuda()
end

By using this on relu3_1,relu4_1 with style weight 1e6 I get this.

[image: nscw002-2ch]

Having

if i < 3 then
    m[i] = m[i] * 30
else
    m[i] = m[i] * 0.2
end

and again relu3_1,relu4_1 with style weight 2e4 gives this.

[image: nscw002b-2ch]

This means we can achieve different results by selectively emphasizing specific channels. My interest is in style, so I have been tampering with how the channels influence the style generation. For hallucinatory dreaming, doing the same with content loss might give results?

htoyryla commented May 14, 2017

@VaKonS wrote
"Did you mean to resize the mask to the size of "input"? Then maybe image.scale could help?
But if masks are already image-sized, then transformations are not needed?"

I have not looked into this new code of @ProGamerGov's . But style loss modules do have to adapt to the input size (meaning the size of the input to the style loss module) because it depends not only on the size of the input image but also on the layer to which the module is attached. Also, as I noticed today when working on the code posted just above, the first time StyleLoss:updateOutput gets called before capture mode has been set, and the input has different dimensions than on all later occasions. Probably as a result of this line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L218. So the style loss module absolutely has to take care of adapting to the size of input, or otherwise a size mismatch will occur.

Despite this, I don't immediately see how maskedStyleLoss is intended to work either.

htoyryla commented May 14, 2017

I put together a script to allow experimenting with giving emphasis (*5) to specific channels on a style layer. Other channels will be attenuated (*0.2). Use the param -style_channels (e.g. -style_layers 3_1 -style_channels 17,23).

Note that this is likely to reduce average output from the layers, so style_weight may have to be adjusted.

https://gist.github.com/htoyryla/fe6dd64611638b3db89755d68735d3e3

ProGamerGov commented May 17, 2017

I assume these are the correct numbers of channels for each layer of the default VGG-19 model?

Layer Number Of Channels
conv1_1 64
conv1_2 64
conv2_1 128
conv2_2 128
conv3_1 256
conv3_2 256
conv3_3 256
conv3_4 256
conv4_1 512
conv4_2 512
conv4_3 512
conv4_4 512
conv5_1 512
conv5_2 512
conv5_3 512
conv5_4 512
fc6 4096
fc7 4096
fc8 4096

Neural-Style Console Output:

conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000

I also have found this site helpful for generating large ranges of sequential numbers, for testing: http://textmechanic.com/text-tools/numeration-tools/generate-list-numbers/

htoyryla commented May 17, 2017

The conv layers look ok to me.

FC layers do not have channels (in the sense of a HxW map corresponding to an image), only single numeric outputs correlating with features. Yet, masking unwanted outputs may also give interesting results.

VaKonS commented May 26, 2017

Speaking of multi-resolution: it looks like transferring style at lower resolution keeps more complex features of style (for example below, not simply shapes of plasticine pieces, but whole flowers).

The second good thing here is that reprocessing at full resolution preserves almost all features of low resolution pass.
It makes possible to process images of any size, I think.

[images: dual_transfer, dual_transfer_2]

htoyryla commented May 26, 2017

"it looks like transferring style at lower resolution keeps more complex features of style (for example below, not simply shapes of plasticine pieces, but whole flowers)"

Sounds like the basic scalability problem in neural style transfer: the result changes because with higher resolution each neuron sees only a smaller part of the image. It has been discussed a lot, and multi-resolution processing appears to solve it.

Slightly off topic: three of my neurally assisted works are just now on display at Art Fair Suomi, a large contemporary art event in Helsinki: https://twitter.com/htoyryla/status/867460046665523200

rodmcfall commented Jun 12, 2017

htoyryla commented Jun 13, 2017

"Congrats on the Art Fair. Please tell us how they were received, very
curious what the general public think."

It was a big event, 300 artists, altogether around 1000 widely different works of contemporary art. Only very few artists received individual attention. As far as I know, my works were the only ones made using neural techniques, but this fact was not advertised. For me the main thing was that my works fit well together with the rest (which was really varied). Also, when talking with other artists, they showed a clear interest in the methods I had used. I guess that is because most artists who participated somehow work with alternative or experimental forms of art.

VaKonS commented Jul 22, 2017

Btw, @ProGamerGov, just wanted to thank you for the visual examples for different models on the Wiki page.
P.S. I updated the link for DeepLab there; they seem to have moved the page.

@htoyryla, maybe you'll find it interesting: Network Dissection project has Places model, trained at different iterations – from 1 to 2400818.
Maybe it's possible to discover how training affects stylization.

htoyryla commented Jul 23, 2017

@VaKonS thanks for the link to the Network Dissection site. Their findings about channels that respond to specific features could also be useful, now that I have a version of neural-style which can use specific channels only.

BTW the places model which is available from multiple iterations is not the VGG16 based places model that I have used, but a smaller one with only 5 conv layers.

ProGamerGov commented Aug 23, 2017

@VaKonS I was able to influence a model's style transfer ability with finetuning here: #292 (comment)

I provided a saved model at every 1000 iterations, so that one could see the change over time caused by the fine-tuning.

@htoyryla

I noticed that some of the channel examples, like this and this, from my Protobuf-Dreamer project seem to have "dead channels". These dead channels don't cause any visible hallucinations. So I was wondering if we would find similar "dead channels" by creating hallucinations of different channels from Neural-Style models? And if we can, would style transfer be improved by removing these "dead channels", or would it be improved by only using the "dead channels"?

I have also figured out that one can use a model's image classification abilities to identify what each channel hallucination contains (like dogs, plants, cars, etc...). If we used the model to attach the applicable category names to each channel, and then had some kind of search/selection function, could we create better stylized outputs? Like for example, I would select the "dog" related channels, and potentially some other channels (for artistic effects), to stylize a dog content image.

ProGamerGov commented Aug 25, 2017

@htoyryla @VaKonS Have either of you ever come across shapes that resemble animals, or the classic heart shape? And these shapes did not appear to be in any content or style images that you used?

  • This rooster showed up in one of my style transfer outputs, and I could not find a similar shape in either the content image or the style image.

  • The heart shape seems to be more common than any other shapes. And there were no hearts in either the content or style images used.

Are these shapes and objects produced by the model as some kind of error or hallucination? Is it just my brain's pattern/shape recognition abilities being tricked? Or is the geometry of these shapes/objects simple and common enough that they can show up in a combination that tricks the brain?

ProGamerGov commented Aug 25, 2017

@htoyryla @jcjohnson A related thing I experienced was that a classification neural network thought this image was a picture of a dog:

I know dogs were obviously part of the training data used to create the model, so I am wondering if there are dog related shapes hidden in the image. If this is the case, then it seems like Neural-Style might be creating Adversarial Example Images, which are meant to trick classification networks into making incorrect predictions, with details that are normally hidden to the naked eye.

VaKonS commented Aug 27, 2017

@ProGamerGov, regarding the "dog": maybe this "torch-visbox" project could help identify the part of the image that was sensed as a "dog" (together with @htoyryla's "convis"). I didn't try either of them, though.

ProGamerGov commented Sep 9, 2017

Could anyone point me in the right direction for making the -style_image parameter accept a directory as an input?

VaKonS commented Sep 12, 2017

@ProGamerGov, you can try this.

  1. Before the line 64 (params.style_image:split(',')), insert this code:

print( "---" )
print( "-style_image:" )
print( params.style_image or "" )
if params.style_image ~= nil then
    print( "---" )
    print( "Bytes of 'params.style_image':" )
    print( string.byte(params.style_image, 1, -1) )
    local style_images_list_with_commas, style_images_list_separators_count = string.gsub(params.style_image, "\10", ",")
    print( "Merged", style_images_list_separators_count, "lines, result:", style_images_list_with_commas )
    params.style_image = style_images_list_with_commas
end
print( "---" )
print( "New 'style_image':" )
print( params.style_image )
print( "---" )

The print statements are just for checking and can be removed, of course; the whole thing then fits on one line:

if params.style_image ~= nil then local style_images_list_with_commas, style_images_list_separators_count = string.gsub(params.style_image, "\10", ",") ; params.style_image = style_images_list_with_commas end


  2. Run neural-style with "find" output as the "-style_image" argument:

th neural_style.lua -style_image "$( find DIRECTORY_NAME/* -type f )"

2nd step will pass the directory listing, and 1st step will convert it to a comma-separated list.
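The conversion itself is just a newline-to-comma substitution; a minimal Python sketch of the same idea (paths are made up for the example):

```python
# "find" prints one path per line; neural-style expects a single
# comma-separated -style_image value, so join the lines with commas.
find_output = "styles/a.jpg\nstyles/b.png\nstyles/c.jpg"
style_image = ",".join(find_output.splitlines())
print(style_image)  # styles/a.jpg,styles/b.png,styles/c.jpg
```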

ProGamerGov commented Sep 13, 2017

@VaKonS Thanks, I'll try that out. Experiments involving hundreds, or even thousands of style images at once can become really tedious when they have to be added via the normal way.

VaKonS commented Sep 13, 2017

@ProGamerGov, speaking of "hundreds, or even thousands" – I didn't try that many images with this method (3 works :) ).
If it will hang or something, here is another way, by passing a directory name:
https://github.com/rtqichen/style-swap

There is a parameter --contentBatch for passing a directory; files other than images are then filtered out in "ImageLoader.lua", and as a result you get a table "files" with the image list (it should be similar to "style_image_list", I guess).

This should work with large number of images.

ProGamerGov commented Sep 20, 2017

@htoyryla Have you done any farther experimentation with neural-channels.lua? And if so, what are your findings?

htoyryla commented Sep 21, 2017

@ProGamerGov No, I haven't really tested or used it any further. Could be interesting, but there is so much else to do.

ProGamerGov commented Sep 28, 2017

@VaKonS I tried your version of neural_style.lua that's been modified to have built-in tiling, but it failed when I used a -style_scale value of 0.5 for any of the scale-related parameters you added. It does look promising, though, in that it blends the tile edges together better than tiling with an external script.

ProGamerGov commented Sep 28, 2017

@VaKonS Thanks for the link to the lua project with batch image collection.

I think that using the code below:

local function styleLoader(dir)
    local style_image_files = paths.dir(dir)
    local i = 1
    while i <= #style_image_files do
        if not string.find(style_image_files[i], 'jpg$')
            and not string.find(style_image_files[i], 'png$')
            and not string.find(style_image_files[i], 'pgm$')
            and not string.find(style_image_files[i], 'ppm$') then
            table.remove(style_image_files, i)
        else
            i = i + 1
        end
    end
end

I could simply create a string that style_image_list would accept: each time a valid image is found, I append its name to the end of the specified path, and then add a comma for separation (or add the comma first, and then the path and image name). The style_image_list variable is string based, and thus using a table seems like it could be more complicated than just appending to a chain of strings.

I'd also have to use the Torch paths library (via require 'paths') to know how many times to loop when adding new images to the string. But it would be better if I didn't need to use an extra library.

I also wonder if I can do things in such a way that I only need the existing -style_image parameter, and not a new parameter.


Edit:

This seems to work:

require 'paths'

  local style_image_list
  if is_dir(params.style_image) then     
    style_image_list = styleLoader(params.style_image):split(',')
  else 
    style_image_list = params.style_image:split(',')
  end 

These are the functions used:

function is_dir(path)
    f = io.open(path)
    return not f:read(0) and f:seek("end") ~= 0
end


function styleLoader(dir)
    local style_list
    local style_image_files = paths.dir(dir)
    local i = 1
    while i <= #style_image_files do
        if not string.find(style_image_files[i], 'jpg$')
            and not string.find(style_image_files[i], 'JPG$')
            and not string.find(style_image_files[i], 'jpeg$')
            and not string.find(style_image_files[i], 'JPEG$')
            and not string.find(style_image_files[i], 'png$')
            and not string.find(style_image_files[i], 'pgm$')
            and not string.find(style_image_files[i], 'ppm$') then
            table.remove(style_image_files, i)
        else
            style_image_files[i] = dir .. style_image_files[i]
            i = i + 1
        end
    end
    style_list = table.concat(style_image_files, ",")
    return style_list
end

The only issue now is that the image names in the directory lack their associated path. I fixed the issue with: style_image_files[i] = dir .. style_image_files[i]

I found the list of image formats that the Torch Image package accepts, here

The gist file for the above code: https://gist.github.com/ProGamerGov/86c80c74f5e403748659e013447b2499/6de58d6bde6f0fb9ff9b9e20d2ea4294b7d030be

Lines 69-74 and 426-455, implement the new code.


There are some errors when trying to use the -style_image parameter normally:

/home/ubuntu/torch/install/bin/luajit: neural_style_dir.lua:429: attempt to index global 'f' (a nil value)
stack traceback:
        neural_style_dir.lua:429: in function 'is_dir'
        neural_style_dir.lua:70: in function 'main'
        neural_style_dir.lua:644: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405d50
ubuntu@ip-Address:~/neural-style$

It appears that my way of checking for a directory or a specific image can't handle multiple specific images given as a comma-separated list, like examples/inputs/style.png,examples/inputs/style2.png, for the -style_image parameter.

I'm not sure how to fix this issue while retaining the desired functionality.

Maybe it could be done in such a way that a user could specify multiple different directories and specific images in the same command? Like for example: -style_image examples/inputs/style.png,examples/inputs/style_dir1/,examples/inputs/style2.png,examples/inputs/style_dir2/. It might also be nice to find a way to add a "/" to the end of a specified directory, if the user forgets to do so.

Second Edit:

This should work, while eliminating the unnecessary is_dir function:

local style_image_list, style_input_sorted
local p = 1
local style_input = params.style_image:split(',')

  while p <= #style_input do
    if paths.dirp(style_input[p]) then
      style_input_sorted = styleLoader(style_input[p])
    else
      style_input_sorted = style_input[p]
    end
    p = p + 1
    if p == 2 then
      style_image_list = style_input_sorted
    else
      style_image_list = style_image_list .. "," .. style_input_sorted
    end
  end

  style_image_list = style_image_list:split(',')

You can now specify multiple directories, and specific images, all at once, like for example:

-style_image style_dir1/,style_dir2/style_set1/,style1.png,style_dir3/style2.png

And you don't need to follow a specific order:

-style_image style_dir1/,style1.png,style_dir3/style2.png,style_dir2/style_set1/

The current version: https://gist.github.com/ProGamerGov/86c80c74f5e403748659e013447b2499

VaKonS commented Sep 30, 2017

@ProGamerGov, I tried something like this – first it attempts to open the parameter as a directory, then as a file, and if both fail, it simply splits it "as is".
p.s. I'll look what's wrong with tiled neural-style version, thanks for checking.

require "paths"

local cmd = torch.CmdLine()
cmd:option('-style_image', 'examples/inputs/seated-nude.jpg',
           'Style target image/directory.')
local params = cmd:parse(arg)

print( "-params.style_image: \"" .. params.style_image .. "\".")
local style_image_list = paths.dir(params.style_image)
if style_image_list ~= nil then -- directory
  print( "Path mode." )
  if string.sub(params.style_image, -1, -1) == "/" then -- remove trailing slash if present
    params.style_image = string.sub(params.style_image, 1, -2)
  end
  local i = 1
  while i <= #style_image_list do
    local fl = string.lower(style_image_list[i])
    if not string.find(fl, 'jpg$')
       and not string.find(fl, 'jpeg$')
       and not string.find(fl, 'png$')
       and not string.find(fl, 'pgm$')
       and not string.find(fl, 'ppm$') then
      table.remove(style_image_list, i)
    else
      style_image_list[i] = params.style_image .. "/" .. style_image_list[i]
      i = i + 1
    end
  end
  assert(#style_image_list > 0, "All files skipped, style images list is empty. Stop.")
else -- file or list
  local is_file = io.open( params.style_image )
  if is_file ~= nil then -- single file, pass name as given
    io.close(is_file)
    print( "File mode." )
    style_image_list = { params.style_image }
  else -- split as comma separated list, don't check items
    print( "List mode." )
    style_image_list = params.style_image:split(',')
  end
end
print( style_image_list )
ProGamerGov commented Oct 2, 2017

@VaKonS Thanks for sharing a more refined version of the directory/image handling code!

You should also consider enabling the issues option on your tiled version of neural_style.lua, so that issues can be reported directly on the project page. Things like saving iterations and initialization images don't work, and it seems odd that the size of the content image dictates the final end size of the tiled output.

Despite the issues, it's really promising! I assume it works similar to crowsonkb's style_transfer?

VaKonS added a commit to VaKonS/neural-style that referenced this issue Oct 7, 2017

VaKonS commented Oct 7, 2017

@ProGamerGov, yes, it's something like @crowsonkb's or maybe @mtyka's style transfer, but with simple fragments overlaying, without modifications of optimizers.

It's strange that style scaling doesn't work for you, it works here. Maybe I forgot to put some type conversions, could you please check it with "-gpu -1 -backend nn" options?
Also, the initialization image seems to work for me – it adopts features from both content and style, yet it is still noticeable in the result:

[example output image]

Saving at every iteration is a limitation at the moment, because I cannot make stylization work with one-iteration precision.
You can set "-save_iter" to 1; it will then save after every row of processed fragments (instead of every iteration).
p.s. You're welcome to open issues. :)
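As a rough illustration of what "simple fragments overlaying" can mean (this is my own sketch of one common blending scheme, not VaKonS's actual code): overlapping tiles are merged with a weighted average, where a linear ramp across the overlap region makes the seams fade smoothly. A 1-D NumPy version:

```python
import numpy as np

def blend_tiles_1d(tiles, tile_size, stride):
    # Merge overlapping 1-D "tiles" with a weighted average; a linear
    # ramp over the overlap region hides the seams. Illustrative
    # sketch only, not the actual tiled neural-style blending code.
    overlap = tile_size - stride
    weight = np.ones(tile_size)
    if overlap > 0:
        ramp = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
        weight[:overlap] = ramp          # fade in at the left edge
        weight[-overlap:] = ramp[::-1]   # fade out at the right edge
    length = stride * (len(tiles) - 1) + tile_size
    acc = np.zeros(length)
    norm = np.zeros(length)
    for i, t in enumerate(tiles):
        s = i * stride
        acc[s:s + tile_size] += t * weight
        norm[s:s + tile_size] += weight
    return acc / norm

tiles = [np.full(8, 1.0), np.full(8, 3.0)]
out = blend_tiles_1d(tiles, tile_size=8, stride=4)
# Values ramp gradually from 1.0 to 3.0 across the 4-sample overlap
```

The 2-D case works the same way, with a separable ramp applied along both axes of each tile.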
