Implementing features from the "Controlling Perceptual Factors in Neural Style Transfer" research paper #376
Ok, I think I have gotten the new -reflectance parameter working, though I don't know exactly what it does: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua It does seem to alter the output. |
Multires without the new -reflectance parameter, and multires with it. The content image: https://i.imgur.com/sgLtFDi.png Style image: https://i.imgur.com/PsXIJLM.jpg |
It seems to me that your code inserts the new padding layer after the convolution layer, which has already done padding, so that padding is done twice (first with zeroes in nn.SpatialConvolution and then by reflection in nn.SpatialReflectionPadding). It is like first adding an empty border and then another one which acts as a mirror; the mirror then only reflects the empty border that was added first. If you look closely at Gatys' code in https://github.com/leongatys/NeuralImageSynthesis/blob/master/ImageSynthesis.lua#L85-L94 you'll notice that the new padding layer is inserted first, and then the convolution layer without padding. Your code also increases the size of the layer output, as padding is done twice, which might give size mismatch errors. |
In my previous comment, I overlooked the fact that it is possible to change the layer parameters after the layer has been added to the model. Thus the lines https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L140-L141 in fact remove the padding from the already inserted convolution layer, so the double padding does not happen and the size of the output is not changed. Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution. |
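To make the order concrete, here is a minimal, self-contained sketch (the layer sizes are made up for illustration) of padding inserted before a convolution whose own zero padding is then disabled, as in Gatys' code:
require 'nn'
local net = nn.Sequential()
net:add(nn.SpatialReflectionPadding(1, 1, 1, 1))      -- mirror 1 pixel on each side
local conv = nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1)
conv.padW, conv.padH = 0, 0                           -- remove the built-in zero padding
net:add(conv)
-- the output size matches the input size, as with the original zero padding:
print(net:forward(torch.randn(3, 32, 32)):size())     -- 64 x 32 x 32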
So the reflection padding works correctly, but I have placed it in the wrong location? Is this code here the convolution: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L131-L142? |
And for implementing the masks, Gatys' implementation uses hdf5 files, though Neural-Style does not:
I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked so far. Any ideas? |
The code you now linked looks better: now the padding is inserted (line #127) before the convolution (line #141). Most of what you have highlighted is NOT the convolution but related to selecting between max and avg pooling. But if you follow the if logic, if the layer is a convolution it will be inserted into the model in line 141 of your present code. I cannot guarantee that it now works, but the padding and convolution now come in the correct order. |
"I have been trying to figure out how to modify the above code for Neural-Style masks, but non of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?" The code you cited does not implement any mask functionality, it only loads a mask from an existing hdf5 file. |
I ran a quick test with the new padding option. On the left is the control test. Direct link to the comparison: https://i.imgur.com/YGCOCiu.png False: https://i.imgur.com/0oQNsxl.png True: https://i.imgur.com/a7fQTLb.png Command used:
|
Are Gatys' Grad-related functions different from Neural-Style's? I'm looking for where the style masks come into play. Or should I be looking at different functions for implementing these features, like masks? |
From what I can see, luminance style transfer requires the LUV color space which, unlike YUV, has no ready-to-use conversion function in the Torch image package. Style masks seem to require modifying deeper levels of the Neural-Style code. For independent style_scale control with multiple style images, it seems like we only need a way to disable content loss. From the research paper: And then a simple sh script, similar to multires.sh, that first runs your style images through Neural-Style should do the trick, but such a script needs a way to disable content loss. I am thinking of a parameter like:
@htoyryla Which part of the content loss code should this be implemented in to achieve the desired effect? https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L461-L497 Or: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L109 Edit: I figured it out, and the content loss module can now be disabled. Currently testing different parameters alongside the new option. I edited this part of the neural_style.lua script: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31#file-neural_style-lua-L148-L151 Though I think I need to find a way to transfer the color from the intended content image to this first Neural-Style run with the two style images. |
I must be missing something when trying to replicate the "Naive scale combination" from here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb Following the steps in the research paper should result in something like this output, which I made by running Gatys' iPython code: https://i.imgur.com/boz8PhW.jpg And the styled style image from his code: https://i.imgur.com/6xEumk0.jpg But instead I get this: The styled style image: https://i.imgur.com/30HUeOH.png And here is the final output: https://i.imgur.com/SWhzMn0.png I tried this code to create the styled style image: https://gist.github.com/ProGamerGov/53979447d09fe6098d4b00fc8e924109 And then ran:
The final content image: https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_content.jpg The two style images: What am I doing wrong here? |
Ok, so analyzing the styled style image from Gatys' code: the output filenames contain the parameters and values that were used:
I think this was used to make https://i.imgur.com/6xEumk0.jpg. From another experiment using his code:
The enlarged version (I think 1 step multires?):
And:
The layers used are:
Style weight is:
Content weight is:
The Normalized VGG-19 model is used:
Not sure what this is:
Naive scale mix is the best version, and also the styled style image:
Not sure if |
On the subject of Gram Matrices (Leon Gatys said this would be important for transferring features to Neural-Style):
|
I am not familiar with Gatys's code, but what you wrote is confusing. First you say that neural_style divides the Gram matrix by the number of features, but in your example you don't do this division. If Gatys normalizes by 1/C^2, where C is the number of features, it makes sense to me, as the size of the Gram matrix is CxC. In neural_style, the Gram matrix is normalized for style loss in the line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 Dividing instead by self.G:nElement() would implement division by C^2, so if that's what you want, try it. I don't know if this use of input:nElement() instead of self.G:nElement() here is intentional or an accident. @jcjohnson ? There has been an earlier discussion about this division but there was nothing on this in particular: #90 PS. I checked the corresponding code in fast-neural-style https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/GramMatrix.lua#L46-L49 which also normalizes the Gram matrix by 1/(CHW), so I guess this is done on purpose. After all, normalizing by 1/C^2 would favor the lower layers too much. |
As padding only means adding a few pixels around the image I wouldn't expect large changes. Mostly this should be visible close to the edges, and indeed there appears to be a difference along the left hand side. |
Changing line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 to divide by self.G:nElement(), I ran neural-style with defaults and got this, whereas with the original the resulting image was this. Now, they are obviously different, but as the style weight has been effectively increased, we should not read too much into this difference. Anyway, this is worth more testing, and the idea of normalizing this way makes intuitive sense to me. |
Concerning YUV... I was under the impression that Y is the luminance. When you want to disable content_loss, why not simply set content_weight to 0? |
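For instance, a style-only pass could then be run as (hypothetical file names):
th neural_style.lua -content_image content.jpg -style_image style1.jpg,style2.jpg -content_weight 0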
It looks like the 1/C^2 style normalization favors the lowest layers, which have smaller C (64 for conv1 as opposed to 512 for conv5). The original neural-style behavior, 1/(CxHxW), penalizes the higher layers less, because H and W decrease when going to higher layers. |
I will try that as well later today. I think my settings from before were too different from Gatys' settings. The other issue is that transferring the color from a third image might be needed, as I would imagine that Gatys would have used something similar to |
I think I figured out the style combination: The styled style image: https://i.imgur.com/G1eZerW.png This was used to produce the final image:
And this was used to produce the styled style image:
I wonder if something similar could be accomplished by being able to control the layers each style image uses? I am unable to produce a larger version like Gatys was able to do; any larger images seem to be blurry, and the shapes begin to fade. The darkness of Seated Nude seems to make this harder, as the dark areas seem to take over areas of the new style image in my experiments. |
A note on 1/C^2 Gram matrix normalization: this line also needs to be changed https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L553 so that the backward pass, too, will use the normalized matrix. This will require quite different weights, like content_weight 1e3 and style_weight 1, and it can take some 300 iterations before the image really starts to develop, but to me the results look good. I am talking about plain neural_style with modified Gram matrix normalization. Haven't really looked deeper into the Gatys project. |
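For reference, here is a small self-contained sketch of the two normalizations being compared (not the neural_style.lua code itself; the activation size is made up):
require 'torch'
local function gram(act)                      -- act: C x H x W activation
  local C = act:size(1)
  local flat = act:view(C, -1)                -- C x (H*W)
  return torch.mm(flat, flat:t())             -- C x C Gram matrix
end
local act = torch.randn(64, 32, 32)
local G = gram(act)
local G_chw = torch.div(G, act:nElement())    -- original: divide by C*H*W (line 534)
local G_c2  = torch.div(G, G:nElement())      -- variant: divide by C^2
The same divisor must also be applied to the gradient in the backward pass, which is what the change to line 553 amounts to.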
ProGamerGov, just a little suggestion: since GPU handling is already implemented in "function setup_gpu(params)" (line 324), maybe it's possible to use that function instead of duplicating it. It could make the code more maintainable: in case of any changes, someone will have to modify only one function instead of two. For example: Currently I cannot test it on GPU, but I can confirm that it does work on CPU. |
I'll take a look. I originally pasted in Gatys' GPU handling code at the time because I couldn't get the reflection function to work with this line of code:
as I couldn't figure out how to use the setup_gpu function. Are you saying to change this line: to this:
And then delete this line:
? |
@ProGamerGov, yes. |
@VaKonS, I made a version that contains other padding types: https://gist.github.com/ProGamerGov/0e7523e221935442a6a899bdfee033a8 Edit: Modified version with htoyryla's suggestions: https://gist.github.com/ProGamerGov/5b9c9f133cfb14cf926ca7b580ea3cc8 The modified version only has three options, |
Types 'reflect' and 'replication' make sense, although with the typical padding width = 1 as in VGG19 the result is identical. Type 'zero' is superfluous as the convolution layer already pads with zeroes. Type 'pad' only pads in one dimension so it hardly makes sense. You should read nn documentation when using the nn layers. The nn.Spatial.... layers are meant to work with two-dimensional data like images. nn.Padding provides a lower level access for padding of tensors, you need to specify which dimension, which side, which value, and if one wants to use it to pad an image one needs to apply it several times with different settings. But frankly, with the 1-pixel padding in VGG there are not so many ways to pad. We should also remember that the main reason for padding in the convolution layers is to get the correct output size. Without padding convolution tends to shrink the size. |
The code could also be structured like this (to avoid duplicating code and making the same checks several times). Here I used 'reflect' and 'replicate' as they are shorter; you may prefer 'replication' and 'reflection' as in the layer names. But having one as a verb and the other as a noun is maybe not a good idea.
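A minimal sketch of this structure (a hypothetical reconstruction, not the original snippet):
require 'nn'
-- map option names to padding modules; nil means keep the convolution's zero padding
local pad_types = {
  reflect   = nn.SpatialReflectionPadding,
  replicate = nn.SpatialReplicationPadding,
}
local function add_conv(net, conv, padding)
  local pad = pad_types[padding]
  if pad then
    net:add(pad(conv.padW, conv.padW, conv.padH, conv.padH))
    conv.padW, conv.padH = 0, 0    -- the separate padding layer now does the padding
  end
  net:add(conv)
end
local net = nn.Sequential()
add_conv(net, nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1), 'replicate')
print(net:forward(torch.randn(3, 16, 16)):size())   -- 64 x 16 x 16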
|
@htoyryla, reflective padding probably takes pixels starting from 1 pixel distance: [ x-2, x-1, x ] [ x-1, x-2 ]. |
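This is easy to check with a single 5-pixel row:
require 'nn'
local row = torch.range(1, 5):view(1, 1, 5)                 -- 1 2 3 4 5
print(nn.SpatialReflectionPadding(2, 2, 0, 0):forward(row))
-- 3 2 1 2 3 4 5 4 3   (the edge pixel is not repeated)
print(nn.SpatialReplicationPadding(2, 2, 0, 0):forward(row))
-- 1 1 1 2 3 4 5 5 5   (the edge pixel is repeated)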
I made a test to use only two channels per layer, zeroing all the others. This did not work well: style losses were very small, style did not have much effect, and increasing style weight did not really help. This worked better: instead of zeroing channels, multiply them by e.g. 0.2, and multiply the rest of the channels by a factor greater than 1. In effect we are now applying channel-specific weights. I am now using a modified input for style only; for hallucinations one might try the same method for content, too.
By using this on relu3_1,relu4_1 with style weight 1e6 I get this, and again relu3_1,relu4_1 with style weight 2e4 gives this. This means we can achieve different results by selectively emphasizing specific channels. My interest is in style, so I have been tampering with how the channels influence the style generation. For hallucinative dreaming, doing the same with content loss might give results? |
@VaKonS wrote I have not looked into this new code of @ProGamerGov's. But style loss modules do have to adapt to the input size (meaning the size of the input to the style loss module), because it depends not only on the size of the input image but also on the layer to which the module is attached. Also, as I noticed today when working on the code posted just above, the first time StyleLoss:updateOutput gets called is before capture mode has been set, and the input then has different dimensions than on all later occasions, probably as a result of this line: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L218. So the style loss module absolutely has to take care of adapting to the size of the input, or otherwise a size mismatch will occur. Despite this, I don't immediately see how maskedStyleLoss is intended to work either. |
I put together a script to allow experimenting with giving emphasis (*5) to specific channels on a style layer. Other channels will be attenuated (*0.2). Use the param -style_channels (e.g. -style_layers 3_1 -style_channels 17,23). Note that this is likely to reduce the average output from the layers, so style_weight may have to be adjusted. https://gist.github.com/htoyryla/fe6dd64611638b3db89755d68735d3e3 |
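The gist above is the actual implementation; purely as a self-contained illustration of the channel-weighting idea (hypothetical module name, 3D CxHxW input assumed), it amounts to something like:
require 'nn'
-- scales the listed channels by `emph` and all other channels by `atten`
local ChannelWeight, parent = torch.class('nn.ChannelWeight', 'nn.Module')
function ChannelWeight:__init(channels, emph, atten)
  parent.__init(self)
  self.channels, self.emph, self.atten = channels, emph or 5, atten or 0.2
end
function ChannelWeight:updateOutput(input)
  local C = input:size(1)
  self.scale = input.new(C):fill(self.atten)
  for _, c in ipairs(self.channels) do self.scale[c] = self.emph end
  self.expanded = self.scale:view(C, 1, 1):expandAs(input)
  self.output = torch.cmul(input, self.expanded)
  return self.output
end
function ChannelWeight:updateGradInput(input, gradOutput)
  self.gradInput = torch.cmul(gradOutput, self.expanded)  -- same per-channel scaling
  return self.gradInput
end
-- emphasize channels 17 and 23 of a conv3_1-sized activation:
local out = nn.ChannelWeight({17, 23}):forward(torch.randn(256, 32, 32))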
I assume these are the correct channel counts for each layer on the default VGG-19 model? (conv1_*: 64, conv2_*: 128, conv3_*: 256, conv4_*: 512, conv5_*: 512; fc6 and fc7: 4096, fc8: 1000)
Neural-Style Console Output:
I have also found this site helpful for generating large ranges of sequential numbers for testing: http://textmechanic.com/text-tools/numeration-tools/generate-list-numbers/ |
The conv layers look ok to me. FC layers do not have channels (in the sense of an HxW map corresponding to an image), only single numeric outputs correlating with features. Yet masking unwanted outputs may also give interesting results. |
Speaking of multi-resolution: it looks like transferring style at lower resolution keeps more complex features of the style (for example below: not simply shapes of plasticine pieces, but whole flowers). The second good thing here is that reprocessing at full resolution preserves almost all features of the low-resolution pass. |
"it looks like transferring style at lower resolution keeps more complex features of style (for example below, not simply shapes of plasticine pieces, but whole flowers)" Sounds like the basic scalability problem in neural-style transfer, the result changes as with higher resolution each neuron sees sees only a smaller part of the image. It has been discussed a lot and multi-resolution processing appears to solve it. Slightly off topic: three of my neurally assisted works are just now on display at Art Fair Suomi, a large contemporary art event in Helsinki: https://twitter.com/htoyryla/status/867460046665523200 |
Congrats on the Art Fair. Please tell us how they were received, very curious what the general public think.
…On Sun, Jun 4, 2017 at 8:52 PM, ProGamerGov wrote: @htoyryla Congratulations on getting your artwork in an art fair! Do you know of any neural_style.lua experiments involving a denoiser?
|
"Congrats on the Art Fair. Please tell us how they were received, very It was a big event, 300 artists, altogether around 1000 widely different works of contemporary art. Only very few artists received individual attention. As far as I know, my works were the only ones made using neural techniques, but this fact was not advertised. For me the main thing was that my works fit well together with the rest (which was really varied). Also, when talking with other artists, they showed a clear interest in the methods I had used. I guess that is because most artists who participated somehow work with alternative or experimental forms of art. |
Btw, @ProGamerGov, just wanted to thank you for the visual examples for different models on the Wiki page. @htoyryla, maybe you'll find it interesting: the Network Dissection project has a Places model trained at different iterations, from 1 to 2400818. |
@VaKonS thanks for the link to the Network Dissection site. Their findings about channels that respond to specific features could also be useful, now that I have a version of neural-style which can use specific channels only. BTW, the Places model which is available from multiple iterations is not the VGG-16 based Places model that I have used, but a smaller one with only 5 conv layers. |
@VaKonS I was able to influence a model's style transfer ability with finetuning here: #292 (comment) I provided a saved model at every 1000 iterations, so that one could see the change over time caused by the fine-tuning. I noticed that some of the channel examples, like this and this, from my Protobuf-Dreamer project seem to have "dead channels". These dead channels don't cause any visible hallucinations. So I was wondering if we would find similar "dead channels" by creating hallucinations of different channels from Neural-Style models? And if we can, would style transfer be improved by removing these "dead channels", or by only using them? I have also figured out that one can use a model's image classification abilities to identify what each channel hallucination contains (like dogs, plants, cars, etc.). If we used the model to attach the applicable category names to each channel, and then had some kind of search/selection function, could we create better stylized outputs? For example, I could select the "dog" related channels, and potentially some other channels (for artistic effects), to stylize a dog content image. |
@htoyryla @VaKonS Have either of you ever come across shapes that resemble animals, or the classic heart shape? And these shapes did not appear to be in any content or style images that you used?
Are these shapes and objects produced by the model as some kind of error or hallucination? Is it just my brain's pattern/shape recognition abilities being tricked? Or is the geometry of these shapes/objects simple and common enough that they can show up in a combination that tricks the brain? |
@htoyryla @jcjohnson A related thing I experienced was that a classification neural network thought this image was a picture of a dog: I know dogs were obviously part of the training data used to create the model, so I am wondering if there are dog-related shapes hidden in the image. If this is the case, then it seems like Neural-Style might be creating adversarial example images, which are meant to trick classification networks into making incorrect predictions with details that are normally hidden to the naked eye. |
@ProGamerGov, regarding the "dog": maybe the "torch-visbox" project could help to identify the part that was sensed as a "dog" there (together with @htoyryla's "convis"). I didn't try either of them, though. |
Could anyone point me in the right direction for making the -style_image parameter accept a whole directory of images? |
@ProGamerGov, you can try this.
The print operators are just for checking and can be removed, of course:
The 2nd step will produce the directory listing, and the 1st step will convert it to a comma-separated list. |
@VaKonS Thanks, I'll try that out. Experiments involving hundreds or even thousands of style images at once become really tedious when the images have to be added the normal way. |
@ProGamerGov, speaking of "hundreds, or even thousands": I didn't try that many images with this method (3 works :) ). There is a parameter --contentBatch for passing a directory; files other than images are then sorted out in "ImageLoader.lua", and as a result you should have a table "files" with the image list (it should be similar to "style_image_list", I guess). This should work with a large number of images. |
@htoyryla Have you done any further experimentation with neural-channels.lua? And if so, what are your findings? |
@ProGamerGov No, I haven't really tested or used it any further. Could be interesting, but there is so much else to do. |
@VaKonS I tried your version of neural_style.lua that's been modified to have built-in tiling, but it didn't like me using a -style_scale value. |
@VaKonS Thanks for the link to the lua project with batch image collection. I think that using the code below:
I could simply create a comma-separated string from the directory contents. I'd also have to load the Torch paths library via require 'paths'. I also wonder if I can do things in such a way that I only need the existing -style_image parameter. Edit: This seems to work:
These are the functions used:
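As a hypothetical reconstruction (not the exact functions from the gist), such a directory scan with the paths library could look like this:
require 'paths'
-- collect the image files from a directory into a comma-separated string
local function list_images(dir)
  local exts = { jpg = true, jpeg = true, png = true, ppm = true, pgm = true }
  local files = {}
  for f in paths.files(dir) do
    local ext = f:lower():match('%.(%a+)$')
    if ext and exts[ext] then table.insert(files, paths.concat(dir, f)) end
  end
  return table.concat(files, ',')
end
print(list_images('examples/inputs'))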
I found the list of image formats that the Torch image package accepts here. The gist file for the above code: https://gist.github.com/ProGamerGov/86c80c74f5e403748659e013447b2499/6de58d6bde6f0fb9ff9b9e20d2ea4294b7d030be Lines 69-74 and 426-455 implement the new code. There are some errors when trying to use the
It appears that my way of checking for a directory or a specific image can't handle using multiple specific images with a directory in front, like: I'm not sure how to fix this issue while retaining the desired functionality. Maybe it could be done in such a way that a user could specify multiple different directories and specific images in the same command? Like for example: Second Edit: This should work, while eliminating the unnecessary
You can now specify multiple directories and specific images all at once, like for example:
And you don't need to follow a specific order:
The current version: https://gist.github.com/ProGamerGov/86c80c74f5e403748659e013447b2499 |
@ProGamerGov, I tried something like this: first it attempts to open the parameter as a directory, then as a file, and if everything fails, it simply splits the string "as is".
require "torch"
require "paths"
local cmd = torch.CmdLine()
cmd:option('-style_image', 'examples/inputs/seated-nude.jpg',
'Style target image/directory.')
local params = cmd:parse(arg)
print( "-params.style_image: \"" .. params.style_image .. "\".")
local style_image_list = paths.dir(params.style_image)
if style_image_list ~= nil then -- directory
print( "Path mode." )
if string.sub(params.style_image, -1, -1) == "/" then -- remove trailing slash if present
params.style_image = string.sub(params.style_image, 1, -2)
end
local i = 1
while i <= #style_image_list do
local fl = string.lower(style_image_list[i])
if not string.find(fl, 'jpg$')
and not string.find(fl, 'jpeg$')
and not string.find(fl, 'png$')
and not string.find(fl, 'pgm$')
and not string.find(fl, 'ppm$') then
table.remove(style_image_list, i)
else
style_image_list[i] = params.style_image .. "/" .. style_image_list[i]
i = i + 1
end
end
assert(#style_image_list > 0, "All files skipped, style images list is empty. Stop.")
else -- file or list
local is_file = io.open( params.style_image )
if is_file ~= nil then -- single file, pass name as given
io.close(is_file)
print( "File mode." )
style_image_list = { params.style_image }
else -- split as comma separated list, don't check items
print( "List mode." )
style_image_list = params.style_image:split(',')
end
end
print( style_image_list ) |
@VaKonS Thanks for sharing a more refined version of the directory/image handling code! You should also consider enabling the issues option on your tiled version of neural_style.lua, so that issues can be reported directly on the project page. Things like saving iterations and initialization images don't work, and it seems odd that the size of the content image dictates the final size of the tiled output. Despite the issues, it's really promising! I assume it works similarly to crowsonkb's style_transfer? |
@ProGamerGov, yes, it's something like @crowsonkb's or maybe @mtyka's style transfer, but with simple overlaying of fragments, without modifications to the optimizers. It's strange that style scaling doesn't work for you; it works here. Maybe I forgot to put in some type conversions, could you please check it with " Iterations saving is a limitation at the moment, because I cannot make stylization work with 1-iteration precision. |
I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.
The code that was used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis
The code from Leon Gatys' NeuralImageSynthesis is written in Lua and operated through an iPython notebook interface.
So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?
Looking at the code, I think that:
ImageSynthesis.lua is responsible for the luminance style transfer.
ComputeActivations.lua and ImageSynthesis.lua are responsible for scale control.
ComputeActivations.lua and ImageSynthesis.lua are responsible for spatial control.
In order to run NeuralImageSynthesis alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.