Changes made to the example to make it more consistent with the documented usage of VGG16 weights. A detailed discussion can be found at #3340. The following changes are made to images before passing them through VGG:
Are there any good ways to improve the performance of this? I was playing today with moving this over to VGG19 but have not been successful yet. I find that the loss improvement flattens out pretty quickly, with very little gain after about the 60th epoch, and the image is pretty much as good as it's going to get by about 20 iterations. Do larger resolutions help vs. the 400x400 default? Is the only option really to go back and train a larger VGG16 or VGG19 conv-net on ImageNet with more parameters, and perhaps more dropout?
Hi @kfinla, I haven't tried VGG19. But in my experience, the different loss weights can be played with to generate different results. You may also want to check the work by @titu1994 for other parameters.
@kfinla I have tested the code in my project on over 100 images, while altering various parameters and models (VGG-16 / 19), and even switched between Network.py (a replica of neural_style_transfer.py) and INetwork.py (the same as above, with 4 of the major improvements suggested in the paper Improving the Neural Algorithm of Artistic Style).
Some insights I have derived from these experiments:
In the case of average pooling, the loss value drops drastically over the first 15 epochs, and then starts flattening out over the next 20. After 35-45 epochs, the loss value decreases only in the last 4 digits. The actual loss value is often less than 1/10th or even 1/25th of the loss value when using max pooling.
In the case of max pooling, the style is applied more gradually, and often more accurately. The loss value drops drastically over the first 22-25 epochs, and then flattens out over the next 35-45 epochs. After roughly 65 epochs, the loss value decreases only in the last 4 digits. However, the actual loss value is far higher than the loss value obtained with average pooling, even though the results are often better for max pooling.
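As a toy illustration (not code from the scripts), here is one intuition for why average pooling tends to produce much smaller loss values: the mean of a patch is never larger than its maximum, so pooled activations, and the losses built on them, shrink.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping 2D pooling over a single feature map."""
    h, w = x.shape
    out = np.empty((h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            patch = x[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

features = np.array([[0., 4., 1., 0.],
                     [0., 0., 0., 2.],
                     [3., 0., 0., 0.],
                     [0., 1., 0., 8.]])

print(pool2d(features, mode="max"))   # [[4. 2.] [3. 8.]]
print(pool2d(features, mode="avg"))   # [[1.   0.75] [1.   2.  ]]
```

Every average-pooled entry is at most the corresponding max-pooled entry, which is consistent with the much lower raw loss values seen with average pooling above.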
content weight = 0.025
style weight = 1

content : style weight ratios tried:
- 1 : 0.1
- 1 : 0.05
- 1 : 0.01
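For context, here is how these two weights enter the total loss that the optimizer minimizes (the per-term loss values below are made up purely for illustration):

```python
content_weight = 0.025   # values from the discussion above
style_weight = 1.0

content_loss = 120.0     # hypothetical raw content loss
style_loss = 3.5         # hypothetical raw style loss

# the weighted sum is what L-BFGS actually drives down
total_loss = content_weight * content_loss + style_weight * style_loss
print(total_loss)  # 6.5
```

Changing the ratio simply shifts how much of the gradient comes from matching content versus matching style.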
As a side note, I have also tried using the style image as the initial image, but found that while the end result is similar, it is not the same, and convergence is slower than when using the content image as the initial image.
Sorry for the long post. Hope it helps you in obtaining slightly better images from the script.
Thank you guys, this is all super useful and interesting information. Long post appreciated! I did a 500-epoch test with a set of images I had decent results with, using slightly modified weights based on my own limited observations, and did get an improvement. I am now trying some of the suggested weight ratios and using conv5_2. Again, thanks.
So, playing some more. I need to study your INetwork.py script. I augmented my script, basically the original Keras style-transfer script, to use conv5_2, but the results never seem to converge, or converge extremely slowly even after 500 epochs. I see you have a "style scale" which is missing in mine.
I started to play with your (@titu1994's) INetwork.py and Network.py scripts, but they both hang for me at ("Start of iteration, 1"). I tried commenting out the theano.config lines at the top because my GPU always says cuDNN is not available, but still no luck. Any ideas? I have been running it like the original script: script location, content image, style image, image result location. I get no errors; it just doesn't complete iteration 1, unless it is several orders of magnitude slower than the original script. PS. What is the "windows" helper script about? I am trying this on Windows 8.1. I might try on OSX without a GPU to see if I can isolate the issue there. Thanks for any help.
@kfinla Both Network.py and INetwork.py offer the best image far more quickly due to use of the conv5_2 layer as the content layer. I have seen very little gains on either script with over 200 epochs. 500 or 1000 is definitely overkill.
Style scale simply multiplies the style loss value by a float, strengthening or weakening the style. There is no need for it if you adjust the style weight manually instead.
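The equivalence described here is just arithmetic; folding the scale into the weight gives the same weighted loss (values are illustrative):

```python
def weighted_style_loss(raw_style_loss, style_weight, style_scale=1.0):
    # multiplying the loss by style_scale is equivalent to
    # multiplying style_weight by the same factor
    return style_weight * style_scale * raw_style_loss

a = weighted_style_loss(3.5, style_weight=1.0, style_scale=0.5)
b = weighted_style_loss(3.5, style_weight=0.5)
print(a == b)  # True
```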
The Theano.config lines can be safely removed, since they are there to enforce default behaviour. I noticed that the script crashed when using time_once in the dnn.fwd_algo flag.
cuDNN is definitely very important for this script, since it has faster convnet implementations. cuDNN 5 in particular is very useful for VGG-like nets, since it is a few times faster for stacks of CNNs.
Network.py performs exactly at the same speed as the neural_style_transfer.py script, since it is in essence just exposing the variables in the script. INetwork.py is 1-2 seconds slower than the Network.py script, since it uses a lot more information from the VGG network to deliver better results.
In fact, this script requires enormous amounts of time on CPU for every iteration. Definitely GPU with CuDNN is recommended.
Windows helper folder is not a script actually. It is a program written in C# using Windows Forms to execute the Python scripts directly from a program. Advantages are that it is extremely easy to alter all the parameters of the script and achieve different results. If you instead prefer the terminal anyway, it can also copy the argument list onto the windows clipboard so you can simply paste the argument list.
So I did some more playing. I tried with the CPU on Keras 1.0.6 on my MacPro (6-core Xeon) and it takes about 2047 seconds per iteration, though even after 1 iteration it looks pretty nice. Then I tried again on my old Dell with the very old stock Nvidia GT 635 GPU, with no cuDNN available, and it took 1150 seconds per iteration. So I needed to be more patient. The older 4_2 setups take about 48 seconds per iteration on the same GPU. So I need a better graphics card, which I know, but I also need to keep trying to get cuDNN support working, as I was under the impression it would cut that time in half.
BTW, the results with the INetwork script are very exciting, and it's awesome to have a decent resolution as the result.
@kfinla I am surprised to see such a vast difference between the time taken for conv4_2 and conv5_2 on the same gpu.
On my GPU, with cuDNN, both conv4_2 and conv5_2 require exactly the same amount of time. Perhaps cuDNN is the deciding factor here? I will try disabling cuDNN to see if I can replicate this issue.
cuDNN support is extremely useful for VGG networks, as they say version 5 gives a speedup of 2.5x compared to earlier cuDNN, which was already faster to begin with.
As to the INetwork script, I too have found that, with the additional time it takes, it provides a far higher resolution image even though the gram matrix size is still 400 x 400. I've implemented only 4 of the major improvements listed in the paper, and am still working on getting MRF and Analogy loss incorporated.
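Since the gram matrix keeps coming up: for reference, it is just the channel-by-channel correlation of a flattened feature map. A minimal numpy sketch (the 1/hw normalization is one common choice, not necessarily the one these scripts use):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height*width) feature map:
    channel correlations that encode style independent of layout."""
    c, hw = features.shape
    return features @ features.T / hw

feats = np.arange(6, dtype=np.float64).reshape(2, 3)  # 2 channels, 3 positions
g = gram_matrix(feats)
print(g.shape)  # (2, 2): channels x channels, regardless of image size
```

Note that the gram matrix is always channels x channels, so its cost depends on image resolution only through the flattened-feature dimension.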
It seems adding MRF loss drastically undermines the results obtained from the INetwork script. Most probably my implementation is wrong. I will keep working on it to see if all 3 can work in tandem.
So I finally got cuDNN working, and v5 no less. Theano and Windows is a dark art; 14th time's the charm, I guess. I have seen a significant speedup: with INetwork, from 1100 seconds down to 62 seconds, and with the older 4_2 setup, from 48 seconds UP to about 50. So there are clearly massive cuDNN optimizations in the newer INetwork script. Once I get a GTX 1060 I should be able to enable CNMeM properly too.
@kfinla Sorry for the late response, I was travelling. Your results match mine, without cuDNN, the INetwork.py script requires 700+ seconds per epoch on my laptop. However, I did not see an increase in time required for the older 4_2 setup.
The reason for this huge speed difference is obvious: the INetwork script uses all layers of the VGG network to compute style loss. On top of that, it computes the difference between adjacent layers, adding even more execution time to the already large VGG network. cuDNN is pretty much a requirement for the INetwork script.
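I'm not certain of the exact form used in INetwork.py, but "difference between adjacent layers" can be sketched as combining per-layer style losses pairwise rather than summing them directly (the loss values below are dummy placeholders):

```python
# per-layer style losses, one per VGG conv block (dummy values)
layer_losses = [10.0, 8.0, 5.0, 4.0, 2.0]

# one plausible "chained" form: accumulate the difference between
# each pair of adjacent layers instead of the raw per-layer losses
chained = [layer_losses[i] - layer_losses[i + 1]
           for i in range(len(layer_losses) - 1)]
total_style_loss = sum(chained)
print(total_style_loss)  # 8.0 (the sum telescopes to first - last)
```

Whatever the precise formula, evaluating style loss at every layer plus the pairwise terms means more gradients flow through the whole VGG stack, which is where the extra per-iteration time goes.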
In any case, I have updated the scripts recently to speed them up even more. They no longer use the various ZeroPadding layers, and now there is no longer a requirement to download the full 500+ MB weights. The network will automatically download a subset of the weights for just the Convolutional layers (~57 MB) and use them instead. This requires at least Keras 1.0.7 since it uses a few new additions to the keras util scripts.
With cuDNN, on a 980M GPU, the Network.py script requires just 6-8 seconds per epoch, and the INetwork.py script needs just 10-11 seconds per epoch.
@titu1994 No worries. I have a GTX 1060 now, so the INetwork script often takes under 7 seconds to process. I am sure CNMeM helps. I am curious how adding MRF is going? I see there is an MRFNetwork.py script now (or I just hadn't noticed it), which I will try out. Have you noticed any reduction in quality between the new versions of the scripts using the full VGG weights and the smaller subset weights? At this point I am really focused on the final result quality, not so much speed. I have been doing some side-by-sides comparing the Prisma app with the same image inputs with INetwork.
PS. As a side note, do you know if cuDNN 5.1 works with Theano? I have 0.9.0 dev2 and Keras 1.0.6 at the moment, along with CUDA 8.0 RC and cuDNN v5.0 (5005). I have not tried 5.1, just because getting everything to play nice on Windows (10 now) is a chore.
@kfinla Sadly, I had to pause work on the MRF network for the time being, since it is producing useless results and requiring exorbitant amounts of time, both to develop and to execute. I'm quite sure my implementation is wrong somewhere. MRFNetwork.py is currently non-functional, and I should note that somewhere. I welcome any contributions to implementing MRF or Analogy loss.
The subset weights are in fact the entire set of convolutional layer weights, which total around 57 MB. The remainder of the 550+ MB file is for the fully connected Dense layers, which are unused here. So I simply extracted the conv weights and eliminated the ZeroPadding layers to obtain a small speedup. The results will be exactly the same as with the entire weights.
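A sketch of the idea behind the subset weights (the dictionary keys and shapes below are illustrative stand-ins, not the actual HDF5 layer names in the weight file):

```python
import numpy as np

# stand-in for the full VGG16 weight file
full_weights = {
    "conv1_1_W": np.zeros((64, 3, 3, 3), dtype=np.float32),
    "conv1_1_b": np.zeros(64, dtype=np.float32),
    "fc6_W": np.zeros((25088, 4096), dtype=np.float32),  # dense: unused here
    "fc6_b": np.zeros(4096, dtype=np.float32),
}

# keep only the convolutional weights
conv_only = {k: v for k, v in full_weights.items() if not k.startswith("fc")}

# the first dense matrix alone accounts for ~392 MB of the original file,
# which is why dropping the dense layers shrinks it so dramatically
fc6_mb = full_weights["fc6_W"].nbytes / 1024 ** 2
print(sorted(conv_only), round(fc6_mb))
```

Since style transfer only reads conv-layer activations, dropping the dense weights cannot change the output.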
The results for Prisma and this network will probably never match, for the simple reason that they are using pre-trained Texture Networks, possibly similar to the ones from the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution. These require just one forward pass through the VGG network to create a stylized image. The obvious drawback is that for each style, you must train a new neural network for roughly 4-5 hours on the MS COCO dataset using that style image, so arbitrary style images cannot be used. For an implementation, look to chainer fast neural style.
This network, on the other hand, requires no pretraining for a specific style, but requires exorbitant amounts of time for the forward and backward passes through the VGG network. Benefits include control over how strongly the style is applied and the fact that any image can become a style. The added benefit is that the generated image can then be passed through an external image super-resolution implementation to scale the image to any size with near-lossless results.
In fact, I recently added parameters to perform style transfer without transferring color. It seems to work well and produces good images in the same colors as the content image without destroying the content. I still want to attempt to implement style transfer for videos as well, but my GPU is already looking dated. If only I could switch GPUs on a laptop, but that's a pipe dream for now.
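I don't know the exact approach the script takes, but one common way to transfer style without transferring color is luminance-only transfer: keep the stylized image's luminance and the content image's chroma channels. A sketch using a standard RGB/YCbCr conversion (coefficients are the usual JPEG ones; the real script may differ):

```python
import numpy as np

def to_ycbcr(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def to_rgb(ycbcr):
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 128.0, ycbcr[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

def preserve_color(content_rgb, stylized_rgb):
    """Stylized luminance + content chroma = style without a color shift."""
    out = to_ycbcr(content_rgb)
    out[..., 0] = to_ycbcr(stylized_rgb)[..., 0]  # swap in stylized luminance
    return to_rgb(out)

content = np.ones((2, 2, 3)) * np.array([200.0, 30.0, 30.0])  # reddish content
stylized = np.ones((2, 2, 3)) * 90.0                          # gray stylization
result = preserve_color(content, stylized)  # stays reddish, takes gray's brightness
```

The stylized texture survives through the luminance channel while the hue stays pinned to the content image.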
I myself am currently using cuDNN 5.1 (5103) on Windows 10 with CUDA 7.5, Theano 0.9.0 dev2 and bleeding-edge Keras from GitHub (BatchNorm had a small bug that was causing NaNs, but it was fixed after the 1.0.7 release). Feel free to update, since 5.1 has better support for the Pascal architecture of the 1060-1080 GPUs.
@titu1994 Thanks for the info and the update. I got cuDNN 5.1 (5105) going with Windows 10, CUDA 8.0 RC, and Theano 0.9.0 dev2, and updated to Keras 1.0.7. On a CIFAR-10 test I'm only seeing about a 7% speedup vs. cuDNN 5004, not the 2.7x I read about. Maybe if I turned off all the image augmentation I'd see a more noticeable gain. Or maybe my CNNs are not the right kind. Anyway, thanks for the background on Prisma and the latest with INetwork. I will certainly look up that paper.
@kfinla The 2.7x speedup is mainly for VGG networks, which contain multiple stacks of Conv layers with 3x3 kernels. Other architectures may not see that great a performance improvement.