Where should I start if I want to train a model for usage with Neural-Style? #292
Are Network In Network (NIN) models easier to train than VGG models?

Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?

What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?
Comments
There are at least two parts to this question:
One has to start from the technical part. Caffe http://caffe.berkeleyvision.org is a good choice to start with. It is not too difficult to install, no coding is needed to use it, and it directly produces caffemodel files. To train a model, one needs a dataset (typically packaged as an LMDB together with a mean file), a training prototxt describing the network, and a solver prototxt with the training parameters.
With these in place, training using Caffe will create a model initialized with random weights (according to what is stated in the prototxt file) and start training it using the dataset. Training a deep network from scratch can be difficult and time-consuming. One might start with a small model first, with only a limited number of convolutional layers, or one might try finetuning an existing model. Finetuning means taking an existing, already trained model and training it further using a different dataset, like in this example: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

Either way, one can without much difficulty create models that work with neural-style, in the sense that the model loads, iterations start, and even the losses may start diminishing. The visual results are often a disappointment, however. I have done this several times already, using wikiart, my own photo library, and a programmatically created dataset of geometrical images. Nothing really useful yet, but I am learning all the time.

Some more detailed notes: for VGG networks, it looks like training prototxt files are not available on the web, but I managed to piece together one that works. Training a VGG network from scratch is not really recommended. From what I have heard, the creators of the model couldn't train the deeper models from scratch, but had to train smaller models first and then add layers for a new training round. But maybe a VGG with only the 1st and 2nd conv layer levels would do as a first try. Or a VGG finetuned on one's own dataset.
I successfully trained a model that is similar to NIN but with fewer layers, and it produced the following images after training it for 70,000 iterations: I used the CIFAR10 data set and this github page along with the supplied scripts: https://gist.github.com/mavenlin/d802a5849de39225bcc6

I am currently wondering if there is a data set of artwork available at the moment that I could use for training. I found this data set: http://people.bath.ac.uk/hc551/dataset.html but that's all I have been able to find thus far for artwork data sets. I was also considering grabbing all the images posted to /r/art/ on Reddit for use in training, and maybe using my massive collection of styles as well.
Your results look familiar to me. They can be interesting as such, but if the model does not respond to the different styles, then what it can achieve is very limited. I cannot now locate the example from which I obtained the wikiart materials. It was not a Caffe example if I remember correctly; more like someone's python project, from which I got a list of wikiart urls with label data. Not all urls worked, but out of those which did I put together an LMDB. I'll look further and see if I find something.
Here's one of my results: only the colors derive from the style. Changing layers, weights and style image produces a number of variations, but they are quite limited. Another model I trained produced mainly clouds or blobs of color: It seems to me that these limitations derive from a too small dataset and too few training iterations.

One also needs to consider the contents of the dataset. Even if the training is successful, the model only learns to recognize such features as stand out in the dataset. To work well, it should recognize the features that are essential in both content and style images. When my geometrical shapes dataset resulted in clouds of color, the model had clearly failed to recognize essential features in the images. I have not used CIFAR10, but I assume that the small size of the images might be a handicap. In another thread here, a hypothesis was raised that a model in neural-style works best with images of the size of the training images.

Roaming a bit further, I have recently been interested in unsupervised training, using a model which first crunches the image into a vector (such as the FC6 output) and then reconstructs the image using deconvolutional and unpooling layers. With this approach, we don't need labels, as the model will learn by comparing the input and output images.
The material about finetuning using wikiart can be found here: https://computing.ece.vt.edu/~f15ece6504/homework2/ . I see it as mainly useful for the image urls and labels, as a basis for making an LMDB for Caffe. And for neural-style, forget Alexnet: it requires GROUP, which is not supported by loadcaffe.
For anyone who is interested, here's one of my VGG16 train prototxt files. Some configuration will be needed if you want to use it: you need to change the pointers to your dataset and mean files, and maybe the batch sizes as well. You may also want to comment out the prob layer to get cleaner output during training.
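The parts to change are in the data layer; roughly like this (a sketch only — the paths and batch size below are placeholders, not the actual values from my file):

```
# Data layer of a training prototxt -- a sketch; source, mean_file and
# batch_size are placeholders that must point to your own files.
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    crop_size: 224
    mean_file: "data/my_dataset/mean.binaryproto"  # your mean image
    mirror: true
  }
  data_param {
    source: "data/my_dataset/train_lmdb"  # your training LMDB
    batch_size: 16                        # reduce if you run out of memory
    backend: LMDB
  }
}
```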
If you want a big image set for training, you can download the ImageNet database. It is what was used to train the default VGG-19 model.
Imagenet is certainly a good choice if one wants to train with a general image set and has the computing platform for large-scale training. I am planning to get another Linux machine dedicated to training, but for the moment I cannot tie up my Linux computer long enough for anything other than small experiments (which are good for learning anyway).
@htoyryla So I have this data set here with art images: I just posted a few examples, but every category seems to have between 50 and 80 images. People-Art has multiple areas such as Annotations and JPEG images, whereas Photo-Art does not. Would the wiki-art data set be better, or would the People-Art/Photo-Art-50 data set be better for training?
And this previously fine-tuned model here already produces good images in neural-style: https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129#file-readme-md How would I, step by step, convert this data set into the lmdb files, and then how exactly would I use your prototxt to train the already-made caffemodel? What train.prototxt and solver.txt files do I need, and which ones do I modify? What modifications do I make? I have tried modifying some, but it was unclear from the naming which file I should replace. I tried making a NIN model like the one in Neural-Style using the CIFAR10 data set, but it had the exact same amount of layers that my previous CIFAR10 model had, and not the same layers as Neural-Style's NIN model has. I found this fine tuning command on the Berkeley site:
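It is presumably the one from the flickr-style finetuning example; something like this (the solver and weights paths below are the example's own, not mine):

```sh
# From the Caffe finetuning example -- adjust the solver and weights
# paths to your own files before running.
./build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet.caffemodel \
    -gpu 0
```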
I can easily modify the paths and filenames, but is it the right command to use? With the wiki-art data set, how exactly do I convert it to the lmdb files that I need? This lmdb part is probably the most confusing part of neural networks for me, because I have not found any guides that let me make sense of what exactly I have to do. And @htoyryla, if possible, could you post the lmdb files and mean files you made from the wiki-art data set for me to download?
So I tried to fine-tune the VGG16 SOD model on the CIFAR10 data set, and received the following error:
I was also using this solver.prototxt: https://github.com/ruimashita/caffe-train/blob/master/vgg.solver.prototxt and htoyryla's train_val.prototxt. I get the same error on the normal VGG-16 model:
I took the Cubo-Futurism jpg files from the People-Art data set. I then tried, and failed, to successfully create the val and train lmdb files.
You get the error because my training VGG16 prototxt (and any imagenet-based prototxt) expects 256x256 images (which are then cropped according to the prototxt to 224x224), while CIFAR10 images are 32x32.
I can help with LMDB and prototxt, but for a few days I am terribly busy with other things and mostly not even near a computer.

An LMDB is created using a script like caffe/examples/imagenet/create_imagenet.sh, but the script usually needs to be adjusted for paths etc. It can take some time to get used to it and get everything to match, so that the script finds the train.txt and val.txt files as well as the images referred to in them and the image sizes are correct; then it creates two LMDB files. Then you calculate the mean images based on the LMDBs using caffe/examples/imagenet/make_imagenet_mean.sh (or something like that). Then modify the training prototxt to point to your LMDBs and binaryproto files, and make sure the solver.prototxt points to the correct training prototxt.

The train.txt and val.txt for the LMDB creation contain lines like `path_to_an_image label`, where label is an integer from 0 to number_of_categories - 1. The handling of paths can be a bit tricky: they are relative to paths set in create_imagenet.sh, and it took me some time to get the paths right.

This is all I can contribute right now. After a few days I will have more time to respond. I am not sure if I still have my wikiart LMDB; I have other LMDBs, but they are usually quite large files.

PS. See also the Caffe imagenet example for the LMDB part (never mind if the page talks about leveldb instead of lmdb, it is an alternative option): http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
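In outline, the command sequence is something like this (a sketch; all paths are placeholders and the scripts must be edited first):

```sh
# Sketch of the Caffe LMDB + training workflow; adjust all paths to your setup.
cd $CAFFE_ROOT

# 1. Build train/val LMDBs from train.txt and val.txt (edit the paths inside first)
./examples/imagenet/create_imagenet.sh

# 2. Compute the mean image from the training LMDB
./examples/imagenet/make_imagenet_mean.sh

# 3. Train (or finetune) using your solver, which points at your train_val.prototxt
./build/tools/caffe train -solver models/my_model/solver.prototxt
```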
So I have my images at:
Full list of the folders containing images:
Each folder of images has a "gt.txt" file. This is what the gt.txt file looks like: https://gist.github.com/ProGamerGov/2339b815b9e462cb69cd5bb7d156ee9a Though I believe this may be part of the Cross-Depiction aspect of the data set. My train.txt and val.txt are at:
train.txt: https://gist.github.com/ProGamerGov/1be5afe398c825cfc3ea119005af71fb My create_imagenet.sh file: https://gist.github.com/ProGamerGov/5f92bdc8e7d83756268f438cf15261eb
The prototxt of the model I want to fine tune has
I then run:
This creates two folders:
Inside both folders are
Trying to run the script again results in this:
This is the readme.txt that came with the data set: https://gist.github.com/ProGamerGov/dfc8652f3db5bc91acdf34ff22c86bd2 I am not exactly sure what is causing my issue, but could it be that the script is not accounting for the structure of my data set?
You need to put all the information into train.txt and val.txt. That is where caffe expects to find the urls and the labels. Like this:
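For instance (the file names and labels here are made up):

```
images/painting_0001.jpg 0
images/painting_0002.jpg 0
images/sculpture_0001.jpg 1
```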
" A total of 0 images." means that caffe does not find the image files. Setting the paths in the train.txt versus create_imagenet.sh can be a bit confusing. Unfortunately I don't have the script file for wikiart anymore. But I think what worked for me was to use full path in the train.txt and set the paths in the script as follows:
The root paths are set to / because the train.txt contains full paths. It should also work to set the data root path to the image directory and use relative urls in the txt files, but I remember having some difficulty with that. I usually write small python scripts to manipulate or create the txt files in the correct format. For my geometrical shapes test I had image files named rect000001.png, ellipse000001.png and so on, and I wrote a python script like this:
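Something along these lines (a sketch; the original script is lost, and the counts and paths are placeholders):

```python
# Minimal sketch: emit "filename label" lines for an image set whose
# class is encoded in the file name. Counts and paths are placeholders.
classes = ["rect", "ellipse"]  # labels 0, 1, ...
n_images = 1000                # images per class

for label, name in enumerate(classes):
    for i in range(1, n_images + 1):
        print("/home/me/shapes/%s%06d.png %d" % (name, i, label))
```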
and redirected the output into train.txt. Nothing fancy, but it worked.
You might have a problem with your caffe installation, too, as you had this error message:
I haven't seen this. As far as I understand, this library is for FireWire connections, which should not be needed. I found this on Google: https://kradnangel.gitbooks.io/caffe-study-guide/content/caffe_errors.html
I just used this trick to fix my train and val files quickly: https://stackoverflow.com/questions/11003761/notepad-add-to-every-line
libdc1394 is for video camera use and not critical to Caffe as far as I understand. I have disabled it a few times and everything still works fine.
Perhaps you can manage with Notepad, but for Wikiart, for instance, I think I created the txt files from a downloaded csv file which had all the paths and labels, but not in the correct format. Also, once I needed to change the label numbering to start from zero instead of one.
One more thing if you are planning to finetune: you should change the dimension of the fc8 layer (assuming you are training a VGG) to match the number of categories in your dataset. Also, change the name of fc8 to something else, so that caffe will not try to initialize the weights from the original caffemodel, which would fail because of the size mismatch. It is typical to use a name like fc8-10 if you have ten categories. Like this in the training prototxt:
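A sketch for ten categories (the layer type and the bottom/top names follow the usual VGG training prototxt):

```
layer {
  name: "fc8-10"        # renamed from fc8 so the weights are re-initialized
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8-10"
  inner_product_param {
    num_output: 10      # = number of categories in your dataset
  }
}
```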
The changes to my create_imagenet_2.sh file, val.txt, and train.txt (https://gist.github.com/ProGamerGov/8267d29262f1bd6570e5918719600695) still result in the same error.
@htoyryla Thanks, I'll make the modifications to my train_val.prototxt.
Changing the fc8 layer will not solve the LMDB creation problem. That is a separate issue, which you'll face once you have the LMDB and start finetuning.
I still don't see the labels in your train.txt, only the image paths.
For the labels, do I put a different number value for each category?
Yes, the labels should be integers from 0 to number_of_categories - 1, as I wrote earlier. During training, caffe will feed each image into the model and, as there is an output for each label, train the model to activate the correct output for each image. Without the labels, there is nothing to guide the training and the model will not learn anything. Also, if all images have the same label, the model simply learns to always output that label regardless of the image, so it will not learn anything about the images. It is only when the labels tell something essential about the images that meaningful learning is possible.
Ok, I think I got it now. Change the
train.txt and val.txt both have to conform to this format. They also should not include the same files, as val.txt is used to crosscheck that the model really learns to generalize and does not simply remember the individual images. I usually first make a train.txt containing all images and labels, and then use a script to move every tenth entry to val.txt. I might first make very short txt files to test whether the lmdb creation succeeds. There may still be an issue in create_imagenet.sh, too; I have sometimes struggled with the paths, where everything looked ok but 0 images were found, until suddenly, after changing something back and forth, it worked.
I didn't understand your "Then change it to fcpa_43". It should be enough to change the name to fc8_43, so that the layer name is not fc8, which is in the caffemodel you will finetune.
@htoyryla Ok, thanks for the help!
So I successfully created the lmdb files! https://gist.github.com/ProGamerGov/d0038f7e3186d057bb7b26398bd764f9 It seems that a few of the images listed in the train.txt and val.txt files did not exist in the actual data set.
It happened to me too, now that you mention it. Many (most?) datasets do not contain the actual images, only links for downloading them from the original location. Probably some wikiart urls no longer work, so those files don't get downloaded. It is like broken links, not unusual on the internet.
From previous testing, I found this interesting: https://i.imgur.com/XHg8CPA.jpg
@htoyryla The difference between "layers" and "layer" is that "layers" is the outdated version of the prototxt format. You can use upgrade_net_proto_text to update the prototxt file to the newer version.
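Usage is roughly like this (the file names are placeholders):

```sh
# Upgrade an old-style "layers" prototxt to the new "layer" format;
# input and output file names are placeholders.
./build/tools/upgrade_net_proto_text old_train_val.prototxt upgraded_train_val.prototxt
```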
I seem to have figured out how to change the output in a neutral manner that only affects the seed value in Neural-Style, by fine-tuning the VGG-16 SOD Finetune model. Interestingly enough, my data set was composed of art produced by neural networks. Edit: On closer inspection, it appears that the differences between the original and the fine-tuned version are in the smaller details. I only ran it for 600 iterations, as I have to use AWS spot instances for this kind of thing, but it looks like the newly fine-tuned model produces more intricate details than the original model. If I have achieved settings that result in an almost neutral change, then I can now theoretically change single parameters, target layers, etc. to achieve better artistic outputs.
So targeting specific layers seems to produce different outputs that are not worse than the original model's outputs. I really wish I had the resources to fully flesh this out, as it looks really promising for enhancing Neural-Style's outputs. I think that by targeting different combinations of the default layers that Neural-Style uses, one can improve the model's ability in specific areas with the proper data set.
This prototxt here has been configured to stop learning on all layers by default: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e I was using the VGG16_SOD_finetune model: https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129 And I ran You can enable learning on your layer of choice by changing the following lines on the desired layer:
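In the frozen state, the param blocks of a layer look like this (a sketch following the usual Caffe convention, not the literal contents of my gist):

```
param {
  lr_mult: 0    # weights frozen
  decay_mult: 0
}
param {
  lr_mult: 0    # bias frozen
  decay_mult: 0
}
```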
To:
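Something like this, where the bias values follow the common Caffe defaults (again a sketch):

```
param {
  lr_mult: 1    # weights learn
  decay_mult: 1
}
param {
  lr_mult: 2    # bias learns (bias usually gets twice the learning rate)
  decay_mult: 0
}
```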
The learning-related values are from this Caffe guide for training certain layers exclusively: https://github.com/BVLC/caffe/wiki/Fine-Tuning-or-Training-Certain-Layers-Exclusively Another note: the edge detection abilities of the model do not seem to be positively or negatively affected by this layer-specific training. I can also provide my two-category Deepart.io and Ostagram data set, which contains approximately 3000 images for each of the two categories, if you want.
crowsonkb's style_transfer has an updated Amazon AMI, which has the latest version of Caffe already installed.
It looks like training a specific layer, or the default Neural-Style layers, requires a lot longer training time before major differences between the original and fine-tuned model become noticeable. Here are the results from some small-scale experiments I ran using the newly found neutral training parameters on the upgraded model and prototxt files: https://i.imgur.com/k0jxvtv.png
So, just in case I am making the wrong assumptions, as per the prototxt file and Neural-Style's default layer-related settings, the
Or is Neural-Style using the part below each "conv" layer, which has "relu" instead of "conv"? Example of the prototxt layout:
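(A reconstructed sketch of the VGG-style layout in question; the values are the standard conv1_1 parameters:)

```
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"   # in-place: writes over the conv output
}
```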
The prototxt I was using can be found here: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e
I am not fully sure I understand your question, especially when you say 'is Neural-Style using the part below each "conv" layer which has "relu" instead of "conv"'. Below in the sense of "below in the prototxt file" or "in a lower layer"? But never mind. ReLU is really nothing more than an add-on function on top of a conv layer which sets all negative values to zero; this is why it is also called a rectifier. So in theory convx_y can output both negative and positive values, but after relux_y all negative values have been replaced by zeros. Furthermore, this discussion #93 hints that in an implementation such as Torch, the ReLU layer is actually performed in-place, which I read to mean that the ReLU directly modifies the memory containing the output of the conv layer. If this is true, then there is actually no difference whether one uses conv or relu layers in neural_style; the ReLU function is applied anyway, even if you access the conv layer.
@htoyryla You are correct that ReLU is performed in-place in Torch, so after a forward pass it doesn't matter whether you pick a conv layer or its associated ReLU layer; they will both have the same value. However, there will be a difference during the backward pass: when you backprop through a ReLU layer, the upstream gradients will be zeroed in the same places the activations were zeroed during the forward pass; if you ask neural-style to work with a conv layer, then it will not backprop through the ReLU during the backward pass. This means that when you ask neural-style to use activations on a conv layer, ReLU gets used during the forward pass but not during the backward pass, so the backward pass will not be correct in this case. You can still get nice style transfer effects even when the gradients are incorrect in this way, but for this reason I'd generally expect better results using ReLU layers.
@jcjohnson, good point, I did not think about the backward pass.
I suspect that image quality affects training accuracy. This research paper seems to show the effects of image quality on training neural network models: "Understanding How Image Quality Affects Deep Neural Networks"
I recently trained a NIN model on a roughly sorted custom data set of about 40,000 faces. There appear to be direct improvements in how the model handles faces as content images, but style images which do not contain faces do not work as well. I think that if one could train the model on artwork in addition to common content images, it would help the model understand both.
I have sometimes thought about using two models, one for style and one for content, both trained with limited material. I don't know if it would work, though, and memory usage would certainly be a problem. Yet it could be an interesting exercise.
@htoyryla That idea could be made more resource-efficient by using two small NIN-like models, each trained on a single target category.

So it turns out that, at least for the NIN model, it still has the knowledge required for style transfer in addition to the newer face-related knowledge that I gave it. The unmodified NIN model is on the right, and the fine-tuned NIN model is on the left: I used a DeepDream project based on Neural-Style to try to determine why things had changed in the modified NIN model. Below are the DeepDream layer activation tests for all 29 layers used by the NIN model: The original model: The modified model: These DeepDream images helped me figure out that by simply changing the

The NIN model itself that I created had 15,700 iterations during training, and seemed to maintain 86-96% accuracy during the last couple thousand iterations. With around 40k training images, I calculated that around 24-25 epochs occurred during the training session? I also stopped the training at 11,600 iterations in order to lower the learning rate so that the loss would continue going down. I'm not sure if I was over-fitting the model, but it seemed to have improved abilities on an image that was not part of the training data set.

After the NIN experiments, I attempted to fine-tune a VGG-16 model on my rough faces data set. It's a lot slower to fine-tune VGG-16 models than it is to fine-tune NIN models. From iterations 1000 to 8000, it seems that the model is actually improving in its ability to recognize facial features: The output from the non-fine-tuned SOD_FINETUNE model can be found here: https://i.imgur.com/wWtWysT.png Obviously, for my experiments I used the exact same parameters, seed values, etc. to eliminate other things that might cause different outputs. An album with the full versions of the images I posted in this comment can be found here:

Edit: To clarify, the VGG-16 model that I fine-tuned is called the "VGG-16 SOD Finetune" model. The "finetune" in the original model's name is there because it was fine-tuned from the regular VGG-16 model for salient object detection. I have now fine-tuned this previously fine-tuned model with a new data set.
Trying to train a NIN model from scratch with my data set did not work; it only produces blurry style transfer images and broken DeepDream images. Maybe there are certain classes that help the model learn other classes? Or maybe I just chose bad training parameters?

Edit: Analyzing the training loss (I don't know what graphing tool to use), it appears that for the NIN model trained from scratch the loss decreased quickly and then stayed constant. For the fine-tuned NIN model, the training loss dropped quickly and then seems to have decreased very slowly or stayed the same. Still, it must have worked better than when I tried to train from scratch, seeing as it does appear to have better facial feature detection abilities. The fine-tuned SOD model has the loss dropping continuously over time, which I imagine is what one should expect with good training parameters. So I think the results from my fine-tuned NIN model are questionable and need better training parameters, but the VGG-16 SOD model seems to actually be improved in a way that is appropriately reflected in the loss values.

Second Edit: After some more testing of my fine-tuned SOD model, it appears that I may have actually improved the model with very little change to its other abilities. It now deals more accurately with faces, and possibly other parts of the human body (the upper portion, I think). I wonder whether the "roughly sorted" nature of my data set helps or weakens the model's new abilities.
The training loss graphs seem to support my results. The NIN model trained from scratch is on the left, and the fine-tuned NIN model is on the right: The fine-tuned VGG-16 model: I think using a larger batch size (64 instead of fewer than 10) compared to earlier experiments is part of the reason for this recent training success.
I think I might be onto something here, as my fine-tuned model appears to be better at facial feature preservation. An album with the full images can be found here: https://imgur.com/a/tArrY It looks as though my fine-tuned model more accurately detects the eyes and mouth of the person in the photo.
The solver.prototxt file and the train_val.prototxt can be found here: https://gist.github.com/ProGamerGov/2bdf7659ee14dac03269a3ec3a7f1fcd
ImageMagick seems to be slow for resizing large data sets of images (especially when using the
You can get Parallel via your package manager. Source: https://stackoverflow.com/questions/26168783/imagemagick-convert-and-gnu-parallel-together
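For example (a sketch; the paths and the 256x256 geometry are placeholders):

```sh
# Resize every jpg in parallel, one ImageMagick convert per file;
# "!" forces the exact geometry, {/} is the input file's basename.
mkdir -p resized
find . -maxdepth 1 -name '*.jpg' | parallel convert {} -resize 256x256! resized/{/}
```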
I uploaded the Rough Faces model and added a link to download it on the alternative models wiki page: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models Hopefully it can help those seeking better facial preservation with Neural-Style.
I did test your content image with my fine-tuned model, and I think the issue may be that the "rough faces" training data was not very diverse, and as a result it performs best with certain images. My example image for testing was also part of the training data, so that may skew the results (though I did test it on other images that I think were not part of the training data).
I was looking through my old experiments, and I see that I didn't actually share the two successfully fine-tuned models that I had created. One model in particular (the "Plaster" model) creates a very different output than the non-fine-tuned version. Some experimentation with parameters may be required to achieve satisfactory results, as with all the models I trained and fine-tuned I only tested them in Neural-Style with certain parameter values. I'm not sure if the "Low Noise" model differs from the non-fine-tuned model in a way that is useful for certain styles like the "Plaster" model is, so it can be removed if it's not useful. I posted the models on the wiki page here: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models Seeing as both models are from 2016, I am going to test them with a bunch of more "modern" Neural-Style parameters, like setting the TV weight to 0, using the Adam parameters I discovered in addition to L-BFGS, and using multiscale resolution.
I want to change the number of iterations. Where do I have to change it?