
Where should I start if I want to train a model for usage with Neural-Style? #292

Open
ProGamerGov opened this issue Jul 24, 2016 · 131 comments


@ProGamerGov

ProGamerGov commented Jul 24, 2016

Where should I start if I want to train a model for usage with Neural-Style?

Are Network In Network (NIN) models easier to train than VGG models?

Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?

What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?

@htoyryla

There are at least two parts to this question:

  • training a model that works technically in neural-style
  • creating models which produce adequate quality images using neural-style

One has to start from the technical part. Caffe http://caffe.berkeleyvision.org is a good choice to start with. It is not too difficult to install, no coding is needed to use it and it directly produces caffemodel files. To train a model, one needs

  • the training and testing datasets, in practice images and a label for each image; these are then converted into a LMDB database
  • a training prototxt file describing the architecture of the model etc.
  • a solver configuration file

With these in place, training using caffe will create a model initialized with random weights (according to what is stated in the prototxt file) and start training it using the dataset.
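For reference, a minimal solver configuration might look roughly like the sketch below (the paths and values are only placeholders, not a tested setup):

net: "models/my_model/train_val.prototxt"   # the training prototxt described above
test_iter: 100                              # how many test batches to run per test pass
test_interval: 1000                         # run a test pass every 1000 training iterations
base_lr: 0.001                              # initial learning rate
lr_policy: "step"
gamma: 0.1                                  # multiply the learning rate by this at each step
stepsize: 20000                             # lower the learning rate every 20000 iterations
momentum: 0.9
weight_decay: 0.0005
display: 20
max_iter: 100000                            # total number of training iterations
snapshot: 5000                              # write a caffemodel snapshot every 5000 iterations
snapshot_prefix: "models/my_model/snapshots/my_model"
solver_mode: GPU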

Training a deep network from scratch can be difficult and time-consuming. One might start with a small model first, with only a limited number of convolutional layers, or one might try finetuning an existing model. Finetuning means taking an existing, already trained model and training it further using a different dataset. Like in this example http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html .

Either way, one can without much difficulty create models that work with neural-style, in the sense that the model loads, iterations start and even the losses may start diminishing. The visual results are often a disappointment, however. I have done this several times already, using wikiart, my own photo library and a programmatically created dataset of geometrical images. Nothing really useful yet, but learning all the time.

Some more detailed notes: for VGG networks, it looks like training prototxt files are not available on the web, but I managed to piece together one that works. Training a VGG network from scratch is not really recommended; from what I have heard, the creators of the model couldn't train the deeper models from scratch, but had to train smaller models first and then add layers for a new training round. A VGG with only the 1st and 2nd conv layer levels might make a reasonable first try, or a VGG finetuned on one's own dataset.

@ProGamerGov
Author

ProGamerGov commented Jul 25, 2016

I successfully trained a model that is similar to NIN but with fewer layers, and it produced the following images after 70,000 iterations of training:

https://imgur.com/a/sYRhV

I used the CIFAR10 data set and this GitHub page, along with the supplied scripts in /home/ubuntu/caffe/examples/cifar10.

https://gist.github.com/mavenlin/d802a5849de39225bcc6
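For reference, with a stock Caffe checkout the CIFAR-10 preparation and training steps are roughly the following (the paths are the example defaults; the NIN-style prototxt from the gist above then replaces the example's network definition):

./data/cifar10/get_cifar10.sh          # download the CIFAR-10 binary batches
./examples/cifar10/create_cifar10.sh   # convert them into LMDB databases and compute the mean file
./build/tools/caffe train -solver examples/cifar10/cifar10_quick_solver.prototxt -gpu 0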

I am now wondering whether there is a data set of artwork available that I could use for training.

I found this data set: http://people.bath.ac.uk/hc551/dataset.html, but that is the only artwork data set I have been able to find so far. I was also considering grabbing all the images posted to /r/art/ on Reddit for use in training, and maybe using my massive collection of styles as well.

@htoyryla

Your results look familiar to me. They can be interesting as such, but if the model does not respond to different styles, then what it can achieve is very limited.

I cannot now locate the example from which I obtained the wikiart materials. It was not a caffe example, if I remember correctly; more like someone's Python project, from which I got a list of wikiart URLs with label data. Not all URLs worked, but from those that did I put together an LMDB. I'll keep looking and post if I find it.

@htoyryla

Here's one of my results:

[image: sh3-i19800-paasikivi-feininger-cl23sl124-cw200sw100_150]

Only the colors derive from the style. Changing layers, weights and the style image produces a number of variations, but quite limited ones.

[image: sh3-i12000-paasikivi-kahvila-cl234sl124-cw200sw40000_150]

[image: sh3-i12000-paasikivi-feininger-cl234sl124-cw200sw40000_150]

Another model I trained produced mainly clouds or blobs of color:

[image: sibir-sh86000g_310]

It seems to me that these limitations derive from too small a dataset and too few training iterations. One also needs to consider the contents of the dataset. Even if the training is successful, the model only learns to recognize the features that stand out in the dataset. To work well, it should recognize the features that are essential in both content and style images. My geometrical shapes dataset resulted in clouds of color, so clearly the model failed to recognize the essential features in the images.

I have not used CIFAR10, but I assume that the small size of the images might be a handicap. In another thread here, a hypothesis was raised that a model in neural style works best with images of the size of the training images.

Roaming a bit further, I have recently been interested in unsupervised training, using a model which first crunches the image into a vector (such as FC6 output) and then reconstructs the image using deconvolutional and unpooling layers. With this approach, we don't need labels, as the model will learn by comparing the input and output images.

@htoyryla

The material about finetuning using wikiart can be found here https://computing.ece.vt.edu/~f15ece6504/homework2/ . I see it as mainly useful for the image URLs and labels, as a basis for making an LMDB for caffe. And for neural-style, forget AlexNet; it requires GROUP, which is not supported by loadcaffe.

@htoyryla

For anyone who is interested, here's one of my VGG16 train prototxt files. Some configuration will be needed if you want to use it.

name: "VGG_hplaces_16_layers"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/hannu/caffe/hplaces/hplaces_train_lmdb"
    backend: LMDB
    batch_size: 28
  }
  transform_param {
    crop_size: 224
    #mirror: true
    mean_file: "/home/hannu/caffe/hplaces/hplaces_train_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {

    source: "/home/hannu/caffe/hplaces/hplaces_val_lmdb/"
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224
    #mirror: false
    mean_file: "/home/hannu/caffe/hplaces/hplaces_val_mean.binaryproto"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "fc6"
  type: INNER_PRODUCT
  bottom: "pool5"
  top: "fc6"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu6"
  type: RELU
  bottom: "fc6"
  top: "fc6"
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu7"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
layers {
  name: "drop7"
  type: DROPOUT
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layers {
  bottom: "fc7"
  top: "fc8_places"
  name: "fc8_places"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 205
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "fc8_places"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_places"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_places"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

You need to change the pointers to your dataset and mean files, and maybe the batch sizes as well. You may also want to comment out the prob layer to get cleaner output during training.

@3DTOPO

3DTOPO commented Jul 25, 2016

If you want a big image set for training, you can download the ImageNet database. It is what was used to train the default VGG-19 model.

http://image-net.org

@htoyryla

ImageNet is certainly a good choice if one wants to train with a general image set and has the computing platform for large-scale training. I am planning to get another Linux machine dedicated to training, but for the moment I cannot tie up my Linux computer long enough for anything other than small experiments (which are good for learning anyway).

@ProGamerGov
Author

ProGamerGov commented Jul 25, 2016

@htoyryla
As far as I understand, fine-tuning an already trained model means that you can use a smaller data set.

So I have this data set here with art images:

I have listed just a few examples below, but every category seems to have between 50 and 80 images. People-Art has multiple areas such as Annotations and JPEGImages, whereas Photo-Art does not. Would the wiki-art data set or the People-Art/Photo-Art-50 data set be better for training?


People-Art: 

People-Art\Annotations\Academicism\albert-anker_b-ckligumpen-1866.jpg.xml
People-Art\Annotations\Academicism\albert-joseph-moore_amber.jpg.xml

People-Art\JPEGImages\Academicism\albert-anker_b-ckligumpen-1866.jpg
People-Art\JPEGImages\Academicism\albert-joseph-moore_amber.jpg

People-Art\matlab_funcs\demo_show_anno.m
People-Art\matlab_funcs\VOCevaldet_cai.m

People-Art\test.txt
People-Art\train.txt
People-Art\trainval.txt
People-Art\trainval_only_fg_ims.txt
People-Art\val.txt




Photo-Art-50:

Photo-Art-50\016.boom-box\016a_0001.jpg
Photo-Art-50\101.head-phones\101a_0001.jpg
Photo-Art-50\101.head-phones\101a_0002.jpg

And this previously fine tuned model here that already produces good images in neural-style:

https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129#file-readme-md

How would I, step by step, convert this data set into the LMDB files, and then how exactly would I use your prototxt to train the already-made caffemodel? What train.prototxt and solver.prototxt files do I need, and which ones do I modify? What modifications do I make? I have tried modifying files where it was unclear from the naming which one I was supposed to replace. I tried making a NIN model like the one in Neural-Style using the CIFAR10 data set, but it had the exact same number of layers as my previous CIFAR10 model and not the same layers as Neural-Style's NIN model.

I found this fine tuning command on the Berkeley site:

./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models//bvlc_reference_caffenet.caffemodel -gpu 0

I can easily modify the paths and filenames, but is it the right command to use?


With the wiki-art data set, how exactly do I convert it to the lmdb files that I need? This lmdb part is probably the most confusing part of neural networks for me because I have not found any guides that let me make sense of what exactly I have to do.

And @htoyryla , if possible, could you post the lmdb files and mean files you made from the wiki-art data set for me to download?

@ProGamerGov
Author

ProGamerGov commented Jul 26, 2016

So I tried to fine-tune the VGG16 SOD model on the CIFAR10 data set, and received the following error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0

I0726 00:44:44.228581  1820 layer_factory.hpp:74] Creating layer data
I0726 00:44:44.228623  1820 net.cpp:84] Creating Layer data
I0726 00:44:44.228648  1820 net.cpp:338] data -> data
I0726 00:44:44.228682  1820 net.cpp:338] data -> label
I0726 00:44:44.228709  1820 net.cpp:113] Setting up data
I0726 00:44:44.228801  1820 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:44:44.228873  1820 data_layer.cpp:67] output data size: 28,3,224,224
I0726 00:44:44.228899  1820 data_transformer.cpp:22] Loading mean file from: /home/ubuntu/caffe/data/cifar10/cifar10_train_mean.binaryproto
I0726 00:44:44.234645  1820 net.cpp:120] Top shape: 28 3 224 224 (4214784)
I0726 00:44:44.234693  1820 net.cpp:120] Top shape: 28 (28)
I0726 00:44:44.234710  1820 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:44:44.234742  1820 net.cpp:84] Creating Layer conv1_1
I0726 00:44:44.234756  1820 net.cpp:380] conv1_1 <- data
I0726 00:44:44.234807  1820 net.cpp:338] conv1_1 -> conv1_1
I0726 00:44:44.234838  1820 net.cpp:113] Setting up conv1_1
F0726 00:44:44.241438  1825 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f38355c4daa  (unknown)
    @     0x7f38355c4ce4  (unknown)
    @     0x7f38355c46e6  (unknown)
    @     0x7f38355c7687  (unknown)
    @     0x7f38359303c1  caffe::DataTransformer<>::Transform()
    @     0x7f38359eb4f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f382d2e5a4a  (unknown)
    @     0x7f382b73c182  start_thread
    @     0x7f3834baf47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

I was also using this solver.prototxt: https://github.com/ruimashita/caffe-train/blob/master/vgg.solver.prototxt and htoyryla's train_val.prototxt

Same error on the normal VGG-16 model:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16/solver.prototxt -weights models/vgg16/VGG_ILSVRC_16_layers.caffemodel -gpu 0

layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss/loss"
}
I0726 00:55:56.447276  1872 layer_factory.hpp:74] Creating layer data
I0726 00:55:56.447317  1872 net.cpp:84] Creating Layer data
I0726 00:55:56.447342  1872 net.cpp:338] data -> data
I0726 00:55:56.447377  1872 net.cpp:338] data -> label
I0726 00:55:56.447404  1872 net.cpp:113] Setting up data
I0726 00:55:56.447495  1872 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:55:56.447563  1872 data_layer.cpp:67] output data size: 64,3,224,224
I0726 00:55:56.458580  1872 net.cpp:120] Top shape: 64 3 224 224 (9633792)
I0726 00:55:56.458628  1872 net.cpp:120] Top shape: 64 (64)
I0726 00:55:56.458647  1872 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:55:56.458678  1872 net.cpp:84] Creating Layer conv1_1
I0726 00:55:56.458693  1872 net.cpp:380] conv1_1 <- data
I0726 00:55:56.458720  1872 net.cpp:338] conv1_1 -> conv1_1
I0726 00:55:56.458788  1872 net.cpp:113] Setting up conv1_1
F0726 00:55:56.465386  1877 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f22574a2daa  (unknown)
    @     0x7f22574a2ce4  (unknown)
    @     0x7f22574a26e6  (unknown)
    @     0x7f22574a5687  (unknown)
    @     0x7f225780e3c1  caffe::DataTransformer<>::Transform()
    @     0x7f22578c94f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f224f1c3a4a  (unknown)
    @     0x7f224d61a182  start_thread
    @     0x7f2256a8d47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

@ProGamerGov
Author

I took the Cubo-Futurism jpg files from the People-Art data set. I then tried, and failed, to create the val and train LMDB files.

@htoyryla

htoyryla commented Jul 26, 2016

You get the error because my training VGG16 prototxt (and any imagenet-based prototxt) expects 256x256 images (which are then cropped to 224x224 according to the prototxt), while CIFAR is 32x32.

Check failed: height <= datum_height (224 vs. 32)

I can help with LMDB and prototxt but for a few days I am terribly busy with other things and mostly not even near a computer.

LMDB is created using a script like in caffe/examples/imagenet/create_imagenet.sh, but the script usually needs to be adjusted for paths etc. It can take some time to get used to it and get everything to match, so that the script finds the train.txt and val.txt files as well as the images referred to in them, the image sizes are correct, then it creates two LMDB files. Then you calculate the mean images based on the LMDBs using caffe/examples/imagenet/make_imagenet_mean.sh (or something like that). Then modify the training prototxt to point to your LMDBs and binaryproto files. And make sure the solver.prototxt points to the correct training prototxt.
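As a sketch, the mean calculation itself boils down to a single tool call (the LMDB and output paths below are placeholders):

./build/tools/compute_image_mean /path/to/my_train_lmdb /path/to/my_train_mean.binaryproto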

The train.txt and val.txt for the LMDB creation contain lines like

path_to_an_image label

where label is an integer from 0 .. number_of_categories-1

The handling of paths can be a bit tricky. They are relative to paths set in create_imagenet.sh, but it took me some time to get the paths right.

This is all I can contribute right now. After a few days I will have better time to respond. I am not sure if I have my wikiart LMDB any more, I have other LMDBs but they are usually quite large files.

PS. See also the caffe imagenet example for the LMDB part (never mind if the page talks about leveldb instead of lmdb, it is an alternative option). http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
You might also try the example as such, then the paths should match readily.

@ProGamerGov
Author

ProGamerGov commented Jul 27, 2016

So I have my images at:

/home/ubuntu/caffe/data/People-Art/JPEGImages/Academicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/AnalyticalRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtDeco
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtNouveau(Modern)
/home/ubuntu/caffe/data/People-Art/JPEGImages/Biedermeier
/home/ubuntu/caffe/data/People-Art/JPEGImages/cartoon
/home/ubuntu/caffe/data/People-Art/JPEGImages/Classicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Constructivism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubo-Futurism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Divisionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/EnvironmentalArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/FantasticRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/FeministArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/HighRenaissance
/home/ubuntu/caffe/data/People-Art/JPEGImages/Impressionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/InternationalGothic
/home/ubuntu/caffe/data/People-Art/JPEGImages/Japonism
/home/ubuntu/caffe/data/People-Art/JPEGImages/LowbrowArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/MagicRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/MechanisticCubism

etc...

Full list of the folders containing images and ls of cd People-Art: https://gist.github.com/ProGamerGov/4627306588e9d232aa0431c4e26b9687

Each folder of images has a "gt.txt" file. This is what the gt.txt file looks like:

https://gist.github.com/ProGamerGov/2339b815b9e462cb69cd5bb7d156ee9a

Though I believe this may be part of the Cross-Depiction aspect of the data set.

My train.txt and val.txt at:

/home/ubuntu/caffe/data/People-Art/train.txt 
/home/ubuntu/caffe/data/People-Art/val.txt 

train.txt: https://gist.github.com/ProGamerGov/1be5afe398c825cfc3ea119005af71fb
val.txt: https://gist.github.com/ProGamerGov/08b121968b28e9f09ddf3e096f424944

My create_imagenet.sh file: https://gist.github.com/ProGamerGov/5f92bdc8e7d83756268f438cf15261eb

located at: /home/ubuntu/caffe/create_imagenet_2.sh

The prototxt of the model I want to fine-tune has crop_size: 224; do I need to make the resize values in my create_imagenet_2.sh script the same value?

RESIZE_HEIGHT=256
RESIZE_WIDTH=256

I then run:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh

Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.271579  2440 convert_imageset.cpp:79] Shuffling data
I0727 00:17:01.660755  2440 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:01.661175  2440 db.cpp:34] Opened lmdb examples/imagenet/people-art_train_lmdb
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.971226  2451 convert_imageset.cpp:79] Shuffling data
I0727 00:17:02.378626  2451 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:02.379034  2451 db.cpp:34] Opened lmdb examples/imagenet/people-art_val_lmdb
Done.
ubuntu@ip-Address:~/caffe$

This creates two folders:

/home/ubuntu/caffe/examples/imagenet/people-art_train_lmdb
/home/ubuntu/caffe/examples/imagenet/people-art_val_lmdb

Inside both folders are data.mdb and lock.mdb files. They are all 8 KB each in both folders.

Trying to run the script again results in this:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh
Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.326292  2482 convert_imageset.cpp:79] Shuffling data
I0727 00:19:56.722890  2482 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:56.723007  2482 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_train_lmdbfailed
*** Check failure stack trace: ***
    @     0x7f5be1af4daa  (unknown)
    @     0x7f5be1af4ce4  (unknown)
    @     0x7f5be1af46e6  (unknown)
    @     0x7f5be1af7687  (unknown)
    @     0x7f5be1e54eee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7f5be0d04ec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.955780  2491 convert_imageset.cpp:79] Shuffling data
I0727 00:19:57.348181  2491 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:57.348299  2491 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_val_lmdbfailed
*** Check failure stack trace: ***
    @     0x7fcbeb0cedaa  (unknown)
    @     0x7fcbeb0cece4  (unknown)
    @     0x7fcbeb0ce6e6  (unknown)
    @     0x7fcbeb0d1687  (unknown)
    @     0x7fcbeb42eeee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7fcbea2deec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Done.
ubuntu@ip-Address:~/caffe$


This is the readme.txt that came with the data set: https://gist.github.com/ProGamerGov/dfc8652f3db5bc91acdf34ff22c86bd2

I am not exactly sure what is causing my issue, but could it be that the script is not accounting for the structure of my data set?

@htoyryla

htoyryla commented Jul 27, 2016

You need to put all the information into train.txt and val.txt. That is where caffe expects to find the urls and the labels. Like this:

/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/egon-schiele_seated-girl-1910.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/salvador-dali_still-life-pulpo-y-scorpa.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/orest-kiprensky_young-gardener-1817.jpg 7
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/david-burliuk_in-the-park.jpg 5
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/giovanni-battista-piranesi_vedute-di-roma-30.jpg 4
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/basuki-abdullah_bocah.jpg 6

" A total of 0 images." means that caffe does not find the image files.

Setting the paths in the train.txt versus create_imagenet.sh can be a bit confusing. Unfortunately I don't have the script file for wikiart anymore. But I think what worked for me was to use full path in the train.txt and set the paths in the script as follows:

EXAMPLE=<full path where to place the lmdb> 
DATA=<full path where to find the train.txt and val.txt>
TOOLS=/home/hannu/caffe/build/tools

TRAIN_DATA_ROOT=/  
VAL_DATA_ROOT=/ 

The root paths are set to / because the train.txt contains full paths. It should also work to set the data root path to a directory and use relative paths in the txt files, but I remember having some difficulty with that.
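For what it's worth, the heart of create_imagenet.sh is a convert_imageset call which, with the settings above, ends up looking roughly like this (sketched from memory, with placeholder names):

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=256 --resize_width=256 --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/my_train_lmdb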

I usually write small Python scripts to manipulate or create the txt files in the correct format. For my geometrical shapes test I had image files named rect000001.png, ellipse000001.png and so on, so I wrote a Python script like this:

from os import listdir
from os.path import isfile, join

mypath = "/home/hannu/work/Geom/data/train/data/"
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for file in onlyfiles:
  output = mypath + file
  if "rect" in file:
    output = output + " 0"
  elif "ellipse" in file:
    output = output + " 1"
  elif "triangle" in file:
    output = output + " 2"
  elif "xtrap" in file:
    output = output + " 3"
  elif "ytrap" in file:
    output = output + " 4"
  elif "ashape" in file:
    output = output + " 5"
  elif "lshape" in file:
    output = output + " 6"
  elif "oshape" in file:
    output = output + " 7"
  elif "ushape" in file:
    output = output + " 8"
  elif "vshape" in file:
    output = output + " 9" 
  print output

and redirected the output into train.txt. Nothing fancy, but it worked.
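If the script is saved as, say, make_train_txt.py (the name is arbitrary), the redirection is simply:

python make_train_txt.py > train.txt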

@htoyryla

htoyryla commented Jul 27, 2016

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 error: Failed to initialize libdc1394

I haven't seen this. As far as I understand, this library is for FireWire connections, which should not be needed. I found this on Google: https://kradnangel.gitbooks.io/caffe-study-guide/content/caffe_errors.html

@ProGamerGov
Author

ProGamerGov commented Jul 27, 2016

I usually write small python scripts to manipulate or create the txt files in the correct format.

https://stackoverflow.com/questions/11003761/notepad-add-to-every-line

I just used this trick to fix my train and val files quickly.

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 is for video camera usage and is not critical to Caffe as far as I understand. I have disabled it a few times and everything still works fine.

@htoyryla

Perhaps you can manage with Notepad, but for Wikiart, for instance, I think I created the txt files from a downloaded CSV file which had all the paths and labels, though not in the correct format. Once I also needed to change the label numbering to start from zero instead of one.

@htoyryla

htoyryla commented Jul 27, 2016

One more thing if you are planning to finetune: you should change the dimension of the fc8 layer (assuming you are training a VGG) to match the number of categories in your dataset. Also, change the name of fc8 to something else, so that caffe will not try to initialize the weights from the original caffemodel, which would fail because of the size mismatch. It is typical to use a name like fc8-10 if you have ten categories.

Like this in the training prototxt:

layers {
  bottom: "fc7"
  top: "fc8_168"
  name: "fc8_168"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 168
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_168"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_168"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

@ProGamerGov
Author

The changes to my create_imagenet_2.sh file, val.txt, train.txt:

https://gist.github.com/ProGamerGov/8267d29262f1bd6570e5918719600695

Still result in the same error.

@ProGamerGov
Author

@htoyryla Thanks, I'll make the modifications to my train_val.prototxt.

@htoyryla

htoyryla commented Jul 27, 2016

Changing the fc8 layer will not solve the LMDB creation problem. It is a separate issue which you'll face once you have the LMDB and start finetuning.

@htoyryla

htoyryla commented Jul 27, 2016

I still don't see the labels in your train.txt, only the image paths.

@ProGamerGov
Author

For the labels, do I put a different number value for each category?

@htoyryla

htoyryla commented Jul 27, 2016

Yes, the labels should be integers from 0 to number_of_categories - 1 as I wrote earlier.

During training, caffe will feed each image into the model and, as there is an output for each label, train the model to activate the correct output for each image. Without the labels, there is nothing to guide the training and the model will not learn anything. Also, if all images have the same label, the model simply learns to always output that label regardless of the image, so it will not learn anything about the images. It is only when the labels tell something essential about the images that meaningful learning is possible.

@ProGamerGov
Author

Ok, I think I got it now. Change the fc8_168 to fc8_43 because I have 43 categories. Then change it to fcpa_43. Even with scripts in Notepad, it will take me a little while to label all the categories. Do I need to do this for both the train and val txt files, or just the one?

@htoyryla

htoyryla commented Jul 27, 2016

train.txt and val.txt both have to conform to this format. They also should not include the same files, as val.txt is used to cross-check that the model really learns to generalize and does not simply remember the individual images. I usually first make a train.txt containing all images and labels and then use a script to move every tenth entry to val.txt.
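A minimal sketch of such a split script could look like this (it assumes a train_all.txt containing all the "path label" lines; the file names are just examples):

# split_train_val.py: move every tenth entry into val.txt, keep the rest in train.txt
with open("train_all.txt") as f:
    lines = [line.strip() for line in f if line.strip()]

with open("train.txt", "w") as train_f, open("val.txt", "w") as val_f:
    for i, line in enumerate(lines):
        # every tenth image goes to the validation list
        (val_f if i % 10 == 0 else train_f).write(line + "\n")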

I might first make very short txt files to test whether the LMDB creation succeeds. There may still be an issue in the create_imagenet.sh, too. I have sometimes struggled with the paths: everything looked OK but 0 images were found, until suddenly, after changing something back and forth, it worked.

@htoyryla

htoyryla commented Jul 27, 2016

I didn't understand your "Then change it to fcpa_43". It should be enough to change to fc8_43, so that the layer name is not fc8, which is in the caffemodel you will finetune.

@ProGamerGov
Author

@htoyryla Ok, thanks for the help!

@ProGamerGov
Author

So I successfully created the LMDB files!

https://gist.github.com/ProGamerGov/d0038f7e3186d057bb7b26398bd764f9

It seems that a few of the images listed in the train.txt and val.txt files did not exist in the actual data set.

@htoyryla

htoyryla commented Jul 27, 2016

It happened to me too, now that you mention it. Many (most?) datasets do not contain the actual images, only links for downloading them from the original location. Probably the wikiart URLs no longer work for some files, so those files don't get downloaded. It is like broken links, not unusual on the internet.

@ProGamerGov
Author

@htoyryla

From previous testing, I found this interesting: https://i.imgur.com/XHg8CPA.jpg
It appears that after 200 iterations, the edge detection starts being damaged by the fine-tuning. In all my training experiments I have seen very similar results, though usually the point where edge detection begins to break down is near the halfway mark, as opposed to so close to the start.

@ProGamerGov
Author

ProGamerGov commented Nov 26, 2016

@htoyryla The difference between "layers" and "layer" is that "layers" is the outdated version of the prototxt format. You can use upgrade_net_proto_text to update the prototxt file to the newer version.

cd ~ 

cd caffe

./build/tools/upgrade_net_proto_text vgg16_finetuned_train_val.prototxt vgg16_finetuned_train_val_out.prototxt

./build/tools/upgrade_net_proto_binary VGG16_SOD_finetune.caffemodel VGG16_SOD_finetune_out.caffemodel
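For reference, the difference between the two formats looks roughly like this for a single conv layer (a minimal sketch):

# old (V1) format
layers {
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
}

# current format
layer {
  name: "conv1_1"
  type: "Convolution"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
}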

@ProGamerGov
Author

ProGamerGov commented Nov 27, 2016

I seem to have figured out how to fine-tune the VGG-16 SOD Finetune model in a neutral manner, where the change only affects the output the way a different seed value in Neural-Style would. Interestingly enough, my data set was composed of art produced by neural networks.

Edit:

On closer inspection, it appears like the differences between the original and the fine-tuned version are in terms of smaller details. I only ran it for 600 iterations as I have to use AWS spot instances for this kind of stuff, but it looks like the newly fine-tuned model version produces more intricate details than the original model.

If I have achieved settings that result in an almost neutral change, then I can now theoretically change single parameters, target layers, etc... to achieve better artistic outputs.

@ProGamerGov
Author

ProGamerGov commented Nov 27, 2016

So targeting specific layers seems to produce different outputs that are not worse than the original model's. I really wish I had the resources to fully flesh this out, as it looks really promising for enhancing Neural-Style's outputs.

I think that by targeting different combinations of the default layers that Neural-Style uses, one can improve the model's ability in specific areas with the proper data set.

@ProGamerGov
Author

ProGamerGov commented Nov 27, 2016

This prototxt here has been configured to stop learning on all layers by default: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e

I was using the VGG16_SOD_finetune model: https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129

And I ran ./build/tools/upgrade_net_proto_binary VGG16_SOD_finetune.caffemodel VGG16_SOD_finetune_out.caffemodel to convert the model to the latest version of Caffe.

You can allow learning on your layer of choice by changing the following lines of code on the desired layer:

  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }

To:

  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }

The learning related values are from this Caffe guide here for training certain layers exclusively: https://github.com/BVLC/caffe/wiki/Fine-Tuning-or-Training-Certain-Layers-Exclusively

Another note is that the edge detection abilities of the model do not seem to be positively or negatively impacted by this layer-specific training.


I can also provide my two-category Deepart.io and Ostagram data set, which contains approximately 3000 images for each of the two categories, if you want.

@ProGamerGov
Author

crowsonkb's style_transfer has an updated Amazon AMI, which has the latest version of Caffe already installed.

@ProGamerGov
Author

ProGamerGov commented Nov 28, 2016

It looks like training a specific layer, or the default Neural-Style layers, requires a much longer training time before major differences between the original and fine-tuned model become noticeable.

Here are the results from some small-scale experiments I ran using the newly found neutral training parameters on the upgraded model and prototxt files: https://i.imgur.com/k0jxvtv.png

@ProGamerGov
Author

ProGamerGov commented Nov 28, 2016

So, just in case I am making the wrong assumptions: as per the prototxt file and Neural-Style's default layer settings, the -content_layers and -style_layers values map to the following prototxt layer names.

Prototxt → Neural-Style
conv1_1 → relu1_1
conv2_1 → relu2_1
conv3_1 → relu3_1
conv4_1 → relu4_1
conv4_2 → relu4_2
conv5_1 → relu5_1

Or is Neural-Style using the part below each "conv" layer which has "relu" instead of "conv"?

Example of the prototxt layout:

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}

The prototxt I was using can be found here: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e

@htoyryla

htoyryla commented Nov 29, 2016

I am not fully sure I understand your question, especially when you say 'is Neural-Style using the part below each "conv" layer which has "relu" instead of "conv"'. Do you mean "below" in the sense of "below in the prototxt file" or "in a lower layer"?

But never mind. ReLU is really nothing more than an add-on function on top of a conv layer which sets all negative values to zero. This is why it is also called a rectifier. So in theory convx_y can output both negative and positive values, but after relux_y all negative values have been replaced by zero.

Furthermore, this discussion #93 hints that in an implementation such as Torch, the ReLU layer is actually performed in-place, which I read to mean that the ReLU directly modifies the memory containing the output of the conv layer. If this is true, then there is actually no difference whether one uses conv or relu layers in neural_style; the ReLU function is there anyway, even if you access the conv layer.

@jcjohnson
Owner

jcjohnson commented Nov 29, 2016

@htoyryla You are correct that ReLU is performed in-place in Torch so after a forward pass it doesn't matter whether you pick a conv layer or its associated ReLU layer; they will both have the same value. However there will be a difference during the backward pass: when you backprop through a ReLU layer, the upstream gradients will be zeroed in the same places the activations were zeroed during the forward pass; if you ask neural-style to work with a conv layer then it will not backprop through the ReLU during the backward pass.

This means that when you ask neural-style to use activations on a conv layer, ReLU gets used during the forward pass but not during the backward pass, so the backward pass will not be correct in this case. You can still get nice style transfer effects even when the gradients are incorrect in this way, but for this reason I'd generally expect better results using ReLU layers.

@htoyryla

@jcjohnson, good point, I did not think about the backward pass.

@ProGamerGov
Author

I suspect that image quality affects training accuracy. This research paper seems to show the effects of image quality on training neural network models: "Understanding How Image Quality Affects Deep Neural Networks"

@ProGamerGov
Author

I recently trained a NIN model on a roughly sorted custom data set of about 40,000 faces. There appear to be direct improvements in how the model handles faces as content images, but style images which do not have faces do not work as well. I think that if one could train the model on artwork in addition to common content images, it would help the model understand both.

@htoyryla

htoyryla commented May 9, 2017

I have sometimes been thinking about using two models, one for style and one for content, both trained with limited material. I don't know if it would work, though, and memory usage would certainly be a problem. Still, it could be an interesting exercise.

@ProGamerGov
Author

ProGamerGov commented May 9, 2017

@htoyryla That idea could be made more resource-efficient by using two small NIN-like models, each trained on only one target category.


So it turns out that the NIN model, at least, still has the knowledge required for style transfer, in addition to the newer face-related knowledge that I gave it.

The unmodified NIN model is on the right, and the fine tuned NIN model is on the left:

I used a DeepDream project based on Neural-Style to try and determine why things had changed in the modified NIN model. Below are the DeepDream layer activation tests for all 29 layers used by the NIN model:

The original model:

The modified model:

These DeepDream images helped me figure out that by simply changing the -content_layers and -style_layers, I could utilize the improved facial feature detection abilities of my fine tuned NIN model.
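For example, switching layers is just a matter of the -content_layers and -style_layers flags on the command line; the model, prototxt and layer names below are only illustrative placeholders, not the exact combination I settled on:

th neural_style.lua -model_file models/my_finetuned_nin.caffemodel \
  -proto_file models/my_finetuned_nin.prototxt \
  -content_image face.jpg -style_image style.jpg \
  -content_layers relu0,relu3 -style_layers relu0,relu3,relu7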

The NIN model I created was trained for 15700 iterations and seemed to maintain 86-96% accuracy during the last couple of thousand iterations. With around 40k training images, I calculated that around 24-25 epochs occurred during the training session. I also stopped the training at 11600 iterations in order to lower the learning rate so that the loss would continue going down. I'm not sure if I was over-fitting the model, but it seemed to have improved abilities on an image that was not part of the training data set.

After the NIN experiments, I attempted to fine-tune a VGG-16 model on my rough faces data set. It's a lot slower to fine-tune VGG-16 models than it is to fine-tune NIN models. From iteration 1000 to 8000, it seems that the model is actually improving in its ability to recognize facial features:

The output from the non fine tuned SOD_FINETUNE model can be found here: https://i.imgur.com/wWtWysT.png

Obviously for my experiments I used the exact same parameters, seed values, etc... to eliminate any other things that might cause different outputs.

An album with the full versions of the images I posted in this comment can be found here:
https://imgur.com/a/njDJ1

Edit:

To clarify, the VGG-16 model that I fine-tuned is called the "VGG-16 SOD Finetune" model. The "finetune" in the original model's name is there because it was fine-tuned for salient object detection from the regular VGG-16 model. I have now fine-tuned this previously fine-tuned model with a new data set.

@ProGamerGov
Author

ProGamerGov commented May 10, 2017

Trying to train a NIN model from scratch with my data set did not work; it only produces blurry style transfer images and broken DeepDream images. Maybe there are certain classes that help the model learn other classes? Or maybe I just chose bad training parameters?

Edit:

Analyzing the training loss (I'm not sure what graphing tool to use), it appears that the NIN model trained from scratch had the loss decrease quickly and then stay constant. For the fine-tuned NIN model, the training loss dropped quickly and then seems to have decreased very slowly or perhaps stayed the same. It must have worked better than training from scratch, though, seeing as it does appear to have better facial feature detection abilities.

The fine-tuned SOD model has the loss dropping continuously over time, which I imagine is what one should expect with good training parameters. So I think the results from my fine-tuned NIN model are questionable and need better training parameters, but the VGG-16 SOD model seems to actually be improved in a way that is appropriately reflected in the loss values.

Second Edit:

After some more testing on my fine tuned SOD model, it appears that I may have actually improved the model with very little change to the model's other abilities. It now more accurately deals with faces, and possibly other parts of the human body (upper portion of the body I think).

I wonder whether the "roughly sorted" nature of my data set helps or weakens the model's new abilities.

@ProGamerGov
Author

ProGamerGov commented May 10, 2017

The training loss graphs seem to support my results.

The NIN model from scratch is on the left, and the fine tuned NIN model is on the right:

The fine tuned VGG-16 model:

I think using a larger batch size (64 instead of fewer than 10) compared to earlier experiments is part of the reason for this recent training success.

@ProGamerGov
Author

ProGamerGov commented May 11, 2017

I think I might be onto something here as my fine tuned model appears to be better at facial feature preservation:

An album with the full images can be found here: https://imgur.com/a/tArrY

It looks as though my fine tuned model is more accurately detecting the eyes, and mouth of the person in the photo.

  • The mouth is more "horizontal" in the image produced by my fine tuned model, just like in the original photo, while the unmodified model is curving the mouth in an extreme way.

  • The chin in my image seems to be more "separated" from the neck and background than the control image. I believe this is from a technique people use to create better looking photographs of themselves via making their chin stand out from their neck more.

  • The eyes in my fine tuned model are outlined, while the original model does not outline the eyes.

  • The eyebrows are darker in my image, than in the control image.

The solver.prototxt file and the train_val.prototxt can be found here: https://gist.github.com/ProGamerGov/2bdf7659ee14dac03269a3ec3a7f1fcd

@ProGamerGov
Author

ImageMagick seems to be slow at resizing large data sets of images (especially when using the -resize option), but using GNU parallel like this makes it faster:

parallel -j 8 convert {} -resize '256x256^' -gravity Center {} ::: *.png
parallel -j 8 mogrify {} -format png {} ::: *.jpg

You can get Parallel via sudo apt-get install parallel.

Source: https://stackoverflow.com/questions/26168783/imagemagick-convert-and-gnu-parallel-together

@ProGamerGov
Author

I uploaded the Rough Faces model and added a link to download it on the alternative models wiki page: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models

Hopefully it can help those seeking better facial preservation with Neural-Style.

@htoyryla

For me, the Rough Faces model didn't work well. For a face-specific model, I would expect it to have strong activations for features like head/face shape, hair, eyes, nose, mouth, etc. Here, it mainly picked up the meringue dessert in the background :)

content image: [image: hannu512z]

style image: [image: naama007]

result: [image: out]

@ProGamerGov
Author

I did test your content image with my fine-tuned model, and I think the issue may be that the "rough faces" training data was not very diverse, so the model performs best with certain images. My example image for testing was also part of the training data, so that may skew the results (though I did test it on other images that I think were not part of the training data).

@ProGamerGov
Author

ProGamerGov commented Feb 8, 2018

I was looking through my old experiments, and I see that I never actually shared the two successfully fine-tuned models that I had created. One model in particular (the "Plaster" model) creates a very different output than the non-fine-tuned version.

Some experimentation with parameters may be required to achieve satisfactory results, because, as with all the models I trained and fine-tuned, I only tested them in Neural-Style with certain parameter values.

I'm not sure if the "Low Noise" model differs from the non-fine-tuned model in a way that is useful for certain styles, like the "Plaster" model does, so it can be removed if it's not useful.

I posted the models on the wiki page here: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models

Seeing as both models are from 2016, I am going to test them with a bunch of more "modern" Neural-Style parameters, like setting the TV weight to 0, using the Adam parameters I discovered in addition to L-BFGS, and using multiscale resolution.

@Nusrat12

I want to change the number of iterations. Where do I have to change it?

@ProGamerGov
Author

@Nusrat12 You control the maximum number of iterations by setting the max_iter: value in your solver.prototxt. You can see an example of it here.
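For example, the relevant line in solver.prototxt looks like this (the value is just an example):

max_iter: 40000   # training stops after 40000 iterations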
