
Support caffemodel directly #72

Open
waTeim opened this issue Jul 14, 2015 · 21 comments


waTeim commented Jul 14, 2015

Yes, this was asked before in #55, and yes, caffemodels can be converted. That's not good enough. But it looks like they can be read directly using Julia's ProtoBuf.jl; see JuliaIO/ProtoBuf.jl#48.
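
For reference, a minimal sketch of reading a .caffemodel directly, assuming the caffe proto definitions have already been compiled into a Julia module `caffe` with ProtoBuf.jl (as discussed in JuliaIO/ProtoBuf.jl#48); the names here are illustrative:

import ProtoBuf

# Parse the binary caffemodel into the generated NetParameter message.
net = open("bvlc_googlenet.caffemodel", "r") do io
   ProtoBuf.readproto(io, caffe.NetParameter())
end

println(net.name)   # e.g. "GoogleNet"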

To confirm, what test would you propose?


waTeim commented Jul 15, 2015

Following up, I wrote some exploratory code and here are the layer types of GoogLeNet. InnerProducts are supported, and I guess Convolutions are too. What about the rest? In addition, there are DATA, POOLING, RELU, SPLIT, SOFTMAX_LOSS, LRN, CONCAT, and DROPOUT.

julia> import CaffeOperations;
julia> x = CaffeOperations.loadCaffeeNetwork("bvlc_googlenet.caffemodel");

julia> reshape(CaffeOperations.layerTypes(x),(13,13))
13x13 Array{Symbol,2}:
 :DATA         :CONVOLUTION  :CONCAT       :CONVOLUTION  :CONVOLUTION    :INNER_PRODUCT  …  :RELU           :DROPOUT        :POOLING      :RELU         :RELU         
 :SPLIT        :RELU         :SPLIT        :RELU         :RELU           :SOFTMAX_LOSS      :CONVOLUTION    :INNER_PRODUCT  :CONVOLUTION  :CONVOLUTION  :CONVOLUTION  
 :CONVOLUTION  :CONVOLUTION  :CONVOLUTION  :CONCAT       :POOLING        :CONVOLUTION       :RELU           :SOFTMAX_LOSS   :RELU         :RELU         :RELU         
 :RELU         :RELU         :RELU         :POOLING      :CONVOLUTION    :RELU              :POOLING        :CONVOLUTION    :CONCAT       :POOLING      :CONVOLUTION  
 :POOLING      :CONVOLUTION  :CONVOLUTION  :SPLIT        :RELU           :CONVOLUTION       :CONVOLUTION    :RELU           :POOLING      :CONVOLUTION  :RELU         
 :LRN          :RELU         :RELU         :CONVOLUTION  :CONCAT         :RELU           …  :RELU           :CONVOLUTION    :SPLIT        :RELU         :POOLING      
 :CONVOLUTION  :CONVOLUTION  :CONVOLUTION  :RELU         :SPLIT          :CONVOLUTION       :CONCAT         :RELU           :CONVOLUTION  :CONCAT       :CONVOLUTION  
 :RELU         :RELU         :RELU         :CONVOLUTION  :POOLING        :RELU              :SPLIT          :CONVOLUTION    :RELU         :SPLIT        :RELU         
 :CONVOLUTION  :CONVOLUTION  :CONVOLUTION  :RELU         :CONVOLUTION    :CONVOLUTION       :POOLING        :RELU           :CONVOLUTION  :CONVOLUTION  :CONCAT       
 :RELU         :RELU         :RELU         :CONVOLUTION  :RELU           :RELU              :CONVOLUTION    :CONVOLUTION    :RELU         :RELU         :POOLING      
 :LRN          :POOLING      :CONVOLUTION  :RELU         :INNER_PRODUCT  :CONVOLUTION    …  :RELU           :RELU           :CONVOLUTION  :CONVOLUTION  :DROPOUT      
 :POOLING      :CONVOLUTION  :RELU         :CONVOLUTION  :RELU           :RELU              :INNER_PRODUCT  :CONVOLUTION    :RELU         :RELU         :INNER_PRODUCT
 :SPLIT        :RELU         :POOLING      :RELU         :DROPOUT        :POOLING           :RELU           :RELU           :CONVOLUTION  :CONVOLUTION  :SOFTMAX_LOSS 

julia> x.name
"GoogleNet"


pluskid commented Jul 15, 2015

All the layers mentioned here are supported. Check out the IJulia notebook for the pretrained ImageNet model for an example of the correspondence.


waTeim commented Jul 15, 2015

Yeah, I'm reading the docs now. It looks like a translation is possible; I'm looking at the layers one by one.

So far it looks like all of the convolution layers have 2 blobs associated with them; is that expected?

As for the Xavier filler <--> Initializer mapping, it looks like the caffe model allows parameterization?

dump(x.layers[9].convolution_param)
...
  weight_filler: CaffeOperations.caffe.FillerParameter 
    _type: ASCIIString "xavier"
    value: Float32 0.0
    min: Float32 0.0
    max: Float32 1.0
    mean: Float32 0.0
    std: Float32 0.03      <--- here
    sparse: Int32 -1
    variance_norm: Int32 0

Do you have a URL for that notebook?


pluskid commented Jul 15, 2015

Sorry, I'm currently traveling and do not have a computer, so I'll try to be brief.

There is a link to the notebook in the tutorial section of the docs. Currently the Xavier initializer is not customizable, I believe, but it should be very easy to add a parameter.

Yes, a convolutional layer expects two blobs, but you can always set the bias blob to zero if you do not need it.
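
For reference, a rough sketch of what the newInitializer helper used later in this thread might look like, assuming Mocha's stock XavierInitializer, GaussianInitializer, and ConstantInitializer constructors; the fallback branch is illustrative only:

# Rough sketch: map a caffe FillerParameter onto a Mocha initializer.
# ProtoBuf.jl renames caffe's `type` field to `_type` (see the dump above).
function newInitializer(filler::caffe.FillerParameter)
   if filler._type == "xavier"
      return Mocha.XavierInitializer()
   elseif filler._type == "gaussian"
      return Mocha.GaussianInitializer(mean=filler.mean, std=filler.std)
   elseif filler._type == "constant"
      return Mocha.ConstantInitializer(filler.value)
   else
      error("unsupported filler type: $(filler._type)")
   end
end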


waTeim commented Jul 17, 2015

That's fine, the caffe file does have 2 blobs. Bias blob? Does that correspond to bottom?

make_blob(backend, ...  

Should the backend argument have a default value of whatever the current backend is?


pluskid commented Jul 18, 2015

Yes, caffe has two blobs for convolution. They are not bottoms; bottoms are input blobs. What we are talking about here are parameter blobs.

I'm not sure I like the idea of a global backend. The idea is that a user should supply an initialized backend whenever they want to do something important. I think it is perfectly fine for the function that converts a caffe model to accept a backend parameter.


waTeim commented Jul 21, 2015

So how are the parameter blobs connected to a Convolution layer? I see the only candidates are bottom and top. If not those, then what else is there?


pluskid commented Jul 22, 2015

@waTeim filters and bias are parameters of a layer. For example, in an InnerProductLayer, top = parameter * bottom. There are three kinds of blobs: input (bottom), output (top), and weight/filters (parameters).
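
A toy illustration of the three blob kinds for an inner-product layer, using plain arrays with arbitrary sizes (not Mocha code):

W = rand(Float32, 10, 4)      # parameter blob: weights/filters
b = zeros(Float32, 10)        # parameter blob: bias
bottom = rand(Float32, 4)     # input blob (bottom)
top = W * bottom + b          # output blob (top)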


waTeim commented Jul 22, 2015

The part I'm having trouble with is the mapping. Here's the ProtoBuf description. There is a blobs field in the layers section. When read, this field is populated with 2 blobs. Which is which? They're not labeled.

julia> size(x.layers[9].blobs)
(2,)

Bottoms and tops are set to arrays of symbols, which I think refer to some index; how do the blobs get associated with those symbols? Does the .ipynb make it clear?

Current code, maybe wrong:

  return Mocha.ConvolutionLayer(
   name = caffeLayer.name,
   n_filter = Int(caffeLayer.convolution_param.num_output),
   kernel = kernel,
   pad = pad,
   stride = stride,
   filter_init = newInitializer(caffeLayer.convolution_param.weight_filler),
   bias_init = biasInitializer,
   tops = getLayerRefList(caffeLayer.top),
   bottoms = getLayerRefList(caffeLayer.bottom)
  );


pluskid commented Jul 22, 2015

Is the x object a Mocha net or a caffe net? In a Mocha layer state, there is a field called blobs which holds references to output blobs, but you don't need to care about them as they will be created automatically. In contrast, in caffe, IIRC, the blobs field holds the parameter blobs. You can do the following things with it:

  1. Ignore it, as the parameter blobs will be created automatically according to the specification such as n_filter, etc.
  2. You may do cross checking to make sure that the shape of the parameter blobs matches the specification of the layer definition, e.g., is the n_filter parameter correct? (See the sketch after this list.)
  3. If the caffe file contains an already trained model, you can actually copy those blobs out and use a customized initializer for the parameter blobs so that they are filled with those trained parameters instead of random initial values.
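
A rough sketch of the cross-check in option 2, assuming a V1-style BlobProto with num/channels/height/width fields; this is illustrative, not the converter's actual code:

# Illustrative cross-check: the first parameter blob of a convolution layer
# should have `num` equal to the layer's num_output (n_filter).
layer = x.layers[9]
n_filter = Int(layer.convolution_param.num_output)
weight_blob = layer.blobs[1]
@assert weight_blob.num == n_filter "stored weights do not match n_filter"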


waTeim commented Jul 23, 2015

x is a parsed, trained caffe net, so it looks like option 3. Is this simply a matter of creating a new Initializer type?


pluskid commented Jul 23, 2015

Yes, the easiest way I can imagine is to create an initializer that simply copies the content of an existing array into the target blob being initialized. Something roughly like

ConvolutionLayer(..., filter_init=CopyInitializer(caffe_layer.blobs[1]), bias_init=CopyInitializer(caffe_layer.blobs[2]),...)


waTeim commented Aug 9, 2015

Took a while, but I'm back on it. Does this look about right?

immutable CopyInitializer <: Mocha.Initializer
   caffeBlob::caffe.BlobProto
end

function init(initializer::CopyInitializer,blob::Mocha.Blob)
   Mocha.fill!(blob,initializer.caffeBlob.data)
end


pluskid commented Aug 9, 2015

Yes, maybe with small modifications:

  • I'm not sure whether the data in caffe.BlobProto will remain valid after you close the protobuffer file. You might need to copy the data into a Julia array and hold the Julia array in your CopyInitializer instead.
  • You should use Mocha.copy! instead of fill!, as fill! is only used to fill a blob everywhere with a scalar.
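
A rough sketch combining both suggestions; extending Mocha.init and the Mocha.copy!(blob, array) form are assumptions about how a custom initializer hooks into Mocha:

# Copy the protobuf data into an independent Julia array up front, and use
# copy! (not fill!) when the blob is initialized.
immutable CopyInitializer <: Mocha.Initializer
   data::Vector{Float32}
end
CopyInitializer(caffeBlob::caffe.BlobProto) = CopyInitializer(copy(caffeBlob.data))

function Mocha.init(initializer::CopyInitializer, blob::Mocha.Blob)
   # the stored array is flat; its length must match the blob's total size
   Mocha.copy!(blob, initializer.data)
end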


waTeim commented Aug 9, 2015

I'm pretty sure that by the time it gets into caffe.BlobProto it's an array that exists independently of the file, so normal GC applies. Re using copy!, yeah, easy change.

Coming up next is the InnerProduct layer type, which seems to be straightforward, except that it's not clear to me that caffe's num_output is equivalent to Mocha's output_dim, though it did appear to be the only choice left.

Here's the Protobuf stuff:

type InnerProductParameter
    num_output::UInt32
    bias_term::Bool
    weight_filler::FillerParameter
    bias_filler::FillerParameter
    axis::Int32
end #type InnerProductParameter

From caffe's docs:

Parameters (InnerProductParameter inner_product_param)
  • Required
    num_output (c_o): the number of filters
  • Strongly recommended
    weight_filler [default type: 'constant' value: 0]
  • Optional
    bias_filler [default type: 'constant' value: 0]
    bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs


pluskid commented Aug 10, 2015

@waTeim Yes, num_output is exactly output_dim, and similarly to before, the fillers correspond to initializers in Mocha.
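
A rough sketch of the corresponding mapping, reusing the newInitializer and getLayerRefList helpers from the convolution case; the keyword names (output_dim, weight_init, bias_init) are assumed from Mocha's InnerProductLayer:

# Map a caffe INNER_PRODUCT layer onto Mocha's InnerProductLayer.
function newInnerProductLayer(caffeLayer::caffe.V1LayerParameter)
   p = caffeLayer.inner_product_param
   return Mocha.InnerProductLayer(
    name = caffeLayer.name,
    output_dim = Int(p.num_output),
    weight_init = newInitializer(p.weight_filler),
    bias_init = newInitializer(p.bias_filler),
    tops = getLayerRefList(caffeLayer.top),
    bottoms = getLayerRefList(caffeLayer.bottom)
   );
end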


waTeim commented Aug 16, 2015

Last layer type: the data layer, which comes with a TransformationParameter:

type TransformationParameter
    scale::Float32
    mirror::Bool
    crop_size::UInt32
    mean_file::AbstractString
    mean_value::Array{Float32,1}
    force_color::Bool
    force_gray::Bool
    TransformationParameter() = (o=new(); fillunset(o); o)
end #type TransformationParameter

A number of these things don't appear to be supported, except scale and mean. It looks like Caffe assumes both of these happen simultaneously, while Mocha appears to want to apply one and then the other (presumably mean subtraction followed by scaling). Caffe appears to have multiple mean values (1 per channel?) while Mocha wants a blob.

What's the expected format of this blob?


waTeim commented Aug 16, 2015

Limited success.

  1. To keep things simple I used cifar10_nin.caffemodel from Model Zoo
  2. The output can be seen here.
  3. I just arbitrarily picked input blob dimensions of 10x10x1x1 which is almost certainly wrong.

The critical line is this:

 x = CaffeOperations.convertCaffeNetwork("cifar10_nin.caffemodel",[(10,10,1,1),(10,10,1,1)]);

How do I determine the input blob dims? Does this come from the data?


pluskid commented Aug 17, 2015

scale and mean can be mapped to DataTransformers in Mocha.

Caffe specifies everything together, but technically they cannot happen "together". For example, caffe subtracts the mean first, and then does re-scaling. See their code here: https://github.com/BVLC/caffe/blob/master/src/caffe/data_transformer.cpp#L113

Yes, the Mocha data transformer expects a mean blob, which should be of the same shape as a data point. Specifically, for image data, we can make this blob by duplicating the per-channel values at each pixel location. For example,

mean_channels = [1,2,3] # mean values for each of the RGB channels
img_width = 256
img_height = 256
mean_channels = reshape(mean_channels, (1,1,3)) # make it proper shape
mean_img = repeat(mean_channels, inner=[img_width,img_height,1]) # of proper layout for mean_blob

The crop option can be supported by the CropLayer in Mocha.

force_color and force_gray are not supported yet.
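
Putting the pieces together, a rough sketch of how the mean image and scale might be wired up as Mocha data transformers; make_blob(backend, array), SubMean's mean_blob keyword, the (blob_name, transformer) pairing, and the 1/255 scale value are all assumptions here:

# Mean subtraction followed by scaling, mirroring caffe's order.
# `backend` is an already-initialized Mocha backend (e.g. CPUBackend).
mean_blob = Mocha.make_blob(backend, convert(Array{Float32}, mean_img))
transformers = [
   (:data, Mocha.DataTransformers.SubMean(mean_blob=mean_blob)),
   (:data, Mocha.DataTransformers.Scale(scale=1/255))
]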


pluskid commented Aug 17, 2015

@waTeim That is brilliant! I'm not sure why you need to decide the input blob dims; I'm not sure whether the Caffe model stores this information somewhere. They will be automatically determined when the program starts reading data from the HDF5 files. Do you mean you need this shape information in the data transformer?


waTeim commented Aug 17, 2015

Hey, thanks! As far as dims, I kind of brought it on myself as I'm trying to remain as agnostic as I can, and am therefore using MemoryDataLayer. Potentially I can use LevelDB directly as well with some additional help.

Here's the still primitive function in question:

function newDataLayer(caffeLayer::caffe.V1LayerParameter,dims)
   # placeholder arrays for the input blobs (MemoryDataLayer needs concrete data)
   data = Vector{Array}();
   for i = 1:length(dims)
      push!(data,Array(Float32,dims[i]))
   end
   # map caffe's TransformationParameter onto Mocha data transformers
   transformers::Vector = [];
   if ProtoBuf.has_field(caffeLayer,:transform_param)
      scale = Float32(caffeLayer.transform_param.scale)
      push!(transformers,Mocha.DataTransformers.Scale(scale));
   end
   return Mocha.MemoryDataLayer(
    name = caffeLayer.name,
    batch_size = 1,
    data = data,
    transformers = transformers,
    tops = getLayerRefList(caffeLayer.top)
   );
end
