Caffe Translator Error: Convolution Layer #468

milewis1 · 2017-05-01T17:21:34Z

I have a Caffe model that I'm trying to translate into Caffe2. However, I'm running into the following error on the first operator:

RuntimeError: [enforce fail at conv_op_impl.h:25] X.ndim() == filter.ndim(). 4 vs 1 Error from operator:
input: "data" input: "conv1_w" input: "conv1_b" output: "conv1" type: "Conv" arg { name: "stride" i: 2 } arg { name: "pad" i: 3 } arg { name: "kernel" i: 7 }

The original Caffe model starts like this:

    layer {
      name: "data"
      type: "Input"
      top: "data"
      input_param { shape { dim: 1 dim: 3 dim: 224 dim: 224 } }
    }
    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      convolution_param {
        num_output: 64
        bias_term: true
        pad: 3
        kernel_size: 7
        stride: 2
      }
    }

My input is a single color image with shape [1, 3, 224, 224]. Has anyone tried to do something similar?
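The enforce in that error compares the number of dimensions of the data blob (4) with that of the filter blob conv1_w (1). As an illustration only, the hypothetical helper below (check_conv_shapes is not part of Caffe2) mirrors that check and the input-channel check that Caffe2's Conv op also performs, assuming NCHW order and no dilation, and computes the shape conv1 should produce:

```python
# Hypothetical helper (not part of Caffe2) that mirrors two of the Conv op's
# shape checks for NCHW input and computes the expected output shape.
def check_conv_shapes(x_shape, w_shape, stride, pad, kernel, group=1):
    # Same condition as "X.ndim() == filter.ndim()" in the error message
    if len(x_shape) != len(w_shape):
        raise ValueError("X.ndim() == filter.ndim() fails: %d vs %d"
                         % (len(x_shape), len(w_shape)))
    n, c, h, w = x_shape
    num_output = w_shape[0]
    # Same condition as "C == filter.dim32(1) * group_"
    if c != w_shape[1] * group:
        raise ValueError("input channels %d != kernel channels %d * group %d"
                         % (c, w_shape[1], group))
    # Standard output size (no dilation): floor((in + 2*pad - kernel) / stride) + 1
    out_h = (h + 2 * pad - kernel) // stride + 1
    out_w = (w + 2 * pad - kernel) // stride + 1
    return (n, num_output, out_h, out_w)


# conv1 from the prototxt above: 64 filters of shape 3x7x7, stride 2, pad 3
print(check_conv_shapes((1, 3, 224, 224), (64, 3, 7, 7), stride=2, pad=3, kernel=7))
# -> (1, 64, 112, 112)

# A 1-D conv1_w, as in the error message, fails the first check
try:
    check_conv_shapes((1, 3, 224, 224), (64,), stride=2, pad=3, kernel=7)
except ValueError as err:
    print(err)  # X.ndim() == filter.ndim() fails: 4 vs 1
```

In other words, conv1_w should reach the operator as a [64, 3, 7, 7] tensor; a 1-D blob under that name points to a problem in how the weights are written to the init net rather than to the input image.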


teaglin · 2017-05-01T19:37:59Z

I've had conversion issues too and haven't been able to resolve them yet. I tried a GoogLeNet model, which works great in Caffe but doesn't seem to work in Caffe2. The input is a single color image, [1, 3, 224, 224].

caffe2::EnforceNotMet: [enforce fail at fully_connected_op.h:61] K == W.size() / W.dim32(0). Dimension mismatch: X: 1 1024 2 2, W: 2 1024, b: 2, axis: 1, M: 1, N: 2, K: 4096 Error from operator: input: "pool5/7x7_s1" input: "loss3/classifier_w" input: "loss3/classifier_b" output: "loss3/classifier" type: "FC"
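The numbers in that message can be checked by hand: with axis 1, FC flattens X to M x K, where K is the product of the dimensions after the axis. A small sketch (fc_dims is a hypothetical helper, not Caffe2 API) shows that the 1x1024x2x2 output of pool5/7x7_s1 gives K = 4096 while the classifier weights expect 1024, so the mismatch originates upstream of the FC layer, where the pooled output should be 1x1 spatially:

```python
from functools import reduce
import operator


def prod(dims):
    # Product of a sequence of dimensions (1 for an empty sequence)
    return reduce(operator.mul, dims, 1)


# Hypothetical helper (not Caffe2 API) reproducing the FC size check:
# X is flattened to M x K around `axis`, and W (N x K) must have K columns.
def fc_dims(x_shape, w_shape, axis=1):
    m = prod(x_shape[:axis])
    k = prod(x_shape[axis:])
    n = w_shape[0]
    w_k = prod(w_shape) // n  # W.size() / W.dim32(0)
    return m, k, n, w_k


print(fc_dims((1, 1024, 2, 2), (2, 1024)))  # (1, 4096, 2, 1024): K != 1024
print(fc_dims((1, 1024, 1, 1), (2, 1024)))  # (1, 1024, 2, 1024): sizes agree
```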

KleinYuan · 2017-05-02T08:14:06Z

Same here.
But the error means exactly what it says: the dimensions do not match.

littleowl · 2017-05-03T21:25:47Z

Hi @milewis1, thanks for your other PRs, which I found and which allowed me to get as far as I have in converting this ResNet model (see #472).

Afterwards, I get the same error you have here. I was able to get past it by passing a tensor with only 1 dimension instead of the required 4, only to fail again a few lines of code later:
libc++abi.dylib: terminating with uncaught exception of type caffe2::EnforceNotMet: [enforce fail at conv_op_impl.h:33] C == filter.dim32(1) * group_. Convolution op: input channels does not match: # of input channels 0 is not equal to kernel channels * group:3*1 Error from operator:

However, in the other issue thread, @KeyKy gave an example of how he edited the prototxt file. I tried my best to edit the ResNet50 prototxt in a similar way (it was more complicated than I first thought).
Of course, I am unsure of these changes and whether the model still works afterwards. Interestingly, I was able to run the predictor without error, but from some layer onward every output is NaN.
It seems odd that it would still run after so many changes to inputs and outputs anyway.

So I guess there must also be something wrong with the way it converts the inputs of convolution layers in general, and/or with how it handles a convolution that is connected to a BatchNorm layer.
:(

KeyKy · 2017-05-04T02:16:44Z

@littleowl I looked into the SpatialBN source code and found:

    .Input(
        1,
        "scale",
        "The scale as a 1-dimensional tensor of size C to be applied to the "
        "output.")

So scale is a tensor of size C, but the PR sets it to size 1. In my code, I set it to size C. My code:

# Replacement for TranslateBatchNorm in caffe2/python/caffe_translator.py,
# which defines TranslatorRegistry, BaseTranslate and AddArgument.
import numpy as np
from caffe2.python import utils

@TranslatorRegistry.Register("BatchNorm")
def TranslateBatchNorm(layer, pretrained_blobs, is_test):
    caffe_op = BaseTranslate(layer, "SpatialBN")
    output = caffe_op.output[0]
    param = layer.batch_norm_param
    AddArgument(caffe_op, "is_test", is_test)
    AddArgument(caffe_op, "epsilon", param.eps)
    AddArgument(caffe_op, "order", "NCHW")

    caffe_op.input.extend([output + "_scale", output + "_bias", output + "_mean", output + "_var"])
    if not is_test:
        caffe_op.output.extend([output + "_mean", output + "_var", output + "_saved_mean", output + "_saved_var"])

    n_channels = pretrained_blobs[0].shape[0]  # get C
    mean = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[0], output + '_mean')
    var = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[1], output + '_var')
    # Tile the single value in the third blob to length C (SpatialBN wants size C)
    pretrained_blobs[2] = np.tile(pretrained_blobs[2], (n_channels, ))  # set C
    scale = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[2], output + '_scale')

    # Create a zero bias array the same size as the scale, we'll let the following
    # Scale (Mul + Add operators in Caffe2) layer handle any bias, just like Caffe
    bias = utils.NumpyArrayToCaffe2Tensor(np.zeros_like(pretrained_blobs[2]), output + '_bias')

    return caffe_op, [scale, bias, mean, var]

With this change, I no longer get NaN with my prototxt.
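For reference, Caffe's BatchNorm layer stores three blobs: the accumulated mean, the accumulated variance, and a one-element moving-average factor; at test time Caffe divides the first two by that factor (using 0 when the factor is 0). The code above tiles the third blob to size C and uses it as the scale. A variant that instead follows Caffe's blob layout is sketched below. caffe_bn_to_spatial_bn is a hypothetical name, the sketch has not been tested against the translator in this thread, and it assumes, as the comment in the code above says, that the following Scale layer supplies the learned per-channel scale and bias:

```python
import numpy as np


# Sketch only (hypothetical name): turn Caffe BatchNorm blobs into the four
# size-C inputs that SpatialBN takes (scale, bias, mean, var). Assumes Caffe's
# layout: blobs[0] = accumulated mean, blobs[1] = accumulated variance,
# blobs[2] = one-element moving-average factor.
def caffe_bn_to_spatial_bn(blobs):
    mean, var, factor = (np.asarray(b, dtype=np.float32) for b in blobs)
    n_channels = mean.shape[0]  # C
    f = float(factor.reshape(-1)[0])
    # Caffe divides the stored statistics by the factor (and uses 0 if it is 0)
    inv = 0.0 if f == 0 else 1.0 / f
    # Identity scale and zero bias of length C; the following Scale layer
    # (Mul + Add in Caffe2) applies the learned per-channel scale and bias
    scale = np.ones(n_channels, dtype=np.float32)
    bias = np.zeros(n_channels, dtype=np.float32)
    return scale, bias, mean * inv, var * inv


scale, bias, mean, var = caffe_bn_to_spatial_bn(
    [np.array([2.0, 4.0, 6.0]), np.array([1.0, 2.0, 3.0]), np.array([2.0])])
# scale has shape (3,); mean is [1, 2, 3] and var is [0.5, 1, 1.5]
```

Either way, every SpatialBN input ends up as a 1-D tensor of length C, which is what the operator schema quoted above asks for.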

milewis1 · 2017-05-04T18:38:18Z

Thanks for the update. I will check that out tomorrow and update the PR as needed.


danielhauagge · 2017-07-11T15:52:50Z

@milewis1, were you able to find a fix for this? I'm running into the exact same problem while trying to translate a ResNet50 Caffe model into Caffe2. I already applied PR #469 that you proposed, which fixed the BatchNorm problem, but now I'm running into this issue with the convolutional layer.

danielhauagge · 2017-07-11T18:51:24Z

I think I might have found the issue. In the caffe_translator code, add a print statement to ConvertTensorProtosToInitNet:

from __future__ import print_function

from caffe2.proto import caffe2_pb2
from caffe2.python import core, utils

def ConvertTensorProtosToInitNet(net_params, input_name):
    init_net = caffe2_pb2.NetDef()
    for tensor in net_params.protos:
        print(tensor.name, list(tensor.dims))  # <--- this line
        if len(tensor.float_data) == 0:
            raise RuntimeError("Only float tensors are supported in this util.")
        # One GivenTensorFill op per tensor, named after the tensor
        op = core.CreateOperator(
            "GivenTensorFill", [], [tensor.name],
            arg=[
                utils.MakeArgument("shape", list(tensor.dims)),
                utils.MakeArgument("values", tensor.float_data)])
        init_net.op.extend([op])
    init_net.op.extend([core.CreateOperator("ConstantFill", [], [input_name], shape=[1])])
    return init_net

You will see in the output that two tensors with the same name show up: first with the right number of dimensions (4), and then with just 1. See below:

conv_1_w [64L, 3L, 7L, 7L]       <--- first time shows up with correct number of dimensions
conv_1_b [2L, 2L, 2L, 2L, 2L]
conv_1_scale [64L]
conv_1_bias [64L]
conv_1_mean [64L]
conv_1_var [64L]
conv_1_w [64L]    <--- now shows up with only 1 dimension
conv_1_b [64L]
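A listing like this one can also be checked mechanically. The hypothetical helper below takes (name, dims) pairs in the order they would be written to the init net, reports names that occur more than once, and keeps only the first occurrence of each. Keeping the first occurrence only hides the clash; it does not decide which tensor later operators should actually read.

```python
from collections import Counter


# Hypothetical diagnostic: report repeated tensor names and keep only the
# first occurrence of each, preserving the original order.
def find_and_drop_duplicates(tensors):
    counts = Counter(name for name, _ in tensors)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    seen = set()
    kept = []
    for name, dims in tensors:
        if name in seen:
            continue  # skip later tensors that reuse an existing name
        seen.add(name)
        kept.append((name, dims))
    return duplicates, kept


# Names and shapes copied from the output above
tensors = [
    ("conv_1_w", [64, 3, 7, 7]), ("conv_1_b", [2, 2, 2, 2, 2]),
    ("conv_1_scale", [64]), ("conv_1_bias", [64]),
    ("conv_1_mean", [64]), ("conv_1_var", [64]),
    ("conv_1_w", [64]), ("conv_1_b", [64]),
]
dups, kept = find_and_drop_duplicates(tensors)
print(dups)  # ['conv_1_b', 'conv_1_w']
```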

As a hack to check whether this was the issue, I kept track of the tensors already seen in the loop in ConvertTensorProtosToInitNet: if a tensor had already been seen, I skipped it without adding another op to init_net. After this I got a different error:

RuntimeError: [enforce fail at elementwise_op.h:180] A.ndim() > B.ndim(). 4 vs 4. If you are doing broadcasting, input1 should have a smaller number of dimensions. Error from operator: 
input: "conv_1" input: "conv_1_w" output: "conv_1_internal" type: "Mul" arg { name: "axis" i: 1 } arg { name: "broadcast" i: 1 }
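That error follows from the workaround: with the duplicates dropped, conv_1_w now refers to the 4-D convolution filter, so the Scale layer's Mul receives a 4-D B input instead of a 64-element multiplier. What Mul with broadcast=1 and axis=1 is meant to compute can be sketched in numpy; channel_scale is a hypothetical helper that mimics the dimension check in the message, not Caffe2 code:

```python
import numpy as np


# Hypothetical helper (not Caffe2 code): per-channel multiply of an NCHW blob,
# i.e. what Mul with broadcast=1, axis=1 is meant to do for a Scale layer.
def channel_scale(x, scale, axis=1):
    # Mirrors the "A.ndim() > B.ndim()" check from the error message
    if not x.ndim > scale.ndim:
        raise ValueError("A.ndim() > B.ndim() fails: %d vs %d" % (x.ndim, scale.ndim))
    # Align B's dimensions with A starting at `axis`, e.g. (64,) -> (1, 64, 1, 1)
    shape = [1] * x.ndim
    shape[axis:axis + scale.ndim] = scale.shape
    return x * scale.reshape(shape)


x = np.ones((1, 64, 4, 4), dtype=np.float32)
y = channel_scale(x, np.arange(64, dtype=np.float32))  # channel i multiplied by i
print(y[0, 5, 0, 0])  # 5.0

try:
    channel_scale(x, np.ones((64, 3, 7, 7), dtype=np.float32))  # 4-D filter as B
except ValueError as err:
    print(err)  # A.ndim() > B.ndim() fails: 4 vs 4
```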

danielhauagge · 2017-07-11T20:37:49Z

OK, so it seems like the issue I mentioned in the previous comment has to do with the Scale layer being done in-place on the network I'm translating (ResNet50).

In the function TranslateScale, the variable output is set to mul_op.output[0]. In my case, because the Scale layer is done in place, its output and input have the same name, and that is the name of the convolution layer, conv_1, not of the scale layer. That causes its parameter names to clash with those of the convolution layer. One thing I did was to change

output = mul_op.output[0]

to

output = layer.name

in TranslateScale. Now output has the value scale_1, which prevents the name clash I mentioned in my previous comment. I'm not sure whether this should be done throughout the code, though. Is there a reason to use .output[0] instead of layer.name?

Right now the network runs without errors but I still need to see if the output I get is correct.
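The clash can be reproduced with plain names. The sketch below uses a simplified, hypothetical Layer record rather than caffe_pb2.LayerParameter, and assumes, as the earlier tensor listing and Mul error suggest, that the Scale translation names its parameter blobs <base>_w and <base>_b:

```python
from collections import namedtuple

# Simplified, hypothetical layer record (not caffe_pb2.LayerParameter)
Layer = namedtuple("Layer", ["name", "bottom", "top"])


def param_blobs(base):
    # Assumed naming: Scale's Mul/Add parameters are "<base>_w" and "<base>_b"
    return {base + "_w", base + "_b"}


conv = Layer("conv_1", ["data"], ["conv_1"])
scale = Layer("scale_1", ["conv_1"], ["conv_1"])  # in place: top == bottom

clash_old = param_blobs(conv.name) & param_blobs(scale.top[0])  # base = output[0]
clash_new = param_blobs(conv.name) & param_blobs(scale.name)    # base = layer.name
print(sorted(clash_old))  # ['conv_1_b', 'conv_1_w']
print(sorted(clash_new))  # []
```

Layer names are unique within a Caffe net even when blob names are reused by in-place layers, which is why basing parameter names on layer.name avoids the collision.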

danielhauagge · 2017-07-11T20:56:23Z

OK, I tested it, and the network seems to work fine now. I'll submit a PR with the change.

milewis1 · 2017-07-11T20:56:23Z

Good to know. I was just catching up on this. When I initially had the problem, I wound up having to give each of my layers distinct inputs and outputs, i.e. no in-place layers. That solved the issue for me.
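The not-in-place workaround can also be scripted rather than done by hand. A minimal sketch, assuming layers are simplified dicts with name, bottom and top lists (real code would read and write the prototxt through caffe_pb2.NetParameter and protobuf's text_format, which is not shown here): every top that reuses one of its layer's bottoms is renamed after the layer, and later layers are pointed at the new name.

```python
# Hypothetical sketch of the not-in-place workaround on simplified layer dicts
# (not caffe_pb2 messages): each top that reuses one of its layer's bottoms is
# renamed after the layer, and later bottoms are redirected to the new name.
def remove_in_place(layers):
    latest = {}  # original blob name -> name of the blob that now holds it
    fixed = []
    for layer in layers:
        bottoms = [latest.get(b, b) for b in layer["bottom"]]
        tops = []
        for i, top in enumerate(layer["top"]):
            if top in layer["bottom"]:
                # In-place top: name it after the layer (suffix if several tops)
                if len(layer["top"]) == 1:
                    new_name = layer["name"]
                else:
                    new_name = "%s_%d" % (layer["name"], i)
            else:
                new_name = top
            latest[top] = new_name
            tops.append(new_name)
        fixed.append({"name": layer["name"], "bottom": bottoms, "top": tops})
    return fixed


layers = [
    {"name": "conv_1", "bottom": ["data"], "top": ["conv_1"]},
    {"name": "bn_1", "bottom": ["conv_1"], "top": ["conv_1"]},
    {"name": "scale_1", "bottom": ["conv_1"], "top": ["conv_1"]},
    {"name": "relu_1", "bottom": ["conv_1"], "top": ["conv_1"]},
    {"name": "pool_1", "bottom": ["conv_1"], "top": ["pool_1"]},
]
# The chain becomes data -> conv_1 -> bn_1 -> scale_1 -> relu_1 -> pool_1
for layer in remove_in_place(layers):
    print(layer["name"], layer["bottom"], "->", layer["top"])
```

If the last layer of the net is in place, the network's output blob is renamed too, so code that fetches outputs by name has to follow the rename.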


ARSwhut · 2017-10-21T10:43:56Z

I have the same problem.

RuntimeError Traceback (most recent call last)
in ()
9
10 # run the net and return prediction
---> 11 results = p.run([img])
12 #results = np.asarray(results)
13 #print "results shape: ", results.shape

RuntimeError: [enforce fail at tensor.h:671] i < dims_.size(). 0 vs 0. Exceeding ndim limit Error from operator:
input: "data" input: "conv1_w" input: "conv1_b" output: "conv1" type: "Conv" arg { name: "stride" i: 2 } arg { name: "pad" i: 0 } arg { name: "kernel" i: 3 } device_option { } engine: ""
** while accessing input: data

yangqiongyongyu · 2017-11-25T08:48:36Z

Have you resolved it, @ARSwhut?

wm10240 · 2017-12-26T07:26:17Z

@ARSwhut Have you resolved it?

milewis1 · 2017-12-26T13:37:48Z

When I initially encountered this problem, the solution was to update the Caffe prototxt so that there were no in-place layers.


BIGBALLON · 2018-07-05T08:40:27Z

@danielhauagge thanks for your solution!! It works fine now!!

aaronmarkham added the translator label May 24, 2017

milewis1 closed this as completed Jul 12, 2017

danielhauagge mentioned this issue Jul 12, 2017

Change basename for proto tensor naming to avoid name collision #925 (closed)
