Caffe Translator Error: Convolution Layer #468
I've had conversion issues too and haven't been able to resolve them yet. I tried a GoogleNet model, which doesn't seem to work in Caffe2, but works great in Caffe. Input is a single color image [1, 3, 224, 224].
|
Hi, @milewis1 - thanks for your other PRs, which allowed me to get as far as I have in trying to convert this ResNet model (#472). After applying them, I get the same error you have here. I was able to get past it by sending a tensor with only 1 dimension instead of the required 4, only to still fail a few lines of code later. However, in the other issue thread, @KeyKy gave an example of how he edited the prototxt file. I tried my best to edit the ResNet50 prototxt in a similar way (more complicated than I first thought). So I guess there must also be something wrong with the way the translator converts the input of convolution layers in general, and/or it does something wrong when they are connected to a BatchNorm layer. |
@littleowl I looked into the SpatialBN source code and found that scale is a tensor of size C, but the PR sets it to 1. So in my code I set it to C (the full code is quoted in the reply below). It does not give me NaN with my prototxt. |
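To make the shape requirement concrete, here is a small numpy sketch of what a SpatialBN-style normalization computes in NCHW layout, and why the scale blob needs one entry per channel (size C) rather than a single element. The function name `spatial_bn_nchw` is illustrative, not the Caffe2 API:

```python
import numpy as np

def spatial_bn_nchw(x, scale, bias, mean, var, eps=1e-5):
    # x: [N, C, H, W]; scale, bias, mean, var: [C]
    # Reshape per-channel params to [1, C, 1, 1] so they broadcast over N, H, W.
    s = scale.reshape(1, -1, 1, 1)
    b = bias.reshape(1, -1, 1, 1)
    m = mean.reshape(1, -1, 1, 1)
    v = var.reshape(1, -1, 1, 1)
    return s * (x - m) / np.sqrt(v + eps) + b

x = np.random.randn(1, 3, 4, 4)

# Caffe's BatchNorm stores a single scale factor; tiling it out to C
# entries mirrors the np.tile fix described in this comment.
caffe_scale = np.array([1.0])
scale = np.tile(caffe_scale, (x.shape[1],))  # shape (3,), one entry per channel

out = spatial_bn_nchw(x, scale, np.zeros(3),
                      x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3)))
print(out.shape)  # (1, 3, 4, 4)
```

With a size-1 scale blob the reshape to [1, C, 1, 1] is impossible for C > 1, which is why the translated model misbehaves until the blob is tiled to size C.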
Thanks for the update. I will check that out tomorrow and update the PR as
needed.
-- Mike
On Wed, May 3, 2017 at 10:16 PM 康洋 ***@***.***> wrote:
@littleowl <https://github.com/littleowl> I looked into the SpatialBN source code and found:
```cpp
.Input(
    1,
    "scale",
    "The scale as a 1-dimensional tensor of size C to be applied to the "
    "output.")
```
scale is a tensor of size C, but the PR sets it to 1. So in my code, I set it to C. My code:
```python
@TranslatorRegistry.Register("BatchNorm")
def TranslateBatchNorm(layer, pretrained_blobs, is_test):
    caffe_op = BaseTranslate(layer, "SpatialBN")
    output = caffe_op.output[0]
    param = layer.batch_norm_param
    AddArgument(caffe_op, "is_test", is_test)
    AddArgument(caffe_op, "epsilon", param.eps)
    AddArgument(caffe_op, "order", "NCHW")
    caffe_op.input.extend(
        [output + "_scale", output + "_bias", output + "_mean", output + "_var"])
    if not is_test:
        caffe_op.output.extend(
            [output + "_mean", output + "_var",
             output + "_saved_mean", output + "_saved_var"])
    n_channels = pretrained_blobs[0].shape[0]  # get C
    mean = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[0], output + '_mean')
    var = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[1], output + '_var')
    pretrained_blobs[2] = np.tile(pretrained_blobs[2], (n_channels,))  # set C
    scale = utils.NumpyArrayToCaffe2Tensor(pretrained_blobs[2], output + '_scale')
    # Create a zero bias array the same size as the scale; we'll let the following
    # Scale (Mul + Add operators in Caffe2) layer handle any bias, just like Caffe.
    bias = utils.NumpyArrayToCaffe2Tensor(
        np.zeros_like(pretrained_blobs[2]), output + '_bias')
    return caffe_op, [scale, bias, mean, var]
```
It does not give me NaN with my prototxt.
|
@milewis1 were you able to find a fix for this? I'm running into the exact same problem trying to get a ResNet50 translated from a Caffe model to a Caffe2 one. I already applied the fix from PR #469 that you proposed, which helped with the BatchNorm, but now I'm running into this issue with the convolution layer. |
I think I might have found an issue. If you print each tensor's name and dims inside ConvertTensorProtosToInitNet (the marked line below):
```python
def ConvertTensorProtosToInitNet(net_params, input_name):
    init_net = caffe2_pb2.NetDef()
    for tensor in net_params.protos:
        print tensor.name, list(tensor.dims)  # <--- this line
        if len(tensor.float_data) == 0:
            raise RuntimeError("Only float tensors are supported in this util.")
        op = core.CreateOperator(
            "GivenTensorFill", [], [tensor.name],
            arg=[
                utils.MakeArgument("shape", list(tensor.dims)),
                utils.MakeArgument("values", tensor.float_data)])
        init_net.op.extend([op])
    init_net.op.extend(
        [core.CreateOperator("ConstantFill", [], [input_name], shape=[1])])
    return init_net
```
you will see in the output that two tensors with the same name show up: first with the right number of dimensions (4), and then with just 1.
A hack I did to see if this was the issue was to keep track of the tensors that had already been seen in that loop.
|
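The hack described above can be sketched roughly as follows. This uses plain (name, dims) tuples in place of the real caffe2 TensorProto objects, so it is illustrative rather than the actual translator code: keep a set of names already emitted and skip later duplicates, so the bogus 1-d repeat does not overwrite the correct 4-d filter.

```python
def dedup_tensors(protos):
    """Keep only the first tensor seen for each name.

    `protos` is a list of (name, dims) tuples standing in for caffe2
    TensorProtos; the first occurrence carries the correct 4-d shape,
    the later duplicate only a single dim.
    """
    seen = set()
    kept = []
    for name, dims in protos:
        if name in seen:
            continue  # skip the duplicate (wrong-shaped) entry
        seen.add(name)
        kept.append((name, dims))
    return kept

protos = [
    ("conv1_w", [64, 3, 7, 7]),  # correct 4-d filter
    ("conv1_w", [64]),           # duplicate produced by the in-place Scale layer
    ("conv1_b", [64]),
]
print(dedup_tensors(protos))  # [('conv1_w', [64, 3, 7, 7]), ('conv1_b', [64])]
```

This only masks the symptom; the real fix, described in the next comments, is to avoid emitting the duplicate tensor in the first place.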
OK, so it seems like the issue I mentioned in the previous comment has to do with the Scale layer being done in-place on the network I'm translating (ResNet50). In the function that translates the Scale layer, I changed `output = mul_op.output[0]` to `output = layer.name`. Right now the network runs without errors, but I still need to see if the output I get is correct. |
OK, tested and seems like the network works fine now. I'll submit a PR with the change. |
Good to know. I was just catching up on this. When I initially had the
problem, I wound up having to set each of my layers to have defined inputs
and outputs, i.e. not in-place. That solved the issue for me.
On Tue, Jul 11, 2017 at 4:37 PM Daniel Hauagge ***@***.***> wrote:
OK, so it seems like the issue I mentioned in the previous comment has to
do with the Scale layer being done in-place on the network I'm translating
(ResNet50).
|
I have the same problem:
```
RuntimeError Traceback (most recent call last)
RuntimeError: [enforce fail at tensor.h:671] i < dims_.size(). 0 vs 0. Exceeding ndim limit Error from operator:
```
|
Have you resolved them? @ARSwhut |
@ARSwhut Have you resolved it? |
When I initially encountered this problem, the solution was to update the
Caffe prototxt so that there were no inplace layers.
On Tue, Dec 26, 2017 at 2:26 AM wm10240 ***@***.***> wrote:
@ARSwhut <https://github.com/arswhut> Have you resolved it?
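What "no in-place layers" means concretely: give every layer a unique top name and rewrite later bottoms to match. A minimal sketch over a simplified layer representation (dicts instead of the caffe.proto LayerParameter, so this is purely illustrative of the renaming logic):

```python
def remove_inplace(layers):
    """Rename in-place tops (top == bottom) to the layer name and
    propagate the rename to every later bottom reference."""
    rename = {}  # original blob name -> latest renamed version
    out = []
    for layer in layers:
        layer = dict(layer)
        inplace = layer["top"] in layer["bottom"]  # declared in-place?
        layer["bottom"] = [rename.get(b, b) for b in layer["bottom"]]
        if inplace:
            rename[layer["top"]] = layer["name"]  # later readers see the new name
            layer["top"] = layer["name"]          # unique top = layer name
        else:
            rename.pop(layer["top"], None)  # a fresh definition resets the rename
        out.append(layer)
    return out

# A ResNet-style in-place chain: conv -> BN -> Scale -> ReLU, all writing "conv1".
net = [
    {"name": "conv1",       "bottom": ["data"],  "top": "conv1"},
    {"name": "bn_conv1",    "bottom": ["conv1"], "top": "conv1"},  # in-place
    {"name": "scale_conv1", "bottom": ["conv1"], "top": "conv1"},  # in-place
    {"name": "relu1",       "bottom": ["conv1"], "top": "conv1"},  # in-place
]
for l in remove_inplace(net):
    print(l["name"], l["bottom"], "->", l["top"])
```

After the rewrite the chain becomes data -> conv1 -> bn_conv1 -> scale_conv1 -> relu1, with every blob name unique, which is the prototxt shape the translator handles correctly.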
|
@danielhauagge thanks for your solution! It works fine now! |
I have a Caffe model that I'm trying to translate into Caffe2. However, I'm running across the following error on the first operator:
```
RuntimeError: [enforce fail at conv_op_impl.h:25] X.ndim() == filter.ndim(). 4 vs 1 Error from operator:
input: "data" input: "conv1_w" input: "conv1_b" output: "conv1" type: "Conv" arg { name: "stride" i: 2 } arg { name: "pad" i: 3 } arg { name: "kernel" i: 7 }
```
The original Caffe model starts like this:
```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape { dim: 1 dim: 3 dim: 224 dim: 224 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 3
    kernel_size: 7
    stride: 2
  }
}
```
My input is a single color image whose shape going in is: [1, 3, 224, 224]. Has anyone tried to do something similar?
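The "4 vs 1" in the error message means the translator handed Conv a 1-d weight blob where a 4-d filter of shape [num_output, channels, kH, kW] was expected. A quick numpy sanity check along those lines (`check_conv_filter` is a hypothetical helper that mimics the assertion, not Caffe2 code):

```python
import numpy as np

def check_conv_filter(x, w):
    """Mimic the conv_op_impl.h check: input and filter must have
    the same number of dimensions (4 for NCHW convolution)."""
    if x.ndim != w.ndim:
        raise RuntimeError(
            "X.ndim() == filter.ndim(). %d vs %d" % (x.ndim, w.ndim))
    return True

x = np.zeros((1, 3, 224, 224))    # the [1, 3, 224, 224] input above
good_w = np.zeros((64, 3, 7, 7))  # what conv1_w should look like for this layer
bad_w = np.zeros((64,))           # what the broken translation produced

check_conv_filter(x, good_w)      # passes
try:
    check_conv_filter(x, bad_w)
except RuntimeError as e:
    print(e)                      # X.ndim() == filter.ndim(). 4 vs 1
```

Checking the shapes of the translated init-net blobs against this expectation is a quick way to confirm whether the duplicate-tensor / in-place-layer problem discussed above is what you are hitting.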