
[dnn] Unknown layer type Square in op netvlad/l2_normalize/Square #11223

Closed
DavideCatto opened this issue Apr 4, 2018 · 17 comments
@DavideCatto

System information (version)
  • OpenCV: master
  • OS: Windows 64-bit
  • Compiler: Visual Studio 2015
  • OpenCV module: dnn
Error description

While loading a TensorFlow model with cv::dnn::Net net = cv::dnn::readNetFromTensorflow(modelFile);
I get this error, related to the l2_normalize layer:

"Unknown layer type Square in op netvlad/l2_normalize/Square"

How can I solve it? Can I merge or define a layer, as is done for "FusedBatchNorm", to obtain the "L2Normalize" layer? I'm asking because I have seen that the populateNet function in tf_importer.cpp defines an "L2Normalize" layer, but execution never enters the L2Normalize branch because it looks for a "Square" op, which isn't defined.
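One possible workaround, sketched here as an assumption rather than a confirmed fix: recent OpenCV builds allow registering a custom layer from Python for ops the importer doesn't recognize. The SquareLayer class below is illustrative and covers only the Square op; the other ops of the l2_normalize subgraph (typically Sum, Maximum, Rsqrt, Mul) would need the same treatment:

import cv2 as cv

class SquareLayer(object):
    def __init__(self, params, blobs):
        pass

    def getMemoryShapes(self, inputs):
        # Element-wise op: the output shape equals the input shape.
        return [inputs[0]]

    def forward(self, inputs):
        # Square every element of the single input blob.
        return [inputs[0] * inputs[0]]

# Route every op of type 'Square' to the class above.
cv.dnn_registerLayer('Square', SquareLayer)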

@dkurt dkurt self-assigned this Apr 4, 2018
@DavideCatto
Author

Update
@dkurt thanks for the reply in #11023; I don't want to "spam" the OE-4 "OpenCV 4.0" discussion, so I'm answering here.
I'm trying to use the cv::dnn::NormalizeBBoxLayer class you suggested, but the problem occurs during cv::dnn::readNetFromTensorflow(modelFile). The l2_normalize layer in my model isn't the last layer, so I can't cut the model and use NormalizeBBoxLayer to normalize the network's result: after the L2 normalization my model performs further convolutions and so on.

The problem is that a layer like l2_normalize isn't supported, even though tf_importer.cpp defines an L2Normalize layer.
Thanks again for your support.

@dkurt
Member

dkurt commented Apr 5, 2018

@DavideCatto, could you please test your model with the changes from pull request #11236?

@DavideCatto
Author

@dkurt I have tested your pull request #11236 and the conversion of the l2_normalize layer seems to work; the problem occurs when it tries to recognize the l2_normalization type. In particular, the model calls tf.nn.l2_normalize with axis=1 and the error is "Cannot determine normalization axis".

I have checked that axis equals 1 before the if, but TensorFlow's data layout is channels-last, so I think execution doesn't enter the if section.

int axis = reductionIndices.at<int>(0);  // OK, axis == 1
if (data_layouts[name] == DATA_LAYOUT_NHWC && (axis == -1 || axis == 3) ||
    data_layouts[name] == DATA_LAYOUT_NCHW && axis == 1)
{
    ...
}
else
    CV_Error(Error::StsNotImplemented, "Cannot determine normalization axis");  // <-- error raised here

Thanks for your support. I think that for the moment l2_normalization along other axes isn't supported yet; I hope it will be implemented soon in a future version of OpenCV.

@dkurt
Member

dkurt commented Apr 6, 2018

@DavideCatto, please provide TensorFlow code. Do you mean something like this?

input = tf.placeholder(tf.float32, [1, 2, 3, 4])  # NHWC data layout
norm = tf.nn.l2_normalize(input, axis=1)

@DavideCatto
Author

DavideCatto commented Apr 6, 2018

@dkurt exactly like your code, a simple:
norm = tf.nn.l2_normalize(input, axis=1)

-- another unsupported layer --
I have also noticed that this "layer"/function:
descriptor = tf.expand_dims(conv_norm, axis=-1)
doesn't work. I think it's another unsupported layer (tf.expand_dims).

@dkurt
Member

dkurt commented Apr 9, 2018

@DavideCatto, just to clarify the operation: for the following input of size 1x2x3x4, where 4 is the number of channels and 2x3 is height x width:

[[[[ 0.08519531  0.98623365  0.918129    0.0386029 ]      # row: 0, col: 0
   [ 1.5385507   0.6286373  -0.40492147  2.2781415 ]      # row: 0, col: 1
   [ 0.88855684  0.81495595  0.5966554  -0.3921826 ]]     # row: 0, col: 2

  [[-0.03891217 -0.38194928 -0.60780287 -0.3312916 ]      # row: 1, col: 0
   [-1.8702366   0.8889074  -1.3220125  -0.29169777]      # row: 1, col: 1
   [-0.07340333  0.03783234 -2.1564784   0.40045223]]]]   # row: 1, col: 2

TensorFlow code

inp = tf.placeholder(tf.float32, [1, 2, 3, 4], 'input')
l2norm = tf.nn.l2_normalize(inp, axis=1)
save(inp, l2norm, 'l2_normalize')

produces the following output:

[[[[ 0.9096127   0.9325103   0.8338413   0.11573936]      # row: 0, col: 0
   [ 0.6353026   0.57740223 -0.29286224  0.99190205]      # row: 0, col: 1
   [ 0.9966053   0.9989242   0.2666619  -0.6996914 ]]     # row: 0, col: 2

  [[-0.41545722 -0.36114326 -0.5520043  -0.9932796 ]      # row: 1, col: 0
   [-0.7722633   0.81645983 -0.9561547  -0.12700512]      # row: 1, col: 1
   [-0.08232917  0.04637262 -0.9637902   0.71444523]]]]   # row: 1, col: 2

So it normalizes every column.

0.9096127**2 + (-0.41545722)**2 = 1 # col: 0, channel 0
0.6353026**2 + (-0.7722633)**2 = 1  # col: 1, channel 0
0.9966053**2 + (-0.08232917)**2 = 1  # col: 2, channel 0

0.9325103**2 + (-0.36114326)**2 = 1 # col: 0, channel 1
...

[image: pic1]
NOTE: the images here and below represent NCHW data layout.

However

inp = tf.placeholder(tf.float32, [1, 2, 3, 4], 'input')
l2norm = tf.nn.l2_normalize(inp, axis=[1, 2])
save(inp, l2norm, 'l2_normalize')

normalizes spatial slices:

[[[[ 0.03299117  0.5723221   0.32199275  0.01616837]
   [ 0.5957909   0.36480504 -0.14200813  0.9541725 ]
   [ 0.34408623  0.47292775  0.20925026 -0.16426101]]

  [[-0.01506841 -0.2216493  -0.21315973 -0.13875754]
   [-0.72423357  0.51584256 -0.46363688 -0.12217414]
   [-0.02842483  0.02195452 -0.7562885   0.16772465]]]]
0.03299117**2 + 0.5957909**2 + 0.34408623**2 + (-0.01506841)**2 + (-0.72423357)**2 + (-0.02842483)**2 = 1 # channel 0
0.5723221**2 + 0.36480504**2 + 0.47292775**2 + (-0.2216493)**2 + 0.51584256**2 + 0.02195452**2 = 1 # channel 1

[image: pic2]

To normalize a whole batch, you need to pass all the dimensions:

l2norm = tf.nn.l2_normalize(inp, axis=[1, 2, 3])
[[[[ 0.01758606  0.2035789   0.1895207   0.00796843]
   [ 0.3175885   0.12976366 -0.08358411  0.47025523]
   [ 0.1834164   0.16822366  0.12316194 -0.08095454]]

  [[-0.00803227 -0.07884219 -0.12546301 -0.06838539]
   [-0.3860553   0.18348876 -0.27289057 -0.06021241]
   [-0.01515196  0.00780937 -0.44514146  0.08266157]]]]

Here the sum of squares of all the values is equal to 1.
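A quick NumPy check of the three cases above (a hedged sketch; it ignores tf.nn.l2_normalize's epsilon guard):

import numpy as np

def l2_normalize(x, axes):
    # Divide by the L2 norm computed over the given axes,
    # as tf.nn.l2_normalize does (epsilon guard omitted).
    return x / np.sqrt(np.sum(np.square(x), axis=axes, keepdims=True))

x = np.random.standard_normal((1, 2, 3, 4)).astype(np.float32)
for axes in [(1,), (1, 2), (1, 2, 3)]:
    y = l2_normalize(x, axes)
    # The sum of squares over the normalized axes is ~1 everywhere.
    print(axes, np.allclose(np.sum(np.square(y), axis=axes), 1.0))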

Could you please specify what kind of transformation you want to implement?

@DavideCatto
Author

DavideCatto commented Apr 9, 2018

@dkurt
Problem 1
In my case the l2_normalization is used like this (using your dimensions, where 1 is the number of examples in the batch, so each example has dimensionality [2, 3, 4]):
inp = tf.placeholder(tf.float32, [1, 2, 3, 4], 'input')
conv_reshape = tf.reshape(inp, [-1, 2*3, 4], name='reshape') # conv_reshape is B x N x D --> 1x6x4
conv_norm = tf.nn.l2_normalize(conv_reshape, axis=1) # normalize descriptors
descriptor = tf.expand_dims(conv_norm, axis=-1, name='expanddim') # descriptor is B x N x D x 1

Every example in the batch [1, 2, 3, 4] is used as a [6, 4] set of descriptors, where 6 is the number of local descriptors for that example and each local descriptor's size is 4.

Maybe my problem is that the l2_normalization implementation assumes NHWC or NCHW input tensors and isn't generic for tensors of any shape, e.g. [1, 2, 3].

Problem 2
Furthermore, tf.expand_dims is necessary to obtain, after convolving the input [BxNxDx1] (the local descriptors associated with an example) with the kernel [1, D, 1, K] (D is the dimension of a local descriptor and K is the number of cluster centers), an output of [B x N x 1 x K] (for every descriptor, the "distance"/"similarity" to all K centers).

Update
I know that my l2_normalization is equivalent to the following (see the sketch below):

  1. Remove the Reshape and keep the [2, 3, 4] tensor
  2. Do the l2_normalize using axis=[1, 2]
  3. Reshape the tensor to get [6, 4]

The problem is that the output of the Conv described above, a tensor of [BxNx1xK], is passed through a series of operations and convolutions to obtain a tensor of [BxDxK]. Furthermore, there is an intra- and inter-normalization, so l2_normalize should work with tensors of different shapes.
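A minimal TF 1.x sketch of that equivalence (hedged; the sizes follow the example above, and the names a and b are illustrative):

import numpy as np
import tensorflow as tf

inp = tf.placeholder(tf.float32, [1, 2, 3, 4], 'input')

# Original form: reshape to B x N x D, then normalize over N.
a = tf.nn.l2_normalize(tf.reshape(inp, [-1, 2 * 3, 4]), axis=1)

# Equivalent form: normalize over the two axes that fold into N, then reshape.
b = tf.reshape(tf.nn.l2_normalize(inp, axis=[1, 2]), [-1, 2 * 3, 4])

with tf.Session() as sess:
    x = np.random.standard_normal([1, 2, 3, 4]).astype(np.float32)
    av, bv = sess.run([a, b], feed_dict={inp: x})
    print(np.max(np.abs(av - bv)))  # expected to be ~1e-7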

@dkurt
Member

dkurt commented Apr 10, 2018

@DavideCatto, may I ask you to test your model with the updated PR #11236?

@DavideCatto
Author

@dkurt Thank you for your work! I have tried it and it seems to work. Have you already tested with a simple model like this one?
input = tf.placeholder(tf.float32, [2, 3, 4], name="input")
output = tf.nn.l2_normalize(input, axis=1, name="output")  # axis=1 or axis=2 or axis=[1, 2]

I'm trying these examples right now; my model can now define the l2_normalize layer, but it has trouble with another unsupported layer: ExpandDims, which isn't implemented yet.

If you want, here is my test model. The input layer is named "input" with a size of (224, 224, 3), and the output, named "netvlad/l2_normalization_2", has a size of (8192).
output_graph.zip

@dkurt
Member

dkurt commented Apr 10, 2018

@DavideCatto, You may look at opencv/opencv_extra#454 for added tests.

Have you tried replacing tf.expand_dims with tf.reshape? The former is useful only if the input's sizes are unknown.
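For illustration, a hedged sketch of that replacement, using the shapes that appear later in this thread; when N and D are known at graph-construction time, the two ops produce identical tensors:

import tensorflow as tf

N, D = 14 * 14, 128
conv_norm = tf.placeholder(tf.float32, [1, N, D], 'conv_norm')

via_expand_dims = tf.expand_dims(conv_norm, axis=-1)  # B x N x D x 1
via_reshape = tf.reshape(conv_norm, [-1, N, D, 1])    # same B x N x D x 1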

@DavideCatto
Author

@dkurt, OK about the tests, my mistake, sorry.

For the second question: my problem is that tf.expand_dims is necessary to obtain, after convolving the input [BxNxDx1] (the local descriptors associated with an example) with the kernel [1, D, 1, K] (D is the dimension of a local descriptor and K is the number of cluster centers), an output of [B x N x 1 x K] (for every descriptor, the "distance"/"similarity" to all K centers).

So it's there to obtain, from [BxNxDx1] conv2d [1,D,1,K], an output of [BxNx1xK]; I can't use reshape, it's needed for dimensional coherence of the matrices.

@dkurt
Member

dkurt commented Apr 10, 2018

@DavideCatto, please add a code example.

@DavideCatto
Author

DavideCatto commented Apr 10, 2018

@dkurt Of course:

input = tf.placeholder(tf.float32, [1, 14, 14, 128], name="input")
conv_reshape = tf.reshape(input, shape = [-1, (14*14), 128], name='reshape') # conv_reshape is BxNxD
conv_norm = tf.nn.l2_normalize(conv_reshape, axis=1)
descriptor = tf.expand_dims(conv_norm, axis=-1, name='expanddim')  # descriptor is B x N x D x 1
conv_vlad = tf.nn.convolution(descriptor, filt, padding='VALID')  # conv_vlad is B x N x 1 x K
bias = tf.nn.bias_add(conv_vlad, conv_biases)
a_k = tf.nn.softmax(tf.squeeze(bias, axis=2), axis=-1, name="vlad_softmax")  # a_k is B x N x K
V1 = tf.matmul(conv_reshape, a_k, transpose_a=True)  # V_1 is B x D x K
V2 = tf.multiply(tf.reduce_sum(a_k, axis=1, keepdims=True), centers)  # V_2 is B x D x K
V = tf.subtract(V1, V2)
norm = tf.nn.l2_normalize(tf.reshape(tf.nn.l2_normalize(V, axis=1), shape=[-1, 128 * 64]), axis=1)  # norm is B x (D x K)

where:
filt is [1, D, 1, K], in this case [1, 128, 1, 64]
conv_biases is [K], in this case [64]
centers is [D, K], in this case [128, 64]
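To run the snippet above standalone, the three free variables could be defined like this (a hedged sketch with random values; the real model carries trained weights):

import tensorflow as tf

D, K = 128, 64
filt = tf.Variable(tf.random_normal([1, D, 1, K]), name='filt')
conv_biases = tf.Variable(tf.random_normal([K]), name='conv_biases')
centers = tf.Variable(tf.random_normal([D, K]), name='centers')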

@DavideCatto
Author

@dkurt any update?

@dkurt
Member

dkurt commented Apr 19, 2018

@DavideCatto, the problem is that we are trying to focus on image processing, and it's hard to support general cases like that. I'll try to make this particular example work, though.

@dkurt
Member

dkurt commented Jun 25, 2018

@DavideCatto, this is a simplified part of your sample, which can work with the changes from PR #11826:

import tensorflow as tf
import cv2 as cv
import numpy as np

np.random.seed(324)

D = 128
K = 64
filt = tf.Variable(tf.random_normal([1, D, 1, K]), name='filt')
conv_biases = tf.Variable(tf.random_normal([K]), name='conv_biases')

input = tf.placeholder(tf.float32, [1, 14*14, D, 1], name="input")
conv_norm = tf.nn.l2_normalize(input, axis=1)
conv_vlad = tf.nn.convolution(conv_norm, filt, padding='VALID', name='output')  # conv_vlad is B x N x 1 x K
bias = tf.nn.bias_add(conv_vlad, conv_biases)

a_k = tf.nn.softmax(tf.squeeze(bias, axis=2), axis=-1)  # a_k is B x N x K

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    inputData = np.random.standard_normal([1, 14*14, D, 1]).astype(np.float32)
    tfOut = sess.run(sess.graph.get_tensor_by_name('output:0'),
                     feed_dict={'input:0': inputData})

    graph_def = sess.graph.as_graph_def()
    graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, ['output'])
    with tf.gfile.FastGFile('graph.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())

net = cv.dnn.readNet('graph.pb')
net.setInput(inputData.transpose(0, 3, 1, 2))
cvOut = net.forward()
print(np.max(np.abs(cvOut - tfOut.transpose(0, 3, 1, 2))))

Output difference is about 1e-6.

As you may see, it receives already-reshaped data. The untrainable matmul that follows is not supported for now; you may perform it outside the network, multiplying its input by its output.
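A hedged NumPy sketch of that workaround (names follow the earlier snippet; random data stands in for the tensors taken from the network):

import numpy as np

B, N, D, K = 1, 14 * 14, 128, 64
conv_reshape = np.random.standard_normal((B, N, D)).astype(np.float32)  # matmul input
a_k = np.random.standard_normal((B, N, K)).astype(np.float32)           # softmax output

# tf.matmul(conv_reshape, a_k, transpose_a=True) performed outside the network:
V1 = np.matmul(conv_reshape.transpose(0, 2, 1), a_k)  # B x D x K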

@DavideCatto
Author

@dkurt OK, thank you for this solution; it's a bit different from the original, but I think it works for my purpose. Thank you again!
