
iteration_limit hit in OpRegularizerManager #35

Closed
VashishtMadhavan opened this issue Apr 26, 2019 · 5 comments

Comments

@VashishtMadhavan

I have a situation in which the ITERATION_LIMIT is hit even though len(self._all_ops) is only about 3000 in OpRegularizerManager. It seems that certain ops keep cycling back onto self._op_deque. Is there a specific reason this happens? More generally, why are the same ops processed multiple times, even when they have the same type and handler?

In a simple LeNet training example, I go through 35 iterations (Itr 0–34) of assign_grouping even though there are far fewer ops to process:

Itr: 0 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 1 Op: base/conv1/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 2 Op: base/conv1/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 3 Op: base/conv1/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 4 Op: Placeholder Type: Placeholder Handler: grouping_op_handler
Itr: 5 Op: base/conv1/Relu Type: Relu Handler: grouping_op_handler
Itr: 6 Op: base/pool1/MaxPool Type: MaxPool Handler: grouping_op_handler
Itr: 7 Op: base/conv2/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 8 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 9 Op: base/conv2/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 10 Op: base/conv2/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 11 Op: base/conv2/Relu Type: Relu Handler: grouping_op_handler
Itr: 12 Op: base/pool2/MaxPool Type: MaxPool Handler: grouping_op_handler
Itr: 13 Op: base/conv3/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 14 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 15 Op: base/conv3/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 16 Op: base/conv3/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 17 Op: base/conv3/Relu Type: Relu Handler: grouping_op_handler
Itr: 18 Op: base/conv_output/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 19 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
Itr: 20 Op: base/conv_output/biases/read Type: Identity Handler: grouping_op_handler
Itr: 21 Op: base/conv_output/biases Type: VariableV2 Handler: grouping_op_handler
Itr: 22 Op: Mean Type: Mean Handler: grouping_op_handler
Itr: 23 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 24 Op: base/conv1/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 25 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 26 Op: base/conv2/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 27 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 28 Op: base/conv3/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 29 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
Itr: 30 Op: base/conv_output/biases/read Type: Identity Handler: grouping_op_handler
Itr: 31 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 32 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 33 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 34 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
@VashishtMadhavan
Author

Also, here is the model I am considering for the example above:

import tensorflow as tf
slim = tf.contrib.slim

def fully_conv_model(x_ph, is_training, scope, channels=[32, 64, 64], reuse=False):
    norm_params = {'is_training': is_training, 'scale': True, 'center': False}
    # Network Definition
    with tf.variable_scope(scope, reuse=reuse):
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      normalizer_fn=slim.batch_norm,
                      normalizer_params=norm_params,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
            conv1 = slim.conv2d(x_ph, num_outputs=channels[0], kernel_size=3, scope='conv1')
            pool1 = slim.max_pool2d(conv1, kernel_size=2, scope='pool1')
            conv2 = slim.conv2d(pool1, num_outputs=channels[1], kernel_size=3, scope='conv2')
            pool2 = slim.max_pool2d(conv2, kernel_size=2, scope='pool2')
            conv3 = slim.conv2d(pool2, num_outputs=channels[2], kernel_size=3, scope='conv3')
            out = slim.conv2d(conv3, num_outputs=10, kernel_size=7, padding='VALID', normalizer_fn=None, normalizer_params=None,
                activation_fn=None, scope='conv_output')
    reshape_out = tf.reduce_mean(out, [1, 2], keepdims=False)
    pred = tf.argmax(reshape_out, axis=1)
    return reshape_out, pred

@ayp-google
Collaborator

Hi Vashisht,

OpRegularizerManager makes several passes over the ops to determine the grouping* of the channels. If you hit the ITERATION_LIMIT, that means the manager ran into some configuration of ops that it did not know how to handle. Looking at the model, I suspect the issue is with the reduce_mean and argmax at the end. There are a couple options to try:

  1. Tell MorphNet to start the analysis at "out" instead of reshape_out (see the sketch after this list). You don't need to regularize the reduce_mean and argmax anyway, so there is no harm in skipping them.
  2. Add an OpHandler for reduce_mean and argmax. I think output_non_passthrough_op_handler should work in this case. That handler is for ops where the input channel count and the output channel count do not map directly (e.g. convolutions set num_outputs explicitly). By default, the manager tries to map the input channels to the output channels (e.g. the Identity op) and fails if the channel counts don't match.
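
For option 1, here is a minimal sketch. It assumes fully_conv_model is modified to also return out (the conv_output tensor, before the reduce_mean), and it uses the GammaFlopsRegularizer constructor that appears in the tracebacks below; the exact signature may differ between MorphNet versions:

from morph_net.network_regularizers import flop_regularizer

# Hypothetical: the model function now also returns `out`.
out, reshape_out, pred = fully_conv_model(x_ph, is_training=True, scope='base')

# Start the channel analysis at the last conv. The reduce_mean and argmax
# ops are never visited, so they need no handlers.
network_regularizer = flop_regularizer.GammaFlopsRegularizer(
    [out.op], gamma_threshold=1e-3)
regularization_loss = network_regularizer.get_regularization_term()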

Hopefully that helps.

  * In MorphNet, the concept of grouping refers to the fact that some channel counts are constrained by the network architecture. For example, consider out = conv3 + concat(conv1, conv2). If conv1 has 3 output channels and conv2 has 5 output channels, then conv3 must have 8 output channels for the sizes to be consistent. If MorphNet finds that channel_4 of conv3 could be removed, this is contingent on also removing channel_1 of conv2 to keep the sizes consistent. Conceptually, this isn't so complicated, but mechanically it can be difficult to analyze the grouping for complex networks. This is why the manager makes several passes over the ops.
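
  Here is a minimal, illustrative TF 1.x sketch of that constraint (shapes and scope names are made up for the example):

  import tensorflow as tf
  slim = tf.contrib.slim

  x = tf.placeholder(tf.float32, [1, 8, 8, 16])
  conv1 = slim.conv2d(x, num_outputs=3, kernel_size=3, scope='conv1')
  conv2 = slim.conv2d(x, num_outputs=5, kernel_size=3, scope='conv2')
  # conv3 must have 3 + 5 = 8 output channels, or the add below is invalid;
  # its channels are therefore grouped with those of conv1 and conv2.
  conv3 = slim.conv2d(x, num_outputs=8, kernel_size=3, scope='conv3')
  out = conv3 + tf.concat([conv1, conv2], axis=3)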

@wing02

wing02 commented May 6, 2019

I used it on a BERT model and got the same error.

  File "bert/run_pretraining.py", line 202, in model_fn
    [masked_lm_logits.op, next_sentence_logits.op], threshold=1e-3)
  File "morphnet/common/morph_net/network_regularizers/flop_regularizer.py", line 153, in __init__
    regularizer_blacklist=regularizer_blacklist)
  File "model-compression/morphnet/common/morph_net/framework/op_regularizer_manager.py", line 132, in __init__
    ['%s (%s)' % (o.name, o.type) for o in self._op_deque])
RuntimeError: OpRegularizerManager could not handle ops: ['cls/predictions/transform/LayerNorm/moments/variance (Mean)', 'cls/predictions/transform/LayerNorm/batchnorm/add (Add)', 'cls/predictions/transform/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'cls/predictions/transform/LayerNorm/batchnorm/mul (Mul)', 'cls/predictions/transform/LayerNorm/batchnorm/mul_1 (Mul)', 'GatherV2 (GatherV2)', 'bert/embeddings/LayerNorm/moments/mean (Mean)', 'bert/embeddings/LayerNorm/moments/StopGradient (StopGradient)', 'bert/embeddings/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/embeddings/LayerNorm/batchnorm/add_1 (Add)', 'bert/embeddings/LayerNorm/batchnorm/sub (Sub)', 'bert/embeddings/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/embeddings/LayerNorm/moments/variance (Mean)', 'bert/embeddings/LayerNorm/batchnorm/add (Add)', 'bert/embeddings/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/embeddings/LayerNorm/batchnorm/mul (Mul)', 'bert/embeddings/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/embeddings/one_hot (OneHot)', 'bert/encoder/layer_0/attention/self/MatMul (BatchMatMul)', 'bert/encoder/layer_0/attention/self/MatMul_1 (BatchMatMul)', 'bert/encoder/layer_0/attention/self/Softmax (Softmax)', 'bert/encoder/layer_0/attention/self/add (Add)', 'bert/encoder/layer_0/attention/self/Mul (Mul)', 'bert/encoder/layer_0/output/LayerNorm/moments/mean (Mean)', 'bert/encoder/layer_0/output/LayerNorm/moments/StopGradient (StopGradient)', 'bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1 (Add)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/sub (Sub)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/encoder/layer_0/output/LayerNorm/moments/variance (Mean)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/add (Add)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul (Mul)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/encoder/layer_0/output/add (Add)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/mean (Mean)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/StopGradient (StopGradient)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/variance (Mean)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add (Add)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub (Sub)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1 (Add)', 'cls/predictions/transform/LayerNorm/moments/mean (Mean)', 'cls/predictions/transform/LayerNorm/moments/StopGradient (StopGradient)', 'cls/predictions/transform/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'cls/predictions/transform/LayerNorm/batchnorm/add_1 (Add)', 'cls/predictions/transform/LayerNorm/batchnorm/sub (Sub)', 'cls/predictions/transform/LayerNorm/batchnorm/mul_2 (Mul)']

Does anyone know how to fix it?

@eaoibng

eaoibng commented May 14, 2019

> I used it on a BERT model and got the same error. […] Does anyone know how to fix it?

Hello, I got the same error. Did you resolve it?

@wing02

wing02 commented May 15, 2019

> > I used it on a BERT model and got the same error. […] Does anyone know how to fix it?
>
> Hello, I got the same error. Did you resolve it?

No. I just gave up on using MorphNet on BERT.
