
iteration_limit hit in OpRegularizerManager #35

Closed
VashishtMadhavan opened this issue Apr 26, 2019 · 5 comments

Comments

@VashishtMadhavan

I have a situation in which the ITERATION_LIMIT is hit even though len(self._all_ops) is only about 3000 in OpRegularizerManager. It seems that certain ops keep cycling back onto self._op_deque. Is there a specific reason this happens? More generally, why are the same ops processed multiple times, even when they have the same type and handler?

In a simple LeNet training example, I go through 35 iterations (Itr 0–34) of assign_grouping even though there are far fewer ops to process:

Itr: 0 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 1 Op: base/conv1/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 2 Op: base/conv1/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 3 Op: base/conv1/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 4 Op: Placeholder Type: Placeholder Handler: grouping_op_handler
Itr: 5 Op: base/conv1/Relu Type: Relu Handler: grouping_op_handler
Itr: 6 Op: base/pool1/MaxPool Type: MaxPool Handler: grouping_op_handler
Itr: 7 Op: base/conv2/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 8 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 9 Op: base/conv2/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 10 Op: base/conv2/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 11 Op: base/conv2/Relu Type: Relu Handler: grouping_op_handler
Itr: 12 Op: base/pool2/MaxPool Type: MaxPool Handler: grouping_op_handler
Itr: 13 Op: base/conv3/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 14 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 15 Op: base/conv3/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 16 Op: base/conv3/BatchNorm/gamma Type: VariableV2 Handler: grouping_op_handler
Itr: 17 Op: base/conv3/Relu Type: Relu Handler: grouping_op_handler
Itr: 18 Op: base/conv_output/Conv2D Type: Conv2D Handler: output_non_passthrough_op_handler
Itr: 19 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
Itr: 20 Op: base/conv_output/biases/read Type: Identity Handler: grouping_op_handler
Itr: 21 Op: base/conv_output/biases Type: VariableV2 Handler: grouping_op_handler
Itr: 22 Op: Mean Type: Mean Handler: grouping_op_handler
Itr: 23 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 24 Op: base/conv1/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 25 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 26 Op: base/conv2/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 27 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 28 Op: base/conv3/BatchNorm/gamma/read Type: Identity Handler: grouping_op_handler
Itr: 29 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
Itr: 30 Op: base/conv_output/biases/read Type: Identity Handler: grouping_op_handler
Itr: 31 Op: base/conv1/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 32 Op: base/conv2/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 33 Op: base/conv3/BatchNorm/FusedBatchNorm Type: FusedBatchNorm Handler: batch_norm_source_op_handler
Itr: 34 Op: base/conv_output/BiasAdd Type: BiasAdd Handler: grouping_op_handler
@VashishtMadhavan
Author

Also, here is the model I am considering for the example above:

import tensorflow as tf
slim = tf.contrib.slim

def fully_conv_model(x_ph, is_training, scope, channels=[32, 64, 64], reuse=False):
    norm_params = {'is_training': is_training, 'scale': True, 'center': False}
    # Network Definition
    with tf.variable_scope(scope, reuse=reuse):
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      normalizer_fn=slim.batch_norm,
                      normalizer_params=norm_params,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
            conv1 = slim.conv2d(x_ph, num_outputs=channels[0], kernel_size=3, scope='conv1')
            pool1 = slim.max_pool2d(conv1, kernel_size=2, scope='pool1')
            conv2 = slim.conv2d(pool1, num_outputs=channels[1], kernel_size=3, scope='conv2')
            pool2 = slim.max_pool2d(conv2, kernel_size=2, scope='pool2')
            conv3 = slim.conv2d(pool2, num_outputs=channels[2], kernel_size=3, scope='conv3')
            out = slim.conv2d(conv3, num_outputs=10, kernel_size=7, padding='VALID', normalizer_fn=None, normalizer_params=None,
                activation_fn=None, scope='conv_output')
    reshape_out = tf.reduce_mean(out, [1, 2], keepdims=False)
    pred = tf.argmax(reshape_out, axis=1)
    return reshape_out, pred

@ayp-google
Collaborator

Hi Vashisht,

OpRegularizerManager makes several passes over the ops to determine the grouping* of the channels. If you hit the ITERATION_LIMIT, that means the manager ran into some configuration of ops that it did not know how to handle. Looking at the model, I suspect the issue is with the reduce_mean and argmax at the end. There are a couple options to try:

  1. Tell MorphNet to start the analysis at "out" instead of reshape_out (see the sketch after this list). You don't need to regularize the reduce_mean and argmax anyway, so there is no harm in skipping them.
  2. Add an OpHandler for reduce_mean and argmax. I think output_non_passthrough_op_handler should work in this case. That handler is for ops where the input channel count and the output channel count do not map directly (e.g. convolutions set num_outputs explicitly). By default, the manager tries to map the input channels to the output channels (e.g. the Identity op) and fails if the channel counts don't match.
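
For option 1, here is a minimal sketch. It assumes fully_conv_model is modified to also return out (the conv_output tensor, before the reduce_mean), and it uses the GammaFlopsRegularizer constructor that appears in the tracebacks below; the exact signature may differ between MorphNet versions:

from morph_net.network_regularizers import flop_regularizer

# Hypothetical: the model function now also returns `out`.
out, reshape_out, pred = fully_conv_model(x_ph, is_training=True, scope='base')

# Start the channel analysis at the last conv. The reduce_mean and argmax
# ops are never visited, so they need no handlers.
network_regularizer = flop_regularizer.GammaFlopsRegularizer(
    [out.op], gamma_threshold=1e-3)
regularization_loss = network_regularizer.get_regularization_term()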

Hopefully that helps.

  * In MorphNet, the concept of grouping refers to the fact that some channel counts are constrained by the network architecture. For example, consider out = conv3 + concat(conv1, conv2). If conv1 has 3 output channels and conv2 has 5 output channels, then conv3 must have 8 output channels for the sizes to be consistent. If MorphNet finds that channel_4 of conv3 could be removed, this is contingent on also removing channel_1 of conv2 to keep the sizes consistent. Conceptually, this isn't so complicated, but mechanically it can be difficult to analyze the grouping for complex networks. This is why the manager makes several passes over the ops.
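
  Here is a minimal, illustrative TF 1.x sketch of that constraint (shapes and scope names are made up for the example):

  import tensorflow as tf
  slim = tf.contrib.slim

  x = tf.placeholder(tf.float32, [1, 8, 8, 16])
  conv1 = slim.conv2d(x, num_outputs=3, kernel_size=3, scope='conv1')
  conv2 = slim.conv2d(x, num_outputs=5, kernel_size=3, scope='conv2')
  # conv3 must have 3 + 5 = 8 output channels, or the add below is invalid;
  # its channels are therefore grouped with those of conv1 and conv2.
  conv3 = slim.conv2d(x, num_outputs=8, kernel_size=3, scope='conv3')
  out = conv3 + tf.concat([conv1, conv2], axis=3)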

@wing02

wing02 commented May 6, 2019

I used it on a BERT model and got the same error.

  File "bert/run_pretraining.py", line 202, in model_fn
    [masked_lm_logits.op, next_sentence_logits.op], threshold=1e-3)
  File "morphnet/common/morph_net/network_regularizers/flop_regularizer.py", line 153, in __init__
    regularizer_blacklist=regularizer_blacklist)
  File "model-compression/morphnet/common/morph_net/framework/op_regularizer_manager.py", line 132, in __init__
    ['%s (%s)' % (o.name, o.type) for o in self._op_deque])
RuntimeError: OpRegularizerManager could not handle ops: ['cls/predictions/transform/LayerNorm/moments/variance (Mean)', 'cls/predictions/transform/LayerNorm/batchnorm/add (Add)', 'cls/predictions/transform/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'cls/predictions/transform/LayerNorm/batchnorm/mul (Mul)', 'cls/predictions/transform/LayerNorm/batchnorm/mul_1 (Mul)', 'GatherV2 (GatherV2)', 'bert/embeddings/LayerNorm/moments/mean (Mean)', 'bert/embeddings/LayerNorm/moments/StopGradient (StopGradient)', 'bert/embeddings/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/embeddings/LayerNorm/batchnorm/add_1 (Add)', 'bert/embeddings/LayerNorm/batchnorm/sub (Sub)', 'bert/embeddings/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/embeddings/LayerNorm/moments/variance (Mean)', 'bert/embeddings/LayerNorm/batchnorm/add (Add)', 'bert/embeddings/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/embeddings/LayerNorm/batchnorm/mul (Mul)', 'bert/embeddings/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/embeddings/one_hot (OneHot)', 'bert/encoder/layer_0/attention/self/MatMul (BatchMatMul)', 'bert/encoder/layer_0/attention/self/MatMul_1 (BatchMatMul)', 'bert/encoder/layer_0/attention/self/Softmax (Softmax)', 'bert/encoder/layer_0/attention/self/add (Add)', 'bert/encoder/layer_0/attention/self/Mul (Mul)', 'bert/encoder/layer_0/output/LayerNorm/moments/mean (Mean)', 'bert/encoder/layer_0/output/LayerNorm/moments/StopGradient (StopGradient)', 'bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1 (Add)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/sub (Sub)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/encoder/layer_0/output/LayerNorm/moments/variance (Mean)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/add (Add)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul (Mul)', 'bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/encoder/layer_0/output/add (Add)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1 (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/mean (Mean)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/StopGradient (StopGradient)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'bert/encoder/layer_0/attention/output/LayerNorm/moments/variance (Mean)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add (Add)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt (Rsqrt)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2 (Mul)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub (Sub)', 'bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1 (Add)', 'cls/predictions/transform/LayerNorm/moments/mean (Mean)', 'cls/predictions/transform/LayerNorm/moments/StopGradient (StopGradient)', 'cls/predictions/transform/LayerNorm/moments/SquaredDifference (SquaredDifference)', 'cls/predictions/transform/LayerNorm/batchnorm/add_1 (Add)', 'cls/predictions/transform/LayerNorm/batchnorm/sub (Sub)', 'cls/predictions/transform/LayerNorm/batchnorm/mul_2 (Mul)']

Does anyone know how to fix it?

@eaoibng

eaoibng commented May 14, 2019

> I used it on a BERT model and got the same error. […] Does anyone know how to fix it?

Hello, I got the same error. Did you resolve it?

@wing02

wing02 commented May 15, 2019

> > I used it on a BERT model and got the same error. […] Does anyone know how to fix it?
>
> Hello, I got the same error. Did you resolve it?

No. I just gave up on using MorphNet on BERT.
