Invalid argument: ConcatOp : Dimensions of inputs should match #1140

Closed
albertz opened this issue Oct 13, 2022 · 4 comments · Fixed by #1144

Comments


albertz commented Oct 13, 2022

TensorFlow exception: 2 root error(s) found.
  (0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [19,20,35,1024] vs. shape[2] = [19,8,35,1024]
 [[node output/rec/concat_am_att_lm_masked/concat_sources/concat (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:122) ]]
 [[objective/loss/loss/Sum/_599]]
  (1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [19,20,35,1024] vs. shape[2] = [19,8,35,1024]
 [[node output/rec/concat_am_att_lm_masked/concat_sources/concat (defined at /setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/basic.py:122) ]]

Full log and partial config is here.
I don't really know what the problem is yet.
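For reference, the underlying TF error itself is easy to reproduce standalone (a minimal sketch, not RETURNN-specific; with fully dynamic shapes, the check only happens at run time inside the ConcatOp kernel, just as here):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# All non-concat axes must match. With unknown static shapes,
# this is only checked at run time by the ConcatOp kernel.
a = tf.placeholder(tf.float32, (None, None, None, 1024))
b = tf.placeholder(tf.float32, (None, None, None, 1024))
c = tf.concat([a, b], axis=-1)

with tf.Session() as session:
    session.run(c, feed_dict={
        a: np.zeros((19, 20, 35, 1024), dtype="float32"),
        b: np.zeros((19, 8, 35, 1024), dtype="float32")})
# -> InvalidArgumentError: ConcatOp : Dimensions of inputs should match:
#    shape[0] = [19,20,35,1024] vs. shape[1] = [19,8,35,1024]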

In the feed dict, the Neg_1 looks suspicious:

Feed dict:
  <tf.Tensor 'Neg_1:0' shape=(?,) dtype=int32>: shape (35,), dtype int32, min/max 1/7, ([1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 7 1 1 1 1 1 1])
  <tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(35)
  <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 40) dtype=float32>: shape (35, 112, 40), dtype float32, min/max -4.506656/4.565903, mean/stddev -5.71718e-10/0.9186281, Data{'data', [B,T|'time'[B],F|F'audio'(40)]}
  <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (35,), dtype int32, min/max 53/112, ([ 61  67  72  81  83  54  90  91  76  93  95  96  97  98  99 100  93 101
 101 102 103 104 104 105 105  53 107 104 108 109 110 111 111 112 112])
  <tf.Tensor 'extern_data/placeholders/orth_classes/orth_classes:0' shape=(?, ?) dtype=int32>: shape (35, 7), dtype int32, min/max 0/444, Data{'orth_classes', [B,T|'out-spatial'[B]], dtype='int32', sparse_dim=Dim{F'vocab'(1030)}, available_for_inference=False}
  <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)

This could be some dim tag mix-up: the Neg_1 values (max 7) look like the orth_classes target lengths (shape (35, 7)) rather than pooled encoder lengths.

In another run, I also saw this:

NativeOp assertion failed: Ndarray_DIMS(X)[0] == T, 8 == 20
NativeOp assertion failed: Ndarray_DIMS(Y)[0] == T, 8 == 20
NativeOp assertion failed: Ndarray_DIMS(C)[0] == T, 8 == 20
NativeOp assertion failed: Ndarray_DIMS(H)[0] == T, 8 == 20

And:

...
layer /output(rec-subnet-output)/'label_emit_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'label_log_prob:feature-dense'(1030)] float32
layer /output(rec-subnet-output)/'output_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'(emit_prob0:feature-dense)+(label_log_prob:feature-dense)'(1031)] float32
layer /output(rec-subnet-output)/'rna_alignment': [B,T|'lstm0_pool:conv:s0'[B]] int32 sparse_dim=Dim{F'rna_alignment:sparse-dim'(1031)}
layer /output(rec-subnet-output)/'output_prob': [B] float32
Warning: assuming dim tags are same with different size placeholders: <tf.Tensor 'lstm0_pool/Neg_1:0' shape=(?,) dtype=int32> vs <tf.Tensor 'extern_data/placeholders/orth_classes/orth_classes_dim0_size:0' shape=(?,) dtype=int32>
Warning: assuming dim tags are same with different size placeholders: <tf.Tensor 'lstm0_pool/Neg_1:0' shape=(?,) dtype=int32> vs <tf.Tensor 'extern_data/placeholders/orth_classes/orth_classes_dim0_size:0' shape=(?,) dtype=int32>

Net construction log:

DEPRECATION WARNING: Missing "from" in layer definition: /source
This will be disallowed with behavior_version 1.
layer /'data': [B,T|'time'[B],F|F'audio'(40)] float32
layer /'source': [B,T|'time'[B],F|F'audio'(40)] float32
WARNING:tensorflow:From /u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py:2153: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'source0': [B,T|'time'[B],F'audio'(40),F|F'source0_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /'conv0': [B,T|'time'[B],F'audio'(40),F|F'conv0:channel'(32)] float32
layer /'conv0p': [B,T|'time'[B],'conv0p:conv:s1'(20),F|F'conv0:channel'(32)] float32
layer /'conv1': [B,T|'time'[B],'conv0p:conv:s1'(20),F|F'conv1:channel'(32)] float32
layer /'conv1p': [B,T|'time'[B],'conv1p:conv:s1'(10),F|F'conv1:channel'(32)] float32
layer /'conv_merged': [B,T|'time'[B],F|F'conv1p:conv:s1*conv1:channel'(320)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /'lstm0_fw': [T|'time'[B],B,F|F'lstm0_fw:feature'(512)] float32
WARNING:tensorflow:From /u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/util/basic.py:1416: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'lstm0_bw': [T|'time'[B],B,F|F'lstm0_bw:feature'(512)] float32
layer /'lstm0_pool': [B,T|'lstm0_pool:conv:s0'[B],F|F'lstm0_fw:feature+lstm0_bw:feature'(1024)] float32
layer /'lstm1_fw': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature'(512)] float32
layer /'lstm1_bw': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_bw:feature'(512)] float32
layer /'encoder0': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /'encoder': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /'enc_ctx0': [T|'lstm0_pool:conv:s0'[B],B,F|F'enc_ctx0:feature-dense'(200)] float32
layer /'enc_ctx_win': [B,T|'lstm0_pool:conv:s0'[B],'enc_ctx_win:window'(5),F|F'enc_ctx0:feature-dense'(200)] float32
DEPRECATION WARNING: Do not specify axis or axes in a way that depends on the order of the axes.
This will be disallowed with behavior_version 7.
layer /'enc_val': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /'enc_val_win': [B,T|'lstm0_pool:conv:s0'[B],'enc_val_win:window'(5),F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /'data:orth_classes': [B,T|'out-spatial'[B]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /'lm_input0': [B,T|'out-spatial'[B]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /'lm_input1': [B,T|'1+(out-spatial)'[?]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /'lm_input': [B,T|'1+(out-spatial)'[B]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /'output': [T|'lstm0_pool:conv:s0'[B],B] int32 sparse_dim=Dim{F'vocab'(1030)}
Rec layer 'output' (search False, train 'globals/train_flag:0') sub net:
  Input layers moved out of loop: (#: 0)
    None
  Output layers moved out of loop: (#: 24)
    output_prob
    output_emit
    output_is_not_blank
    output
    rna_alignment
    output_log_prob
    blank_log_prob
    label_emit_log_prob
    label_log_prob
    emit_log_prob
    emit_prob0
    readout
    readout_in
    att
    att_weights
    att_weights1
    att_weights0
    att_energy
    enc_ctx_win
    att_query
    am
    enc_val_win
    :i
    lm_masked
  Layers in loop: (#: 0)
    None
  Unused layers: (#: 5)
    const0
    const1
    lm
    out_str
    prev_out_non_blank
layer /output(rec-subnet-output)/'data:source': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /output(rec-subnet-output)/'am': [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /output(rec-subnet-output)/':i': [T|'lstm0_pool:conv:s0'[B]] int32
layer /output(rec-subnet-output)/'enc_ctx_win': [B,T|'lstm0_pool:conv:s0'[B],'enc_ctx_win:window'(5),F|F'enc_ctx0:feature-dense'(200)] float32
layer /output(rec-subnet-output)/'att_query': [T|'lstm0_pool:conv:s0'[B],B,F|F'att_query:feature-dense'(200)] float32
layer /output(rec-subnet-output)/'att_energy': [B,T|'lstm0_pool:conv:s0'[B],'enc_ctx_win:window'(5),F|'att_energy:dot:dummy-var2'(1)] float32
layer /output(rec-subnet-output)/'att_weights0': [B,T|'lstm0_pool:conv:s0'[B],F|'att_energy:dot:dummy-var2'(1),'enc_ctx_win:window'(5)] float32
layer /output(rec-subnet-output)/'att_weights1': [B,T|'lstm0_pool:conv:s0'[B],F|'att_energy:dot:dummy-var2'(1),'enc_ctx_win:window'(5)] float32
layer /output(rec-subnet-output)/'att_weights': [B,T|'lstm0_pool:conv:s0'[B],F|'(att_energy:dot:dummy-var2)*enc_ctx_win:window'(5)] float32
layer /output(rec-subnet-output)/'enc_val_win': [B,T|'lstm0_pool:conv:s0'[B],'enc_val_win:window'(5),F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /output(rec-subnet-output)/'att': [B,T|'lstm0_pool:conv:s0'[B],F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)] float32
layer /output(rec-subnet-output)/'lm_masked': [T|'1+(out-spatial)'[B],B,F|F'lstm0:feature'(1024)] float32
layer /output(rec-subnet-output)(extra._internal.masked(lm_masked))/lm_masked/'input_embed': [B,T|'1+(out-spatial)'[B],F|F'input_embed:feature-dense'(256)] float32
layer /output(rec-subnet-output)(extra._internal.masked(lm_masked))/lm_masked/'lstm0': [T|'1+(out-spatial)'[B],B,F|F'lstm0:feature'(1024)] float32
layer /output(rec-subnet-output)(extra._internal.masked(lm_masked))/lm_masked/'output': [T|'1+(out-spatial)'[B],B,F|F'lstm0:feature'(1024)] float32
layer /output(rec-subnet-output)/'readout_in': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'feature:readout_in_output'(1000)] float32
DEPRECATION WARNING: All inputs
 - Data{'am_output', [T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)]}
 - Data{'att_output', [B,T|'lstm0_pool:conv:s0'[B],F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)]}
 - Data{'lm_masked_output', [T|'1+(out-spatial)'[B],B,F|F'lstm0:feature'(1024)]}
require broadcasting to 
  Data{'am_output', [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)]}.
This must be explicitly allowed, e.g. by specifying out_shape.
This will be disallowed with behavior_version 4.
layer /output(rec-subnet-output)/'readout': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'feature:readout_in_output//2'(500)] float32
layer /output(rec-subnet-output)/'emit_prob0': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'emit_prob0:feature-dense'(1)] float32
layer /output(rec-subnet-output)/'data:orth_classes': [B,T|'out-spatial'[B]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /output(rec-subnet-output)/'output': [B,T|'out-spatial'[B]] int32 sparse_dim=Dim{F'vocab'(1030)}
layer /output(rec-subnet-output)/'output_is_not_blank': [B,T|'out-spatial'[B]] bool
layer /output(rec-subnet-output)/'output_emit': [B,T|'out-spatial'[B]] bool
layer /output(rec-subnet-output)/'blank_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'emit_prob0:feature-dense'(1)] float32
layer /output(rec-subnet-output)/'label_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'label_log_prob:feature-dense'(1030)] float32
layer /output(rec-subnet-output)/'emit_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'emit_prob0:feature-dense'(1)] float32
layer /output(rec-subnet-output)/'label_emit_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'label_log_prob:feature-dense'(1030)] float32
layer /output(rec-subnet-output)/'output_log_prob': [T|'lstm0_pool:conv:s0'[B],'1+(out-spatial)'[B],B,F|F'(emit_prob0:feature-dense)+(label_log_prob:feature-dense)'(1031)] float32
layer /output(rec-subnet-output)/'rna_alignment': [B,T|'lstm0_pool:conv:s0'[B]] int32 sparse_dim=Dim{F'rna_alignment:sparse-dim'(1031)}
layer /output(rec-subnet-output)/'output_prob': [B] float32
Exception creating layer /'output' of class RecLayer with opts:
{'_name': 'output',
 '_network': <TFNetwork '' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'axis': Dim{'lstm0_pool:conv:s0'[B]},
 'back_prop': True,
 'include_eos': True,
 'n_out': <class 'returnn.util.basic.NotSpecified'>,
 'name': 'output',
 'network': <TFNetwork '' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
 'output': Data{'output_output', [T|'lstm0_pool:conv:s0'[B],B], dtype='int32', sparse_dim=Dim{F'vocab'(1030)}},
 'sources': [<CopyLayer 'encoder' out_type=Data{[T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature+lstm1_bw:feature'(1024)]}>],
 'unit': <_SubnetworkRecCell '/output(rec-subnet)'>}

albertz commented Oct 13, 2022

This last warning looks very suspicious:

Warning: assuming dim tags are same with different size placeholders: <tf.Tensor 'lstm0_pool/Neg_1:0' shape=(?,) dtype=int32> vs <tf.Tensor 'extern_data/placeholders/orth_classes/orth_classes_dim0_size:0' shape=(?,) dtype=int32>

If it mixes up the dims here and incorrectly marks them as the same, that could explain such an error.
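To illustrate the suspected mix-up (a minimal sketch, assuming the 2022-era returnn.tf.util.data API with SpatialDim and the Dim.dyn_size setter; not a verified reproduction):

import tensorflow.compat.v1 as tf
from returnn.tf.util.data import SpatialDim

tf.disable_eager_execution()

enc_time = SpatialDim("lstm0_pool:conv:s0")
dec_time = SpatialDim("out-spatial")
# Two different size placeholders, as in the warning above:
# lstm0_pool/Neg_1:0 vs orth_classes_dim0_size:0.
enc_time.dyn_size = tf.placeholder(tf.int32, (None,), name="enc_seq_lens")
dec_time.dyn_size = tf.placeholder(tf.int32, (None,), name="dec_seq_lens")

# This only prints the warning, but afterwards the two tags compare equal,
# so downstream shape checks pass even though the actual sizes
# (e.g. 20 encoder frames vs 8 labels) differ.
enc_time.declare_same_as(dec_time)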


albertz commented Oct 13, 2022

I temporarily made that dim tag warning an exception (see also #1141), and this is the stacktrace:

...
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/network.py", line 1068, in TFNetwork._creat
e_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
      layer_desc = <local> {'include_eos': True, 'back_prop': True, '_network': <TFNetwork '' train=<tf.Tensor '
globals/train_flag:0' shape=() dtype=bool>>, '_name': 'output', 'n_out': <class 'returnn.util.basic.NotSpecified
'>, 'sources': [<CopyLayer 'encoder' out_type=Data{[T|'lstm0_pool:conv:s0'[B],B,F|F'lstm1_fw:feature..., len = 1
1
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/rec.py", line 259, in RecLayer.__ini
t__
    line: y = self._get_output_subnet_unit(self.cell)
    locals:
      y = <not found>
      self = <local> <RecLayer 'output' out_type=Data{[T|'lstm0_pool:conv:s0'[B],B], dtype='int32', sparse_dim=D
im{F'vocab'(1030)}}>
      self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output
' out_type=Data{[T|'lstm0_pool:conv:s0'[B],B], dtype='int32', sparse_dim=Dim{F'vocab'(1030)}}>>
      self.cell = <local> <_SubnetworkRecCell '/output(rec-subnet)'>
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/rec.py", line 1073, in RecLayer._get
_output_subnet_unit
    line: output = cell.get_output()
    locals:
      output = <not found>
      cell = <local> <_SubnetworkRecCell '/output(rec-subnet)'>
      cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <_SubnetworkRecCell '/output(rec-
subnet)'>>
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/layers/rec.py", line 3079, in _SubnetworkRe
cCell.get_output
    line: self.time_dim_tag.declare_same_as(output_data.get_time_dim_tag())
    locals:
      self = <local> <_SubnetworkRecCell '/output(rec-subnet)'>
      self.time_dim_tag = <local> Dim{'lstm0_pool:conv:s0'[B]}
      self.time_dim_tag.declare_same_as = <local> <bound method Dim.declare_same_as of Dim{'lstm0_pool:conv:s0'[
B]}>
      output_data = <local> Data{'orth_classes', [T|'out-spatial'[B],B], dtype='int32', sparse_dim=Dim{F'vocab'(
1030)}}
      output_data.get_time_dim_tag = <local> <bound method Data.get_time_dim_tag of Data{'orth_classes', [T|'out
-spatial'[B],B], dtype='int32', sparse_dim=Dim{F'vocab'(1030)}}>
  File "/u/zeyer/setups/combined/2021-05-31/tools/returnn/returnn/tf/util/data.py", line 979, in Dim.declare_sam
e_as
    line: raise Exception(
            "Warning: assuming dim tags are same with different size placeholders: %r vs %r" % (
              self.dyn_size, other_same_base.dyn_size))
...
Exception: Warning: assuming dim tags are same with different size placeholders: <tf.Tensor 'lstm0_pool/Neg_1:0'
 shape=(?,) dtype=int32> vs <tf.Tensor 'extern_data/placeholders/orth_classes/orth_classes_dim0_size:0' shape=(?
,) dtype=int32>


albertz commented Oct 13, 2022

You can see the problem from that log. Specifically this part:

    line: self.time_dim_tag.declare_same_as(output_data.get_time_dim_tag())
    locals:
      self = <local> <_SubnetworkRecCell '/output(rec-subnet)'>
      self.time_dim_tag = <local> Dim{'lstm0_pool:conv:s0'[B]}
      output_data = <local> Data{'orth_classes', [T|'out-spatial'[B],B], dtype='int32', sparse_dim=Dim{F'vocab'(1030)}}

The self.time_dim_tag of the rec layer is correct: we loop over the encoder frames.
I'm not sure about output_data; either it is wrong here, or the declare_same_as call is.


albertz commented Oct 13, 2022

Hm, this is tricky. We have a case where the original config was clearly incorrect, but due to bugs (or rather, missing extra checks) it worked, because in the end all the broken parts were not used (even though the layers were still constructed).

So, what should we do? Either make sure that existing configs keep working, or be strict about it and, as in this case, break an existing config. Or rather, the config is already broken, so the question is more: fix RETURNN to make it work again, or not?

Specifically, the output sublayer in the rec layer is (via):

            # During training   : targetb = "target"  (RNA-loss)
            # During recognition: targetb = "targetb"
            'output': {
                'class': 'choice', 'target': targetb, 'beam_size': beam_size,
                'from': "output_log_prob", "input_type": "log_prob",
...

So in training, all layers are moved out of the loop, and the 'output' layer yields the target sequence. But the loop goes over the encoder sequence, which has a different length. So the output time dim does not match the time dim of the rec layer, although it actually must match.
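For reference, a rough sketch of such a check (illustrative only, using the Data/Dim API visible in the stacktrace above; the actual fix is in #1144):

def check_rec_output_time_dim(rec_time_dim, output_data):
    """Raise an error instead of letting Dim.declare_same_as merge
    mismatching dim tags with just a warning."""
    out_time_dim = output_data.get_time_dim_tag()
    if out_time_dim != rec_time_dim:
        raise Exception(
            "RecLayer sub output layer time dim %r does not match "
            "the RecLayer time dim %r. Maybe a wrong 'target'?"
            % (out_time_dim, rec_time_dim))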

albertz added a commit that referenced this issue Oct 13, 2022

A check on matching time dim of RecLayer sub output layer
to the RecLayer time dim.

Fix #1140

This introduces a new behavior version 13 (#508).