
Encoder/decoder mismatch in joint speech to text example #3939

Closed
tberckmann opened this issue Oct 8, 2021 · 1 comment

Comments


tberckmann commented Oct 8, 2021

🐛 Bug

Note to reviewers: this issue is labeled "needs triage," but I have already written and tested a fix; see the bottom of this issue for a link to the branch.

The training step in the joint speech-to-text example raises a Python exception due to mismatched tensor sizes in a matrix multiplication: the decoder embedding size does not match the encoder embedding size.

To Reproduce

  1. Perform data preprocessing as shown in https://github.com/pytorch/fairseq/blob/main/examples/speech_text_joint_to_text/docs/ende-mustc.md
  2. Run the training command under "Jointly trained model from scratch." The one difference is that I did not use the parallel text data.

Error output is as follows:

Traceback (most recent call last):
.....
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 216, in forward
x, extra = self.extract_features(
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 238, in extract_features
return self.extract_features_scriptable(
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 340, in extract_features_scriptable
x, layer_attn, _ = layer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/berckmann/si2/fairseq/fairseq/modules/transformer_layer.py", line 388, in forward
x, attn = self.encoder_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/berckmann/si2/fairseq/fairseq/modules/multihead_attention.py", line 216, in forward
k = self.k_proj(key)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 96, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (666x256 and 512x256)
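
For context on those shapes, here is a minimal standalone sketch (not fairseq code; the dimensions are inferred from the error message) that triggers the same RuntimeError. The decoder's cross-attention key projection appears to have been built for 512-dim encoder states, while the encoder actually emits 256-dim states:

import torch
import torch.nn as nn

# Key projection as (apparently) built by the decoder: expects 512-dim
# encoder states and maps them into the 256-dim decoder space.
k_proj = nn.Linear(512, 256)

# Encoder states actually produced: 666 positions of dimension 256
# (the flattened shape reported in the traceback above).
encoder_out = torch.randn(666, 256)

k = k_proj(encoder_out)
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (666x256 and 512x256)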

Code sample

This is the training command that triggered the issue:

$python_exec fairseq/train.py ${MANIFEST_ROOT} \
    --save-dir ${save_dir} \
    --num-workers 4 \
    --task speech_text_joint_to_text \
    --arch dualinputs2ttransformer_s \
    --user-dir examples/speech_text_joint_to_text \
    --max-epoch 100 --update-mix-data \
    --optimizer adam --lr-scheduler inverse_sqrt \
    --lr 0.001 --update-freq 8 --clip-norm 10.0 \
    --criterion guided_label_smoothed_cross_entropy_with_accuracy \
    --label-smoothing 0.1 --max-tokens $max_token_cnt --max-tokens-text $max_token_cnt \
    --max-positions-text 400 --seed 2 --speech-encoder-layers 12 \
    --text-encoder-layers 6 --encoder-shared-layers 6 --decoder-layers 6 \
    --dropout 0.1 --warmup-updates 20000 \
    --text-sample-ratio 0.25 \
    --text-input-cost-ratio 0.5 --enc-grad-mult 2.0 --add-speech-eos \
    --log-format json --langpairs en-de --noise-token '▁NOISE' \
    --mask-text-ratio 0.0 --max-tokens-valid 20000 --ddp-backend no_c10d \
    --log-interval 100 --data-buffer-size 50 --config-yaml config.yaml \
    --keep-last-epochs 10 --valid-subset dev_st --train-subset train_st \
    --tensorboard-logdir logs/tensorb_log

Expected behavior

Training should run to completion without raising an exception.

Environment

  • fairseq Version (e.g., 1.0 or main): git hash 72bb444
  • PyTorch Version (e.g., 1.0): 1.9.1+cu111
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): pip3 install --editable ./
  • Python version: 3.8
  • CUDA/cuDNN version: 11.1
  • GPU models and configuration: Various

Additional context

I have already written code for the fix, which resolved the problem for me locally:

https://github.com/tberckmann/fairseq/tree/joint_s2t_fixes
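
If the root cause is a configuration default rather than the model code itself, one possible workaround (an assumption on my part; I have not verified that this user-dir architecture honors these flags) is to pin the two dimensions to the same value by appending fairseq's standard transformer embedding flags to the training command above:

--encoder-embed-dim 256 --decoder-embed-dim 256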

yuntang (Contributor) commented Nov 2, 2021

Do you use the same embedding dim for encoder and decoder?
