
Encoder/decoder mismatch in joint speech to text example #3939

Closed
tberckmann opened this issue Oct 8, 2021 · 1 comment

Comments


tberckmann commented Oct 8, 2021

🐛 Bug

Note to reviewers: this issue is labeled "needs triage," but I have already written and tested a fix; see the bottom of this issue for a link to the branch.

The training step in the joint speech-to-text example raises a Python exception due to mismatched tensor sizes in a matrix multiplication: the decoder embedding size does not match the encoder embedding size.

To Reproduce

  1. Perform data preprocessing as shown in https://github.com/pytorch/fairseq/blob/main/examples/speech_text_joint_to_text/docs/ende-mustc.md
  2. Run the training command under "Jointly trained model from scratch." The one difference is that I did not use the parallel text data.

Error output is as follows:

Traceback (most recent call last):
.....
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 216, in forward
x, extra = self.extract_features(
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 238, in extract_features
return self.extract_features_scriptable(
File "/home/berckmann/si2/fairseq/fairseq/models/transformer/transformer_decoder.py", line 340, in extract_features_scriptable
x, layer_attn, _ = layer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/berckmann/si2/fairseq/fairseq/modules/transformer_layer.py", line 388, in forward
x, attn = self.encoder_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/berckmann/si2/fairseq/fairseq/modules/multihead_attention.py", line 216, in forward
k = self.k_proj(key)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 96, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (666x256 and 512x256)
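
For context on those shapes, here is a minimal standalone sketch (not fairseq code; the dimensions are inferred from the error message) that triggers the same RuntimeError. The decoder's cross-attention key projection appears to have been built for 512-dim encoder states, while the encoder actually emits 256-dim states:

import torch
import torch.nn as nn

# Key projection as (apparently) built by the decoder: expects 512-dim
# encoder states and maps them into the 256-dim decoder space.
k_proj = nn.Linear(512, 256)

# Encoder states actually produced: 666 positions of dimension 256
# (the flattened shape reported in the traceback above).
encoder_out = torch.randn(666, 256)

k = k_proj(encoder_out)
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (666x256 and 512x256)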

Code sample

This is the training command that triggered the issue:

$python_exec fairseq/train.py ${MANIFEST_ROOT} \
    --save-dir ${save_dir} \
    --num-workers 4 \
    --task speech_text_joint_to_text \
    --arch dualinputs2ttransformer_s \
    --user-dir examples/speech_text_joint_to_text \
    --max-epoch 100 --update-mix-data \
    --optimizer adam --lr-scheduler inverse_sqrt \
    --lr 0.001 --update-freq 8 --clip-norm 10.0 \
    --criterion guided_label_smoothed_cross_entropy_with_accuracy \
    --label-smoothing 0.1 --max-tokens $max_token_cnt --max-tokens-text $max_token_cnt \
    --max-positions-text 400 --seed 2 --speech-encoder-layers 12 \
    --text-encoder-layers 6 --encoder-shared-layers 6 --decoder-layers 6 \
    --dropout 0.1 --warmup-updates 20000 \
    --text-sample-ratio 0.25 \
    --text-input-cost-ratio 0.5 --enc-grad-mult 2.0 --add-speech-eos \
    --log-format json --langpairs en-de --noise-token '▁NOISE' \
    --mask-text-ratio 0.0 --max-tokens-valid 20000 --ddp-backend no_c10d \
    --log-interval 100 --data-buffer-size 50 --config-yaml config.yaml \
    --keep-last-epochs 10 --valid-subset dev_st --train-subset train_st \
    --tensorboard-logdir logs/tensorb_log

Expected behavior

Training should run to completion without raising an exception.

Environment

  • fairseq Version (e.g., 1.0 or main): git hash 72bb444
  • PyTorch Version (e.g., 1.0): 1.9.1+cu111
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): pip3 install --editable ./
  • Python version: 3.8
  • CUDA/cuDNN version: 11.1
  • GPU models and configuration: Various

Additional context

I have already written code for the fix, which resolved the problem for me locally:

https://github.com/tberckmann/fairseq/tree/joint_s2t_fixes
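
If the root cause is a configuration default rather than the model code itself, one possible workaround (an assumption on my part; I have not verified that this user-dir architecture honors these flags) is to pin the two dimensions to the same value by appending fairseq's standard transformer embedding flags to the training command above:

--encoder-embed-dim 256 --decoder-embed-dim 256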

yuntang (Contributor) commented Nov 2, 2021

Do you use the same embedding dim for encoder and decoder?
