
Arch type #1

Closed
Bachstelze opened this issue Oct 13, 2020 · 3 comments

@Bachstelze

I can't load the pretrained 32-lang-pairs-RAS-ckp model with the tagged fairseq version 0.9.0:

| model transformer_wmt_en_de_big, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 243313664 (num. trained: 243313664)
| training on 1 GPUs
| max tokens per GPU = 2048 and max sentences per GPU = None
Traceback (most recent call last):
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/trainer.py", line 194, in load_checkpoint
    self.get_model().load_state_dict(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 71, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TransformerModel:
	size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).
	size mismatch for decoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/kalle/Sprachdaten/mRASP/train_environment/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/train.py", line 70, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 115, in load_checkpoint
    extra_state = trainer.load_checkpoint(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/trainer.py", line 202, in load_checkpoint
    raise Exception(
Exception: Cannot load model parameters from checkpoint /media/kalle/Sprachdaten/mRASP/checkpoint_best.pt; please ensure that the architectures match.

The model identifies itself as transformer_vaswani_wmt_en_de_big. Have there been changes to the architecture? Shouldn't the architecture be compatible, given facebookresearch/fairseq#2664?

Thanks for your promising work!

@PANXiao1994
Collaborator

Hi,

Note the above log:

size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).
size mismatch for decoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).

This means you should set --max-source-positions 300 --max-target-positions 300 during training.
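
The 302-row positional-embedding tables in the checkpoint presumably correspond to a maximum position of 300 plus fairseq's padding offset, while your current model was built with 256 (giving 258 rows). As an illustration only (the data path, restore file, and remaining options below are placeholders, not the exact command used for this project), a training invocation with these flags could look like:

fairseq-train /path/to/data-bin \
    --arch transformer_wmt_en_de_big \
    --criterion label_smoothed_cross_entropy \
    --restore-file /path/to/32-lang-pairs-RAS-checkpoint.pt \
    --max-source-positions 300 --max-target-positions 300 \
    --max-tokens 2048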

@Bachstelze
Author

Do I also have to set them during generation? Because I get this error after fine-tuning:

Traceback (most recent call last):                                                                                                        
  File "/media/kalle/Sprachdaten/mRASP/train_environment/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/generate.py", line 104, in main
    hypos = task.inference_step(generator, models, sample, prefix_tokens)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 265, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 113, in generate
    return self._generate(model, sample, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 376, in _generate
    cand_scores, cand_indices, cand_beams = self.search.step(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/search.py", line 81, in step
    torch.div(self.indices_buf, vocab_size, out=self.beams_buf)
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

@linzehui
Owner

It does not seem to be a model-loading problem. From the log you posted, it might be due to a Python 3.8 issue.
You may check whether there is an empty line in the source file you generate from, or use Python < 3.8 to check whether the problem is caused by Python 3.8.
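
For what it's worth, the RuntimeError itself comes from newer PyTorch releases rejecting implicit integer division in torch.div. One workaround sometimes applied (an assumption on my part, not something verified for this repository) is to patch the offending call in fairseq/search.py to use explicit floor division:

# Sketch of a possible local patch around the line shown in the traceback;
# torch.floor_divide assumes a PyTorch version (>= 1.5) that provides it.
torch.floor_divide(self.indices_buf, vocab_size, out=self.beams_buf)

Pinning an older PyTorch release that is compatible with fairseq 0.9.0 would be another option.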
