
Arch type #1

Closed
Bachstelze opened this issue Oct 13, 2020 · 3 comments

@Bachstelze

I can't load the pretrained 32-lang-pairs-RAS-ckp model with the tagged fairseq version 0.9.0:

| model transformer_wmt_en_de_big, criterion LabelSmoothedCrossEntropyCriterion
| num. model params: 243313664 (num. trained: 243313664)
| training on 1 GPUs
| max tokens per GPU = 2048 and max sentences per GPU = None
Traceback (most recent call last):
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/trainer.py", line 194, in load_checkpoint
    self.get_model().load_state_dict(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/models/fairseq_model.py", line 71, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TransformerModel:
	size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).
	size mismatch for decoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/kalle/Sprachdaten/mRASP/train_environment/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/train.py", line 70, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 115, in load_checkpoint
    extra_state = trainer.load_checkpoint(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/trainer.py", line 202, in load_checkpoint
    raise Exception(
Exception: Cannot load model parameters from checkpoint /media/kalle/Sprachdaten/mRASP/checkpoint_best.pt; please ensure that the architectures match.

The model identifies itself as transformer_vaswani_wmt_en_de_big. Have there been changes to the architecture? Shouldn't the architecture be compatible, given facebookresearch/fairseq#2664?

Thanks for your promising work!

@PANXiao1994
Collaborator

Hi,

Note the above log:

size mismatch for encoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).
size mismatch for decoder.embed_positions.weight: copying a param with shape torch.Size([302, 1024]) from checkpoint, the shape in current model is torch.Size([258, 1024]).

This means you should set --max-source-positions 300 --max-target-positions 300 during training.
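
The 302-row positional-embedding tables in the checkpoint presumably correspond to a maximum position of 300 plus fairseq's padding offset, while your current model was built with 256 (giving 258 rows). As an illustration only (the data path, restore file, and remaining options below are placeholders, not the exact command used for this project), a training invocation with these flags could look like:

fairseq-train /path/to/data-bin \
    --arch transformer_wmt_en_de_big \
    --criterion label_smoothed_cross_entropy \
    --restore-file /path/to/32-lang-pairs-RAS-checkpoint.pt \
    --max-source-positions 300 --max-target-positions 300 \
    --max-tokens 2048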

@Bachstelze
Author

Do I also have to set them during generation? Because I get this error after fine-tuning:

Traceback (most recent call last):                                                                                                        
  File "/media/kalle/Sprachdaten/mRASP/train_environment/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq_cli/generate.py", line 104, in main
    hypos = task.inference_step(generator, models, sample, prefix_tokens)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/tasks/fairseq_task.py", line 265, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 113, in generate
    return self._generate(model, sample, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/sequence_generator.py", line 376, in _generate
    cand_scores, cand_indices, cand_beams = self.search.step(
  File "/media/kalle/Sprachdaten/mRASP/train_environment/lib/python3.8/site-packages/fairseq/search.py", line 81, in step
    torch.div(self.indices_buf, vocab_size, out=self.beams_buf)
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

@linzehui
Owner

It does not seem to be a model-loading problem. From the log you posted, it might be due to a Python 3.8 issue.
You may check whether there is an empty line in the source file you generate from, or use Python < 3.8 to check whether the problem is caused by Python 3.8.
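
For what it's worth, the RuntimeError itself comes from newer PyTorch releases rejecting implicit integer division in torch.div. One workaround sometimes applied (an assumption on my part, not something verified for this repository) is to patch the offending call in fairseq/search.py to use explicit floor division:

# Sketch of a possible local patch around the line shown in the traceback;
# torch.floor_divide assumes a PyTorch version (>= 1.5) that provides it.
torch.floor_divide(self.indices_buf, vocab_size, out=self.beams_buf)

Pinning an older PyTorch release that is compatible with fairseq 0.9.0 would be another option.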
