This repository was archived by the owner on Mar 20, 2026. It is now read-only.

Train matching pretrained checkpoint for new dataset #367

@edbltn

Hi all,

Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new dataset that we have scraped and cleaned. We are struggling with how to train a usable pretrained checkpoint.

When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:

RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
        size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
        size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
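As far as I can tell, the first dimension of each mismatched tensor is a vocabulary size, which fairseq derives from the dictionaries the data was binarized with, so the checkpoint only fits a model built against the same dictionaries. A minimal sketch of what the loader is effectively checking (shapes copied from the traceback above; the helper function is made up for illustration):

```python
# Shapes as reported by the traceback: checkpoint vs. the model built
# for our new dataset. The first dimension of embed_tokens / fc3 is the
# dictionary size, so different corpora produce incompatible shapes.
checkpoint_shapes = {
    "encoder.encoder.embed_tokens.weight": (3257, 256),
    "decoder.embed_tokens.weight": (51411, 256),
    "decoder.fc3.weight": (51411, 256),
    "decoder.fc3.bias": (51411,),
}
model_shapes = {
    "encoder.encoder.embed_tokens.weight": (19025, 256),
    "decoder.embed_tokens.weight": (104960, 256),
    "decoder.fc3.weight": (104960, 256),
    "decoder.fc3.bias": (104960,),
}

def size_mismatches(ckpt, model):
    """Return parameter names whose shapes differ between checkpoint and model."""
    return [name for name in ckpt if ckpt[name] != model.get(name)]

print(size_mismatches(checkpoint_shapes, model_shapes))
# Every vocabulary-dependent parameter mismatches, matching the error.
```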

Here is the command I ran, where "dummy_source" and "dummy_target" are our source and target languages:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt

But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:

Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match

Here is the command I ran for that:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint checkpoints/checkpoint_best.pt

How can we go about training a usable pretrained checkpoint on a new dataset?
