This repository was archived by the owner on Mar 20, 2026. It is now read-only.

Train matching pretrained checkpoint for new dataset #367

@edbltn

Hi all,

Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new dataset that we have scraped and cleaned. We are struggling with how to train a usable pretrained checkpoint.

When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:

RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
        size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
        size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
        size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
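As far as I can tell, the first dimension of each mismatched tensor is a vocabulary size, which fairseq derives from the dictionaries the data was binarized with, so the checkpoint only fits a model built against the same dictionaries. A minimal sketch of what the loader is effectively checking (shapes copied from the traceback above; the helper function is made up for illustration):

```python
# Shapes as reported by the traceback: checkpoint vs. the model built
# for our new dataset. The first dimension of embed_tokens / fc3 is the
# dictionary size, so different corpora produce incompatible shapes.
checkpoint_shapes = {
    "encoder.encoder.embed_tokens.weight": (3257, 256),
    "decoder.embed_tokens.weight": (51411, 256),
    "decoder.fc3.weight": (51411, 256),
    "decoder.fc3.bias": (51411,),
}
model_shapes = {
    "encoder.encoder.embed_tokens.weight": (19025, 256),
    "decoder.embed_tokens.weight": (104960, 256),
    "decoder.fc3.weight": (104960, 256),
    "decoder.fc3.bias": (104960,),
}

def size_mismatches(ckpt, model):
    """Return parameter names whose shapes differ between checkpoint and model."""
    return [name for name in ckpt if ckpt[name] != model.get(name)]

print(size_mismatches(checkpoint_shapes, model_shapes))
# Every vocabulary-dependent parameter mismatches, matching the error.
```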

Here is the command I ran, where "dummy_source" and "dummy_target" are our source and target languages:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt

But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:

Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match

Here is the command I ran for that:

    $ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
        --clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
        --decoder-attention True --encoder-attention False --criterion \
        label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
        --source-lang dummy_source --target-lang dummy_target --gated-attention True \
        --self-attention True --project-input True --pretrained True \
        --pretrained-checkpoint checkpoints/checkpoint_best.pt

How can we go about training a usable pretrained checkpoint on a new dataset?
