Hi all,
Some classmates and I are trying to train a fusion text generation model using the fairseq code, but on a new dataset that we have scraped and cleaned. We are stuck on how to produce a usable pretrained checkpoint.
When I used the checkpoint (pretrained_checkpoint.pt) provided in the README, I got the following error:
RuntimeError: Error(s) in loading state_dict for FConvModelSelfAtt:
size mismatch for encoder.encoder.embed_tokens.weight: copying a param of torch.Size([3257, 256]) from checkpoint, where the shape is torch.Size([19025, 256]) in current model.
size mismatch for decoder.embed_tokens.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
size mismatch for decoder.fc3.weight: copying a param of torch.Size([51411, 256]) from checkpoint, where the shape is torch.Size([104960, 256]) in current model.
size mismatch for decoder.fc3.bias: copying a param of torch.Size([51411]) from checkpoint, where the shape is torch.Size([104960]) in current model.
Here is the command I ran, where "dummy_source" and "dummy_target" are our source and target languages:
$ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
--clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
--decoder-attention True --encoder-attention False --criterion \
label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
--source-lang dummy_source --target-lang dummy_target --gated-attention True \
--self-attention True --project-input True --pretrained True \
--pretrained-checkpoint data-bin/models/pretrained_checkpoint.pt
But when I attempted to use a checkpoint created by training on our new dataset (checkpoint_best.pt), I got:
Exception: Cannot load model parameters from checkpoint, please ensure that the architectures match
Here is the command I ran for that:
$ python train.py data-bin/dummy -a fconv_self_att_wp --lr 0.25 \
--clip-norm 0.1 --max-tokens 1500 --lr-scheduler reduce_lr_on_plateau \
--decoder-attention True --encoder-attention False --criterion \
label_smoothed_cross_entropy --weight-decay .0000001 --label-smoothing 0 \
--source-lang dummy_source --target-lang dummy_target --gated-attention True \
--self-attention True --project-input True --pretrained True \
--pretrained-checkpoint checkpoints/checkpoint_best.pt
How can we go about training a usable pretrained checkpoint on a new dataset?