
Model become 3 times larger after finetune? #63

Closed
wangwei7175878 opened this issue Nov 6, 2018 · 4 comments

Comments

wangwei7175878 commented Nov 6, 2018

A pretrained BERT-large model's ckpt file is about 1.3 GB, but after fine-tuning on a downstream task, the saved ckpt file becomes 3.8 GB. How did this happen?

artemisart (Contributor) commented:

I have the same problem with BERT-base, whose checkpoint grows to ~1.3 GB.

jacobdevlin-google (Contributor) commented Nov 6, 2018

The distributed checkpoints only include the actual model weights, but the checkpoints written during training also include the Adam momentum and variance variables for each weight variable. These are not actually part of the model, but they are needed to be able to pause and resume training in the middle. So the training checkpoints are 3x the size of the distributed checkpoints.
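The 3x figure follows from simple arithmetic: Adam stores two extra tensors per weight (the first-moment estimate m and the second-moment estimate v), each the same shape as the weight it tracks. A minimal sketch, using toy shapes rather than BERT's actual variables:

```python
from math import prod

def checkpoint_sizes(param_shapes, bytes_per_value=4):
    """Compare weight-only storage with weight + Adam-slot storage.

    Adam keeps two extra float tensors per weight (momentum m and
    variance v), each the same shape as the weight it belongs to,
    so the optimizer state alone is 2x the weight storage.
    """
    n_values = sum(prod(shape) for shape in param_shapes)
    weights_only = n_values * bytes_per_value
    with_adam = weights_only * 3  # weights + m + v
    return weights_only, with_adam

# Toy shapes standing in for a few transformer weight matrices.
weights_only, with_adam = checkpoint_sizes([(1024, 1024), (1024, 4096), (4096,)])
print(with_adam / weights_only)  # 3.0
```

This matches the sizes reported above: 1.3 GB of weights plus two same-sized optimizer slots is roughly 3.8 GB.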

zhezhaoa commented Dec 7, 2018

Thank you for your advice. Could you tell me how to save only the model weights (excluding momentum and variance), just like the pretrained model you provide?

ymcdull commented Dec 10, 2018


@zhezhaoa I have a solution here: #99
There are probably better and tidier solutions, but at least this one works for me, and the size of the weight file drops from 1.3 GB to 400 MB.
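The core of this kind of fix is to list the checkpoint's variables, drop the optimizer state by name, and re-save only what remains (in TensorFlow, via `tf.train.load_checkpoint` and a `Saver` over the filtered variables). A minimal sketch of just the name filter, in plain Python; the variable names and the `/adam_m` / `/adam_v` slot suffixes below are assumptions about how the training checkpoint names its optimizer variables, not verified against #99:

```python
# Hypothetical variable names as they might appear in a BERT
# training checkpoint (the adam_m / adam_v suffixes are assumed).
ckpt_vars = [
    "bert/encoder/layer_0/attention/self/query/kernel",
    "bert/encoder/layer_0/attention/self/query/kernel/adam_m",
    "bert/encoder/layer_0/attention/self/query/kernel/adam_v",
    "bert/encoder/layer_0/attention/self/query/bias",
    "bert/encoder/layer_0/attention/self/query/bias/adam_m",
    "bert/encoder/layer_0/attention/self/query/bias/adam_v",
    "global_step",
]

def is_model_weight(name):
    """Keep only actual model weights, dropping Adam's slot
    variables and training bookkeeping such as global_step."""
    return not name.endswith(("/adam_m", "/adam_v")) and name != "global_step"

model_vars = [v for v in ckpt_vars if is_model_weight(v)]
print(model_vars)
```

Saving only `model_vars` is what shrinks the checkpoint by roughly two thirds, since each kept weight sheds its two same-sized Adam slots.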
