
Model become 3 times larger after finetune? #63

Closed
wangwei7175878 opened this issue Nov 6, 2018 · 4 comments

Comments

wangwei7175878 commented Nov 6, 2018

A pretrained BERT-large model's ckpt file is about 1.3 GB, but after fine-tuning on a downstream task, the saved ckpt file becomes 3.8 GB. How did this happen?

artemisart (Contributor) commented:

I have the same problem with BERT-base, whose checkpoint grows to ~1.3 GB.

jacobdevlin-google (Contributor) commented Nov 6, 2018

The distributed checkpoints only include the actual model weights, but the checkpoints written during training also include the Adam momentum and variance variables for each weight variable. These are not actually part of the model, but they are needed to be able to pause and resume training in the middle. So the training checkpoints are 3x the size of the distributed checkpoints.
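The 3x figure follows from simple arithmetic: Adam stores two extra tensors per weight (the first-moment estimate m and the second-moment estimate v), each the same shape as the weight it tracks. A minimal sketch, using toy shapes rather than BERT's actual variables:

```python
from math import prod

def checkpoint_sizes(param_shapes, bytes_per_value=4):
    """Compare weight-only storage with weight + Adam-slot storage.

    Adam keeps two extra float tensors per weight (momentum m and
    variance v), each the same shape as the weight it belongs to,
    so the optimizer state alone is 2x the weight storage.
    """
    n_values = sum(prod(shape) for shape in param_shapes)
    weights_only = n_values * bytes_per_value
    with_adam = weights_only * 3  # weights + m + v
    return weights_only, with_adam

# Toy shapes standing in for a few transformer weight matrices.
weights_only, with_adam = checkpoint_sizes([(1024, 1024), (1024, 4096), (4096,)])
print(with_adam / weights_only)  # 3.0
```

This matches the sizes reported above: 1.3 GB of weights plus two same-sized optimizer slots is roughly 3.8 GB.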

zhezhaoa commented Dec 7, 2018

Thank you for your advice. Could you tell me how to save only the model weights (excluding momentum and variance), just like the pretrained model you provide?

ymcdull commented Dec 10, 2018


@zhezhaoa I have a solution here: #99
There are probably better and tidier solutions, but at least this one works for me, and the size of the weight file drops from 1.3 GB to 400 MB.
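The core of this kind of fix is to list the checkpoint's variables, drop the optimizer state by name, and re-save only what remains (in TensorFlow, via `tf.train.load_checkpoint` and a `Saver` over the filtered variables). A minimal sketch of just the name filter, in plain Python; the variable names and the `/adam_m` / `/adam_v` slot suffixes below are assumptions about how the training checkpoint names its optimizer variables, not verified against #99:

```python
# Hypothetical variable names as they might appear in a BERT
# training checkpoint (the adam_m / adam_v suffixes are assumed).
ckpt_vars = [
    "bert/encoder/layer_0/attention/self/query/kernel",
    "bert/encoder/layer_0/attention/self/query/kernel/adam_m",
    "bert/encoder/layer_0/attention/self/query/kernel/adam_v",
    "bert/encoder/layer_0/attention/self/query/bias",
    "bert/encoder/layer_0/attention/self/query/bias/adam_m",
    "bert/encoder/layer_0/attention/self/query/bias/adam_v",
    "global_step",
]

def is_model_weight(name):
    """Keep only actual model weights, dropping Adam's slot
    variables and training bookkeeping such as global_step."""
    return not name.endswith(("/adam_m", "/adam_v")) and name != "global_step"

model_vars = [v for v in ckpt_vars if is_model_weight(v)]
print(model_vars)
```

Saving only `model_vars` is what shrinks the checkpoint by roughly two thirds, since each kept weight sheds its two same-sized Adam slots.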
