Output model files compatible with Official Bert's pre-trained models? #18

Closed

1e0ng opened this issue Sep 17, 2019 · 9 comments

@1e0ng

1e0ng commented Sep 17, 2019

Hi, I tried to pre-train a BERT model with this project, but I found that the resulting model files are not compatible with the official BERT pre-trained models. Is it easy to make them compatible?

For example, I can use pytorch_transformers to read the official BERT pre-trained models, but when I do the same for a model trained by this project, I get errors about mismatched tensor shapes.

RuntimeError: Error(s) in loading state_dict for BertForMultiLabelSequenceClassification:
	size mismatch for bert.encoder.layer.0.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.0.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
	size mismatch for bert.encoder.layer.1.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.1.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
	size mismatch for bert.encoder.layer.2.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.2.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
	size mismatch for bert.encoder.layer.3.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.3.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
	size mismatch for bert.encoder.layer.4.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.4.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
	size mismatch for bert.encoder.layer.5.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
	size mismatch for bert.encoder.layer.5.output.dense.weight: copying a param with shape torch.Size([256, 128]) from checkpoint, the shape in current model is torch.Size([128, 256]).
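For context, every mismatched pair above is the transpose of the other ([128, 256] vs. [256, 128]), which suggests the two versions of modeling.py lay out these dense kernels in opposite orientations. The failing call is the usual pytorch_transformers loading path; a minimal sketch is below, assuming the TensorFlow checkpoint was already converted to a pytorch_model.bin in a hypothetical my-pretrained-bert/ directory:

from pytorch_transformers import BertModel

# Hypothetical path: a directory holding config.json and pytorch_model.bin
# converted from the TensorFlow checkpoint produced by this repo.
# from_pretrained() raises the RuntimeError above whenever a checkpoint
# tensor's shape differs from the freshly built model's parameter shape.
model = BertModel.from_pretrained("my-pretrained-bert/")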
@guotong1988
Owner

guotong1988 commented Sep 17, 2019

@1e0ng
Author

1e0ng commented Sep 17, 2019

Hi @guotong1988, thanks for the reply. Actually, I've been using the run_pretraining_gpu_v2.py script from the beginning.

@guotong1988
Owner

Could you locate the tensor from the error message in the code?

@1e0ng
Author

1e0ng commented Sep 17, 2019

Hi @guotong1988, I don't know how to find that tensor...

The error above was raised from this code, if it helps:


/usr/local/lib/python3.6/dist-packages/pytorch_transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    592         if len(error_msgs) > 0:
    593             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 594                                model.__class__.__name__, "\n\t".join(error_msgs)))
    595 
    596         if hasattr(model, 'tie_weights'):

RuntimeError: Error(s) in loading state_dict for BertForMultiLabelSequenceClassification:
	size mismatch for bert.encoder.layer.0.intermediate.dense.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([256, 128]).
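Not part of the original thread, but one way to confirm the transposed layout is to diff the checkpoint's tensor shapes against a freshly constructed model. A hedged sketch, assuming the same hypothetical my-pretrained-bert/ directory as above and that the saved state_dict keys line up with BertModel's parameter names:

import torch
from pytorch_transformers import BertConfig, BertModel

# Hypothetical diagnostic: report checkpoint tensors whose shape is the
# transpose of what a freshly constructed model expects.
config = BertConfig.from_pretrained("my-pretrained-bert/")
model = BertModel(config)
expected = {name: tuple(p.shape) for name, p in model.named_parameters()}
state_dict = torch.load("my-pretrained-bert/pytorch_model.bin", map_location="cpu")
for name, tensor in state_dict.items():
    want = expected.get(name)
    have = tuple(tensor.shape)
    if want is not None and have != want and have == tuple(reversed(want)):
        print("%s: checkpoint %s vs model %s (transposed)" % (name, have, want))

For the error reported above, this would flag intermediate.dense.weight and output.dense.weight in every encoder layer.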

@guotong1988
Owner

guotong1988 commented Sep 17, 2019

Try https://github.com/guotong1988/BERT-multi-gpu/blob/master/modeling_lastest.py and edit the code to import it.
I copied it from https://github.com/google-research/bert ten minutes ago.
I hope you can give me feedback.

@1e0ng
Author

1e0ng commented Sep 17, 2019

Hi, I assume you mean changing the run_pretraining_gpu_v2.py script so that the line

import modeling

becomes

import modeling_lastest as modeling

I'll give it a try and let you know.

@guotong1988
Owner

Yes

@1e0ng
Author

1e0ng commented Sep 19, 2019

Hi @guotong1988, it works!
Thanks so much!
