Why we need the init_weight function in BERT pretrained model #4701
Comments
Have a look at the code for `from_pretrained`. This ensures that layers that were not pretrained (e.g. in some cases the final classification layer) do get initialised.
Great. Thanks. I also read through the code and that really clears up my confusion.
Good. If the answer was sufficient on Stack Overflow as well, please close that too.
When we construct BertForSequenceClassification from a pre-trained model, don't we overwrite the loaded weights with random initialisation?
@sunersheng No, the random initialization happens first and then the existing weights are loaded into it.
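For anyone who wants to see that order of operations concretely, here is a simplified sketch (the helper name and structure below are illustrative, not the actual `from_pretrained` internals):

```python
import torch


def load_pretrained_sketch(model_cls, config, checkpoint_path):
    # 1) The constructor runs init_weight, so every module starts out
    #    with a fresh random initialization.
    model = model_cls(config)

    # 2) The pretrained checkpoint is loaded afterwards and overwrites
    #    whatever it contains. Layers missing from the checkpoint
    #    (e.g. a newly added classification head) keep the random
    #    initialization from step 1.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    print("Kept random init (not in checkpoint):", missing_keys)
    return model
```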
❓ Questions & Help
I have already tried asking the question on Stack Overflow; you can find the link here.
Details
In the code of the Huggingface Transformers library, many fine-tuning models have an `init_weight` function. For example (here), there is an `init_weight` call at the end. Even though we use `from_pretrained`, it will still call the constructor and call the `init_weight` function. As far as I know, it will call the following code:
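The `_init_weights` hook in `modeling_bert.py` looks roughly like this (paraphrased sketch, not an exact copy of the library source):

```python
from torch import nn


def _init_weights(self, module):
    """Initialize the weights (sketch; the exact transformers code may differ)."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        # Linear and embedding weights are drawn from a normal distribution
        # with the std taken from the model config.
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
    elif isinstance(module, nn.LayerNorm):
        # LayerNorm starts out as the identity transform.
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()
```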
My question is: if we are loading a pre-trained language model, why do we need to initialize the weights for every module?
I guess I must be misunderstanding something here.