
Why we need the init_weight function in BERT pretrained model #4701

Closed
allanj opened this issue Jun 1, 2020 · 5 comments
Labels
Usage General questions about the library

Comments

allanj (Contributor) commented Jun 1, 2020

❓ Questions & Help

I have already tried asking this question on SO; you can find the link here.

Details

In the Hugging Face transformers code, many fine-tuning models have an init_weights function.
For example (here), there is an init_weights call at the end of the constructor. Even though we use from_pretrained, it will still call the constructor, which calls init_weights.

class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

As far as I know, it will call the following code:

def _init_weights(self, module):
    """ Initialize the weights """
    if isinstance(module, (nn.Linear, nn.Embedding)):
        # Slightly different from the TF version which uses truncated_normal for initialization
        # cf https://github.com/pytorch/pytorch/pull/5617
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
    elif isinstance(module, BertLayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)
    if isinstance(module, nn.Linear) and module.bias is not None:
        module.bias.data.zero_()

My question is: if we are loading a pre-trained language model, why do we need to initialize the weights of every module?

I guess I must be misunderstanding something here.

BramVanroy added the Usage (General questions about the library) label on Jun 1, 2020
BramVanroy (Collaborator) commented Jun 1, 2020

Have a look at the code for .from_pretrained(). What actually happens is something like this:

  • find the correct base model class to initialise
  • initialise that class with pseudo-random initialisation (by using the _init_weights function that you mention)
  • find the file with the pretrained weights
  • overwrite the weights of the model that we just created with the pretrained weights, where applicable

This ensures that layers that were not pretrained (e.g. in some cases the final classification layer) do get initialised in _init_weights but are not overwritten by the checkpoint.
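
In other words, something like the following rough sketch (not the actual transformers internals; the checkpoint path pytorch_model.bin and num_labels=2 below are just placeholders for illustration):

import torch
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_pretrained("bert-base-uncased", num_labels=2)

# Steps 1-2: the constructor runs, so every module (including the brand-new
# classification head) gets a pseudo-random init via init_weights().
model = BertForSequenceClassification(config)

# Steps 3-4: load the pretrained state dict on top of the random init.
# strict=False mirrors the idea that keys missing from the checkpoint
# (e.g. "classifier.weight") simply keep their random values.
# The path below is hypothetical; from_pretrained resolves it for you.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("kept random init for:", missing)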

allanj (Contributor, author) commented Jun 1, 2020

Great, thanks. I also read through the code and that really clears up my confusion.

allanj closed this as completed on Jun 1, 2020
BramVanroy (Collaborator) commented

Good. If the answer was sufficient on Stack Overflow as well, please close that too.

sunersheng commented

(quoting BramVanroy's explanation of .from_pretrained() above)

When we construct BertForSequenceClassification from a pretrained model, don't we overwrite the loaded weights with the random initialisation?

BramVanroy (Collaborator) commented

@sunersheng No, the random initialization happens first and then the existing weights are loaded into it.
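
One way to see that order in practice (just a sketch, assuming bert-base-uncased can be downloaded): load the same checkpoint twice and compare parameters. The encoder weights match because they come from the checkpoint, while the classifier head differs between the two loads because it only ever gets its random init_weights() values.

import torch
from transformers import BertForSequenceClassification

m1 = BertForSequenceClassification.from_pretrained("bert-base-uncased")
m2 = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Encoder weights come from the checkpoint, so both loads are identical.
print(torch.equal(m1.bert.embeddings.word_embeddings.weight,
                  m2.bert.embeddings.word_embeddings.weight))  # True

# The classifier head is not in the checkpoint, so each load keeps its own
# random init_weights() values and the two differ.
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # False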
