Difference between the PyTorch-converted pre-trained BERT parameters released on Google Drive and the one obtained using HuggingFace conversion script #1
I tried to get the PyTorch pre-trained BERT checkpoint using the conversion script provided by HuggingFace. The script ran without any problem and I was able to obtain a converted binary file.
However, I noticed a few differences between this file compared with the PyTorch-converted pre-trained BERT parameters released on Google Drive.
First, the two files have different variable naming. The HuggingFace-converted file has the prefix
I was able to map most variables between these two files by manipulating the names and verifying their equivalence, but I cannot find a mapping to the Google Drive release for the following tensors from the HuggingFace conversion, most of which are related to layer normalization.
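For reference, the comparison I ran looks roughly like the sketch below. The file names and the `bert.` prefix are my own assumptions for illustration, not something stated anywhere official; the idea is just to strip the prefix from one checkpoint and check which tensors have an exact counterpart in the other.

```python
# Hypothetical sketch: strip an assumed prefix from the HuggingFace-converted
# checkpoint's keys and compare tensors that appear in both files.
import torch

converted = torch.load("huggingface_converted.bin", map_location="cpu")
released = torch.load("google_drive_release.bin", map_location="cpu")

def strip_prefix(state_dict, prefix):
    """Drop a leading prefix from every key that carries it."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

converted = strip_prefix(converted, "bert.")  # assumed prefix

matched, unmatched = [], []
for name, tensor in converted.items():
    if name in released and torch.equal(tensor, released[name]):
        matched.append(name)
    else:
        unmatched.append(name)

print(f"{len(matched)} tensors match, {len(unmatched)} have no exact counterpart")
for name in unmatched:
    print("unmatched:", name)
```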
Could you explain what causes these differences? Has layer normalization been removed from the BERT architecture on purpose? Thanks.
I found that the HuggingFace team has changed the variable names in their "updated" code.
For compatibility, please use the old
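Alternatively, if you want to load an old-style checkpoint under the newer naming, a key-remapping pass along these lines may work. The specific renames below (layer-norm `gamma` to `weight`, `beta` to `bias`) are my assumption about what changed and should be checked against the actual key names in your files.

```python
# Sketch: rewrite an old checkpoint's keys onto an assumed newer naming scheme
# before loading. Adjust rename_rules to whatever renames you actually observe.
import torch

old_state = torch.load("old_checkpoint.bin", map_location="cpu")

rename_rules = {".gamma": ".weight", ".beta": ".bias"}  # assumed renames

new_state = {}
for key, value in old_state.items():
    for old_suffix, new_suffix in rename_rules.items():
        if key.endswith(old_suffix):
            key = key[: -len(old_suffix)] + new_suffix
            break
    new_state[key] = value

torch.save(new_state, "remapped_checkpoint.bin")
```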