
different embedding weights for base-uncased with different transformers versions #8866

Closed
aleksandra-sp opened this issue Dec 1, 2020 · 5 comments


aleksandra-sp commented Dec 1, 2020

Environment info

  • transformers version: 4.0.0, 3.4.0 and 2.9.0
  • Platform:
  • Python version: 3.7.0
  • PyTorch version: 1.4.0
  • Tensorflow version: 2.2.0

Information

Model I am using: Bert

The problem arises when using my own scripts. I trained a LayoutLM model using the original UniLM repo (https://github.com/microsoft/unilm/tree/master/layoutlm) and obtained pretty good results (about 0.9 F1 score). When the Hugging Face implementation came out, I retrained the model with the same dataset, parameters and seed and got rubbish results (less than 0.2 F1 score). After investigating, I found that the weights of the embeddings of the pretrained model, loaded at the beginning of training, differ between transformers versions. The weights of the final trained model also differ: a model trained with the original implementation gives different prediction results for the same data when predicting with the Hugging Face implementation, because the weights are different after loading.

To reproduce

Steps to reproduce the behavior:

Huggingface code:

from transformers import LayoutLMConfig, LayoutLMForTokenClassification

pretrained_model_path = "models/base-uncased"

config = LayoutLMConfig.from_pretrained(pretrained_model_path, num_labels=25)

model = LayoutLMForTokenClassification.from_pretrained(
        pretrained_model_path, from_tf=bool(".ckpt" in pretrained_model_path), config=config
    )

print(model.base_model._modules["embeddings"]._modules["word_embeddings"].weight)


"""transformers 4.0.0:
Parameter containing:
tensor([[-0.0211, -0.0056,  0.0198,  ...,  0.0119,  0.0074, -0.0048],
        [-0.0268,  0.0006,  0.0310,  ..., -0.0195, -0.0534,  0.0284],
        [ 0.0234,  0.0026, -0.0024,  ..., -0.0074, -0.0015, -0.0212],
        ...,
        [-0.0274, -0.0074,  0.0161,  ..., -0.0256,  0.0189, -0.0328],
        [-0.0350, -0.0304,  0.0087,  ..., -0.0349, -0.0086,  0.0229],
        [-0.0068, -0.0077, -0.0084,  ..., -0.0181, -0.0111,  0.0385]],
       requires_grad=True)
"""

"""transformers 3.4.0:
Parameter containing:
tensor([[ 0.0298, -0.0229, -0.0033,  ...,  0.0097, -0.0179, -0.0065],
        [-0.0098,  0.0150, -0.0283,  ..., -0.0424, -0.0031, -0.0135],
        [ 0.0122,  0.0038, -0.0066,  ..., -0.0261,  0.0167,  0.0176],
        ...,
        [ 0.0037,  0.0001,  0.0096,  ..., -0.0037, -0.0018,  0.0067],
        [ 0.0274,  0.0076,  0.0065,  ...,  0.0084, -0.0230, -0.0011],
        [-0.0155, -0.0155, -0.0028,  ..., -0.0140,  0.0084, -0.0016]],
       requires_grad=True)
"""

With the original LayoutLM implementation, transformers 2.9.0:

from unilm.layoutlm.layoutlm import LayoutlmConfig, LayoutlmForTokenClassification

pretrained_model_path = "models/base-uncased"



config = LayoutlmConfig.from_pretrained(
        pretrained_model_path,
        num_labels=25,
    )
model = LayoutlmForTokenClassification.from_pretrained(
        pretrained_model_path,
        from_tf=bool(".ckpt" in pretrained_model_path),
        config=config,
    )

print(model.base_model._modules["embeddings"]._modules["word_embeddings"].weight)

"""
Parameter containing:
tensor([[-0.0111, -0.0777,  0.0293,  ..., -0.0323, -0.0190,  0.0403],
        [-0.0579, -0.0331, -0.0399,  ..., -0.0248, -0.0278, -0.0398],
        [-0.0261, -0.0383, -0.0225,  ...,  0.0011, -0.0803, -0.0019],
        ...,
        [-0.0186, -0.0593, -0.0167,  ..., -0.0243, -0.0096,  0.0050],
        [-0.0555, -0.0274,  0.0049,  ..., -0.0206, -0.0172, -0.0241],
        [-0.0328, -0.0788, -0.0211,  ..., -0.0187, -0.0497,  0.0444]],
       requires_grad=True)
"""

Expected behavior

The same weights should be loaded regardless of the transformers version used.
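
To make the comparison concrete, here is a minimal sketch (my own, not from the report) that dumps the loaded embedding matrix to disk; running it once per transformers version and diffing the saved tensors shows whether the weights match:

import torch
import transformers
from transformers import LayoutLMConfig, LayoutLMForTokenClassification

pretrained_model_path = "models/base-uncased"  # same local checkpoint as above
config = LayoutLMConfig.from_pretrained(pretrained_model_path, num_labels=25)
model = LayoutLMForTokenClassification.from_pretrained(pretrained_model_path, config=config)

# Save the word-embedding matrix tagged with the library version that loaded it.
emb = model.base_model.embeddings.word_embeddings.weight.detach().cpu()
torch.save(emb, f"word_embeddings_transformers_{transformers.__version__}.pt")

# After running under both versions:
# a = torch.load("word_embeddings_transformers_3.4.0.pt")
# b = torch.load("word_embeddings_transformers_4.0.0.pt")
# print(torch.allclose(a, b))  # False would reproduce the report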

@rtanaka-lab

Facing the same issue. A reply on this is highly appreciated.

@hasansalimkanmaz

Can this be your solution? Hope it helps...

@NielsRogge mentioned this issue Jan 8, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@antondelafuente

antondelafuente commented Dec 8, 2021

I had the same issue with GPT2LMHeadModel. In my case, I found the solution. Let hf3_model_dir be the Hugging Face v3.x model directory that you pass to from_pretrained. Inside this directory is the saved PyTorch model file, pytorch_model.bin. Let's load this file directly with PyTorch:

import torch

state_dict = torch.load('pytorch_model.bin')

Now check the values of these two entries:

state_dict['transformer.wte.weight']
state_dict['lm_head.weight']
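
For example, a quick way to compare them (my own check, reusing the state_dict loaded above):

# The two entries should hold the same vocab_size x embedding_size matrix.
print(torch.equal(state_dict['transformer.wte.weight'],
                  state_dict['lm_head.weight']))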

I found that they were different. However, they should be the same vocab_size x embedding_size matrix. Indeed, let's actually load the model:

import transformers

model = transformers.GPT2LMHeadModel.from_pretrained(hf3_model_dir)

And check the following values:

model.transformer.wte.weight
model.lm_head.weight

You will find that they are the same. However:

  • in Hugging Face v3.x, they are both equal to state_dict['lm_head.weight'];
  • in Hugging Face v4.x, they are both equal to state_dict['transformer.wte.weight'].
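
That can be checked directly (my own sketch, reusing state_dict and model from the snippets above):

# Prints True under Hugging Face v3.x:
print(torch.equal(model.transformer.wte.weight, state_dict['lm_head.weight']))
# Prints True under Hugging Face v4.x:
print(torch.equal(model.transformer.wte.weight, state_dict['transformer.wte.weight']))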

So that's the cause of the problem. To get the same behavior in Hugging Face v4.x as you get in Hugging Face v3.x, I manually set both equal to state_dict['lm_head.weight'].
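
A minimal sketch of that manual fix (my own, assuming a model loaded under v4.x from a v3.x checkpoint, the state_dict from above, and the usual weight tying):

import torch

with torch.no_grad():
    # Copy the matrix the v3.x run actually used (lm_head.weight) into the model.
    # Because wte and lm_head share the same tensor in GPT2LMHeadModel, updating
    # it in place fixes both at once.
    model.transformer.wte.weight.copy_(state_dict['lm_head.weight'])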

@antondelafuente

As a further comment, for models saved under Hugging Face v4.x, state_dict['transformer.wte.weight'] and state_dict['lm_head.weight'] are both equal as they should be.

For models saved under Hugging Face v3.x, state_dict['transformer.wte.weight'] ends up being (I believe) just random garbage that is harmless if reloaded with Hugging Face v3.x, but can be very harmful if reloaded with Hugging Face v4.x.
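
If it is more convenient, such a v3.x checkpoint can also be patched once on disk so it loads cleanly under v4.x (again my own sketch, not from the thread):

import torch

state_dict = torch.load('pytorch_model.bin', map_location='cpu')
# Overwrite the stale entry with the matrix v3.x actually used, then re-save.
state_dict['transformer.wte.weight'] = state_dict['lm_head.weight'].clone()
torch.save(state_dict, 'pytorch_model.bin')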
