
different embedding weights for base-uncased with different transformers versions #8866

Closed
aleksandra-sp opened this issue Dec 1, 2020 · 5 comments


aleksandra-sp commented Dec 1, 2020

Environment info

  • transformers version: 4.0.0, 3.4.0 and 2.9.0
  • Platform:
  • Python version: 3.7.0
  • PyTorch version: 1.4.0
  • Tensorflow version: 2.2.0

Information

Model I am using: Bert

The problem arises when using my own scripts. I trained a LayoutLM model using the original UniLM repo (https://github.com/microsoft/unilm/tree/master/layoutlm) and obtained pretty good results (about 0.9 F1 score). When the Hugging Face implementation came out, I retrained the model with the same dataset, parameters and seed and got rubbish results (less than 0.2 F1 score). After investigating, I found that the weights of the embeddings of the pretrained model, loaded at the beginning of training, differ between transformers versions. The weights of the final trained model also differ: a model trained with the original implementation gives different prediction results for the same data when predicting with the Hugging Face implementation, because the weights are different after loading.

To reproduce

Steps to reproduce the behavior:

Huggingface code:

from transformers import LayoutLMConfig, LayoutLMForTokenClassification

pretrained_model_path = "models/base-uncased"

config = LayoutLMConfig.from_pretrained(pretrained_model_path, num_labels=25)

model = LayoutLMForTokenClassification.from_pretrained(
        pretrained_model_path, from_tf=bool(".ckpt" in pretrained_model_path), config=config
    )

print(model.base_model._modules["embeddings"]._modules["word_embeddings"].weight)


"""transformers 4.0.0:
Parameter containing:
tensor([[-0.0211, -0.0056,  0.0198,  ...,  0.0119,  0.0074, -0.0048],
        [-0.0268,  0.0006,  0.0310,  ..., -0.0195, -0.0534,  0.0284],
        [ 0.0234,  0.0026, -0.0024,  ..., -0.0074, -0.0015, -0.0212],
        ...,
        [-0.0274, -0.0074,  0.0161,  ..., -0.0256,  0.0189, -0.0328],
        [-0.0350, -0.0304,  0.0087,  ..., -0.0349, -0.0086,  0.0229],
        [-0.0068, -0.0077, -0.0084,  ..., -0.0181, -0.0111,  0.0385]],
       requires_grad=True)
"""

"""transformers 3.4.0:
Parameter containing:
tensor([[ 0.0298, -0.0229, -0.0033,  ...,  0.0097, -0.0179, -0.0065],
        [-0.0098,  0.0150, -0.0283,  ..., -0.0424, -0.0031, -0.0135],
        [ 0.0122,  0.0038, -0.0066,  ..., -0.0261,  0.0167,  0.0176],
        ...,
        [ 0.0037,  0.0001,  0.0096,  ..., -0.0037, -0.0018,  0.0067],
        [ 0.0274,  0.0076,  0.0065,  ...,  0.0084, -0.0230, -0.0011],
        [-0.0155, -0.0155, -0.0028,  ..., -0.0140,  0.0084, -0.0016]],
       requires_grad=True)
"""

With the original LayoutLM implementation, transformers 2.9.0:

from unilm.layoutlm.layoutlm import LayoutlmConfig, LayoutlmForTokenClassification

pretrained_model_path = "models/base-uncased"



config = LayoutlmConfig.from_pretrained(
        pretrained_model_path,
        num_labels=25,
    )
model = LayoutlmForTokenClassification.from_pretrained(
        pretrained_model_path,
        from_tf=bool(".ckpt" in pretrained_model_path),
        config=config,
    )

print(model.base_model._modules["embeddings"]._modules["word_embeddings"].weight)

"""
Parameter containing:
tensor([[-0.0111, -0.0777,  0.0293,  ..., -0.0323, -0.0190,  0.0403],
        [-0.0579, -0.0331, -0.0399,  ..., -0.0248, -0.0278, -0.0398],
        [-0.0261, -0.0383, -0.0225,  ...,  0.0011, -0.0803, -0.0019],
        ...,
        [-0.0186, -0.0593, -0.0167,  ..., -0.0243, -0.0096,  0.0050],
        [-0.0555, -0.0274,  0.0049,  ..., -0.0206, -0.0172, -0.0241],
        [-0.0328, -0.0788, -0.0211,  ..., -0.0187, -0.0497,  0.0444]],
       requires_grad=True)
"""

Expected behavior

The same weights should be loaded regardless of the transformers version used.
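
To make the comparison concrete, here is a minimal sketch (my own, not from the report) that dumps the loaded embedding matrix to disk; running it once per transformers version and diffing the saved tensors shows whether the weights match:

import torch
import transformers
from transformers import LayoutLMConfig, LayoutLMForTokenClassification

pretrained_model_path = "models/base-uncased"  # same local checkpoint as above
config = LayoutLMConfig.from_pretrained(pretrained_model_path, num_labels=25)
model = LayoutLMForTokenClassification.from_pretrained(pretrained_model_path, config=config)

# Save the word-embedding matrix tagged with the library version that loaded it.
emb = model.base_model.embeddings.word_embeddings.weight.detach().cpu()
torch.save(emb, f"word_embeddings_transformers_{transformers.__version__}.pt")

# After running under both versions:
# a = torch.load("word_embeddings_transformers_3.4.0.pt")
# b = torch.load("word_embeddings_transformers_4.0.0.pt")
# print(torch.allclose(a, b))  # False would reproduce the report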

@rtanaka-lab

Facing the same issue. A reply on this is highly appreciated.

@hasansalimkanmaz

Can this be your solution? Hope it helps...

@NielsRogge mentioned this issue Jan 8, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@antondelafuente

antondelafuente commented Dec 8, 2021

I had the same issue with GPT2LMHeadModel. In my case, I found the solution. Let hf3_model_dir be the Hugging Face v3.x model directory that you pass to from_pretrained. Inside this directory is the saved PyTorch model file, pytorch_model.bin. Let's load this file directly with PyTorch:

import torch

state_dict = torch.load('pytorch_model.bin')

Now check the values of these two entries:

state_dict['transformer.wte.weight']
state_dict['lm_head.weight']
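
For example, a quick way to compare them (my own check, reusing the state_dict loaded above):

# The two entries should hold the same vocab_size x embedding_size matrix.
print(torch.equal(state_dict['transformer.wte.weight'],
                  state_dict['lm_head.weight']))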

I found that they were different. However, they should be the same vocab_size x embedding_size matrix. Indeed, let's actually load the model:

import transformers

model = transformers.GPT2LMHeadModel.from_pretrained(hf3_model_dir)

And check the following values:

model.transformer.wte.weight
model.lm_head.weight

You will find that they are the same. However:

  • in Hugging Face v3.x, they are both equal to state_dict['lm_head.weight'];
  • in Hugging Face v4.x, they are both equal to state_dict['transformer.wte.weight'].
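
That can be checked directly (my own sketch, reusing state_dict and model from the snippets above):

# Prints True under Hugging Face v3.x:
print(torch.equal(model.transformer.wte.weight, state_dict['lm_head.weight']))
# Prints True under Hugging Face v4.x:
print(torch.equal(model.transformer.wte.weight, state_dict['transformer.wte.weight']))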

So that's the cause of the problem. To get the same behavior in Hugging Face v4.x as you get in Hugging Face v3.x, I manually set both equal to state_dict['lm_head.weight'].
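
A minimal sketch of that manual fix (my own, assuming a model loaded under v4.x from a v3.x checkpoint, the state_dict from above, and the usual weight tying):

import torch

with torch.no_grad():
    # Copy the matrix the v3.x run actually used (lm_head.weight) into the model.
    # Because wte and lm_head share the same tensor in GPT2LMHeadModel, updating
    # it in place fixes both at once.
    model.transformer.wte.weight.copy_(state_dict['lm_head.weight'])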

@antondelafuente

As a further comment, for models saved under Hugging Face v4.x, state_dict['transformer.wte.weight'] and state_dict['lm_head.weight'] are both equal as they should be.

For models saved under Hugging Face v3.x, state_dict['transformer.wte.weight'] ends up being (I believe) just random garbage that is harmless if reloaded with Hugging Face v3.x, but can be very harmful if reloaded with Hugging Face v4.x.
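
If it is more convenient, such a v3.x checkpoint can also be patched once on disk so it loads cleanly under v4.x (again my own sketch, not from the thread):

import torch

state_dict = torch.load('pytorch_model.bin', map_location='cpu')
# Overwrite the stale entry with the matrix v3.x actually used, then re-save.
state_dict['transformer.wte.weight'] = state_dict['lm_head.weight'].clone()
torch.save(state_dict, 'pytorch_model.bin')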
