
Getting RuntimeError for LukeRelationClassification #57

Closed
akshayparakh25 opened this issue Mar 22, 2021 · 13 comments

@akshayparakh25

While trying to replicate the results using the pre-trained model for Relation Classification, I am getting the following error. I looked at the function load_state_dict(); the strict argument is set to False.

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/akshay/re_rc/luke/examples/cli.py", line 132, in <module>
    cli()
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/akshay/re_rc/luke/examples/utils/trainer.py", line 32, in wrapper
    return func(*args, **kwargs)
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/akshay/re_rc/luke/examples/relation_classification/main.py", line 110, in run
    model.load_state_dict(torch.load(args.checkpoint_file, map_location="cpu"))
  File "/home/akshay/re_rc/luke/luke/model.py", line 236, in load_state_dict
    super(LukeEntityAwareAttentionModel, self).load_state_dict(new_state_dict, *args, **kwargs)
  File "/home/akshay/pyTorch-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LukeForRelationClassification:
	size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([50266, 1024]) from checkpoint, the shape in current model is torch.Size([50267, 1024]).
	size mismatch for entity_embeddings.entity_embeddings.weight: copying a param with shape torch.Size([2, 256]) from checkpoint, the shape in current model is torch.Size([3, 256]).

I cannot understand the reason behind this. Can somebody please explain?

@ikuyamada
Member

Hi!
The current implementation adds special words and entities (and their embeddings) to the model, which changes the shapes of the embedding matrices. You need to be aware of this when modifying the code.

https://github.com/studio-ousia/luke/blob/master/examples/relation_classification/main.py#L47
https://github.com/studio-ousia/luke/blob/master/examples/relation_classification/main.py#L56
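The mechanism can be sketched in plain Python. The breakdown into "base + special" counts below is an assumption for illustration, not taken from the LUKE code; only the resulting row counts (50267 and 3) come from the error message above:

```python
# Illustrative sketch of why the embedding shapes grow. The split into
# base vocabulary vs. added special rows is hypothetical; only the final
# totals match the "current model" shapes in the traceback.
base_word_vocab = 50265     # RoBERTa vocabulary size (used by LUKE-large)
special_word_tokens = 2     # e.g. entity-marker tokens added by the RC example
base_entity_vocab = 2       # assumed entity vocabulary of the downloaded model
special_entities = 1        # assumed extra special entity added by the RC example

word_rows = base_word_vocab + special_word_tokens
entity_rows = base_entity_vocab + special_entities

# A checkpoint saved without the added rows can no longer be loaded into the
# resized model, which is exactly the size mismatch in the traceback.
print(word_rows, entity_rows)  # 50267 3
```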

@akshayparakh25
Author

akshayparakh25 commented Mar 25, 2021

@ikuyamada Thanks for the heads up. I fine-tuned the model, and the generated checkpoint no longer throws the error, as expected. However, the results I get from fine-tuning and from inference with the generated checkpoint differ vastly.

After fine-tuning:

"test_f1": 0.7204502814258913,
"test_precision": 0.6925638179800222,
"test_recall": 0.7506766917293233

After using the generated checkpoint:

"test_f1": 0.6183343319352906,
"test_precision": 0.6159355416293644,
"test_recall": 0.6207518796992482

Any comments or is there something I am missing?

@ikuyamada
Member

@akshayparakh25 Would you provide commands used to run the fine-tuning and the inference based on the checkpoint?

@akshayparakh25
Author

Command used for fine-tuning:

python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=output \
    relation-classification run \
    --data-dir=./../dataset/tacred/tacred \
    --train-batch-size=4 \
    --gradient-accumulation-steps=8 \
    --learning-rate=1e-5 \
    --num-train-epochs=5

For inference based on the checkpoint:

python -m examples.cli \
    --model-file=luke_large_500k.tar.gz \
    --output-dir=output \
    relation-classification run \
    --data-dir=./../dataset/tacred/tacred \
    --checkpoint-file=output/pytorch_model.bin \
    --no-train

@ikuyamada
Member

Thanks for your prompt reply! Can you reproduce the scores based on the publicized checkpoint file by running the same command for inference?

@akshayparakh25
Author

I tried to follow your comment, but I wasn't sure about the special token you mentioned earlier, so I thought pre-training wouldn't create the issue and went ahead with that.

@ikuyamada
Member

Regarding the error mentioned in the first comment, the released checkpoint file of the relation classification task contains a word embedding with shape (50267, 1024) and an entity embedding with shape (3, 256). I think your checkpoint file is different from the publicized checkpoint file.

>>> model_data = torch.load('pytorch_model.bin')
>>> model_data['embeddings.word_embeddings.weight'].shape
torch.Size([50267, 1024])
>>> model_data['entity_embeddings.entity_embeddings.weight'].shape
torch.Size([3, 256])

RuntimeError: Error(s) in loading state_dict for LukeForRelationClassification:
size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([50266, 1024]) from checkpoint, the shape in current model is torch.Size([50267, 1024]).
size mismatch for entity_embeddings.entity_embeddings.weight: copying a param with shape torch.Size([2, 256]) from checkpoint, the shape in current model is torch.Size([3, 256]).
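One way to diagnose such mismatches before load_state_dict raises is to compare the shapes key by key. A minimal, hypothetical helper follows (shapes written as plain tuples so the sketch runs without PyTorch; with a real checkpoint you would build these dicts from each tensor's .shape after torch.load):

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for every parameter
    whose shape differs between a checkpoint and the current model."""
    return {
        key: (ckpt_shape, model_shapes[key])
        for key, ckpt_shape in checkpoint_shapes.items()
        if key in model_shapes and model_shapes[key] != ckpt_shape
    }

# Shapes taken from the error message in this thread:
ckpt = {
    "embeddings.word_embeddings.weight": (50266, 1024),
    "entity_embeddings.entity_embeddings.weight": (2, 256),
}
model = {
    "embeddings.word_embeddings.weight": (50267, 1024),
    "entity_embeddings.entity_embeddings.weight": (3, 256),
}
print(find_shape_mismatches(ckpt, model))
```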

@akshayparakh25
Author

I think your checkpoint file is different from the publicized checkpoint file.

Do you mean the checkpoint file that I have downloaded is different from the publicized one?

@ikuyamada
Member

Do you mean the checkpoint file that I have downloaded is different from the publicized one?

I do not know why this happens. I have downloaded the checkpoint file to my local computer and confirmed that the shapes are different from those shown in the error message.
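A quick way to rule out a corrupted or partial download is to hash the local file and compare the digest (or at least the file size) against a freshly downloaded copy. A small stdlib-only sketch; the chunked read keeps multi-gigabyte checkpoints out of memory:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: run sha256_of("pytorch_model.bin") on both machines and compare.
```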

@akshayparakh25
Author

Thanks for your prompt reply! Can you reproduce the scores based on the publicized checkpoint file by running the same command for inference?

Thanks for your response. The link shared in this comment is working for me. And the results for the test set:

"test_f1": 0.6442931771410481,
"test_precision": 0.6801470588235294,
"test_recall": 0.6120300751879699

@ikuyamada
Member

ikuyamada commented Mar 25, 2021

I can reproduce the reported results based on the checkpoint file... Did you use poetry to create the environment? This may be related to a mismatch of library versions.
Also, the above link is the same as the link in the README.

@akshayparakh25
Author

I can reproduce the reported results based on the checkpoint file... Did you use poetry to create the environment? This may be related to the mismatch of library versions.
Also, the above link points to the same URL as the link on README.

With reference to your first point: possibly that could be the reason. For the second point, I am not sure why it didn't work the first time.

Thanks for your efforts in resolving the issue.

@lshowway

lshowway commented May 5, 2022

@akshayparakh25
However, the results I get from fine-tuning and from inference with the generated checkpoint differ vastly. After fine-tuning:

"test_f1": 0.7204502814258913,
"test_precision": 0.6925638179800222,
"test_recall": 0.7506766917293233

After using the generated checkpoint:

"test_f1": 0.6183343319352906,
"test_precision": 0.6159355416293644,
"test_recall": 0.6207518796992482

Any comments or is there something I am missing?

I ran into a similar problem: the expected F1 is 72, but I got 64. I have checked the data-loading utils and the evaluation metrics, but I could not solve the problem.
