Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?



LUKE (Language Understanding with Knowledge-based Embeddings) is a new pretrained contextualized representation of words and entities based on transformer. It was proposed in our paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. It achieves state-of-the-art results on important NLP benchmarks including SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

This repository contains the source code to pretrain the model and fine-tune it to solve downstream tasks.


November 9, 2022: The large version of LUKE-Japanese is available

The large version of LUKE-Japanese is available on the Hugging Face Model Hub:

This model achieves state-of-the-art results on three datasets in JGLUE.

Model MARC-ja JSTS JNLI JCommonsenseQA
acc Pearson/Spearman acc acc
LUKE Japanese large 0.965 0.932/0.902 0.927 0.893
Tohoku BERT large 0.955 0.913/0.872 0.900 0.816
Waseda RoBERTa large (seq128) 0.954 0.930/0.896 0.924 0.907
Waseda RoBERTa large (seq512) 0.961 0.926/0.892 0.926 0.891
XLM RoBERTa large 0.964 0.918/0.884 0.919 0.840

October 27, 2022: The Japanese version of LUKE is available

The Japanese version of LUKE is now available on the Hugging Face Model Hub:

This model outperforms other base-sized models on four datasets in JGLUE.

Model MARC-ja JSTS JNLI JCommonsenseQA
acc Pearson/Spearman acc acc
LUKE Japanese base 0.965 0.916/0.877 0.912 0.842
Tohoku BERT base 0.958 0.909/0.868 0.899 0.808
NICT BERT base 0.958 0.910/0.871 0.902 0.823
Waseda RoBERTa base 0.962 0.913/0.873 0.895 0.840
XLM RoBERTa base 0.961 0.877/0.831 0.893 0.687

April 13, 2022: The mLUKE fine-tuning code is available

The example code is updated. Now it is based on allennlp and transformers. You can reproduce the experiments in the LUKE and mLUKE papers with this implementation. For the details, please see under each example directory. The older code used in the LUKE paper has been moved to examples/legacy.

April 13, 2022: The detailed instructions for pretraining LUKE models are available

For those interested in pretraining LUKE models, we explain how to prepare datasets and run the pretraining code on

November 24, 2021: Entity disambiguation example is available

The example code of entity disambiguation based on LUKE has been added to this repository. This model was originally proposed in our paper, and achieved state-of-the-art results on five standard entity disambiguation datasets: AIDA-CoNLL, MSNBC, AQUAINT, ACE2004, and WNED-WIKI.

For further details, please refer to examples/entity_disambiguation.

August 3, 2021: New example code based on Hugging Face Transformers and AllenNLP is available

New fine-tuning examples of three downstream tasks, i.e., NER, relation classification, and entity typing, have been added to LUKE. These examples are developed based on Hugging Face Transformers and AllenNLP. The fine-tuning models are defined using simple AllenNLP's Jsonnet config files!

The example code is available in examples.

May 5, 2021: LUKE is added to Hugging Face Transformers

LUKE has been added to the master branch of the Hugging Face Transformers library. You can now solve entity-related tasks (e.g., named entity recognition, relation classification, entity typing) easily using this library.

For example, the LUKE-large model fine-tuned on the TACRED dataset can be used as follows:

from transformers import LukeTokenizer, LukeForEntityPairClassification
model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_idx = int(logits[0].argmax())
print("Predicted class:", model.config.id2label[predicted_class_idx])
# Predicted class: per:cities_of_residence

We also provide the following three Colab notebooks that show how to reproduce our experimental results on CoNLL-2003, TACRED, and Open Entity datasets using the library:

Please refer to the official documentation for further details.

November 5, 2021: LUKE-500K (base) model

We released LUKE-500K (base), a new pretrained LUKE model which is smaller than existing LUKE-500K (large). The experimental results of the LUKE-500K (base) and LUKE-500K (large) on SQuAD v1 and CoNLL-2003 are shown as follows:

Task Dataset Metric LUKE-500K (base) LUKE-500K (large)
Extractive Question Answering SQuAD v1.1 EM/F1 86.1/92.3 90.2/95.4
Named Entity Recognition CoNLL-2003 F1 93.3 94.3

We tuned only the batch size and learning rate in the experiments based on LUKE-500K (base).

Comparison with State-of-the-Art

LUKE outperforms the previous state-of-the-art methods on five important NLP tasks:

Task Dataset Metric LUKE-500K (large) Previous SOTA
Extractive Question Answering SQuAD v1.1 EM/F1 90.2/95.4 89.9/95.1 (Yang et al., 2019)
Named Entity Recognition CoNLL-2003 F1 94.3 93.5 (Baevski et al., 2019)
Cloze-style Question Answering ReCoRD EM/F1 90.6/91.2 83.1/83.7 (Li et al., 2019)
Relation Classification TACRED F1 72.7 72.0 (Wang et al. , 2020)
Fine-grained Entity Typing Open Entity F1 78.2 77.6 (Wang et al. , 2020)

These numbers are reported in our EMNLP 2020 paper.


LUKE can be installed using Poetry:

poetry install

# If you want to run pretraining for LUKE
poetry install --extras "pretraining opennlp"
# If you want to run pretraining for mLUKE
poetry install --extras "pretraining icu"

The virtual environment automatically created by Poetry can be activated by poetry shell.

A note on installing torch

The pytorch installed via poetry install does not necessarily match your hardware. In such case, see the official site and reinstall the correct version with the pip command.

poetry run pip3 uninstall torch torchvision torchaudio
# Example for Linux with CUDA 11.3
poetry run pip3 install torch torchvision torchaudio --extra-index-url

Released Models

Our pretrained models can be used with the transformers library. The model documentations can be found in the following links: LUKE and mLUKE.

Currently, the following models are available on the Hugging Face Model Hub.

Name model_name Entity Vocab Size Params
LUKE (base) studio-ousia/luke-base 500K 253 M
LUKE (large) studio-ousia/luke-large 500K 484 M
mLUKE (base) studio-ousia/mluke-base 1.2M 586 M
mLUKE (large) studio-ousia/mluke-large 1.2M 868 M
LUKE Japanese (base) studio-ousia/luke-japanese-base 570K 281 M
LUKE Japanese (large) studio-ousia/luke-japanese-large 570K 562 M

Lite Models

The entity embeddings cause a large memory footprint as they contain all the Wikipedia entities that we used in pretraining. However, in some downstream tasks (e.g., entity typing, named entity recognition, and relation classification), we only need special entity embeddings such as [MASK]. Also, you may want to only use the word representations.

With such use-cases in mind, to make our models easier to use, we have uploaded lite models only with special entity embeddings. These models perform exactly the same as the full models but have much fewer parameters, which enable fine-tuning the model with small GPUs.

Name model_name Params
LUKE (base) studio-ousia/luke-base-lite 125 M
LUKE (large) studio-ousia/luke-large-lite 356 M
mLUKE (base) studio-ousia/mluke-base-lite 279 M
mLUKE (large) studio-ousia/mluke-large-lite 561 M
LUKE Japanese (base) studio-ousia/luke-japanese-base-lite 134 M
LUKE Japanese (large) studio-ousia/luke-japanese-large-lite 415 M

Fine-tuning LUKE models

We release the fine-tuning code based on allennlp and transformers under examples. You can run fine-tuning experiments very easily with pre-defined config files and the allennlp train command. For the details and example commands for each task, please see the task directory under examples.

Pretraining LUKE models

The detailed instructions for pretraining luke models can be found on


If you use LUKE in your work, please cite the original paper.

    title = "{LUKE}: Deep Contextualized Entity Representations with Entity-aware Self-attention",
    author = "Yamada, Ikuya  and
      Asai, Akari  and
      Shindo, Hiroyuki  and
      Takeda, Hideaki  and
      Matsumoto, Yuji",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "",
    doi = "10.18653/v1/2020.emnlp-main.523",

For mLUKE, please cite this paper.

    title = "m{LUKE}: {T}he Power of Entity Representations in Multilingual Pretrained Language Models",
    author = "Ri, Ryokan  and
      Yamada, Ikuya  and
      Tsuruoka, Yoshimasa",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "",