# **[Hugging Face: Masked Language Modeling](https://huggingface.co/tasks/fill-mask)**



## **Fill-Mask**

**Masked language modeling** is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want a statistical understanding of the language on which the model was trained.

### **Use Cases**

**Domain Adaptation** 🧐

Masked language models do not require labelled data! They are trained by masking a few words in each sentence and asking the model to predict the masked words, so any raw text corpus can serve as training data. This makes the approach very practical!
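The masking step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular library implements it: the function name `mask_tokens`, the whitespace tokenization, and the default masking rate of 0.15 are assumptions made for the example (real models mask subword tokens produced by their tokenizer).

```python
import random


def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, seed=None):
    """Replace each token with mask_token with probability mask_prob.

    Returns (masked, labels). labels holds the original token at masked
    positions and None elsewhere, so the text itself supplies the targets.
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model has to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # this position is not scored
    return masked, labels


# Naive whitespace split, assumed here only to keep the sketch short.
tokens = "The capital of France is Paris .".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

Because the labels come from the text being masked, no manual annotation is needed, which is why unlabelled domain text is enough for this kind of training.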

For example, masked language modeling is used to train large models for domain-specific problems. If you have to work on a domain-specific task, such as retrieving information from medical research papers, you can train a masked language model using those papers. 📄

The resulting model has a statistical understanding of the language used in medical research papers. It can then be further trained, in a process called fine-tuning, on tasks such as [Text Classification](https://huggingface.co/tasks/text-classification) or [Question Answering](https://huggingface.co/tasks/question-answering) to build an information extraction system for medical research papers. 👩‍⚕️ Pre-training on domain-specific data tends to yield better results (see [this paper](https://arxiv.org/abs/2007.15779) for an example).

If you don't have the data to train a masked language model, you can also use an existing [domain-specific masked language model](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) from the Hub and fine-tune it with your smaller task dataset. That's the magic of Open Source and sharing your work! 🎉

### **Example**

In [1]:
!pip install -U transformers --quiet


In [3]:
from transformers import pipeline

model = pipeline("fill-mask")

text = "The capital of France is <mask>."
model(text)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.270371675491333,
  'token': 2201,
  'token_str': ' Paris',
  'sequence': 'The capital of France is Paris.'},
 {'score': 0.055883653461933136,
  'token': 12790,
  'token_str': ' Lyon',
  'sequence': 'The capital of France is Lyon.'},
 {'score': 0.029898103326559067,
  'token': 4612,
  'token_str': ' Barcelona',
  'sequence': 'The capital of France is Barcelona.'},
 {'score': 0.023081662133336067,
  'token': 12696,
  'token_str': ' Monaco',
  'sequence': 'The capital of France is Monaco.'},
 {'score': 0.020979885011911392,
  'token': 5459,
  'token_str': ' Berlin',
  'sequence': 'The capital of France is Berlin.'}]
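As the output above shows, the pipeline returns a list of dictionaries with `score`, `token`, `token_str`, and `sequence` keys. A minimal sketch of post-processing that list follows; the helper name `top_predictions` and the `min_score` threshold are assumptions for illustration, and the scores are copied from the output above rather than produced by running a model.

```python
def top_predictions(predictions, min_score=0.05):
    """Return (word, score) pairs whose score is at least min_score.

    Hypothetical helper: it only relies on the 'token_str' and 'score'
    keys visible in the fill-mask output shown above.
    """
    return [
        (p["token_str"].strip(), p["score"])  # strip the leading space
        for p in predictions
        if p["score"] >= min_score
    ]


# Predictions copied from the notebook output above.
predictions = [
    {"score": 0.270371675491333, "token": 2201, "token_str": " Paris",
     "sequence": "The capital of France is Paris."},
    {"score": 0.055883653461933136, "token": 12790, "token_str": " Lyon",
     "sequence": "The capital of France is Lyon."},
    {"score": 0.029898103326559067, "token": 4612, "token_str": " Barcelona",
     "sequence": "The capital of France is Barcelona."},
    {"score": 0.023081662133336067, "token": 12696, "token_str": " Monaco",
     "sequence": "The capital of France is Monaco."},
    {"score": 0.020979885011911392, "token": 5459, "token_str": " Berlin",
     "sequence": "The capital of France is Berlin."},
]

top = top_predictions(predictions)
for word, score in top:
    print(f"{word}: {score:.3f}")
```

Filtering on the score is one simple way to discard low-confidence guesses such as "Barcelona" or "Berlin"; a suitable threshold depends on the model and the task.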

In [4]:
# Clone the model repository used above (the weights are fetched through Git LFS)
!git clone https://huggingface.co/distilroberta-base

Cloning into 'distilroberta-base'...
remote: Enumerating objects: 54, done.
remote: Total 54 (delta 0), reused 0 (delta 0), pack-reused 54
Unpacking objects: 100% (54/54), 1.30 MiB | 4.48 MiB/s, done.
Filtering content: 100% (5/5), 1.82 GiB | 44.27 MiB/s, done.


In [6]:
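from transformers import pipeline

# Name the model explicitly instead of relying on the pipeline default.
# If a local "distilroberta-base" directory exists (for example the clone
# from the previous cell), it is loaded from disk.
model = pipeline("fill-mask", model="distilroberta-base")

# RoBERTa-style models use "<mask>" as the mask token.
text = "The capital of France is <mask>."

# Predictions are sorted by score, so index 0 is the most likely word.
output = model(text)
masked_word = output[0]["token_str"]

# The token string keeps its leading space (" Paris").
print(masked_word)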

 Paris


### **Additional Resources**
- [Hugging Face | Models for Fill Mask](https://huggingface.co/models?pipeline_tag=fill-mask)
- [Hugging Face | Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines)
- [Hugging Face | Course Chapter on Fine-tuning a Masked Language Model](https://huggingface.co/course/chapter7/3?fw=pt)
- [Hugging Face | BERT 101: State Of The Art NLP Model Explained](https://huggingface.co/blog/bert-101)
- [Hugging Face | Models for Fill Mask in Portuguese](https://huggingface.co/models?pipeline_tag=fill-mask&language=pt&sort=downloads)

### **Example in Portuguese**

In [9]:
# Clone the Portuguese model repository (the weights are fetched through Git LFS)
!git clone https://huggingface.co/neuralmind/bert-base-portuguese-cased

Cloning into 'bert-base-portuguese-cased'...
remote: Enumerating objects: 39, done.
remote: Total 39 (delta 0), reused 0 (delta 0), pack-reused 39
Unpacking objects: 100% (39/39), 103.13 KiB | 3.13 MiB/s, done.
Filtering content: 100% (3/3), 1.30 GiB | 41.46 MiB/s, done.


In [15]:
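from transformers import pipeline

# This name resolves to the local directory cloned in the previous cell;
# the Hub ID of the same model is "neuralmind/bert-base-portuguese-cased".
model = pipeline("fill-mask", model="bert-base-portuguese-cased")

# BERT-style models use "[MASK]" as the mask token, not "<mask>".
text = "A capital do Brasil é [MASK]."

# Predictions are sorted by score, so index 0 is the most likely word.
output = model(text)
masked_word = output[0]["token_str"]

print(masked_word)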

Brasília
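The two examples use different mask tokens (`<mask>` for distilroberta-base and `[MASK]` for the Portuguese BERT model), and a prompt written with the wrong one will not work as intended. One way to avoid hard-coding the token is sketched below. The helper `build_prompt` and the `{mask}` placeholder are assumptions made for this example; with a loaded pipeline, the token itself can be read from `model.tokenizer.mask_token`.

```python
def build_prompt(template, mask_token):
    """Insert a model-specific mask token into a prompt template.

    Hypothetical helper: the template must contain exactly one "{mask}"
    placeholder, which is replaced by mask_token.
    """
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


# With a loaded pipeline the token would come from the tokenizer, e.g.:
#   mask_token = model.tokenizer.mask_token
# The literal values below are the ones used in the two examples above.
print(build_prompt("The capital of France is {mask}.", "<mask>"))
print(build_prompt("A capital do Brasil é {mask}.", "[MASK]"))
```

Keeping the prompt as a template makes it easier to try the same sentence with several models from the Hub.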
