In [1]:
from transformers import BertTokenizer, BertForMaskedLM
import torch
import re

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

sentence = "input [MASK] here"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits

masked_indices = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

def is_word(token):
    # Keep purely alphabetic tokens; this drops punctuation, digits, and "##" subword pieces.
    return re.fullmatch(r"[a-zA-Z]+", token) is not None

for idx, masked_index in enumerate(masked_indices):
    # Take the 20 highest-scoring vocabulary ids at this [MASK] position.
    top_predictions = predictions[0, masked_index].topk(20).indices
    top_tokens = [tokenizer.decode(pred_id).strip() for pred_id in top_predictions]
    # Filtering happens after top-k, so fewer than 20 tokens may remain.
    filtered_tokens = [token for token in top_tokens if is_word(token)]

    print(f"Predicted tokens for mask {idx + 1}: {filtered_tokens}")


BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another archite

Predicted tokens for mask 1: ['from', 'is', 'to', 'in', 'of', 'for', 'results', 'data', 'found', 'available', 'on', 'set']
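
The list above holds 12 tokens rather than 20 because the word filter runs after the top-20 cut, so any non-word candidates are simply dropped. If a fixed number of words is wanted, one option is to walk the ranking and stop once enough words have been collected. The following is a minimal sketch of that idea in plain Python; `top_k_words`, `toy_vocab`, and `toy_scores` are hypothetical names and made-up values for illustration, not BERT outputs.

```python
import re

def is_word(token):
    # Same filter as in the cell above: purely alphabetic tokens only.
    return re.fullmatch(r"[a-zA-Z]+", token) is not None

def top_k_words(scores, vocab, k):
    # Rank vocabulary ids from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    words = []
    for i in ranked:
        if is_word(vocab[i]):
            words.append(vocab[i])
            if len(words) == k:
                break  # stop as soon as k real words are collected
    return words

# Toy data (assumed for illustration only).
toy_vocab = ["from", "##s", ".", "is", "to", ",", "data"]
toy_scores = [5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0]

print(top_k_words(toy_scores, toy_vocab, 3))
```

With the model in the cell above, a comparable effect could be approximated by requesting a larger `topk` and truncating the filtered list to the desired length.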
