# model_name = "bert-base-uncased"

**1. Model Class (BertForMaskedLM):**

- This refers to the specific type of BERT model architecture you are using for a particular task.


- **BertForMaskedLM** is a class designed for masked language modeling, where the model tries to predict masked words in a sentence.


- Classes such as BertForMaskedLM or BertForSequenceClassification wrap the core BERT architecture and add a task-specific layer (a masked language modeling head or a classification head), so that BERT can be fine-tuned for different purposes.

- Think of these model classes as different versions or extensions of BERT tailored for specific NLP tasks.
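The "shared body plus task-specific head" idea can be sketched with a toy example. All class names below are hypothetical and the arithmetic is arbitrary; this is not how the transformers library implements BERT, only an illustration of how two heads can consume the same encoder output.

```python
# Toy illustration only: hypothetical classes, NOT the transformers implementation.

class ToyEncoder:
    """Stands in for the shared BERT body: one hidden vector per token."""

    def encode(self, token_ids):
        # Arbitrary 2-dimensional "hidden state" derived from each token id.
        return [[float(t % 7), float(t % 3)] for t in token_ids]


class ToyMaskedLMHead:
    """Maps every hidden vector to one score per vocabulary entry."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def __call__(self, hidden_states):
        return [
            [h[0] - abs(v - h[1]) for v in range(self.vocab_size)]
            for h in hidden_states
        ]


class ToyClassificationHead:
    """Maps only the first ([CLS]) hidden vector to one score per label."""

    def __init__(self, num_labels):
        self.num_labels = num_labels

    def __call__(self, hidden_states):
        first = hidden_states[0]
        return [first[0] * (k + 1) + first[1] for k in range(self.num_labels)]


encoder = ToyEncoder()
hidden = encoder.encode([101, 1045, 103, 102])  # 4 tokens

mlm_scores = ToyMaskedLMHead(vocab_size=10)(hidden)
cls_scores = ToyClassificationHead(num_labels=2)(hidden)

print(len(mlm_scores), len(mlm_scores[0]))  # one row of vocabulary scores per token
print(len(cls_scores))                      # one score per label for the whole input
```

The masked language modeling head produces an output for every token position, while the classification head produces a single output for the whole sequence; the encoder is the same in both cases.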

**2. Model Name ("bert-base-uncased"):**

- The model name identifies the pre-trained weights and configuration you load into the model.

- "bert-base-uncased" is one of the many pre-trained models available from Hugging Face.

- It is the base version of BERT with 12 Transformer layers, and it is "uncased": input text is lowercased before tokenization, so "Apple" and "apple" map to the same tokens.

- Many other pre-trained model names are available on the Hugging Face Hub, such as "bert-large-uncased", "roberta-base", and "gpt2".

- The model name specifies which pre-trained weights and tokenizer to load. With model_name = "bert-base-uncased", you download a model and tokenizer pre-trained on a large text corpus (English Wikipedia and BookCorpus).

**Model Class (BertForMaskedLM) defines the task-specific architecture.**

**Model Name ("bert-base-uncased") defines which specific set of pre-trained weights and vocabulary you are loading into that architecture.**
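As a rough sketch of what "uncased" means in practice, the function below lowercases text and strips accents using only the standard library. The name `uncased_normalize` is hypothetical; the real tokenizer additionally splits punctuation and applies WordPiece, which this sketch does not attempt.

```python
import unicodedata


def uncased_normalize(text):
    """Lowercase the text and drop accent marks (an approximation of uncased preprocessing)."""
    text = text.lower()
    # NFD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" covers the combining marks, which are dropped here.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("Apple") == uncased_normalize("apple"))  # True
print(uncased_normalize("Café"))                                 # cafe
```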

In [1]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [2]:
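sentence = "I want to go [MASK]."
tokenized_input = tokenizer.encode_plus(sentence, return_tensors="pt", add_special_tokens=True)
tokenized_input
# Special token IDs: 101 = [CLS], 103 = [MASK], 102 = [SEP]
# Tokens:         [CLS], i, want, to, go, [MASK], ., [SEP]
# attention_mask: 1, 1, 1, 1, 1, 1, 1, 1  (every position is a real token, none is padding)
# These tensors are what the model receives as input.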

{'input_ids': tensor([[ 101, 1045, 2215, 2000, 2175,  103, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
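To make the three fields of this output concrete, here is a hand-rolled sketch that rebuilds them for the same sentence. The `vocab` dictionary holds only the eight token/ID pairs that appear in this notebook's outputs (the real vocabulary has 30522 entries), and `toy_encode` is a hypothetical helper, not part of the tokenizer API.

```python
# Token/ID pairs taken from this notebook's outputs; everything else is omitted.
vocab = {
    "[CLS]": 101, "[SEP]": 102, "[MASK]": 103,
    "i": 1045, "want": 2215, "to": 2000, "go": 2175, ".": 1012,
}


def toy_encode(tokens):
    """Wrap tokens in [CLS] ... [SEP] and build the three model inputs."""
    wrapped = ["[CLS]"] + tokens + ["[SEP]"]
    input_ids = [vocab[t] for t in wrapped]
    return {
        "input_ids": input_ids,
        # Single-sentence input: every token belongs to segment 0.
        "token_type_ids": [0] * len(input_ids),
        # No padding: every position should be attended to.
        "attention_mask": [1] * len(input_ids),
    }


encoded = toy_encode(["i", "want", "to", "go", "[MASK]", "."])
print(encoded["input_ids"])  # [101, 1045, 2215, 2000, 2175, 103, 1012, 102]
```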

In [None]:
sentence="I want to go [MASK]."
tokenized_input1 = tokenizer.encode_plus(sentence, return_tensors="tf", add_special_tokens=True)
tokenized_input1

{'input_ids': <tf.Tensor: shape=(1, 8), dtype=int32, numpy=array([[ 101, 1045, 2215, 2000, 2175,  103, 1012,  102]], dtype=int32)>, 'token_type_ids': <tf.Tensor: shape=(1, 8), dtype=int32, numpy=array([[0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 8), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}

In [None]:
sentence="I want to go [MASK]."
tokenized_input1 = tokenizer.encode_plus(sentence, return_tensors="np", add_special_tokens=True)
tokenized_input1

{'input_ids': array([[ 101, 1045, 2215, 2000, 2175,  103, 1012,  102]]), 'token_type_ids': array([[0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1]])}

In [3]:
tokenizer.convert_ids_to_tokens(tokenized_input['input_ids'][0])

['[CLS]', 'i', 'want', 'to', 'go', '[MASK]', '.', '[SEP]']

In [4]:
# Get the position of the masked token
masked_index = torch.where(tokenized_input["input_ids"] == tokenizer.mask_token_id)[1]
masked_index

tensor([5])
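On a 2-D tensor, `torch.where(condition)` returns one index tensor per dimension (rows, then columns), which is why the cell above keeps element `[1]`: the column index is the token position. A plain-Python sketch of the same search, using the IDs from this notebook:

```python
MASK_ID = 103  # ID of [MASK] in this vocabulary, as shown in the earlier output

# A batch with one sequence, mirroring the shape (1, 8) of input_ids.
batch = [[101, 1045, 2215, 2000, 2175, 103, 1012, 102]]

rows, cols = [], []
for row_index, sequence in enumerate(batch):
    for col_index, token_id in enumerate(sequence):
        if token_id == MASK_ID:
            rows.append(row_index)  # which sequence in the batch
            cols.append(col_index)  # which position inside that sequence

print(rows, cols)  # [0] [5]
```

A sentence with several `[MASK]` tokens would yield several entries in `cols`, one per masked position.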

In [5]:
# Forward pass: returns one logit (unnormalized score) per vocabulary token at every position
with torch.no_grad():
    outputs = model(**tokenized_input)

outputs.logits.shape

torch.Size([1, 8, 30522])

In [6]:
outputs.logits

tensor([[[ -6.6255,  -6.5849,  -6.6058,  ...,  -5.9951,  -5.7598,  -4.0771],
         [-12.7545, -12.4131, -12.5089,  ...,  -9.7995,  -9.0677, -12.1217],
         [-12.4856, -12.3067, -12.7911,  ...,  -9.0879,  -9.8517, -11.9846],
         ...,
         [ -8.1555,  -8.1128,  -8.2111,  ...,  -7.1802,  -8.1095,  -8.0451],
         [-12.2099, -11.7445, -12.2628,  ...,  -9.1541, -10.4449,  -8.7678],
         [ -9.4640,  -9.3422,  -9.4012,  ...,  -8.6115,  -8.2864,  -6.3536]]])

"I want to go [MASK]."

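The model receives this sentence as 8 tokens:

[CLS], i, want, to, go, [MASK], ., [SEP]

The vocabulary holds 30522 tokens, so the model returns 30522 logits for each of the 8 positions:

- At an unmasked position such as "i" or "want", the highest-scoring token usually matches the input token. This is not guaranteed, particularly at special-token positions such as [CLS].
- At the [MASK] position, the highest-scoring token is the model's prediction for the missing word, here "home".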
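The values in `outputs.logits` are unnormalized scores, not probabilities. A softmax turns them into probabilities that sum to 1 without changing which entry is largest. The sketch below is a minimal pure-Python softmax; `toy_logits` reuses the six values visible in the truncated printout of the `[MASK]` row above and treats them as a six-token vocabulary purely for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Six visible values from the truncated [MASK] row; NOT the full 30522-entry vector.
toy_logits = [-8.1555, -8.1128, -8.2111, -7.1802, -8.1095, -8.0451]
probs = softmax(toy_logits)

print(round(sum(probs), 6))                                         # 1.0
print(probs.index(max(probs)) == toy_logits.index(max(toy_logits)))  # True
```

Because softmax preserves the ordering, taking the argmax or top-k directly on the logits gives the same tokens as taking it on the probabilities.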

In [7]:
len(outputs.logits[0])

8

In [15]:
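import numpy as np

# Logits for position 0 ([CLS]): one score per vocabulary token
outputs.logits[0][0]
# Index of the highest-scoring vocabulary token at that position
np.argmax(np.array(outputs.logits[0][0]))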

1012

In [16]:
tokenized_input

{'input_ids': tensor([[ 101, 1045, 2215, 2000, 2175,  103, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

In [None]:
outputs.logits[0,5]

tensor([-8.1555, -8.1128, -8.2111,  ..., -7.1802, -8.1095, -8.0451])

In [17]:
# Get the logits for the masked token
predictions = outputs.logits[0, masked_index, :]
predictions

tensor([[-8.1555, -8.1128, -8.2111,  ..., -7.1802, -8.1095, -8.0451]])

In [18]:
import numpy as np
np.argmax(np.array(predictions[0]))

2188

In [19]:
# Get the top predictions
top_indices = torch.topk(predictions, 3, dim=1).indices[0].tolist()
top_indices

[2188, 2067, 2185]

In [20]:
tokenizer.convert_ids_to_tokens([2188,2067,2185])

['home', 'back', 'away']
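The top-k step can be sketched without torch. Below, `top_k` is a hypothetical helper built on `heapq.nlargest`; the scores and the five-entry `id_to_token` table are made up for illustration, with only the tokens "home", "back", and "away" taken from the output above.

```python
import heapq


def top_k(scores, k):
    """Return the k (index, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Hypothetical five-token vocabulary and scores (not real model output).
id_to_token = {0: "away", 1: "home", 2: "there", 3: "back", 4: "out"}
toy_scores = [7.9, 9.1, 6.5, 8.4, 7.2]

best = top_k(toy_scores, 3)
print([id_to_token[index] for index, _ in best])  # ['home', 'back', 'away']
```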

In [None]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

# Define a function to predict masked words
def predict_masked_words(sentence):
    # Tokenize the input sentence
    tokenized_input = tokenizer.encode_plus(sentence, return_tensors="pt", add_special_tokens=True)

    # Get the position of the masked token
    masked_index = torch.where(tokenized_input["input_ids"] == tokenizer.mask_token_id)[1]

    # Predict the masked token
    with torch.no_grad():
        outputs = model(**tokenized_input)

    # Get the logits for the masked token
    predictions = outputs.logits[0, masked_index, :]

    # Get the top predictions
    top_indices = torch.topk(predictions, 1, dim=1).indices[0].tolist()

    # Convert token IDs to actual words
    predicted_tokens = [tokenizer.decode([index]) for index in top_indices]

    return predicted_tokens

# Example sentence with a masked word
input_sentence = "I want to go [MASK]."

# Predict masked words
predicted_words = predict_masked_words(input_sentence)

print("Predicted words:", predicted_words)




Predicted words: ['home']


**Using TensorFlow**

In [None]:
from transformers import BertTokenizer, TFBertForMaskedLM
import tensorflow as tf

# Load pre-trained BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForMaskedLM.from_pretrained(model_name)

# Define a function to predict masked words
def predict_masked_words(sentence):
    # Tokenize the input sentence
    tokenized_input = tokenizer.encode_plus(sentence, return_tensors="tf", add_special_tokens=True)

    # Get the position of the masked token
    masked_index = tf.where(tokenized_input["input_ids"] == tokenizer.mask_token_id)

    # Predict the masked token
    outputs = model(tokenized_input)

    # Get the logits for the masked token
    predictions = outputs.logits[0, masked_index[0, 1], :]

    # Get the top predictions
    top_indices = tf.math.top_k(predictions, k=1).indices.numpy()

    # Convert token IDs to actual words
    predicted_tokens = [tokenizer.decode([index]) for index in top_indices]

    return predicted_tokens

# Example sentence with a masked word
input_sentence = "I want to go [MASK]."

# Predict masked words
predicted_words = predict_masked_words(input_sentence)

print("Predicted words:", predicted_words)


All PyTorch model weights were used when initializing TFBertForMaskedLM.

All the weights of TFBertForMaskedLM were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.


Predicted words: ['home']


# model_name = "bert-large-uncased"


In [None]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT model and tokenizer
model_name = "bert-large-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

# Define a function to predict masked words
def predict_masked_words(sentence):
    # Tokenize the input sentence
    tokenized_input = tokenizer.encode_plus(sentence, return_tensors="pt", add_special_tokens=True)

    # Get the position of the masked token
    masked_index = torch.where(tokenized_input["input_ids"] == tokenizer.mask_token_id)[1]

    # Predict the masked token
    with torch.no_grad():
        outputs = model(**tokenized_input)

    # Get the logits for the masked token
    predictions = outputs.logits[0, masked_index, :]

    # Get the top 5 predictions
    top_5_indices = torch.topk(predictions, 5, dim=1).indices[0].tolist()

    # Convert token IDs to actual words
    predicted_tokens = [tokenizer.decode([index]) for index in top_5_indices]

    return predicted_tokens

# Example sentence with a masked word
input_sentence = "Are you going to [MASK]."
# Predict masked words
predicted_words = predict_masked_words(input_sentence)

print("Predicted words:", predicted_words)


Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Predicted words: ['?', 'die', 'sleep', '...', 'stay']
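To give a sense of how much larger "bert-large-uncased" is than "bert-base-uncased", the sketch below estimates the parameter count of the BERT encoder from its published dimensions (hidden size 768 vs. 1024, 12 vs. 24 layers, feed-forward size 3072 vs. 4096). The function `bert_encoder_params` is a hypothetical helper; it counts embeddings, Transformer layers, and the pooler, and leaves out the masked language modeling head.

```python
def bert_encoder_params(hidden, layers, intermediate,
                        vocab=30522, max_positions=512, type_vocab=2):
    """Rough parameter count for a BERT encoder (embeddings + layers + pooler)."""
    layer_norm = 2 * hidden  # weight and bias
    # Word, position, and segment embeddings followed by one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate -> hidden, each with a bias.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Each layer has one LayerNorm after attention and one after the feed-forward block.
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


base = bert_encoder_params(hidden=768, layers=12, intermediate=3072)
large = bert_encoder_params(hidden=1024, layers=24, intermediate=4096)

print(f"base:  {base / 1e6:.1f}M parameters")
print(f"large: {large / 1e6:.1f}M parameters")
print(f"ratio: {large / base:.2f}x")
```

The larger model has roughly three times as many encoder parameters, which is why it takes longer to download and run.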


# model_name = "bert-base-multilingual-uncased"

In [None]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load pre-trained BERT model and tokenizer
model_name = "bert-base-multilingual-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

# Define a function to predict masked words
def predict_masked_words(sentence):
    # Tokenize the input sentence
    tokenized_input = tokenizer.encode_plus(sentence, return_tensors="pt", add_special_tokens=True)

    # Get the position of the masked token
    masked_index = torch.where(tokenized_input["input_ids"] == tokenizer.mask_token_id)[1]

    # Predict the masked token
    with torch.no_grad():
        outputs = model(**tokenized_input)

    # Get the logits for the masked token
    predictions = outputs.logits[0, masked_index, :]

    # Get the top 5 predictions
    top_5_indices = torch.topk(predictions, 5, dim=1).indices[0].tolist()

    # Convert token IDs to actual words
    predicted_tokens = [tokenizer.decode([index]) for index in top_5_indices]

    return predicted_tokens

# Example sentence with a masked word
input_sentence = "What is your favorite [MASK]?"
# Predict masked words
predicted_words = predict_masked_words(input_sentence)

print("Predicted words:", predicted_words)


Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Predicted words: ['game', 'food', 'thing', 'song', 'movie']
