Description
I use the same sentence as in your Usage section:
from pytorch_pretrained_bert import BertTokenizer

# Load the pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 6
tokenized_text[masked_index] = '[MASK]'
Q1.
When we use this sentence as training data, then according to your code
if masked_lm_labels is not None:
    loss_fct = CrossEntropyLoss(ignore_index=-1)
    masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
    return masked_lm_loss
it seems the loss is a sum over all words in this sentence, not just the single masked word "henson". Am I right? In my opinion, we only need to compute the loss for the masked word, not for the whole sentence.
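For example, this is roughly what I would have expected (just a sketch of my understanding, reusing `tokenizer`, `tokenized_text` and `masked_index` from above, and assuming a `BertForMaskedLM` loaded from `bert-base-uncased`):

import torch
from pytorch_pretrained_bert import BertForMaskedLM

# Load the pre-trained masked-LM model
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# Convert the masked sentence to ids
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])

# Labels are -1 everywhere except at the masked position, so that
# CrossEntropyLoss(ignore_index=-1) would only score the word "henson"
masked_lm_labels = torch.full_like(tokens_tensor, -1)
masked_lm_labels[0, masked_index] = tokenizer.convert_tokens_to_ids(['henson'])[0]

loss = model(tokens_tensor, masked_lm_labels=masked_lm_labels)

Is this the intended way to build `masked_lm_labels`, or should the labels cover every token in the sentence?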
Q2.
It's also a question about masking: the paper says it "chooses 15% of tokens at random", and I don't know how to understand that. Does each word have a 15% probability of being masked, or is exactly 15% of the sentence masked?
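To make the two readings concrete, here is a rough sketch of the two interpretations I have in mind (plain Python, not the actual pre-processing code):

import random

def mask_per_token(tokens, p=0.15):
    # Interpretation 1: each token is masked independently with probability p,
    # so the number of masked tokens varies from sentence to sentence
    return [i for i, _ in enumerate(tokens) if random.random() < p]

def mask_fixed_fraction(tokens, p=0.15):
    # Interpretation 2: exactly 15% of the positions are chosen per sentence
    num_to_mask = max(1, int(round(len(tokens) * p)))
    return random.sample(range(len(tokens)), num_to_mask)

Which of these does the paper mean?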
Hope you can help me clear these up.
By the way, the comment at line 731 of pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py should read `if masked_lm_labels is not None`; the word "not" is missing.