Description
I use the same sentence as in your Usage section:
from pytorch_pretrained_bert import BertTokenizer

# Load the pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 6
tokenized_text[masked_index] = '[MASK]'
Q1.
When we use this sentence as training data, then according to your code
if masked_lm_labels is not None:
    loss_fct = CrossEntropyLoss(ignore_index=-1)
    masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
    return masked_lm_loss
it seems the loss is a sum over all words in this sentence, not just the single masked word "henson". Am I right? In my opinion, we only need to compute the loss for the masked word, not for the whole sentence.
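For example, this is roughly what I would have expected (just a sketch of my understanding, reusing `tokenizer`, `tokenized_text` and `masked_index` from above, and assuming a `BertForMaskedLM` loaded from `bert-base-uncased`):

import torch
from pytorch_pretrained_bert import BertForMaskedLM

# Load the pre-trained masked-LM model
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# Convert the masked sentence to ids
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])

# Labels are -1 everywhere except at the masked position, so that
# CrossEntropyLoss(ignore_index=-1) would only score the word "henson"
masked_lm_labels = torch.full_like(tokens_tensor, -1)
masked_lm_labels[0, masked_index] = tokenizer.convert_tokens_to_ids(['henson'])[0]

loss = model(tokens_tensor, masked_lm_labels=masked_lm_labels)

Is this the intended way to build `masked_lm_labels`, or should the labels cover every token in the sentence?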
Q2.
It's also a question about masking: the paper says it "chooses 15% of tokens at random", and I don't know how to understand that. Does each word have a 15% probability of being masked, or is exactly 15% of the sentence masked?
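To make the two readings concrete, here is a rough sketch of the two interpretations I have in mind (plain Python, not the actual pre-processing code):

import random

def mask_per_token(tokens, p=0.15):
    # Interpretation 1: each token is masked independently with probability p,
    # so the number of masked tokens varies from sentence to sentence
    return [i for i, _ in enumerate(tokens) if random.random() < p]

def mask_fixed_fraction(tokens, p=0.15):
    # Interpretation 2: exactly 15% of the positions are chosen per sentence
    num_to_mask = max(1, int(round(len(tokens) * p)))
    return random.sample(range(len(tokens)), num_to_mask)

Which of these does the paper mean?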
Hope you can help me clear these up.
By the way, the comment at line 731 of pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py should read `if masked_lm_labels is not None`; the word "not" is missing.