
Some questions about the loss function for MaskedLM #144

@wlhgtc

Description


Using the same sentence as in your Usage section:

from pytorch_pretrained_bert import BertTokenizer

# Load the tokenizer ('bert-base-uncased', as in the Usage section)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 6
tokenized_text[masked_index] = '[MASK]'
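
(For context, this is roughly how I continue from that snippet to predict the masked token back; `bert-base-uncased` is just my choice of weights.)

import torch
from pytorch_pretrained_bert import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

# Convert tokens to vocabulary indices and wrap them in a batch of size 1
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])

# Without masked_lm_labels the forward pass returns the prediction scores
predictions = model(tokens_tensor)
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token)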

Q1.
When we use this sentence as training data, then according to your code

if masked_lm_labels is not None:
    loss_fct = CrossEntropyLoss(ignore_index=-1)
    masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
    return masked_lm_loss

it seems the loss is computed over all the words in this sentence, not just the single masked word "henson". Am I right? In my opinion, we only need to calculate the loss for the masked word, not for the whole sentence.
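
To make what I mean concrete, here is my own sketch (not code from this repo) of the two ways masked_lm_labels could be built for the example sentence:

import torch

# My own sketch: two possible ways to build masked_lm_labels for the sentence above
original_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))  # ids before masking
seq_len = len(original_ids)

# Reading (a): every position carries a label, so the loss covers the whole sentence
labels_whole_sentence = torch.tensor([original_ids])

# Reading (b): only the masked position carries a label, everything else is -1,
# so CrossEntropyLoss(ignore_index=-1) would only count the word "henson"
labels_masked_only = torch.full((1, seq_len), -1, dtype=torch.long)
labels_masked_only[0, masked_index] = original_ids[masked_index]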

Q2.
It's also a question about masking. The paper says it "chooses 15% of tokens at random", and I don't know how to understand that: does each word have a 15% probability of being masked, or is exactly 15% of the sentence masked?
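
To make the two readings concrete, here is a toy sketch of mine (nothing from the repo or the paper):

import random

tokens = tokenizer.tokenize(text)

# Interpretation 1: each token is masked independently with probability 0.15
masked_v1 = ['[MASK]' if random.random() < 0.15 else t for t in tokens]

# Interpretation 2: exactly 15% of the positions are picked and masked
num_to_mask = max(1, int(round(0.15 * len(tokens))))
positions = set(random.sample(range(len(tokens)), num_to_mask))
masked_v2 = ['[MASK]' if i in positions else t for i, t in enumerate(tokens)]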
Hope you can help me clear these up.

By the way, the comment at line 731 of pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py should read `if masked_lm_labels is not None`; it is missing the word "not".
