Loss mask for fine-tuning GPT2LMHeadModel model #7135

Closed
zhujl1991 opened this issue Sep 15, 2020 · 5 comments


zhujl1991 commented Sep 15, 2020

If we pad short sentences in the fine-tuning data when fine-tuning GPT2LMHeadModel, should we change the code here

loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
to exclude padding tokens from the loss?

@patrickvonplaten @thomwolf

@zhujl1991 (Author)

This has already been mentioned in #2001 (see the "Bug: Padded tokens are not excluded from the loss" section).
Any plan to fix this?

@patil-suraj (Contributor)

Hi, GPT-2 has no pad token, so you can either introduce a new pad token or set the eos token as the pad token:
tokenizer.pad_token_id = tokenizer.eos_token_id

and then set the pad tokens in the labels to -100, which is the default ignore_index for CrossEntropyLoss:
labels[labels == self.tokenizer.pad_token_id] = -100
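
A minimal sketch of this suggestion (the example strings are illustrative, not from the thread); note the caveat about eos tokens raised further down in this thread:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Pad a small batch and build labels with padding positions set to -100
batch = tokenizer(["Short sentence.", "A somewhat longer training sentence."],
                  padding=True, return_tensors="pt")
labels = batch["input_ids"].clone()
labels[labels == tokenizer.pad_token_id] = -100  # -100 is ignored by CrossEntropyLoss

outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=labels)
print(outputs.loss)  # loss computed without the padding positions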

@zhujl1991 (Author)

Thanks. I just became aware of the ignore_index parameter: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
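
For reference, a small sketch of what ignore_index does (shapes and values are illustrative): positions labeled -100 are dropped from both the sum and the mean.

import torch

loss_fct = torch.nn.CrossEntropyLoss()       # ignore_index=-100 by default
logits = torch.randn(4, 10)                  # 4 token positions, vocabulary of 10
labels = torch.tensor([3, 7, -100, -100])    # last two positions are "padding"

full = loss_fct(logits, labels)              # mean over the 2 non-ignored positions
manual = loss_fct(logits[:2], labels[:2])    # same value, ignored rows dropped by hand
print(torch.allclose(full, manual))          # True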


ttanida commented Jul 2, 2022

For fine-tuning the GPT-2 model, it's necessary to manually prepend the bos_token and append the eos_token to the input, as has been established in #3311.

Setting pad_token = eos_token and running labels[labels == pad_token_id] = -100 would therefore be a problem in my opinion, since we would not only ignore the padding tokens but also the eos_tokens at the end of sentences when computing the loss.

I solved the problem by first converting the attention_mask to boolean values and then inverting it. With labels[inv_bool_attention_mask] = -100, padding tokens are ignored but eos_tokens are not.
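
A minimal sketch of that masking step (the token ids are taken from the example below, truncated for brevity; variable names are illustrative):

import torch

input_ids = torch.tensor([[15496, 2159, 0, 50256, 50256, 50256]])  # "Hello World!" + eos + 2 pad tokens
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0]])                # the real eos is attended to, the pads are not

labels = input_ids.clone()
labels[~attention_mask.bool()] = -100  # ignore padding in the loss, keep the real eos
print(labels)  # tensor([[15496,  2159,     0, 50256,  -100,  -100]])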

@itsnamgyu

Just to save some folks the hassle:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

tokenizer.pad_token = tokenizer.eos_token

print("EOS", tokenizer.convert_tokens_to_ids(tokenizer.eos_token))
print("PAD", tokenizer.convert_tokens_to_ids(tokenizer.pad_token))

string = "Hello World!"
string += tokenizer.eos_token  # manually append eos since this is not done by GPT2Tokenizer
# string = tokenizer.bos_token + string  # optionally prepend bos (which is actually the same as eos for GPT2Tokenizer)

tokenized = tokenizer(string, padding="max_length", max_length=10, return_tensors="pt")
input_ids = tokenized["input_ids"]
attention_mask = tokenized["attention_mask"]

print("INPUT_IDS BEFORE")
print(input_ids)
print("ATTENTION_MASK")
print(attention_mask)

input_ids[~attention_mask.bool()] = -100  # disable loss for padding tokens (i.e., eos tokens meant for padding)

print("INPUT_IDS AFTER")
print(input_ids)

Result:

EOS 50256
PAD 50256
INPUT_IDS BEFORE
tensor([[15496,  2159,     0, 50256, 50256, 50256, 50256, 50256, 50256, 50256]])
ATTENTION_MASK
tensor([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])
INPUT_IDS AFTER
tensor([[15496,  2159,     0, 50256,  -100,  -100,  -100,  -100,  -100,  -100]])
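
A hedged follow-up sketch (not part of the original comment): in practice you would usually keep input_ids intact and apply the mask to a separate labels tensor, since -100 is not a valid input token id; the model then computes the padding-aware loss itself. Continuing from the code above:

# Re-tokenize so input_ids are unmodified, then mask a copy used as labels
tokenized = tokenizer(string, padding="max_length", max_length=10, return_tensors="pt")
input_ids = tokenized["input_ids"]
attention_mask = tokenized["attention_mask"]

labels = input_ids.clone()
labels[~attention_mask.bool()] = -100  # loss is skipped on padding positions only

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(outputs.loss)  # cross-entropy over the non-padding positions (shifted internally)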
