GPT2 past usage #5586

cronoik · 2020-07-07T21:51:27Z

Hello everyone,

I tried to answer this Stack Overflow question and stumbled upon a strange behaviour I can't explain.

The following code will calculate the loss for a sentence with different single words injected:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    loss = model(tensor_input, labels=tensor_input)
    return -loss[0].item()

candidates = ["watch", "run", "think", "apple", "light"]
sent_template = "I like sitting in my new chair and {} about life"
print({candidate: score(sent_template.format(candidate)) for candidate in candidates})

Output:

{'watch': -5.406847953796387, 'run': -5.533411502838135, 'think': -4.525279521942139, 'apple': -6.158637046813965, 'light': -5.835141658782959}

Next, I wanted to use the past parameter as described in the documentation, and I expected the same result:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

past = "I like sitting in my new chair and"
past_tokenize_input = tokenizer.tokenize(past)
past_tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(past_tokenize_input)])

_, _, past = model(past_tensor_input, labels=past_tensor_input)

def score(sentence, past):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])

    loss = model(tensor_input, labels=tensor_input, past=past)
    return -loss[0].item()


candidates = ["watch", "run", "think", "apple", "light"]
sent_template = " {} about life"
print({candidate: score(sent_template.format(candidate), past) for candidate in candidates})

but the loss is different:

{'watch': -7.811002731323242, 'run': -6.370519638061523, 'think': -3.460831642150879, 'apple': -9.08120346069336, 'light': -8.28120231628418}

Is this the intended behaviour or am I doing something wrong?

patrickvonplaten · 2020-07-08T09:51:39Z

Hey @cronoik,

Thanks for your issue. It does not really surprise me that the loss is different.
In the first case the following loss is calculated:

loss = CrossEntropy(input_ids: "I like sitting in my new chair and {} about" vs. labels: "like sitting in my new chair and {} about life").

whereas in the second case the following loss is calculated:

loss = CrossEntropy(input_ids: "{} about" vs. labels: "about life").

This is simplified: in reality, the loss is calculated between the tokens of those words.
The important points to note here are: 1) past should not be used for training. It should be used to speed up inference.
2) When using past, only the output embeddings of the input_ids (in your case for "{} about life") are calculated, not those of the "cached" past input_ids.
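
To make the difference concrete, here is a minimal sketch of the averaging involved. It is not code from the library: the helper name mean_loss_over_last_targets is hypothetical, and the random logits are stand-ins for real model outputs. It only illustrates that a mean over all next-token predictions generally differs from a mean over the last few.

```python
import torch
import torch.nn.functional as F


def mean_loss_over_last_targets(logits, input_ids, n_targets):
    """Mean next-token cross-entropy over the last `n_targets` predicted tokens.

    logits:    (1, seq_len, vocab) scores for a full sequence
    input_ids: (1, seq_len) token ids of the same sequence
    """
    # Position i predicts token i + 1, so drop the last logit and the first label.
    shift_logits = logits[0, :-1, :]
    shift_labels = input_ids[0, 1:]
    per_token = F.cross_entropy(shift_logits, shift_labels, reduction="none")
    return per_token[-n_targets:].mean()


# Toy data: random logits and ids (assumed values, not GPT-2 outputs).
torch.manual_seed(0)
vocab, seq_len = 11, 8
logits = torch.randn(1, seq_len, vocab)
input_ids = torch.randint(0, vocab, (1, seq_len))

full = mean_loss_over_last_targets(logits, input_ids, seq_len - 1)  # all targets
tail = mean_loss_over_last_targets(logits, input_ids, 2)            # last two only
print(full.item(), tail.item())
```

Assuming the cached and uncached runs tokenize the text identically, passing the logits of the full sentence with n_targets set to the number of continuation tokens minus one should give roughly the loss you see when using past, since the candidate word itself is never a label in that case.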

Hope this answers your question

cronoik · 2020-07-19T13:08:43Z

@patrickvonplaten Thank you a lot for your answer.

chaeyoon-jang · 2022-05-12T14:56:15Z

Hello! I have a question about the GPT-2 LM head model's 'past_key_values' input.
I want to use this option with model.generate, but I get a dimension error when I pass it by specifying **model_specific_kwargs={'past':past} in model.generate's inputs.
What should I do to use this option for generation?

patrickvonplaten · 2022-05-13T09:33:07Z

Hey @chaeyoon-jang,

Could you open a new issue for this?

patrickvonplaten self-assigned this Jul 8, 2020

patrickvonplaten closed this as completed Jul 8, 2020
