
GPT2 past usage #5586

Closed · cronoik opened this issue Jul 7, 2020 · 4 comments
cronoik (Contributor) commented Jul 7, 2020

Hello everyone,

I tried to answer this Stack Overflow question and stumbled upon a strange behaviour that I can't explain.

The following code will calculate the loss for a sentence with different single words injected:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def score(sentence):
    # tokenize the full sentence and score it in a single forward pass
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    # with labels set, the model returns (loss, logits, past)
    outputs = model(tensor_input, labels=tensor_input)
    return -outputs[0].item()

candidates = ["watch", "run", "think", "apple", "light"]
sent_template = "I like sitting in my new chair and {} about life"
print({candidate: score(sent_template.format(candidate)) for candidate in candidates})

Output:

{'watch': -5.406847953796387, 'run': -5.533411502838135, 'think': -4.525279521942139, 'apple': -6.158637046813965, 'light': -5.835141658782959}

Now I wanted to use the past parameter according to the documentation and expected the same result:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

past_text = "I like sitting in my new chair and"
past_tokenize_input = tokenizer.tokenize(past_text)
past_tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(past_tokenize_input)])

# with labels set, the model returns (loss, logits, past)
_, _, past = model(past_tensor_input, labels=past_tensor_input)

def score(sentence, past):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    # reuse the cached key/value states of the prefix
    outputs = model(tensor_input, labels=tensor_input, past=past)
    return -outputs[0].item()


candidates = ["watch", "run", "think", "apple", "light"]
sent_template = " {} about life"
print({candidate: score(sent_template.format(candidate), past) for candidate in candidates})

but the loss is different:

{'watch': -7.811002731323242, 'run': -6.370519638061523, 'think': -3.460831642150879, 'apple': -9.08120346069336, 'light': -8.28120231628418}

Is this the intended behaviour or am I doing something wrong?

patrickvonplaten self-assigned this Jul 8, 2020
patrickvonplaten (Contributor) commented
Hey @cronoik,

Thanks for your issue. It does not really surprise me that the loss is different.
In the first case the following loss is calculated:

loss = CrossEntropy(input_ids: "I like sitting in my new chair and {} about" vs. labels: "like sitting in my new chair and {} about life").

whereas in the second case the following loss is calculated:

loss = CrossEntropy(input_ids: "{} about" vs. labels: "about life").

This is simplified; in reality the loss is calculated between the tokens of those words. The important things to note here are:

1) past should not be used for training; it should be used to speed up inference.
2) When using past, output embeddings are calculated only for the new input_ids (in your case for "{} about life"), not also for the "cached" past input_ids.
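
To make point 2) concrete, here is a minimal sketch of how one could score just the continuation while still reusing past (assuming the tuple-returning API of transformers 3.x that the snippets above use; score_continuation and last_prefix_logit are illustrative names, not library API). The trick is to keep the last logit of the prefix pass, because it is the prediction for the first continuation token:

import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prefix = "I like sitting in my new chair and"
prefix_ids = torch.tensor([tokenizer.encode(prefix)])

with torch.no_grad():
    # without labels the model returns (logits, past)
    prefix_logits, past = model(prefix_ids)[:2]
# the last prefix logit is the prediction for the first continuation token
last_prefix_logit = prefix_logits[:, -1:, :]

def score_continuation(continuation):
    new_ids = torch.tensor([tokenizer.encode(continuation)])
    with torch.no_grad():
        logits = model(new_ids, past=past)[0]
    # prepend the saved prefix logit so every continuation token has a
    # prediction, then drop the final logit (it predicts a token we don't have)
    logits = torch.cat([last_prefix_logit, logits], dim=1)[:, :-1]
    loss = F.cross_entropy(logits.squeeze(0), new_ids.squeeze(0))
    return -loss.item()

candidates = ["watch", "run", "think", "apple", "light"]
print({c: score_continuation(" {} about life".format(c)) for c in candidates})

The per-token losses here agree with a full forward pass over the whole sentence at the same positions; the averaged score still differs from the first snippet, because that one also averages over the prefix tokens.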

Hope this answers your question

cronoik (Contributor, Author) commented Jul 19, 2020

@patrickvonplaten Thank you very much for your answer.

chaeyoon-jang commented

Hello! I have a question about the GPT-2 LMHead model's past_key_values input.
I want to use this option with model.generate, but passing it via **model_specific_kwargs={'past': past} raises a dimension error.
What should I do to use this option for generation?

patrickvonplaten (Contributor) commented

Hey @chaeyoon-jang,

Could you open a new issue for this?
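
For reference, generate manages this cache internally, so past is normally not passed in from outside. A minimal hand-rolled greedy decoding loop that uses past directly could look like this (a sketch against the same transformers 3.x tuple API as above):

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

generated = torch.tensor([tokenizer.encode("I like sitting in my new chair and")])
input_ids = generated
past = None

with torch.no_grad():
    for _ in range(10):
        # after the first step only the newest token is fed in;
        # the cached key/value states stand in for the rest of the prefix
        logits, past = model(input_ids, past=past)[:2]
        input_ids = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, input_ids], dim=1)

print(tokenizer.decode(generated[0]))

After the warm-up step, each iteration feeds only a single token, which is where the inference speed-up from past comes from.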
