
usage of past_key_values produces different output than the whole sequence at once #26344

Closed
2 of 4 tasks
IvanSedykh opened this issue Sep 22, 2023 · 5 comments

Comments

@IvanSedykh
Contributor

System Info

transformers 4.33.1

Who can help?

@ArthurZucker @younesbelkada @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When I use past_key_values, the model does not produce the same logits as it does when I feed the whole sequence in at once.

Please see the code snippet below for details.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, low_cpu_mem_usage=True, device_map="auto"
)


prompt = """
import json

fname = 'some_file.json'
with open(fname) as f:
    data = json."""

all_input_ids = tokenizer([prompt], return_tensors='pt').input_ids

# process the whole sequence
with torch.no_grad():
    all_outputs = model(all_input_ids)
# get logits for the last token
last_token_logits = all_outputs.logits[0][-1:]

with torch.no_grad():
    # process the sequence except the last token
    kv = model(all_input_ids[:, :-1]).past_key_values
    # input only the last token with previous kv_cache
    new_output = model(all_input_ids[:, -1:], past_key_values=kv)
# extract the last token logits
new_last_token_logits = new_output.logits[0][-1:]

# these two distributions should be equal, but they are not
print(torch.dist(last_token_logits, new_last_token_logits))
# tensor(0.4462)
assert torch.allclose(last_token_logits, new_last_token_logits)  # fails
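
As an extra, purely illustrative sanity check, the gap can also be quantified per element and the argmax tokens compared, to see how far the two distributions actually diverge (this is not part of the original repro):

# optional diagnostics on the logits computed above
diff = (last_token_logits - new_last_token_logits).abs()
print("max abs diff:", diff.max().item())
print("mean abs diff:", diff.mean().item())
print("same argmax token:",
      torch.equal(last_token_logits.argmax(-1), new_last_token_logits.argmax(-1)))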

Expected behavior

If I understand kv caching correctly, the outputs should be exactly the same. This is important because the generate method relies heavily on past_key_values, so a bug here would affect many applications.
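
For context on why this matters in practice, here is a minimal greedy-decoding sketch of the kind of loop that generate builds on, reusing model, tokenizer, and all_input_ids from the snippet above (the loop is illustrative only, not the actual generate implementation):

# greedy decoding that reuses past_key_values, feeding one new token per step
generated = all_input_ids
past = None
with torch.no_grad():
    for _ in range(20):
        if past is None:
            # first step: feed the full prompt
            out = model(generated)
        else:
            # later steps: feed only the newest token plus the cache
            out = model(generated[:, -1:], past_key_values=past)
        past = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token.to(generated.device)], dim=-1)

print(tokenizer.decode(generated[0]))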

@ArthurZucker
Collaborator

Hey! Thanks for opening an issue. This is pretty much a duplicate of #25420, where we take a deep dive into this!


@gante
Member

gante commented Oct 23, 2023

Hey @IvanSedykh 👋

As Arthur wrote, this is a duplicate of #25420 -- you can find a detailed answer here

@IvanSedykh
Contributor Author

Hi @gante !
Thank you for this investigation, it's much clearer now. 🤗


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
