Why are write operations not executed for newly generated tokens in kv-cache.py

<img width="854" height="136" alt="Image" src="https://github.com/user-attachments/assets/e055d607-9482-4ad5-9211-8c6f44a0c413" />

After reading the benchmark code above, I have two questions:

1. Why does the benchmark not perform write operations for newly generated tokens?
2. Why does the benchmark simulate reading the KV cache only once when generating multiple tokens in a batch?
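
To make the two questions concrete, here is a minimal sketch of the I/O accounting I have in mind. It is not taken from kv-cache.py: the function names, the `IOCount` type, and the `bytes_per_token` parameter are all hypothetical, and the second function only reflects my reading of the screenshot above. It assumes the cache lives entirely on storage and is re-read on every decode step, which may not match any real serving stack.

```python
from dataclasses import dataclass


@dataclass
class IOCount:
    """Bytes moved to and from storage (hypothetical accounting type)."""
    bytes_read: int = 0
    bytes_written: int = 0


def decode_io_per_token(prompt_tokens: int, new_tokens: int,
                        bytes_per_token: int) -> IOCount:
    """Pattern I expected: each decode step reads the whole cache,
    then appends the KV entry for the newly generated token."""
    io = IOCount()
    cached = prompt_tokens
    for _ in range(new_tokens):
        io.bytes_read += cached * bytes_per_token   # read everything cached so far
        io.bytes_written += bytes_per_token         # write the new token's KV entry
        cached += 1
    return io


def decode_io_single_read(prompt_tokens: int, new_tokens: int,
                          bytes_per_token: int) -> IOCount:
    """Pattern the benchmark appears to simulate (my assumption):
    one read of the prompt's cache, no writes for generated tokens."""
    return IOCount(bytes_read=prompt_tokens * bytes_per_token, bytes_written=0)


if __name__ == "__main__":
    print(decode_io_per_token(4, 3, 10))
    print(decode_io_single_read(4, 3, 10))
```

The gap between the two counts grows with the number of generated tokens, which is why I am asking whether the simpler pattern is intentional.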

Or am I overthinking it? Is it simply that no actual model can do this, and the implementation is kept as simple as possible in order to maximize storage pressure and performance testing?

Why are write operations not executed for newly generated tokens in kv-cache.py #351
