[Experimental][StarCode] KV Cache Injection #2080

Open
wants to merge 6 commits into
base: main


dbogunowicz
Contributor

@dbogunowicz dbogunowicz commented Feb 15, 2024

Feature Description

The results of my experimentation with the tiny_starcoder model.

Findings:

  • The original KV cache is not exposed as separate arrays (past_key_values.{attn_block_id}.values and past_key_values.{attn_block_id}.keys) but as a single joint array holding both keys and values. I did not get to look into splitting the two, but from analyzing the ONNX graph I see no reason why we could not do it.
  • The causal mask for this model has different dimensions than we usually assume. This could be patched by adding a node after the causal_mask input that applies the appropriate permutation to that input.
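
To make the two findings above concrete, here is a minimal NumPy sketch of both operations on toy arrays. It is an illustration under stated assumptions, not code from this branch. The helper names `split_joint_kv` and `permute_causal_mask` are hypothetical. The layout of the joint array (keys in the first half of the last axis, values in the second half) and the mask permutation `(0, 2, 1, 3)` are assumptions that would need to be checked against the actual ONNX graph.

```python
import numpy as np


def split_joint_kv(joint_kv: np.ndarray, head_dim: int):
    """Split a joint cache array into (keys, values).

    ASSUMPTION: keys occupy the first `head_dim` entries of the last axis
    and values the remaining `head_dim` entries. Verify against the graph.
    """
    if joint_kv.shape[-1] != 2 * head_dim:
        raise ValueError("last axis must have size 2 * head_dim")
    keys, values = np.split(joint_kv, 2, axis=-1)
    return keys, values


def permute_causal_mask(mask: np.ndarray, perm=(0, 2, 1, 3)) -> np.ndarray:
    """Reorder the mask axes.

    ASSUMPTION: `perm` is a placeholder permutation, not one read from the model.
    """
    return np.transpose(mask, perm)


# Toy shapes: batch=1, cached sequence length=3, head_dim=4.
joint = np.arange(1 * 3 * 8, dtype=np.float32).reshape(1, 3, 8)
keys, values = split_joint_kv(joint, head_dim=4)

# Lower-triangular mask in an assumed (batch, 1, query_len, key_len) layout.
mask = np.tril(np.ones((1, 1, 3, 3), dtype=np.int64))
permuted = permute_causal_mask(mask)
```

In the exported model the same two steps would be graph nodes rather than NumPy calls (for example a split on the cache and a transpose after the causal_mask input), which is what the findings propose.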

This is an experimental branch. I am pausing development on it for now due to other priorities, and plan to revisit it in the future.
