
Conversation

@kimishpatel (Contributor) commented on Feb 28, 2024

Stack from ghstack (oldest at bottom):

For KV cache with IO, this results in:

  1. allocating the KV cache in the memory plan while it is also allocated by the llama runner
  2. doing an actual copy of the KV cache

Also, we should really make plan_input = false by default. I don't imagine a case where
planning inputs does not result in making copies. Planning for output is fine but still
dangerous, as people may assume that holding a reference to the output tensor is all
good without realizing the underlying memory is shared.
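
Not part of this PR, but a minimal sketch of what disabling input planning at export time could look like, assuming `MemoryPlanningPass` exposes `alloc_graph_input` / `alloc_graph_output` flags; the toy module and shapes below are placeholders, not the llama model:

```python
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge
from executorch.exir.passes import MemoryPlanningPass

# Placeholder module standing in for a model that takes the KV cache as IO.
class KVCacheToy(torch.nn.Module):
    def forward(self, x, k_cache, v_cache):
        return x + k_cache.sum() + v_cache.sum()

example_inputs = (torch.randn(1, 8), torch.randn(1, 8), torch.randn(1, 8))
edge = to_edge(torch.export.export(KVCacheToy(), example_inputs))

program = edge.to_executorch(
    ExecutorchBackendConfig(
        memory_planning_pass=MemoryPlanningPass(
            # Skip planned memory for graph inputs: the runner supplies the KV
            # cache buffers directly, avoiding the double allocation and the copy.
            alloc_graph_input=False,
            # Outputs can still be planned, but callers must remember that the
            # returned tensors alias planned memory that gets reused.
            alloc_graph_output=True,
        )
    )
)
```

With inputs left unplanned, the runner can point the method's KV cache inputs at buffers it owns instead of copying them into planned memory on every step.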

Differential Revision: [D54161288](https://our.internmc.facebook.com/intern/diff/D54161288/)


pytorch-bot bot commented Feb 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2155

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 5682fe7 with merge base a78a07e:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Feb 28, 2024
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D54161288

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 76cbfb7.

Labels: CLA Signed, fb-exported, Merged
3 participants