[Executorch][Llama] Dont memory plan for inputs #2155

kimishpatel · 2024-02-28T04:22:29Z

Stack from ghstack (oldest at bottom):

With the KV cache passed as model IO, memory planning for inputs results in:

1. Allocating the KV cache in the memory plan even though it is also allocated by the llama runner
2. Doing an actual copy of the KV cache

Also, we should really make plan_input = false by default. I don't imagine a case
where this does not result in making copies. Planning for outputs is fine but
still dangerous, as people may assume that holding a reference to an output tensor
is safe without realizing that the underlying memory is shared.
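The two costs described above can be illustrated with a small, self-contained sketch. It does not use the ExecuTorch API: `ToyRuntime`, `set_input`, and `resident_bytes` are hypothetical names invented for illustration, and `plan_input` here is only a toy flag modeled on the setting named in this description. The real runtime may differ in detail.

```python
class ToyRuntime:
    """Hypothetical stand-in for a runtime that owns a memory-planned arena."""

    def __init__(self, input_nbytes, plan_input):
        self.plan_input = plan_input
        # If inputs are memory planned, the arena reserves space for them,
        # even when the caller already owns a buffer of the same size.
        self.arena = bytearray(input_nbytes) if plan_input else bytearray(0)
        self.bytes_copied = 0
        self.input_view = None

    def set_input(self, caller_buffer):
        if self.plan_input:
            # Planned input: the data must be copied into the arena.
            self.arena[:] = caller_buffer
            self.bytes_copied += len(caller_buffer)
            self.input_view = memoryview(self.arena)
        else:
            # Unplanned input: the runtime reads the caller's buffer directly.
            self.input_view = memoryview(caller_buffer)

    def resident_bytes(self, caller_buffer):
        # Memory held for this input: the caller's buffer plus any arena copy.
        return len(caller_buffer) + len(self.arena)


kv_cache = bytearray(1024)  # buffer owned by the caller (e.g. a runner)

planned = ToyRuntime(len(kv_cache), plan_input=True)
planned.set_input(kv_cache)

unplanned = ToyRuntime(len(kv_cache), plan_input=False)
unplanned.set_input(kv_cache)

print(planned.resident_bytes(kv_cache), planned.bytes_copied)      # 2048 1024
print(unplanned.resident_bytes(kv_cache), unplanned.bytes_copied)  # 1024 0
```

In this toy model, planning the input doubles the memory held for the cache and adds a full copy each time the input is set, while skipping planning leaves a single buffer that the caller and the runtime share.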

Differential Revision: D54161288

pytorch-bot · 2024-02-28T04:22:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2155

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 5682fe7 with merge base a78a07e:

NEW FAILURES - The following jobs have failed:

pull / test-llama-runner-linux (fp16, buck2) / linux-job (gh)
RuntimeError: Command docker exec -t d7e9cb9d4a26e4796265010675f8e08731135e3e1494c06ff0eea53c30b78f17 /exec failed with exit code 8
pull / test-llama-runner-linux (fp16, cmake) / linux-job (gh)
RuntimeError: Command docker exec -t c8b20b0e51238637051fdd6cf1ecebc88445ce14632aca19bf3dfa8f90376bfb /exec failed with exit code 8
pull / test-llama-runner-linux (fp32, buck2) / linux-job (gh)
RuntimeError: Command docker exec -t 97af21a2588ceb81ae05797827472ef9613bac5910422e722497b1bc6d1408bb /exec failed with exit code 8
pull / test-llama-runner-linux (fp32, cmake) / linux-job (gh)
RuntimeError: Command docker exec -t 2a4d2238c0ea858afd74653510247959b5ea36fad4d6889a62d5c4ee0ec0567b /exec failed with exit code 8

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-02-28T04:23:11Z

This pull request was exported from Phabricator. Differential Revision: D54161288

facebook-github-bot · 2024-02-29T17:55:12Z

This pull request has been merged in 76cbfb7.

facebook-github-bot added the CLA Signed label Feb 28, 2024

This was referenced Feb 28, 2024

[Executorch][llama]] Add python impl of sdpa_with_kv_cache #2153

Closed

[Executorch][llama] modify model to use sdpa_with_kv_cache op #2154

Closed

[Executorch][llama] use optimized cpu op lib #2156

Closed

facebook-github-bot added the fb-exported label Feb 28, 2024

mergennachin self-requested a review February 28, 2024 05:02

mergennachin approved these changes Feb 28, 2024

kimishpatel added 4 commits February 27, 2024 21:04

facebook-github-bot closed this in 76cbfb7 Feb 29, 2024

facebook-github-bot added the Merged label Feb 29, 2024
