Add warmup for Llama #5756
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5756
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure
As of commit 3668b26 with merge base 905b88c:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 0c5afba to 02efbeb
Force-pushed from 02efbeb to 3668b26
@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
bool warming = false);
::executorch::runtime::Error warmup(
    const std::string& prompt,
    int32_t seq_len = 128);
should we move seq_len out to a constant variable DEFAULT_SEQ_LEN that is shared by both generate and warmup?
So warmup is done with the same prompt/seq_len as a real run today. The default is just something I carried over from generate().
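For readers following along, here is a minimal, self-contained sketch of what the reviewer's suggestion might look like: one shared constant used as the default for both declarations. The `Error` alias stands in for `::executorch::runtime::Error`, and the class is trimmed to just the two methods shown in the diff; this is illustrative, not the actual ExecuTorch header.

```cpp
#include <cstdint>
#include <string>

// Illustrative stand-in for ::executorch::runtime::Error so this snippet
// compiles on its own.
using Error = int;

// The reviewer's suggestion: one shared default so generate() and warmup()
// cannot drift apart.
constexpr int32_t DEFAULT_SEQ_LEN = 128;

class Runner {
 public:
  Error generate(
      const std::string& prompt,
      int32_t seq_len = DEFAULT_SEQ_LEN,
      bool warming = false);

  Error warmup(
      const std::string& prompt,
      int32_t seq_len = DEFAULT_SEQ_LEN);
};
```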
ET_LOG(
    Info,
if (!warmup) {
  printf("\n");
safe_printf?
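For context on this nit, here is a minimal sketch of what a safe_printf-style guard typically does, modeled on llama2.c's helper of the same name; the exact name and behavior of the helper in this runner's utilities are assumptions here.

```cpp
#include <cctype>
#include <cstdio>

// Sketch only (modeled on llama2.c's safe_printf): skip null/empty pieces
// and lone bytes that are neither printable nor whitespace, so partial
// UTF-8 token pieces do not emit garbage to the terminal.
inline void safe_printf_sketch(const char* piece) {
  if (piece == nullptr || piece[0] == '\0') {
    return;
  }
  if (piece[1] == '\0') {
    unsigned char byte_val = static_cast<unsigned char>(piece[0]);
    if (!(std::isprint(byte_val) || std::isspace(byte_val))) {
      return;
    }
  }
  std::printf("%s", piece);
}
```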
if (warmup) {
  ET_LOG(Info, "Doing a warmup run...");
}
maybe add this before generate in Runner::warmup?
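A short sketch of that suggestion, written against the declarations shown earlier in this diff (so not self-contained; `ET_LOG`, `Error`, and the `Runner` names are taken from the PR, and the argument list passed to generate() is simplified).

```cpp
// Sketch of the reviewer's nit: emit the log from inside Runner::warmup,
// right before delegating to generate(), instead of branching on a warmup
// flag at the call site. Argument list simplified for illustration.
Error Runner::warmup(const std::string& prompt, int32_t seq_len) {
  ET_LOG(Info, "Doing a warmup run...");
  return generate(prompt, seq_len, /*warming=*/true);
}
```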
- RUNTIME_ARGS="--model_path=${EXPORTED_MODEL_NAME} --tokenizer_path=tokenizer.bin --prompt=Once --temperature=0 --seq_len=10"
+ RUNTIME_ARGS="--model_path=${EXPORTED_MODEL_NAME} --tokenizer_path=tokenizer.bin --prompt=Once --temperature=0 --seq_len=10 --warmup=1"
why warm up in CI?
Yeah, good question. This is so we don't break things with warmup. I debated this and was thinking about doing two runs and comparing outputs with and without warmup, but CI is expensive, so I just ran with warmup and compared the output after.
Take a look at my nits; otherwise it's fine.
@digantdesai merged this pull request in 660ef77.
}

Error Runner::warmup(const std::string& prompt, int32_t seq_len) {
  Error err = generate(
It should be prefill, right, not generate? Or does generate call prefill? And you are warming up not by prefill only, but by running the entire sequence generation?
Also, how do you enable this for llava?
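To make the prefill-versus-generate distinction concrete, here is a small, self-contained toy of the control flow only; none of this is the real ExecuTorch runner. A warmup that delegates to generate() exercises both the prompt prefill and the autoregressive decode loop, whereas a prefill-only warmup would stop after the first phase.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Toy model of the control flow, for illustration only.
class ToyRunner {
 public:
  void generate(const std::string& prompt, int32_t seq_len, bool warming) {
    // Phase 1: prefill - run the prompt tokens through the model
    // (in a real runner this fills the KV cache).
    for (char token : prompt) {
      (void)token;  // pretend to feed each prompt token to the model
    }
    // Phase 2: decode - generate tokens one at a time up to seq_len.
    for (int32_t i = 0; i < seq_len; ++i) {
      if (!warming) {
        std::printf(".");  // a real runner prints decoded token pieces here
      }
    }
    if (!warming) {
      std::printf("\n");
    }
  }

  void warmup(const std::string& prompt, int32_t seq_len) {
    std::printf("Doing a warmup run...\n");
    // Delegating to generate() warms up both phases, quietly.
    generate(prompt, seq_len, /*warming=*/true);
  }
};
```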
Load the model. Run everything twice. Reset stats between the two runs. Also decrease the logging level for the warmup run. (A rough sketch of this flow follows the notes below.)
Notes:
- Tested on Android and Mac, with Llama2 and Llama3; with temperature=0 it produces the same output.
- This warmup option is disabled by default.
- This is inspired by llama.cpp options [1, 2].
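A hedged sketch of the flow described above: load once, run twice, and reset the timing stats between the warmup run and the measured run. All names here (DemoRunner, DemoStats, reset) are placeholders for illustration, not the real runner API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Placeholder stats object; the real runner tracks timing, not a counter.
struct DemoStats {
  int64_t tokens_generated = 0;
  void reset() { tokens_generated = 0; }
};

struct DemoRunner {
  DemoStats stats;
  void load() { /* load model + tokenizer once */ }
  void generate(const std::string& prompt, int32_t seq_len, bool warming) {
    (void)prompt;
    for (int32_t i = 0; i < seq_len; ++i) {
      stats.tokens_generated++;
      if (!warming) std::printf(".");
    }
    if (!warming) std::printf("\n");
  }
  void warmup(const std::string& prompt, int32_t seq_len) {
    generate(prompt, seq_len, /*warming=*/true);
  }
};

int main() {
  DemoRunner runner;
  runner.load();
  runner.warmup("Once", 10);   // first run: warms caches/kernels, prints nothing
  runner.stats.reset();        // reset stats so warmup does not skew the numbers
  runner.generate("Once", 10, /*warming=*/false);  // second, measured run
  std::printf("tokens generated: %lld\n",
              static_cast<long long>(runner.stats.tokens_generated));
  return 0;
}
```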
Sample runs