
Conversation

@digantdesai (Contributor)

Load the model. Run everything twice. Reset stats in between the two runs. Also decrease the logging level for the warm-up run.

Notes:

Tested on Android and Mac, with Llama2 and Llama3; with temperature=0 both runs produce the same output.
This warm-up option is disabled by default.
This is inspired by llama.cpp options [1, 2].
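
A minimal sketch of the flow described above (load once, warm-up pass, reset stats, measured pass). Runner, warmup(), generate() and reset_stats() here are illustrative stand-ins, not the actual ExecuTorch llama runner API:

// Illustrative only: these names are stand-ins for the real runner API.
#include <cstdint>
#include <cstdio>
#include <string>

struct Runner {
  // Warm-up pass: same prompt and seq_len as the real run; timing stats
  // gathered during this pass are discarded afterwards.
  void warmup(const std::string& prompt, int32_t seq_len) {
    generate(prompt, seq_len, /*warming=*/true);
    reset_stats();
  }

  void generate(const std::string& prompt, int32_t seq_len, bool warming = false) {
    if (!warming) {
      std::printf("generating %d tokens for prompt: %s\n",
                  static_cast<int>(seq_len), prompt.c_str());
    }
    // ... prefill + decode loop would run here ...
  }

  void reset_stats() { /* zero out timing counters */ }
};

int main() {
  Runner runner;                            // the model is loaded once
  runner.warmup("Once", /*seq_len=*/10);    // first pass pays the one-time costs
  runner.generate("Once", /*seq_len=*/10);  // second pass is the one you measure
  return 0;
}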
Sample runs


pytorch-bot bot commented Sep 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5756

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 3668b26 with merge base 905b88c:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Sep 30, 2024. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@facebook-github-bot (Contributor)

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@digantdesai force-pushed the llama_warmup2 branch 2 times, most recently from 0c5afba to 02efbeb, on September 30, 2024 at 22:46
@facebook-github-bot (Contributor)

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

bool warming = false);
::executorch::runtime::Error warmup(
    const std::string& prompt,
    int32_t seq_len = 128);
Contributor:

should we move seq_len out to a constant variable DEFAULT_SEQ_LEN that is shared by both generate and warmup?

@digantdesai (Contributor, Author):

So warmup is done on the same prompt/seq_len as a real run today. The default is just something I followed from generate().
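
For reference, the refactor being suggested would look roughly like this. A sketch only, mirroring the declarations in the hunk above; the rest of the generate() parameter list is not visible there, so it is omitted here:

// Sketch of the suggestion: hoist the shared default into one constant so
// generate() and warmup() cannot drift apart. Assumes the surrounding runner
// header context (<string>, <cstdint>, the executorch Error type).
constexpr int32_t DEFAULT_SEQ_LEN = 128;

::executorch::runtime::Error generate(
    const std::string& prompt,
    int32_t seq_len = DEFAULT_SEQ_LEN,
    bool warming = false);

::executorch::runtime::Error warmup(
    const std::string& prompt,
    int32_t seq_len = DEFAULT_SEQ_LEN);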

ET_LOG(
Info,
if (!warmup) {
printf("\n");
@mcr229 (Contributor) commented Sep 30, 2024:

safe_printf?
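
For context on the nit: in llama2.c-style runners, safe_printf typically prints a token piece only if it is printable, so raw bytes do not garble the terminal. A minimal sketch of that behavior together with the warm-up guard from the hunk above; this is not the exact helper used in the repo:

// Minimal sketch of a llama2.c-style safe_printf plus the warm-up guard.
// The real helper in the repo may differ in name, location and details.
#include <cctype>
#include <cstdio>

void safe_printf(const char* piece) {
  if (piece == nullptr || piece[0] == '\0') {
    return;
  }
  if (piece[1] == '\0') {
    unsigned char byte = static_cast<unsigned char>(piece[0]);
    if (!std::isprint(byte) && !std::isspace(byte)) {
      return;  // skip single bytes that are neither printable nor whitespace
    }
  }
  std::printf("%s", piece);
}

int main() {
  bool warmup = false;  // would come from the runner's warming flag
  safe_printf("Once");
  if (!warmup) {
    std::printf("\n");  // trailing newline only for a real (non-warm-up) run
  }
  return 0;
}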

Comment on lines +173 to +175
if (warmup) {
  ET_LOG(Info, "Doing a warmup run...");
}
Contributor:

maybe add this before generate in Runner::warmup?
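
What this suggestion would look like, roughly. A sketch only; the generate() argument list is abbreviated and the stats reset is assumed from the PR description, so the merged code may differ:

// Sketch of the suggestion: log from warmup() itself instead of branching on
// the warming flag inside generate(). Assumes the surrounding runner.cpp
// context (Runner, Error, ET_LOG, generate).
Error Runner::warmup(const std::string& prompt, int32_t seq_len) {
  ET_LOG(Info, "Doing a warmup run...");
  Error err = generate(prompt, seq_len, /*warming=*/true);
  reset_stats();  // hypothetical helper: discard timings from the warm-up pass
  return err;
}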



RUNTIME_ARGS="--model_path=${EXPORTED_MODEL_NAME} --tokenizer_path=tokenizer.bin --prompt=Once --temperature=0 --seq_len=10"
RUNTIME_ARGS="--model_path=${EXPORTED_MODEL_NAME} --tokenizer_path=tokenizer.bin --prompt=Once --temperature=0 --seq_len=10 --warmup=1"
Contributor:

why warm up in CI?

@digantdesai (Contributor, Author) commented Oct 1, 2024:

Yeah, good question. This is so we don't break things with warmup. I debated this and thought about doing two runs and comparing outputs with and without warmup, but CI is expensive, so I just did the run with warmup and compared the output afterwards.
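
For readers wondering how the --warmup=1 in RUNTIME_ARGS reaches the runner: the flag style above suggests a gflags-based main, so the wiring would look roughly like this. A sketch under that assumption; flag defaults and the runner calls are illustrative:

// Sketch of wiring a --warmup flag through a gflags-based main. Flag names
// match the command line above; defaults and the runner calls are illustrative.
#include <gflags/gflags.h>
#include <string>

DEFINE_bool(warmup, false, "Do a warm-up generation before the measured run.");
DEFINE_int32(seq_len, 128, "Total number of tokens to generate.");
DEFINE_string(prompt, "Once", "Prompt to generate from.");

int main(int argc, char** argv) {
  gflags::ParseCommandLineFlags(&argc, &argv, true);
  // Runner construction is omitted here.
  // if (FLAGS_warmup) {
  //   runner.warmup(FLAGS_prompt, FLAGS_seq_len);
  // }
  // runner.generate(FLAGS_prompt, FLAGS_seq_len);
  return 0;
}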

@mcr229 (Contributor) left a comment:

take a look at my nits. otherwise it's fine

@facebook-github-bot (Contributor)

@digantdesai merged this pull request in 660ef77.

}

Error Runner::warmup(const std::string& prompt, int32_t seq_len) {
  Error err = generate(
Contributor:

It should be prefill, right? Not generate? Or does generate call prefill? And you are warming up not by prefill only, but by running the entire sequence generation?

Also, how do you enable this for llava?
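
On the prefill vs. generate question: in these text-generation runners, generate() typically runs a prefill pass over the prompt and then the token-by-token decode loop, so a warm-up that goes through generate() exercises the whole pipeline rather than prefill alone. A structural sketch of that shape, not the actual ExecuTorch implementation:

// Structural sketch only: generate() = prefill over the prompt, then a decode
// loop. Warming up through generate() therefore touches both stages.
#include <cstdint>
#include <vector>

struct Runner {
  int64_t prefill(const std::vector<int64_t>& prompt_tokens) {
    // Run the model once over the whole prompt; return the first new token.
    return prompt_tokens.empty() ? 0 : prompt_tokens.back();
  }

  int64_t decode_one(int64_t prev_token) {
    // Run the model for a single step; return the next token.
    return prev_token + 1;  // placeholder logic
  }

  void generate(const std::vector<int64_t>& prompt_tokens, int32_t seq_len) {
    int64_t token = prefill(prompt_tokens);  // stage 1: prefill
    for (int32_t i = 0; i < seq_len; ++i) {  // stage 2: decode loop
      token = decode_one(token);
    }
  }

  void warmup(const std::vector<int64_t>& prompt_tokens, int32_t seq_len) {
    generate(prompt_tokens, seq_len);  // full pipeline, output suppressed
  }
};

int main() {
  Runner runner;
  runner.warmup({1, 2, 3}, /*seq_len=*/10);
  return 0;
}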
