Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17222.
Note: Links to docs will display an error until the docs builds have been completed.
❌ 17 New Failures as of commit 9634244 with merge base 6f780c7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    model.vocab_size,
    llm_config.base.metadata,
    use_ring_buffer=llm_config.model.local_global_attention is not None,
I don't think this is right. Local/global attention uses a sliding window, which may or may not be relevant to the ring buffer.

I think the high-level approach seems right; we need to have this metadata serialized into the .pte.
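To make the distinction in the two comments above concrete, here is a minimal illustrative sketch (not the PR's implementation; both helper names are hypothetical): a ring buffer decides where a new KV entry is written, while a local/sliding window decides which cached entries a query may attend to, and a model can have either one without the other.

```cpp
// Illustrative sketch only; the helper names are hypothetical, not part of the PR.
#include <cstdint>

// Ring buffer: map a logical token position to a physical KV-cache slot.
inline int64_t ring_buffer_slot(int64_t pos, int64_t cache_len) {
  return pos % cache_len;  // wraps around instead of running out of slots
}

// Local (sliding-window) attention: may the query at q_pos attend to kv_pos?
inline bool in_local_window(int64_t q_pos, int64_t kv_pos, int64_t window) {
  return kv_pos <= q_pos && q_pos - kv_pos < window;
}
```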
auto error = runner->generate(prompt, config);
auto error2 = runner->generate(prompt, config);
// Resolve max_new_tokens based on config
// Check if ring buffer is enabled - if so, we can exceed context length
bool use_ring_buffer = metadata_.at(kUseRingBuffer);
I think we need to set this in llm_runner_helper, not here.
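A minimal sketch of what resolving the flag in the helper could look like, assuming the metadata is exposed as a string-to-int64 map and that a missing key (an older .pte) should default to false; the function name and map type are assumptions, not the actual llm_runner_helper API.

```cpp
// Sketch only: the helper name, map type, and default are assumptions.
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for the kUseRingBuffer metadata key referenced in the diff.
constexpr const char* kUseRingBuffer = "use_ring_buffer";

// Resolve the flag once in the helper so runners don't each call
// metadata.at(), which throws if the key is missing from an older .pte.
inline bool resolve_use_ring_buffer(
    const std::unordered_map<std::string, int64_t>& metadata) {
  auto it = metadata.find(kUseRingBuffer);
  return it != metadata.end() && it->second != 0;
}
```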
  int64_t max_context_len =
      metadata_.at(kMaxContextLen) - 0; // No start_pos offset
- int32_t max_new_tokens = config.resolve_max_new_tokens(max_context_len, pos_);
+ // When ring buffer is enabled, use a large context length to allow unlimited
+ // generation.
+ int64_t effective_context_len =
+     use_ring_buffer ? INT64_MAX : max_context_len;
+ int32_t max_new_tokens =
+     config.resolve_max_new_tokens(effective_context_len, pos_);
This logic needs to be applied to text_llm_runner as well.
+ int64_t effective_context_len =
+     use_ring_buffer ? INT64_MAX : max_context_len;
  int max_new_tokens =
-     config.resolve_max_new_tokens(max_context_len, num_prompt_tokens);
+     config.resolve_max_new_tokens(effective_context_len, num_prompt_tokens);
Seems like duplicate code to me.
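One way to address both comments (the duplicated logic, and applying it to text_llm_runner) is a shared helper. This is a sketch under the assumption that both runners can include a common header; `effective_context_len` is a hypothetical name, not something the PR adds.

```cpp
// Hypothetical shared helper; name and location are assumptions.
#include <cstdint>
#include <limits>

// With a ring-buffer KV cache the physical context length no longer bounds
// generation, so report an effectively unlimited length and let
// resolve_max_new_tokens() clamp only against the caller's own limits.
inline int64_t effective_context_len(bool use_ring_buffer, int64_t max_context_len) {
  return use_ring_buffer ? std::numeric_limits<int64_t>::max() : max_context_len;
}
```

Both runners could then call config.resolve_max_new_tokens(effective_context_len(use_ring_buffer, max_context_len), ...) instead of repeating the ternary.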
{llm::kMaxContextLen, 128},
{llm::kUseKVCache, true},
{llm::kUseSDPAWithKVCache, false},
{llm::kUseRingBuffer, true},
I think we need to set max_context_len to INT_MAX here.
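For reference, this is roughly what that suggestion would look like in the test fixture; the map type and key strings below are stand-ins, since only the llm:: constant names appear in the diff.

```cpp
// Sketch of the suggested test metadata; the map type and key strings are assumptions.
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

const std::unordered_map<std::string, int64_t> kTestMetadata = {
    {"max_context_len", std::numeric_limits<int64_t>::max()},  // llm::kMaxContextLen
    {"use_kv_cache", 1},                                       // llm::kUseKVCache
    {"use_sdpa_with_kv_cache", 0},                             // llm::kUseSDPAWithKVCache
    {"use_ring_buffer", 1},                                    // llm::kUseRingBuffer
};
```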
Refactor position shift handling in attention sink to use torch buffers and dynamic shape conditions.
Summary
Allow exceeding context window
Test plan
CI