[llava][16/N] Extract out prefill logic into a new class #4585
Conversation
Depending on whether parallel or sequential prefill is chosen, `prefill()` calls `TextDecoderRunner.step()` to prefill the prompt tokens into the LLM.

Differential Revision: [D60927756](https://our.internmc.facebook.com/intern/diff/D60927756)

[ghstack-poisoned]
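To make the described split concrete, here is a minimal sketch of how such a prefiller class could dispatch between the two strategies. The `TextPrefiller` name, the `TextDecoderRunner` stand-in interface, and all signatures below are assumptions for illustration, not the PR's verbatim code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the existing TextDecoderRunner (assumed shape): step()
// runs the decoder over `tokens` starting at `start_pos`, updates the
// KV cache, and returns the next predicted token.
class TextDecoderRunner {
 public:
  virtual ~TextDecoderRunner() = default;
  virtual uint64_t step(
      const std::vector<uint64_t>& tokens,
      int64_t start_pos) = 0;
};

// Hypothetical prefiller class extracted out of the runner. Names and
// signatures here are illustrative assumptions, not the PR's actual API.
class TextPrefiller {
 public:
  TextPrefiller(TextDecoderRunner* runner, bool parallel_prefill)
      : runner_(runner), parallel_prefill_(parallel_prefill) {}

  // Feeds the prompt tokens to the LLM and returns the first token
  // generated after the prompt.
  uint64_t prefill(
      const std::vector<uint64_t>& prompt_tokens,
      int64_t start_pos) {
    if (parallel_prefill_) {
      // Parallel prefill: a single step() over the whole prompt fills
      // the KV cache for every position in one forward pass.
      return runner_->step(prompt_tokens, start_pos);
    }
    // Sequential prefill: one step() per token, advancing the position
    // each time; slower, but works when the exported model only accepts
    // one token per forward pass.
    uint64_t cur_token = 0;
    for (std::size_t i = 0; i < prompt_tokens.size(); ++i) {
      cur_token = runner_->step(
          {prompt_tokens[i]}, start_pos + static_cast<int64_t>(i));
    }
    return cur_token;
  }

 private:
  TextDecoderRunner* runner_;
  bool parallel_prefill_;
};
```

Keeping both strategies behind one `prefill()` entry point lets the caller opt into parallel prefill when the exported model supports multi-token input, and fall back to token-by-token prefill otherwise.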
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4585
Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit c1d970b with merge base 92edd04.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
ghstack-source-id: 1af03e2
Pull Request resolved: pytorch/executorch#4585
Stack from ghstack (oldest at bottom):
Depending on whether parallel or sequential prefill is chosen, `prefill()` calls `TextDecoderRunner.step()` to prefill the prompt tokens into the LLM.

Differential Revision: D60927756