Add Echo parameter to llama runner, jni+java layer, and demo app #5011
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5011
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d081b02 with merge base c83fd2e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks good!
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: The echo flag was added on the runner side via pytorch#5011 by Chirag. Now we update the app side to leverage the new echo flag, so that we don't display the user prompt in the response. Differential Revision: D62250116
```cpp
// Forward the decoded token to the token callback only when echo is enabled.
if (echo) {
  wrapped_callback(ET_UNWRAP(tokenizer_->decode(cur_token, cur_token)));
}
```
I think we should always call `wrapped_callback`; otherwise we are missing the first generated token.
From my testing, this callback would just return a newline. We also tested this and the output works well.
Hmm, I'm saying it doesn't matter what that token is. Semantically it is the first token the model generates by prefilling the prompt, so we should trigger the callback on this one.
Got it! @larryliu0820, I've removed the echo gating around the first token generation.
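To make the agreed behavior concrete, here is a minimal, self-contained Java sketch; the real runner is C++, and `PrefillCallbackSketch`, its method names, and the newline first token are illustrative assumptions rather than ExecuTorch code. The point it models: echo gates only the prompt text, while the token produced by prefill always reaches the callback.

```java
import java.util.List;
import java.util.function.Consumer;

public class PrefillCallbackSketch {
  // echo gates only the decoded prompt text; the token produced while
  // prefilling the prompt is always forwarded to the callback.
  static String prefill(
      List<String> promptTokens, boolean echo, Consumer<String> callback) {
    if (echo) {
      promptTokens.forEach(callback); // prompt text is opt-in only
    }
    String firstGenerated = "\n"; // assumed value, per the report above
    callback.accept(firstGenerated); // always fired, echo or not
    return firstGenerated;
  }

  public static void main(String[] args) {
    // With echo off, only the first generated token reaches the callback.
    prefill(List.of("Hello", ", world"), false, System.out::print);
  }
}
```

Run as-is, main prints only the newline, matching the report above that the first prefill token is typically just a newline.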
```java
public void onResult(String result) {
  mResultMessage.appendText(result);
  run();
  if (result.equals("\n\n")) {
    // ...
  }
}
```
@Riandy FYI re: this check on the app side.
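One plausible reading of that check, sketched below as self-contained Java. This is an assumption, since the body of the `if` isn't shown in this excerpt, and `ResultFilterSketch` with its `StringBuilder` stand-in for `mResultMessage` is illustrative rather than the demo app's actual code: skip a leading blank token so it never reaches the message view.

```java
public class ResultFilterSketch {
  private final StringBuilder resultMessage = new StringBuilder();

  public void onResult(String result) {
    // Drop a leading blank token so it is never displayed.
    if (result.equals("\n\n") && resultMessage.length() == 0) {
      return;
    }
    resultMessage.append(result);
  }

  public static void main(String[] args) {
    ResultFilterSketch sketch = new ResultFilterSketch();
    sketch.onResult("\n\n");  // filtered: nothing appended yet
    sketch.onResult("Paris"); // appended normally
    System.out.println(sketch.resultMessage); // prints "Paris"
  }
}
```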
Force-pushed from 2b7730d to a4b63e0.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This allows the developer to set whether the response should echo (i.e., include) the input prompt. We default to true, since that maintains the existing flow.
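A hedged sketch of the Java-layer surface this implies; `LlamaModuleSketch` and its `generate` signatures are illustrative assumptions, not the actual ExecuTorch `LlamaModule` API. An overload without the parameter delegates with echo=true, so existing callers keep the prompt-echoing behavior.

```java
import java.util.function.Consumer;

public class LlamaModuleSketch {
  // Existing entry point: keeps the old behavior by defaulting echo to true.
  public int generate(String prompt, int seqLen, Consumer<String> onResult) {
    return generate(prompt, seqLen, onResult, /* echo= */ true);
  }

  // New overload: when echo is false, the prompt is not sent back
  // through the callback, so the app never displays it.
  public int generate(
      String prompt, int seqLen, Consumer<String> onResult, boolean echo) {
    if (echo) {
      onResult.accept(prompt); // include the input prompt in the response
    }
    // ... run inference here, forwarding each generated token to onResult ...
    return 0;
  }
}
```

Defaulting the new parameter to true keeps the change backward compatible: only callers that opt in see the prompt stripped from the streamed response.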