Add support for returning alternative tokens #297
Branch updated from 446455f to 2f714e6.
Very clean and comprehensive PR, just had a question about how we handle multiple requests in the batch.
```python
)
alternative_token_texts.append(alternative_token_text)
all_input_ids.pop()
alternative_tokens = AlternativeTokens(
```
It looks like we're overriding this at every iteration of the loop. Should `alternative_tokens` instead be a list of `AlternativeTokens`, one for each request in the batch?
We are adding `alternative_tokens` to the `Generation` that is created for each request, so I don't see a problem with overriding it. I basically copied what is done with `PrefillTokens`. The variables that go into `AlternativeTokens` are somewhat badly named, though, so I added another commit to fix that :)
Ah, my mistake: I overlooked where the `Generation` is created in this loop when I looked at the PR earlier. Looks good!
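The point settled in this thread (rebinding `alternative_tokens` on every loop iteration is safe because each value is stored in that request's own `Generation` before the next iteration) can be illustrated with a minimal sketch. The `AlternativeTokens` and `Generation` classes below are simplified, hypothetical stand-ins, not the project's actual definitions, and `build_generations` is a hypothetical helper; only the loop pattern is meant to match the discussion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlternativeTokens:
    # Hypothetical, simplified stand-in for the PR's AlternativeTokens.
    ids: List[int]
    logprobs: List[float]
    texts: List[str]


@dataclass
class Generation:
    # Hypothetical, simplified stand-in for the per-request Generation.
    request_id: int
    token_id: int
    alternative_tokens: AlternativeTokens


def build_generations(request_ids, token_ids, alt_ids, alt_logprobs, alt_texts):
    generations = []
    for i, request_id in enumerate(request_ids):
        # Rebinding the name on each iteration is harmless: the new object is
        # stored in this request's Generation before the loop moves on.
        alternative_tokens = AlternativeTokens(
            ids=alt_ids[i], logprobs=alt_logprobs[i], texts=alt_texts[i]
        )
        generations.append(Generation(request_id, token_ids[i], alternative_tokens))
    return generations


gens = build_generations(
    request_ids=[0, 1],
    token_ids=[42, 7],
    alt_ids=[[42, 3], [7, 9]],
    alt_logprobs=[[-0.1, -2.4], [-0.3, -1.5]],
    alt_texts=[["foo", "bar"], ["baz", "qux"]],
)
for g in gens:
    print(g.request_id, g.alternative_tokens.texts)
```

Each request ends up with its own `AlternativeTokens` object, so a separate list of them is only needed if the alternatives must be kept apart from the `Generation` objects.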
LGTM! Thanks for this contribution :)
```python
request_alternative_token_logprobs = alternative_token_logprobs[i][:num_alternatives]

# Decode tokens
request_alternative_token_texts = list()
```
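The excerpt above slices each request's row of alternative log-probabilities down to `num_alternatives` and then decodes the tokens. A minimal, dependency-free sketch of that idea follows. It assumes (this is not confirmed by the PR) that the top-k is computed once per row with the largest `k` requested in the batch and then trimmed per request; `log_softmax`, `top_k`, `alternatives_per_request` and the toy `vocab` lookup (standing in for a real tokenizer) are all hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def top_k(logprobs: List[float], k: int) -> Tuple[List[int], List[float]]:
    # Token ids of the k highest log-probabilities, best first.
    ids = heapq.nlargest(k, range(len(logprobs)), key=logprobs.__getitem__)
    return ids, [logprobs[j] for j in ids]


def alternatives_per_request(batch_logits, num_alternatives_per_request, vocab):
    # Assumption: one top-k per row using the largest k in the batch,
    # then each row is sliced down to that request's own num_alternatives.
    max_k = max(num_alternatives_per_request)
    alternative_token_ids, alternative_token_logprobs = [], []
    for logits in batch_logits:
        ids, lps = top_k(log_softmax(logits), max_k)
        alternative_token_ids.append(ids)
        alternative_token_logprobs.append(lps)

    results = []
    for i, num_alternatives in enumerate(num_alternatives_per_request):
        request_ids = alternative_token_ids[i][:num_alternatives]
        request_logprobs = alternative_token_logprobs[i][:num_alternatives]
        # Decode tokens (toy id -> text lookup instead of a tokenizer).
        request_texts = [vocab[t] for t in request_ids]
        results.append((request_ids, request_logprobs, request_texts))
    return results


vocab: Dict[int, str] = {0: "the", 1: "a", 2: "cat", 3: "dog"}
batch_logits = [[2.0, 1.0, 0.5, -1.0], [0.0, 0.0, 3.0, 1.0]]
results = alternatives_per_request(batch_logits, [2, 1], vocab)
for ids, lps, texts in results:
    print(ids, [round(lp, 3) for lp in lps], texts)
```

In a real model server the softmax and top-k would run on the GPU over the whole batch tensor; the pure-Python version only shows the data flow.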
Super minor nit, but I generally prefer `[]` over `list()` to avoid the name lookup of `list`. Definitely not a blocker.
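To make the nit concrete, the sketch below uses the standard `dis` module to compare the bytecode CPython emits for `[]` and `list()`. The helper names are hypothetical, and exact opcode names vary between CPython versions (for example `CALL_FUNCTION` before 3.11 and `CALL` afterwards), so treat the printed lists as illustrative.

```python
import dis
import timeit


def with_literal():
    # `[]` compiles to a single BUILD_LIST instruction.
    return []


def with_call():
    # `list()` first looks up the name `list` (LOAD_GLOBAL, which falls back
    # to builtins) and then calls it.
    return list()


def opnames(func):
    """Return the bytecode instruction names for `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]


literal_ops = opnames(with_literal)
call_ops = opnames(with_call)

if __name__ == "__main__":
    print("[]     ->", literal_ops)
    print("list() ->", call_ops)
    # Rough timing only; absolute numbers depend on machine and Python version.
    print("[]    :", timeit.timeit(with_literal, number=200_000))
    print("list():", timeit.timeit(with_call, number=200_000))
```

Besides skipping the lookup and call, the literal cannot be affected if something shadows the name `list`.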
Fixes #298 (Include logprobs for alternative, less probable tokens in the generation response).

So far, alternative tokens are only supported on `FlashCausalLM`-based models.