Qs on Understanding Lookahead and Jacobi #37
Comments
I'm not one of the authors, but I can answer.
As for your idea, it's certainly viable to use either [MASK] tokens or 0-embeddings. Even if it does improve performance, I don't think the improvement will be huge, but you're welcome to try it out.
Thanks very much. As an aside, apparently TGI tried lookahead but found little speedup for the added compute.
You mean Huggingface's TGI (text-generation-inference)?
Yeah, makes sense. The comment I was referencing is this one: huggingface/text-generation-inference#1169 (comment). Thanks.
I'm assuming that when Olivier Dehaene mentioned it was tested internally, he was referring to Joao Gante's test (Gante works at Huggingface). See huggingface/transformers#27649 (comment) for details.
Wow, yeah, that's a great post from Joao; thanks for sharing it. I hadn't appreciated that FA2 compatibility was a consideration too.
Incidentally, it seems like the original Jacobi decoding paper uses [PAD] tokens instead of random vocabulary tokens.
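For reference, here is a minimal sketch of that greedy Jacobi-style fixed-point iteration with [PAD]-initialized guesses. It assumes a hypothetical `logits_fn(ids) -> [seq_len, vocab]` interface to the model and is only an illustration of the idea, not the paper's code:

```python
import torch

def jacobi_decode(logits_fn, prompt_ids: torch.Tensor, n_new: int,
                  pad_id: int, max_iters: int = 50) -> torch.Tensor:
    """Refine all n_new guessed tokens in parallel until they stop changing."""
    prompt_len = prompt_ids.shape[0]
    # Initialize every unknown position with [PAD] rather than a random token.
    guesses = torch.full((n_new,), pad_id, dtype=torch.long)
    for _ in range(max_iters):
        seq = torch.cat([prompt_ids, guesses])
        logits = logits_fn(seq)  # one parallel forward pass over the whole window
        # In a causal LM, position i is predicted from the logits at position i-1.
        new_guesses = logits[prompt_len - 1 : prompt_len - 1 + n_new].argmax(dim=-1)
        if torch.equal(new_guesses, guesses):  # fixed point: guesses are self-consistent
            break
        guesses = new_guesses
    return guesses
```

Since the first unconfirmed position always sees a fully correct prefix, each iteration fixes at least one more token, so the loop is never worse than ordinary one-token-per-pass greedy decoding.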
Thanks for putting this blog together.
Jacobi is mentioned a lot in the blog, but I don't really see it as central. Basically, we're just randomly guessing tokens that are up to W positions away, using previous forward passes to improve the quality of the guesses within the window, and then using those trajectories as an n-gram database?
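To make that reading concrete, here is a rough sketch with hypothetical helper names (not the repo's actual implementation): trajectories from previous lookahead passes are harvested into an n-gram pool, and pool entries keyed by the last confirmed token become candidate continuations to verify in parallel:

```python
from collections import defaultdict

N = 3  # n-gram size, an illustrative choice

# Pool of (N-1)-token continuations keyed by the token that precedes them.
ngram_pool: dict[int, list[tuple[int, ...]]] = defaultdict(list)

def harvest(window_tokens: list[int]) -> None:
    """Collect n-grams from the latest lookahead-window trajectory."""
    for i in range(len(window_tokens) - N + 1):
        gram = tuple(window_tokens[i : i + N])
        ngram_pool[gram[0]].append(gram[1:])

def candidates(last_confirmed_token: int) -> list[tuple[int, ...]]:
    """Continuations to verify (speculative-decoding style) in the next forward pass."""
    return ngram_pool.get(last_confirmed_token, [])
```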
To further improve the quality of guesses, would it be an idea to mask the input effect completely, rather than guessing the tokens? My sense is that, because attention is so strong for nearby tokens, guessing the tokens is worse than passing through blank information at those guess positions. That would allow the decoded output to be based purely on the information from tokens we do know with 100% certainty.
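As a sketch of what "blank information" could look like (untested, and assuming the model can be called on embeddings directly, e.g. via a Hugging Face-style `inputs_embeds` argument; all names here are illustrative):

```python
import torch

def forward_with_blank_guesses(model, embed_tokens, known_ids: torch.Tensor, n_guess: int):
    """known_ids: [1, known_len] token ids; the n_guess speculative positions get zero embeddings."""
    known_emb = embed_tokens(known_ids)                      # [1, known_len, d]
    blank_emb = torch.zeros(1, n_guess, known_emb.shape[-1],
                            dtype=known_emb.dtype, device=known_emb.device)
    # Guess positions carry no token content; position information still enters
    # through the model's positional mechanism, so each guess position's output
    # is driven by the known prefix rather than by guessed token identities.
    inputs_embeds = torch.cat([known_emb, blank_emb], dim=1)
    return model(inputs_embeds=inputs_embeds).logits
```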