Qs on Understanding Lookahead and Jacobi #37

Closed
RonanKMcGovern opened this issue Dec 20, 2023 · 7 comments

Comments

@RonanKMcGovern

Thanks for putting this blog together.

  1. Regarding simple Jacobi decoding:
  • The process starts with some random guesses for future tokens, correct?
  • As the process continues, the guesses improve a little, but on average they remain quite poor (because each guess is conditioned on a previous token that is itself wrong). So only occasionally is a guess correct, in which case the following token can also be accepted.
  2. Regarding lookahead:
  • Basically, a bank of n-grams is built up, and those n-grams come from decoding the guessed input tokens. Correct?
  • I suppose adding the prompt itself (and any confirmed generated tokens) as a source of n-grams would probably also improve performance?
  3. Jacobi:
    Jacobi is mentioned a lot in the blog, but I don't really see it as central... Basically we're just randomly guessing a token that is W tokens away, using previous forward passes to improve the quality of guesses within the window, and then using those guesses as an n-gram database?

To further improve the quality of guesses, would it be worth masking out the guessed positions entirely, rather than guessing tokens for them? My sense is that, because attention to nearby tokens is so strong, guessing the tokens is worse than passing through blank information at those positions. That would allow the decoded output to be based purely on the information from tokens we know 100%.
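
To check my own understanding of the Jacobi part, here is a rough sketch of what I think a single window of simple Jacobi decoding looks like (purely my own pseudo-implementation for discussion, not taken from the blog; `model` is assumed to be a Hugging Face-style causal LM exposing `.logits` and `config.vocab_size`):

```python
import torch

def jacobi_decode_window(model, prefix_ids, window=5, max_iters=10):
    """Sketch: refine a window of guessed tokens by fixed-point iteration.

    prefix_ids: the confirmed tokens so far, shape (1, prefix_len).
    Returns the guessed window once it stops changing (a fixed point)
    or after max_iters parallel forward passes.
    """
    # Start from arbitrary guesses for the next `window` tokens.
    guesses = torch.randint(0, model.config.vocab_size, (1, window))

    for _ in range(max_iters):
        # One parallel forward pass over prefix + current guesses.
        input_ids = torch.cat([prefix_ids, guesses], dim=-1)
        logits = model(input_ids).logits

        # The prediction at each guessed position becomes the new guess
        # for that position (greedy decoding).
        new_guesses = logits[:, prefix_ids.shape[1] - 1 : -1, :].argmax(dim=-1)

        if torch.equal(new_guesses, guesses):  # fixed point reached
            break
        guesses = new_guesses

    return guesses
```

Is that roughly the right mental model?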

@shermansiu

I'm not one of the authors, but I can answer.

  1. Yes, random guesses are used initially. See Question on Initial guess tokens #8. But by the time we've gone through enough passes (i.e. the length of the window), the guesses are no longer completely random (though still noisy).
  2. Yes, a bank of n-grams is built up while decoding the guessed input tokens.
  • Generally, prompts are not regenerated during inference... you start generating tokens after the prompt, so you don't get any speedup from that. Using template-guided generation (e.g. see https://github.com/guidance-ai/guidance) is known to speed up inference.
  3. Jacobi is central to the idea because we are using the context from the known prefix to generate the subsequent token guesses. The idea is that, because the contribution from the known prefix outweighs that of the guessed tokens after the first few passes, tokens closer to the known prefix will be guessed more accurately. There are no guarantees, though.
  • What you're saying is more or less correct, except that only recent n-grams are kept.

As for your idea, it's certainly viable to use either [MASK] tokens or 0-embeddings. Even if it improves performance, I don't think the improvement will be huge, but you're welcome to try it out.
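
To make the n-gram bank concrete, here is a very rough sketch of how n-grams gathered from those Jacobi passes could be pooled and looked up (my own simplification for illustration, not the authors' implementation; the real code handles the lookahead and verification branches in a single forward pass with a custom attention mask):

```python
from collections import defaultdict

def build_ngram_pool(trajectories, n=3):
    """Collect n-grams from the token trajectories of past Jacobi passes.

    The pool is keyed by the first token of each n-gram so candidates can
    be looked up from the most recently confirmed token.
    """
    pool = defaultdict(set)
    for traj in trajectories:
        for i in range(len(traj) - n + 1):
            ngram = tuple(traj[i : i + n])
            pool[ngram[0]].add(ngram[1:])
    return pool

def propose_continuations(pool, last_confirmed_token):
    """Return candidate continuations for the last confirmed token.

    Each candidate still has to be verified: only the longest prefix that
    matches what the model itself would have generated is accepted.
    """
    return pool.get(last_confirmed_token, set())
```

For example, `propose_continuations(pool, 42)` would return every stored continuation that previously followed token 42 in a Jacobi trajectory.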

@RonanKMcGovern
Author

RonanKMcGovern commented Dec 22, 2023

Thanks very much. As an aside, apparently TGI tried lookahead but found little speedup for the added compute.

@shermansiu

You mean Huggingface's transformers, right? Not TGI. But yeah, both Joao Gante and Louis-y-nlp in #19 noticed that you don't get much of a speedup if you don't have the FLOPS to spare.

@RonanKMcGovern
Author

Yeah makes sense. The comment I was referencing is this one: huggingface/text-generation-inference#1169 (comment)

Thanks

@shermansiu

I'm assuming that when Olivier Dehaene mentioned it was tested internally, he was referring to Joao Gante's test (Gante works at Huggingface). See huggingface/transformers#27649 (comment) for details.

@RonanKMcGovern
Author

Wow, yeah, that's a great post from Joao, thanks for sharing it. I didn't appreciate that FA2 compatibility was a consideration too.

@shermansiu

Incidentally, it seems like the original Jacobi decoding paper uses [PAD] tokens instead of random vocabulary tokens.
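
In other words, the only change is the initialization, roughly like this (a sketch; the vocab size and pad token id below are just placeholder values):

```python
import torch

window = 5

# Random initialization, as discussed above (placeholder vocab size):
vocab_size = 32000
random_guesses = torch.randint(0, vocab_size, (1, window))

# [PAD] initialization, as in the original Jacobi decoding paper
# (placeholder pad token id):
pad_token_id = 0
pad_guesses = torch.full((1, window), pad_token_id)
```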
