You could truncate the prompt tokens by adding something like

decode_options["prompt"] = decode_options["prompt"][-10:]  # keep only the 10 most recent prompt tokens

after:

decode_options["prompt"] = all_tokens[prompt_reset_since:]
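Here is a minimal, self-contained sketch of what those two assignments do together, using plain Python lists. The helper name `build_prompt` and its `max_prompt_tokens` parameter are hypothetical and exist only for illustration. `all_tokens` and `prompt_reset_since` are taken from the snippet above, and the token ids are fake integers.

```python
def build_prompt(all_tokens, prompt_reset_since, max_prompt_tokens=10):
    """Hypothetical helper combining the two assignments above.

    all_tokens: token ids decoded so far (list of ints)
    prompt_reset_since: index where the usable previous-text context starts
    max_prompt_tokens: how many of the most recent tokens to keep
    """
    # Same slice as the original line: context since the last reset.
    prompt = all_tokens[prompt_reset_since:]
    # Guard: prompt[-0:] would return the whole list, not an empty one.
    if max_prompt_tokens <= 0:
        return []
    # A negative slice keeps the last N tokens; shorter lists pass through unchanged.
    return prompt[-max_prompt_tokens:]


tokens = list(range(100, 130))  # 30 fake token ids: 100..129
print(len(build_prompt(tokens, prompt_reset_since=5)))  # 10
print(build_prompt(tokens, prompt_reset_since=25))  # [125, 126, 127, 128, 129]
```

The slice never raises when fewer than 10 tokens are available, so the truncation is safe to apply on every window.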

That said, prompt tokens affect inference time only during the first forward pass through the decoder. That cost is usually small compared with the total autoregressive decoding time, which typically involves tens or hundreds of sequential forward passes through the decoder.
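A rough way to see this is to count how many tokens the decoder receives on each sequential forward pass. The sketch below assumes a key/value cache, so that after the first pass only the newest token is fed in. It ignores the smaller cost of attending over the cached prompt. The function name and all numbers (prompt lengths, 3 start-of-transcript tokens, 100 generated tokens) are illustrative assumptions, not measurements.

```python
from typing import List


def decoder_inputs_per_pass(prompt_len: int, sot_len: int, num_generated: int) -> List[int]:
    """Return how many tokens are fed to the decoder on each sequential forward pass.

    Assumes a key/value cache, so after the first pass only the newest token
    is fed in. Ignores the cost of attending over cached keys/values.
    """
    if num_generated < 1:
        raise ValueError("num_generated must be at least 1")
    # First pass: all prompt tokens plus the start-of-transcript tokens, processed in parallel.
    first_pass = prompt_len + sot_len
    # Each remaining generated token needs one more pass with a single new token.
    return [first_pass] + [1] * (num_generated - 1)


long_prompt = decoder_inputs_per_pass(prompt_len=200, sot_len=3, num_generated=100)
short_prompt = decoder_inputs_per_pass(prompt_len=10, sot_len=3, num_generated=100)

# The number of sequential passes is the same; only the first pass gets shorter.
print(len(long_prompt), len(short_prompt))  # 100 100
print(long_prompt[0], short_prompt[0])  # 203 13
print(long_prompt[1:] == short_prompt[1:])  # True
```

Because the first pass handles all prompt tokens in parallel, shortening the prompt does not reduce the number of sequential decoding steps. Those steps usually dominate the total time.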

@amnike
@jongwook
Answer selected by jongwook
Category
Q&A
2 participants