-
|
Every time a new result is created it captures the previous result and as this goes on it gets to the point it's capturing the previous 50 results then the processing time increases tremendously lol. I get that it's done like this for accuracy but, do you have a way or can point me in the direction of reducing the previous tokens or results captured? I'd really appreciate it |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 2 replies
-
|
You could truncate the prefix tokens by adding something like after: Line 180 in 0f39c89 That said, prompt tokens affect the inference time only during the first forward pass through the decoder, which is usually not very significant compared to the total autoregressive decoding time which usually involves tens or hundreds of forward passes through the decoder. |
Beta Was this translation helpful? Give feedback.
You could truncate the prefix tokens by adding something like
after:
whisper/whisper/transcribe.py
Line 180 in 0f39c89
That said, prompt tokens affect the inference time only during the first forward pass through the decoder, which is usually not very significant compared to the total autoregressive decoding time which usually involves tens or hundreds of forward passes through the decoder.