[Feature] Add a callback for generation complete in streaming mode or some other signal like is_final_token #2451

JohnDuncanScott · 2024-06-17T17:47:39Z

Feature Request

The generate method in Python allows a callback to be provided so you can control the tokens being produced. However, there is no signal that the generation has completed. There should be some signal received, either a separate callback or perhaps a boolean in the callback like is_final_token.

The workaround is for whatever is streaming the tokens to signal that it has received everything. However, this muddies the layers.

So for example:
Component A receives streaming input and displays it on screen
Component B is responsible for creating and managing the model and is what you call generate on
With the current API, Component B, which owns the model, doesn't know when the model is done. There is no method on the model to ask is_generating, no param in the callback, and no separate callback to signal it's done. This makes it difficult for Component B to clean up or run logic that should happen after generation has completed.

Hope that makes sense.

This should hopefully be really simple to implement but would make user logic much cleaner.
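To make the Component A / Component B split concrete, here is a minimal sketch of the kind of completion signal being asked for, written as a wrapper that Component B could use today. Everything in it is hypothetical: `fake_generate` stands in for the real streaming generate call, and `ComponentB`, `stream`, `is_generating`, and `_on_generation_complete` are illustrative names, not part of any existing API.

```python
from typing import Callable, Iterable, Iterator


def fake_generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming generate call."""
    for token in ["Hello", ",", " world"]:
        yield token


class ComponentB:
    """Owns the model and wraps the token stream so it knows when generation ends."""

    def __init__(self, generate: Callable[[str], Iterable[str]]) -> None:
        self._generate = generate
        self.is_generating = False
        self.cleanup_calls = 0

    def stream(self, prompt: str) -> Iterator[str]:
        # The body only starts running when the consumer asks for the first token.
        self.is_generating = True
        try:
            for token in self._generate(prompt):
                yield token
        finally:
            # Runs when the stream is exhausted, closed early, or raises.
            self._on_generation_complete()

    def _on_generation_complete(self) -> None:
        # Clean-up or post-generation logic would go here.
        self.is_generating = False
        self.cleanup_calls += 1


def component_a(tokens: Iterable[str]) -> str:
    """Consumes the stream; joins the tokens instead of drawing them on screen."""
    return "".join(tokens)


component_b = ComponentB(fake_generate)
displayed = component_a(component_b.stream("Say hello"))
```

With this shape, Component A only consumes tokens and Component B learns about completion on its own. A built-in signal such as is_final_token or a dedicated completion callback would remove the need for the wrapper.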

JohnDuncanScott added the enhancement New feature or request label Jun 17, 2024
