Closed
Description
🚀 The feature, motivation and pitch
Instead of exposing `input_pos` in the `generate_from_pos()` API, we should redesign the API to hide the `input_pos` argument as internal state.
We should support these features:
- Generate with an input prompt: uses the current context, creates the response, adds it to the context, and adjusts the KV cache start position internally.
- Add context: hydrates the KV cache when loading historical chat; the start position is adjusted internally when generate is called afterwards.
- Clear context: removes prefilled tokens and resets the start position.
To be more specific,
- Add a private field `pos_` and manage it in all APIs.
- Keep the `generate()` API, but instead of assuming a start pos of 0, use the `pos_` field.
- Add a `prefill()` API to be able to take chat history.
- Add a `reset()` API to reset `pos_` to 0.
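The four changes above can be sketched roughly as follows. This is only an illustration of the proposed shape, not the actual runner: `ToyRunner`, `position()`, the word-level tokenizer, and the fake decode step are all hypothetical stand-ins, and only `pos_`, `generate()`, `prefill()`, and `reset()` come from the proposal itself.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLM runner. No real model or tokenizer is
// involved: one whitespace-separated word counts as one token.
class ToyRunner {
 public:
  // Hydrate the (fake) KV cache with prior text such as chat history,
  // advancing pos_ without generating anything.
  void prefill(const std::string& text) {
    for (const auto& tok : tokenize(text)) {
      kv_cache_.push_back(tok);  // stand-in for one forward pass at pos_
      ++pos_;
    }
  }

  // Continue from the current context instead of assuming a start pos of 0.
  // The prompt and the response are both appended to the context.
  std::vector<std::string> generate(const std::string& prompt,
                                    int max_new_tokens) {
    assert(max_new_tokens >= 0);
    prefill(prompt);
    std::vector<std::string> response;
    for (int i = 0; i < max_new_tokens; ++i) {
      // Fake decode step: the "token" just records the position it was
      // produced at, which makes the internal bookkeeping visible.
      std::string tok = "tok" + std::to_string(pos_);
      kv_cache_.push_back(tok);
      response.push_back(tok);
      ++pos_;
    }
    return response;
  }

  // Drop all prefilled and generated tokens and start over at position 0.
  void reset() {
    kv_cache_.clear();
    pos_ = 0;
  }

  // Read-only accessor, added here only so the sketch can be inspected.
  int64_t position() const { return pos_; }

 private:
  static std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (std::string word; in >> word;) out.push_back(word);
    return out;
  }

  std::vector<std::string> kv_cache_;  // stand-in for the real KV cache
  int64_t pos_ = 0;                    // internal start position
};
```

In this sketch, a caller would call `prefill()` once with the stored history, then call `generate()` repeatedly for each new user turn, and call `reset()` to begin a fresh conversation. The caller never passes a position.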
Alternatives
No response
Additional context
No response
RFC (Optional)
No response