Manage the input_pos internally in text llm runner #12887

@larryliu0820

🚀 The feature, motivation and pitch

Instead of exposing input_pos through the generate_from_pos() API, we should redesign the API so that the position is tracked as internal state of the runner and callers no longer pass it in.

We should support these features:

  1. Generate with an input prompt: uses the current context, creates the response, adds it to the context, and adjusts the start position of the KV cache internally.
  2. Add context: hydrates the KV cache, for example to load historical chat; the start position is adjusted internally so that a later generate call continues from it.
  3. Clear context: removes the prefilled tokens and resets the start position.

To be more specific,

  • Add a private field pos_ and manage it in all APIs.
  • Keep the generate() API, but have it start from the pos_ field instead of assuming a start position of 0.
  • Add a prefill() API that accepts chat history.
  • Add a reset() API that resets pos_ to 0.
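
To make the proposal concrete, here is a minimal sketch of how the three calls could share one internal position. Everything in it is an assumption for illustration: the class name RunnerSketch, the max_context_len parameter, the pos() accessor, and fake_next_token() (which stands in for the real model and KV cache) are hypothetical and are not the actual runner API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch, not the real text LLM runner. It only tracks how many
// KV cache positions are in use; no model is executed.
class RunnerSketch {
 public:
  explicit RunnerSketch(int64_t max_context_len)
      : max_context_len_(max_context_len) {}

  // Add context: consume tokens (e.g. chat history) starting at pos_ and
  // advance pos_, without generating anything.
  void prefill(const std::vector<int64_t>& tokens) {
    const int64_t n = static_cast<int64_t>(tokens.size());
    if (pos_ + n > max_context_len_) {
      throw std::length_error("context window exceeded");
    }
    pos_ += n;  // a real runner would write these tokens into the KV cache
  }

  // Generate: consume the prompt at the current pos_, then decode up to
  // max_new_tokens, advancing pos_ for every token produced.
  std::vector<int64_t> generate(const std::vector<int64_t>& prompt,
                                int64_t max_new_tokens) {
    prefill(prompt);
    std::vector<int64_t> out;
    while (static_cast<int64_t>(out.size()) < max_new_tokens &&
           pos_ < max_context_len_) {
      out.push_back(fake_next_token(pos_));
      ++pos_;
    }
    return out;
  }

  // Clear context: drop everything and start again from position 0.
  void reset() { pos_ = 0; }

  // Read-only accessor, added here only so the sketch can be inspected.
  int64_t pos() const { return pos_; }

 private:
  // Placeholder for a model forward pass: returns the position as the token.
  static int64_t fake_next_token(int64_t pos) { return pos; }

  int64_t max_context_len_;
  int64_t pos_ = 0;  // the private field proposed above
};
```

With this shape, a chat loop would call prefill() once with the stored history, call generate() for each new user turn without passing any position, and call reset() when the conversation is cleared.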
