
Handle large context lengths in the native LLM runner #201

@chmjkb

Description

In version v0.6.0 of the ExecuTorch runtime, a new flag, kMaxContextLen, was introduced to the LLM runner. You can see it here. This flag specifies the length of the context window, i.e. the whole context passed to the generate function along with the user prompt. It defaults to 128, meaning that if the conversation exceeds 128 tokens, the app will most likely crash. The default is overridden when the exported model specifies this parameter in its metadata; however, we shouldn't rely on that, and the default value is quite low. The scope of this issue is to handle this so that the context window is larger and, eventually, the error is handled gracefully.
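
A possible shape for the "graceful" part is sketched below: wrap the native runner, construct it with a larger limit, and reject over-long prompts with a descriptive error before they reach the generate call. This is only a minimal sketch under assumptions, not the actual ExecuTorch runner API; `SafeLLMRunner`, `kDefaultMaxContextLen`, and the token-count argument are hypothetical placeholders.

```cpp
// Minimal sketch, not the actual ExecuTorch API: guard generate() so that an
// over-long context produces a descriptive error instead of a crash.
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical default mirroring the runner's kMaxContextLen; the real value
// may be overridden by the exported model's metadata.
constexpr int64_t kDefaultMaxContextLen = 128;

class SafeLLMRunner {
 public:
  explicit SafeLLMRunner(int64_t maxContextLen = kDefaultMaxContextLen)
      : maxContextLen_(maxContextLen) {}

  // Throws instead of letting the underlying runner run past the context
  // window; promptTokenCount is assumed to come from the tokenizer.
  void generate(const std::string& prompt, int64_t promptTokenCount) {
    if (promptTokenCount >= maxContextLen_) {
      throw std::runtime_error(
          "Context window exceeded: " + std::to_string(promptTokenCount) +
          " tokens, limit is " + std::to_string(maxContextLen_));
    }
    // ...delegate to the real native runner's generate() here...
    (void)prompt;
  }

 private:
  int64_t maxContextLen_;
};
```

The thrown error could then be surfaced to the JS side and shown to the user instead of crashing the app.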
