In the v0.6.0 release of the ExecuTorch runtime, a new flag, kMaxContextLen, was introduced to the LLM runner (you can see it here). This flag specifies the length of the context window, i.e. the whole context passed to the generate function along with the user prompt. It defaults to 128, meaning that if the conversation exceeds 128 tokens, the app will most likely crash. The default is overridden when the exported model specifies this parameter in its metadata; however, we shouldn't rely on that, and the default value is pretty low anyway.

The scope of this issue is to increase the context window and, eventually, to handle the out-of-context error gracefully.
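
A minimal sketch of the intended behavior (not the actual ExecuTorch API; names like RunnerConfig, resolveMaxContextLen, and checkContextFits are hypothetical): pick a larger default, let model metadata override it when present, and fail with a readable error instead of crashing when the conversation no longer fits.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical larger default; the real runner currently defaults to 128.
constexpr int64_t kDefaultMaxContextLen = 2048;

struct RunnerConfig {
  int64_t max_context_len = kDefaultMaxContextLen;
};

// Resolve the effective context length: the value from the exported model's
// metadata wins when present, otherwise fall back to our (larger) default.
int64_t resolveMaxContextLen(const std::optional<int64_t>& metadata_value,
                             const RunnerConfig& config) {
  return metadata_value.value_or(config.max_context_len);
}

// Guard called before generate(): throw a descriptive error instead of
// letting the runtime crash when the prompt exceeds the context window.
void checkContextFits(int64_t num_prompt_tokens, int64_t max_context_len) {
  if (num_prompt_tokens >= max_context_len) {
    throw std::runtime_error(
        "Prompt (" + std::to_string(num_prompt_tokens) +
        " tokens) exceeds the context window of " +
        std::to_string(max_context_len) + " tokens.");
  }
}
```

With a guard like this in place, the caller can surface a readable error to the user instead of crashing the app.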