In the v0.6.0 release of the ExecuTorch runtime, a new flag, kMaxContextLen, was introduced to the LLM runner (you can see it here). This flag specifies the length of the context window, i.e. the whole context passed to the generate function along with the user prompt. It defaults to 128, meaning that if the conversation exceeds 128 tokens, the app will most likely crash. The default is overridden when the exported model specifies this parameter in its metadata; however, we shouldn't rely on that, and the default value is pretty low anyway.

The scope of this issue is to increase the context window and, eventually, to handle the out-of-context error gracefully.
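
A minimal sketch of the intended behavior (not the actual ExecuTorch API; names like RunnerConfig, resolveMaxContextLen, and checkContextFits are hypothetical): pick a larger default, let model metadata override it when present, and fail with a readable error instead of crashing when the conversation no longer fits.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical larger default; the real runner currently defaults to 128.
constexpr int64_t kDefaultMaxContextLen = 2048;

struct RunnerConfig {
  int64_t max_context_len = kDefaultMaxContextLen;
};

// Resolve the effective context length: the value from the exported model's
// metadata wins when present, otherwise fall back to our (larger) default.
int64_t resolveMaxContextLen(const std::optional<int64_t>& metadata_value,
                             const RunnerConfig& config) {
  return metadata_value.value_or(config.max_context_len);
}

// Guard called before generate(): throw a descriptive error instead of
// letting the runtime crash when the prompt exceeds the context window.
void checkContextFits(int64_t num_prompt_tokens, int64_t max_context_len) {
  if (num_prompt_tokens >= max_context_len) {
    throw std::runtime_error(
        "Prompt (" + std::to_string(num_prompt_tokens) +
        " tokens) exceeds the context window of " +
        std::to_string(max_context_len) + " tokens.");
  }
}
```

With a guard like this in place, the caller can surface a readable error to the user instead of crashing the app.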