It seems that the message "Recalculating context" in the chat (or "LLaMA: reached the end of the context window so resizing" during API calls) appears after 2k tokens, regardless of the model used.
When that happens, the models indeed forget the content that preceded the current context window. For example, they become unable to answer multiple questions about a chunk of text, because each question and answer adds to the conversation and eventually pushes the chunk outside the context window. From that point on, the answers are pure hallucination.
There are many models now that advertise large context window sizes - Yi LLMs in particular. However, from what I've tried, none of that seems to work in GPT4All: "Recalculating context" always appears at the 2k mark. Is it really supposed to be this way?
(Note this is not the same issue as #1638. I'd argue it is actually worse. Prompt size limitations could be worked around by chunking the input. Unfortunately, since the context window size is the same as max prompt size, chunking the input doesn't help at all.)
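The eviction behavior described above can be illustrated with a minimal Python sketch (this is not GPT4All's actual code; the window size, token counts, and round counts are assumptions chosen for illustration):

```python
# Minimal sketch of a fixed-size context window that evicts the oldest
# tokens once the limit is reached, as described in the issue above.

CONTEXT_LIMIT = 2048  # tokens; the hard-coded 2k limit reported here

def trim_context(tokens, limit=CONTEXT_LIMIT):
    """Keep only the most recent `limit` tokens; earlier ones are lost."""
    if len(tokens) <= limit:
        return tokens
    return tokens[-limit:]

# A long document followed by a growing Q&A conversation:
context = ["doc"] * 1500           # the chunk of text under discussion
for _ in range(25):                # each round adds a question and an answer
    context += ["q"] * 50 + ["a"] * 50
    context = trim_context(context)

# The document tokens are the oldest, so they are evicted first; after
# enough Q&A rounds, none of them remain in the window.
print(context.count("doc"))  # → 0
```

Because the document tokens sit at the start of the sequence, they are the first to fall out of the window, which is why later answers can no longer be grounded in the text and chunking the input does not help.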
Until recently, the 2k-token context size was hard-coded regardless of the model used. This has now been fixed; see d1c56b8 and the bug reports #1749 and #1668. However, that fix is not yet in the current release (there has been no new release since the fix).