Allow custom prompt limit (n_ctx=2048) #668

Open
jeffochoa opened this issue May 21, 2023 · 12 comments
Labels
backend (gpt4all-backend issues), chat (gpt4all-chat issues), enhancement (New feature or request)

Comments

@jeffochoa

Feature request

Currently there is a limit on the number of tokens that can be used in the prompt:

GPT-J ERROR: The prompt is 9884 tokens and the context window is 2048!

The error is produced in GPTJ::prompt(). It looks like the n_ctx that arrives from the frontend is not used; instead the value comes from the model itself, so setting it yourself won't really matter:

promptCtx.n_ctx = d_ptr->model->hparams.n_ctx;
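
A minimal sketch, reusing the field names from the line above, of one way the frontend's value could be taken into account without exceeding the window the model was trained with (the clamp itself is an assumption, not the current implementation):

// Hypothetical sketch, not the actual GPT4All code: honor the caller's
// promptCtx.n_ctx, but never exceed the model's trained context window.
const int32_t requested = promptCtx.n_ctx;               // value passed in from the frontend
const int32_t trained   = d_ptr->model->hparams.n_ctx;   // window the model was trained with
promptCtx.n_ctx = (requested > 0 && requested < trained) ? requested : trained;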

Motivation

Being able to customise the prompt input limit could allow developers to build more complete plugins to interact with the model, using a more useful context and longer conversation history.

For example, right now it is almost impossible to build a plugin to browse the web, as you can't use a page's content (HTML) as part of the context because it can easily exceed the input limit.

Your contribution

.

@zanussbaum
Collaborator

This is more a limitation of the model's context window. It's only trained with a context window of 2048 tokens, so exceeding that isn't really possible at the moment with the existing models.

@menelic

menelic commented May 22, 2023

The Mosaic models have a much bigger context window; even their base models are built to go beyond smaller context windows: https://www.mosaicml.com/blog/mpt-7b

@jeffochoa
Author

The Mosaic models have a much bigger context window; even their base models are built to go beyond smaller context windows: https://www.mosaicml.com/blog/mpt-7b

Interesting. Have you been able to use one of those models with the GPT4All library?

@zanussbaum
Collaborator

That's correct, the Mosaic models have a context length of up to 4096 for the ones that have been ported to GPT4All. However, GPT-J models are still limited to a 2048-token context, so using more tokens will not work well.

@jpzhangvincent

I used the mpt-7b-chat model and specified n_ctx=4096 but still got the error:

llm = GPT4All(model='../models/ggml-mpt-7b-chat.bin',
                            verbose=False,
                            temp=0,
                            top_p=0.95,
                            top_k=40,
                            repeat_penalty=1.1,
                            n_ctx=4096,
                            callback_manager=stream_manager)

Error log:

Found model file.
mpt_model_load: loading model from '../models/ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab        = 50432
mpt_model_load: n_ctx          = 2048
mpt_model_load: n_embd         = 4096
mpt_model_load: n_head         = 32
mpt_model_load: n_layer        = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv       = 0.000000
mpt_model_load: ftype          = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size  = 1024.00 MB
mpt_model_load: ........................ done
mpt_model_load: model size =  4629.02 MB / num tensors = 194
INFO:     connection open
ERROR: The prompt size exceeds the context window size and cannot be processed.GPT-J ERROR: The prompt is2115tokens and the context window is2048!

@crixue

crixue commented May 31, 2023

I used the mpt-7b-chat model and specified n_ctx=4096 but still got the error (same code and error log as above)

yes, me too!

@Chae4ek

Chae4ek commented Jul 3, 2023

It would be great to have n_ctx in the model constructor, not in the generate method though.

I've been playing around with ggml a bit, trying to implement a growing buffer on the fly, and this is really slow. ggml uses pointers instead of offsets under the hood, which means I cannot just realloc and memcpy the memory buffers (KV cache) for the model.
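
For readers unfamiliar with the issue described above, a toy illustration (not ggml code; all names are made up) of why absolute pointers break when a buffer is grown with realloc, while offsets would survive the move:

#include <cstdlib>
#include <cstddef>

// Toy illustration only: a "tensor" holding an absolute pointer into a buffer
// dangles once the buffer is reallocated and moves; one holding an offset can
// always be resolved against the new base address.
struct TensorByPointer { float *data; };
struct TensorByOffset  { size_t offset; };

int main() {
    float *base = static_cast<float *>(std::malloc(1024 * sizeof(float)));
    TensorByPointer by_ptr{ base + 256 };   // may dangle after the buffer moves
    TensorByOffset  by_off{ 256 };          // survives any move

    base = static_cast<float *>(std::realloc(base, 8192 * sizeof(float)));
    float *still_valid = base + by_off.offset;  // recompute from the new base
    // by_ptr.data must not be used here: it may point into freed memory.
    (void)still_valid;
    (void)by_ptr;
    std::free(base);
    return 0;
}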

@niansa
Collaborator

niansa commented Aug 11, 2023

It's in the settings by now!

@niansa niansa closed this as completed Aug 11, 2023
@niansa
Collaborator

niansa commented Aug 11, 2023

Nevermind me!

@niansa niansa reopened this Aug 11, 2023
@niansa niansa added the enhancement (New feature or request), backend (gpt4all-backend issues), and chat (gpt4all-chat issues) labels Aug 11, 2023
@ThiloteE
Collaborator

ThiloteE commented Oct 31, 2023

Results from today, 2023-10-31 (building from source), of setting different values for n_ctx in the file "llamamodel.cpp". According to some people on Discord, this allows a higher context window size. I use Windows 10 with 32 GB RAM and load the models on CPU.
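
A minimal sketch of the kind of change being tested here, assuming the backend fills llama.cpp's llama_context_params when loading the model (the variable name is an assumption; only llama_context_default_params() and the n_ctx field come from llama.cpp):

// Hypothetical sketch of the experiment described above: hard-coding a larger
// context window where the backend sets up llama.cpp's context parameters.
llama_context_params params = llama_context_default_params();
params.n_ctx = 8192;  // original value was 2048; larger values only help if the
                      // model was actually trained for (or scaled to) that window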

My prompt, which in total consisted of 5652 characters, was an instruction to summarize a long text.

original value: 2048
new value: 16384
Model that was trained for/with a 16K context: the response loads endlessly; I force closed the program. 👎

original value: 2048
new value: 32768
Model that was trained for/with a 32K context: the response loads endlessly; I force closed the program. 👎

original value: 2048
new value: 8192
Model that was trained for/with a 16K context: the response loads very slowly, but eventually finishes after a few minutes and gives reasonable output. 👍

original value: 2048
new value: 8192
Model (Mistral Instruct) that was trained for/with a 4096 context: the response loads very slowly, but eventually finishes after a few minutes and gives reasonable output. 👍

original value: 2048
new value: 8192
Another model that was presumably trained for/with a 4096 context: the response loads endlessly; I force closed the program. 👎
Here is a typical graph from while the model was generating the response. Notice how RAM gets emptied and then filled a little later?
[Screenshot: RAM usage graph recorded while the model was generating the response]

I am still experimenting, but I believe that, so far, success depends on:

  1. the model you use.
  2. values for n_ctx
  3. one or multiple other factors, because clearly something gets stuck or is very inefficient.

I would not recommend setting n_ctx to higher values and releasing this new version of gpt4all to the public without extensive testing.

Edit: I have opted to set the context size to 4096 by default, because most models I use are designed for that. E.g. the Mistral models mostly expect 4096 and then use advanced techniques such as sliding-window attention or RoPE scaling to extend it, which I get the feeling works without having to set n_ctx, but I have not done extensive testing on that. If somebody does, it would be nice if you could post your findings here.

@ThiloteE
Collaborator

Seems like this issue will be fixed by #1668

@cebtenzzre
Member

Seems like this issue will be fixed by #1668

The OP refers to GPT-J, which is the only model that will not be fixed by #1749.
