Allow custom prompt limit (n_ctx=2048) #668

Open
jeffochoa opened this issue May 21, 2023 · 12 comments
Labels
backend (gpt4all-backend issues), chat (gpt4all-chat issues), enhancement (New feature or request)

Comments

@jeffochoa

Feature request

Currently there is a limit on the number of tokens that can be used in the prompt:

GPT-J ERROR: The prompt is 9884 tokens and the context window is 2048!

The error is produced in GPTJ::prompt(). It looks like the n_ctx that arrives from the frontend is not used; instead the value comes from the model itself, so setting it yourself won't really matter:

promptCtx.n_ctx = d_ptr->model->hparams.n_ctx;
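
A minimal sketch, reusing the field names from the line above, of one way the frontend's value could be taken into account without exceeding the window the model was trained with (the clamp itself is an assumption, not the current implementation):

// Hypothetical sketch, not the actual GPT4All code: honor the caller's
// promptCtx.n_ctx, but never exceed the model's trained context window.
const int32_t requested = promptCtx.n_ctx;               // value passed in from the frontend
const int32_t trained   = d_ptr->model->hparams.n_ctx;   // window the model was trained with
promptCtx.n_ctx = (requested > 0 && requested < trained) ? requested : trained;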

Motivation

Being able to customise the prompt input limit could allow developers to build more complete plugins to interact with the model, using a more useful context and longer conversation history.

For example, right now it is almost impossible to build a plugin to browse the web, as you can't use a page's content (HTML) as part of the context because it can easily exceed the input limit.

Your contribution

.

@zanussbaum
Collaborator

This is more a limitation of the model's context window. It's only trained with a context window of 2048 tokens, so exceeding that isn't really possible at the moment with the existing models.

@menelic

menelic commented May 22, 2023

The Mosaic models have a much bigger context window; even their base models are built to go beyond smaller context windows: https://www.mosaicml.com/blog/mpt-7b

@jeffochoa
Author

The Mosaic models have a much bigger context window; even their base models are built to go beyond smaller context windows: https://www.mosaicml.com/blog/mpt-7b

Interesting. Have you been able to use one of those models with the GPT4All library?

@zanussbaum
Collaborator

That's correct, the Mosaic models have a context length of up to 4096 for the ones that have been ported to GPT4All. However, GPT-J models are still limited to a 2048-token context, so using more tokens will not work well.

@jpzhangvincent

I used the mpt-7b-chat model and specified n_ctx=4096 but still got the error:

llm = GPT4All(model='../models/ggml-mpt-7b-chat.bin',
                            verbose=False,
                            temp=0,
                            top_p=0.95,
                            top_k=40,
                            repeat_penalty=1.1,
                            n_ctx=4096,
                            callback_manager=stream_manager)

Error log:

Found model file.
mpt_model_load: loading model from '../models/ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab        = 50432
mpt_model_load: n_ctx          = 2048
mpt_model_load: n_embd         = 4096
mpt_model_load: n_head         = 32
mpt_model_load: n_layer        = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv       = 0.000000
mpt_model_load: ftype          = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size  = 1024.00 MB
mpt_model_load: ........................ done
mpt_model_load: model size =  4629.02 MB / num tensors = 194
INFO:     connection open
ERROR: The prompt size exceeds the context window size and cannot be processed.GPT-J ERROR: The prompt is2115tokens and the context window is2048!

@crixue

crixue commented May 31, 2023

I used the mpt-7b-chat model and specified n_ctx=4096 but still got the error (same code and error log as above)

yes, me too!

@Chae4ek

Chae4ek commented Jul 3, 2023

It would be great to have n_ctx in the model constructor, not in the generate method though.

I've been playing around with ggml a bit, trying to implement a growing buffer on the fly, and this is really slow. ggml uses pointers instead of offsets under the hood, which means I cannot just realloc and memcpy the memory buffers (KV cache) for the model.
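
For readers unfamiliar with the issue described above, a toy illustration (not ggml code; all names are made up) of why absolute pointers break when a buffer is grown with realloc, while offsets would survive the move:

#include <cstdlib>
#include <cstddef>

// Toy illustration only: a "tensor" holding an absolute pointer into a buffer
// dangles once the buffer is reallocated and moves; one holding an offset can
// always be resolved against the new base address.
struct TensorByPointer { float *data; };
struct TensorByOffset  { size_t offset; };

int main() {
    float *base = static_cast<float *>(std::malloc(1024 * sizeof(float)));
    TensorByPointer by_ptr{ base + 256 };   // may dangle after the buffer moves
    TensorByOffset  by_off{ 256 };          // survives any move

    base = static_cast<float *>(std::realloc(base, 8192 * sizeof(float)));
    float *still_valid = base + by_off.offset;  // recompute from the new base
    // by_ptr.data must not be used here: it may point into freed memory.
    (void)still_valid;
    (void)by_ptr;
    std::free(base);
    return 0;
}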

@niansa
Collaborator

niansa commented Aug 11, 2023

It's in the settings by now!

@niansa niansa closed this as completed Aug 11, 2023
@niansa
Collaborator

niansa commented Aug 11, 2023

Nevermind me!

@niansa niansa reopened this Aug 11, 2023
@niansa niansa added the enhancement (New feature or request), backend (gpt4all-backend issues), and chat (gpt4all-chat issues) labels Aug 11, 2023
@ThiloteE
Collaborator

ThiloteE commented Oct 31, 2023

Results from today, 2023-10-31 (building from source), of setting different values for n_ctx in the file "llamamodel.cpp". According to some people on Discord, this allows a higher context window size. I use Windows 10 with 32 GB RAM and load the models on CPU.
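
A minimal sketch of the kind of change being tested here, assuming the backend fills llama.cpp's llama_context_params when loading the model (the variable name is an assumption; only llama_context_default_params() and the n_ctx field come from llama.cpp):

// Hypothetical sketch of the experiment described above: hard-coding a larger
// context window where the backend sets up llama.cpp's context parameters.
llama_context_params params = llama_context_default_params();
params.n_ctx = 8192;  // original value was 2048; larger values only help if the
                      // model was actually trained for (or scaled to) that window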

My prompt, which in total consisted of 5652 characters, was an instruction to summarize a long text.

original value: 2048
new value: 16384
Model that was trained for/with a 16K context: the response loads endlessly; I force closed the program. 👎

original value: 2048
new value: 32768
Model that was trained for/with a 32K context: the response loads endlessly; I force closed the program. 👎

original value: 2048
new value: 8192
Model that was trained for/with a 16K context: the response loads very slowly, but eventually finishes after a few minutes and gives reasonable output. 👍

original value: 2048
new value: 8192
Model (Mistral Instruct) that was trained for/with a 4096 context: the response loads very slowly, but eventually finishes after a few minutes and gives reasonable output. 👍

original value: 2048
new value: 8192
Another model that was presumably trained for/with a 4096 context: the response loads endlessly; I force closed the program. 👎
Here is a typical graph from while the model was generating the response. Notice how RAM gets emptied and then filled a little later?
[Screenshot: RAM usage graph recorded while the model was generating the response]

I am still experimenting, but I believe that, so far, success depends on:

  1. the model you use.
  2. values for n_ctx
  3. one or multiple other factors, because clearly something gets stuck or is very inefficient.

I would not recommend setting n_ctx to higher values and releasing this new version of gpt4all to the public without extensive testing.

Edit: I have opted to set the context size to 4096 by default, because most models I use are designed for that. E.g. the Mistral models mostly expect 4096 and then use advanced techniques such as sliding-window attention or RoPE scaling to extend it, which I get the feeling works without having to set n_ctx, but I have not done extensive testing on that. If somebody does, it would be nice if you could post your findings here.

@ThiloteE
Collaborator

Seems like this issue will be fixed by #1668

@cebtenzzre
Member

Seems like this issue will be fixed by #1668

The OP refers to GPT-J, which is the only model that will not be fixed by #1749.
