Allow custom prompt limit (n_ctx=2048) #668
Comments
This is really a limit of the model itself. It was only trained with a context window of 2048, so exceeding that isn't possible at the moment with the existing models.
The Mosaic models have a much bigger context window; even their base models are built to handle larger contexts: https://www.mosaicml.com/blog/mpt-7b
Interesting. Have you been able to use one of those models with the GPT4All library?
That's correct: the Mosaic models that have been ported to GPT4All support a context length of up to 4096. However, GPT-J models are still limited to a 2048-token prompt, so using more tokens will not work well.
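Since the usable context length differs between model families (4096 for the ported MPT models, 2048 for GPT-J), it can help to pre-check a prompt before sending it. A minimal sketch, assuming a rough 4-characters-per-token estimate (a heuristic only, not the model's real tokenizer):

```python
def fits_context(prompt: str, n_ctx: int = 2048, reserve: int = 256) -> bool:
    """Rough check that a prompt fits the model's context window.

    `reserve` keeps headroom for the model's generated tokens.
    The chars/4 ratio is an approximation; a real tokenizer will differ.
    """
    est_tokens = len(prompt) // 4
    return est_tokens + reserve <= n_ctx
```

For an accurate count you would tokenize with the model's own tokenizer, but this kind of cheap estimate is enough to avoid the hard error below.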
I used the following:

llm = GPT4All(model='../models/ggml-mpt-7b-chat.bin',
              verbose=False,
              temp=0,
              top_p=0.95,
              top_k=40,
              repeat_penalty=1.1,
              n_ctx=4096,
              callback_manager=stream_manager)

Error log:

Found model file.
mpt_model_load: loading model from '../models/ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab = 50432
mpt_model_load: n_ctx = 2048
mpt_model_load: n_embd = 4096
mpt_model_load: n_head = 32
mpt_model_load: n_layer = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv = 0.000000
mpt_model_load: ftype = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size = 1024.00 MB
mpt_model_load: ........................ done
mpt_model_load: model size = 4629.02 MB / num tensors = 194
INFO: connection open
ERROR: The prompt size exceeds the context window size and cannot be processed. GPT-J ERROR: The prompt is 2115 tokens and the context window is 2048!
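The log above shows the model was loaded with n_ctx = 2048 regardless of the value passed in, so a 2115-token prompt fails. One workaround on the caller's side is to trim the oldest conversation history until the prompt fits. A minimal sketch, where the token estimator is an assumption (chars/4), not the backend's tokenizer:

```python
def trim_history(messages, max_tokens=2048, est=lambda s: len(s) // 4):
    """Drop the oldest messages until the estimated total fits max_tokens.

    `messages` is a list of strings (oldest first); `est` is a token
    estimator, here a rough chars/4 heuristic.
    """
    msgs = list(messages)
    while msgs and sum(est(m) for m in msgs) > max_tokens:
        msgs.pop(0)  # discard the oldest turn first
    return msgs
```

This loses old context rather than extending the window, but it keeps requests under the hard limit the backend enforces.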
Yes, me too!
It would be great to have. I've been playing around with ggml a bit, trying to implement a growing buffer on the fly, and it is really slow. ggml uses pointers instead of offsets under the hood, which means I can't just realloc and memcpy memory buffers (the KV cache) for the model.
It's in the settings by now! |
Nevermind me! |
Seems like this issue will be fixed by #1668 |
Feature request
Currently there is a limit on the number of tokens that can be used in the prompt.
The error is produced in GPTJ::prompt(). It looks like the n_ctx that arrives from the frontend is not used; instead, the value comes from the model itself, so setting it yourself won't have any effect.
gpt4all/gpt4all-backend/gptj.cpp, line 920 (commit 8204c2e)
Motivation
Being able to customise the prompt input limit would allow developers to build more complete plugins that interact with the model using richer context and longer conversation history.
For example, right now it is almost impossible to build a plugin that browses the web, because you can't use a page's content (HTML) as part of the context: it easily exceeds the input limit.
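Until the limit is configurable, the usual workaround for page-sized inputs is to chunk them and process the chunks separately (or summarise each chunk, then combine). A minimal sketch, where the chunk size and overlap are illustrative assumptions:

```python
def chunk_text(text: str, chunk_chars: int = 6000, overlap: int = 200):
    """Split text into overlapping character chunks.

    `chunk_chars` should be chosen so each chunk's token count stays
    under the model's context window; `overlap` preserves a little
    continuity between adjacent chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks overlap slightly
    return chunks
```

Each chunk can then be fed to the model within the 2048-token limit, at the cost of losing whole-page coherence.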
Your contribution
.