Implement configurable context length #1749
Conversation
```cpp
/* TODO(cebtenzzre): after we fix requiredMem, we should change this to happen at
 * load time, not construct time. right now n_ctx is incorrectly hardcoded 2048 in
 * most (all?) places where this is called, causing underestimation of required
 * memory. */
```
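To illustrate why the hardcoded value matters (a back-of-the-envelope sketch, not code from this PR; it assumes an f16 KV cache and illustrative 7B-style dimensions): the KV cache grows linearly with n_ctx, so a memory estimate sized for 2048 can be off by gigabytes at larger contexts.

```python
# Rough KV cache size estimate. n_layer/n_embd are illustrative
# (roughly a 7B LLaMA-style model); the cache is assumed to be f16.
n_layer = 32
n_embd = 4096
bytes_per_elem = 2  # f16

def kv_cache_bytes(n_ctx: int) -> int:
    # Keys + values: one of each per layer, per position.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

print(kv_cache_bytes(2048) / 2**20)  # ~1024 MiB at the hardcoded n_ctx
print(kv_cache_bytes(8192) / 2**20)  # ~4096 MiB at a larger context
```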
@apage43 Do you think it would be relatively easy to switch this to a load-time check instead of a construct-time one? It doesn't matter so much right now since it's not working anyway (unresolved fallout from the switch to GGUF).
The reason it's construct-time is so that we do the fallback to CPU transparently: callers of construct passing "auto" just get the CPU implementation if the memory requirement is too high for Metal.
If it's changed to fail at load time, callers will have to handle that fallback themselves, which is likely fine, but it would need to be done in all the bindings.
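For illustration, here is a rough sketch of what that per-binding fallback might look like (hypothetical Python; `load_model`, `OutOfDeviceMemory`, and the device strings are made-up names, not the actual gpt4all binding API):

```python
class OutOfDeviceMemory(Exception):
    """Hypothetical: raised by the loader when the model won't fit on the device."""

def load_model(path: str, device: str):
    # Stand-in for the real loader; here the GPU path always fails so
    # the fallback branch is exercised.
    if device == "gpu":
        raise OutOfDeviceMemory(f"{path} too large for GPU")
    return f"<model {path} on {device}>"

def load_with_fallback(path: str, device: str = "auto"):
    if device != "auto":
        return load_model(path, device)
    try:
        return load_model(path, "gpu")
    except OutOfDeviceMemory:
        # Transparent fallback: "auto" callers still get a working
        # model, just on the CPU implementation.
        return load_model(path, "cpu")

print(load_with_fallback("model.gguf"))  # -> <model model.gguf on cpu>
```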
The chat UI is already doing load-time fallback for Vulkan. And this is really the only way to do it because it's the user code that decides which GPU to use, which is of course initialized after a backend/implementation is available. We should make sure the bindings are capable of this too.
I think it would make sense to only ever dlopen one build of llamamodel-mainline on Apple silicon, as there's nothing we are currently doing that the Metal build isn't capable of.
Force-pushed from 2054338 to 358d619
Tested and working with the Python bindings and the GUI. The other bindings are still hardcoded to 2048, but it shouldn't be hard to expose the context length via their APIs if desired.
For the python bindings, this is the n_ctx parameter of the GPT4All constructor:
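For example (a minimal sketch; the model file name is illustrative):

```python
from gpt4all import GPT4All

# n_ctx sets the context length when the model is loaded.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", n_ctx=4096)
```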
In the UI, this is a per-model parameter.
This doesn't take effect until switching models or restarting; this fact is noted in the tooltip. For now, this is the simplest way to do it, although IMO it would be nice to have a way to reload the model in the future (similar to TGWUI's "Reload" button on the model tab).