
expose n_gpu_layers parameter of llama.cpp #1890

Merged — 9 commits merged into main, Jan 31, 2024
Conversation

cebtenzzre (Member)
This is the minimal implementation of configurable per-model partial offloading. It is up to the user to know (or figure out) how many layers the model has, and how many they can load into VRAM without running out.

[Two screenshots attached, taken 2024-01-30, showing the new setting in the UI]

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Leaving ChatLLM instances around at exit time means global destructors
start running while m_llmThread instances are still running llama.cpp
code. Explicitly destroy these before exit to prevent a
heap-use-after-free.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
@cebtenzzre cebtenzzre merged commit 061d196 into main Jan 31, 2024
6 of 17 checks passed
dpsalvatierra pushed a commit to dpsalvatierra/gpt4all that referenced this pull request Feb 16, 2024
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Labels: none
Linked issues: none
2 participants