expose n_gpu_layers parameter of llama.cpp #1890
Merged
Conversation
cebtenzzre force-pushed the cfg-gpu-layers branch from 823840d to 3189617 on January 30, 2024
manyoso approved these changes on Jan 30, 2024
manyoso approved these changes on Jan 31, 2024
Leaving ChatLLM instances around at exit time means global destructors start running while m_llmThread instances are still running llama.cpp code. Explicitly destroy these before exit to prevent a heap-use-after-free. Signed-off-by: Jared Van Bortel <jared@nomic.ai>
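A minimal sketch of the hazard this commit describes, using hypothetical names (`Worker`, `m_thread`) rather than the actual gpt4all classes: an object that owns a running thread must be destroyed explicitly before `main` returns, otherwise global destructors can free memory the thread is still touching.

```cpp
#include <atomic>
#include <thread>

class Worker {
public:
    Worker() : m_stop(false), m_thread([this] { run(); }) {}
    ~Worker() {
        // Explicit teardown: signal the thread and join it before any
        // memory it uses is reclaimed.
        m_stop = true;
        if (m_thread.joinable())
            m_thread.join();
    }
private:
    void run() {
        while (!m_stop) { /* work that touches heap-allocated model state */ }
    }
    std::atomic<bool> m_stop;
    std::thread m_thread;
};

int main() {
    auto *w = new Worker;
    // ... application runs ...
    delete w;  // destroy before exit; leaking it would let the thread
               // outlive global destructors (heap-use-after-free)
    return 0;
}
```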
cebtenzzre force-pushed the cfg-gpu-layers branch from 1327623 to c9b969e on January 31, 2024
dpsalvatierra pushed a commit to dpsalvatierra/gpt4all that referenced this pull request on Feb 16, 2024
Also dynamically limit the GPU layers and context length fields to the maximum supported by the model. Signed-off-by: Jared Van Bortel <jared@nomic.ai>
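A hedged sketch of the clamping this commit describes. The parameter names (`modelLayerCount`, `modelMaxContext`) are hypothetical stand-ins for whatever limits the model metadata actually exposes; the point is only that the UI fields are capped at per-model maximums rather than accepting arbitrary values.

```cpp
#include <algorithm>

// Requesting more offloaded layers than the model has is meaningless;
// cap the GPU-layers field at the model's layer count.
int clampGpuLayers(int requested, int modelLayerCount) {
    return std::clamp(requested, 0, modelLayerCount);
}

// Likewise cap the context-length field at the model's trained maximum.
int clampContextLength(int requested, int modelMaxContext) {
    return std::clamp(requested, 1, modelMaxContext);
}
```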
This is the minimal implementation of configurable per-model partial offloading. It is up to the user to know or figure out how many layers the model has, and how many they can load into VRAM without running out.
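For reference, this is how llama.cpp itself consumes the parameter the PR exposes (the `llama.h` C API as it stood in early 2024); how gpt4all wires the per-model setting through to this call is not shown here, and the model path and layer count are illustrative.

```cpp
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    // Partial offloading: the user chooses how many of the model's layers
    // to place in VRAM; a value beyond the layer count offloads them all.
    mparams.n_gpu_layers = 32;

    llama_model *model = llama_load_model_from_file("model.gguf", mparams);
    if (!model)
        return 1;
    // ... create a context and run inference ...
    llama_free_model(model);
    return 0;
}
```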