
cuda : refactor to remove global resources #6170

Merged: slaren merged 6 commits into master from sl/cuda-refactor-1 on Mar 20, 2024
Conversation

slaren (Collaborator) commented on Mar 19, 2024

Pools and other resources are tied to the ggml_backend instance and are freed along with it.

It should also be thread-safe.
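
To make the shape of the change concrete, here is a minimal sketch of the pattern (the names below are hypothetical, not the PR's actual types): each backend instance owns its pools behind a mutex, and destroying the backend releases everything with it.

```cpp
// Hypothetical illustration of per-backend resource ownership; the real
// ggml CUDA context differs in names and detail.
#include <mutex>
#include <vector>

struct pool_sketch {
    std::vector<void *> buffers;          // device allocations owned by the pool
    ~pool_sketch() {
        // the real backend would cudaFree() each buffer here
    }
};

struct backend_ctx_sketch {
    int         device;
    std::mutex  mutex;                    // serializes pool access (thread safety)
    pool_sketch pool;                     // owned by this context, not a global
};

// Freeing the backend frees its resources along with it.
static void backend_free_sketch(backend_ctx_sketch * ctx) {
    delete ctx;                           // runs ~pool_sketch, releasing the pool
}
```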

slaren (Collaborator, Author) commented on Mar 19, 2024

There are now ~~220~~ 221 GGML_UNUSED in ggml-cuda.cu.
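
(GGML_UNUSED is ggml's unused-argument macro; in ggml.h it is essentially:)

```cpp
#define GGML_UNUSED(x) (void)(x)
```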

slaren (Collaborator, Author) commented on Mar 19, 2024

Is there a way to run the server tests with -ngl 99?

slaren (Collaborator, Author) commented on Mar 19, 2024

The CUDA backend no longer uses ggml_tensor::backend. Other backends should also remove this and instead rely on the buffer type to identify the type of tensor, so that the field can be removed from ggml_tensor.
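
A rough sketch of the idea (illustrative types only, not ggml's exact API): the tensor carries a pointer to its buffer, and the buffer's type answers questions the per-tensor backend enum used to answer.

```cpp
// Illustrative only: decides tensor placement from the buffer type
// instead of a ggml_tensor::backend enum.
struct buffer_type_sketch { const char * name; bool is_host; };
struct buffer_sketch      { const buffer_type_sketch * buft; };
struct tensor_sketch      { buffer_sketch * buffer; /* no per-tensor backend field */ };

static bool tensor_is_host_sketch(const tensor_sketch * t) {
    // placement is a property of the buffer the tensor lives in
    return t->buffer != nullptr && t->buffer->buft->is_host;
}
```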

ggerganov (Owner) commented

> Is there a way to run the server tests with -ngl 99?

If you rebase on master, you should be able to run them with:

```sh
cd examples/server/tests
N_GPU_LAYERS=99 LLAMA_SERVER_BIN_PATH=../../../build/bin/server ./tests.sh
```
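
(-ngl / N_GPU_LAYERS is llama.cpp's --n-gpu-layers option; 99 is simply a value large enough to offload every layer of the test models to the GPU.)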

ggerganov added the "high priority" (Very important issue) label on Mar 20, 2024
slaren (Collaborator, Author) commented on Mar 20, 2024

Thanks. Unfortunately, I am not able to complete all the server tests on my system because intermittent connection failures to Hugging Face cause the tests to abort. However, the tests that did run passed.

```
llama_load_model_from_url: previous model file found ggml-model-f16.gguf.etag: "ee02c4b071444fa3c751868d68fb2676"
llama_load_model_from_url: previous model file found ggml-model-f16.gguf.lastModified: Wed, 06 Mar 2024 16:54:46 GMT
llama_load_model_from_url: curl_easy_perform() failed: SSL connect error
llama_init_from_gpt_params: error: failed to load model 'ggml-model-f16.gguf'
```

ggerganov (Owner) commented

The server tests pass on V100

slaren merged commit ccf58aa into master on Mar 20, 2024
56 checks passed
slaren deleted the sl/cuda-refactor-1 branch on March 20, 2024 at 13:43
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 3, 2024
tybalex pushed a commit to tybalex/function.cpp that referenced this pull request on Apr 17, 2024