
cuda : refactor to remove global resources #6170

Merged: slaren merged 6 commits into master from sl/cuda-refactor-1 on Mar 20, 2024
Conversation

slaren (Collaborator) commented on Mar 19, 2024

Pools and other resources are tied to the ggml_backend instance and are freed along with it.

It should also be thread-safe.
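
To make the shape of the change concrete, here is a minimal sketch of the pattern (the names below are hypothetical, not the PR's actual types): each backend instance owns its pools behind a mutex, and destroying the backend releases everything with it.

```cpp
// Hypothetical illustration of per-backend resource ownership; the real
// ggml CUDA context differs in names and detail.
#include <mutex>
#include <vector>

struct pool_sketch {
    std::vector<void *> buffers;          // device allocations owned by the pool
    ~pool_sketch() {
        // the real backend would cudaFree() each buffer here
    }
};

struct backend_ctx_sketch {
    int         device;
    std::mutex  mutex;                    // serializes pool access (thread safety)
    pool_sketch pool;                     // owned by this context, not a global
};

// Freeing the backend frees its resources along with it.
static void backend_free_sketch(backend_ctx_sketch * ctx) {
    delete ctx;                           // runs ~pool_sketch, releasing the pool
}
```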

slaren (Collaborator, Author) commented on Mar 19, 2024

There are now ~~220~~ 221 GGML_UNUSED in ggml-cuda.cu.
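
(GGML_UNUSED is ggml's unused-argument macro; in ggml.h it is essentially:)

```cpp
#define GGML_UNUSED(x) (void)(x)
```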

slaren (Collaborator, Author) commented on Mar 19, 2024

Is there a way to run the server tests with -ngl 99?

slaren (Collaborator, Author) commented on Mar 19, 2024

The CUDA backend no longer uses ggml_tensor::backend. Other backends should also remove this and instead rely on the buffer type to identify the type of tensor, so that the field can be removed from ggml_tensor.
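
A rough sketch of the idea (illustrative types only, not ggml's exact API): the tensor carries a pointer to its buffer, and the buffer's type answers questions the per-tensor backend enum used to answer.

```cpp
// Illustrative only: decides tensor placement from the buffer type
// instead of a ggml_tensor::backend enum.
struct buffer_type_sketch { const char * name; bool is_host; };
struct buffer_sketch      { const buffer_type_sketch * buft; };
struct tensor_sketch      { buffer_sketch * buffer; /* no per-tensor backend field */ };

static bool tensor_is_host_sketch(const tensor_sketch * t) {
    // placement is a property of the buffer the tensor lives in
    return t->buffer != nullptr && t->buffer->buft->is_host;
}
```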

ggerganov (Owner) commented

> Is there a way to run the server tests with -ngl 99?

If you rebase on master, you should be able to run them with:

```sh
cd examples/server/tests
N_GPU_LAYERS=99 LLAMA_SERVER_BIN_PATH=../../../build/bin/server ./tests.sh
```
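
(-ngl / N_GPU_LAYERS is llama.cpp's --n-gpu-layers option; 99 is simply a value large enough to offload every layer of the test models to the GPU.)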

ggerganov added the "high priority" (Very important issue) label on Mar 20, 2024
slaren (Collaborator, Author) commented on Mar 20, 2024

Thanks. Unfortunately, I am not able to complete all the server tests on my system because intermittent connection failures to Hugging Face cause the tests to abort. However, the tests that did run passed.

```
llama_load_model_from_url: previous model file found ggml-model-f16.gguf.etag: "ee02c4b071444fa3c751868d68fb2676"
llama_load_model_from_url: previous model file found ggml-model-f16.gguf.lastModified: Wed, 06 Mar 2024 16:54:46 GMT
llama_load_model_from_url: curl_easy_perform() failed: SSL connect error
llama_init_from_gpt_params: error: failed to load model 'ggml-model-f16.gguf'
```

ggerganov (Owner) commented

The server tests pass on V100

slaren merged commit ccf58aa into master on Mar 20, 2024
56 checks passed
slaren deleted the sl/cuda-refactor-1 branch on March 20, 2024 at 13:43
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 3, 2024
tybalex pushed a commit to tybalex/function.cpp that referenced this pull request on Apr 17, 2024