
Out of GPU memory when creating multiple sessions #94

Closed
vuongvmu opened this issue Aug 9, 2023 · 5 comments

Comments


vuongvmu commented Aug 9, 2023

I have tested whether GPU memory is recovered when we create a new session and then close it again. It looks like the GPU memory is not freed. With my GPU (a 3080 with 8 GB of VRAM), I created and closed the session 4 times and then an out-of-memory error was thrown.

Can we release the GPU memory? I see this function in the source code (llama.cpp/ggml-cuda.cu):

void ggml_cuda_free_data(struct ggml_tensor * tensor) {
    if (!tensor || (tensor->backend != GGML_BACKEND_GPU && tensor->backend != GGML_BACKEND_GPU_SPLIT) ) {
        return;
    }
    ggml_tensor_extra_gpu * extra = (ggml_tensor_extra_gpu *) tensor->extra;
    for (int id = 0; id < g_device_count; ++id) {
        if (extra->data_device[id] != nullptr) {
            CUDA_CHECK(cudaSetDevice(id));
            CUDA_CHECK(cudaFree(extra->data_device[id]));
        }
        if (extra->events[id] != nullptr) {
            CUDA_CHECK(cudaSetDevice(id));
            CUDA_CHECK(cudaEventDestroy(extra->events[id]));
        }
    }
    delete extra;
}
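
As a sanity check on cudaFree itself (not the actual LLamaSharp code path), the minimal standalone sketch below queries cudaMemGetInfo around an allocate/free cycle, with plain cudaMalloc/cudaFree standing in for the ggml tensor buffers that ggml_cuda_free_data releases:

// Minimal standalone sketch: plain cudaMalloc/cudaFree used as a stand-in
// for the ggml tensor buffers released by ggml_cuda_free_data.
// Build with: nvcc vram_check.cu -o vram_check
#include <cstdio>
#include <cuda_runtime.h>

static void print_free_vram(const char * label) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("%s: %.1f MB free of %.1f MB\n",
           label, free_bytes / 1048576.0, total_bytes / 1048576.0);
}

int main() {
    print_free_vram("before alloc");

    void * buf = nullptr;
    cudaMalloc(&buf, 512ull * 1024 * 1024);   // allocate ~512 MB on device 0
    print_free_vram("after alloc");

    cudaFree(buf);                            // the ~512 MB should be reported free again
    print_free_vram("after free");
    return 0;
}

If free VRAM does not climb back like this after repeated create/close cycles in the real app, the allocations are presumably still being held somewhere above this function rather than by cudaFree itself.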

Please refer to the error information below.
I hope you can provide an update. Thanks!

llama.cpp: loading model from E:\VR\SW\models\llama2_7b_chat_uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3070) as main device
llama_model_load_internal: mem required  =  372.42 MB (+  256.00 MB per state)
llama_model_load_internal: not allocating a VRAM scratch buffer due to low VRAM option
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: cannot offload v cache to GPU due to low VRAM option
llama_model_load_internal: cannot offload k cache to GPU due to low VRAM option
llama_model_load_internal: offloaded 33/35 layers to GPU
llama_model_load_internal: total VRAM used: 3546 MB
CUDA error 2 at D:/development/llama/forks/llama.cpp/ggml-cuda.cu:5510: out of memory

E:\SW\LLamaSharp-master_0808_vr_build\LLamaSharp-master\LLama.Web\bin\Debug\net7.0\LLama.Web.exe (process 52536) exited with code 1.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

vuongvmu commented Aug 9, 2023

The problem in this upstream issue is similar to mine. Can we optimize it?

ggerganov/llama.cpp#1456

martindevans (Member) commented

It sounds like this may be an issue in the upstream llama.cpp library. Do you have this same problem if you use that directly?
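A rough sketch of such a direct test, assuming the llama.cpp C API as it stood around this time (llama_load_model_from_file / llama_new_context_with_model / llama_free / llama_free_model; names and signatures may differ in other versions), would be to load and free the model in a loop and watch VRAM usage between iterations:

// Rough sketch against the llama.cpp C API of roughly this period
// (assumption: function names/signatures match this era of llama.cpp).
// The model path is taken from the log above; n_gpu_layers matches the same offload.
#include <cstdio>
#include "llama.h"

int main() {
    llama_backend_init(false /* numa */);

    for (int i = 0; i < 10; ++i) {
        llama_context_params params = llama_context_default_params();
        params.n_gpu_layers = 33;

        llama_model   * model = llama_load_model_from_file(
            "E:/VR/SW/models/llama2_7b_chat_uncensored.ggmlv3.q4_0.bin", params);
        llama_context * ctx   = llama_new_context_with_model(model, params);

        // ... optionally evaluate a short prompt here ...

        llama_free(ctx);          // free the context (KV cache, scratch buffers)
        llama_free_model(model);  // free the model weights, including VRAM
        printf("iteration %d done\n", i);
    }

    llama_backend_free();
    return 0;
}

If VRAM usage grows across iterations here as well, the leak is in upstream llama.cpp; if it stays flat, the sessions are probably not being fully disposed on the LLamaSharp side.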

vuongvmu reopened this Aug 10, 2023
vuongvmu (Author) commented

I also get the same error when I repeat session creation and closing in a WinForms app.


martindevans commented Nov 2, 2023

Is this still an issue with newer versions of LLamaSharp/llama.cpp?

martindevans (Member) commented

Since there's been no response for a while I'll close this; it should be fixed in newer versions. If it's still a problem, please feel free to re-open the issue!
