
Out of GPU memory when creating multiple sessions #94

Closed
vuongvmu opened this issue Aug 9, 2023 · 5 comments

Comments


vuongvmu commented Aug 9, 2023

I have tested whether GPU memory is recovered when we create a new session and then close it again. It looks like the GPU memory is not freed. With my GPU (a 3080 with 8 GB of VRAM), I created and closed the session 4 times and then an out-of-memory error was thrown.

Can we release the GPU memory? I see this function in the source code (llama.cpp/ggml-cuda.cu):

void ggml_cuda_free_data(struct ggml_tensor * tensor) {
    if (!tensor || (tensor->backend != GGML_BACKEND_GPU && tensor->backend != GGML_BACKEND_GPU_SPLIT) ) {
        return;
    }
    ggml_tensor_extra_gpu * extra = (ggml_tensor_extra_gpu *) tensor->extra;
    for (int id = 0; id < g_device_count; ++id) {
        if (extra->data_device[id] != nullptr) {
            CUDA_CHECK(cudaSetDevice(id));
            CUDA_CHECK(cudaFree(extra->data_device[id]));
        }
        if (extra->events[id] != nullptr) {
            CUDA_CHECK(cudaSetDevice(id));
            CUDA_CHECK(cudaEventDestroy(extra->events[id]));
        }
    }
    delete extra;
}
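
As a sanity check on cudaFree itself (not the actual LLamaSharp code path), the minimal standalone sketch below queries cudaMemGetInfo around an allocate/free cycle, with plain cudaMalloc/cudaFree standing in for the ggml tensor buffers that ggml_cuda_free_data releases:

// Minimal standalone sketch: plain cudaMalloc/cudaFree used as a stand-in
// for the ggml tensor buffers released by ggml_cuda_free_data.
// Build with: nvcc vram_check.cu -o vram_check
#include <cstdio>
#include <cuda_runtime.h>

static void print_free_vram(const char * label) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("%s: %.1f MB free of %.1f MB\n",
           label, free_bytes / 1048576.0, total_bytes / 1048576.0);
}

int main() {
    print_free_vram("before alloc");

    void * buf = nullptr;
    cudaMalloc(&buf, 512ull * 1024 * 1024);   // allocate ~512 MB on device 0
    print_free_vram("after alloc");

    cudaFree(buf);                            // the ~512 MB should be reported free again
    print_free_vram("after free");
    return 0;
}

If free VRAM does not climb back like this after repeated create/close cycles in the real app, the allocations are presumably still being held somewhere above this function rather than by cudaFree itself.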

Please refer to the error information below.
I hope you can provide an update. Thanks!

llama.cpp: loading model from E:\VR\SW\models\llama2_7b_chat_uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3070) as main device
llama_model_load_internal: mem required  =  372.42 MB (+  256.00 MB per state)
llama_model_load_internal: not allocating a VRAM scratch buffer due to low VRAM option
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: cannot offload v cache to GPU due to low VRAM option
llama_model_load_internal: cannot offload k cache to GPU due to low VRAM option
llama_model_load_internal: offloaded 33/35 layers to GPU
llama_model_load_internal: total VRAM used: 3546 MB
CUDA error 2 at D:/development/llama/forks/llama.cpp/ggml-cuda.cu:5510: out of memory

E:\SW\LLamaSharp-master_0808_vr_build\LLamaSharp-master\LLama.Web\bin\Debug\net7.0\LLama.Web.exe (process 52536) exited with code 1.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

vuongvmu commented Aug 9, 2023

The problem in this upstream issue is similar to mine. Can we optimize it?

ggerganov/llama.cpp#1456

martindevans (Member) commented

It sounds like this may be an issue in the upstream llama.cpp library. Do you have this same problem if you use that directly?
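A rough sketch of such a direct test, assuming the llama.cpp C API as it stood around this time (llama_load_model_from_file / llama_new_context_with_model / llama_free / llama_free_model; names and signatures may differ in other versions), would be to load and free the model in a loop and watch VRAM usage between iterations:

// Rough sketch against the llama.cpp C API of roughly this period
// (assumption: function names/signatures match this era of llama.cpp).
// The model path is taken from the log above; n_gpu_layers matches the same offload.
#include <cstdio>
#include "llama.h"

int main() {
    llama_backend_init(false /* numa */);

    for (int i = 0; i < 10; ++i) {
        llama_context_params params = llama_context_default_params();
        params.n_gpu_layers = 33;

        llama_model   * model = llama_load_model_from_file(
            "E:/VR/SW/models/llama2_7b_chat_uncensored.ggmlv3.q4_0.bin", params);
        llama_context * ctx   = llama_new_context_with_model(model, params);

        // ... optionally evaluate a short prompt here ...

        llama_free(ctx);          // free the context (KV cache, scratch buffers)
        llama_free_model(model);  // free the model weights, including VRAM
        printf("iteration %d done\n", i);
    }

    llama_backend_free();
    return 0;
}

If VRAM usage grows across iterations here as well, the leak is in upstream llama.cpp; if it stays flat, the sessions are probably not being fully disposed on the LLamaSharp side.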

vuongvmu reopened this Aug 10, 2023
vuongvmu (Author) commented

I also get the same error when I repeat session creation and closing in a WinForms app.


martindevans commented Nov 2, 2023

Is this still an issue with newer versions of LLamaSharp/llama.cpp?

martindevans (Member) commented

Since there's been no response for a while I'll close this; it should be fixed in newer versions. If it's still a problem, please feel free to re-open the issue!
