[Performance] Can not release memory in gpu. #14957
Comments
I can see similar behavior (only with CUDA). After releasing the model, some memory remains allocated in RAM. However, reloading the model multiple times does not allocate additional memory, so it seems to be reusing the same allocation. Note that the same model's memory is managed cleanly when it is loaded for CPU inference. Still, it would be nice to know whether a full cleanup is possible for CUDA inference. I am using the C++ API.
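To make the reload pattern described above concrete, here is a minimal sketch using the ONNX Runtime C++ API. The model path "model.onnx" and the loop count are placeholders, not taken from the original report:

```cpp
// Minimal sketch of the reload pattern: load and release the same model several
// times and watch process/GPU memory between iterations. Placeholder model path.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "reload-test");

  for (int i = 0; i < 5; ++i) {
    Ort::SessionOptions opts;
    OrtCUDAProviderOptions cuda_opts;             // defaults: device 0
    opts.AppendExecutionProvider_CUDA(cuda_opts); // CUDA EP; drop this line for the CPU-only comparison
    {
      Ort::Session session(env, L"model.onnx", opts);
      // ... run inference ...
    } // Session destroyed here; the observation above is that memory stays
      // allocated, but later iterations reuse it instead of growing.
  }
  return 0;
}
```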
Seems to be fixed here: #15040
There is still some GPU memory (VRAM) that is not released after "g_ort->ReleaseSession(session)"; it is not freed until the program exits. Do you know how to fully release the GPU memory?
How did you solve this problem? Hoping for your reply.
Describe the issue
How can I release all of the GPU memory allocated when an ONNX Runtime session is created?
I tried to release the memory with the two code variants below; both behave the same.
I set three breakpoints: at const wchar_t* model_path, at g_ort->ReleaseSession(session);, and at return 0;.
The memory does not appear to be fully released.
at const wchar_t* model_path
![image](https://user-images.githubusercontent.com/48681566/223627962-07925fde-4ed2-4d85-aa3b-7b0b039a6dff.png)
at g_ort->ReleaseSession(session);
![image](https://user-images.githubusercontent.com/48681566/223628192-c1435c01-e61e-4e14-aaef-418ee9de2912.png)
at return 0;
![image](https://user-images.githubusercontent.com/48681566/223628274-cb81f932-27bf-45cb-bafd-989f5efd5ff5.png)
To reproduce
Run inference with the code below,
or append the CUDA provider with g_ort->SessionOptionsAppendExecutionProvider_CUDA_V2(session_options, cuda_options); instead.
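The reproduction snippet itself is not included above, so the following is only a sketch of what such a repro could look like with the ONNX Runtime C API, with breakpoints at the three places mentioned in the description. The model path and the CHECK error-handling macro are placeholders:

```cpp
// Sketch: create a CUDA session via the C API, release it, and observe memory
// at the three breakpoints. Not the original code; model path is a placeholder.
#include <onnxruntime_c_api.h>
#include <stdio.h>

const OrtApi* g_ort = NULL;

#define CHECK(expr)                                        \
  do {                                                     \
    OrtStatus* s = (expr);                                 \
    if (s != NULL) {                                       \
      fprintf(stderr, "%s\n", g_ort->GetErrorMessage(s));  \
      g_ort->ReleaseStatus(s);                             \
      return 1;                                            \
    }                                                      \
  } while (0)

int main() {
  g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  const wchar_t* model_path = L"model.onnx";  // breakpoint 1: baseline memory

  OrtEnv* env = NULL;
  CHECK(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env));

  OrtSessionOptions* session_options = NULL;
  CHECK(g_ort->CreateSessionOptions(&session_options));

  // Second variant from above: attach the CUDA execution provider via the V2 options.
  OrtCUDAProviderOptionsV2* cuda_options = NULL;
  CHECK(g_ort->CreateCUDAProviderOptions(&cuda_options));
  CHECK(g_ort->SessionOptionsAppendExecutionProvider_CUDA_V2(session_options, cuda_options));

  OrtSession* session = NULL;
  CHECK(g_ort->CreateSession(env, model_path, session_options, &session));

  // ... run inference ...

  g_ort->ReleaseSession(session);  // breakpoint 2: GPU memory reported as still held

  g_ort->ReleaseCUDAProviderOptions(cuda_options);
  g_ort->ReleaseSessionOptions(session_options);
  g_ort->ReleaseEnv(env);

  return 0;  // breakpoint 3: memory only returned when the process exits
}
```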
Urgency
No response
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.12.0 and 1.14.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No